FinQA


About this benchmark

What is FinQA?

FinQA is a large-scale dataset for numerical reasoning over financial data. Its question-answer pairs are written by financial experts and require complex numerical reasoning and an understanding of heterogeneous representations. Each pair is annotated with a gold reasoning program for full explainability.
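To make the idea of a gold reasoning program concrete, here is a minimal sketch of how such a program might be executed step by step. The notation (operation names, `#n` references to earlier step results, and `const_` literals) is an assumption modeled on the format used in the FinQA repository, the question and numbers are invented for illustration, and only four arithmetic operations are covered.

```python
import re

# Arithmetic operations supported by this sketch (an illustrative subset).
OPS = {
    "add": lambda a, b: a + b,
    "subtract": lambda a, b: a - b,
    "multiply": lambda a, b: a * b,
    "divide": lambda a, b: a / b,
}

# Matches one step such as "divide(#0, 200)" and captures the name and arguments.
STEP = re.compile(r"(\w+)\(([^()]*)\)")


def resolve(token, results):
    """Turn an argument token into a number."""
    token = token.strip()
    if token.startswith("#"):        # reference to the result of an earlier step
        return results[int(token[1:])]
    if token.startswith("const_"):   # named constant, e.g. const_100 (assumed form)
        return float(token[len("const_"):])
    return float(token)              # plain number taken from the report


def run_program(program):
    """Execute the steps in order and return the result of the last one."""
    results = []
    for op, args in STEP.findall(program):
        a, b = (resolve(t, results) for t in args.split(","))
        results.append(OPS[op](a, b))
    return results[-1]


# Hypothetical question: "What was the percentage change in revenue from 200 to 250?"
program = "subtract(250, 200), divide(#0, 200), multiply(#1, const_100)"
print(run_program(program))  # prints 25.0
```

Because every intermediate value is kept in `results`, each step of the answer can be inspected, which is the sense in which an annotated program makes the answer explainable.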

FinQA is a text benchmark evaluating models on math, reasoning, finance, and economics tasks. LLM Stats tracks 3 models on this benchmark, scored on a 0–1 scale. The current average is 0.720, with the leader at 0.772.


Current leaders

Of the 3 evaluated AI models, Nova Pro from Amazon currently leads the FinQA leaderboard with a score of 0.772.

1. Nova Pro (Amazon): 77.2%
2. Nova Lite (Amazon): 73.6%
3. Nova Micro (Amazon): 65.2%
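The headline figures on this page (the leader and the average) follow directly from the three listed scores. The short sketch below recomputes them and shows the conversion between the 0–1 scale and the percentages shown in the ranking. The variable names are illustrative only.

```python
from statistics import mean

# Scores as listed on the leaderboard, on the 0-1 scale.
scores = {"Nova Pro": 0.772, "Nova Lite": 0.736, "Nova Micro": 0.652}

# Rank models from highest to lowest score.
ranked = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
leader_name, leader_score = ranked[0]
average = mean(scores.values())

for rank, (name, score) in enumerate(ranked, start=1):
    # The .1% format multiplies by 100 and appends a percent sign.
    print(f"{rank}. {name}: {score:.1%}")

print(f"Leader: {leader_name} ({leader_score:.3f}), average: {average:.3f}")
```

The last line prints a leader score of 0.772 and an average of 0.720.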

Source paper

Title
FinQA: A Dataset of Numerical Reasoning over Financial Data
Authors
Zhiyu Chen, Wenhu Chen, Charese Smiley, Sameena Shah, and 7 others
Published
September 2021 (arXiv:2109.00122)
Abstract

The sheer volume of financial statements makes it difficult for humans to access and analyze a business's financials. Robust numerical reasoning likewise faces unique challenges in this domain. In this work, we focus on answering deep questions over financial data, aiming to automate the analysis of a large corpus of financial documents. In contrast to existing tasks on general domain, the finance domain includes complex numerical reasoning and understanding of heterogeneous representations. To facilitate analytical progress, we propose a new large-scale dataset, FinQA, with Question-Answering pairs over Financial reports, written by financial experts. We also annotate the gold reasoning programs to ensure full explainability. We further introduce baselines and conduct comprehensive experiments in our dataset. The results demonstrate that popular, large, pre-trained models fall far short of expert humans in acquiring finance knowledge and in complex multi-step numerical reasoning on that knowledge. Our dataset -- the first of its kind -- should therefore enable significant, new community research into complex application domains. The dataset and code are publicly available at https://github.com/czyssrs/FinQA.

FAQ

Common questions about the FinQA benchmark and leaderboard.

What is the FinQA benchmark?

FinQA is a large-scale dataset for numerical reasoning over financial data. Its question-answer pairs are written by financial experts and require complex numerical reasoning and an understanding of heterogeneous representations. Each pair is annotated with a gold reasoning program for full explainability.

What is the FinQA leaderboard?

The FinQA leaderboard ranks 3 AI models based on their performance on this benchmark. Currently, Nova Pro by Amazon leads with a score of 0.772. The average score across all models is 0.720.

What is the highest FinQA score?

The highest FinQA score is 0.772, achieved by Nova Pro from Amazon.

How many models are evaluated on FinQA?

Three models have been evaluated on the FinQA benchmark. All 3 results are self-reported; none has been independently verified.

Where can I find the FinQA paper?

The FinQA paper is available at https://arxiv.org/abs/2109.00122. The paper details the methodology, dataset construction, and evaluation criteria.

What categories does FinQA cover?

FinQA is categorized under math, reasoning, finance, and economics. The benchmark evaluates text models.

How recent are the FinQA leaderboard results?

The FinQA leaderboard was last updated in July 2026 and currently includes 3 evaluated models.