RealKIE-FCC

Name: RealKIE-FCC Leaderboard — AI Model Scores
Creator: LLM Stats
License: https://llm-stats.com/legal/terms-of-service

RealKIE-FCC Leaderboard

3 models

Rank	Model		Context	Cost
1	Nova 2 Pro (Amazon)	—	—	—
2	Nova 2 Lite (Amazon)	—	1.0M	$0.30 / $2.50
3	Nova 2 Omni (Amazon)	—	—	—

About this benchmark

What is RealKIE-FCC?

RealKIE-FCC is a key information extraction benchmark drawn from real enterprise documents (FCC invoices). It is part of RealKIE, a suite of five datasets for enterprise key information extraction. Models must convert documents to markdown and extract structured fields against a specified JSON schema. The Nova 2 results are reported on a human-verified version of the dataset.
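
To make "extract structured fields against a specified JSON schema" concrete, here is a minimal sketch of checking a model's output against an expected set of fields. The field names and types are hypothetical and are not taken from RealKIE-FCC. The check is a simplified key-and-type comparison, not a full JSON Schema validator, and it is not the benchmark's scoring method.

```python
import json

# Hypothetical schema: field names and types are illustrative only,
# not the fields used by RealKIE-FCC.
SCHEMA = {
    "advertiser": str,
    "invoice_total": float,
    "line_items": list,
}


def conforms(raw_output: str, schema: dict) -> bool:
    """Return True if raw_output is a JSON object whose keys match the
    schema exactly and whose values have the expected Python types."""
    try:
        data = json.loads(raw_output)
    except json.JSONDecodeError:
        return False
    if not isinstance(data, dict) or set(data) != set(schema):
        return False
    return all(isinstance(data[key], expected) for key, expected in schema.items())


# Example model output (made up for illustration).
model_output = '{"advertiser": "Example Co", "invoice_total": 1250.0, "line_items": []}'
print(conforms(model_output, SCHEMA))
```

A check like this only confirms that the output has the right shape. Judging whether the extracted values are correct would require comparing them with the dataset's annotations.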

RealKIE-FCC is a multimodal benchmark covering document understanding and vision tasks. LLM Stats tracks 3 models on this benchmark, scored on a 0–1 scale. The current average is 0.630, with the leader at 0.670.

Current leaders

Nova 2 Pro from Amazon currently leads the RealKIE-FCC leaderboard with a score of 0.670, the highest of the 3 evaluated AI models.

Nova 2 Pro (Amazon): 67.0%

Nova 2 Lite (Amazon): 62.1%

Nova 2 Omni (Amazon): 59.8%

Source paper

Title: RealKIE: Five Novel Datasets for Enterprise Key Information Extraction
Authors: Benjamin Townsend, Madison May, Katherine Mackowiak, Christopher Wells
Published: March 29, 2024
arXiv: 2403.20101

Abstract

We introduce RealKIE, a benchmark of five challenging datasets aimed at advancing key information extraction methods, with an emphasis on enterprise applications. The datasets include a diverse range of documents including SEC S1 Filings, US Non-disclosure Agreements, UK Charity Reports, FCC Invoices, and Resource Contracts. Each presents unique challenges: poor text serialization, sparse annotations in long documents, and complex tabular layouts. These datasets provide a realistic testing ground for key information extraction tasks like investment analysis and contract analysis. In addition to presenting these datasets, we offer an in-depth description of the annotation process, document processing techniques, and baseline modeling approaches. This contribution facilitates the development of NLP models capable of handling practical challenges and supports further research into information extraction technologies applicable to industry-specific problems. The annotated data, OCR outputs, and code to reproduce baselines are available to download at https://indicodatasolutions.github.io/RealKIE/.

FAQ

Common questions about the RealKIE-FCC benchmark and leaderboard.

What is the RealKIE-FCC leaderboard?

The RealKIE-FCC leaderboard ranks 3 AI models based on their performance on this benchmark. Currently, Nova 2 Pro by Amazon leads with a score of 0.670. The average score across all models is 0.630.
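
The leader and average quoted above can be reproduced from the three listed scores. The short sketch below assumes the average is a plain arithmetic mean of the scores on the 0 to 1 scale; the variable names are illustrative.

```python
# Scores as listed on the leaderboard (0-1 scale).
scores = {
    "Nova 2 Pro": 0.670,
    "Nova 2 Lite": 0.621,
    "Nova 2 Omni": 0.598,
}

# The leader is the model with the highest score.
leader = max(scores, key=scores.get)

# Assumption: the reported average is an unweighted mean.
average = sum(scores.values()) / len(scores)

print(f"leader: {leader} ({scores[leader]:.3f})")
print(f"average: {average:.3f}")
```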

What is the highest RealKIE-FCC score?

The highest RealKIE-FCC score is 0.670, achieved by Nova 2 Pro from Amazon.

How many models are evaluated on RealKIE-FCC?

Three models have been evaluated on the RealKIE-FCC benchmark. All three results are self-reported, and none has been independently verified.

Where can I find the RealKIE-FCC paper?

The RealKIE paper, which introduces the FCC dataset alongside four others, is available at https://arxiv.org/abs/2403.20101. The paper details the methodology, dataset construction, and evaluation criteria.

What categories does RealKIE-FCC cover?

RealKIE-FCC is categorized under multimodal, document understanding, and vision. The benchmark evaluates multimodal models.

Which model offers the best value on RealKIE-FCC?

Among models scoring within 10% of the leader, Nova 2 Lite from Amazon is the cheapest with listed pricing, at $0.30 per million input tokens with a score of 0.621. The leaderboard lists no pricing for Nova 2 Pro or Nova 2 Omni.
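
The best-value selection can be sketched as a filter followed by a minimum. This sketch assumes "within 10% of the leader" means a score of at least 90% of the top score, and that models without a listed price are excluded from the comparison. The function name `best_value` and the `tolerance` parameter are hypothetical, not part of any LLM Stats tooling.

```python
from typing import Optional

# (score, input price in USD per 1M tokens; None when no price is listed)
models = {
    "Nova 2 Pro": (0.670, None),
    "Nova 2 Lite": (0.621, 0.30),
    "Nova 2 Omni": (0.598, None),
}


def best_value(models: dict, tolerance: float = 0.10) -> Optional[str]:
    """Return the cheapest priced model scoring within `tolerance`
    (relative) of the top score, or None if no model qualifies."""
    top = max(score for score, _ in models.values())
    cutoff = top * (1 - tolerance)  # assumption: relative, not absolute, margin
    candidates = {
        name: price
        for name, (score, price) in models.items()
        if score >= cutoff and price is not None
    }
    if not candidates:
        return None
    return min(candidates, key=candidates.get)


print(best_value(models))
```

With the listed data the cutoff is 0.603, so Nova 2 Omni (0.598) falls just outside the margin, and Nova 2 Lite is the only qualifying model with a listed price.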

How recent are the RealKIE-FCC leaderboard results?

The RealKIE-FCC leaderboard was last updated in July 2026 and currently includes 3 evaluated models.