
Jonathan co-founded LLM Stats to build independent, reproducible measurement infrastructure for AI models. He leads the platform's benchmark evaluation methodology, arena design, and data pipeline architecture. His work focuses on eliminating bias from AI model evaluation: designing blind voting systems, standardizing benchmark collection across providers, and publishing transparent ranking methodologies that frontier labs and Fortune 500 teams rely on for model selection.
Articles (13)
Gemini 3.5 Flash: Benchmarks, Pricing, and Complete Specs
May 19, 2026
What Is a CUDA Kernel? A Visual Explainer
Apr 30, 2026
What Is a Contaminated LLM? Detection, Famous Cases, 2026 Guide
Apr 30, 2026
Is Fine-Tuning Better Than Prompt Engineering in 2026?
Apr 30, 2026
GPT-5.5 vs Claude Opus 4.7: Pricing, Speed, Benchmarks
Apr 23, 2026
GPT-5.5 vs GPT-5.4: Pricing, Speed, Context, Benchmarks
Apr 23, 2026
Claude Opus 4.7 vs Opus 4.6
Apr 17, 2026
Claude Opus 4.7: Benchmarks, Pricing, Context & What's New
Apr 16, 2026
Claude Mythos Preview: Benchmarks, Pricing & Project Glasswing
Apr 7, 2026
How to Calculate Hardware Requirements for Running LLMs Locally
Apr 3, 2026
Post-Training in 2026: GRPO, DAPO, RLVR & Beyond
Mar 11, 2026
Nemotron 3 Super: Pricing, Benchmarks, Architecture & API
Mar 11, 2026
Model Quantization Across Providers
Nov 28, 2024