GPT-4o vs Qwen3.5-27B Comparison

Comparing GPT-4o and Qwen3.5-27B across benchmarks, pricing, and capabilities.


Performance Benchmarks

Comparative analysis across standard metrics

12 benchmarks

GPT-4o outperforms on 1 benchmark (AI2D), while Qwen3.5-27B is better on 11 (CharXiv-R, ERQA, GPQA, Humanity's Last Exam, IFEval, MMLU-Pro, MMMLU, MMMU, MMMU-Pro, SWE-Bench Verified, VideoMMMU).

Qwen3.5-27B outperforms GPT-4o on 11 of the 12 benchmarks, often by wide margins (for example, 72.4% vs 33.2% on SWE-Bench Verified).

Sat Mar 14 2026 • llm-stats.com


Pricing Analysis

Price comparison per million tokens

Cost data is unavailable for Qwen3.5-27B; only GPT-4o has listed prices.

Prices shown are the lowest available across all providers.


GPT-4o

Input tokens: $2.50

Output tokens: $10.00

Best provider: Azure

Qwen3.5-27B

Input tokens: $0.00

Output tokens: $0.00

Best provider: Unknown Organization
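To turn the per-million-token prices above into a per-request cost, multiply each token count by its rate and divide by one million. The sketch below is a minimal illustration assuming the GPT-4o prices listed on this page (Mar 14, 2026 snapshot); provider prices change, and the names `PRICES_PER_MILLION` and `estimate_cost` are hypothetical. It also assumes the $0.00 entries for Qwen3.5-27B mean "no data" rather than "free", so that model returns `None`.

```python
from typing import Optional

# USD per million tokens, as listed on this page (Mar 14, 2026 snapshot).
# Assumption: Qwen3.5-27B's $0.00 entries mean "no pricing data", so it maps to None.
PRICES_PER_MILLION = {
    "GPT-4o": {"input": 2.50, "output": 10.00},
    "Qwen3.5-27B": None,
}


def estimate_cost(model: str, input_tokens: int, output_tokens: int) -> Optional[float]:
    """Estimated USD cost of one request, or None if no price is listed."""
    prices = PRICES_PER_MILLION.get(model)
    if prices is None:
        return None  # unknown model or missing pricing data
    return (input_tokens * prices["input"] + output_tokens * prices["output"]) / 1_000_000


print(estimate_cost("GPT-4o", 10_000, 2_000))
print(estimate_cost("Qwen3.5-27B", 10_000, 2_000))
```

For a request with 10,000 input tokens and 2,000 output tokens, GPT-4o comes to about $0.045 at these rates: $0.025 for input plus $0.02 for output.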


Context Window

Maximum input and output token capacity

Only GPT-4o specifies its context limits: 128,000 input tokens and 16,384 output tokens.

GPT-4o

Input: 128,000 tokens

Output: 16,384 tokens

Qwen3.5-27B

Input: not specified

Output: not specified
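Before sending a long prompt, you can check it against these limits. The sketch below is a minimal illustration using the figures listed above. Because many APIs count the prompt and the completion against one shared context window, it offers an optional stricter check (`shared_window=True`). Which rule applies depends on the provider and is an assumption here. The names `CONTEXT_LIMITS` and `fits` are hypothetical, and Qwen3.5-27B returns `None` because this page lists no limits for it.

```python
from typing import Optional

# Token limits as listed on this page; None means no figure is given.
CONTEXT_LIMITS = {
    "GPT-4o": {"input": 128_000, "output": 16_384},
    "Qwen3.5-27B": {"input": None, "output": None},
}


def fits(model: str, prompt_tokens: int, max_output_tokens: int,
         shared_window: bool = False) -> Optional[bool]:
    """Return whether a request fits the listed limits, or None if they are unknown."""
    limits = CONTEXT_LIMITS[model]
    if limits["input"] is None or limits["output"] is None:
        return None  # cannot decide without published limits
    if max_output_tokens > limits["output"]:
        return False  # requested completion exceeds the output cap
    if shared_window:
        # Stricter assumption: prompt and completion share the input window.
        return prompt_tokens + max_output_tokens <= limits["input"]
    return prompt_tokens <= limits["input"]


print(fits("GPT-4o", 100_000, 16_384))
print(fits("GPT-4o", 120_000, 16_384, shared_window=True))
print(fits("Qwen3.5-27B", 1_000, 1_000))
```

A 120,000-token prompt passes the separate input check but fails the shared-window check, because 120,000 + 16,384 = 136,384 exceeds 128,000.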


Input Capabilities

Supported data types and modalities

Both GPT-4o and Qwen3.5-27B accept multimodal inputs, so neither is limited to text-only tasks.

GPT-4o

Text

Images

Audio

Video

Qwen3.5-27B

Text

Images

Audio

Video

License

Usage and distribution terms

GPT-4o is licensed under a proprietary license, while Qwen3.5-27B uses Apache 2.0.

License differences may affect how you can use these models in commercial or open-source projects.

GPT-4o

Proprietary

Closed source

Qwen3.5-27B

Apache 2.0

Open weights

Release Timeline

When each model was launched

GPT-4o was released on 2024-08-06, while Qwen3.5-27B was released on 2026-02-24.

Qwen3.5-27B is about 19 months newer than GPT-4o.

GPT-4o

Aug 6, 2024

1.6 years ago (as of Mar 14, 2026)

Qwen3.5-27B

Feb 24, 2026

2 weeks ago (as of Mar 14, 2026)

1.6 years newer
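The release gap and the relative ages above can be checked with simple date arithmetic. The sketch below is a minimal illustration using Python's standard `datetime` module and the dates given on this page. The 365.25-day average year is an assumption for converting days to years and months. The helper `whole_months_between` is hypothetical and counts only complete calendar months.

```python
from datetime import date

# Release dates and snapshot date as given on this page.
GPT4O_RELEASE = date(2024, 8, 6)
QWEN_RELEASE = date(2026, 2, 24)
SNAPSHOT = date(2026, 3, 14)

AVG_DAYS_PER_YEAR = 365.25  # assumption: average year length, including leap years
AVG_DAYS_PER_MONTH = AVG_DAYS_PER_YEAR / 12


def whole_months_between(start: date, end: date) -> int:
    """Count complete calendar months from start to end."""
    months = (end.year - start.year) * 12 + (end.month - start.month)
    if end.day < start.day:
        months -= 1  # the final month is not yet complete
    return months


gap_days = (QWEN_RELEASE - GPT4O_RELEASE).days
print(f"Gap: {gap_days} days")
print(f"~{gap_days / AVG_DAYS_PER_YEAR:.1f} years, ~{gap_days / AVG_DAYS_PER_MONTH:.0f} months")
print(f"{whole_months_between(GPT4O_RELEASE, QWEN_RELEASE)} complete calendar months")
print(f"Qwen3.5-27B age at snapshot: {(SNAPSHOT - QWEN_RELEASE).days} days")
```

The gap is 567 days. That rounds to the page's 1.6 years and 19 months, although only 18 full calendar months separate the two releases. Qwen3.5-27B was 18 days old at the snapshot, which the page rounds down to "2 weeks ago".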

Knowledge Cutoff

When training data ends

Neither model specifies a knowledge cutoff date, so the recency of their training data cannot be compared.


Key Takeaways

GPT-4o


OpenAI

Published context window (128,000 tokens; Qwen3.5-27B lists none)

Higher AI2D score (94.2% vs 92.9%)

Qwen3.5-27B


Alibaba Cloud / Qwen Team

Open weights (Apache 2.0)

Higher CharXiv-R score (79.5% vs 58.8%)

Higher ERQA score (60.5% vs 35.2%)

Higher GPQA score (85.5% vs 70.1%)

Higher Humanity's Last Exam score (48.5% vs 5.3%)

Higher IFEval score (95.0% vs 81.0%)

Higher MMLU-Pro score (86.1% vs 74.7%)

Higher MMMLU score (85.9% vs 81.4%)

Higher MMMU score (82.3% vs 72.2%)

Higher MMMU-Pro score (75.0% vs 59.9%)

Higher SWE-Bench Verified score (72.4% vs 33.2%)

Higher VideoMMMU score (82.3% vs 61.2%)
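The win count and margins summarized above can be recomputed from the listed scores. The sketch below is a minimal illustration that counts wins and computes per-benchmark margins in percentage points. Scores on a page like this may come from different evaluation setups, so a raw win count or mean margin is only a rough summary. The names `SCORES`, `tally`, and `margins` are hypothetical.

```python
# Scores (%) as listed under Key Takeaways: (GPT-4o, Qwen3.5-27B).
SCORES = {
    "AI2D": (94.2, 92.9),
    "CharXiv-R": (58.8, 79.5),
    "ERQA": (35.2, 60.5),
    "GPQA": (70.1, 85.5),
    "Humanity's Last Exam": (5.3, 48.5),
    "IFEval": (81.0, 95.0),
    "MMLU-Pro": (74.7, 86.1),
    "MMMLU": (81.4, 85.9),
    "MMMU": (72.2, 82.3),
    "MMMU-Pro": (59.9, 75.0),
    "SWE-Bench Verified": (33.2, 72.4),
    "VideoMMMU": (61.2, 82.3),
}


def tally(scores: dict) -> dict:
    """Group benchmark names by which model scored higher."""
    wins = {"GPT-4o": [], "Qwen3.5-27B": [], "tie": []}
    for name, (gpt, qwen) in scores.items():
        if gpt > qwen:
            wins["GPT-4o"].append(name)
        elif qwen > gpt:
            wins["Qwen3.5-27B"].append(name)
        else:
            wins["tie"].append(name)
    return wins


def margins(scores: dict) -> dict:
    """Qwen3.5-27B minus GPT-4o, in percentage points (positive = Qwen ahead)."""
    return {name: round(qwen - gpt, 1) for name, (gpt, qwen) in scores.items()}


wins = tally(SCORES)
m = margins(SCORES)
print({model: len(names) for model, names in wins.items()})
print("Largest margin:", max(m, key=m.get), m[max(m, key=m.get)])
print(f"Mean margin: {sum(m.values()) / len(m):.2f} points")
```

This reproduces the 1 vs 11 split. The largest gap is on Humanity's Last Exam (43.2 points), and GPT-4o's only lead is 1.3 points on AI2D.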

Detailed Comparison

AI Model Comparison Table
Feature	GPT-4o	Qwen3.5-27B