Which LLM Should You Use?

A practical comparison of frontier language models for business applications. Cut through the marketing and find the right model for your specific needs.

Last updated: June 2026

Model Comparison

Model	Vendor	Context	API Pricing (in/out)	Consumer Access
Claude Opus 4.8	Anthropic	1M tokens	$5 / $25 per 1M tokens	Claude Pro $20/mo
Claude Sonnet 4.6	Anthropic	1M tokens	$3 / $15 per 1M tokens	Claude Pro $20/mo
Claude Haiku 4.5	Anthropic	200K tokens	$1 / $5 per 1M tokens	Claude Pro $20/mo
GPT-5.5	OpenAI	1M tokens	$5 / $30 per 1M tokens	ChatGPT Plus $20/mo
o3	OpenAI	200K tokens	$2 / $8 per 1M tokens	ChatGPT Plus $20/mo
Gemini 2.5 Pro	Google	1M tokens	$1.25 / $10 per 1M tokens	Gemini Advanced $20/mo
Gemini 3.5 Flash	Google	1M tokens	$1.50 / $9.00 per 1M tokens	Gemini Advanced $20/mo
Llama 4 Maverick	Meta	1M tokens	Self-hosted / ~$0.30–$0.49 per 1M (blended)	Open weights (API varies by provider)
DeepSeek V4 Pro	DeepSeek	1M tokens	$0.435 / $0.87 per 1M tokens	API only
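
To turn the per-token prices above into a budget figure, multiply your expected input and output volumes by the respective rates. The following is a minimal sketch using the API rows from the table; the function name `monthly_cost` and any workload you feed it are illustrative assumptions, and it deliberately ignores long-context surcharges, caching, and batch discounts.

```python
from typing import Dict, Tuple

# USD per 1M tokens (input, output), copied from the comparison table.
PRICES: Dict[str, Tuple[float, float]] = {
    "Claude Opus 4.8": (5.00, 25.00),
    "Claude Sonnet 4.6": (3.00, 15.00),
    "Claude Haiku 4.5": (1.00, 5.00),
    "GPT-5.5": (5.00, 30.00),
    "o3": (2.00, 8.00),
    "Gemini 2.5 Pro": (1.25, 10.00),
    "Gemini 3.5 Flash": (1.50, 9.00),
    "DeepSeek V4 Pro": (0.435, 0.87),
}


def monthly_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Estimated USD cost at base rates (no surcharges or discounts)."""
    price_in, price_out = PRICES[model]
    return (input_tokens * price_in + output_tokens * price_out) / 1_000_000


def cheapest_first(input_tokens: int, output_tokens: int) -> list:
    """Model names sorted by estimated cost for the given workload."""
    return sorted(PRICES, key=lambda m: monthly_cost(m, input_tokens, output_tokens))
```

Calling `cheapest_first(200_000_000, 40_000_000)` ranks the models for a hypothetical workload of 200M input and 40M output tokens per month. Output-heavy workloads shift the ranking, because output tokens cost several times more than input tokens on most rows.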

Strengths & Limitations

Claude Opus 4.8

Anthropic

Strengths

Best-in-class complex reasoning and agentic tasks
Long-horizon coding with high autonomy (69.2% SWE-bench Pro)
Full 1M context at standard pricing
Leads EQ-Bench Creative Writing (Elo 2216)

Limitations

Higher API cost than Sonnet
Fast Mode adds further cost ($10/$50 per 1M tokens)
No image generation

Claude Sonnet 4.6

Anthropic

Strengths

79.6% SWE-bench Verified — near Opus-level coding at lower cost
Improved instruction-following and tool reliability
Full 1M context at standard pricing
Best speed-to-quality ratio for production workloads

Limitations

Less depth on highly complex multi-step reasoning
No image generation
Slightly less nuanced than Opus on creative tasks

Claude Haiku 4.5

Anthropic

Strengths

Fastest and cheapest Anthropic model
Low latency for real-time and high-volume applications
Strong on classification, extraction, and summarisation
80% cheaper than Opus 4.8 on input tokens ($1 vs $5 per 1M)

Limitations

Smaller context window (200K vs 1M for Opus/Sonnet)
Less capable on complex multi-step reasoning
Not suited for nuanced creative or deep analytical work
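
The context-window gap matters most when prompt lengths vary. As a rough sketch, the check below picks the cheapest Anthropic model (by input price) whose window can hold a prompt plus a reserved output budget. Token counts are assumed to come from your own tokenizer; `pick_model` and the `reserve_output` default are hypothetical names and values, not part of any vendor SDK.

```python
from typing import Dict, Optional, Tuple

# (context window in tokens, input USD per 1M tokens), from the comparison table.
ANTHROPIC_MODELS: Dict[str, Tuple[int, float]] = {
    "Claude Haiku 4.5": (200_000, 1.00),
    "Claude Sonnet 4.6": (1_000_000, 3.00),
    "Claude Opus 4.8": (1_000_000, 5.00),
}


def pick_model(prompt_tokens: int, reserve_output: int = 8_000) -> Optional[str]:
    """Cheapest model whose window fits prompt + reserved output, else None."""
    needed = prompt_tokens + reserve_output
    fitting = [
        (price, name)
        for name, (window, price) in ANTHROPIC_MODELS.items()
        if window >= needed
    ]
    # min() on (price, name) tuples selects the lowest input price.
    return min(fitting)[1] if fitting else None
```

A 50K-token prompt stays on Haiku, while a 195K-token prompt no longer fits once the output reserve is added and falls through to Sonnet. This selects on price and window size only; task difficulty still has to be judged separately.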

GPT-5.5

OpenAI

Strengths

Strong multimodal capabilities (images, documents, audio)
Large ecosystem of tools and integrations
Broad general knowledge and creative tasks
Batch and Flex processing at 50% discount

Limitations

Input pricing doubles above 272K tokens ($10/M)
GPT-5.5 Pro variant very expensive ($30/$180 per 1M)
Can be verbose on structured or constrained tasks
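
The long-context surcharge is easy to overlook when budgeting. The sketch below estimates the input cost of a single request under a threshold-based price. It assumes that a prompt over the threshold is billed entirely at the higher rate; the listing above does not say whether the surcharge applies to the whole prompt or only to the excess, so treat that rule, and the name `tiered_input_cost`, as assumptions to verify against the provider's pricing page.

```python
def tiered_input_cost(
    prompt_tokens: int,
    base_rate: float = 5.00,    # USD per 1M input tokens at or below the threshold
    long_rate: float = 10.00,   # USD per 1M input tokens above the threshold
    threshold: int = 272_000,   # tokens
) -> float:
    """USD input cost for one request.

    ASSUMPTION: a prompt longer than `threshold` is billed entirely at
    `long_rate`, not just for the tokens beyond the threshold.
    """
    rate = long_rate if prompt_tokens > threshold else base_rate
    return prompt_tokens * rate / 1_000_000
```

Under this assumption, a 272,000-token prompt costs about $1.36 in input, while one token more costs about $2.72, so trimming or chunking prompts that sit just over the threshold can pay for itself. The same function covers other threshold-priced models by changing the three parameters.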

o3

OpenAI

Strengths

Best-in-class for math, science, coding, and visual reasoning
80% price cut in June 2025 brought it to $2/$8 per 1M tokens
Strong on legal analysis, structured logic, and complex multi-step tasks
200K context with 100K max output and integrated tool use

Limitations

Slower than standard models due to chain-of-thought inference
Not optimised for creative or conversational tasks
Smaller context window than flagship models

Gemini 2.5 Pro

Google

Strengths

One of the most affordable frontier-class models at ≤200K context ($1.25 / $10 per 1M tokens)
Native multimodal processing (text, images, video)
Strong research synthesis and document analysis
Deep Think mode for nuanced, multi-step reasoning

Limitations

Input pricing doubles above 200K tokens ($2.50/M)
Less consistent than Claude/GPT on complex coding tasks
Smaller third-party tool ecosystem

Gemini 3.5 Flash

Google

Strengths

Launched Google I/O May 2026 — outperforms Gemini 3.1 Pro on coding and agentic benchmarks
4× faster token output than competing frontier models
Built for agentic workloads: tool calling, subagent orchestration, multi-step workflows
Native multimodal: text, images, video, audio, and PDFs

Limitations

5× more expensive input than the outgoing Gemini 2.5 Flash
Not ideal for long-form nuanced writing or deep analytical work
Fewer third-party integrations than OpenAI

Llama 4 Maverick

Meta

Strengths

Open weights for complete data sovereignty
Natively multimodal, built on a 128-expert MoE architecture
Competitive with GPT-4o and Gemini 2.0 Flash on benchmarks
Full fine-tuning and deployment flexibility

Limitations

Requires own infrastructure to deploy at scale
Resource-intensive (400B total parameters)
No official vendor support; commercial licence restrictions apply

DeepSeek V4 Pro

DeepSeek

Strengths

Extraordinary value: roughly 7× cheaper than Claude Sonnet 4.6 on input tokens and 17× cheaper on output tokens
1M token context window with thinking and standard modes
Strong coding and reasoning performance
Open weights available for self-hosted deployment

Limitations

Hosted API stores data on China-based servers, a significant privacy risk
Hosted API is not suitable for sensitive, regulated, or enterprise data; self-hosting the open weights keeps data on your own infrastructure
Smaller ecosystem and third-party integration support
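
How much cheaper DeepSeek V4 Pro is than Claude Sonnet 4.6 depends on the mix of input and output tokens, because the two price lists are not proportional. The sketch below computes the ratio from the listed API prices; `input_share` (the fraction of all tokens that are input) and the function names are illustrative assumptions.

```python
from typing import Tuple

SONNET: Tuple[float, float] = (3.00, 15.00)     # Claude Sonnet 4.6, USD per 1M (in, out)
DEEPSEEK: Tuple[float, float] = (0.435, 0.87)   # DeepSeek V4 Pro, USD per 1M (in, out)


def blended(price: Tuple[float, float], input_share: float) -> float:
    """Average USD per 1M tokens when `input_share` of the tokens are input."""
    price_in, price_out = price
    return input_share * price_in + (1.0 - input_share) * price_out


def price_ratio(input_share: float) -> float:
    """How many times more Sonnet costs than DeepSeek at this token mix."""
    return blended(SONNET, input_share) / blended(DEEPSEEK, input_share)
```

The ratio runs from about 6.9× for input-only traffic (`price_ratio(1.0)`) to about 17.2× for output-only traffic (`price_ratio(0.0)`), and is about 11× at a 3:1 input-to-output mix (`price_ratio(0.75)`).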

Use Case Recommendations

Different tasks demand different trade-offs. Here are our recommendations based on common business scenarios.

Use Case	Recommended	Alternatives	Notes
Complex Analysis & Research	Claude Opus 4.8	GPT-5.5, Gemini 2.5 Pro	When accuracy and depth matter more than speed or cost
Production Applications	Claude Sonnet 4.6	GPT-5.5, DeepSeek V4 Pro	Balance of quality, speed, and cost for real workloads
Long Document Processing	Gemini 2.5 Pro	Claude Opus 4.8, Claude Sonnet 4.6	Most cost-effective at ≤200K context; 1M window available
Reasoning, Math & Science	o3	Claude Opus 4.8, Gemini 2.5 Pro	Best-in-class on math, science, and coding; priced at $2/$8 per 1M tokens since an 80% cut in June 2025
Customer Service & Chatbots	Gemini 3.5 Flash	Claude Haiku 4.5, DeepSeek V4 Pro	4× faster token output than competing frontier models; handles complex queries with tool use
Budget-Conscious Projects	DeepSeek V4 Pro	Llama 4 Maverick, Gemini 3.5 Flash	Near-frontier performance at a fraction of the cost
On-Premise / Air-Gapped	Llama 4 Maverick	DeepSeek V4 Pro (self-hosted)	When data cannot leave your infrastructure
Creative Writing	Claude Opus 4.8	GPT-5.5, Claude Sonnet 4.6	Leads EQ-Bench Creative Writing leaderboard (Elo 2216)
Code Generation	Claude Sonnet 4.6	Claude Opus 4.8, DeepSeek V4 Pro	79.6% SWE-bench Verified at lower cost than Opus; fast iteration cycle
Multimodal (Images/Documents)	GPT-5.5	Gemini 2.5 Pro, Llama 4 Maverick	Native multimodal understanding across formats and file types
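
In a system that serves several of these use cases, the table can be encoded as a simple routing map with ordered fallbacks. This is only a sketch of one possible design: the use-case keys, the `candidates` function, and the `blocked` parameter are hypothetical names, and the model lists simply mirror the recommended and alternative columns above.

```python
from typing import Dict, FrozenSet, List

# Recommended model first, then alternatives, in the order given in the table.
RECOMMENDATIONS: Dict[str, List[str]] = {
    "complex_analysis": ["Claude Opus 4.8", "GPT-5.5", "Gemini 2.5 Pro"],
    "production": ["Claude Sonnet 4.6", "GPT-5.5", "DeepSeek V4 Pro"],
    "long_documents": ["Gemini 2.5 Pro", "Claude Opus 4.8", "Claude Sonnet 4.6"],
    "reasoning": ["o3", "Claude Opus 4.8", "Gemini 2.5 Pro"],
    "customer_service": ["Gemini 3.5 Flash", "Claude Haiku 4.5", "DeepSeek V4 Pro"],
    "budget": ["DeepSeek V4 Pro", "Llama 4 Maverick", "Gemini 3.5 Flash"],
    "on_premise": ["Llama 4 Maverick", "DeepSeek V4 Pro"],  # both self-hosted
    "creative_writing": ["Claude Opus 4.8", "GPT-5.5", "Claude Sonnet 4.6"],
    "code_generation": ["Claude Sonnet 4.6", "Claude Opus 4.8", "DeepSeek V4 Pro"],
    "multimodal": ["GPT-5.5", "Gemini 2.5 Pro", "Llama 4 Maverick"],
}


def candidates(use_case: str, blocked: FrozenSet[str] = frozenset()) -> List[str]:
    """Ordered model choices for a use case, minus any models policy forbids."""
    return [m for m in RECOMMENDATIONS[use_case] if m not in blocked]
```

For example, an organisation that cannot use a China-hosted API could pass `blocked=frozenset({"DeepSeek V4 Pro"})`, and the budget route would fall back to Llama 4 Maverick. Keeping the map in data rather than in branching code makes it easy to revise when pricing or model versions change.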

The Model is Only Part of the Equation

Choosing the right LLM matters, but how you architect your system, design your prompts, and integrate AI into your workflows determines success. We help organisations move from model selection to production deployment.


* Pricing reflects June 2026 rates and may change. Check provider websites for current pricing.

* Model capabilities and context windows are based on publicly available documentation.

* Recommendations reflect our experience across client engagements. Your specific requirements may differ.
