Which LLM Should You Use?

A practical comparison of frontier language models for business applications. Cut through the marketing and find the right model for your specific needs.

Last updated: June 2026

Model Comparison

| Model | Provider | Context | API pricing (input / output, per 1M tokens) | Consumer access |
| --- | --- | --- | --- | --- |
| Claude Opus 4.8 | Anthropic | 1M tokens | $5 / $25 | Claude Pro $20/mo |
| Claude Sonnet 4.6 | Anthropic | 1M tokens | $3 / $15 | Claude Pro $20/mo |
| Claude Haiku 4.5 | Anthropic | 200K tokens | $1 / $5 | Claude Pro $20/mo |
| GPT-5.5 | OpenAI | 1M tokens | $5 / $30 | ChatGPT Plus $20/mo |
| o3 | OpenAI | 200K tokens | $2 / $8 | ChatGPT Plus $20/mo |
| Gemini 2.5 Pro | Google | 1M tokens | $1.25 / $10 | Gemini Advanced $20/mo |
| Gemini 3.5 Flash | Google | 1M tokens | $1.50 / $9.00 | Gemini Advanced $20/mo |
| Llama 4 Maverick | Meta | 1M tokens | Self-hosted, or ~$0.30–$0.49 blended via API providers | Open weights (API varies by provider) |
| DeepSeek V4 Pro | DeepSeek | 1M tokens | $0.435 / $0.87 | API only |
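To turn the per-1M prices above into a budget figure, multiply each request's token counts by the matching rate and scale by volume. The sketch below is a minimal estimator under stated assumptions: standard-tier list prices only (no long-context surcharges, batch discounts, or Fast Mode), the same token profile for every request, and a hypothetical workload. For o3, hidden reasoning tokens are billed as output tokens, so real output counts will usually exceed the visible response. Llama 4 Maverick is left out because its price depends on the hosting provider; `monthly_cost` is an illustrative helper, not a provider tool.

```python
# Standard-tier list prices per 1M tokens (input, output) in USD, copied from
# the comparison table. Long-context surcharges, batch discounts and Fast Mode
# are deliberately ignored in this sketch.
PRICES_PER_1M: dict[str, tuple[float, float]] = {
    "Claude Opus 4.8": (5.00, 25.00),
    "Claude Sonnet 4.6": (3.00, 15.00),
    "Claude Haiku 4.5": (1.00, 5.00),
    "GPT-5.5": (5.00, 30.00),
    "o3": (2.00, 8.00),
    "Gemini 2.5 Pro": (1.25, 10.00),
    "Gemini 3.5 Flash": (1.50, 9.00),
    "DeepSeek V4 Pro": (0.435, 0.87),
}


def monthly_cost(model: str, requests: int, in_tokens: int, out_tokens: int) -> float:
    """Estimate monthly USD spend, assuming every request has the same token profile."""
    price_in, price_out = PRICES_PER_1M[model]
    per_request = (in_tokens * price_in + out_tokens * price_out) / 1_000_000
    return requests * per_request


if __name__ == "__main__":
    # Hypothetical workload: 100,000 requests/month, 2,000 input and 500 output tokens each.
    workload = (100_000, 2_000, 500)
    for name in sorted(PRICES_PER_1M, key=lambda m: monthly_cost(m, *workload)):
        print(f"{name:<18} ${monthly_cost(name, *workload):>10,.2f}")
```

With that hypothetical profile, Claude Sonnet 4.6 comes to $1,350 a month and DeepSeek V4 Pro to $130.50, about a 10× gap for this particular input/output mix. Output-heavy workloads widen the gap; input-heavy ones narrow it.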

Strengths & Limitations

Claude Opus 4.8

Anthropic

Strengths

  • Best-in-class complex reasoning and agentic tasks
  • Long-horizon coding with high autonomy (69.2% SWE-bench Pro)
  • Full 1M context at standard pricing
  • Leads EQ-Bench Creative Writing (Elo 2216)

Limitations

  • Higher API cost than Sonnet ($5/$25 vs $3/$15 per 1M tokens)
  • Fast Mode adds further cost ($10/$50 per 1M tokens)
  • No image generation

Claude Sonnet 4.6

Anthropic

Strengths

  • 79.6% SWE-bench Verified — near Opus-level coding at lower cost
  • Improved instruction-following and tool reliability
  • Full 1M context at standard pricing
  • Best speed-to-quality ratio for production workloads

Limitations

  • Less depth on highly complex multi-step reasoning
  • No image generation
  • Slightly less nuanced than Opus on creative tasks

Claude Haiku 4.5

Anthropic

Strengths

  • Fastest and cheapest Anthropic model
  • Low latency for real-time and high-volume applications
  • Strong on classification, extraction, and summarisation
  • 80% cheaper than Opus 4.8 on input tokens ($1 vs $5 per 1M)

Limitations

  • Smaller context window (200K vs 1M for Opus/Sonnet)
  • Less capable on complex multi-step reasoning
  • Not suited for nuanced creative or deep analytical work

GPT-5.5

OpenAI

Strengths

  • Strong multimodal capabilities (images, documents, audio)
  • Large ecosystem of tools and integrations
  • Broad general knowledge and creative tasks
  • Batch and Flex processing at a 50% discount

Limitations

  • Input pricing doubles above 272K tokens ($10/M)
  • GPT-5.5 Pro variant very expensive ($30/$180 per 1M)
  • Can be verbose on structured or constrained tasks

o3

OpenAI

Strengths

  • Best-in-class for math, science, coding, and visual reasoning
  • An 80% price reduction brought it to $2/$8 per 1M tokens
  • Strong on legal analysis, structured logic, and complex multi-step tasks
  • 200K context with 100K max output and integrated tool use

Limitations

  • Slower than standard models due to chain-of-thought inference
  • Not optimised for creative or conversational tasks
  • Smaller context window (200K) than the 1M-token flagship models

Gemini 2.5 Pro

Google

Strengths

  • Lowest input price of the flagship models compared here ($1.25/M for prompts up to 200K tokens)
  • Native multimodal processing (text, images, video)
  • Strong research synthesis and document analysis
  • Deep Think mode for nuanced, multi-step reasoning

Limitations

  • Input pricing doubles above 200K tokens ($2.50/M)
  • Less consistent than Claude/GPT on complex coding tasks
  • Smaller third-party tool ecosystem
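Both GPT-5.5 (above 272K tokens) and Gemini 2.5 Pro (above 200K tokens) double their input rate for long prompts. The sketch below models that step on the input side only, under two assumptions not confirmed in this comparison: that the higher rate applies to the entire prompt once the threshold is crossed (so a prompt just over the line costs more than twice one just under it), and that GPT-5.5's 50% Batch/Flex discount is applied to the already-tiered price. `input_cost` and `with_batch_discount` are hypothetical helpers.

```python
def input_cost(prompt_tokens: int, base_rate: float, threshold: int) -> float:
    """Input cost in USD for one prompt under a two-tier price schedule.

    base_rate is the per-1M-token price at or below the threshold; the rate
    doubles above it. Assumption: the doubled rate applies to the whole
    prompt once the threshold is exceeded, not only to the excess tokens.
    """
    rate = base_rate * 2 if prompt_tokens > threshold else base_rate
    return prompt_tokens * rate / 1_000_000


def with_batch_discount(cost: float, discount: float = 0.5) -> float:
    """Apply a Batch/Flex discount (50% for GPT-5.5, per the text).

    Assumption: the discount stacks on top of the long-context tier.
    """
    return cost * (1 - discount)


if __name__ == "__main__":
    # GPT-5.5: $5/M base input, doubling above 272K tokens.
    print(input_cost(250_000, 5.00, 272_000))  # below the threshold
    print(input_cost(300_000, 5.00, 272_000))  # above: whole prompt at $10/M
    print(with_batch_discount(input_cost(300_000, 5.00, 272_000)))
    # Gemini 2.5 Pro: $1.25/M base input, doubling above 200K tokens.
    print(input_cost(200_000, 1.25, 200_000), input_cost(200_001, 1.25, 200_000))
```

Under these assumptions, a 300K-token prompt to GPT-5.5 costs $3.00 in input rather than $1.50, or $1.50 again via Batch. If your documents hover near a threshold, trimming or splitting them to stay below it may be cheaper than one long call.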

Gemini 3.5 Flash

Google

Strengths

  • Launched at Google I/O in May 2026; outperforms Gemini 3.1 Pro on coding and agentic benchmarks
  • 4× faster token output than competing frontier models
  • Built for agentic workloads: tool calling, subagent orchestration, multi-step workflows
  • Native multimodal: text, images, video, audio, and PDFs

Limitations

  • Input costs 5× as much as the outgoing Gemini 2.5 Flash ($1.50 vs $0.30 per 1M)
  • Not ideal for long-form nuanced writing or deep analytical work
  • Fewer third-party integrations than OpenAI

Llama 4 Maverick

Meta

Strengths

  • Open weights for complete data sovereignty
  • Natively multimodal, built on a 128-expert mixture-of-experts (MoE) architecture
  • Competitive with GPT-4o and Gemini 2.0 Flash on benchmarks
  • Full fine-tuning and deployment flexibility

Limitations

  • Requires own infrastructure to deploy at scale
  • Resource-intensive (400B total parameters)
  • No official vendor support; commercial licence restrictions apply
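The "resource-intensive" point follows from simple arithmetic. In a mixture-of-experts model only some experts run per token (Meta lists 17B active parameters for Maverick), but all 400B parameters must be resident in accelerator memory. The sketch below estimates weight memory only. It assumes 2 bytes per parameter for BF16, 1 for FP8 and 0.5 for 4-bit quantisation, and ignores KV cache, activations and framework overhead, which add substantially at long context. The 80 GB accelerator size is an illustrative assumption, and the helper names are hypothetical.

```python
import math

# Llama 4 Maverick parameter counts: 400B total (from the text above) and
# 17B active per token (Meta's published figure). Memory scales with the
# total, because every expert's weights must be resident.
TOTAL_PARAMS = 400e9
ACTIVE_PARAMS = 17e9

# Assumed storage cost per parameter at common precisions.
BYTES_PER_PARAM = {"bf16": 2.0, "fp8": 1.0, "int4": 0.5}


def weight_memory_gb(params: float, precision: str) -> float:
    """GB (1e9 bytes) needed to hold the weights alone; excludes KV cache and activations."""
    return params * BYTES_PER_PARAM[precision] / 1e9


def gpus_needed(params: float, precision: str, gpu_memory_gb: float = 80.0) -> int:
    """Minimum accelerator count to fit the weights (80 GB each is illustrative)."""
    return math.ceil(weight_memory_gb(params, precision) / gpu_memory_gb)


if __name__ == "__main__":
    for precision in BYTES_PER_PARAM:
        total = weight_memory_gb(TOTAL_PARAMS, precision)
        active = weight_memory_gb(ACTIVE_PARAMS, precision)
        print(f"{precision}: {total:,.0f} GB resident, ~{active:,.0f} GB of weights read per token, "
              f">= {gpus_needed(TOTAL_PARAMS, precision)} x 80 GB accelerators")
```

At BF16 that is 800 GB of weights, or at least ten 80 GB accelerators before any KV cache; 4-bit quantisation brings the weights down to 200 GB (three such accelerators), typically at some cost in output quality.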

DeepSeek V4 Pro

DeepSeek

Strengths

  • Extraordinary value: roughly 7× cheaper on input and 17× cheaper on output than Claude Sonnet 4.6
  • 1M token context window with thinking and standard modes
  • Strong coding and reasoning performance
  • Open weights available for self-hosted deployment

Limitations

  • Hosted API stores data on China-based servers, a significant privacy risk
  • Hosted API not suitable for sensitive, regulated, or enterprise data (self-hosting the open weights keeps data on your own infrastructure)
  • Smaller ecosystem and third-party integration support

Use Case Recommendations

Different tasks demand different trade-offs. Here are our recommendations based on common business scenarios.

| Use Case | Recommended | Alternatives | Notes |
| --- | --- | --- | --- |
| Complex Analysis & Research | Claude Opus 4.8 | GPT-5.5, Gemini 2.5 Pro | When accuracy and depth matter more than speed or cost |
| Production Applications | Claude Sonnet 4.6 | GPT-5.5, DeepSeek V4 Pro | Balance of quality, speed, and cost for real workloads |
| Long Document Processing | Gemini 2.5 Pro | Claude Opus 4.8, Claude Sonnet 4.6 | Most cost-effective at ≤200K context; 1M window available |
| Reasoning, Math & Science | o3 | Claude Opus 4.8, Gemini 2.5 Pro | Best-in-class on math, science, and coding after an 80% price cut |
| Customer Service & Chatbots | Gemini 3.5 Flash | Claude Haiku 4.5, DeepSeek V4 Pro | 4× faster token output than competing frontier models; handles complex queries with tool use |
| Budget-Conscious Projects | DeepSeek V4 Pro | Llama 4 Maverick, Gemini 3.5 Flash | Near-frontier performance at a fraction of the cost |
| On-Premise / Air-Gapped | Llama 4 Maverick | DeepSeek V4 Pro (self-hosted) | When data cannot leave your infrastructure |
| Creative Writing | Claude Opus 4.8 | GPT-5.5, Claude Sonnet 4.6 | Leads the EQ-Bench Creative Writing leaderboard (Elo 2216) |
| Code Generation | Claude Sonnet 4.6 | Claude Opus 4.8, DeepSeek V4 Pro | 79.6% SWE-bench Verified at lower cost than Opus; fast iteration cycle |
| Multimodal (Images/Documents) | GPT-5.5 | Gemini 2.5 Pro, Llama 4 Maverick | Native multimodal understanding across formats and file types |
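One way to act on the recommendations above is to encode them as a routing map, so each request goes to the recommended model and falls back to an alternative when the first choice is unavailable or ruled out. The sketch below is a minimal version under assumptions: your application already labels each request with a use case (the label strings are hypothetical), and requests flagged as sensitive skip DeepSeek V4 Pro's hosted API, per the privacy limitation noted earlier, while still allowing the self-hosted deployment. `Route` and `pick_model` are illustrative names, not part of any provider SDK.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Route:
    """First choice plus ordered fallbacks for one use case."""
    recommended: str
    alternatives: tuple[str, ...]


# Recommendations transcribed from the use-case table; the keys are
# hypothetical labels your application would attach to each request.
ROUTES: dict[str, Route] = {
    "complex_analysis": Route("Claude Opus 4.8", ("GPT-5.5", "Gemini 2.5 Pro")),
    "production": Route("Claude Sonnet 4.6", ("GPT-5.5", "DeepSeek V4 Pro")),
    "long_documents": Route("Gemini 2.5 Pro", ("Claude Opus 4.8", "Claude Sonnet 4.6")),
    "reasoning": Route("o3", ("Claude Opus 4.8", "Gemini 2.5 Pro")),
    "customer_service": Route("Gemini 3.5 Flash", ("Claude Haiku 4.5", "DeepSeek V4 Pro")),
    "budget": Route("DeepSeek V4 Pro", ("Llama 4 Maverick", "Gemini 3.5 Flash")),
    "on_premise": Route("Llama 4 Maverick", ("DeepSeek V4 Pro (self-hosted)",)),
    "creative_writing": Route("Claude Opus 4.8", ("GPT-5.5", "Claude Sonnet 4.6")),
    "code_generation": Route("Claude Sonnet 4.6", ("Claude Opus 4.8", "DeepSeek V4 Pro")),
    "multimodal": Route("GPT-5.5", ("Gemini 2.5 Pro", "Llama 4 Maverick")),
}

# Hosted APIs to skip for sensitive data (assumption based on the
# DeepSeek V4 Pro limitations; the self-hosted entry is not blocked).
BLOCKED_FOR_SENSITIVE = frozenset({"DeepSeek V4 Pro"})


def pick_model(
    use_case: str,
    unavailable: frozenset[str] = frozenset(),
    sensitive_data: bool = False,
) -> str:
    """Return the first eligible model for a use case, in table order."""
    route = ROUTES[use_case]
    for model in (route.recommended, *route.alternatives):
        if model in unavailable:
            continue
        if sensitive_data and model in BLOCKED_FOR_SENSITIVE:
            continue
        return model
    raise LookupError(f"no eligible model for {use_case!r}")
```

For example, `pick_model("budget", sensitive_data=True)` skips the hosted DeepSeek API and returns "Llama 4 Maverick". A production router would likely also weigh cost and latency, but keeping the recommendations as data makes them easy to update when the table changes.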

The Model is Only Part of the Equation

Choosing the right LLM matters, but how you architect your system, design your prompts, and integrate AI into your workflows determines success. We help organisations move from model selection to production deployment.

* Pricing reflects June 2026 rates and may change. Check provider websites for current pricing.

* Model capabilities and context windows are based on publicly available documentation.

* Recommendations reflect our experience across client engagements. Your specific requirements may differ.
