Much technical due diligence on AI companies is performed at pace, by people with technical knowledge but without the time or incentive to find the dirty laundry. They report to deal teams who don't have the depth to interrogate their findings, and the result is a rubber stamp dressed up as rigour.
I've seen it from both sides. Deeply technical teams spend weeks preparing detailed presentations on the ins and outs of how their technology works: the architecture decisions, the data pipelines, the model evaluation frameworks. Then a well-meaning, bright, but time-poor technical assessor arrives to evaluate the entire portfolio in a single day. The intricacies of a machine learning pipeline that took months to build get reduced to a throwaway bullet on someone's slide deck, glanced at for a few seconds before moving on. The assessment gets filed, the deal progresses, and six months post-close, the real picture emerges.
This isn't because the people involved are incompetent. It's because the standard technical due diligence frameworks were built for traditional software, and AI introduces a fundamentally different category of risk that those frameworks weren't designed to catch. When your investment thesis rests on a company's AI capabilities, you need a different playbook entirely.
The Problem With How AI Gets Evaluated
The typical technical DD process covers sensible ground: architecture review, codebase quality, infrastructure scalability, security posture, team assessment, IP ownership. These things matter. But when applied to AI companies, they miss the questions that actually determine whether the technology is defensible, valuable, and real.
Here's what I mean. A standard assessment might confirm that the codebase is well-structured, the infrastructure scales, and the team has relevant experience. All green lights. But it won't tell you whether the company's "proprietary AI" is a thin wrapper around OpenAI API calls with a custom prompt, something a competitor could replicate in a weekend. It won't tell you whether the training data was properly licensed, whether the model drifts without constant retraining, or whether the entire value proposition collapses when the next foundation model update changes the underlying capabilities.
These aren't edge cases. They're the questions that determine whether you're investing in a technology company or a marketing story.
A Framework That Actually Helps: The AI Investment Matrix
To cut through the noise, I use a framework that maps two dimensions most DD processes evaluate separately but rarely connect: the strategic impact the AI targets (is it improving an existing market, or reshaping one entirely?) against the technical depth of what's actually been built (is it genuinely novel technology, or a clever application of off-the-shelf capabilities?).
This creates four quadrants, each with very different investment implications.
The Category Creators (High Market Impact × Deep Technical Depth)
These companies target revolutionary market change and have built genuinely novel technology to deliver it: proprietary models trained on proprietary data, original architecture, defensible technical moats. This is where the biggest returns live, but also where evaluation is hardest. The team isn't just using AI; they're advancing it.
What to look for: original research output from the team, proprietary training datasets with clear provenance, model performance that demonstrably exceeds what's achievable with publicly available tools, and, crucially, a cost structure and retraining pipeline that can sustain the advantage as the field moves. The risk here is that the moat erodes as foundation models improve.
The Emperor's New Clothes (High Market Impact × Shallow Technical Depth)
This is where investors lose money. The company claims to be revolutionising a market, the pitch deck is compelling, the demo is impressive, but the underlying technology is a wrapper around existing capabilities that a well-resourced competitor could replicate in weeks. This is Builder.ai territory: a UK unicorn backed by Microsoft that claimed AI-automated software development. In reality, 700 human developers were doing the work. The company collapsed in 2025 with $37 million in frozen assets.
Builder.ai isn't an outlier. An MMC Ventures study found that 40% of European startups identifying as AI companies had minimal actual AI integration. The SEC has now charged multiple firms for "AI washing": making misleading claims about AI capabilities to attract investment. This quadrant is where rigorous technical DD pays for itself many times over.
The Quiet Compounders (Moderate Market Impact × Deep Technical Depth)
Often the best risk-adjusted investments in AI. These companies apply genuine technical depth (for example: real ML pipelines, proprietary data assets, sophisticated model architectures) to transform existing workflows rather than create entirely new markets. They're not headline-grabbing, but they're defensible. A competitor can't just spin up an API integration and match them.
What makes these attractive is that the technical depth creates compounding advantages. Their models improve with usage data. Their training pipelines get more efficient. Their domain-specific performance widens the gap with generic alternatives. The key DD question is whether this compounding is real and sustainable, or whether the improvement curve is flattening.
The Commodity Trap (Moderate Market Impact × Shallow Technical Depth)
Companies using off-the-shelf AI to improve existing processes. There's nothing wrong with this as a business as long as it is priced accordingly. The danger is when it's valued as a technology company with a defensible moat. If the core AI capability is an API call that every competitor can make, margins will compress as adoption spreads. Today's differentiator becomes tomorrow's table stakes.
These companies can still be sound investments, but the thesis needs to rest on something other than the AI: distribution advantages, regulatory positioning, brand, network effects. The AI is an accelerant, not the engine.
The Nuance Most People Miss: Wrappers Aren't Disqualifying
Here's where I diverge from the lazy consensus that "wrapper equals bad." A company building on top of existing models isn't automatically a poor investment. Sometimes the smartest technical decision is to prove a concept using off-the-shelf capabilities while focusing engineering effort on the interaction layer: the points where human meets AI.
What I've seen in practice is that the most important innovation often isn't in the model itself but in the interface patterns, the workflow integration, and the feedback loops that make AI genuinely useful rather than merely impressive in a demo. A team that deeply understands how users interact with AI outputs (where they need control, where they need transparency, where they need the system to take initiative) can build something far more valuable than a team with a technically superior model but a clunky user experience.
The critical DD question isn't "did they build their own model?" It's "what happens to their defensibility as the technology matures?" If the value lives in a proprietary data flywheel, where user interactions generate training data that improves the system, which attracts more users, which generates more data, then a wrapper today can become a moat tomorrow. But if the interaction layer is thin and the AI is doing commodity work, there's nothing to compound.
This distinction requires someone who understands both the technology and the product to evaluate. A pure technologist will dismiss the wrapper. A pure business evaluator won't know to ask about the data flywheel. You need both lenses simultaneously.
Five Questions Your DD Process Should Actually Answer
Forget the 85-point checklist. If your technical DD on an AI company doesn't definitively answer these five questions, it hasn't done its job.
What actually happens when the system receives an input? Trace the full architecture from user action to AI output to delivered result. Is there a model running, or is there a human in the loop being obscured? Where does the intelligence actually live? This sounds basic, but I've seen impressive demos that, when you trace the pipeline, turn out to involve significant manual processing disguised as automation.
Where does the training data come from, and what happens when it degrades? Data is the real asset in most AI companies, yet it's consistently the area that receives the least DD scrutiny. Who owns the data? How was it labelled? Is there a sustainable pipeline for new data, or is the company training on a fixed dataset that will become stale? What are the licensing implications if a data source changes terms?
What would it cost to replicate this with off-the-shelf tools? Be honest about this. If a competent team with access to current foundation models and public datasets could rebuild the core capability in three months, the technology isn't the moat, so something else needs to be. That something else might be valid (data network effects, distribution, domain expertise embedded in the product), but you need to know.
What's the cost structure at ten times current scale? AI infrastructure costs don't scale linearly, and they often scale in the wrong direction. Inference costs, retraining compute, data storage, and human oversight all have scaling characteristics that traditional software doesn't. A product that's margin-positive at current volume can become margin-negative at scale if the architecture isn't designed for it.
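One way to make this question concrete is to stress-test margins under different cost-scaling assumptions. The sketch below uses entirely hypothetical numbers (price per request, baseline cost, scaling exponent are all made up); the DD task is to estimate the real exponent from the company's actual architecture, not to trust these defaults.

```python
def margin(requests, exponent, base_req=100_000, base_cost=3_000.0,
           price=0.05):
    """Toy unit-economics model. Revenue scales linearly with volume;
    total serving cost (inference, retraining, oversight) is modelled
    as base_cost * (requests / base_req) ** exponent. All numbers are
    hypothetical placeholders."""
    revenue = requests * price
    cost = base_cost * (requests / base_req) ** exponent
    return (revenue - cost) / revenue  # margin as a fraction of revenue

for exp in (1.0, 1.2, 1.3):
    print(f"exponent {exp}: margin at 1x {margin(100_000, exp):.0%}, "
          f"at 10x {margin(1_000_000, exp):.0%}")
```

With a linear exponent the margin holds at 10x scale; at an exponent of 1.3, the same business that is comfortably margin-positive today goes underwater. The number that matters in DD is the exponent.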
What happens when the underlying foundation models change? If the product depends on a specific model's capabilities, what's the exposure when that model is deprecated, repriced, or superseded? Companies built on a single provider's API are carrying platform risk that should be priced into the deal. Those with model-agnostic architectures or proprietary models have a fundamentally different risk profile.
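The difference between single-provider exposure and a model-agnostic architecture often comes down to whether product logic depends on a provider's SDK directly or on a thin internal interface. A minimal sketch of the pattern, with invented class names and stubbed backends standing in for real API calls:

```python
from typing import Protocol

class TextModel(Protocol):
    """Provider-agnostic interface the product codes against.
    Illustrative only -- not any real SDK."""
    def complete(self, prompt: str) -> str: ...

class HostedBackend:
    """Stand-in for a third-party API; a real one would make an HTTP call."""
    def complete(self, prompt: str) -> str:
        return f"[hosted] {prompt}"

class LocalBackend:
    """Stand-in for a self-hosted or open-weights model."""
    def complete(self, prompt: str) -> str:
        return f"[local] {prompt}"

def summarise(model: TextModel, document: str) -> str:
    # Product logic depends only on the interface, so a deprecation or
    # repricing by one provider is a config change, not a rewrite.
    return model.complete(f"Summarise: {document}")
```

In DD terms, the question is whether this seam exists at all, and how much provider-specific behaviour (prompt formats, capability assumptions) has leaked through it.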
What to Look for in the Team
Technical DD typically asks whether the team is "experienced" and "capable." That's not enough for AI companies. The gap between a team that can fine-tune existing models and one that can do original ML research is an order of magnitude in capability and in the value they can create.
You can gauge depth quickly if you know what to listen for. A CTO who genuinely understands their ML stack will talk about failure modes unprompted. They'll distinguish between precision and recall rather than quoting a single accuracy figure, and explain why the trade-off matters for their specific use case. They'll explain their train/test splits and, critically, go a level deeper: how the data was stratified, how the training set was assembled, what biases that assembly process might have introduced, and what that means for real-world performance on edge cases.
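To see why a single accuracy number can mislead, consider a worked example with made-up counts: a fraud screen on 10,000 transactions of which only 100 are genuinely fraudulent.

```python
def precision_recall(tp, fp, fn):
    precision = tp / (tp + fp)  # of the cases flagged, how many were real?
    recall = tp / (tp + fn)     # of the real cases, how many were caught?
    return precision, recall

# Hypothetical fraud screen: 10,000 transactions, 100 truly fraudulent.
# The model flags 80: 60 real frauds (TP), 20 false alarms (FP),
# 40 frauds missed (FN), 9,880 correctly ignored (TN).
tp, fp, fn, tn = 60, 20, 40, 9_880

accuracy = (tp + tn) / (tp + fp + fn + tn)  # 99.4% -- sounds excellent
p, r = precision_recall(tp, fp, fn)         # 75% precision, 60% recall
```

The headline "99.4% accurate" is driven almost entirely by the 9,880 easy negatives; the model still misses 40% of actual fraud. A CTO with real depth will volunteer which of these numbers their customers actually feel.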
A CTO who's learned the vocabulary but doesn't have the depth will talk about "accuracy" as a single number without qualification. They'll describe their model's performance in ideal conditions but go vague when asked about where it breaks. They'll reference their data without being able to articulate its provenance or limitations. This isn't a character flaw: many excellent technical leaders come from software engineering backgrounds where these questions don't arise. But if the investment thesis depends on AI capabilities, you need to know whether the person leading the technology genuinely understands the machinery or is managing it at arm's length.
The composition matters too. A team heavy on software engineers but light on ML specialists can build a product but may struggle to deepen the technical moat. A team heavy on researchers but light on engineering may have impressive models but can't ship reliable production systems. The best AI companies have both and, ideally, someone who can bridge the gap between them.
The Bridge Between Strategic and Technical
The reason most AI due diligence fails is that the strategic assessment and the technical assessment happen in separate rooms, conducted by people who don't speak each other's language. The deal team evaluates market opportunity, competitive dynamics, and commercial traction. The technical team evaluates architecture, code quality, and infrastructure. Nobody connects the two.
The most important questions live at the intersection. Is the technical depth sufficient to defend the strategic position? Does the market opportunity justify the technical investment required? Will the data flywheel that the commercial model depends on actually materialise given the current architecture? These aren't technology questions or business questions. They're both simultaneously.
Getting this right requires someone who can read the code and read the market. In my experience, that's the rarest and most valuable capability in AI due diligence, and it's the one most deal teams don't think to look for until after the problems surface.