Guide

Is the target's data moat real? AI diligence before you sign

Whether an AI target's data moat is real, illusory, or decaying, and what each finding does to price and deal structure before you sign.

You are pricing a deal where the target's value rests on a data moat, and no one on your team can tell you whether that moat is real. Finance and legal have done their part. The question they cannot answer is whether the data the target keeps calling proprietary would actually stop a competitor, or whether it is accumulated storage with a confident story attached. That question moves both price and structure, so it is worth resolving before you sign.

Here is the position this page takes. Most data moats presented in a deal room are weaker than the pitch, and an honest technical read supports a reprice more often than a walk. The work is to find the small number of cases where the moat is genuine, and to price everything else as the commodity capability it is.

What a moat actually means here, and what it does not

A moat is a structural reason a competitor cannot easily copy what the target does. In an AI business, the only version worth a price premium is the layer a rival cannot buy off the shelf, fine-tune their way to, or hire away inside a release cycle or two. The model almost never qualifies. It is an API call available to the target and to every competitor. So when a target points at "our AI" as the defensible asset, they are usually pointing at the one layer that is not defensible at all.

The data can qualify. But volume alone does not make it qualify. A target will hand you a data-room folder showing row counts, terabytes, years of history, and present that size as defensibility. Treat that the way you would treat a capability audit that only validates what already exists. The inventory can confirm the data is large. It cannot tell you whether the data is a moat, because the size column is the one number that does not answer the question.

Volume is not a moat. It is a storage bill with a good story attached.

The fastest diagnostic: ask why, and watch the pause

Get into a room with the target's domain expert, the person who actually built or curates the data, and ask why a competitor could not rebuild this asset from scratch. Then watch how the answer arrives.

If the answer is instant and crisp ("we have eight years of transactions," "we scraped the public filings"), the moat is codifiable and almost certainly reproducible. Any well-funded incumbent in the sector can assemble the same thing. That is a reprice signal: you are looking at a commodity dressed as an advantage.

If instead they pause, if they start reaching, if the answer turns into a judgement call about why the data is labelled the way it is and why a naive competitor would label it wrong, that hesitation is the asset. The value that is hard to articulate is hard precisely because it is tacit and accumulated, which is exactly what no competitor has and no model can guess. The pause is a physical, in-the-room signal. Front-load it in your technical sessions, because it tells you more in thirty seconds than the data inventory tells you in a week.

Where the moat lives: it runs through the system, not around it

The common framing error is treating "do they have a data moat" as a yes or no about the whole company. The real question is asked layer by layer: where in this pipeline does proprietary value actually sit?

In most AI targets the model is bought, the generic data is commodity, and the defensible layer is something narrower. It might be the labelling pipeline, where domain experts encode judgement no off-the-shelf annotator could replicate. It might be the feedback loop, where real user outcomes are captured and fed back with instrumentation that took years to build. It might be how the target defines relevance for its specific domain, the rules for what counts as a good match that a generalist would get subtly wrong. Find that layer. If it exists and a competitor cannot buy it, the moat is real. If every layer turns out to be rented or reproducible, the moat is a label on a commodity.
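The layer-by-layer read can be sketched as a small inventory. This is a minimal illustration, not a diligence tool: the `Layer` type, its two flags, and the example pipeline are all hypothetical, and in practice each flag is a judgement reached in the technical sessions rather than a field someone fills in.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Layer:
    """One layer of the target's pipeline (hypothetical structure)."""
    name: str
    bought_off_the_shelf: bool   # e.g. a model reached through a vendor API
    reproducible_by_rival: bool  # e.g. sector data every incumbent also holds

    @property
    def defensible(self) -> bool:
        # A layer only counts if it is neither rented nor reproducible.
        return not (self.bought_off_the_shelf or self.reproducible_by_rival)


def defensible_layers(pipeline: list[Layer]) -> list[str]:
    """Return the names of the layers where proprietary value could sit."""
    return [layer.name for layer in pipeline if layer.defensible]


# Illustrative pipeline; the flag values are assumptions, not findings.
pipeline = [
    Layer("model", bought_off_the_shelf=True, reproducible_by_rival=True),
    Layer("generic data", bought_off_the_shelf=False, reproducible_by_rival=True),
    Layer("labelling pipeline", bought_off_the_shelf=False, reproducible_by_rival=False),
]

print(defensible_layers(pipeline))  # ['labelling pipeline']
```

An empty result is the "label on a commodity" case: every layer is rented or reproducible.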

This is also where targets misdirect, often without meaning to. Build attention flows to the visible, exciting layer, the model and the UI, and starves the unglamorous layer where defensibility actually lives. A target that has poured its engineering into the model while treating data governance and labelling as plumbing has usually built a product that is pleasant to use and trivial to replicate.

What a data moat that survives diligence looks like

A genuine one looks like this: years of outcome-labelled records in a narrow domain, where each label carries expert judgement that does not reduce to a written rule, and where the people who produced those labels are still in the building. An illusory one looks like a large customer transaction history that every competitor in the sector also holds. The first resists reproduction. The second is volume. The size of the two datasets tells you nothing about which is which.

The five moats you'll be pitched, and which hold up

Pitch decks and most published lists converge on five: a proprietary dataset, a feedback loop, regulatory or longitudinal data, integration and switching cost, and a model-performance lead. Under diligence, three of those usually collapse. A proprietary dataset is often shared sector data. A model-performance lead is the most perishable claim on the list. Integration lock-in is a business moat, not a data one, and should be priced separately. The two that hold up are tacit-labelled data and a feedback loop with real instrumentation behind it. When a target claims all five, assume they have confused volume with defensibility and price accordingly.

The foundation-model inflection point

The sharpest test for an AI target is one that most commentary gestures at and never resolves: when does proprietary data still confer an edge, and when does a frontier model fine-tuned on a fraction of that data erase it?

Run it as an actual question to the technical team. How much of the target's measured advantage survives if a competitor fine-tunes an off-the-shelf model on a smaller, cheaper dataset? If most of the gap closes, the moat has already dissolved and the premium you are being asked to pay is for a capability the market will commoditise. If the gap holds, because the data encodes something fine-tuning cannot recover, the moat is real and the premium may be justified. This single test separates a defensible AI asset from a temporary lead more reliably than any architecture review.
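One way to put a number on that question is the share of the target's measured advantage that survives the challenger. The sketch below assumes all three systems are scored on the same held-out evaluation set with the same metric, where higher is better. The function name and the example scores are hypothetical, and where "most of the gap" begins is a judgement this page does not fix with a threshold.

```python
def gap_retained(target_score: float,
                 baseline_score: float,
                 challenger_score: float) -> float:
    """Fraction of the target's advantage that survives the challenger.

    target_score:     the target's system on the shared evaluation set
    baseline_score:   an off-the-shelf model with no domain data
    challenger_score: the same model fine-tuned on a smaller, cheaper dataset
    """
    advantage = target_score - baseline_score
    if advantage <= 0:
        raise ValueError("target shows no measured advantage over the baseline")
    return (target_score - challenger_score) / advantage


# Illustrative numbers only: the challenger closes most of the gap.
retained = gap_retained(target_score=0.90,
                        baseline_score=0.70,
                        challenger_score=0.85)
print(round(retained, 2))  # 0.25
```

A value near 1 means the gap holds. A value near 0 means fine-tuning has closed it and the premium is being paid for a capability the market will commoditise.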

The deepest moats resist articulation, which cuts both ways

The strongest data moats often cannot be written down. The value lives in the interaction of many tacit judgements that do not decompose into rules, so even the target cannot fully specify what makes it work. For a buyer this is the best possible defensibility signal and the most dangerous integration risk at the same time. If the moat cannot be extracted into documentation, it cannot be extracted from the people who hold it either. You are not buying a dataset. You are buying the team that produces it, and any post-close plan that treats them as redundant headcount destroys the asset you paid for.

The contrarian read: the data moat is usually a people moat in disguise

This is the part most diligence misses, because it sits outside the data room. When a target's defensibility lives in tacit labelling and accumulated judgement, the moat is not in the storage. It is in the people, and the decision the deal is quietly avoiding is the workforce one.

A data moat built on expert judgement is gated by the continued engagement of the people who hold that judgement. They are also the people most exposed by an acquisition. When integration treats them as a cost line while the thesis depends on their tacit knowledge, they do the rational thing. They protect themselves, they stop surfacing the workflow knowledge that made the data valuable, and they wait it out or leave. The moat walks out the door, and it does so quietly, eighteen months after close, long after the price was set. You cannot extract the asset from people you are planning to cut, and the synergy model that funds those cuts is often the same model that assumed the data alone was the moat.

A data moat that lives in people's heads is a workforce you are buying, not a dataset.

So the question that belongs in the investment committee is not only "is the data proprietary." It is "what does this moat depend on, in people, and does the deal structure keep those people motivated past the earn-out." Name that explicitly. The targets where the answer is solid are the ones where the moat is real and durable. The targets where management has never had that conversation are the ones where the moat is a story they have sold themselves.

How the finding maps to the decision

Invest when the pause is genuine, the value sits in a layer no competitor can buy, the foundation-model test leaves the gap intact, and the people who hold the tacit knowledge are retained by structure.

Reprice when the claimed moat is commodity volume, when fine-tuning collapses the edge, or when decay is visible: model commoditisation, a looming data-sharing mandate, a competitor's synthetic-data strategy, or data-residency exposure that turns the asset into a liability. In those cases, pay for the business, not the moat premium.

Walk when there is a perception gap between management and the people doing the work, when the moat cannot be reduced to documentation and the people who hold it are not retained, or when management treats every technical nuance you raise as an obstacle and keeps pulling the conversation back to the headline data volume. That last tell means they are seeking confirmation of a price they have already set, not an honest read, and no amount of diligence will change a number that was decided emotionally.
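The mapping above can be written down as a short decision sketch. The field names are hypothetical, and two things are assumptions rather than anything stated here: that walk signals are checked before reprice signals, and that a mixed picture defaults to reprice. Treat it as a checklist for the investment committee, not a scoring model.

```python
from dataclasses import dataclass
from enum import Enum


class Call(Enum):
    INVEST = "invest"
    REPRICE = "reprice"
    WALK = "walk"


@dataclass
class Findings:
    """Diligence findings as yes/no flags (hypothetical field names)."""
    pause_genuine: bool = False             # the expert's hesitation was real
    unbuyable_layer: bool = False           # value sits where no rival can buy it
    gap_survives_fine_tuning: bool = False  # foundation-model test leaves the gap
    holders_retained: bool = False          # deal structure keeps the key people
    moat_irreducible: bool = False          # cannot be written into documentation
    commodity_volume: bool = False          # claimed moat is just size
    decay_visible: bool = False             # mandates, synthetic data, residency
    perception_gap: bool = False            # management and staff disagree
    management_deflects_to_volume: bool = False


def recommend(f: Findings) -> Call:
    # Assumption: walk signals take precedence over everything else.
    if (f.perception_gap
            or f.management_deflects_to_volume
            or (f.moat_irreducible and not f.holders_retained)):
        return Call.WALK
    if f.commodity_volume or f.decay_visible or not f.gap_survives_fine_tuning:
        return Call.REPRICE
    if f.pause_genuine and f.unbuyable_layer and f.holders_retained:
        return Call.INVEST
    # Assumption: anything short of the full invest case is a reprice.
    return Call.REPRICE
```

For example, `recommend(Findings(pause_genuine=True, unbuyable_layer=True, gap_survives_fine_tuning=True, holders_retained=True))` returns `Call.INVEST`, while the same findings with `holders_retained=False` and `moat_irreducible=True` return `Call.WALK`.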

If a single finding could move price by more than your error bars, or the moat turns out to live in people rather than in the data, that is the point to bring in an independent technical read before the number hardens.

Related insights: The non-executive director’s guide to assessing AI system performance (the assessment angle for board oversight rather than a deal); and The investor’s technical due diligence playbook (the wider read).

Proof point: we've built production AI from the ground up; see an AI product taken from zero to beta. Knowing from the inside what a real build takes is what lets us tell a genuine system from a marketing layer.

Common questions

What is an example of a data moat in a deal?

A genuine one: years of labelled, domain-specific outcomes that a competitor cannot reproduce because the labelling encodes tacit expert judgement. An illusory one: a large transaction history any incumbent in the sector also holds. The first survives diligence; the second is volume with a story attached.

What does moat mean in business?

A structural reason a competitor cannot easily copy what the company does. In an AI target, the only moat worth a price premium is the layer a rival cannot buy, fine-tune, or hire its way to within a release cycle or two.

What are the five moats people list for data?

Proprietary dataset, feedback loop, regulatory or longitudinal data, integration and switching cost, and model-performance lead. In diligence most collapse to commodity volume. The two that hold are tacit-labelled data and feedback loops with real instrumentation behind them.

When does fine-tuning a frontier model dissolve a target's data moat?

When a competitor fine-tuning an off-the-shelf model on a fraction of the data closes most of the target's performance gap. If the edge survives that test, the moat is real. If it collapses, you are paying a moat premium for a commodity capability, and that is a reprice.

What is the biggest red flag that a claimed data moat is illusory?

A perception gap: management and the people doing the work disagree about what the valuable asset actually is. If they cannot agree on where the value lives, the moat is a story, and the price should reflect that.
