Why AI Needs a System of Record, Not Just File Storage
Connecting AI to your file storage doesn't solve version control. It just automates the guessing.
Most AI-for-private-markets pitches follow the same script: connect our tool to your Box, your SharePoint, your Dropbox, and query all your deal documents at once. It sounds sensible until you notice the flaw sitting underneath it. Box is not a system of record. It is a version graveyard.
What is a system of record?
A system of record is the single authoritative source for a given piece of data, the one place everyone agrees holds the correct, current answer. It is not a storage location. It is a decision, made in advance, about which version counts.
File storage tools were never built to make that decision. They were built to hold files. Box, SharePoint, and Dropbox will happily keep every version anyone has ever saved, forever, with no opinion on which one is real. That is fine when a human is doing the searching, because a human brings judgement, recent memory of the last call with management, a sense of which folder the team actually works out of. An AI layer has none of that context unless someone has built it in.
Why can’t AI just read my files in a shared drive?
It can read them. It cannot tell which one is right, and reading without judgement is where the trouble starts.
I watched this play out with a PE associate evaluating AI vendors. Her team liked the pitch from a tool connected straight to Box. Then someone asked the obvious question: “We have fifty versions of every model and memo in here. How is any AI supposed to know which one to trust?” Nobody on the vendor side had a good answer, because there wasn’t one. The tool was querying a graveyard and calling it search.
If your AI layer is querying a graveyard, it’s guessing. Confidently.
That confidence is the dangerous part. An AI that says “I don’t know which version is current” is honest but useless. An AI that picks one and states the answer with total certainty is worse, because it looks exactly like an AI that got it right.
Why does AI give different answers to the same question?
Because the underlying data usually has no single agreed answer in the first place, and AI just inherits that mess and delivers it faster.
Ask four departments for “ARR” and you will get four different numbers, each defensible on its own terms, each calculated slightly differently. That’s a known, tolerated ambiguity when humans are doing the asking, because a human will caveat the number or check which definition applies. Now tell an AI agent to “calculate ARR for the board deck.” Which of the four does it use? Nobody told it there was a choice to make, so it makes one anyway and moves on.
The more of this work you automate, the more it matters that someone has already done the unglamorous job of deciding what the correct answer is and where it lives. Automation does not remove the need for that decision. It just makes the cost of skipping it arrive faster.
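One way to picture that decision is a small registry in which exactly one definition of each metric is marked as the agreed one. The sketch below is illustrative only: the `MetricDefinition` fields, the team labels, and the formula descriptions are all hypothetical, not taken from any real system. The point is that an agent resolves "ARR" through the registry and refuses to proceed when nobody has made the choice.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MetricDefinition:
    name: str        # e.g. "ARR"
    owner: str       # team maintaining this calculation (hypothetical labels)
    formula: str     # human-readable description (illustrative wording)
    canonical: bool  # True for the one definition the firm has agreed on


class AmbiguousMetricError(Exception):
    """Raised when a metric has zero or several canonical definitions."""


def resolve_metric(name: str, registry: list[MetricDefinition]) -> MetricDefinition:
    """Return the single agreed definition, or refuse rather than guess."""
    canonical = [d for d in registry if d.name == name and d.canonical]
    if len(canonical) != 1:
        raise AmbiguousMetricError(
            f"{name}: expected exactly 1 canonical definition, found {len(canonical)}"
        )
    return canonical[0]


# Four competing definitions; only one has been designated as the answer.
registry = [
    MetricDefinition("ARR", "finance", "contracted recurring revenue, annualised", True),
    MetricDefinition("ARR", "sales", "bookings including signed-not-live deals", False),
    MetricDefinition("ARR", "product", "active subscriptions x list price", False),
    MetricDefinition("ARR", "customer success", "recurring revenue net of expected churn", False),
]

print(resolve_metric("ARR", registry).owner)  # finance
```

If no definition is marked canonical, or more than one is, `resolve_metric` raises instead of silently picking. That is the behaviour you want from an agent preparing a board deck.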
Why does this matter more in deal work than almost anywhere else?
Because deal work lives and dies on version control, and the stakes of getting the wrong version attach directly to real capital decisions.
Which model has the current assumptions? Which memo reflects the latest call with management, not the one from three weeks ago before the numbers changed? Which CIM is the final version, and which is the one with the typo on page forty-seven that somehow keeps circulating? These are not edge cases you can wave away. They are the job. An AI that cannot tell “Revenue Model v12 FINAL” from “Revenue Model v12 FINAL (2) updated” is not saving anyone time. It is manufacturing liability with good production values.
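To make the file-name problem concrete, here is a minimal sketch that groups files by their underlying document name and flags every group with more than one candidate. The list of version markers in the pattern is an assumption and certainly incomplete; the function names are hypothetical. Note what the code does not do: it detects the ambiguity but cannot say which file is correct, because that information is not in the file name.

```python
import re
from collections import defaultdict

# Tokens people append to file names to signal a version (assumed, incomplete list).
_VERSION_NOISE = re.compile(
    r"\b(v\d+|final|updated|copy|draft)\b|\(\d+\)",
    re.IGNORECASE,
)


def base_name(filename: str) -> str:
    """Strip the extension and version markers to get the underlying document name."""
    stem = filename.rsplit(".", 1)[0]
    stripped = _VERSION_NOISE.sub(" ", stem)
    return " ".join(stripped.split()).lower()


def ambiguous_documents(filenames: list[str]) -> dict[str, list[str]]:
    """Group files by base name and keep only groups with several candidates."""
    groups: dict[str, list[str]] = defaultdict(list)
    for name in filenames:
        groups[base_name(name)].append(name)
    return {base: names for base, names in groups.items() if len(names) > 1}


files = [
    "Revenue Model v12 FINAL.xlsx",
    "Revenue Model v12 FINAL (2) updated.xlsx",
    "Management Call Memo.docx",
]
print(ambiguous_documents(files))
```

Both revenue model files collapse to the same base name, so the sketch reports them as competing candidates. Choosing between them still needs a decision recorded somewhere other than the file name.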
Is “AI on top of Box” ever going to work?
Not as the winning long-term pattern, because the fix isn’t a smarter search layer over the same unsorted files. It’s resolving the version problem before the AI ever gets asked a question.
For some firms that means restructuring how documents get named and stored. For others it means migrating into a structured data layer purpose-built for deal work. Either way, the mental shift matters more than the specific tool: AI is not a bolt-on you drop onto existing systems and hope for the best. It is a genuine opportunity to rearchitect how deal data and workflows are organised, because the old, messy way was never designed with a tireless, literal-minded reader in mind. Patching a search tool onto chaotic file storage will not give you the answers you actually need, no matter how good the model behind it is.
What does it look like to get this right?
The firms ahead of the curve move to deal-native systems of record: platforms that ingest raw materials, normalise them into structured deal objects, and only then layer intelligence on top. Companies, financials, and people each become a defined object with lineage back to its source document, so every figure an agent produces can be traced to where it came from and when.
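As a rough illustration of what "a defined object with lineage" could look like, the sketch below models a financial figure that carries a reference to its source document. Every name here (`SourceRef`, `Figure`, `current_figure`, the field names) is hypothetical, and the company and values in the example are placeholders; a real platform's schema will differ.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class SourceRef:
    document_id: str   # stable ID assigned at ingestion, not a file name
    location: str      # e.g. a sheet and cell, or a page number
    ingested_on: date


@dataclass(frozen=True)
class Figure:
    company: str
    metric: str
    value: float
    as_of: date               # the period the figure describes
    source: SourceRef         # lineage back to where the value came from
    superseded: bool = False  # set when a later figure replaces this one


def current_figure(figures: list[Figure], company: str, metric: str) -> Figure:
    """Return the one live figure; fail loudly if upstream resolution is missing."""
    live = [
        f for f in figures
        if f.company == company and f.metric == metric and not f.superseded
    ]
    if len(live) != 1:
        raise LookupError(
            f"{company}/{metric}: expected 1 current figure, found {len(live)}"
        )
    return live[0]


# Placeholder data: an older figure marked superseded, and its replacement.
figures = [
    Figure("ExampleCo", "ARR", 10.0, date(2024, 3, 31),
           SourceRef("doc-001", "Summary!B4", date(2024, 4, 10)), superseded=True),
    Figure("ExampleCo", "ARR", 12.5, date(2024, 6, 30),
           SourceRef("doc-002", "Summary!B4", date(2024, 7, 12))),
]

fig = current_figure(figures, "ExampleCo", "ARR")
print(fig.value, fig.source.document_id)  # 12.5 doc-002
```

The agent never chooses between files. It asks for the current figure and gets back both the value and the document it came from, or an error if the data was never resolved upstream.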
The agent in that setup is not guessing which file to open. It already knows, because the guessing was resolved upstream, as part of how the data got structured in the first place. That is the difference between an AI tool that answers quickly and one you can actually put in front of an investment committee. If you’re weighing up whether your current stack can support that shift or needs rebuilding, talk to us.
The firms that treat this as a data problem first will be the ones whose AI tools compound in value. The ones that treat it as a search problem will keep discovering new ways their AI was confidently wrong.
Frequently asked questions
- What is a system of record?
- A system of record is the single authoritative source for a given piece of data, the place everyone agrees holds the correct, current version. In deal work this means one place decides which model, which memo, and which set of financials is the real one, rather than every team member's local copy being an equally plausible candidate.
- Why can't AI just read my files in Box or SharePoint?
- AI can read the files, but reading is not the same as knowing which one is right. File storage tools keep every version anyone has ever saved without judging which is current, so an AI layered on top inherits that ambiguity and has to guess which document to trust.
- Why does AI give different answers to the same financial question?
- Usually because the underlying data has no single source of truth. If four departments each keep their own version of ARR, an AI agent asked to calculate ARR for a board deck has no way to know which one is correct unless a system of record has already resolved that question before the AI was ever asked.
- Is migrating off file storage worth it just for AI?
- It is worth it because the underlying problem exists with or without AI. Version confusion already costs deal teams time and creates real risk; AI just automates the error and makes its cost arrive faster, which is what makes fixing it urgent now rather than later.
- What does a structured data layer actually do differently?
- It ingests raw materials such as CIMs, models, and memos, and normalises them into structured deal objects, such as companies, financials, and people, each with lineage back to its source document. An AI agent working against that layer already knows which figure is current instead of having to infer it from file names and folder structure.
- Does this mean AI can't be used until systems are migrated?
- No, but the value of AI on top of ungoverned file storage is limited to search and summarisation, not decisions you'd put in front of an investment committee. Firms get the most value by fixing the data layer first, or choosing deal-native tools that build the structured layer in as part of adoption.
