Everyone's got inherited data. Legacy systems, migrations that happened three CTOs ago, naming conventions that contradict each other, and databases that look like a junk drawer someone labeled "important."
The standard advice? Clean it up first. Spend months mapping fields, standardizing names, and refactoring systems before you even think about AI.
That advice is expensive. And slow. And in the AI world, every month of delay is momentum you're handing to your competitors.
Here's the better move: bake the cleanup into the AI rollout itself.
Think of it like inheriting a house from a distant relative. It's packed with decades of stuff. Some of it is valuable. Some of it is junk. And some of it is a mystery wrapped in a filing cabinet from 1997. You don't catalog every drawer before you move in. You start living in it and sort as you go.
Same principle applies to your data.
The Playbook
Let AI do the sorting. AI handles large datasets better than humans staring at millions of rows in a database. Let it catalog what's there, flag what's missing, and surface what needs a human decision. Your people spend time making strategic calls, not counting entries.
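Here's a rough sketch of what "catalog what's there, flag what's missing, and surface what needs a human decision" can look like before any model gets involved. Everything in it is an assumption for illustration: the function name profile_records, the 50% missing threshold, and the sample records are hypothetical, not from a real system.

```python
from collections import Counter


def profile_records(records, missing_threshold=0.5):
    """Catalog fields, count empty values, and flag fields for human review.

    The threshold is an illustrative assumption, not a recommended value.
    """
    total = len(records)
    seen = Counter()    # records that carry the field at all
    filled = Counter()  # records that carry a non-empty value for it
    for record in records:
        for field, value in record.items():
            seen[field] += 1
            if value not in (None, ""):
                filled[field] += 1

    report = {}
    for field in sorted(seen):
        # A field absent from a record counts as missing for that record.
        missing_ratio = 1 - filled[field] / total if total else 0.0
        report[field] = {
            "present_in": seen[field],
            "filled": filled[field],
            "missing_ratio": round(missing_ratio, 2),
            "needs_human_decision": missing_ratio >= missing_threshold,
        }
    return report


# Hypothetical legacy rows with conflicting naming conventions.
sample = [
    {"cust_id": 1, "CustName": "Ada", "email": "ada@example.com"},
    {"cust_id": 2, "CustName": "", "email": None},
    {"cust_id": 3, "customer_name": "Grace", "email": None},
]
report = profile_records(sample)
for field, stats in report.items():
    print(field, stats)
```

In this sample, CustName and customer_name both get flagged. That's the kind of call you want a person making: are they the same field under two names, or two different things?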
Use a clean container strategy. Picture a water filtration system. Dirty water goes in one side, gets filtered, clean water comes out the other. Same premise. Don't touch the source data. Have AI filter it into a new container with better naming conventions, human-approved decisions baked in, and a structure that matches where you're headed. You might run a few filtering cycles. Each one gets you closer.
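To make the filtration idea concrete, here's a minimal sketch of one filtering cycle. It assumes the source is a list of dictionaries and that a human has already approved a rename map. The names APPROVED_RENAMES and filter_pass, and every field name in the sample, are hypothetical. The point is the shape: read from the source, write to a new container, park anything undecided for a person.

```python
import copy

# Hypothetical human-approved mapping from legacy field names to clean ones.
APPROVED_RENAMES = {
    "cust_id": "customer_id",
    "CustName": "customer_name",
    "EMAIL_ADDR": "email",
}


def filter_pass(source_rows, renames):
    """Run one filtering cycle without modifying the source rows.

    Returns (clean_rows, needs_review). Rows with fields that have no
    approved name yet are set aside for a human decision.
    """
    clean_rows, needs_review = [], []
    for row in source_rows:
        cleaned, unmapped = {}, []
        for field, value in row.items():
            if field in renames:
                # Trim stray whitespace on text values as a simple cleanup.
                cleaned[renames[field]] = value.strip() if isinstance(value, str) else value
            else:
                unmapped.append(field)
        if unmapped:
            needs_review.append({"row": copy.deepcopy(row), "unmapped_fields": unmapped})
        else:
            clean_rows.append(cleaned)
    return clean_rows, needs_review


# Hypothetical source data. It is only ever read, never written.
source = [
    {"cust_id": 1, "CustName": " Ada ", "EMAIL_ADDR": "ada@example.com"},
    {"cust_id": 2, "CustName": "Grace", "eml2": "grace@example.com"},
]
snapshot = copy.deepcopy(source)

# Cycle 1: one row comes out clean, one is held for review.
clean, review = filter_pass(source, APPROVED_RENAMES)

# Cycle 2: a human approves "eml2" as "email", and the same source is filtered again.
renames_v2 = {**APPROVED_RENAMES, "eml2": "email"}
clean_v2, review_v2 = filter_pass(source, renames_v2)
```

Because the source never changes, a bad cycle costs you nothing but a rerun. Each approved decision goes into the rename map, so the next pass gets further than the last.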
Let your AI agents talk to each other. Your data cleanup agent and your AI project agent need to be in conversation. Set up a shared channel (I use Slack for this) where they discuss in the open with a human watching the back-and-forth. It's slower than a direct agent-to-agent pipeline, but it keeps everyone who needs to know in the loop. The project agent learns what data is coming, which fields are changing names, and how to restructure its calls as the data gets cleaner.
Build a data lake. Information scattered across 20 systems? Welcome to reality. The AI age demands a shift from siloed product data to a comprehensive data lake. Every drop of data in that container gives your AI agents more references, connections, and context. It feels overwhelming to start. That's exactly where your data cleanup AI earns its keep.
Give yourself permission to iterate. This is the hardest one for leaders. Let AI take a first pass solo. If you've containerized properly and kept your source data untouched, the worst case is you reset and run again. The biggest friction in the AI age isn't technical. It's the feeling of control slipping away. But trying to maintain human oversight on every micro-decision means you might as well do it by hand. Let the AI run. Review. Adjust. Repeat.
From the Field
A best practice that saved me real money: ask your data cleanup agent to identify opportunities for mechanical or deterministic sorting before it starts burning tokens on AI processing.
I did this with my personal nemesis. Email. Over 300 emails per hour across multiple legacy accounts. No human was handling that volume. So during the data cleanup process, part of my prompt asked the agent to find mechanical paths forward to reduce token cost. It took a snapshot sample, identified the most common offenders, and built an automatic sorting script that runs without AI at all. Those scripts are self-running now. The data cleanup became an ongoing process instead of a one-off project.
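I'm not reproducing my actual scripts here, but the pattern is simple enough to sketch. This assumes each message is a dictionary with a "from" address and that the deterministic rule is "route by sender domain." The function names, the "bulk" folder, the "needs_ai" fallback, and the min_count cutoff are all hypothetical choices for illustration.

```python
from collections import Counter


def sender_domain(address):
    """Return the lowercased domain part of an email address."""
    return address.rsplit("@", 1)[-1].lower()


def build_rules(sample, min_count=2, folder="bulk"):
    """Derive routing rules from a snapshot sample.

    Any sender domain seen at least min_count times in the sample is
    routed to `folder` with no model call at all.
    """
    counts = Counter(sender_domain(msg["from"]) for msg in sample)
    return {domain: folder for domain, n in counts.items() if n >= min_count}


def sort_message(msg, rules):
    """Apply the deterministic rules; fall back to AI only when none match."""
    return rules.get(sender_domain(msg["from"]), "needs_ai")


# Hypothetical snapshot sample of an inbox.
sample = [
    {"from": "news@shop.example", "subject": "Sale"},
    {"from": "deals@shop.example", "subject": "More sale"},
    {"from": "alice@partner.example", "subject": "Contract"},
]
rules = build_rules(sample)

print(sort_message({"from": "promo@SHOP.example"}, rules))
print(sort_message({"from": "bob@client.example"}, rules))
```

Only the messages that fall through to "needs_ai" cost tokens. Everything the rules catch is sorted for free, every time the script runs.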
Years ago, I worked on a massive data lake project with a leading US life insurance company. Over 35 federated backend systems. No common naming conventions. Most of them had never released data externally. I did the hard version of this work. Mapping thousands of fields by hand. Tracking down ancient documentation. Wrestling with deprecated datasets. Remapping to the best of my ability. That project won a major industry award, and I'm proud of the work. But I would never have a human do it again. Everything I did there can now be done by AI in a fraction of the time. Containerization, filtering, iteration. That's the formula.
Bottom Line
Your data is messy. That's not a blocker. It's a starting condition.
The companies winning with AI aren't the ones with perfect data. They're the ones who figured out how to clean and build at the same time.
Stop waiting. Start filtering.
Want to know where your organization actually stands? Take the MACH & AI Readiness Assessment to benchmark your data readiness, composable architecture maturity, and AI strategy against enterprise leaders.