What I Learned Building Multi-Agent ML Systems at a Startup With No Established Infrastructure
Engineering Essay
Most ML engineering advice assumes you have clean data pipelines, dedicated MLOps teams, and well-scoped problems. I had none of that. As a founding AI engineer at a supply chain optimization startup, I built production multi-agent systems from scratch, and the lessons that stuck have almost nothing to do with model architecture.
Where the Time Actually Goes
Nobody tells you this ratio in school. Nobody warns you it never really improves.
Lesson 1: Your Job Isn't Modeling. It's Plumbing.
The first thing you learn when there's no infrastructure is that your job isn't modeling. It's plumbing. Before I could train a single agent, I had to figure out how client data actually flowed, where it broke, and what "clean" even meant for a company that had never standardized its inventory records.
I spent more time writing data validation logic than writing PyTorch. Nobody tells you that in school, and nobody warns you that this ratio never really improves.
Lesson 2: System Design Under Ambiguity
In a multi-agent setup, you're coordinating specialized models — demand forecasters, inventory optimizers, anomaly detectors — that each depend on outputs from the others. When you're building this without established patterns to follow, every architectural decision is a bet.
I learned to make those bets smaller. Tight interfaces between agents. Aggressive logging. The ability to swap out any single component without collapsing the whole pipeline. Not because I read about microservice principles, but because I shipped a monolithic version first and watched it become unmaintainable within weeks.
The Evolution
Lesson 3: "Production" Means Real Money
"Production" at a startup means something different than at a large company. Production meant a system making real decisions across millions of dollars in client inventory. There was no staging environment buffer between my code and financial consequences.
That pressure forced a discipline that no course or tutorial can replicate: you test obsessively not because best practices say so, but because you've felt the weight of a bad forecast propagating downstream.
The Biggest Takeaway
Building ML systems is primarily an engineering problem, not a research problem.
Architecture selection, training, tuning — the part everyone focuses on
Making it reliable, observable, and useful to people who never think about gradients
