ML Engineering · Startup Engineering · Multi-Agent Systems

What I Learned Building Multi-Agent ML Systems at a Startup With No Established Infrastructure

Engineering Essay

Most ML engineering advice assumes you have clean data pipelines, dedicated MLOps teams, and well-scoped problems. I had none of that. As a founding AI engineer at a supply chain optimization startup, I built production multi-agent systems from scratch, and the lessons that stuck have almost nothing to do with model architecture.

Where the Time Actually Goes

Data plumbing & validation: 40%

System integration & debugging: 25%

Testing & monitoring: 20%

Actual model work: 15%


Lesson 1: Your Job Isn't Modeling. It's Plumbing.

The first thing you learn when there's no infrastructure is that your job isn't modeling. It's plumbing. Before I could train a single agent, I had to figure out how client data actually flowed, where it broke, and what "clean" even meant for a company that had never standardized its inventory records.

I spent more time writing data validation logic than writing PyTorch. Nobody tells you that in school, and nobody warns you that this ratio never really improves.
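To make "validation logic" concrete, here is a minimal sketch of rule-based checks for inventory rows. The field names (`sku`, `on_hand`, `unit_cost`) and the three rules are hypothetical illustrations, not the production rules. The point is the shape: each rule is a named predicate, and every failing row is reported rather than silently dropped.

```python
from dataclasses import dataclass
from typing import Any, Callable

# Hypothetical record shape: one dict per inventory row, e.g. parsed from a client CSV.
Row = dict[str, Any]


@dataclass(frozen=True)
class Rule:
    """A named check; `check` returns True when the row passes."""
    name: str
    check: Callable[[Row], bool]


def _is_number(value: Any) -> bool:
    # bool is a subclass of int in Python, so exclude it explicitly.
    return isinstance(value, (int, float)) and not isinstance(value, bool)


# Illustrative rules only; the field names are assumptions.
RULES = [
    Rule("sku_present", lambda r: isinstance(r.get("sku"), str) and r["sku"].strip() != ""),
    Rule("on_hand_non_negative", lambda r: _is_number(r.get("on_hand")) and r["on_hand"] >= 0),
    Rule("unit_cost_positive", lambda r: _is_number(r.get("unit_cost")) and r["unit_cost"] > 0),
]


def validate(rows: list[Row], rules: list[Rule] = RULES) -> list[tuple[int, str]]:
    """Return (row_index, rule_name) for every failed check instead of dropping rows."""
    failures = []
    for i, row in enumerate(rows):
        for rule in rules:
            if not rule.check(row):
                failures.append((i, rule.name))
    return failures
```

For example, `validate([{"sku": "A-100", "on_hand": 12, "unit_cost": 3.5}, {"sku": "", "on_hand": -4, "unit_cost": 2.0}])` returns `[(1, 'sku_present'), (1, 'on_hand_non_negative')]`. Keeping rules as data rather than scattered `if` statements makes it cheap to add the next rule and to tell a client exactly which rule their file violates.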

What I Expected

Clean datasets arrive on schedule

Pick the right model architecture

Tune hyperparameters

Deploy and iterate

What Actually Happened

Define what "clean" even means for this client

Write 47 validation rules for inventory data

Debug why CSVs encode dates 4 different ways

Finally build the model (week 6)
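The date-format item in that list is a typical plumbing bug. One way to handle it, sketched under an assumption: the four formats below are hypothetical, since the essay does not say which four the client files used. The function tries each known format in order and returns `None` for anything unrecognized, so a bad row gets flagged for review instead of guessed at.

```python
from datetime import date, datetime
from typing import Optional

# Hypothetical formats observed across client CSV exports; adjust per data source.
KNOWN_FORMATS = (
    "%Y-%m-%d",  # 2024-03-15
    "%m/%d/%Y",  # 03/15/2024
    "%d.%m.%Y",  # 15.03.2024
    "%Y%m%d",    # 20240315
)


def normalize_date(raw: str) -> Optional[date]:
    """Parse `raw` with the first matching known format; return None if none match."""
    text = raw.strip()
    for fmt in KNOWN_FORMATS:
        try:
            return datetime.strptime(text, fmt).date()
        except ValueError:
            continue  # not this format (or an impossible date); try the next one
    return None
```

These four formats use different separators, so no string can match two of them. If a client mixes `MM/DD/YYYY` and `DD/MM/YYYY`, format order alone cannot resolve a value like `03/04/2024`; that needs a per-source rule.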

Lesson 2: System Design Under Ambiguity

In a multi-agent setup, you're coordinating specialized models (demand forecasters, inventory optimizers, anomaly detectors), each depending on outputs from the others. When you're building this without established patterns to follow, every architectural decision is a bet.

I learned to make those bets smaller. Tight interfaces between agents. Aggressive logging. The ability to swap out any single component without collapsing the whole pipeline. Not because I read about microservice principles, but because I shipped a monolithic version first and watched it become unmaintainable within weeks.
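One way to express "tight interfaces" in Python is a typed message plus a `Protocol` that every implementation of an agent must satisfy. This is a hypothetical sketch: the class names, fields, and the two naive forecasters are assumptions, not the system described here. The optimizer step depends only on the `Forecast` contract, invalid messages are rejected at the boundary, and every handoff is logged.

```python
import logging
from dataclasses import dataclass
from typing import Protocol

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("agents")


@dataclass(frozen=True)
class Forecast:
    """Contract between forecaster and optimizer (hypothetical fields)."""
    sku: str
    horizon_days: int
    expected_demand: float

    def __post_init__(self) -> None:
        # Reject bad messages at the boundary, not deep inside the optimizer.
        if self.horizon_days <= 0 or self.expected_demand < 0:
            raise ValueError(f"invalid forecast: {self}")


class Forecaster(Protocol):
    def forecast(self, sku: str, history: list[float], horizon_days: int) -> Forecast: ...


class MeanForecaster:
    """Naive baseline: average daily demand times the horizon."""
    def forecast(self, sku: str, history: list[float], horizon_days: int) -> Forecast:
        daily = sum(history) / len(history) if history else 0.0
        return Forecast(sku, horizon_days, daily * horizon_days)


class LastValueForecaster:
    """Alternative implementation; swappable because it returns the same contract."""
    def forecast(self, sku: str, history: list[float], horizon_days: int) -> Forecast:
        daily = history[-1] if history else 0.0
        return Forecast(sku, horizon_days, daily * horizon_days)


def reorder_quantity(fc: Forecast, on_hand: float) -> float:
    """Optimizer step: depends only on the Forecast contract, not on which forecaster made it."""
    log.info("forecast received: %s", fc)
    qty = max(0.0, fc.expected_demand - on_hand)
    log.info("reorder decision: sku=%s qty=%.1f", fc.sku, qty)
    return qty


def run(forecaster: Forecaster, sku: str, history: list[float], on_hand: float) -> float:
    return reorder_quantity(forecaster.forecast(sku, history, horizon_days=7), on_hand)
```

Because `Protocol` checks structure rather than inheritance, a static type checker such as mypy accepts both forecasters as `Forecaster` without either subclassing it. With history `[10.0, 12.0, 14.0]` and 50 units on hand, `run` returns 34.0 with `MeanForecaster` and 48.0 with `LastValueForecaster`, and nothing downstream changes.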

The Evolution

Weeks 1–3: The Monolith

Forecast + Optimize + Detect + Report in a single pipeline

One change breaks everything

Weeks 4+: Modular Agents

Forecaster Agent
↕ tight interface
Optimizer Agent
↕ tight interface
Anomaly Agent

Lesson 3: "Production" Means Real Money

"Production" at a startup means something different than it does at a large company. For me, production meant a system making real decisions across millions of dollars in client inventory, with no staging environment standing between my code and its financial consequences.

That pressure forced a discipline that no course or tutorial can replicate: you test obsessively not because best practices say so, but because you've felt the weight of a bad forecast propagating downstream.
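A sketch of the kind of check that habit produces: a guardrail between the forecaster and anything that acts on its output, holding suspicious forecasts for human review. The threshold (`max_ratio`), the function name, and the specific failure modes are hypothetical illustrations, not the production rules. The test functions are pytest-style, and the `__main__` block lets the file run without pytest.

```python
import math
from dataclasses import dataclass, field


@dataclass
class GuardrailResult:
    passed: bool
    reasons: list[str] = field(default_factory=list)


def check_forecast(forecast: float, history: list[float], max_ratio: float = 3.0) -> GuardrailResult:
    """Hold a forecast for review if it is non-finite, negative, or far above recent history.

    `max_ratio` is an illustrative threshold (an assumption): a forecast above
    max_ratio * max(history) is flagged instead of being passed downstream.
    """
    reasons: list[str] = []
    if not math.isfinite(forecast):
        reasons.append("forecast is NaN or infinite")
    elif forecast < 0:
        reasons.append("forecast is negative")
    elif history and forecast > max_ratio * max(history):
        reasons.append(f"forecast exceeds {max_ratio}x historical max")
    if not history:
        reasons.append("no history to compare against")
    return GuardrailResult(passed=not reasons, reasons=reasons)


# Each test encodes a failure mode that should never reach a downstream decision.
def test_normal_forecast_passes() -> None:
    assert check_forecast(30.0, [10.0, 12.0, 11.0]).passed


def test_demand_spike_is_held() -> None:
    assert not check_forecast(1000.0, [10.0, 12.0, 11.0]).passed


def test_nan_is_held() -> None:
    assert not check_forecast(float("nan"), [10.0, 12.0]).passed


if __name__ == "__main__":
    test_normal_forecast_passes()
    test_demand_spike_is_held()
    test_nan_is_held()
    print("guardrail tests passed")
```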

Staging environment budget

Client inventory at stake: $10M+

Margin for bad forecasts

The Biggest Takeaway

Building ML systems is primarily an engineering problem, not a research problem.

The Model (~15%): architecture selection, training, and tuning (the part everyone focuses on).

Everything Else (~85%): making it reliable, observable, and useful to people who never think about gradients.
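As one small, concrete instance of the "observable" part, here is a minimal sketch of a forecast-accuracy monitor. It tracks absolute percentage error over a rolling window and flags when the mean crosses a threshold. The window size, the 25% default threshold, and the choice of MAPE as the metric are illustrative assumptions. MAPE is undefined when actual demand is zero, so those points are skipped.

```python
from collections import deque
from typing import Optional


class ForecastErrorMonitor:
    """Rolling mean absolute percentage error (MAPE) with a simple alert threshold."""

    def __init__(self, window: int = 30, threshold: float = 0.25) -> None:
        # Only the most recent `window` errors are kept; older ones fall off automatically.
        self.errors: deque = deque(maxlen=window)
        self.threshold = threshold

    def record(self, forecast: float, actual: float) -> None:
        # Percentage error is undefined when actual demand is zero, so skip those points.
        if actual == 0:
            return
        self.errors.append(abs(forecast - actual) / abs(actual))

    def mape(self) -> Optional[float]:
        if not self.errors:
            return None
        return sum(self.errors) / len(self.errors)

    def should_alert(self) -> bool:
        current = self.mape()
        return current is not None and current > self.threshold
```

Fed the (forecast, actual) pairs (110, 100), (90, 100), (200, 100) with `window=3`, the rolling error is (0.1 + 0.1 + 1.0) / 3 = 0.4, so `should_alert()` returns True at the default threshold. The point is less the metric than having one number someone checks every day.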