
Why Per-Line-Item Models Beat One Big Model

The obvious way to apply ML to bidding is one big global model. The less obvious — and often better — way is thousands of small ones, a model fitted to each line item. Here's the granularity trade-off, and what it takes to run a population of models in production.

Author: Ad360 engineering
Discipline: Platform engineering

If you set out to apply machine learning to bidding, the obvious architecture is a single, large, global model: feed it every impression from every campaign, let it learn the patterns, and score everything through one brain. It is clean, it is what most "AI bidding" implies, and it is frequently the wrong choice.

The less obvious architecture is the opposite: not one big model but thousands of small ones — a separate model fitted to each line item. It sounds absurd at first (a model per campaign line?), and it carries real operational cost. But for bidding it often wins, and understanding why reveals something fundamental about what a bid model is actually for.

The granularity trade-off

Every modeling decision in advertising is a bet on where the signal lives. A global model assumes that the patterns predicting a click or conversion are largely shared across campaigns — that what makes an impression valuable for Advertiser A also informs Advertiser B. Sometimes that is true. Often it is not.

A line item has its own goal, its own creative, its own audience, its own definition of success. A model fitted to that line item learns its particular relationship between context and outcome, without averaging it against every other campaign in the system. This is the classic bias-variance trade-off: a global model trains on more data, so its estimates are stable, but it fits each campaign coarsely; a per-line-item model fits its campaign sharply but has less data, so its estimates are noisier. The right answer depends on whether the campaign-specific signal outweighs the cost of smaller training sets, and in performance bidding it frequently does.
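
A back-of-the-envelope calculation can make the trade-off concrete. The sketch below rests on strong simplifying assumptions that are not from Ad360's system: the "model" is a plain conversion rate with no features, the global rate is treated as noise-free but biased for this line item, and the per-line-item rate is unbiased but noisy. The function name and the numbers are illustrative only.

```python
def break_even_impressions(p_item: float, p_global: float) -> float:
    """Impressions at which a per-line-item rate estimate matches the
    squared error of simply using the global rate.

    Per-item estimate: unbiased, variance p_item * (1 - p_item) / n.
    Global estimate:   assumed noise-free, squared bias (p_item - p_global) ** 2.
    """
    bias_sq = (p_item - p_global) ** 2
    if bias_sq == 0:
        return float("inf")  # identical rates: the global estimate is never worse
    return p_item * (1 - p_item) / bias_sq


# Hypothetical line item converting at 2% while the pooled rate is 1%.
n = break_even_impressions(p_item=0.02, p_global=0.01)
print(f"break-even at roughly {n:.0f} impressions")
```

With these made-up numbers the break-even lands at about 196 impressions. The closer the line item's true rate is to the global rate, the more data the dedicated estimate needs before it pays off, which is the sense in which the answer depends on how campaign-specific the signal really is.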

Why per-line-item can win

Goal alignment. Each line item cares about a specific outcome. A dedicated model optimizes for that outcome instead of a blended average.
No cross-campaign contamination. A global model can let a high-volume campaign's patterns drown out a smaller one's. Per-line-item isolation prevents the loud from overwriting the quiet.
Cleaner updates. When a campaign changes — new creative, new audience — only its model needs to adapt, without disturbing everything else.
Interpretable failure. When one campaign's predictions drift, the cause is contained to one model, not buried in a global average.

In Ad360's production system this is the explicit design: each line_item_id corresponds to a specific trained model file (a .pkl). Optimization is fitted to each campaign, not imposed from a single global brain.

The catch: you now run a population of models

Per-line-item is not free, and pretending otherwise is how teams get burned. One model is easy to deploy. Thousands is an operations problem. You suddenly have to answer: where do all these models live, how do you load them fast enough to serve in-path, how do you update them without downtime, and how do you stop memory from exploding?

This is where the architecture earns its keep. Ad360's serving layer:

Preloads models so they are resident and ready, not loaded on the critical path.
Caps memory with a threshold (around 10 GB) so the population cannot grow unbounded.
Hot-reloads individual models on an SNS notification when a new version is trained — no full redeploy.
Auto-expires models older than 7 days, so stale and inactive line items do not accumulate forever.

That set of mechanisms — preload, cap, hot-reload, expire — is the difference between "per-line-item models" as a nice idea and as a thing that survives production. The modeling insight is cheap; the model-population management is the hard part.
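
The four mechanisms can be sketched as a single in-memory registry. This is a minimal illustration, not Ad360's implementation: the class and method names are hypothetical, the loader is a stand-in for reading a .pkl, least-recently-used eviction is an assumed policy for enforcing the cap, and age is measured from load time, which may differ from how the production system defines it. Wiring `reload` to an actual notification handler is left out.

```python
import time
from collections import OrderedDict


class ModelRegistry:
    """In-memory population of per-line-item models (illustrative sketch)."""

    def __init__(self, loader, max_bytes, max_age_s, clock=time.time):
        self._loader = loader        # line_item_id -> (model, size_in_bytes)
        self._max_bytes = max_bytes  # memory cap for the whole population
        self._max_age_s = max_age_s  # expiry horizon, e.g. 7 days in seconds
        self._clock = clock          # injectable so expiry can be tested
        self._models = OrderedDict() # line_item_id -> (model, size, loaded_at)
        self._bytes = 0

    def preload(self, line_item_ids):
        """Load models up front so serving never loads on the critical path."""
        for line_item_id in line_item_ids:
            self.reload(line_item_id)

    def reload(self, line_item_id):
        """Load or replace one model; intended to be called when a
        'new version trained' notification arrives."""
        model, size = self._loader(line_item_id)
        self._drop(line_item_id)
        self._models[line_item_id] = (model, size, self._clock())
        self._bytes += size
        self._evict_over_cap()

    def get(self, line_item_id):
        """Return the resident model, or None. A miss does not trigger a load."""
        entry = self._models.get(line_item_id)
        if entry is None:
            return None
        self._models.move_to_end(line_item_id)  # mark as recently used
        return entry[0]

    def expire(self):
        """Drop models loaded longer ago than max_age_s; return their ids."""
        cutoff = self._clock() - self._max_age_s
        stale = [lid for lid, (_, _, loaded_at) in self._models.items()
                 if loaded_at < cutoff]
        for lid in stale:
            self._drop(lid)
        return stale

    def _drop(self, line_item_id):
        entry = self._models.pop(line_item_id, None)
        if entry is not None:
            self._bytes -= entry[1]

    def _evict_over_cap(self):
        # Evict least recently used models, but never the only one left.
        while self._bytes > self._max_bytes and len(self._models) > 1:
            oldest = next(iter(self._models))
            self._drop(oldest)


def fake_loader(line_item_id):
    # Stand-in for reading a .pkl file; returns (model, size_in_bytes).
    return {"line_item_id": line_item_id}, 4


registry = ModelRegistry(fake_loader, max_bytes=10, max_age_s=7 * 24 * 3600)
registry.preload(["li-1", "li-2"])
registry.get("li-1")     # touch li-1 so li-2 becomes least recently used
registry.reload("li-3")  # 12 bytes exceeds the cap, so li-2 is evicted
```

Returning None on a miss, rather than loading on demand, is a deliberate choice in this sketch: the caller decides what to do for a line item with no resident model, and the request path never blocks on a load.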

When the global model is right

Intellectual honesty requires the counter-case. A global (or pooled) model is the better choice when:

A line item has too little data to train a meaningful model of its own (cold start).
Patterns are genuinely shared across campaigns and a global model captures more signal.
The operational cost of a model population is not justified by the performance gain.

The mature answer is rarely purely one or the other. Cold-start line items can lean on pooled priors and graduate to their own models as data accrues; the architecture should support both rather than dogmatically insisting on one. Granularity is a dial, not a religion.
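
One common way to turn that dial smoothly is shrinkage: blend a line item's own observed rate with the pooled rate, weighting the line item more as its data grows. The sketch below is an assumption about how this could look, not a description of Ad360's cold-start path; `blended_rate` and `prior_strength` are hypothetical names, and it works on a bare conversion rate instead of a full model.

```python
def blended_rate(conversions: int, impressions: int,
                 pooled_rate: float, prior_strength: float = 1000.0) -> float:
    """Shrink a line item's observed rate toward the pooled rate.

    prior_strength behaves like that many pseudo-impressions observed
    at the pooled rate, so it sets how quickly a line item "graduates".
    """
    return (conversions + prior_strength * pooled_rate) / (impressions + prior_strength)


pooled = 0.01
print(blended_rate(0, 0, pooled))            # no data: the pooled rate
print(blended_rate(3, 100, pooled))          # little data: stays near the pooled rate
print(blended_rate(3000, 100_000, pooled))   # lots of data: close to the observed 3%
```

In this formulation there is no hard switch between pooled and dedicated. The value of `prior_strength` plays the role of the graduation threshold, and choosing it is the same open question as choosing that threshold.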

Common misconceptions

"One smart model is better than many small ones." Bigger is not automatically better; campaign-specific fit often beats a blended global average.
"Per-line-item means thousands of unrelated models." They can share feature pipelines, training code, and pooled priors — only the fitted parameters differ.
"It's too expensive to run." With preloading, memory caps, hot reload, and expiry, a model population is operationally tractable.
"More granularity always wins." Below a data threshold, a dedicated model overfits; granularity must be matched to available signal.

What good operation looks like

Choose granularity by where the signal lives, not by fashion.
Treat the model population as infrastructure — preload, cap memory, hot-reload, expire.
Have a cold-start path (pooled priors) for line items without enough data yet.
Monitor per-model drift so a single campaign's degradation is caught in isolation.
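
As a rough illustration of the last point, per-model drift can be watched with a calibration ratio: observed outcomes divided by predicted outcomes, computed separately for each line item. The event shape, function names, and tolerance below are assumptions made for the sketch, not details of Ad360's monitoring.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def calibration_by_line_item(
    events: Iterable[Tuple[str, float, int]],
) -> Dict[str, float]:
    """Ratio of observed outcomes to predicted outcomes per line item.

    Each event is (line_item_id, predicted_probability, outcome in {0, 1}).
    A ratio near 1.0 means the predictions match what actually happened.
    """
    predicted = defaultdict(float)
    observed = defaultdict(int)
    for line_item_id, probability, outcome in events:
        predicted[line_item_id] += probability
        observed[line_item_id] += outcome
    return {lid: observed[lid] / predicted[lid]
            for lid in predicted if predicted[lid] > 0}


def drifting(ratios: Dict[str, float], tolerance: float = 0.25) -> List[str]:
    """Line items whose calibration ratio is more than `tolerance` from 1.0."""
    return sorted(lid for lid, ratio in ratios.items()
                  if abs(ratio - 1.0) > tolerance)


# Hypothetical window of scored impressions with their outcomes.
events = [
    ("li-1", 0.5, 1), ("li-1", 0.5, 0),  # predicted 1.0 outcomes, observed 1
    ("li-2", 0.1, 1), ("li-2", 0.1, 1),  # predicted 0.2 outcomes, observed 2
]
print(drifting(calibration_by_line_item(events)))
```

Because the ratio is computed per line item, one campaign's degradation shows up under its own id instead of being diluted in a system-wide average. In practice each window would need enough events for the ratio to be meaningful.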

Open questions

What is the right data threshold for graduating a line item from a pooled prior to its own model?
Can the system learn the granularity — automatically deciding which line items deserve dedicated models?
How should per-line-item models share knowledge without reintroducing cross-campaign contamination?

The instinct to build one big model is really an instinct toward simplicity, and simplicity is valuable. But a bid model exists to capture one particular campaign's relationship between opportunity and outcome, and that relationship is often specific enough to deserve its own model. The engineering price is a population of models to manage; the payoff is optimization that fits each campaign instead of averaging it away.