A Series B e-commerce brand we audited last winter had spent $2.3 million on paid media over 18 months. Hundreds of experiments across hooks, audiences, formats, landing pages, and offers. By the end of it, their VP of Growth couldn't tell me which hook had won most often, why their best audience converted at 3× the others, or what their next quarter's testing roadmap should look like.
They had data — thousands of rows of campaign metrics. They didn't have insight. After 18 months and hundreds of experiments, they were starting from scratch every quarter.
This is the most common leak I see in growth teams. The first two parts of this series covered how to test and how to scale. This one is about the system that catches everything those two activities produce and makes it compound — the learning engine.
Three jobs of a learning engine
A learning engine does three things, in order:
- Capture — documents not just results but context: the hypothesis, the expected outcome, what actually happened, what surprised you.
- Attribute — explains why a result happened. "Hook B won" is a result. "Pain-point messaging outperforms aspiration messaging for cold audiences in the awareness stage" is an insight.
- Compound — feeds insights back into the next cycle of hypotheses. Every test starts from a smarter starting point than the last.
Teams that capture but don't attribute have a logbook. Teams that attribute but don't compound have a book club. The engine only runs when all three are in motion.
Five places where insights leak out
Before the system, the diagnosis. Where does the knowledge actually disappear?
- The ad-account graveyard. Results sit in Meta Ads Manager or Google Ads with no context. Two months later, no one remembers why that creative was killed.
- The Slack scroll. Insights posted at 4pm Friday. Unsearchable by Monday. Gone by next quarter.
- The spreadsheet sprawl. Six tracking docs. Three of them out of date. No single source of truth.
- The human hard drive. The senior buyer knows everything. The senior buyer leaves. The knowledge goes with them.
- The "what" without the "why." The result is recorded. The reason it happened isn't. Six months later, the data is unactionable.
If you've worked on a growth team for more than two years, you've seen all five. They're invisible until you try to build on past learning and realise there's nothing to build on.
The experiment log — the core artifact
One artifact solves both Capture and Attribute: a structured experiment log. Every test gets an entry. The entry has fields. The fields aren't optional.
- Test ID + strategic question — what are we trying to learn?
- Hypothesis — what we expect to see and why
- Variables — what changed between cells
- Audience — who saw it
- Results — the raw metrics
- Winning variant — the result
- Insight — the interpretation. The why. Always written as a generalisable statement, not a metric.
- Next action — what test does this insight unlock?
- Tags — audience, stage, format, channel. Searchable.
The difference between teams that compound and teams that don't comes down almost entirely to whether the Insight + Next Action fields get filled in honestly, every time, by the person who ran the test. Most teams fill in everything except those two. Those two are the entire point.
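One way to make those two fields non-optional is to enforce them in the log's schema. The sketch below is a minimal illustration, not a prescribed format: the class name, field types, and the `relative_lift` helper are assumptions, and the sample values are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class ExperimentEntry:
    """One row of the experiment log; fields mirror the list above."""
    test_id: str
    strategic_question: str
    hypothesis: str
    variables: list[str]
    audience: str
    results: dict[str, float]  # raw metrics, e.g. {"hook_a_ctr": 0.017}
    winning_variant: str
    insight: str
    next_action: str
    tags: list[str] = field(default_factory=list)

    def __post_init__(self) -> None:
        # Reject entries that skip the two fields most teams leave blank.
        for name in ("insight", "next_action"):
            if not getattr(self, name).strip():
                raise ValueError(f"{name} is required for {self.test_id}")


def relative_lift(winner: float, loser: float) -> float:
    """Relative improvement of the winner over the loser, as a fraction."""
    return (winner - loser) / loser


# Hypothetical entry for illustration only.
entry = ExperimentEntry(
    test_id="T-001",
    strategic_question="Which message framing works for cold audiences?",
    hypothesis="Problem-agitation beats solution-focused hooks on cold traffic",
    variables=["hook"],
    audience="cold, awareness stage",
    results={"hook_a_ctr": 0.017, "hook_b_ctr": 0.029},
    winning_variant="hook_b",
    insight="Problem-agitation outperforms solution-focused messaging "
            "for cold audiences in the awareness stage.",
    next_action="Test problem-agitation framing on the landing page headline",
    tags=["cold", "awareness", "hook", "meta"],
)
print(f"{relative_lift(0.029, 0.017):.1%}")  # 70.6%
```

Raising an error at write time is a deliberate choice here: an entry without an insight never reaches the log, so the gap shows up immediately rather than six months later.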
"Hook B: 2.9% CTR. Hook A: 1.7% CTR." is data. "Problem-agitation outperforms solution-focused messaging for cold audiences in the awareness stage." is an insight. One you can reuse; the other dies with the campaign.
Wiring Claude + the Meta MCP server into the loop · April 2026
This is where 2026 looks different from 2024. The capture step — the one most teams break on — used to require a human to remember to fill in the log after every test. The bottleneck wasn't the system; it was the discipline. The Meta MCP server changes that.
For the unfamiliar: MCP (Model Context Protocol) is the open standard Anthropic released in late 2024 that lets AI assistants such as Claude connect directly to external tools and data sources. By April 2026 the ecosystem of MCP servers has matured significantly. The one most relevant to growth teams is the Meta Ads MCP server. It is community-maintained, runs locally on your machine, and lets Claude query your Meta ad account directly. Google Ads, GA4, and TikTok have equivalents at various levels of maturity.
Here's the workflow we run on our own accounts now:
- Friday afternoon prompt. Open Claude Desktop. The Meta MCP server is already connected to the ad account. The prompt is something like: "Pull every experiment that ran this week. Group by hypothesis tag. For each, summarise: what we expected, what happened, the dollar delta, and your read on why."
- Claude does the pre-work. It queries the account, pulls the spend, pulls the metrics, and writes draft insight statements for each test — including what surprised it relative to the hypothesis.
- The operator reviews. Five to ten minutes. Approve, edit, or reject each draft. Operator-level judgment stays human; the formatting and pre-write are automated away.
- Append to the artifact. Approved insights get written to the experiment log (we use Notion via its MCP server; you can use Google Sheets, Linear, whatever you keep your log in).
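The grouping step in that workflow is simple enough to sketch without any ad-platform API. The example below assumes the week's rows have already been pulled into plain dictionaries; the key names (`hypothesis_tag`, `variant`, `spend`, `impressions`, `clicks`) are hypothetical and would need to match whatever your export actually returns. It reports a CTR "leader" per hypothesis rather than a winner, because it runs no significance test.

```python
from __future__ import annotations

from collections import defaultdict


def summarise_week(rows: list[dict]) -> dict[str, dict]:
    """Group ad-level rows by hypothesis tag and compute CTR per variant."""
    grouped = defaultdict(
        lambda: defaultdict(lambda: {"spend": 0.0, "impressions": 0, "clicks": 0})
    )
    for row in rows:
        cell = grouped[row["hypothesis_tag"]][row["variant"]]
        cell["spend"] += row["spend"]
        cell["impressions"] += row["impressions"]
        cell["clicks"] += row["clicks"]

    summary = {}
    for tag, variants in grouped.items():
        # Guard against variants that never served an impression.
        ctr = {
            name: (c["clicks"] / c["impressions"] if c["impressions"] else 0.0)
            for name, c in variants.items()
        }
        summary[tag] = {
            "spend": sum(c["spend"] for c in variants.values()),
            "ctr": ctr,
            "leader": max(ctr, key=ctr.get),  # highest CTR, not a tested winner
        }
    return summary


# Hypothetical rows for illustration only.
rows = [
    {"hypothesis_tag": "cold-hooks", "variant": "hook_a",
     "spend": 500.0, "impressions": 10_000, "clicks": 170},
    {"hypothesis_tag": "cold-hooks", "variant": "hook_b",
     "spend": 500.0, "impressions": 10_000, "clicks": 290},
]
summary = summarise_week(rows)
print(summary["cold-hooks"]["leader"])  # hook_b
```

A summary like this is the raw material for the draft insight statements; the interpretation still has to be written and reviewed.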
The capture step used to take 90 minutes a week and was the first thing to slip when the team got busy. With Claude + the Meta MCP server, it's 15 minutes a week, and the slippage rate is materially lower because the painful part — the writing — is done before the human shows up.
Two operational notes for anyone setting this up:
- Use a read-only Meta access token (for example, one granted the ads_read permission but not ads_management). With that scope, the MCP server can fetch metrics but can't change campaigns, so the risk surface stays small.
- Don't let Claude write directly to the artifact unsupervised. The model is good at drafting. The operator should still be the one who decides what becomes institutional memory.
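The second note can be enforced in code as well as in process. The sketch below shows one possible review gate, assuming drafts arrive as simple records and the log is an in-memory list; the `Draft`, `Decision`, and `append_reviewed` names are hypothetical, and a real setup would write to Notion, Sheets, or wherever the log lives.

```python
from __future__ import annotations

from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Decision(Enum):
    APPROVE = "approve"
    EDIT = "edit"
    REJECT = "reject"


@dataclass
class Draft:
    test_id: str
    insight: str


def apply_review(
    draft: Draft, decision: Decision, edited_text: Optional[str] = None
) -> Optional[Draft]:
    """Return the draft to store, or None if the operator rejected it."""
    if decision is Decision.REJECT:
        return None
    if decision is Decision.EDIT:
        if not edited_text or not edited_text.strip():
            raise ValueError("an edit decision needs replacement text")
        return Draft(draft.test_id, edited_text.strip())
    return draft


def append_reviewed(
    log: list[Draft],
    reviewed: list[tuple[Draft, Decision, Optional[str]]],
) -> int:
    """Append only drafts a human approved or edited; return the count written."""
    written = 0
    for draft, decision, edited_text in reviewed:
        result = apply_review(draft, decision, edited_text)
        if result is not None:
            log.append(result)
            written += 1
    return written
```

The point of the design is that nothing reaches the log without passing through `apply_review`, so every stored insight carries an explicit human decision.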
Building institutional memory
Three practices turn a logbook into institutional memory:
- Weekly learning reviews. Fifteen minutes. The team walks through the insights from the past week. Forces interpretation, distributes knowledge, surfaces patterns.
- Quarterly insight synthesis. Take the last 90 days of the log. Extract 5–10 learnings as principles, not results. For example: "We've now confirmed across three accounts that retargeting windows above 14 days underperform 7-day windows for high-AOV verticals."
- Pre-mortem searches. Before launching any new test, the operator searches the log for prior work on the same hypothesis. Most "new" tests have been run before. Knowing what happened last time changes how this one gets designed.
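Both the quarterly synthesis and the pre-mortem search reduce to simple queries once the log is structured. The sketch below assumes log entries are dictionaries with `tags`, `hypothesis`, `insight`, and `ran_on` keys; those names and the sample entries are hypothetical.

```python
from __future__ import annotations

from datetime import date, timedelta


def search_log(entries: list[dict], tags: tuple = (), keyword: str = "") -> list[dict]:
    """Pre-mortem search: entries carrying every tag and mentioning the keyword."""
    wanted = {t.lower() for t in tags}
    needle = keyword.lower()
    hits = []
    for entry in entries:
        entry_tags = {t.lower() for t in entry["tags"]}
        text = f'{entry["hypothesis"]} {entry["insight"]}'.lower()
        if wanted <= entry_tags and needle in text:
            hits.append(entry)
    return hits


def last_n_days(entries: list[dict], today: date, days: int = 90) -> list[dict]:
    """Quarterly synthesis window: entries that ran within the last `days` days."""
    cutoff = today - timedelta(days=days)
    return [entry for entry in entries if entry["ran_on"] >= cutoff]


# Hypothetical log entries for illustration only.
log = [
    {"test_id": "T-014", "tags": ["retargeting", "meta"],
     "hypothesis": "A 7-day retargeting window beats a 30-day window",
     "insight": "Shorter retargeting windows outperform for high-AOV products.",
     "ran_on": date(2026, 4, 10)},
    {"test_id": "T-003", "tags": ["cold", "hook", "meta"],
     "hypothesis": "Problem-agitation hooks beat solution-focused hooks",
     "insight": "Problem-agitation outperforms for cold audiences.",
     "ran_on": date(2025, 12, 1)},
]

prior = search_log(log, tags=("retargeting",), keyword="window")
recent = last_n_days(log, today=date(2026, 4, 24))
print([e["test_id"] for e in prior])   # ['T-014']
print([e["test_id"] for e in recent])  # ['T-014']
```

Requiring every tag to match keeps the pre-mortem search narrow; loosening it to any-tag matching is a reasonable variation if the log is still small.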
The shift these three practices unlock: knowledge stops being in people's heads and becomes a system-level asset. When the senior buyer leaves, the new one inherits the bench.
The complete loop
Testing finds winners and generates insights about resonance. Scaling extracts value from winners and reveals insights about efficiency. Learning captures, attributes, and compounds the insights from both back into the next round of testing. Each cycle starts from a smarter starting point than the last. That's the loop.
Most brands run ads hoping each campaign matches the last. The teams building engines test systematically, scale with discipline, and compound their knowledge. They become faster every quarter. Their competitors accumulate the same confusion.
The artifact that captures all of this — the document that becomes the team's operating manual — is the focus of part four of this series. Tactically, it's the most important piece of the engine. Operationally, it's the one that survives turnover.