Deep dive · Industry · 11 min read

PPC automation was always possible. It just never paid for itself — until now.

For 15 years, automating PPC the right way cost more than it returned. AI didn't change what's possible — it changed the economics. Here's the proof.

[Image: PPC tools and props sorted into a 'finally affordable' pile. Caption: The serious facts are real; the article covers are not.]

In short: PPC automation didn't become newly possible — it became newly affordable. AI didn't add capabilities; it collapsed the payback period on ones we've had since 2019, turning six-month builds into two-week ones. The winners aren't fewer specialists but super-seniors who finally build the workflows that never used to pay off — like negative-query triage with a local Gemma model.

  • 2 devs × 2 yrs: to build one reporting tool the old way
  • ~€100k / yr: just to keep that engine running (roughly)
  • 50–100 hrs: for one keyword-research project, manually
  • single-digit hrs: for the same project today

The whole argument in one sentence

Automating PPC the right way was always possible — it just never paid for itself, so almost nobody did it. AI didn’t hand us new powers; it collapsed the cost of using the old ones, and that single economic shift puts a decade of “we’d love to, but it’d never pay off” back on the table. Everything below is the proof, told from inside the tools I actually build with — not from a vendor keynote.

Here’s what you’ll get out of this piece, concretely: the three eras that made PPC automation too expensive to bother with, the exact moment the math flipped this year, and one real job — clearing junk search queries with a local open-source model — broken down step by step, with the intermediate output you’d be staring at shown at every stage. By the end you’ll be able to tell the difference between “AI gave us new abilities” (mostly false) and “AI changed who can afford the old ones” (the thing that actually matters), and you’ll have a flow you can copy.

I ran a small Google Ads scripts blog between 2015 and 2017, then I stopped writing. Not because the ideas dried up — because the gap between “this is possible” and “this is worth doing for a client” stayed stubbornly wide for a decade. Almost everything you’ve read about AI killing the PPC specialist gets the mechanism backwards. AI didn’t change what you can do in paid search. Most of it was always possible. What it changed is who can do it, and at what cost — and that’s a much bigger deal than another auto-bidding toggle. Let me walk you through how I know.

The Scripts era: a few days for a “simple” script

Why I’m starting here: to show you the ceiling was never the technology. It was the labour, from the very first tool I touched.

Google Ads Scripts felt like magic when they arrived. JavaScript, right in the account, looping over campaigns. In practice, a “simple” script — pause keywords over a CPA threshold, flag broken final URLs — was a few days of writing and debugging once you handled the edge cases, the quotas, the silent failures.

Then came the part nobody budgeted for: running it across accounts. One script per account, copy-pasted, drifting out of sync, breaking quietly when one client’s naming convention didn’t match the others. Scaling and distribution was its own job. So most scripts in the wild never went past reporting — pull some numbers into a Sheet on a schedule. Anything that actually changed the account was too fragile, and too expensive to maintain, to be worth it for most agencies.

The takeaway you carry forward: even the “easy” automation layer was gated by maintenance cost, not capability.

The API era: two days to the first query, two years to a tool

Why this one matters: it’s where the payback gap got so wide it became a business decision, not an engineering one.

The Google Ads API — the AdWords API back then — was the real power, and the real wall. I watched our systems architect, a man with 25-plus years of engineering behind him, spend two full days reading documentation before he got a single query to return data. That’s not a knock on him. That’s the surface area you signed up for.

We went all in anyway and built PPC Robot: a deeply customizable reporting and operations tool, technically beautiful, genuinely powerful. It also took two developers, full-time, two years. The development it needed to keep going ran somewhere around €100,000 a year — roughly, order-of-magnitude — and it never paid for itself. It covered a fraction of what our PPC specialists actually needed, so eventually we parked it in a limited internal mode. Not because it was bad. Because the math never closed.

And we still shipped real things on top of that API, four and five years ago:

What that engine actually produced

  • 404 / broken final-URL checker across accounts (shipped)
  • Shopping campaign generator from the feed (shipped)
  • Shopping / Performance Max segmentation (shipped)
  • BigQuery pipeline + reporting into Sheets / Excel (shipped)
  • Merchant Center account status checks (shipped)

Look at that list and notice something: none of it is exotic by today’s standards. It was all possible. It just cost a fortune to build and a fortune to keep alive. Every meaningful feature — a keyword-research tool, an expansion tool, ad translation, a Shopping generator — was measured in months. That’s the era’s whole lesson in one sentence: the ceiling was never the technology. It was the payback period.

Two things that broke this year

Why I’m getting specific now: “AI changed everything” is a claim you should refuse to accept on faith. So here are two jobs that used to be uneconomic and now aren’t — both things I’m running, not hypotheticals — and for the first one I’ll show you exactly what each step produces.

1. Excluding the wrong search queries

Cleaning irrelevant search terms out of an account is high-value and mind-numbing. The old way was a semi-manual crawl through thousands of queries, eyeballing patterns, adding negatives by hand. Picture the situation: a running-shoe shop is paying for clicks on “running shoes repair”, “nike air max history” and “free running shoes” — none of which it sells or services. Multiply that by thousands of rows, every week, across every account. That’s the job nobody wants and everybody needs.

What changed: a Python script pulls the queries from the Google Ads API and hands them to an open-source model (Google’s Gemma 4) with proper instructions and, crucially, context about the site: the sitemap, the site/DB structure, the breadcrumb taxonomy, the product feed. The result: the model doesn’t just flag individual junk queries; it surfaces the patterns behind them, faster and cheaper than a human skim. Here’s that flow as five concrete steps.

PULL — get the raw search terms

Pull the search-terms report from the Google Ads API: query, clicks, cost, conversions. Why first: this is the evidence — the actual money already spent on each term. You want cost attached to every row so the model can tell expensive junk from harmless junk. You get: a flat table of every term the account has paid for in the window.
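
A minimal sketch of what PULL might look like, assuming the report is fetched with a GAQL query against `search_term_view`. The query uses Google Ads API field names; the client call itself is left out. The 30-day window, the `SearchTerm` record and the flat-dict shape passed to `normalize_row` are illustrative assumptions, not the script described here.

```python
from dataclasses import dataclass

# GAQL for the search-terms report. Cost comes back in micros
# (1,000,000 micros = 1 unit of the account currency).
# The 30-day window is an assumption; pick whatever window you review.
SEARCH_TERMS_QUERY = """
SELECT
  search_term_view.search_term,
  metrics.clicks,
  metrics.cost_micros,
  metrics.conversions
FROM search_term_view
WHERE segments.date DURING LAST_30_DAYS
"""


@dataclass(frozen=True)
class SearchTerm:
    query: str
    clicks: int
    cost: float  # account currency, not micros
    conversions: float


def normalize_row(raw: dict) -> SearchTerm:
    """Turn one raw row (assumed here to be a flat dict) into a SearchTerm."""
    return SearchTerm(
        query=raw["search_term"].strip().lower(),
        clicks=int(raw["clicks"]),
        cost=raw["cost_micros"] / 1_000_000,
        conversions=float(raw["conversions"]),
    )


# Hypothetical row, shaped like the illustrative table in this article.
row = normalize_row(
    {"search_term": "Free Running Shoes", "clicks": 88,
     "cost_micros": 61_000_000, "conversions": 0}
)
print(row)
```

Converting micros to currency up front keeps every later step working in the same unit the reviewer reads.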

GROUND — build a context pack about the site

Assemble what the site actually is, in a form the model can read: the XML sitemap, the breadcrumb taxonomy, the product feed (id, title, category), and the DB/category structure. Why this is the whole game: a model with no context guesses; a model that knows you have no “repair” or “rental” category reasons. You get: a context pack that turns the model from a guesser into something that knows your catalogue.
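
One way to assemble such a context pack, sketched with the standard library only. The function name `build_context_pack`, the keys of the returned dict and the `shop.example` URLs are hypothetical; the sitemap namespace is the standard one from sitemaps.org.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def build_context_pack(sitemap_xml: str, feed_rows: list[dict],
                       breadcrumbs: list[str]) -> dict:
    """Condense sitemap, feed and taxonomy into a small dict a prompt can embed."""
    root = ET.fromstring(sitemap_xml)
    urls = [loc.text.strip() for loc in root.findall("sm:url/sm:loc", SITEMAP_NS)]
    categories = sorted({row["category"] for row in feed_rows})
    return {
        "url_count": len(urls),
        "sku_count": len(feed_rows),
        "categories": categories,
        "breadcrumbs": breadcrumbs,
    }


# Tiny hypothetical inputs, just to show the shape.
sitemap = (
    '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">'
    "<url><loc>https://shop.example/running/road</loc></url>"
    "<url><loc>https://shop.example/running/trail</loc></url>"
    "</urlset>"
)
feed = [
    {"id": "1", "title": "Road shoe A", "category": "Running > Road"},
    {"id": "2", "title": "Trail shoe B", "category": "Running > Trail"},
    {"id": "3", "title": "Road shoe C", "category": "Running > Road"},
]
pack = build_context_pack(sitemap, feed, ["Footwear > Running > Road / Trail"])
print(pack)
```

The pack is deliberately small: counts and category names rather than every URL, so it fits in a prompt alongside the queries.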

ASK — classify queries and name the patterns

Prompt Gemma 4 with the terms plus the context pack: classify each query relevant / irrelevant to what we sell, and — the important part — return the patterns behind the irrelevant ones (a token, an intent, a category mismatch). Why patterns, not rows: flagging 200 junk queries saves you an afternoon; naming the category of junk lets you exclude the next thousand you haven’t even seen yet. You get: an irrelevant-query list and, above it, the handful of rules that generated it.
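
A hedged sketch of the ASK step as two small functions: one builds the prompt, one parses the reply. The JSON reply contract, the function names and the canned reply are all assumptions made for illustration; the call to the local model runtime is intentionally not shown.

```python
import json


def build_prompt(context_pack: dict, queries: list[str]) -> str:
    """Site context first, then the queries, then the output contract."""
    return "\n".join([
        "You classify paid-search queries for the site described below.",
        "SITE CONTEXT (JSON):",
        json.dumps(context_pack, ensure_ascii=False),
        "QUERIES:",
        *[f"- {q}" for q in queries],
        'Reply with JSON only: {"verdicts": [{"query": str, "relevant": bool, '
        '"why": str}], "patterns": [{"tokens": [str], "reason": str}]}',
    ])


def parse_reply(raw: str) -> tuple[list[dict], list[dict]]:
    """Parse the model reply and fail loudly if the contract is broken."""
    data = json.loads(raw)
    verdicts, patterns = data["verdicts"], data["patterns"]
    for v in verdicts:
        if not isinstance(v["relevant"], bool):
            raise ValueError(f"non-boolean verdict for {v['query']!r}")
    return verdicts, patterns


# Canned reply standing in for real model output.
canned = json.dumps({
    "verdicts": [{"query": "running shoes repair", "relevant": False,
                  "why": "service we don't offer"}],
    "patterns": [{"tokens": ["repair"], "reason": "service absent from taxonomy"}],
})
prompt = build_prompt({"categories": ["Running > Road"]}, ["running shoes repair"])
verdicts, patterns = parse_reply(canned)
print(patterns)
```

Asking for the patterns in the same reply as the verdicts means the rules come from the model that saw all the rows at once.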

REVIEW — validate the rules, not the rows

A human reads the patterns — five to ten of them — not 5,000 individual rows. Why this is the time saver: judgment is applied once per rule instead of once per query, and a wrong rule is obvious in a way a single mislabeled row never is. You get: a short, trusted list of exclusion patterns a human actually signed off on.
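
To make that review concrete, here is a sketch of a one-line-per-pattern review sheet. The `review_sheet` function, the whole-word token matching and the deliberately bad second pattern are illustrative assumptions; the rows reuse this article's running-shoe example.

```python
def review_sheet(patterns: dict[str, set[str]], terms: list[dict]) -> list[dict]:
    """One line per pattern: queries hit, what they cost, and any converters."""
    sheet = []
    for name, tokens in patterns.items():
        hits = [t for t in terms if tokens & set(t["query"].split())]
        sheet.append({
            "pattern": name,
            "queries": len(hits),
            "cost": sum(t["cost"] for t in hits),
            # A pattern that would block converting queries needs a second look.
            "converting_hits": [t["query"] for t in hits if t["conv"] > 0],
        })
    return sorted(sheet, key=lambda r: r["cost"], reverse=True)


terms = [
    {"query": "running shoes", "cost": 310, "conv": 12},
    {"query": "free running shoes", "cost": 61, "conv": 0},
    {"query": "running shoes repair", "cost": 40, "conv": 0},
    {"query": "nike air max history", "cost": 24, "conv": 0},
    {"query": "running shoes wikipedia", "cost": 14, "conv": 0},
]
patterns = {
    "non-commercial modifiers": {"free", "repair", "history", "wikipedia"},
    "too broad (hypothetical bad rule)": {"shoes"},
}
sheet = review_sheet(patterns, terms)
for line in sheet:
    print(line)
```

The bad rule shows up immediately because it lists a converting query, which is the sense in which a wrong rule is easier to spot than a wrong row.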

PUSH — add the negatives at the right level

Push the approved negatives back through the API at the correct level — ad group, campaign, or shared list — depending on how broad the pattern is. Why level matters: a site-wide junk token (“free”, “wikipedia”) belongs on a shared list, not buried in one ad group. You get: the account cleaned, and a reusable negative list that keeps working next week.
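
The level decision can be sketched as a small heuristic. Everything here is an assumption for illustration: the function name, the 50% threshold and the idea of deciding from where the junk queries were observed. A reviewer can always override the result, for example to put an obviously site-wide token on the shared list straight away.

```python
def choose_level(campaigns_hit: set[str], ad_groups_hit: set[str],
                 total_campaigns: int, shared_threshold: float = 0.5) -> str:
    """Pick where a negative belongs from how widely its junk queries appear."""
    if total_campaigns <= 0:
        raise ValueError("total_campaigns must be positive")
    if len(ad_groups_hit) == 1:
        return "ad_group"        # local noise: seen in a single ad group
    if len(campaigns_hit) / total_campaigns >= shared_threshold:
        return "shared_list"     # seen across much of the account
    return "campaign"            # one negative per affected campaign


wide = choose_level({f"c{i}" for i in range(8)}, {f"ag{i}" for i in range(20)}, 10)
medium = choose_level({"c1"}, {"ag1", "ag2"}, 10)
local = choose_level({"c1"}, {"ag1"}, 10)
print(wide, medium, local)  # shared_list campaign ad_group
```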

The quiet headline here isn’t “AI is smart.” It’s that an open-source model running locally is enough — you don’t even need a frontier API to make this pay. That’s the economics moving, not the capability.

Watch it run: what each step actually spits out

The five steps are the recipe; this is the food. Below is the concrete artifact each step hands you for our running-shoe shop: what you’re literally looking at before you move on. The shapes match what the tools return; the rows are illustrative, not from a real client.

PULL → the raw search terms, with money attached. Every term the account paid for, sorted so the waste is visible:

Search term                Clicks  Cost   Conv
running shoes                420    €310    12
free running shoes            88    €61      0
running shoes repair          54    €40      0
nike air max history          31    €24      0
running shoes wikipedia       19    €14      0

Four of these five rows are pure spend with zero conversions — €139 the account never had to pay. The problem is obvious in five rows and invisible in five thousand.
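
The arithmetic behind that figure, as a short sketch over the five illustrative rows above (tuple layout: term, clicks, cost in euros, conversions):

```python
rows = [
    ("running shoes", 420, 310, 12),
    ("free running shoes", 88, 61, 0),
    ("running shoes repair", 54, 40, 0),
    ("nike air max history", 31, 24, 0),
    ("running shoes wikipedia", 19, 14, 0),
]

# Terms that cost money and never converted, most expensive first.
wasted = sorted((r for r in rows if r[3] == 0), key=lambda r: r[2], reverse=True)
wasted_cost = sum(r[2] for r in wasted)
total_cost = sum(r[2] for r in rows)

print(f"{len(wasted)} of {len(rows)} terms never converted")
print(f"wasted: €{wasted_cost} of €{total_cost} ({wasted_cost / total_cost:.0%})")
# 4 of 5 terms never converted
# wasted: €139 of €449 (31%)
```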

GROUND → the context pack the model reasons against. Not prose — a compact map of what the site genuinely is:

sitemap.xml      → 1,010 URLs (categories, products, blog)
breadcrumb tax.  → Footwear > Running > Road / Trail
product feed     → 1,205 SKUs (id, title, category, price)
DB structure     → no "repair", "rental" or "history" nodes exist

That last line is the one that does the work: the model now knows “repair” is not a thing this shop offers, instead of guessing.
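
That "no such node" line can be derived rather than typed by hand. A minimal sketch, assuming the taxonomy is available as breadcrumb strings; `absent_concepts` and the candidate list are hypothetical names chosen for this example.

```python
def absent_concepts(taxonomy_paths: list[str], candidates: list[str]) -> list[str]:
    """Return the candidate concepts that appear nowhere in the site taxonomy."""
    vocabulary: set[str] = set()
    for path in taxonomy_paths:
        # Treat both ">" and "/" as node separators, then split nodes into words.
        for node in path.lower().replace("/", ">").split(">"):
            vocabulary.update(node.split())
    return [c for c in candidates if c.lower() not in vocabulary]


taxonomy = ["Footwear > Running > Road / Trail"]
missing = absent_concepts(taxonomy, ["repair", "rental", "history", "trail"])
print("no " + ", ".join(f'"{m}"' for m in missing) + " nodes exist")
# no "repair", "rental", "history" nodes exist
```

The candidate concepts would typically come from the modifiers seen in the search terms, so the pack only states absences that matter for this account.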

ASK → verdicts, and the patterns above them. The model returns a verdict per query — but the prize is the block at the bottom:

Query                     Verdict      Why
free running shoes        irrelevant   freebie intent, no purchase
running shoes repair      irrelevant   service we don't offer
nike air max history      irrelevant   informational, no buy intent
running shoes wikipedia   irrelevant   reference-seeker

→ PATTERN: tokens "free", "repair", "history", "wikipedia"
  = non-commercial modifiers absent from our taxonomy.
  Recommend excluding as a shared negative list.

Four rows became one rule. That rule will catch “running shoes free shipping returns”-style junk you haven’t seen yet — which is the entire point.
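
A small sketch of how one rule covers queries nobody has reviewed yet. Whole-word matching here is a simplification of how negative keywords match in the platform, and `blocked_by_rule` is a hypothetical helper.

```python
NEGATIVE_TOKENS = {"free", "repair", "history", "wikipedia"}


def blocked_by_rule(query: str, tokens: set[str] = NEGATIVE_TOKENS) -> set[str]:
    """Return the rule tokens found as whole words in the query (empty = allowed)."""
    return tokens & set(query.lower().split())


unseen = blocked_by_rule("running shoes free shipping returns")
allowed = blocked_by_rule("trail running shoes")
partial = blocked_by_rule("freestyle running shoes")
print(unseen, allowed, partial)
```

Because the match is on whole words, "freestyle" is not caught by "free". The same sketch also shows that any query containing "free" is blocked, including ones about free shipping, which is worth a deliberate decision at review time.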

REVIEW → a human signs off on the rule. You read one line — “non-commercial modifiers absent from our taxonomy” — agree it’s right, and you’re done. No scrolling 5,000 rows. The judgment happens once.

PUSH → the negatives land at the right level. Because the pattern is site-wide, it goes on a shared negative list, not one ad group:

Shared negative list: "non-commercial modifiers"
  free · repair · history · wikipedia · manual · pdf
Applied to: all Search campaigns

One pattern, one list, every campaign protected — and it keeps earning its keep next week without another human pass. That’s the moment a mind-numbing weekly chore becomes a ten-minute review.

2. Keyword research

The second job is the one that used to be a budget line of its own. Real keyword research — the kind that maps demand to your landing pages and tells you what’s missing from the site — used to mean dozens of hours of pulling data (AdWords API, suggest boxes, OpenRefine), semi-manual cleanup, classification by landing page, and trend / search-volume / gap reporting on top.

One keyword-research project, then vs. now

  • The old way (data pull, clean, classify, report): 50–100 hrs
  • What the client paid for that: ≈ €2,000–4,000
  • The same project today, with one good skill: single-digit hrs
  • And the output is more accurate

It’s not just cheaper. It’s better — more precise, with hours spent on validation and judgment instead of on plumbing. That combination, cheaper and better, is exactly what was supposed to be impossible. I’ve broken the modern version down end to end in the market-expansion blueprint and the content-gap analysis — both with the real intermediate output shown at every step.
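
For a sense of scale, the figures above can be turned into bounds with a few lines of arithmetic. Reading "single-digit" as at most 9 hours is my interpretation; the other numbers are the article's.

```python
old_hours = (50, 100)            # from the article
old_billed_eur = (2_000, 4_000)  # from the article
new_hours_max = 9                # "single-digit" read as at most 9 hours

# Widest plausible hourly rate implied by the old billing.
rate_low = old_billed_eur[0] / old_hours[1]    # cheapest bill over most hours
rate_high = old_billed_eur[1] / old_hours[0]   # priciest bill over fewest hours

# Most conservative comparison: shortest old project vs longest new one.
min_speedup = old_hours[0] / new_hours_max
min_hours_freed = old_hours[0] - new_hours_max

print(f"implied old rate: €{rate_low:.0f}-{rate_high:.0f} per hour")
print(f"at least {min_speedup:.1f}x faster, at least {min_hours_freed} hours freed")
# implied old rate: €20-80 per hour
# at least 5.6x faster, at least 41 hours freed
```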

The economics, before and after

This is the whole thesis in one table. Same jobs, same quality bar — only the cost of doing them moved. Documented figures where I have them; the rest is order-of-magnitude from one agency’s two decades of doing this.

The job | The old way | Today
Keyword research (one project) | 50–100 hrs · €2,000–4,000 billed | single-digit hrs · more accurate
Negative-query triage | semi-manual crawl, thousands of rows by hand | script + local model names the patterns
Ship one new automation feature | months (2 devs × 2 yrs for one whole tool) | weeks
Keep a reporting engine alive | ~€100k / yr, never paid for itself | near-zero with a local model

Read the table top to bottom and the pattern is the same in every row: the capability column didn’t change, because we could do all of this in 2019. The price column fell through the floor. That’s not a technology story. It’s a payback-period story, and the payback period is what decides whether a smart idea ever gets built.

The thing to internalize: AI didn’t unlock new PPC capabilities so much as it collapsed the payback period on the old ones. When a six-month build becomes a two-week build, the entire backlog of “we’d love to, but it’d never pay off” suddenly clears.
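
The payback logic can be written down in a few lines. This is a sketch with made-up numbers: the monthly build cost and the monthly saving are hypothetical and only there to show how the same feature flips from "never" to "obviously" when build time shrinks.

```python
def payback_months(build_months: float, monthly_build_cost: float,
                   monthly_saving: float) -> float:
    """Months of savings needed to recover the up-front build cost."""
    if monthly_saving <= 0:
        raise ValueError("no saving means the build never pays back")
    return build_months * monthly_build_cost / monthly_saving


# Hypothetical inputs: a team costing €10,000 per month builds a feature
# that saves €2,500 per month once it is live.
six_month_build = payback_months(6, 10_000, 2_500)
two_week_build = payback_months(0.5, 10_000, 2_500)

print(f"six-month build pays back in {six_month_build:.0f} months")
print(f"two-week build pays back in {two_week_build:.0f} months")
# six-month build pays back in 24 months
# two-week build pays back in 2 months
```

Same feature, same saving: only the build time changed, and the payback period moved from two years to two months.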

What this actually means for the industry

Why I’ll plant a flag here: the popular take — “AI is ending the PPC specialist” — is lazy, and getting it backwards has real career consequences for people reading this.

“The era of PPC specialists is ending” is nonsense. The opposite is happening. Good specialists spent years frustrated that the smart thing — the thing they could clearly see — “wasn’t worth building.” Now they get to build it. Automatically, profitably, at scale. A whole shelf of PPC strategies that used to be uneconomic or simply absurd to attempt is suddenly on the table.

What is happening is a sharper split inside the craft. On one side, juniors without imagination, vision, or tool fluency — people who treat the platform UI as the whole job. On the other, super-seniors who wield the tools, invent the workflows, think outside the platform’s boxes, join data sources nobody else joins, and build themselves specialized dashboards instead of waiting for a vendor to ship a feature.

And to kill the obvious misreading: this is not a story about cheaper service. Tools, compute, and development still cost money. The point is that a project that used to be six months of development is now two weeks — so the investment finally makes sense. The client gets a dramatically better service for a similar price, not a worse one for less.

Why I’m writing again

I stopped blogging years ago because the gap between idea and economically-sane execution was too wide to be interesting. That gap just closed. So this blog is back, and it’s going to be concrete: use cases with real numbers, the exact flows, the actual outputs — including the messy parts and the limits. The first deep dives are already up; more are coming.

If your reaction to all this is “we could finally do that thing we always wanted” — good. That’s the whole point of the era. Let’s build it.

The part you can steal

Negative-query triage with a local open-source model (Gemma 4) — the five-step flow above, as a copy-paste recipe:

1. PULL   Google Ads API → search-terms report (query, clicks, cost, conv)
2. GROUND Build a context pack the model can reason against:
            - sitemap.xml (what the site actually sells)
            - site / DB structure + breadcrumb taxonomy
            - product feed (id, title, category)
3. ASK    Prompt Gemma 4: "Given this site context, classify each query as
            relevant / irrelevant to what we sell. Return the irrelevant ones
            AND the PATTERNS behind them (token, intent, category mismatch)."
4. REVIEW Human validates the ~10 patterns, not 5,000 rows one by one.
5. PUSH   Add negatives back via the API at the right level (shared list for
            site-wide junk tokens; ad group for local noise).
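
The recipe above, wired together as a skeleton. This is a sketch under assumptions: `triage`, `fake_model` and the reply shape are hypothetical, the model is passed in as a plain function so any local runtime can sit behind it, and PULL and PUSH are left to the API client.

```python
from typing import Callable

Model = Callable[[str], dict]  # prompt in, parsed reply out


def triage(terms: list[dict], context: dict, model: Model,
           approve: Callable[[dict], bool]) -> list[str]:
    """Run GROUND -> ASK -> REVIEW and return negatives ready to PUSH."""
    queries = [t["query"] for t in terms]
    prompt = f"Context: {context}\nQueries: {queries}"           # GROUND
    reply = model(prompt)                                        # ASK
    approved = [p for p in reply["patterns"] if approve(p)]      # REVIEW
    return sorted({tok for p in approved for tok in p["tokens"]})


def fake_model(prompt: str) -> dict:
    """Stand-in for the local model call, which this sketch does not implement."""
    return {"patterns": [
        {"tokens": ["free", "repair"], "reason": "non-commercial modifiers"},
        {"tokens": ["shoes"], "reason": "too broad"},
    ]}


negatives = triage(
    terms=[{"query": "free running shoes"}, {"query": "running shoes repair"}],
    context={"categories": ["Running > Road"]},
    model=fake_model,
    approve=lambda p: p["reason"] != "too broad",  # a human decision in real use
)
print(negatives)  # ['free', 'repair']
```

Keeping `approve` as an explicit argument makes the human sign-off a required part of the flow rather than an optional extra.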

Three things that decide whether this pays off:

  1. Context is the whole game. A model with no site context guesses; a model with your sitemap, taxonomy and feed reasons. Ground it before you trust it.
  2. Hunt patterns, not rows. The win isn’t flagging individual junk queries — it’s the model naming the category of junk so you exclude the next thousand too.
  3. Open-source is enough. You don’t need a frontier API for this. A good local model keeps the data in-house and the cost near zero — which is exactly why it finally pays off.

FAQ

Are you saying agencies should fire their PPC specialists?

The exact opposite. The specialists who understand strategy and tools are now more valuable, because they can finally execute the ideas that used to be uneconomic. What shrinks is the value of pure platform-UI button-pushing.

Is the €100k/year and two-devs-for-two-years figure exact?

No — treat it as order-of-magnitude. The point isn’t the precise euro amount; it’s that a single in-house reporting engine carried a six-figure annual cost and still never paid for itself. That’s the economics this whole piece is about.

Do I need an expensive frontier model to do this?

Not for jobs like negative-query triage. A capable open-source model such as Gemma 4, run locally with good site context, does the work — which keeps both your data and your costs in your control.

What makes the AI negative-query pass better than my own eyeballing?

Two things: it reads the whole search-terms report instead of a sample, and it returns the patterns, not just the rows. You validate ten rules instead of five thousand lines, and those rules keep catching new junk next week. The human still signs off — the model just does the skimming.

So this is just hype with a fresh coat of paint?

If it were, I wouldn’t have started writing again. The change is narrow and real: not new capabilities, but a collapsed payback period on existing ones. That’s a business change, not a magic one — and it’s why the backlog suddenly clears.

What will actually be on this blog?

Concrete use cases with numbers, the flows behind them, and the outputs — limits and failure modes included. Less manifesto, more “here’s the exact thing we ran and what it returned.”
