In short: Feed your product catalogue in and get back a ranked table of which countries to expand to — scored on real search demand, CPCs, competitor prices and a simulated ROAS per market. The seven-step run translates your feed categories into each native language, pulls keyword ideas via the Google Ads API, scrapes competitive SERPs with DataForSEO, and finishes overnight. The winners are almost never the markets the meeting expected.
What you put in, and what you get out
You put in one thing: your product feed — the same Google Merchant Center file you already have. You get back one thing: a ranked table of countries, top to bottom, telling you where to expand next and why — backed by real search demand, real CPCs, real competitor prices and a simulated return on ad spend for each market. No gut feeling, no “Germany because it’s big”. A spreadsheet that overrules the meeting.
This article is the whole blueprint, and I’m not going to wave my hand at any of it. At every step I show you the real intermediate output — the actual table, the actual counts, the actual seed that came out the other side — from one overnight run on a real bargain-segment Czech e-shop. So you can see, step by step, exactly what each stage produces and decide “yes, that’s the thing I need”.
A word on why this is worth doing at all. The honest version of an expansion analysis — demand, prices, competition and unit economics across ten or fifteen markets — was always possible. It just meant hundreds of hours of pulling keyword volumes country by country, translating seeds, eyeballing competitors, and building a spreadsheet nobody fully trusted. So it mostly didn’t get done; teams picked their next country because someone spoke the language or a distributor called. That manual cost is what collapsed — not the difficulty of the idea, the difficulty of the execution. What used to be weeks is now one overnight run plus an afternoon.
The blueprint at a glance
Seven steps. For each one: what you do, what you’re looking at, and why — because every step exists to answer one specific question about a market.
1 — Pick the shortlist (logistics first)
Do: list the countries you can actually ship to and clear customs for. Why: market size is a terrible front-door filter — the biggest market is usually the priciest to enter. Logistics is the real constraint, so it goes first. You get: a list of 10–15 candidate countries to analyze, and nothing wasted on markets you can’t serve.
2 — Turn the feed into seeds
Do: take the product_type category paths from your Merchant Center feed and parse them into clean, atomic category seeds. Why: your own catalogue already names every category you sell — it’s the best, cheapest seed source there is, far better than brainstorming keywords. You get: a few hundred meaningful seeds instead of a quarter-million raw feed rows.
3 — Translate each seed into the native language
Do: have an LLM translate the full seed list into each shortlisted country’s own language, before any research. Why: nobody in Warsaw searches in English; query in a proxy language and every volume and CPC downstream is quietly wrong. You get: the exact local words to query each market on.
4 — Pull keyword ideas per country
Do: push every translated seed through the Google Ads API (GenerateKeywordIdeas) for each country — first thing you download, because everything downstream needs it. Why: this is the raw demand signal — volume, CPC and seasonality per keyword. You get: the full keyword universe per market, with the link back to its source category kept intact.
5 — Let AI clean the scope
Do: filter the noise out at the keyword level, not the seed level. Why: a broad seed like “football” pulls millions of junk searches — but kill the seed and you also kill “football boots”, a real sale. You get: a relevant core, plus the buying-intent keywords tagged rather than thrown away.
6 — Scrape the competitive page (DataForSEO)
Do: for your best-selling products, scrape the live results page in each market, one API call per query. Why: volume tells you demand, not whether you’ll be crushed on price; the SERP shows who advertises, at what price, and how crowded the auction is. You get: competitor ad copy, live shopping prices and the organic players, all for a few dollars per market.
7 — Simulate the economics, then score
Do: simulate ROAS from real CPC, a defensible conversion rate and per-market AOV, then weight demand, prices, logistics and friction into one score. Why: this is the question you actually came to answer. You get: the ranked table — where to expand, in order.
Now the same seven steps, one at a time, each with the real thing it produced on this run.
Step 1 — The shortlist the logistics radius produced
You don’t analyze all of Europe; you analyze where you can deliver. Starting from the home market’s shipping and customs reach, the candidate set came out to 13 target countries across Central and Eastern Europe and the Balkans. Here is that literal shortlist, with the entry signals the scoring uses later. The home market (Czechia) sits in the table too, but only as the calibration baseline, not a target. And note that two countries never make it past the keyword stage, for a reason no slide deck predicts:
| Country | In EU? | Currency | E-com growth | Note |
|---|---|---|---|---|
| Germany | Yes | EUR | +5 % | Anchor market |
| Austria | Yes | EUR | +6 % | — |
| Poland | Yes | PLN | +12 % | Largest CEE market |
| Slovakia | Yes | EUR | +8.5 % | Home-language overlap |
| Czechia (home) | Yes | CZK | — | Calibration baseline |
| Hungary | Yes | HUF | +15 % | — |
| Romania | Yes | RON | +18 % | — |
| Bulgaria | Yes | BGN | +16 % | — |
| Croatia | Yes | EUR | +14 % | Local payments prerequisite |
| Slovenia | Yes | EUR | +10 % | — |
| Serbia | No | RSD | +20 % | — |
| Albania | No | ALL | +28 % | — |
| Bosnia & Herz. | No | BAM | +22 % | Dropped — no Bosnian in Google Ads |
| North Macedonia | No | MKD | +25 % | Dropped — no Macedonian in Google Ads |
That last column is the kind of constraint you only learn by running it: Google Ads doesn’t support Bosnian or Macedonian for keyword ideas, so those two markets fall out of the keyword stage no matter how attractive their growth looks. Thirteen planned, eleven the API can actually research.
Step 2 — The feed, collapsed to 400 seeds
The raw material is the product feed: roughly 252,000 catalogue items, with product_type paths glued together by “&”, “a” (Czech for “and”) and slashes, the way real feeds always are (“Fridges and freezers”, “Toys & games”). The parser splits those into atomic categories, drops generic top-level nodes (“home”, “electronics”, “garden”), and keeps only categories that clear a frequency threshold. A quarter-million items collapses to a tight, meaningful seed list:
Feed → seeds, the real reduction
- Raw catalogue items parsed ~252,000
- Distinct brands catalogued (kept out of seeds) 40,000+
- Frequency threshold — smallest kept seed still tags 234 items
- Meaningful category seeds after normalization 400
And the seeds themselves are exactly what you’d query a market on — categories, not product names. The heaviest ones by catalogue weight:
| Seed (translated) | Catalogue items | Level |
|---|---|---|
| Toys | 21,636 | L2 |
| Cases | 14,411 | L3 |
| Women's clothing | 14,341 | L2 |
| Phones | 13,544 | L2 |
| Notebooks | 12,277 | L2 |
| Pet supplies | 11,940 | L2 |
| Auto & moto (accessories) | 11,683 | L1 |
| Kitchen equipment | 11,379 | L2 |
To make this concrete: say the shop sells small home goods, toys and phone accessories. Its feed has a path like Electronics > Mobile > Phone cases. The parser throws away “Electronics” (too generic), keeps “phone cases” as a seed, and counts how many products sit under it — here, “cases” tags 14,411 items, so it’s clearly a real category worth researching, not a one-off. Product names mostly have zero search volume, so they’re ignored; the 40,000-plus catalogued brands get held back as a separate signal, and the category seeds carry the keyword research.
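A minimal sketch of that parser, under stated assumptions: path levels are separated by `>`, the only glue tokens are `&`, `/`, English “and” and Czech “a”, and `GENERIC_NODES` plus the `min_items` threshold are illustrative values, not the run’s actual configuration. Each item counts once per seed it carries, which is what “tags N items” means.

```python
import re
from collections import Counter

# Hypothetical list of generic top-level nodes to drop; tune it per catalogue.
GENERIC_NODES = {"home", "electronics", "garden"}

# Glue tokens: "&" or "/" (any spacing), or the words "and" / Czech "a"
# surrounded by whitespace, so an "a" inside a word is never split.
GLUE = re.compile(r"\s*[&/]\s*|\s+(?:and|a)\s+", flags=re.IGNORECASE)


def atomic_categories(product_type: str) -> set[str]:
    """Turn one product_type path into its set of atomic category names."""
    seeds = set()
    for node in product_type.split(">"):
        for part in GLUE.split(node):
            name = part.strip().lower()
            if name and name not in GENERIC_NODES:
                seeds.add(name)
    return seeds


def build_seeds(product_types: list[str], min_items: int) -> Counter:
    """Count items per atomic category; keep categories tagging >= min_items items."""
    counts = Counter()
    for path in product_types:
        counts.update(atomic_categories(path))
    return Counter({seed: n for seed, n in counts.items() if n >= min_items})
```

On this run the smallest kept seed still tagged 234 items, which gives a sense of where `min_items` landed.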
Step 3 — Every seed, in the market’s own language
This is the step people skip, and it quietly poisons everything downstream. You never query a market in English. Before a single API call, an LLM translates the full seed list into each country’s native language — the 400 core seeds, expanded with synonyms and spelling variants to 488 entries × 12 languages = 5,856 machine translations sitting ready. Here is one seed, “toys”, as it actually went out to each market:
| Market | What 'toys' becomes before the API call |
|---|---|
| Germany (de) | Spielzeug |
| Poland (pl) | zabawki |
| Czechia / Slovakia (cs/sk) | hračky |
| Hungary (hu) | játékok |
| Romania (ro) | jucării |
| Croatia / Serbia (hr/sr) | igračke / играчке |
| Bulgaria (bg) | играчки |
| Slovenia (sl) | igrače |
| Albania (sq) | lodra |
Why this matters in plain terms: if you researched Poland with the English word “toys”, Google would hand you the volume and CPC for English-language searches in Poland, a tiny, weird slice of expats and tourists. Query zabawki instead and you get the actual Polish market. The numbers differ by an order of magnitude, and every downstream decision rides on them. Brand names are the one exception: those never get translated.
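The fan-out itself is simple bookkeeping. A sketch, assuming a hypothetical `translate(seed, language)` callable that wraps whichever LLM you use: every seed meets every target language exactly once, and anything in the brand set passes through untranslated. With 488 seed entries and 12 languages this yields the 5,856 pairs the run had ready.

```python
from itertools import product
from typing import Callable


def translate_seeds(
    seeds: list[str],
    languages: list[str],
    translate: Callable[[str, str], str],  # hypothetical LLM wrapper
    brands: frozenset[str] = frozenset(),
) -> dict[tuple[str, str], str]:
    """Return {(seed, language): query text} for every seed in every language."""
    table = {}
    for seed, lang in product(seeds, languages):
        if seed.lower() in brands:
            table[(seed, lang)] = seed  # brand names are never translated
        else:
            table[(seed, lang)] = translate(seed, lang)
    return table
```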
Step 4 — 1.4 million keyword ideas, per market
Now the heavy lifting, and the first thing you actually download: each translated seed, in each country, through the Google Ads GenerateKeywordIdeas endpoint, with a rate limit and the link from every keyword back to its source product_type preserved. Across eleven researchable countries the run produced 1,402,486 keyword ideas — and the per-market split is itself a finding, because raw demand does not track the markets you’d expect to win:
| Market | Raw keyword ideas | With search volume |
|---|---|---|
| Poland | 410,343 | 408,852 |
| Germany | 348,914 | 347,868 |
| Austria | 323,294 | 311,438 |
| Romania | 128,304 | 126,385 |
| Bulgaria | 76,866 | 75,650 |
| Croatia | 61,611 | 58,179 |
| Serbia | 14,573 | 13,848 |
| Slovakia | 10,884 | 9,173 |
| Albania | 9,704 | 4,339 |
| Hungary | 9,281 | 8,584 |
| Slovenia | 8,712 | 6,542 |
The run is bookkept seed-by-seed, country-by-country, so you can see exactly how complete it is: 4,220 seed-country jobs attempted, 3,619 returned data, 601 errored — quota hits and the unsupported-language aborts from step 1. That’s the honest texture of a real run; it isn’t one clean sweep, and you want the log that proves which cells actually have data behind them.
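That bookkeeping can be sketched as follows. `fetch_ideas(query, country)` is a hypothetical wrapper; in the official google-ads Python client the underlying call is `KeywordPlanIdeaService.generate_keyword_ideas`, but the wrapper, the fixed pause between calls and the log format are assumptions for illustration. Each job lands in the log as either ok or an error, and every keyword row keeps the product_type it came from.

```python
import time
from typing import Callable

# Hypothetical fetcher: (translated query, country code) -> list of keyword-idea dicts.
Fetcher = Callable[[str, str], list]


def run_jobs(jobs, fetch_ideas: Fetcher, min_interval_s: float = 1.0):
    """Run (product_type, country, query) jobs sequentially and log each outcome.

    Returns (keywords, log): keyword rows tagged with country and source
    product_type, and a log mapping (product_type, country) -> "ok" / "error: ...".
    """
    keywords, log = [], {}
    last_call = float("-inf")
    for product_type, country, query in jobs:
        # Crude client-side rate limit: at least min_interval_s between calls.
        wait = min_interval_s - (time.monotonic() - last_call)
        if wait > 0:
            time.sleep(wait)
        last_call = time.monotonic()
        try:
            ideas = fetch_ideas(query, country)
        except Exception as exc:  # quota hits, unsupported languages, ...
            log[(product_type, country)] = f"error: {exc}"
            continue
        log[(product_type, country)] = "ok"
        for idea in ideas:
            keywords.append({**idea, "country": country, "product_type": product_type})
    return keywords, log


def summarize(log):
    """Attempted / returned-data / errored counts for the completeness log."""
    errors = sum(1 for status in log.values() if status != "ok")
    return {"attempted": len(log), "ok": len(log) - errors, "errored": errors}
```

A log like this is what lets you state “attempted, returned data, errored” per run and know exactly which seed-country cells are empty.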
Step 5 — AI cleans the scope (at the keyword level)
A broad seed drags in noise, and the temptation is to kill the noisy seed. Don’t. “Football” in Germany pulls millions of Bundesliga searches that have nothing to do with your shop — but the same seed also pulls “football boots”, which you very much want. So you filter at the keyword level, not the seed level: throw away the seed and you throw away its buying-intent children.
Here is what the cleanup actually removed on this run, and why — out-of-segment categories first, then pure noise patterns:
| Removed bucket | Keywords cut | Why |
|---|---|---|
| Out of segment — large appliances | 11,243 | Fridges, washers — small-goods shop only |
| Out of segment — whole vehicles | 8,127 | Motorcycles, cars — not parts |
| Out of segment — large furniture | 6,675 | Too heavy for the logistics radius |
| Out of segment — central heating | 6,354 | Bulky items outside the catalogue |
| Noise — local car marketplace | 3,837 | PL 'otomoto' classifieds, not category demand |
| Noise — calendar pages | 2,528 | German 'Kalender' — wrong intent |
| Noise — online games | 2,378 | Entertainment, not retail |
| Noise — used-car queries | 2,100+ | 'gebrauchtwagen', 'auto kaufen' |
| Noise — football / sport | 1,800+ | Bundesliga pulled in by the 'football' seed |
A concrete example of the keyword-vs-seed rule: the seed “auto-moto” is great for this shop — it sells car accessories. But in German it also pulls “gebrauchtwagen” (used cars) and “auto kaufen” (buy a car), which are worthless to a small-goods e-shop. You don’t delete the seed; you delete those specific keywords and keep “Handyhalter Auto” (phone holder for cars). That’s 2,100-plus junk keywords gone, the good ones untouched.
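Keyword-level filtering, sketched under the assumption that each keyword row carries its language and source seed. The noise patterns below are a tiny illustrative subset built from the examples in the table above, not the run’s actual lists. Only matching keywords are dropped; the seed and all its other children stay.

```python
import re

# Illustrative noise patterns per language (not the run's real lists).
NOISE = {
    "de": [r"\bgebrauchtwagen\b", r"\bauto kaufen\b", r"\bkalender\b", r"\bbundesliga\b"],
    "pl": [r"\botomoto\b"],
}


def clean_keywords(rows, noise=NOISE):
    """Split rows into (kept, dropped); filtering is per keyword, never per seed."""
    compiled = {
        lang: [re.compile(p, re.IGNORECASE) for p in patterns]
        for lang, patterns in noise.items()
    }
    kept, dropped = [], []
    for row in rows:
        patterns = compiled.get(row["lang"], [])
        if any(p.search(row["keyword"]) for p in patterns):
            dropped.append(row)
        else:
            kept.append(row)
    return kept, dropped
```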
And the flip side of cleanup isn’t only subtraction — it’s tagging the signal you do want. For a bargain / second-hand catalogue, the buying intent lives in the deal words, so 49,770 keywords got tagged as bargain-intent in eleven languages:
Deal-intent tagging — the signal worth keeping
- German 'gebraucht' (used) 22,128
- Polish 'używany' + variants 6,290
- German 'günstig' (cheap) 2,606
- Romanian 'ieftin' (cheap) 2,305
- English 'second hand' 1,783
- Total bargain-intent keywords tagged 49,770
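Tagging instead of deleting can be as simple as a flag. The word list here is a small illustrative subset of the deal words above (the Polish entry is a stem so it catches the variants); the run’s full eleven-language list isn’t reproduced. Substring matching is deliberate, but it means tagging should run after the noise cleanup, or “gebraucht” would also flag “gebrauchtwagen”.

```python
import re

# Illustrative subset of deal-intent words per language.
DEAL_WORDS = {
    "de": ["gebraucht", "günstig"],
    "pl": ["używan"],  # stem: używany / używana / używane
    "ro": ["ieftin"],
    "en": ["second hand"],
}
DEAL_PATTERN = re.compile(
    "|".join(re.escape(w) for words in DEAL_WORDS.values() for w in words),
    re.IGNORECASE,
)


def tag_deal_intent(rows):
    """Add a boolean 'deal_intent' flag to every row; nothing is removed."""
    for row in rows:
        row["deal_intent"] = bool(DEAL_PATTERN.search(row["keyword"]))
    return rows
```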
This is the step that separates a usable deliverable from a noisy one. On the first pass the spreadsheet was full of Bundesliga, 2026 calendars and used-car listings; it took a two-pass review across ten languages to get to a core a client would actually trust.
Step 6 — Pull the competitive page out of a SERP
Volume tells you demand; it says nothing about whether you’ll get crushed on price. So for the top 20–30 % of best-selling products per category, you localize the title and scrape the live results page. One POST to DataForSEO returns the entire page as JSON — the part you can’t reconstruct from your own account:
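```python
import os

import requests

# One POST → the whole results page as JSON:
# paid ads (competitor copy), shopping blocks with prices, organic, PAA.
resp = requests.post(
    "https://api.dataforseo.com/v3/serp/google/organic/live/advanced",
    # API credentials from the environment (variable names are your choice)
    auth=(os.environ["DATAFORSEO_LOGIN"], os.environ["DATAFORSEO_PASSWORD"]),
    json=[{
        "keyword": "lodówka do zabudowy",  # localized to the market's language
        "location_code": 2616,              # Poland
        "language_code": "pl",
        "device": "desktop",
    }],
    timeout=120,
)
resp.raise_for_status()
data = resp.json()  # SERP elements live in data["tasks"][0]["result"][0]["items"]
# ~$0.0035 per query → 3,000 top products per market ≈ $10.50
```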
The same response carries three things for one price: the paid block (competitor ad copy — raw material for your messaging research), popular_products (shopping blocks with live prices you can’t get from your own Merchant Center), and the organic results (the market’s real players, including the ones who don’t advertise). You pay for a price check and walk away with a copy library and a competitor map. And the whole-program cost is the punchline:
The competitive pass, priced (12 markets × 3,000 queries)
- Organic Regular, task mode $21.60
- Merchant Shopping (prices only) $36
- Organic Advanced, live (paid + shopping + PAA) $126
- DataForSEO minimum top-up (lasts months) $50
A full competitive SERP pass across twelve countries runs $22–126 depending on mode — against a Semrush subscription at ~$140+/month flat just for the UI.
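To pull the three useful blocks out of one advanced SERP response (for example `resp.json()` from the request above), a sketch like this works. The field names (`type`, `title`, `description`, `domain`, `rank_absolute`, and the nested `items` with `price.current` / `price.currency` inside `popular_products`) reflect my reading of the DataForSEO v3 schema and should be checked against its docs before you rely on them.

```python
def split_serp(response_json: dict) -> dict:
    """Split a live/advanced SERP response into paid ads, shopping prices and organic players."""
    out = {"paid": [], "shopping": [], "organic": []}
    for task in response_json.get("tasks", []):
        for result in task.get("result") or []:  # "result" can be null on errors
            for item in result.get("items") or []:
                kind = item.get("type")
                if kind == "paid":
                    # Competitor ad copy for the messaging library.
                    out["paid"].append({"title": item.get("title"), "description": item.get("description")})
                elif kind == "organic":
                    out["organic"].append({"domain": item.get("domain"), "rank": item.get("rank_absolute")})
                elif kind == "popular_products":
                    # Shopping block: one entry per product card with its live price.
                    for product in item.get("items") or []:
                        price = product.get("price") or {}
                        out["shopping"].append({
                            "title": product.get("title"),
                            "seller": product.get("seller"),
                            "price": price.get("current"),
                            "currency": price.get("currency"),
                        })
    return out
```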
Step 7 — Simulate the economics, then score
Now you fold everything together. From real CPC, a defensible conversion rate and a per-market AOV you simulate ROAS for every country, then weight demand, prices, logistics, growth and entry friction into a single score. The output is one ranked table — the deliverable you came for — and its punchline is reliable: the countries that come out on top are almost never the ones the team expected.
| Market | Total score | Simulated ROAS | Verdict |
|---|---|---|---|
| Slovakia | 54.8 | 2.25 | #1 — cheap clicks, home-language overlap |
| Poland | 52.7 | 2.22 | Big and viable — needs local payments |
| Serbia | 43.6 | 2.49 | Blue ocean, first-mover, cheapest clicks |
| Germany | 21.6 | 1.01 | Price war — ROAS penalty ×0.4 |
| Austria | 20.4 | 0.83 | Smaller prize, same auction — penalty |
| Croatia | 20.2 | 1.50 | Penalty — payments prerequisite |
| Slovenia | 19.1 | 1.49 | Penalty — thin volume |
| Hungary | 18.8 | 1.67 | Penalty |
| Romania | 18.4 | 1.23 | Penalty |
| Bulgaria | 18.1 | 1.30 | Penalty |
| Albania | 15.4 | 0.78 | Penalty — weakest economics |
Look at the top three: Slovakia, Poland and Serbia. Not Germany, not Austria, the two markets everyone defaults to. Germany has the second-largest keyword universe in the set (348k keyword ideas, behind only Poland’s 410k) and lands fourth, dragged down by a simulated ROAS of barely 1.0. The reason is the hard penalty that keeps the table honest: when simulated ROAS drops below the viable threshold, the score is multiplied by 0.4, because on a fresh market paid is your only channel, you have no SEO yet, and a market where paid doesn’t pay can’t be your entry point.
Calibrate on one account you own first. The CPCs behind these scores are planner estimates, and they run high. On this client’s live home account I had the real numbers to check against: 22.6 million search terms over twelve months, a real paid CPC of 2.54 CZK, AOV 747 CZK and a measured conversion rate near 3 %. The Keyword Planner’s suggested CPC ran roughly 10× higher than that real paid CPC, so every simulated CPC here is divided by ten before it touches the ROAS math. Never trust a simulated ROAS you haven’t calibrated against one account you actually run.
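The calibration is two divisions. A minimal sketch using the formula from the scoring table (AOV × CR ÷ CPC); the function names are hypothetical, and AOV, CR and CPC must be in one currency. With the roughly 10× gap measured on the home account, `discount` comes out near 10.

```python
def cpc_discount(planner_cpc: float, real_paid_cpc: float) -> float:
    """Calibration factor: how many times the planner overstates the CPC you really pay."""
    return planner_cpc / real_paid_cpc


def simulated_roas(aov: float, conversion_rate: float, planner_cpc: float, discount: float) -> float:
    """ROAS = AOV × CR ÷ CPC, with the planner CPC divided by the calibration factor first."""
    calibrated_cpc = planner_cpc / discount
    return aov * conversion_rate / calibrated_cpc
```

Measure `discount` once on an account you run (planner CPC for its own keywords against its real paid CPC), then apply the same factor to every market’s planner CPC.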
The scoring model: nine weighted factors
The score isn’t a vibe. It’s nine components, each weighted by how much it actually moves the decision. ROAS dominates, because a huge market with brutal CPCs loses to a small market with cheap clicks every single time.
| Factor | Weight | What it measures |
|---|---|---|
| Paid viability (ROAS) | 20 % | Simulated AOV × CR ÷ CPC |
| Price competition | 15 % | Your prices vs. local sellers, per unit |
| Logistics | 15 % | Distance and delivery cost from home market |
| Search demand | 13 % | Normalized monthly search volume |
| Entry ease | 12 % | EU, shared currency, related language, COD |
| E-commerce growth | 10 % | Annual growth of the market |
| Market size | 5 % | E-commerce revenue, capped |
| Organic opportunity | 5 % | Low difficulty with enough volume |
| Purchasing power | 5 % | GDP per capita |
Two hard penalties keep the table honest: if simulated ROAS drops below a viable threshold the score is multiplied by 0.4 (paid acquisition doesn’t work and on a fresh market paid is your only channel); and if a country returns no CPC data at all, its score is cut to a third — you don’t bet on a market you can’t price. That second penalty is exactly why the two markets dropped at step 1 (no Google Ads support for their language) never reach the ranking above: no keyword data, no price to model, no defensible bet.
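The weighting and both penalties fit in one function. A sketch, assuming each factor has already been normalized to a 0–100 score (how each factor is normalized is not shown here). `ROAS_VIABLE = 2.0` is a hypothetical threshold: the article doesn’t publish the value, and 2.0 simply falls between the penalized markets (1.67 and below) and the unpenalized top three (2.22 and above) in the ranked table.

```python
from typing import Optional

# Weights from the table above; they sum to 1.0.
WEIGHTS = {
    "roas": 0.20,
    "price_competition": 0.15,
    "logistics": 0.15,
    "demand": 0.13,
    "entry_ease": 0.12,
    "growth": 0.10,
    "market_size": 0.05,
    "organic": 0.05,
    "purchasing_power": 0.05,
}
ROAS_VIABLE = 2.0  # hypothetical cut-off, not the article's published value


def market_score(factors: dict, simulated_roas: Optional[float]) -> float:
    """Weighted sum of 0-100 factor scores, then the two hard penalties."""
    score = sum(weight * factors[name] for name, weight in WEIGHTS.items())
    if simulated_roas is None:
        # No CPC data at all: a market you can't price keeps a third of its score.
        return score / 3
    if simulated_roas < ROAS_VIABLE:
        # Paid doesn't pay, and on a fresh market paid is the only channel.
        score *= 0.4
    return score
```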
The CPC spread, measured
That ranking is built on one factor above all others — the cost of a click. Here is the real thing: the median planner CPC per market from this overnight run, across 745,712 cleaned, priced keyword ideas pulled via the Google Ads API (GenerateKeywordIdeas, run 28–29 April 2026). Every figure is in CZK, the account’s own currency, so the markets compare cleanly.
| Market | Keyword ideas | Median CPC (CZK) | vs. cheapest |
|---|---|---|---|
| Serbia | 4,208 | 2.42 | 1.0× (baseline) |
| Poland | 260,333 | 4.73 | 2.0× |
| Croatia | 27,943 | 5.12 | 2.1× |
| Bulgaria | 44,819 | 5.14 | 2.1× |
| Romania | 65,262 | 5.30 | 2.2× |
| Hungary | 2,118 | 5.49 | 2.3× |
| Slovenia | 797 | 6.13 | 2.5× |
| Slovakia | 1,152 | 7.19 | 3.0× |
| Germany | 232,421 | 10.99 | 4.5× |
| Austria | 106,659 | 12.40 | 5.1× |
The spread is the whole argument against defaulting to the obvious market: an Austrian click costs 5.1× a Serbian one and a German click 4.5×, before a single unit of basket size enters the math. The cheapest clicks sit where the meeting never points — Serbia first, then Poland and the Balkans — while the familiar Western markets carry an auction premium most teams underestimate. (Albania is left out of the table: with only 31 priced ideas its median isn’t worth trusting.)
One honest caveat, the same one as the calibration note above: these are planner estimates, and planner CPC runs high. The real Czech home account behind this run paid 2.54 CZK per click across 22.6 million search terms, roughly level with the cheapest planner median here (Serbia, 2.42 CZK) and well below every other market’s. So read the column as relative market pressure, not the price you’ll actually pay.
The click-cost spread, one bargain-segment run
- Cheapest market click — Serbia, planner median 2.42 CZK
- Most expensive — Austria, planner median 12.40 CZK
- Cheapest-to-priciest spread across the set 5.1×
- Home Czech account, real paid CPC (22.6M search terms) 2.54 CZK
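The spread table above boils down to a median per market and a ratio against the cheapest one. A sketch, assuming you already have a list of priced planner CPCs per market; `MIN_PRICED_IDEAS = 100` is a hypothetical cut-off, since the article only says that 31 priced ideas (Albania) was too few to trust.

```python
from statistics import median

MIN_PRICED_IDEAS = 100  # hypothetical minimum sample size


def cpc_spread(cpcs_by_market: dict, min_ideas: int = MIN_PRICED_IDEAS) -> dict:
    """Median planner CPC per market and its multiple of the cheapest median, cheapest first."""
    medians = {
        market: median(cpcs)
        for market, cpcs in cpcs_by_market.items()
        if len(cpcs) >= min_ideas  # too few priced ideas: the median isn't worth trusting
    }
    cheapest = min(medians.values())
    ranked = sorted(medians.items(), key=lambda item: item[1])
    return {market: (med, round(med / cheapest, 1)) for market, med in ranked}
```

The median rather than the mean keeps a handful of extreme planner bids from dragging a whole market up.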
Bonus output: the seasonality you’d plan budget around
The same keyword pull carries twelve months of search volume per keyword — 16.7 million monthly data points across 1.39 million keywords on this run. Aggregate it and the demand curve is unmistakable: a single peak month carrying 40 % more search demand than the quietest one.
Seasonality across the full keyword set
- Peak month — December 2025 752.5M searches
- Trough month — July 2025 535.7M searches
- Peak-to-trough swing 1.40×
December tops the curve and July bottoms it — exactly what you’d expect for a general-merchandise catalogue. That single swing is your launch-timing and budget signal: you don’t open a new market into its dead month, and you load spend before the December run-up, not during it.
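Aggregating the curve is a group-by over months. A sketch, assuming the per-keyword monthly volumes have already been flattened into `(keyword, "YYYY-MM", searches)` rows; that flattening depends on your client library and isn’t shown.

```python
from collections import defaultdict


def seasonality(monthly_rows):
    """Sum searches per month; return (totals, peak month, trough month, peak/trough swing)."""
    totals = defaultdict(int)
    for _keyword, month, searches in monthly_rows:
        totals[month] += searches
    peak = max(totals, key=totals.get)
    trough = min(totals, key=totals.get)
    return dict(totals), peak, trough, totals[peak] / totals[trough]
```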
Verified, not theorized
Every number above came out of one real run. Here is the headline set, with two more cross-border analyses for scale:
From real CEE expansion analyses
- Keyword ideas, one bargain-segment shop (Google Ads API) 1,402,486
- Countries targeted / completed 13 / 11
- Keyword research wall-clock, overnight & automated ~10 hours
- Niche catalogue: raw keywords → kept after AI cleanup 1.18M → 168k (−86 %)
- Full SERP pass: 12 countries × 3,000 queries $22–126
Two markets had to be dropped mid-run because Google Ads doesn’t support their language for keyword ideas — the kind of constraint you only learn by actually running the thing. The ~10 hours is machine time: the research grinds overnight across all countries in parallel while you sleep. The human work — shaping the feed, calibrating the economics, cleaning the noise, building the deliverable — is a few focused hours on top. What used to be weeks of manual labor is now one overnight run plus an afternoon.
(The −86 % cut is from a different, far more niche catalogue — a B2B construction-profiles shop where most broad-seed keywords were genuinely off-topic. The bargain shop above only shed a single-digit percentage to scope and noise. How aggressive the cleanup is depends entirely on how niche you are.)
Now the fun part: what you do with the ranking
1. You enter the market the spreadsheet picked, not the one you assumed
The whole point is to be surprised. On this run, Slovakia and Serbia, a small market and a non-EU one nobody had pitched, outscored Germany and Austria outright, mainly because clicks cost a fraction of the German and Austrian price and the economics actually closed. That’s the deliverable doing its job: overruling the gut with the math.
2. The competitive scrape becomes your launch playbook
Because step 6 already pulled every competitor’s ad copy and price point per market, you don’t walk in blind. You know who’s there, what they charge, which benefits they push in their headlines, and where there’s a price gap to undercut or a positioning gap to own — before you spend a single euro on traffic.
3. Demand per category tells you what to ship first
You kept the link from each keyword back to its product_type. So you don’t just know Poland is good — you know which categories Poland searches for, in what volume. The catalogue you launch with is the one the data says the market wants, not a copy-paste of your home assortment.
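Because every keyword row still carries its source product_type, the per-category view is a single aggregation. A sketch, assuming rows shaped like the ones the keyword pull produced (`country`, `product_type`, `avg_monthly_searches`); note that a keyword pulled by two seeds counts toward both categories here.

```python
from collections import defaultdict


def demand_by_category(keywords, country):
    """Sum avg monthly searches per source product_type for one market, biggest first."""
    totals = defaultdict(int)
    for kw in keywords:
        if kw["country"] == country:
            totals[kw["product_type"]] += kw["avg_monthly_searches"]
    return sorted(totals.items(), key=lambda kv: kv[1], reverse=True)
```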
The part you can steal
Seed-normalization prompt, which turns raw feed categories into clean, searchable seeds:

```text
You are a keyword-research assistant. Input is one raw product_type path
from a Google Merchant Center feed, e.g. "Home & Garden > Fridges and freezers".
Return atomic, searchable category seeds in {LANGUAGE}, one per line:
- split glued categories ("fridges and freezers" → "fridge", "freezer")
- drop generic top-level nodes ("home", "garden", "electronics")
- expand each to 2–3 synonyms people actually search for
- never translate brand names
```

The competitive call, with a localized query, the market’s location_code and the advanced format:

```shell
curl -s "https://api.dataforseo.com/v3/serp/google/organic/live/advanced" \
  -u "$LOGIN:$PASSWORD" -H "Content-Type: application/json" \
  -d '[{"keyword":"lodówka do zabudowy","location_code":2616,"language_code":"pl"}]'
```

Four traps that will bite you:
- Filter at keyword level, never seed level. Killing a noisy seed also kills its buying-intent children. “Football” is noise; “football boots” is a sale.
- Native language only. No English as a proxy market. Translate the seeds first; the demand lives in the local words.
- Discount the planner’s CPC. Keyword-tool suggested CPC ran ~10× the real paid CPC on a live account I checked — calibrate before you simulate ROAS.
- GTIN is matched as full text in Shopping, not as an exact ID: confirm a match by title, or you’ll pair the wrong products.
FAQ
How many countries should I analyze?
Start from logistics, not ambition. Whatever you can actually ship to and clear customs for — usually a 10–15 country shortlist. This run started from 13 and lost two at the keyword stage because Google Ads couldn’t research their language; the scoring thinned the rest.
A million keywords sounds like overkill. Is it?
It’s the natural scale, not an exaggeration: 400 seeds × a dozen languages × all their variants runs into the millions. This one bargain-segment shop produced 1,402,486 keyword ideas. The volume is the point — you’re mapping a market, not writing a 5,000-word brief.
Google Ads API, DataForSEO or Semrush?
Google Ads API for the keyword ideas if you have a developer token with enough access and quota for a run this size; it’s the source of truth. DataForSEO for the SERP scraping and as a cheap keyword alternative: a full 12-country competitive pass runs $22–126. Semrush is the expensive option; for this job there’s little reason to pay its flat subscription.
How do you handle so many languages?
An LLM translates the seed list into each country’s native language before the research runs — the 400 core seeds expand to 488 entries with variants, × 12 languages on this run, 5,856 translations ready before the first API call. English-as-a-proxy quietly distorts every volume and CPC, because nobody in Warsaw searches in English.
Can I trust the CPC numbers for the ROAS simulation?
Not raw. Keyword-tool CPCs are planning estimates; on this live account the suggested CPC was about 10× the real paid one (2.54 CZK across 22.6M search terms). Calibrate against one account you actually run, apply that discount, and then the simulated ROAS is worth something.
How long does the whole thing take?
The keyword research itself is one overnight run — roughly 10 hours of unattended machine time across all countries in parallel. The human work around it (feed shaping, economics, cleanup, deliverable) is a handful of hours. Call it a day or two end to end, against what used to be weeks.
Wondering which markets actually pay back? Let’s run the blueprint on your feed.