2026 · Novus Stream Solutions · About 11 min read
Keyword research with free data: Search Console, autocomplete, and common sense
You do not need a paid SEO suite to find what your audience searches for. The richest keyword data you can get is free — your own Search Console — and the second richest is sitting in the search box itself.
Overview
Keyword research has been productized so aggressively that most guides now read like ads: pick a tool, pay the subscription, trust the difficulty score. For a small site, that is backwards. The paid suites are estimating things you can observe directly (what people type, what already ranks, what your own site almost ranks for), and their most-cited numbers, search volume and keyword difficulty, are the two least reliable figures in the entire discipline. Volume estimates for the long-tail queries a small site should target are routinely wrong by an order of magnitude, and difficulty scores compress a judgment call into a single number that mostly measures backlinks.
This guide is the full workflow using only free sources, and it is not a budget compromise — for a site under a few hundred pages it is genuinely the better process, because every input is real behavior rather than a model of it. The sources, in priority order: your own Search Console data, the search engine's own suggestion surfaces (autocomplete, People Also Ask, related searches), the visible results page itself as a difficulty meter, and the places your audience asks questions in their own words. The output is not a spreadsheet of ten thousand keywords; it is a short, prioritized queue of topics you have specific evidence you can win.
Start where the data is yours: Search Console
If your site has been live for even a few months, Search Console's performance report is the highest-quality keyword dataset you will ever have access to, because it is not an estimate of anything — it is the actual list of queries where your site appeared in results, with real impression counts, real clicks, and your real average position. Most owners glance at the top-clicked queries and leave. The value is deeper in: filter to queries with meaningful impressions but a position between roughly eight and twenty-five. Each of those is a search where the engine already half-believes you belong but does not yet show you where anyone clicks. That list is your cheapest possible wins, ranked by the engine itself.
Work the list with two moves. Where a page is almost ranking for a query it only partially addresses, expand that page: add the section that actually answers the query and sharpen the heading. An hour of this work can move the page up several positions, sometimes ten or more, because you are improving a page the engine has already vetted. Where a cluster of related almost-queries has no good page at all, and the queries currently land on a tangentially related post, that is a confirmed content gap: demonstrated demand, demonstrated crawlability, missing supply. Write the dedicated page. A monthly half-hour in this report generates more validated article ideas than most small teams can publish, which is a strange kind of problem to have for free.
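The almost-ranking filter described above can be sketched in a few lines of Python, assuming you have exported the query table from the performance report to CSV. The column names (`query`, `impressions`, `position`), the sample rows, and the 100-impression floor are all assumptions for illustration; rename the columns to match your export and pick a floor that counts as "meaningful" for your traffic.

```python
import csv
import io

def almost_ranking(rows, min_impressions=100, lo=8.0, hi=25.0):
    """Return queries with enough impressions and an average position in [lo, hi]."""
    picked = []
    for row in rows:
        impressions = int(row["impressions"])
        position = float(row["position"])
        if impressions >= min_impressions and lo <= position <= hi:
            picked.append(
                {"query": row["query"], "impressions": impressions, "position": position}
            )
    # Most-seen queries first: the largest audience that is not yet clicking.
    picked.sort(key=lambda r: r["impressions"], reverse=True)
    return picked

# Hypothetical sample standing in for a real export file.
SAMPLE = """query,impressions,position
free keyword research,940,11.3
keyword tool,5200,48.0
search console filters,310,9.1
autocomplete seo,40,14.2
"""

rows = list(csv.DictReader(io.StringIO(SAMPLE)))
shortlist = almost_ranking(rows)
for r in shortlist:
    print(r["query"], r["impressions"], r["position"])
```

With a real export, swap `io.StringIO(SAMPLE)` for an opened file. The sort by impressions is a design choice, not a rule: sorting by position instead would surface the queries closest to page one.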
Autocomplete: demand, verbatim, spelled the way people type it
Search autocomplete is the most underrated research instrument available, because suggestions are generated from what people actually type, weighted toward what they type often and recently. Type a seed phrase from your niche and stop: the dropdown is a market survey. Then walk the alphabet: your phrase plus a space plus "a", then "b", then "c", each letter revealing a different shelf of completions. Then prepend question words and modifiers (how, why, can, does, best, versus), because each frames a different intent around the same seed. Fifteen minutes of this produces fifty to a hundred real queries in the audience's own vocabulary, which is usually not your vocabulary; the gap between what you call something and what searchers call it is itself a finding.
Two refinements make this dramatically more useful. First, do it logged out or in a private window, so the suggestions reflect the population rather than your own history. Second, mine the bottom of the results page too — the related-searches block is the engine telling you which queries it considers siblings of the one you ran, which is free clustering information: queries the engine treats as siblings often belong in the same article, while queries it keeps separate usually deserve separate pages. People Also Ask boxes complete the set: expand a few entries and the box grows, feeding you the question-formatted demand around the topic. Every PAA question your article can answer cleanly, in a short direct paragraph under a matching heading, is a small additional surface your page can win.
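If you want a checklist of what to type during the alphabet walk, a small script can generate it. This is only a sketch: it builds the prompts and does not query any search engine, so the completions still come from typing each prompt in a private window and reading the dropdown. The seed phrase is a hypothetical example.

```python
import string

# The framing words listed in the text above, each placed before the seed.
FRAMING_WORDS = ["how", "why", "can", "does", "best", "versus"]

def seed_variants(seed):
    """Build the list of prompts to type into the search box for one seed phrase."""
    seed = seed.strip().lower()
    variants = [seed]
    # Alphabet walk: seed, a space, then each letter in turn.
    variants += [f"{seed} {letter}" for letter in string.ascii_lowercase]
    # Intent framing: each framing word in front of the seed.
    variants += [f"{word} {seed}" for word in FRAMING_WORDS]
    return variants

prompts = seed_variants("sourdough starter")  # hypothetical seed
print(len(prompts))  # 1 seed + 26 letters + 6 framing words = 33
print(prompts[1])    # sourdough starter a
```

Paste the completions you see next to each prompt in a running note; that note is the raw pile the later clustering step works from.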
Judging difficulty by reading the results page
Difficulty scores exist because reading ten search results takes five minutes and a number takes one second — but the five minutes is where all the real information is. Run the query you are considering and audit who currently wins it. The signals that say "winnable by a small site" are concrete: forum threads or Q&A posts ranking in the top five (the engine could not find a proper article, so it settled), thin listicles that clearly never performed the task they describe, pages that answer the query only as a side effect of targeting something broader, and dated content in a topic that has since moved. Any of those in the top five is a gap wearing a ranking.
The opposite signals mean the auction is expensive and your article will arrive underdressed regardless of quality: every top result a dedicated, recent, substantial page from a known brand; result types dominated by giant aggregators; or a page-one layout crowded with shopping modules and very few organic slots. Be honest about one further possibility too: queries where the engine answers directly in the results (conversions, definitions, quick facts) can be "won" and still deliver almost no clicks. The question is never only "can I rank?" but "is there a click left to earn?" Five minutes of reading answers both; no score does.
Intent first, volume never
Every query worth targeting carries an intent — the searcher wants to learn something, find something, compare options, or complete a purchase — and matching that intent matters more than any other property of the page. The intent is readable from two places: the grammar of the query itself (how-to phrasing wants instructions, "best X for Y" wants a comparison with a verdict, a bare product noun is often purchase-adjacent) and, more reliably, from what currently ranks, because the result mix is the engine showing you its conclusion about what searchers accepted. If the top results for your target query are all step-by-step guides, a product page will not break in, and vice versa, no matter how good it is.
As for volume: small sites should mostly ignore it, and not only because free estimates are unreliable. The strategic reason is that volume and winnability are inversely correlated — high-volume heads are exactly where the established players concentrate, while the long tail of specific, low-volume queries is where authority matters least and relevance matters most. Forty articles each earning a trickle from precise queries will outperform one article failing to rank for a big one, and the trickle traffic converts better, because specific queries come from people with specific situations your page precisely addresses. The portfolio logic, not the lottery logic, is the small-site game; volume numbers mostly tempt you back toward the lottery.
Listening posts: where the audience asks in full sentences
Search data shows you demand after it has been compressed into query syntax. To find demand before it reaches the search box — and to find the phrasing that makes content resonate — go where your audience asks questions in full sentences: the forums, subreddits, Discord servers, Facebook groups, and review sections of your niche. Recurring questions in those spaces are keyword research with the intent still attached: the asker explains their situation, what they tried, what confused them, which is precisely the material a great article is built from and precisely what no keyword tool exports. A question asked weekly in a niche community is demand, whether or not any tool reports volume for it.
Reviews — of competing products, of books in the space, of adjacent tools — are a second listening post with a particular gift: complaint vocabulary. The exact phrases people use to describe what frustrates them are the phrases they later type into search engines, and titles built from that language get clicked because they sound like the searcher's own thought. Keep the harvesting lightweight: a running note of repeated questions and repeated phrasings, skimmed weekly. You are not doing ethnography; you are keeping the publishing queue connected to live demand, and noticing new topics months before they show up in anyone's keyword database — which, on a small site that can publish tomorrow, is an actual structural advantage over slower competitors.
From pile to plan: clustering and the publishing queue
The harvesting steps produce a messy pile — Search Console almost-queries, autocomplete strings, PAA questions, community phrasings — and the pile is not a plan. The organizing move is clustering: grouping queries that one page could plausibly satisfy together, so you write one strong page per intent instead of five thin pages cannibalizing each other or one bloated page straddling intents that needed separation. The free clustering heuristic is the one the engine already gave you: queries whose results pages overlap heavily are one cluster; queries whose results barely overlap want separate pages. When in doubt, check what currently ranks for both phrasings — if the same pages win both, the engine has merged the intents for you.
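The overlap heuristic can be made concrete with a little set arithmetic. The sketch below assumes you have noted the top result URLs for each query by hand; the queries, URLs, and the 0.5 threshold are hypothetical, and the greedy grouping is one simple choice among several, not a standard algorithm for this task.

```python
def jaccard(a, b):
    """Share of URLs two result lists have in common, from 0.0 to 1.0."""
    a, b = set(a), set(b)
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)

def cluster_by_overlap(serps, threshold=0.5):
    """Greedy grouping: a query joins the first cluster whose first query overlaps enough."""
    clusters = []  # each cluster is a list of queries; entry 0 is its reference query
    for query, urls in serps.items():
        for cluster in clusters:
            if jaccard(urls, serps[cluster[0]]) >= threshold:
                cluster.append(query)
                break
        else:
            # No existing cluster was close enough, so this query starts its own.
            clusters.append([query])
    return clusters

# Hypothetical hand-collected results (top four URLs per query).
serps = {
    "how to feed sourdough starter": ["a.example/feed", "b.example/starter", "c.example/guide", "d.example/faq"],
    "sourdough starter feeding schedule": ["a.example/feed", "b.example/starter", "c.example/guide", "e.example/chart"],
    "best flour for sourdough": ["f.example/flour", "g.example/compare", "b.example/starter", "h.example/mill"],
}

clusters = cluster_by_overlap(serps)
for group in clusters:
    print(group)
```

Here the first two queries share three of five distinct URLs (0.6) and land in one cluster, while the flour query shares only one URL with them and gets its own page. Treat the threshold as a starting point and adjust it after checking a few clusters by eye.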
Then prioritize with three questions per cluster, no scores required. Is there evidence of demand (it surfaced in multiple sources, or carries real impressions in Search Console)? Is the current page one beatable (you found the weakness signals when you read it)? And does winning it serve the business (the searcher is someone whose visit you can do something with — even if that something is just becoming their bookmarked reference)? Clusters that pass all three go in the queue, quick wins from the Search Console list first because they compound fastest. The queue replaces the spreadsheet: a short ordered list of pages to write or expand, each with its evidence attached, each of which you have personally verified is winnable by reading what it has to beat.
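The three-question gate is simple enough to keep in a note, but it can also be written down as a tiny data structure if you prefer the queue in code. Everything here is an illustrative sketch: the `Cluster` class, its field names, and the sample topics are hypothetical, and the three answers are still judgments you record after reading the results page.

```python
from dataclasses import dataclass

@dataclass
class Cluster:
    topic: str
    has_demand: bool                   # surfaced in several sources, or has real impressions
    page_one_beatable: bool            # weakness signals seen when reading the results
    serves_business: bool              # a visit you can do something with
    from_search_console: bool = False  # an existing almost-ranking page (quick win)

def build_queue(clusters):
    """Keep clusters that pass all three questions, Search Console quick wins first."""
    passing = [
        c for c in clusters
        if c.has_demand and c.page_one_beatable and c.serves_business
    ]
    # sorted() is stable, so clusters keep their original order within each group.
    return sorted(passing, key=lambda c: not c.from_search_console)

candidates = [
    Cluster("flour comparison", True, True, True),
    Cluster("starter feeding schedule", True, True, True, from_search_console=True),
    Cluster("bread machine reviews", True, False, True),
]

queue = build_queue(candidates)
print([c.topic for c in queue])
```

The booleans are deliberate: the gate is pass or fail on each question, so there is no composite score to argue with.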
A sustainable cadence: the monthly hour
Research is not a launch-phase project that ends; demand moves, and the system that keeps you aligned with it is a small recurring habit rather than an annual overhaul. A workable cadence for a small site is a single monthly hour, split three ways. Twenty minutes in Search Console: harvest new almost-ranking queries, note which published pages are climbing or sliding. Twenty minutes of suggestion surfaces: run your core seeds through autocomplete and PAA again, because the suggestions drift as the season and the niche move, and the drift is the news. Twenty minutes of triage: fold the new findings into the queue, demote anything that has stopped looking winnable, and pick what gets written next.
Close the loop by checking results, not vibes: for each article published from the queue, look — after a few weeks — at whether it is getting impressions for the queries it targeted. Impressions arriving means the engine understood the page; impressions without clicks means the title is losing the listing-versus-listing fight (fix the title, not the article); neither means the page missed the intent or the topic was harder than the read suggested — both worth knowing while the lesson is fresh. This feedback trains your judgment in a way no tool can: after a few cycles you will read a results page and feel whether it is winnable, which was the skill the subscription was renting you all along. The data was free; the judgment is the asset; the monthly hour is what builds it.
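The three outcomes in that feedback check map onto a short decision function. This is a sketch under stated assumptions: the 50-impression floor for "impressions are arriving", the page slugs, and the numbers are all hypothetical, and "without clicks" is read literally as zero clicks.

```python
def diagnose(impressions, clicks, min_impressions=50):
    """Map a page's numbers for its target queries to the next action."""
    if impressions < min_impressions:
        # Neither impressions nor clicks: intent missed, or the topic was harder than it read.
        return "revisit_intent"
    if clicks == 0:
        # Shown but not chosen: the listing is losing, so fix the title, not the article.
        return "fix_title"
    return "working"

# Hypothetical (impressions, clicks) for target queries a few weeks after publishing.
pages = {
    "feeding-schedule": (420, 0),
    "flour-comparison": (12, 0),
    "starter-troubleshooting": (380, 31),
}

actions = {slug: diagnose(i, c) for slug, (i, c) in pages.items()}
for slug, action in actions.items():
    print(slug, action)
```

A stricter version could compare click-through rate against a floor instead of testing for zero clicks; that threshold would be another assumption to tune against your own report.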
The honest limits of the free workflow
Fairness requires naming what the paid tools genuinely do better, so you can decide if and when you ever need one. Competitor keyword inventories — the full list of what a rival ranks for — have no free equivalent; you can sample a competitor's sitemap and read their navigation, which reveals their content strategy, but not their query-level wins. Backlink analysis is similarly gated. Historical trend data beyond your own site, rank tracking across hundreds of terms, and research at the scale of thousands of pages are all real conveniences of the suites. If you are running content for a large site or for clients, the subscription pays for itself in time saved. Those are real capabilities, and nothing above replaces them.
But notice what they have in common: they matter at scale and in competitive intelligence, not in the core loop of finding things your audience wants that you can write and win. For a site publishing a handful of articles a month, the free workflow covers that loop end to end with better-grounded inputs than the estimates: your own impressions instead of modeled volume, the actual results page instead of a difficulty score, the audience's actual sentences instead of a tool's suggestions. Start free, build the monthly habit, and let a genuine, recurring wall, not a marketing page, be the thing that tells you when scale finally justifies a subscription. Most small sites never hit that wall.
- Free covers: finding winnable topics, judging difficulty, clustering, and feedback on what you published.
- Paid genuinely adds: competitor query inventories, backlink data, rank tracking at scale.
- Trigger to upgrade: research volume, not research quality — hundreds of pages or client work.
- Until then: Search Console and suggestion surfaces every month, and read every SERP you intend to enter.