Here is the mistake almost everyone makes with public data. They treat finding a good dataset as the hard part — the win. They download a clean spreadsheet of government statistics, feel the thrill of discovery, and assume the links will follow. They will not. The dataset is free, public, and equally available to every competitor you have. A resource that anyone can reach earns nothing by existing, because there is nothing to link to that is yours.
The asset is never the dataset. The asset is the transformation — the cleaning, the angle, the statistic that did not exist until you computed it, the visualisation, the page you build and maintain. Two teams can start from the identical open dataset and end in completely different places: one with a flat backlink graph, the other with coverage in national press. The gap between them is not the data. It is everything they did to the data.
This guide covers the job in three parts, in the order you actually do them. First, how to choose which public dataset is worth your time, using a selection filter that scores the dataset before you invest in it. Second, where to find the datasets worth choosing: a sourced directory weighted toward UK and global official statistics. Third, how to package what you find into something genuinely linkable, including the one move that separates a single-link dataset from a fifty-link one. If you read only the next section, you will have the framework and the metric. Everything after is the directory, the pipeline, the evidence, and the honest list of when to walk away.
The deliverable: the Public-Dataset Linkability Filter
Before you download anything, score the dataset. Most public data is not worth packaging: it is stale, already exhausted, impossible to slice, or legally fenced. The Public-Dataset Linkability Filter rates a candidate dataset across five weighted dimensions. Each is scored 0–10 and converted to points in proportion to its weight, so a 10 on a 30%-weight dimension earns 30 points and a 5 earns 15; the points are summed to a single figure out of 100. The weighting reflects what actually drives links from open data: sliceability and angle exclusivity, not how big or impressive the dataset looks.
| Dimension | Weight | What a 10 looks like |
| --- | --- | --- |
| Sliceability | 30% | The data splits cleanly by geography, segment, or time — so one dataset yields many individually pitchable angles instead of a single story. |
| Angle exclusivity | 25% | There is a strong, under-exploited angle. The obvious story has not already been published a hundred times by bigger sites. |
| Human relevance | 20% | The numbers connect to something readers feel — money, health, place, status, safety — not an abstraction only specialists care about. |
| Freshness & cadence | 15% | The data is recent and updated on a known schedule, so the asset can be refreshed and re-pitched rather than decaying to nothing. |
| Licence & usability | 10% | You can legally republish and visualise it with clear attribution, and you can actually work with it without a dedicated data team. |
Score the dataset, then act on the band it lands in:
| Score | Verdict | What to do |
| --- | --- | --- |
| 70–100 | Build the hero | This dataset can carry a flagship asset with many angles. Invest in cleaning, visualisation, and a full seeding programme. |
| 50–69 | One angle only | Worth a single focused story, not a flagship. Package the one strong angle cheaply and move on. |
| Below 50 | Walk away | The data is stale, exhausted, or unusable. Commissioning your own survey will likely earn more exclusive, defensible links for the same effort. |
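
To make the arithmetic concrete, here is a minimal Python sketch of the scoring. It assumes each weight is read as the maximum points a dimension can contribute, so that five 10s total exactly 100. The function names and the example scores are hypothetical, invented purely for illustration.

```python
# Weights from the Filter table, read as maximum points per dimension.
WEIGHTS = {
    "sliceability": 30,
    "angle_exclusivity": 25,
    "human_relevance": 20,
    "freshness_cadence": 15,
    "licence_usability": 10,
}


def filter_score(scores):
    """Return the Filter total out of 100 from a 0-10 score per dimension."""
    if set(scores) != set(WEIGHTS):
        raise ValueError("score every dimension exactly once")
    total = 0.0
    for name, weight in WEIGHTS.items():
        score = scores[name]
        if not 0 <= score <= 10:
            raise ValueError(f"{name} must be between 0 and 10")
        # A 10 earns the full weight; a 5 earns half of it.
        total += score * weight / 10
    return total


def verdict(total):
    """Map a total to the bands in the verdict table."""
    if total >= 70:
        return "Build the hero"
    if total >= 50:
        return "One angle only"
    return "Walk away"


# Invented scores for a hypothetical regional dataset.
example = {
    "sliceability": 9,
    "angle_exclusivity": 6,
    "human_relevance": 8,
    "freshness_cadence": 7,
    "licence_usability": 10,
}
total = filter_score(example)
print(round(total, 1), verdict(total))  # 78.5 Build the hero
```

The code matters less than the habit it enforces: all five scores get written down, and the band is read off, before any cleaning time is spent.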
The metric that decides the upside: Angle Yield
The entire economic case for public data over a one-off survey is the multiplier. A survey gives you one story. A sliceable public dataset gives you many — one per region, one per segment, one per time comparison — and each angle is a separate pitch to a separate set of journalists and a separate linkable page. The metric that captures this is Angle Yield (AY): the number of distinct, individually pitchable stories you can extract from a single dataset.
Angle Yield ≈ Geographic splits × Meaningful segments × Time comparisons
A national dataset that breaks down by 12 regions, three age segments, and a year-on-year change does not give you one story. It gives you a national headline plus a dozen regional angles, each of which a local outlet treats as its own news, and on the formula’s arithmetic (12 × 3 × 1) as many as 36 region-by-segment cuts to test for newsworthiness. That is why a data study built on a sliceable public dataset can out-earn a far more expensive proprietary survey: the survey is one shot, the dataset is a magazine of shots fired one at a time. The practical move is to compute AY before you build, because it tells you whether you are looking at a single-link project or a campaign. A dataset with an AY of 1 is a blog post; a dataset with an AY of 40 is a quarter of digital PR.
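As a quick sketch of the formula, the Python below multiplies the three counts. It treats the product as an approximate ceiling on the number of cuts worth testing, which is an assumption about how to read the "≈", and the function name is hypothetical.

```python
def angle_yield(geographic_splits, segments, time_comparisons):
    """Approximate ceiling on pitchable angles, per the Angle Yield formula."""
    # A dimension the data does not offer counts as 1, so it cannot zero the product.
    geo = max(1, geographic_splits)
    seg = max(1, segments)
    time = max(1, time_comparisons)
    return geo * seg * time


# 12 regions, three age segments, one year-on-year comparison.
print(angle_yield(12, 3, 1))  # 36

# A dataset with no splits at all is a single story.
print(angle_yield(0, 0, 0))  # 1
```

Not every cell in that product will pass an editor’s news test, so treat the figure as the size of the search space rather than a promise of coverage.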
Why raw public data earns nothing (the data vs the belief)
The belief: useful data attracts links. The reality: the web is indifferent to data that anyone can already reach. Across the largest public studies of backlinks, roughly 94% of content earns no links at all, and a raw dataset sitting on a government portal is the purest example of why — there is no reason to link to your copy of public information rather than the source, and usually no reason to link to the source either, because raw tables are not what writers cite. They cite the story someone built from the tables.
This is the crucial mental shift. Open data is a commodity input, not a finished product. The value you add — and therefore the thing people link to — lives entirely in the transformation: the dataset you cleaned so others did not have to, the angle nobody else spotted, the statistic you computed that becomes the quotable line, the chart a journalist can embed. When a brand publishes the only usable analysis of a public dataset, it becomes the de facto source on that topic, and sources get cited with links. The data was free; the citability was manufactured.
This is also why data-led work dominates earned-link strategy in 2026. Journalists need numbers to anchor stories, and a clean public dataset turned into a clear finding hands them exactly that. Industry surveys consistently rank data-led campaigns as the primary digital PR tactic, and the average data-led campaign earns links from roughly 42 unique referring domains — a yield no amount of cold outreach for guest posts matches. There is a second dividend, too: the same analyses that earn editorial links increasingly earn AI citations, and brand mentions now correlate far more strongly with AI search visibility than raw backlinks do. A well-packaged dataset is one of the few assets that earns on both fronts at once. For the wider numbers behind how links and mentions drive rankings and AI visibility, our link building statistics for 2026 pulls the current figures together; for where data assets sit among the other tactics, the fifteen core link building strategies guide places them on the map.
There is a competitive dynamic worth naming, because it changes how you choose datasets. With public data, you are never the only one who can reach the source — so the link goes to whoever publishes the definitive treatment first and maintains it. This creates a winner-takes-most pattern: the first brand to clean a dataset properly, find the sharp angle, and build the canonical page becomes the reference everyone else cites, and late entrants analysing the same data find there is nothing left to win. The strategic implication is to compete on packaging speed and quality where a dataset is fresh, and to avoid datasets where someone has already planted that flag. The Filter’s angle-exclusivity dimension exists precisely to keep you out of fights you have already lost.
Where to find public datasets worth packaging
The sources below are organised by tier. Start with official statistics — they carry the credibility journalists trust and the methodology that survives scrutiny. The discovery layers help when you have a topic but not yet a source. Throughout, the question is not “what data exists?” but “which dataset scores highest on the Filter?”
UK and global official statistics
| Source | What it covers and where it shines |
| --- | --- |
| data.gov.uk | The UK government’s open data portal — tens of thousands of datasets from central government, local authorities, and public bodies. The first stop for any UK angle. |
| Office for National Statistics (ONS) | UK population, economy, labour market, wellbeing, and more — highly sliceable by region and over time. The single richest seam for UK regional and demographic angles. |
| NHS England (formerly NHS Digital) / UK health data | Health and social care statistics with strong human-relevance scores. Sensitive, so handle framing carefully, but high-impact when done responsibly. |
| Land Registry, Companies House, police.uk | Property prices, company formations, and crime by area — three of the most reliably linkable UK open datasets because they slice to postcode-level local angles. |
| World Bank Open Data | Internationally comparable development indicators across countries and decades. Built for cross-country comparison angles and long-run trend stories. |
| OECD, Eurostat, UN, WHO, IMF | The intergovernmental tier — economics, employment, health, environment — all comparable across nations and regions. Eurostat is strongest for sub-state European comparisons. |
US sources worth knowing (useful even for UK sites targeting US coverage)
- Data.gov — the US federal portal, the analogue of data.gov.uk, with hundreds of thousands of datasets across every topic.
- US Census Bureau and the American Community Survey — the workhorse behind a huge share of US data-PR campaigns, because it slices to state, county, and city level and connects directly to money, housing, and demographics.
- Bureau of Labor Statistics (BLS) and Bureau of Economic Analysis (BEA) — wages, employment, prices, and GDP. Reliable, current, and endlessly sliceable by metro area and industry.
Discovery layers and aggregators
| Source | Best for |
| --- | --- |
| Google Dataset Search | Finding open datasets across the web when you have a topic but not a source. Filter by last-updated and usage rights to pre-screen for freshness and licence. |
| Google Public Data Explorer | Quickly visualising and sanity-checking data from the World Bank, OECD, Eurostat and others before committing to a full build. |
| Our World in Data | Pre-cleaned, well-sourced global datasets on big topics — excellent for trend and comparison angles, with transparent methodology you can credit. |
| Kaggle, AWS Open Data, dataportals.org | Large repositories and a directory of open-data portals worldwide — useful for niche, regional, or unusual datasets the headline portals do not surface. |
| Sector sources (360Giving, CharityBase, ADR UK, local portals) | Under-mined niche data — grants, charities, linked administrative datasets, and city-level portals. Lower competition means higher angle-exclusivity scores. |
A note on the sector and local tier: it is consistently underrated. The headline national datasets have been mined by every large agency, which drags their angle-exclusivity score down. A council’s open data portal or a niche grants dataset has almost no competition, which means a modest, well-packaged finding from it can out-earn a tired analysis of national figures. When the Filter penalises a famous dataset for being exhausted, the local and sector tier is where you go looking instead.
Use the discovery layers to pre-screen, not just to find. When a candidate dataset surfaces in Google Dataset Search, the last-updated and usage-rights filters let you check two of the five Filter dimensions — freshness and licence — before you have downloaded anything. A two-minute pre-screen at this stage saves the far larger cost of cleaning a dataset only to discover it is three years stale or fenced behind a non-commercial licence. Treat finding as the cheap part and qualifying as the part that protects your time.
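The two-minute pre-screen can be sketched in a few lines of Python. Everything specific here is an assumption for illustration: the function name, the one-year staleness threshold, and the idea that the licence arrives as an SPDX-style identifier in which "-NC" marks a non-commercial clause (as in CC-BY-NC-4.0). You would still read the licence itself before building anything.

```python
from datetime import date


def pre_screen(last_updated, licence_id, today, max_age_days=365):
    """Return reasons to reject a candidate; an empty list means keep going."""
    problems = []
    age_days = (today - last_updated).days
    if age_days > max_age_days:
        problems.append(f"stale: last updated {age_days} days ago")
    # Assumption: an SPDX-style identifier, where "-NC" means non-commercial.
    if "-NC" in licence_id.upper():
        problems.append("licence bars commercial use")
    return problems


# Hypothetical candidates, checked against a fixed "today" for repeatability.
today = date(2025, 6, 1)
print(pre_screen(date(2025, 1, 15), "OGL-UK-3.0", today))    # []
print(len(pre_screen(date(2022, 3, 1), "CC-BY-NC-4.0", today)))  # 2
```

Set the age threshold to suit the dataset’s publication cadence; an annual release needs more slack than a monthly one.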
The seven angles hidden in every good dataset
Angle Yield is only as high as your ability to see the angles. Most people look at a dataset and find one — the headline number — and stop. Trained eyes find seven kinds of story in the same table. Run a candidate dataset through this list and the AY figure stops being a guess and becomes a count.
| Angle type | The story it tells | Who links to it |
| --- | --- | --- |
| Geographic | The same metric ranked or mapped by region, city, or postcode. The highest-yield angle of all. | Regional and local press — dozens of separate outlets, each linking to its own area. |
| Segment comparison | How the metric differs by age, income, sector, gender, or company size. | Trade and special-interest publications covering each segment. |
| Trend over time | How the number has changed — year on year, over a decade, since a known event. | National news framing “rising/falling” stories and explainer pieces. |
| Ranking / league table | A best-to-worst ordering that becomes the reference list others cite. | Everyone covering the topic — rankings are inherently quotable. |
| Outlier / surprise | The place or group that bucks the trend — the counterintuitive finding. | Journalists hunting a fresh, shareable hook. |
| Correlation | Two public datasets joined to reveal a relationship neither shows alone. | Analytical and policy outlets; high defensibility if done carefully. |
| Projection | A defensible forward estimate — “on current trends, by 2030…” | Forward-looking and trend-watch coverage. |
Two cautions keep this list honest. Correlation angles are powerful but dangerous: joining two datasets to imply a relationship is exactly where data stories get torn apart by careful readers, so reserve correlation for cases where the link is defensible and frame it as association, never cause. And projections must be modest and transparent — a wild extrapolation earns a quick link and a lasting credibility hit. Used carefully, though, these seven angle types are why one strong public dataset routinely supports an entire quarter of coverage rather than a single post. The geographic angle alone, applied to a dataset that slices to local level, can carry a campaign on its own.
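One of the seven, the outlier angle, is easy to surface mechanically. The sketch below is a minimal Python illustration under stated assumptions: the function name is hypothetical, the figures are invented, and "bucks the trend" is defined narrowly as a regional change whose sign is opposite to the national one.

```python
def bucks_the_trend(national_change, regional_changes):
    """Return regions whose change runs opposite to the national change."""
    return sorted(
        region
        for region, change in regional_changes.items()
        if change * national_change < 0  # opposite signs multiply to a negative
    )


# Invented year-on-year changes, in percentage points.
regional = {"North": -1.8, "Midlands": -0.9, "Coast": 2.4, "Capital": -3.1}
print(bucks_the_trend(-1.5, regional))  # ['Coast']
```

A flagged region is a lead to investigate, not a finding: check the sample size and the category definitions behind it before it goes anywhere near a pitch.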
How to package a dataset into a linkable asset
This is the eight-step pipeline that turns a chosen dataset into something people link to. Each step has a gate; skipping the unglamorous ones — licence and cleaning — is how teams end up with a beautiful chart built on a number they have to retract later.
Step 1 — Verify the licence before anything else
Confirm you can legally republish, visualise, and build commercial content on the dataset, and note the exact attribution the licence requires. Most government open data permits this under an open licence, but not all data does, and “publicly visible” is not the same as “licensed for reuse.” The gate: you can state in one sentence what the licence allows and how you must credit the source. If you cannot, stop here.
Two licence traps catch teams who skip this step. The first is the non-commercial clause: plenty of academic and some public datasets are free to view but explicitly barred from commercial use, which a link-earning campaign for a business plainly is. The second is the share-alike or attribution-specific requirement. Share-alike obliges you to release anything you derive from the data under the same licence, and attribution clauses dictate exactly how the source must be credited; get either wrong and you are technically in breach even when acting in good faith. Read the actual licence rather than assuming, screenshot it for your records, and build the required credit into the asset from the start. It is a five-minute check that prevents a takedown notice landing the week your campaign is getting its best coverage.
Step 2 — Clean and validate (the step that creates the moat)
Cleaning is the least glamorous and most valuable step. Public datasets arrive messy: inconsistent formats, missing values, mismatched categories, footnotes that change what a column means. The team that does this work properly creates the defensible asset, because the cleaned, structured version is the thing others would rather link to than recreate. Validate every figure you intend to publish against the source, and document any decision you made — how you handled gaps, which year you used, what you excluded. The gate: every number you will publish is traceable and reproducible. A statistic you cannot defend is a liability, because the first correction request unravels the citation.
Three pitfalls cause most retractions, and all three are avoidable here. Mismatched categories — where two regions or two years define a metric differently — produce comparisons that look damning and are simply wrong; check that like is being compared with like before you compute anything. Silent rounding and unit changes inside a dataset turn a real trend into an artefact, so confirm the units are consistent across the whole series. And small denominators make ratios swing wildly: a “300% increase” built on a base of two cases is not news, it is noise. Catching these is unglamorous, but it is exactly the work that makes your version the one journalists trust — and trust is what converts coverage into an actual link rather than a passing mention.
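The small-denominator pitfall is the easiest of the three to guard against in code. Here is a minimal Python sketch; the function name is hypothetical and the minimum base of 30 is an arbitrary illustrative threshold, not a statistical standard, so pick a cut-off you can defend for your own data.

```python
def percent_change(old, new, min_base=30):
    """Return the percentage change, or None when the base is too small to trust."""
    if old < min_base:
        # A ratio on a tiny base swings wildly, so refuse to report it.
        return None
    return (new - old) * 100 / old


print(percent_change(2, 8))      # None: the "300% increase" on a base of two
print(percent_change(200, 230))  # 15.0
```

Returning None rather than a number forces whoever writes the copy to notice the gap instead of quoting noise.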
Step 3 — Find the angle, not the number
Raw data does not have a story; you find one. The angle is the surprising, human, or locally relevant finding hidden in the table — the place that bucks the national trend, the gap that widened, the comparison nobody had drawn. Ask of every cut: would a journalist’s editor approve this as news? If the finding is “things are roughly as expected,” keep digging or change datasets. The angle is what separates a data dump from a data story.
Step 4 — Compute the citable claim
Write the one sentence a writer will quote: “According to [your brand]’s analysis of [source], [specific number].” That sentence is the product. It must be specific, attributable to you, and grounded in the data you cleaned in Step 2. If you cannot write it, you have a chart, not a claim — and charts without claims rarely earn the link.
Make the attribution effortless on the page itself. Include a short, explicit “cite this analysis as…” line with the exact wording, the anchor text you want, and the canonical URL. You will not always get the clean linked credit you ask for, but spelling it out raises the rate at which coverage turns into a real backlink rather than an unlinked mention — and the difference between those two outcomes is the difference between an asset that moves rankings and one that only flatters the brand. Phrase the credit so it names your brand as the analyst and the public source as the data, which is both honest and the framing journalists are most willing to reproduce.
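If you publish many findings, templating the claim and the credit line keeps the wording identical everywhere it appears. The Python below is a small sketch: the function names, the brand, the URL, and the finding are all placeholders, not real results, and the exact credit wording is an assumption you should adapt to what the source licence requires.

```python
def citable_claim(brand, source, finding):
    """Build the one sentence a writer can quote verbatim."""
    return f"According to {brand}'s analysis of {source}, {finding}."


def cite_line(brand, source, anchor_text, url):
    """Build the on-page 'cite this analysis as' line."""
    return (
        f'Cite this analysis as: "{anchor_text}" ({url}), '
        f"analysis by {brand} of data from {source}."
    )


# Placeholder values, not real findings.
print(citable_claim("Example Co", "official rental statistics", "median rents rose 12% in a year"))
print(cite_line(
    "Example Co",
    "official rental statistics",
    "Example Co Rent Index",
    "https://example.com/rent-index",
))
```

Both strings name the brand as the analyst and the public body as the data source, which is the framing the paragraph above recommends.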
Step 5 — Maximise Angle Yield by slicing
This is the step that turns one link into many. Take the national finding and slice it — by region, by city, by segment, by year-on-year change. Each slice becomes a localised or specialised version of the story that a regional outlet or trade publication treats as its own. The mechanic is well documented in digital PR: when a study breaks national data down by area, each area’s figure becomes a separate pitch, and the campaign multiplies its coverage accordingly. This is also where public data beats programmatic guesswork — the slices are real, sourced, and defensible. Build the slicing into the asset so each angle has its own anchor and, where it makes sense, its own page.
Two practical disciplines make slicing pay. First, pre-write the per-slice line. For a geographic study, that means a one-line finding for every region — “in [region], the figure is [number], the [highest/lowest] in the country” — ready to drop into a tailored pitch. The slice is worthless to a local editor if you make them dig it out of a national table themselves. Second, respect statistical honesty as you cut: the more finely you slice, the smaller each sample gets, and a regional figure built on too few data points is a correction waiting to happen. Note the cut-off below which you will not publish a slice, and hold to it even when a tiny sample produces a tempting headline. A defensible dozen angles beat a reckless fifty.
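Both disciplines can be combined in one pass: pre-write the line for every slice, and drop any slice below the cut-off. The Python sketch below assumes a hypothetical input of (region, value, sample size) tuples with invented figures, and a minimum sample of 50 chosen only for illustration. Superlatives are worded as "of the regions we report" because they are computed only across slices that survive the cut-off.

```python
def slice_lines(rows, min_sample=50):
    """Return one pre-written pitch line per region that clears the cut-off.

    rows is a list of (region, value, sample_size) tuples.
    """
    publishable = [row for row in rows if row[2] >= min_sample]
    lines = {}
    if not publishable:
        return lines
    highest = max(publishable, key=lambda row: row[1])[0]
    lowest = min(publishable, key=lambda row: row[1])[0]
    for region, value, _sample in publishable:
        line = f"In {region}, the figure is {value}"
        # Superlatives are computed only across the slices being published.
        if len(publishable) > 1 and region == highest:
            line += ", the highest of the regions we report"
        elif len(publishable) > 1 and region == lowest:
            line += ", the lowest of the regions we report"
        lines[region] = line + "."
    return lines


# Invented figures; "Smallham" has a tempting value but too small a sample.
rows = [("Northshire", 41.2, 800), ("Southport", 58.9, 1200), ("Smallham", 95.0, 12)]
for line in slice_lines(rows).values():
    print(line)
```

The tempting 95.0 never reaches a pitch, which is the point: the cut-off is applied before anyone sees the headline it would have produced.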
Step 6 — Visualise and build the canonical asset page
Numbers need a home and a face. Build the dataset into a dedicated, indexable page — not a buried section of a blog post — with clear visualisations and an embeddable chart or table. The embed matters: when a writer embeds your visualisation, they do the linking for you, and the link travels with the chart wherever it is re-used. Publish the full methodology alongside the asset; it is what converts a sceptical editor into a citing one. For the visualisation, prospecting, and outreach platforms that support this stage, our guide to the best link building tools covers the stack you will lean on next.
Step 7 — Seed to aligned linkers, lead with the angle
Promotion is targeted, not broadcast. Map the specific journalists, regional outlets, trade publications, and resource pages that cover each angle from Step 5, and pitch the finding rather than the dataset. Editors do not want “we analysed some open data.” They want “here is the number for your region, and here is the methodology.” Lead with the localised statistic, attach the chart, link the methodology. This is the easiest outreach in link building because you are handing over news, not asking for a favour. The mechanics of finding contacts and following up are the same ones that power every earned-link tactic; what changes is the strength of what you are offering.
Step 8 — Refresh on the data’s cadence and let it compound
Most public datasets update on a schedule — monthly, quarterly, annually. Each update is a fresh reason to recompute your finding, re-pitch everyone who cited the last version, and reinforce your position as the canonical source. This is how a single dataset becomes an annual fixture that earns links every cycle. Set the refresh in the calendar at launch, track referring domains per angle, and watch the compounding: the recurring report that journalists bookmark is the highest-return form of this entire discipline.
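Setting the refresh in the calendar is simple date arithmetic. The Python sketch below is a planning aid under stated assumptions: the function name and cadence labels are hypothetical, and the reminder is pinned to the first of the target month rather than the publisher’s actual release day, which you should confirm from the source’s own release calendar.

```python
from datetime import date

# Months between releases for each cadence named in the text.
CADENCE_MONTHS = {"monthly": 1, "quarterly": 3, "annually": 12}


def next_refresh(last_release, cadence):
    """Return a reminder date in the month the next release is due."""
    months = CADENCE_MONTHS[cadence]
    # Count months from year zero so that December rolls over cleanly.
    index = last_release.year * 12 + (last_release.month - 1) + months
    year, month_index = divmod(index, 12)
    # Pin to the 1st so differing month lengths never produce an invalid date.
    return date(year, month_index + 1, 1)


print(next_refresh(date(2025, 11, 20), "quarterly"))  # 2026-02-01
print(next_refresh(date(2025, 12, 5), "monthly"))     # 2026-01-01
```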
The pre-publish audit
Before you pitch a single journalist, run this pass. A data asset earns nothing — or worse, earns a retraction — when any one of these fails, so treat each as a gate.
| Check | Pass condition |
| --- | --- |
| Licence stated | The page names the source and credits it exactly as the open licence requires. |
| Every figure reproducible | You can re-derive each published number from the source data and your documented method. |
| Citable claim is on the page | The headline statistic appears in text, attributed to your analysis, exactly as a writer would quote it. |
| Methodology is published | A linked, plain-English note explains the data, the date range, and every judgement call you made. |
| Embed exists | There is a copyable chart or table embed so writers link by re-using the visual. |
| URL is canonical and indexable | The asset lives on its own crawlable page, not buried in a post or behind a form. |
| Angle list and linker map ready | Each sliced angle has a named set of target outlets before launch, not after. |
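
Because every row is a gate rather than a score, the audit reduces to "list what failed, and launch only if the list is empty". A minimal Python sketch follows; the function name and the shortened check labels are hypothetical stand-ins for the seven rows above.

```python
# Shortened labels for the seven audit rows.
AUDIT_CHECKS = [
    "licence_stated",
    "figures_reproducible",
    "citable_claim_on_page",
    "methodology_published",
    "embed_exists",
    "url_canonical_indexable",
    "angle_list_and_linker_map",
]


def failed_checks(results):
    """Return every audit check that is missing or False, in table order."""
    return [check for check in AUDIT_CHECKS if not results.get(check, False)]


# Hypothetical pre-launch state: everything done except the embed.
results = {check: True for check in AUDIT_CHECKS}
results["embed_exists"] = False

failures = failed_checks(results)
print("ready to pitch" if not failures else f"blocked by: {failures}")
# blocked by: ['embed_exists']
```

An unanswered check counts as a failure by design, so nothing passes simply because nobody looked.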
What it looks like when it works
The clearest evidence comes from campaigns that publicly documented their results, and they map almost exactly onto the pipeline above — especially the Angle Yield step.
A US personal-finance brand built a study estimating the tax an average person pays over a lifetime in each state, using two public datasets — the American Community Survey and the Consumer Expenditure Survey. Because the analysis broke down by state, every state became its own pitch. The campaign earned 727 backlinks from outlets including The Washington Post, Yahoo, USA Today, and CNBC, drew tens of thousands of visits, and won an industry award. That is Angle Yield in action: one cleaned pair of public datasets, fifty state angles, hundreds of links.
The same source documents an Australian campaign that combined government and census data with a survey to produce “best and worst” regional comparisons, earning dozens of backlinks and national broadcast coverage by targeting regional journalists with local angles — again, the slicing step doing the heavy lifting.
In the UK, an agency has described using public-domain data to build a campaign that landed 34 pieces of coverage at an average Domain Rating of 55, with a 69% link-to-coverage rate and placements in Sky News, Huffington Post, and the Daily Mirror. The figure worth noting is the link-to-coverage rate: a well-built, credible data asset converts coverage into actual links at a far higher rate than thin, link-less brand mentions, because editors are willing to credit a defensible source.
The through-line across all three: none of them won because they found secret data. They won because they cleaned ordinary public data, found a human angle, sliced it for yield, and packaged it so journalists could cite it. For the grounding on why these earned, editorially given links outperform anything manufactured, our explainer on what link building is and why it still matters lays out the case.
The failure case is just as instructive, and far more common. A composite drawn from patterns we see repeatedly, rather than any single named campaign, runs like this: a team finds an interesting national dataset, publishes a tidy summary chart with the headline figure, posts it on the blog, and shares it with their own audience. A few weeks later it has earned almost nothing. Diagnosed against the Filter and the pipeline, the failures stack up — the dataset was famous and its obvious angle already exhausted (low angle-exclusivity), the team never sliced it so Angle Yield stayed at one, and the finding lived inside a blog post rather than on a canonical, embeddable page. The data was fine. Every decision after the download was the problem. This is the pattern the rest of this article exists to prevent: the dataset is rarely what fails, and the dataset is rarely what wins.
Public data or a survey? Choosing where to spend
Because public data and commissioned surveys compete for the same data-PR budget, it helps to know when each wins. The trade is exclusivity against cost and yield, and the Filter score is the tie-breaker.
| Factor | Public dataset | Commissioned survey |
| --- | --- | --- |
| Cost to acquire | Free — cost is cleaning and analysis time only | Paid — fieldwork and sample cost up front |
| Exclusivity | Low — competitors can reach the same source | High — the data is uniquely yours |
| Angle Yield | Often high — sliceable by region and segment | Designed in — you can build slicing into the questions |
| Speed to launch | Fast — the data already exists | Slower — fieldwork takes weeks |
| Best when | The dataset is fresh, sliceable, and not yet mined | The public angle is exhausted or your question is novel |
The decision rule: if your chosen dataset clears 70 on the Filter and the sharp angle is still open, use the public data. It is faster, free, and high-yield, and exclusivity matters less when you win on packaging. If it lands in the 50–69 band, package the one strong angle cheaply, as the verdict table advises, rather than building a flagship. If the dataset scores below 50 because every angle is taken, that low score is the signal to spend on a survey instead, buying the exclusivity the public source can no longer give you. The two are not rivals so much as tools for different conditions, and the Filter tells you which condition you are in.
When not to use public datasets
Honest advice includes the cases where open data is the wrong choice. Reach for a survey, proprietary data, or a different tactic entirely when:
- The dataset is already exhausted. If the obvious angle from a famous national dataset has been published a hundred times, your version adds nothing and earns nothing. Low angle-exclusivity is a hard stop, not a hurdle.
- You cannot verify the licence. Building commercial content on data you are not permitted to reuse is a legal and reputational risk that no amount of coverage justifies.
- The data is stale and un-updatable. A one-off figure from a discontinued dataset earns a brief flurry and then decays, with no refresh to compound it — the weakest possible return profile.
- There is no human angle. Data that only specialists care about will only be cited by specialists, if at all. If you cannot connect the numbers to money, place, health, or status, the human-relevance score is fatal.
- You have no capacity to clean and validate. Publishing a wrong statistic from messy data is worse than publishing nothing — corrections cost citations and trust. If you cannot do Step 2 properly, do not start.
- A survey would be more exclusive for the same cost. When the dataset scores below 50 on the Filter, commissioning original survey data often buys a more defensible, more ownable angle that competitors cannot replicate from the same public source.
Your Monday-morning deliverable
Everything above collapses into a repeatable process you can run this week:
- Pick three candidate datasets from the directory and score each against the Public-Dataset Linkability Filter (Sliceability 30, Angle exclusivity 25, Human relevance 20, Freshness 15, Licence 10). Keep only those scoring 50 or higher.
- For your top dataset, compute Angle Yield: geographic splits × segments × time comparisons. If AY is 1, treat it as a single post; if AY is high, plan a campaign.
- Verify the licence and write the one-sentence statement of what you may publish and how you must attribute it.
- Clean and validate, documenting every decision so each published figure is reproducible.
- Write the citable claim — “according to our analysis of [source], [number]” — then slice it into per-region and per-segment versions.
- Build the canonical, indexable asset page with visualisation, embed code, and full methodology.
- Map aligned linkers per angle, pitch the finding (not the dataset), and set the data’s next refresh date in the calendar before you launch.
Run that sequence and you stop treating public data as the finish line and start treating it as the raw material it actually is. The dataset is free and so is your competitor’s access to it. The cleaning, the angle, the slicing, and the packaging are not free — which is exactly why they are what earns the links. Find the data that scores, transform it into something only you have built, and let the refresh cycle compound it for years.
