Entity Salience and On-Page Co-Occurrence: Engineering Topic Authority

TL;DR

Entity salience is how central an entity is to a page — and Google literally scores it. Run any page through Google’s Natural Language API and every entity comes back with a salience value between 0 and 1. That number is the closest thing you have to Google’s own answer to “what is this page about, and who owns it?”

Co-occurrence is the companion signal: which entities you appear alongside, on the page and across the web. Salience makes you central to a topic; co-occurrence ties you to the other entities that define it. Together they are how topic authority is actually engineered in 2026.

The deliverable below — the Salience Gap Audit — extracts the entities Google attributes to your page, compares them to the canonical entity set for the topic, and returns a Salience Coverage score plus the exact entities you are missing or under-weighting.

Five on-page levers move salience, in priority order: the title/H1, the first 100 words, subject position, frequency-with-variation, and the about/mentions schema. Internal links with descriptive anchors are on-site co-occurrence; earned mentions beside topic entities are web-wide co-occurrence.

Run it on a loop — Map, Measure, Close, Corroborate — and re-score at 30/60/90. UK sites win fastest by engineering co-occurrence with the place and category entities Google already trusts for their market.

Most SEO advice about “topical authority” stops at a metaphor: cover a topic deeply and Google will trust you on it. True, but useless as an instruction, because it tells you nothing about the mechanism or how to measure progress. The mechanism has a name and a number. Google’s Natural Language API assigns every entity it finds on a page a salience score between 0 and 1 — a measure of how central that entity is to the page’s meaning. The entity with the highest salience is, in Google’s reading, what the page is about. That single value turns topical authority from a vibe into something you can audit, engineer and track.

This guide treats salience and its companion signal — co-occurrence, which entities appear together — as engineering inputs rather than happy accidents. Salience makes your brand or your page central to a topic. Co-occurrence binds you to the specific entities that define that topic, so that when an answer engine assembles its understanding of the space, you are one of the nodes it reaches for. Get both right and you are not “writing about” a topic; you are becoming part of its structure in the graph the machines actually consult.

1. What salience actually is — and how Google measures it

Salience is not frequency. A page can mention “invoicing” forty times and still have low invoicing salience if the term is buried, syntactically peripheral, and never the grammatical subject. Equally, a page can mention an entity three times and have it score as the dominant entity because of where and how those mentions appear. Google’s Natural Language API documents the salience field explicitly as a 0–1 score per entity, with values nearer 1 marking entities that are more central to the document. In practice the scores across a document behave like shares of a normalised whole, so raising one entity’s salience tends to lower others’. That roughly zero-sum quality is the single most important thing to understand, and the thing most content gets wrong.
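To make the salience field concrete, here is a minimal sketch of reading an entity-analysis response and ranking it. It assumes the JSON shape of the Natural Language API's analyzeEntities REST response (an "entities" list whose items carry "name", "type" and "salience"); the payload below is made up for illustration and no API call is made.

```python
from typing import Any

# Illustrative payload in the shape of an analyzeEntities response.
# The entity names and scores are invented for this sketch.
SAMPLE_RESPONSE: dict[str, Any] = {
    "entities": [
        {"name": "receipt scanning", "type": "OTHER", "salience": 0.14},
        {"name": "expense management software", "type": "CONSUMER_GOOD", "salience": 0.41},
        {"name": "integration", "type": "OTHER", "salience": 0.07},
    ],
    "language": "en",
}


def rank_entities(response: dict[str, Any]) -> list[tuple[str, float]]:
    """Return (name, salience) pairs sorted from most to least salient."""
    pairs = [
        (entity["name"], float(entity.get("salience", 0.0)))
        for entity in response.get("entities", [])
    ]
    return sorted(pairs, key=lambda pair: pair[1], reverse=True)


ranked = rank_entities(SAMPLE_RESPONSE)
# The top-ranked entity is the extractor's reading of what the page is about.
dominant_entity = ranked[0][0] if ranked else None

for name, salience in ranked:
    print(f"{salience:.2f}  {name}")
```

Sorting descending puts the extractor's reading of the page subject first, which is the value every later step in the audit depends on.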

The practical consequence: a page that tries to be salient for everything is salient for nothing. The classic illustration is a profile page that targets a person but never names the works, awards or roles that define them — the subject’s own salience collapses because the supporting entities that would corroborate “aboutness” are absent. The fix is not more mentions of the target; it is the deliberate presence of the right neighbouring entities, which is where co-occurrence enters.

It also helps to separate salience from the two things it is most often confused with. It is not relevance in the old keyword sense — relevance asked whether a query term appeared on the page at all. And it is not TF-IDF, the term-weighting maths that underpins a generation of content tools; TF-IDF measures how unusual a term is relative to a corpus, which is useful for finding gaps but says nothing about whether an entity is the grammatical and structural subject of your page. Salience is a document-internal judgement about centrality. A term can be rare (high TF-IDF) and still peripheral (low salience), and a common word can be the dominant entity. Optimising one is not optimising the other, which is why pages built to hit term-frequency targets so often fail to register as authoritative: they are dense without being central.

The cleanest way to feel the difference is the subject-versus-object test. “Our platform automates expense management” makes the platform the subject and expense management a mere object — so the page reads, to an NLP model, as being about the platform, not the category. Flip it: “Expense management is the process of … and our platform automates it end to end” makes the category the subject of the defining sentence, then ties your brand to it. Same words, roughly the same density, materially different salience distribution. Multiply that choice across a page and you have either engineered or sabotaged your topic authority without changing a single fact.

The signals that move a salience score

Salience is computed from a small set of observable signals. Knowing them is what makes salience engineerable rather than mysterious:

Salience driver | Why it raises salience | Leverage
--- | --- | ---
Position (first ~100 words) | Entities introduced early are read as the document’s subject, not an aside. | High
Title / H1 presence | The most weighted location on the page; an entity here anchors the whole document’s aboutness. | High
Syntactic role (subject) | Entities acting as the grammatical subject of sentences score higher than those buried as objects or modifiers. | High
Frequency with variation | Repeated reference (using natural variants and synonyms, not exact repetition) reinforces centrality without keyword stuffing. | Medium
Co-occurring topic entities | Surrounding the target with the entities that define its topic corroborates what the page is about. | High
Structured data (about / mentions) | Explicitly declares the primary subject vs referenced entities, training both Google and LLMs. | Medium

Notice that three of the four drivers rated High (position, title/H1 presence and syntactic role) are about placement and role, not volume. This is the clean break from the keyword-density era. Density optimisation asked “how many times does the term appear?” Salience optimisation asks “is this entity the structural subject of the page, surrounded by the entities that prove it?” The second question is the one Google’s models actually answer, and it is also the one that governs whether an LLM treats your page as a coherent source on the topic when it decides which pages to pull into an answer.

The UK framing

For UK sites the salience lens reframes a familiar problem: ranking nationally for a category term you are genuinely expert in, yet losing to thinner competitors. Frequently the cause is not authority but salience confusion — the money page tries to be salient for the service, the location, three adjacent services and a blog-style explainer all at once, so Google cannot resolve a single dominant entity. A UK accountancy firm’s “services” page that is equally about self-assessment, payroll, VAT, R&D credits and “London” has, in salience terms, no subject at all. The remedy is structural, not promotional: one dominant entity per page, corroborated by its natural neighbours, with the others demoted to their own salient pages and linked together. That is the topical-cluster architecture, expressed in the units Google measures.

2. The deliverable: the Salience Gap Audit

Engineering salience starts with measuring what Google currently sees — not what you intended to write. The Salience Gap Audit is a repeatable process that compares the entities Google attributes to your page against the entities the topic requires, and returns a coverage score plus a prioritised fix list. It runs in four moves.

  1. Extract what Google sees. Run the page through the Natural Language API (or an equivalent entity extractor). You get back every entity with a salience score. Sort descending: the top entity is what Google thinks the page is about.
  2. Define the canonical entity set. For your target topic, list the entities a genuinely authoritative page would contain — derived from what consistently co-occurs on the pages already winning the SERP and AI answers for the query. This is your target set.
  3. Compute Salience Coverage. Coverage = (canonical entities present on the page ÷ total canonical entities), weighted so that a present-but-low-salience entity counts as a partial hit. A simple, honest version: 1.0 per entity present and salient, 0.5 present but weak, 0 absent.
  4. Rank the gaps. List the absent and under-salient entities by how central they are to the topic. Those are your edits — in priority order, not a wishlist.
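Moves 3 and 4 can be sketched in a few lines. This is one possible reading rather than a fixed formula: the 0.10 cut-off between "salient" and "weak" is an assumption (tune it to your own data), the centrality scores are a hypothetical input you would supply yourself, and the entity names are placeholders.

```python
def salience_coverage(page_salience, canonical_entities, salient_threshold=0.10):
    """Score a page against a canonical entity set.

    Credit per entity: 1.0 if present and salient, 0.5 if present but weak,
    0.0 if absent. The threshold separating salient from weak is an assumption.
    """
    if not canonical_entities:
        raise ValueError("canonical entity set is empty")
    credits = {}
    for entity in canonical_entities:
        salience = page_salience.get(entity, 0.0)
        if salience >= salient_threshold:
            credits[entity] = 1.0
        elif salience > 0.0:
            credits[entity] = 0.5
        else:
            credits[entity] = 0.0
    score = sum(credits.values()) / len(canonical_entities)
    return score, credits


def rank_gaps(credits, centrality):
    """List absent or weak entities, most central to the topic first."""
    gaps = [entity for entity, credit in credits.items() if credit < 1.0]
    return sorted(gaps, key=lambda entity: centrality.get(entity, 0.0), reverse=True)


# Placeholder data for illustration only.
page = {"alpha": 0.40, "beta": 0.05}
canonical = ["alpha", "beta", "gamma", "delta"]
centrality = {"beta": 0.9, "gamma": 0.7, "delta": 0.2}

score, credits = salience_coverage(page, canonical)
gaps = rank_gaps(credits, centrality)
print(round(score, 3), gaps)
```

Here one entity is salient, one is weak and two are absent, so coverage is (1.0 + 0.5) / 4 = 0.375, and the gap list comes back ordered by the supplied centrality.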

A worked example makes the artefact concrete. Suppose a UK SaaS targets the topic “expense management software.” The API returns this (illustrative) salience profile for the current page:

Entity on page | Salience (0–1) | Verdict
--- | --- | ---
expense management software | 0.41 | Dominant: correct subject
receipt scanning | 0.14 | Present, healthy
company (brand name) | 0.11 | Present: wants to be higher
integration | 0.07 | Thin
HMRC / VAT compliance | 0.00 | Absent: critical UK gap
approval workflow | 0.00 | Absent: defining topic entity
corporate card | 0.00 | Absent: defining topic entity

The dominant entity is right, which many pages get wrong. But three defining entities are absent — and for a UK product, “HMRC / VAT compliance” being at zero is a category error, because that is exactly the co-occurrence a British buyer (and a UK-tuned answer engine) expects. Salience Coverage here is low not because the page is short but because it is missing the entities that prove topical depth. The audit converts a vague “add more detail” into a precise instruction: introduce approval workflow, corporate card and HMRC/VAT compliance as salient, subject-position entities, each with enough supporting context to register above zero.

Building the canonical entity set well. The audit is only as good as the target set, so do not invent it from intuition. Run entity extraction against the three to five pages and AI answers already winning your query, and take the entities that appear across most of them — that intersection is the topic’s consensus structure. Add the UK-specific entities a domestic page would be expected to carry (the relevant regulator, the tax or compliance regime, recognised national bodies), because national answer engines weight them. The result is a defensible, evidence-based target rather than a keyword list, and it doubles as the column headers for the Co-Occurrence Matrix in the next section.
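A rough sketch of that intersection step, assuming you already have one set of extracted entity names per winning page. "Most of them" is interpreted here as appearing on more than half of the pages, which is an assumption you can tighten; the sample sets are invented.

```python
from collections import Counter


def consensus_entities(pages, min_share=0.5):
    """Return (entity, share) for entities found on more than min_share of pages.

    pages: list of sets of entity names, one set per winning page or AI answer.
    Names are case-folded so "HMRC" and "hmrc" count as the same entity.
    """
    if not pages:
        return []
    counts = Counter()
    for page in pages:
        counts.update({name.casefold() for name in page})
    total = len(pages)
    kept = [(name, n / total) for name, n in counts.items() if n / total > min_share]
    # Most widely shared first, then alphabetical for a stable order.
    return sorted(kept, key=lambda item: (-item[1], item[0]))


# Invented extraction results for four winning pages.
winning_pages = [
    {"Approval Workflow", "corporate card", "HMRC"},
    {"approval workflow", "HMRC"},
    {"approval workflow", "corporate card", "receipt scanning"},
    {"approval workflow", "hmrc"},
]
target_set = consensus_entities(winning_pages)
print(target_set)
```

The share returned for each entity can double as a crude centrality score when ranking gaps later; UK-specific entities would still be added by hand, as described above.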

Scoring bands. Salience Coverage under 0.4: the page is not topically complete — expect to be out-competed by deeper pages regardless of links. 0.4–0.7: competitive but beatable; close the highest-centrality gaps. Above 0.7 with the correct dominant entity: topically authoritative — shift effort to corroboration and internal linking. Run the audit across a cluster and rank pages ascending by coverage; the lowest-coverage pages on your most valuable topics are where the work pays back first.
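The bands translate directly into a small lookup. One case is not covered by the bands as stated: a score above 0.7 on a page whose dominant entity is wrong. The sketch below gives that case its own label, which is an assumption of this example rather than part of the scoring scheme.

```python
def coverage_band(score, dominant_correct):
    """Map a Salience Coverage score (0 to 1) to the bands described above."""
    if not 0.0 <= score <= 1.0:
        raise ValueError("score must be between 0 and 1")
    if score < 0.4:
        return "not topically complete"
    if score <= 0.7:
        return "competitive but beatable"
    if dominant_correct:
        return "topically authoritative"
    # Assumption: high coverage with the wrong subject gets a separate flag.
    return "high coverage, wrong dominant entity"


for example_score, example_flag in [(0.35, True), (0.55, True), (0.80, True), (0.80, False)]:
    print(example_score, example_flag, coverage_band(example_score, example_flag))
```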

This is deliberately a measurement, not a content brief. It tells you what to add and in what order, and it gives you a number to move. Pair it with a competitive view — the same entity-extraction run against the top three pages currently winning the query exposes the exact entities they co-occur with and you do not, which is the on-page equivalent of the gap analysis in our guide to reading a competitor’s profile to find what you are missing.

At scale, the audit becomes a portfolio instrument. Export your priority pages, run extraction on each, and record three columns: the dominant entity the API returns, whether it matches the intended subject, and the Salience Coverage score. The pages where the dominant entity is wrong — where Google thinks the page is about your brand, a feature, or a location rather than the category you meant to own — are the most urgent, because no amount of corroboration fixes a page that is centrally about the wrong thing. Sort by that flag first, then by coverage ascending, and you have a remediation queue ordered by impact rather than by whim. For a content team, that single spreadsheet replaces months of “this page feels thin” guesswork with a ranked, evidence-based backlog.
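The ordering rule for that spreadsheet (wrong dominant entity first, then coverage ascending) is easy to get backwards, so here is a minimal sketch. The PageAudit record, the paths and the brand name "Acme Spend" are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class PageAudit:
    url: str
    intended_entity: str   # the subject you meant the page to own
    dominant_entity: str   # the top entity the extractor returned
    coverage: float        # Salience Coverage score, 0 to 1

    @property
    def dominant_wrong(self) -> bool:
        return self.dominant_entity.casefold() != self.intended_entity.casefold()


def remediation_queue(audits):
    """Pages with the wrong dominant entity first, then lowest coverage first."""
    return sorted(audits, key=lambda audit: (not audit.dominant_wrong, audit.coverage))


# Hypothetical audit rows.
audits = [
    PageAudit("/a", "expense management software", "expense management software", 0.35),
    PageAudit("/b", "payroll", "Acme Spend", 0.80),
    PageAudit("/c", "vat compliance", "VAT compliance", 0.60),
    PageAudit("/d", "corporate card", "integration", 0.50),
]
queue = remediation_queue(audits)
print([audit.url for audit in queue])
```

Note that /b outranks /a despite its higher coverage, because a page that is centrally about the wrong thing is treated as the more urgent fix.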

3. The Co-Occurrence Matrix: binding your entity to the topic

Salience makes a page central to a topic. Co-occurrence does something subtly different and more durable: it teaches the graph that your brand entity belongs in the company of the entities that define a space. When your brand consistently appears beside the category’s defining concepts — on your own pages and, crucially, across the web — the knowledge graph and the LLMs learn the association. That association is what makes you a candidate when the category’s questions are asked, even on queries where you do not rank in the classical sense.

The Co-Occurrence Matrix is the planning artefact. Down the left, your brand and your priority pages. Across the top, the defining entities of your topic (drawn from the canonical set in Section 2). In each cell, two marks: does the pair co-occur on-page, and does it co-occur across the web (in third-party coverage)? The empty cells are your roadmap — and they split cleanly into on-page work (Section 4) and earned-mention work (Section 5).

Topic entity | On-page co-occurrence | Web co-occurrence | Action
--- | --- | --- | ---
approval workflow | Missing | Missing | Add on-page; pitch into a relevant guide
HMRC / VAT compliance | Missing | Weak | Add on-page (priority); earn UK-press mention
corporate card | Weak | Missing | Strengthen on-page; seek co-mention
spend control | Present | Weak | Maintain; build web co-occurrence
category benchmark data | Present | Present | Defend; this is working
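Kept as data, the matrix can produce its own two work lists. The sketch below encodes the rows of the table above as (on-page status, web status) pairs and splits the non-"Present" cells into on-page work and earned-mention work, with "Missing" ahead of "Weak". That ordering is an assumption about priority.

```python
# Rows from the matrix above: entity -> (on-page status, web status).
MATRIX = {
    "approval workflow": ("Missing", "Missing"),
    "HMRC / VAT compliance": ("Missing", "Weak"),
    "corporate card": ("Weak", "Missing"),
    "spend control": ("Present", "Weak"),
    "category benchmark data": ("Present", "Present"),
}

# Assumed priority: fix what is missing before strengthening what is weak.
STATUS_ORDER = {"Missing": 0, "Weak": 1, "Present": 2}


def split_roadmap(matrix):
    """Return (on_page_work, web_work): entities whose cell is not 'Present'."""
    def todo(column):
        entities = [e for e, statuses in matrix.items() if statuses[column] != "Present"]
        return sorted(entities, key=lambda e: (STATUS_ORDER[matrix[e][column]], e.casefold()))

    return todo(0), todo(1)


on_page_work, web_work = split_roadmap(MATRIX)
print("On-page:", on_page_work)
print("Earned mentions:", web_work)
```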

The matrix also clarifies a decision people get wrong: where to declare it. Schema gives you an explicit channel. The about property names the entities a page is primarily about; the mentions property names entities it references in passing. Used honestly — about for your one or two dominant entities, mentions for the supporting cast, each linked to its Wikipedia or Wikidata URL via sameAs — these properties hand Google and the LLMs a machine-readable co-occurrence declaration that backs up the prose. Schema does not substitute for the on-page substance; it confirms it. A page whose about declaration and whose actual salience profile agree is a page the systems can trust quickly.

Concretely, the expense-management page above would carry an about array naming its one dominant entity — expense management software, linked via sameAs to its Wikipedia/Wikidata node — and a mentions array naming approval workflow, corporate card and HMRC/VAT compliance, each linked to its own canonical node. That is a single, honest declaration that says: this page is principally about X, and it legitimately discusses Y and Z. When the prose and the salience profile already support that claim, the schema is corroboration the systems can act on immediately; when they do not, the schema is an unbacked assertion the systems learn to discount. The order of operations therefore matters — write and measure first, declare second.
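As a sketch of what that declaration might look like, the snippet below builds a schema.org WebPage object with about and mentions arrays and serialises it as JSON-LD. The page URL and every sameAs URL are placeholders on example.com and example.org; in a real page they would be replaced with the entity's actual Wikipedia or Wikidata URL, which this example deliberately does not guess.

```python
import json


def entity_node(name, same_as_url):
    """A schema.org Thing with a name and a sameAs link to its canonical node."""
    return {"@type": "Thing", "name": name, "sameAs": same_as_url}


def build_webpage_jsonld(page_url, about, mentions):
    """about / mentions: lists of (name, sameAs URL) pairs."""
    return {
        "@context": "https://schema.org",
        "@type": "WebPage",
        "url": page_url,
        "about": [entity_node(name, url) for name, url in about],
        "mentions": [entity_node(name, url) for name, url in mentions],
    }


# Placeholder URLs only: swap in the real Wikipedia/Wikidata URLs.
doc = build_webpage_jsonld(
    "https://example.com/expense-management-software",
    about=[("expense management software", "https://example.org/entity/expense-management-software")],
    mentions=[
        ("approval workflow", "https://example.org/entity/approval-workflow"),
        ("corporate card", "https://example.org/entity/corporate-card"),
        ("HMRC / VAT compliance", "https://example.org/entity/vat-compliance"),
    ],
)
print(json.dumps(doc, indent=2))
```

The output would sit inside a script element of type application/ld+json in the page head, once the prose and the measured salience profile already support the claim it makes.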

4. Engineering salience on the page

With the audit and the matrix in hand, the on-page work is mechanical. Apply the levers in this order — they are ranked by effect on the salience score, so a small intervention near the top beats a large one near the bottom.

1. Make the dominant entity unmissable in the title, H1 and first 100 words

The opening of the page is where Google decides the subject. Name your dominant entity in the title and H1, then use it as the grammatical subject of the first two or three sentences — not as a passing object. This one move resolves most “no clear subject” failures. Resist the temptation to front-load three competing entities; pick the one this page exists to own and let the rest be supporting.
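A quick pre-publication check for this lever can be scripted. The sketch below only tests whether the entity string appears in the title, the H1 and the first 100 words; it does not judge grammatical role, which would need a parser, and the sample strings are invented.

```python
import re


def first_n_words(text, n=100):
    """Return the first n whitespace-separated words of the text."""
    return " ".join(text.split()[:n])


def contains_entity(text, entity):
    """Case-insensitive whole-phrase match for the entity name."""
    pattern = r"\b" + re.escape(entity) + r"\b"
    return re.search(pattern, text, flags=re.IGNORECASE) is not None


def placement_report(title, h1, body, entity):
    """Report whether the entity is named in each high-leverage location."""
    return {
        "title": contains_entity(title, entity),
        "h1": contains_entity(h1, entity),
        "first_100_words": contains_entity(first_n_words(body), entity),
    }


# Invented page fragments for illustration.
report = placement_report(
    title="Expense Management Software for UK SMEs",
    h1="Our platform",
    body="Expense management software is the system a finance team uses to capture and approve spend.",
    entity="expense management software",
)
print(report)
```

In this invented example the H1 is the failing location: it names the platform rather than the entity the page is meant to own.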

2. Introduce the defining entities as subjects, with real substance

Each canonical entity from your gap list needs more than a name-drop to clear zero salience. Give it a sentence or two where it is the subject and is explained — “Approval workflows route each claim to the right manager before payment,” not “…including approval workflows.” For a UK page, that is where HMRC/VAT compliance, Making Tax Digital, or the relevant regulator earns a real paragraph rather than a parenthetical, because the British co-occurrence is precisely what a domestic answer engine looks for.

A quick before/after shows the size of the effect. Before: “We also handle VAT.” That clause registers HMRC/VAT compliance at roughly zero salience — present to a human reader, invisible to the model. After: “VAT compliance under HMRC’s Making Tax Digital regime is handled automatically: the system records the right tax codes, files digitally, and keeps the audit trail HMRC expects.” Now the entity is a subject, explained, co-occurring with two further trusted UK entities (Making Tax Digital, HMRC). One rewritten clause, three salience gains. Repeat that for each absent entity on your gap list and Coverage climbs without the page ever feeling optimised.

3. Repeat with variation, never with stuffing

Reinforce centrality through natural variants and pronoun reference, not exact repetition. “Expense management software … the platform … the system … the tool” reads as one coherent subject; “expense management software” five times in two paragraphs reads as manipulation and trips the same quality systems that govern how exact-match anchors are penalised in link profiles. The modern signal is coherence, not density.

4. Use internal links as on-site co-occurrence

Internal links are co-occurrence signals with a steering wheel: a descriptive, entity-rich anchor from a related page tells Google both that the two pages belong together and what the destination is about. Link your supporting cluster pages into the hub with anchors that name the entity, and keep those links in the main body, where they carry the most weight. This is the legitimate, durable form of internal-link shaping — concentrating relevance, not gaming flow — and it is covered in depth in our guide to the internal-link sculpting that actually works in 2026. Bidirectional links across a tight topic cluster are, in salience terms, the cluster declaring its own internal relationships to Google.

There is a quieter benefit to getting the internal anchors right: they teach the cluster’s shape to the systems that now read sites as graphs rather than as page lists. When your “expense management software” hub links out to spokes on approval workflows, corporate cards and VAT compliance using anchors that name those entities, and each spoke links back to the hub by name, you have declared a small knowledge graph of your own — a hub entity with defined relationships to its supporting entities. That structure is precisely what an LLM traverses when it decides which of your pages to surface for a sub-topic question, and it is why a tightly linked cluster routinely out-cites a set of stronger but disconnected pages. The links are not just equity plumbing; they are co-occurrence statements with direction and meaning attached.
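The hub-and-spoke pattern described here can be checked from a crawl export. A minimal sketch, assuming internal links are available as (source, target, anchor text) tuples; the paths are hypothetical and "anchor names the entity" is approximated by a case-insensitive substring match.

```python
def cluster_link_gaps(links, hub_url, hub_entity, spokes):
    """Find missing or unnamed links between a hub and its spokes.

    links: list of (source_url, target_url, anchor_text) tuples.
    spokes: dict mapping spoke URL -> the entity that spoke is about.
    Returns a list of (source, target, entity_the_anchor_should_name) tuples.
    """
    def has_named_link(source, target, entity):
        return any(
            src == source and dst == target and entity.casefold() in anchor.casefold()
            for src, dst, anchor in links
        )

    gaps = []
    for spoke_url, spoke_entity in spokes.items():
        if not has_named_link(hub_url, spoke_url, spoke_entity):
            gaps.append((hub_url, spoke_url, spoke_entity))
        if not has_named_link(spoke_url, hub_url, hub_entity):
            gaps.append((spoke_url, hub_url, hub_entity))
    return gaps


# Hypothetical crawl data.
links = [
    ("/hub", "/approval-workflows", "approval workflow guide"),
    ("/approval-workflows", "/hub", "expense management software"),
    ("/hub", "/corporate-cards", "read more"),
]
spokes = {"/approval-workflows": "approval workflow", "/corporate-cards": "corporate card"}
link_gaps = cluster_link_gaps(links, "/hub", "expense management software", spokes)
print(link_gaps)
```

The "read more" anchor counts as a gap even though the link exists, because it does not name the destination's entity.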

5. Declare it in structured data

Finally, encode the result: about for the dominant entity, mentions for the supporting entities, sameAs to their canonical Wikidata/Wikipedia URLs. This closes the loop between what you wrote and what you assert, and it is some of the cheapest corroboration available. The discipline is honesty: declare only what the page is genuinely about, because a mismatch between declaration and substance is a signal the systems learn to distrust.

UK local co-occurrence. For location-dependent businesses, co-occurrence with place entities is a salience multiplier the systems trust because it is hard to fake. Naming your city, your region, recognised local landmarks and the relevant UK regulator in subject position ties your brand entity to those place entities in the graph — which is exactly what surfaces you in local AI answers. “A Leeds-based payroll bureau serving West Yorkshire SMEs under HMRC RTI rules” co-occurs your brand with four trusted entities in one sentence.

5. Engineering co-occurrence across the web

On-page co-occurrence is necessary but self-asserted. The signal the systems weight most heavily is co-occurrence they did not control: your brand appearing beside the topic’s defining entities in independent coverage. This is where entity engineering and link building become the same discipline pointed at a new outcome. You are no longer only chasing a link’s authority; you are engineering the context the link sits in, because the surrounding text is what teaches the graph the association.

That reframing changes how you value placements. A mention of your brand inside a paragraph genuinely about your topic — even unlinked, even nofollow — can do more for entity association than a higher-authority link dropped into unrelated text, because the co-occurrence is the point. Industry analysis of tens of thousands of brands has found branded mentions correlating far more strongly with AI Overview visibility than raw backlink counts — a gap that only makes sense once you see that the mention places your entity in topical company, while a bare link does not. The tactics that produce contextual co-occurrence are the familiar ones, re-aimed:

  • Expert commentary and reactive PR. Being quoted in the editorial follow-up to a story in your space puts your brand entity beside the exact topic entities a journalist is writing about. The mechanics are in our reactive newsjacking playbook.
  • Source platforms. A specific, expert contribution through a journalist-request service such as HARO (Help a Reporter Out) earns a co-mention inside a topically matched article, which is high-relevance context by construction.
  • Guest contributions. A genuinely useful guest post on a niche-relevant publication lets you place your brand beside the topic’s defining entities in a context you help shape, with a branded anchor that signals association rather than over-optimised exact match.
  • Contextual insertions. Where appropriate and earned, a branded niche edit into a topically matched paragraph works because LLMs evaluate the surrounding text semantically — the same link in an unrelated sentence contributes far less.
  • Citable assets. An interactive tool or original-data study earns co-occurrence at scale, because every publication that cites your finding reproduces your brand beside the topic — corroboration that compounds.

A UK example makes the valuation tangible. Imagine that same expense-management brand earns two placements in the same month. One is a high-authority link from a general business roundup that mentions the brand in a list, with no topical context around it. The other is a single sentence in a Financial Times-tier piece about Making Tax Digital, quoting the brand’s founder on what the regime means for SME finance teams — beside HMRC, MTD and “expense management” in one paragraph. The first is the better link by classical metrics; the second is the better co-occurrence by a wide margin, because it teaches the graph that this brand belongs in the company of the exact entities that define its category in its market. For entity authority, you would take the second every time — and the work that produces it is reactive, expert-led PR, not volume outreach.

The through-line: branded anchors and topical context beat exact-match anchors and raw authority for entity work. You are teaching the web a relationship, and relationships are learned from consistent, credible repetition across independent sources — the corroboration layer that no amount of on-page optimisation can manufacture alone.

Measuring web co-occurrence is cruder than measuring on-page salience, but a workable proxy exists: for each target entity, count the independent domains where your brand and that entity appear together in genuine context (not a directory listing). Track that count per entity over time. A rising co-occurrence count on your priority entities is a leading indicator that the association is forming in the graph — and it usually precedes the moment your brand starts being named in AI answers for the topic, which is why it is worth watching before the citations themselves arrive. It is the entity-era version of tracking referring domains: a forward-looking number, not a vanity one.
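A sketch of that proxy, assuming you have already collected third-party pages as (URL, text) pairs. "Genuine context" is approximated here as the brand and the entity appearing in the same paragraph, which is a simplifying assumption; the brand name "Acme Spend" and all URLs are hypothetical.

```python
from urllib.parse import urlparse


def cooccurrence_domains(documents, brand, entities):
    """Count distinct domains where brand and entity share a paragraph.

    documents: list of (url, text) pairs; paragraphs are split on blank lines.
    Returns {entity: number of independent domains}.
    """
    found = {entity: set() for entity in entities}
    brand_key = brand.casefold()
    for url, text in documents:
        domain = urlparse(url).netloc.lower().removeprefix("www.")
        for paragraph in text.split("\n\n"):
            folded = paragraph.casefold()
            if brand_key not in folded:
                continue
            for entity in entities:
                if entity.casefold() in folded:
                    found[entity].add(domain)
    return {entity: len(domains) for entity, domains in found.items()}


# Hypothetical third-party coverage.
documents = [
    ("https://www.example.com/a", "Acme Spend on HMRC rules.\n\nUnrelated paragraph."),
    ("https://example.com/b", "Acme Spend and HMRC again."),
    ("https://example.org/x", "HMRC guidance.\n\nAcme Spend launched."),
    ("https://example.net/y", "Acme Spend helps with HMRC and corporate card spend."),
]
counts = cooccurrence_domains(documents, "Acme Spend", ["HMRC", "corporate card"])
print(counts)
```

Two pages on the same domain count once, and a page that names the brand and the entity in separate paragraphs does not count at all, which keeps the number closer to independent, contextual mentions.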

6. The Salience Engineering Loop: putting it on a cadence

Salience and co-occurrence are not a one-off project; they drift as you publish, as competitors deepen, and as the topic itself evolves. Run them as a loop with four stages:

Stage | What you do | Output
--- | --- | ---
Map | Build the canonical entity set for the topic from the pages already winning it. | Target entity set + Co-Occurrence Matrix
Measure | Run the Salience Gap Audit on your page and the top competitors. | Salience Coverage score + ranked gaps
Close | Apply the on-page levers in priority order; declare in schema. | A page with the correct dominant entity and full coverage
Corroborate | Earn contextual co-mentions beside the topic’s defining entities. | Web-wide co-occurrence; durable association

The loop is also how you make the work measurable to a sceptical stakeholder, which matters because salience gains are invisible in a standard rankings report until they convert. Track three things over time: Salience Coverage per priority page, the share of your priority pages whose dominant entity is correct, and the count of independent domains where your brand co-occurs with each target entity. Plot them monthly. A simple Looker Studio view that connects this work to ranking and citation movement turns an abstract programme into a trend a board will fund.

How often to run the loop depends on competitiveness. For your handful of most valuable topics — the ones that drive revenue — a full Map-and-Measure pass every quarter keeps you ahead of competitors deepening their own coverage and catches the moment a topic’s canonical entity set shifts (a new regulation, a new sub-category, a renamed standard). For the long tail, an annual pass is enough. The Close and Corroborate stages run continuously in the background as part of normal publishing and PR, not as a special project. The discipline is simply to never let a priority page drift back to having no clear subject, because that drift is silent and only shows up as a competitor quietly taking your place in the answers.

Verify at 30, 60 and 90 days

  1. Day 30: re-run the entity extraction on edited pages; confirm the dominant entity held and the closed gaps now register above zero salience.
  2. Day 60: prompt the major answer engines with the topic’s defining questions; log whether your brand is named and beside which entities.
  3. Day 90: re-score Salience Coverage across the cluster and tally new web co-occurrences. A page that has not improved coverage by now usually has a competing entity still fighting for the subject slot, or schema that contradicts the prose.
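The Day 30 check is the most mechanical of the three and can be scripted against two extraction runs. A minimal sketch, assuming each run is a dict of entity name to salience; the numbers are invented for illustration.

```python
def day30_check(before, after, intended_entity, closed_gaps):
    """Compare two salience profiles for one page.

    before / after: dicts of entity name -> salience from each extraction run.
    closed_gaps: entities the edits were meant to lift above zero.
    """
    dominant_after = max(after, key=after.get) if after else None
    return {
        "dominant_held": dominant_after == intended_entity,
        "still_at_zero": [e for e in closed_gaps if after.get(e, 0.0) <= 0.0],
        "change": {
            e: round(after.get(e, 0.0) - before.get(e, 0.0), 4) for e in closed_gaps
        },
    }


# Invented before/after profiles.
before = {"expense management software": 0.41, "receipt scanning": 0.14}
after = {
    "expense management software": 0.38,
    "receipt scanning": 0.12,
    "approval workflow": 0.06,
    "corporate card": 0.0,
}
result = day30_check(
    before,
    after,
    intended_entity="expense management software",
    closed_gaps=["approval workflow", "corporate card", "HMRC / VAT compliance"],
)
print(result["dominant_held"], result["still_at_zero"])
```

A small drop in the dominant entity's own score is expected when supporting entities start to register, given the roughly zero-sum behaviour; what matters is that it remains the top entity.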

7. Where this breaks in production

Five failure points account for most disappointing salience programmes:

  • Salience dilution from over-stuffing entities. Because salience is roughly zero-sum, cramming twenty entities onto one page flattens them all. Engineer one dominant entity and a focused supporting cast; push the rest to their own pages.
  • Relapsing into keyword density. Hitting a term-frequency target is not salience. If your edits read as repetition rather than coherent subjecthood, you have optimised the wrong thing and risked a spam signal in the process.
  • Schema that overclaims. An about declaration that names entities the page does not genuinely cover teaches the systems your declarations are unreliable. Declare only what the salience profile supports.
  • Engineering salience for the wrong entity. Making a page maximally salient for a term nobody searches, or for your product name instead of the category, wins a contest that does not matter. Anchor the target set to real query and answer-engine demand.
  • Treating the NLP API as Google’s actual model. The public API is a directional proxy, not the live ranking system. Use it to diagnose and to confirm directional movement — not to chase a specific decimal as if it were the algorithm itself.

The bottom line

Topical authority is not a mood the systems are in about your brand; it is a structure you can build in the units they measure. Salience decides whether a page has a clear subject and whether that subject is yours. Co-occurrence decides whether your brand entity lives in the right neighbourhood of the topic, on your pages and across the web. Run the Salience Gap Audit to see what Google currently believes, build the Co-Occurrence Matrix to see where you are absent, close the on-page gaps in priority order, and earn the contextual mentions that corroborate the rest — then loop. UK sites have a quiet advantage here: the place, regulator and category entities that define your market are well-formed in the graph already, so co-occurring with them is some of the cheapest, most trusted topic authority you will ever engineer. Make every important page unmistakably about one thing, surrounded by the entities that prove it, and the machines will stop guessing what you are the authority on — because you will have told them, in their own language.
