Build a Claude-powered link prospecting agent that finds, scores and qualifies link targets autonomously — with the full 2026 architecture, tool schemas, scoring rubric and cost model.
Link prospecting is the single most time-expensive task in link building, and almost all of that time is wasted on candidates that were never going to qualify. A human prospector pulls a list, opens forty tabs, checks metrics one by one, reads each site to judge topical fit, hunts for a contact, and discards most of what they find. It is repetitive, rule-based judgement at scale — which is exactly the shape of work an AI agent does best. Industry estimates put the time AI-driven research saves at around eight hours per outreach rep per week, and the best agencies now report that machine-assisted prospecting cuts time-to-placement significantly by ranking targets on topical similarity and historical acceptance.
This guide is the canonical build for that system. By the end you will have a complete architecture for a Claude-powered prospecting agent that discovers candidate sites, enriches each with metrics and contact data, scores them against a relevance-first rubric, qualifies or rejects them, and drafts a tailored opening pitch — with a human gate before anything is sent. It is the hub for our deeper agentic-workflow cluster, and it assumes you already understand what backlinks are and why editorial relevance now outweighs raw authority. If you do not, read that first; everything below builds on it.
One framing point before we start, because it determines whether your agent helps or harms you: an agent is not a spam cannon. The 2026 consensus across credible practitioners is that AI should research prospects, enrich data, score relevance, draft pitches and classify replies — while humans keep editorial judgement and final sign-off. Build it that way and it compounds your output. Build it as an autosend machine and Google’s spam systems will eventually find you.
The deliverable: the prospecting agent architecture
Here is the whole system on one page. Every later section builds one of these components. Read this table, and you already understand the agent; the rest is implementation detail.
| Stage | What it does | How Claude is used |
| 1. Discover | Generates and expands a candidate list from seed queries | Claude proposes search queries and competitor link sources; tools fetch results |
| 2. Enrich | Attaches metrics, traffic, topic and contact data to each candidate | Tool calls to SEO and contact APIs; Claude orchestrates which to call |
| 3. Score | Rates each candidate against a weighted rubric | Claude applies the rubric and returns a structured score with reasons |
| 4. Qualify | Sorts into qualify / review / reject tiers | Claude classifies against thresholds and flags edge cases for humans |
| 5. Draft | Writes a tailored opening pitch for qualified targets | Claude drafts using the enrichment context and your offer |
| 6. Gate | Human reviews, edits and approves before send | No autosend; Claude prepares, a person decides |
Stages 1 to 5 are the agent. Stage 6 is the guardrail that keeps you safe. The engine that runs stages 1 to 5 is a single primitive repeated in a loop: tool use. Anthropic’s documentation calls tool access one of the highest-leverage primitives you can give an agent, and that is the right mental model — the agent is mostly a loop around well-defined tools, not a clever prompt.
What a link prospecting agent actually is (and is not)
Three things get called “agents” and only one of them is. Getting the distinction right saves you from over-engineering.
- A script runs a fixed sequence: fetch list, pull metrics, write CSV. No decisions. Reliable, but blind — it cannot tell a relevant site from an irrelevant one.
- A chatbot answers questions in a conversation. Useful for drafting one pitch, useless for processing a thousand prospects.
- An agent runs a loop in which the model decides which tool to call next, reads the result, and decides again — until the task is done. It combines the reliability of a script with the judgement of a chatbot.
The agentic loop is a contract: you specify the available operations and their input and output shapes, and Claude decides when and how to call them. Mechanically, the model returns a structured tool call with a stop_reason of tool_use; your code executes the function and returns the result; the model continues. There are two flavours that matter for cost and architecture: client tools, which your application runs, and server tools, which Anthropic runs (web search and web fetch among them). A prospecting agent uses both — server-side web search to discover, client-side functions to hit your SEO and CRM stack.
The loop, stripped to its essence, looks like this:
| messages = [ system_prompt, user_task ] while True: response = claude.messages.create( model=”claude-sonnet-4-6″, tools=TOOLS, # your tool schemas messages=messages, ) if response.stop_reason != “tool_use”: break # agent is done, return result for call in tool_calls(response): result = run_tool(call.name, call.input) # you execute messages.append(tool_result(call.id, result)) |
That is the entire control structure. Everything sophisticated about the agent lives in the tool definitions and the rubric you give it, not in the loop. This is why practitioners running production Claude agents describe the loop as deliberately boring and debuggable — boring is the goal.
The five jobs you are delegating to the agent
Before writing any code, be precise about what the agent owns. Each job maps to one or more tools and a slice of the rubric.
Job 1 — Discovery
The agent expands a handful of seed inputs (your target keywords, two or three competitor domains) into a broad candidate pool. It proposes search operators, identifies the sites already linking to competitors, and surfaces resource pages, roundups and editorial hubs in your niche. This is where breadth comes from, and it is the cheapest stage to over-produce — you will cut the list hard at scoring. It draws directly on the tactics catalogued in our link building strategies guide.
Job 2 — Enrichment
For every candidate, the agent attaches the data the rubric needs: domain rating or authority, organic traffic, primary topic, whether the site links out editorially, recency of publishing, and a contact email. Enrichment is pure tool orchestration — the model decides which APIs to call and in what order, and assembles the result into a clean record.
Job 3 — Relevance scoring
The agent reads each enriched record and scores it against your rubric. This is the judgement stage, and it is where an agent earns its keep over a script: it can read a site’s actual subject matter and decide whether “SaaS growth blog” really means your niche or just shares a keyword. The 2026 shift is decisive here — domain metrics are still useful but are no longer sufficient as primary quality filters; topical relevance comes first.
Job 4 — Qualification
Scores become decisions: qualify, send to manual review, or reject. The agent applies your thresholds and — critically — flags low-confidence or edge cases rather than forcing a verdict. A good agent knows when to defer to a human.
Job 5 — Pitch drafting
For qualified targets, the agent drafts a short, specific opening pitch using the enrichment context — referencing the publisher’s recent work and the precise asset you are offering. It never sends. Drafts go to the human gate. If your campaign type is guest contribution, the draft aligns with the norms in our guide to guest posting for links.
Step-by-step: building the agent
Step 1 — Define the tools as schemas
Tools are JSON schemas describing a function’s name, purpose and parameters. The description field is not documentation — it is the instruction Claude reads to decide when to call the tool, so write it for the model. Adding strict: true makes tool calls match your schema exactly, which you want for anything that feeds a downstream system. A minimal prospecting toolset:
| TOOLS = [ { “name”: “search_web”, “description”: “Find candidate sites for a query or operator.”, “input_schema”: { “type”: “object”, “properties”: { “query”: {“type”: “string”} }, “required”: [“query”] } }, { “name”: “fetch_domain_metrics”, “description”: “Return DR, organic traffic and primary topic for a domain.”, “input_schema”: { “type”: “object”, “properties”: { “domain”: {“type”: “string”} }, “required”: [“domain”] } }, { “name”: “find_contact”, “description”: “Return a verified editorial contact email for a domain.”, “input_schema”: { “type”: “object”, “properties”: { “domain”: {“type”: “string”} }, “required”: [“domain”] } }, { “name”: “score_prospect”, “description”: “Apply the qualification rubric; return score 0-100 + reasons.”, “strict”: true, “input_schema”: { “type”: “object”, “properties”: { “relevance”: {“type”: “integer”}, “authority”: {“type”: “integer”}, “audience”: {“type”: “integer”}, “risk”: {“type”: “integer”}, “verdict”: {“type”: “string”, “enum”: [“qualify”,”review”,”reject”]} }, “required”: [“relevance”,”authority”,”audience”,”risk”,”verdict”] } }, ] |
Note the division of labour: search_web, fetch_domain_metrics and find_contact gather facts; score_prospect forces Claude to emit a structured, schema-conformant judgement you can store and audit. The enum on verdict means you never have to parse free text to learn the decision.
Step 2 — Write the qualification rubric
The rubric is the brain of the agent. It lives in your system prompt and it is where relevance-first thinking gets encoded into numbers. A 100-point model that deliberately under-weights raw authority:
| Criterion | Max | What earns the points |
| Topical relevance to your niche | 40 | Site’s core subject genuinely overlaps your target topics |
| AI Overview / citation presence | 15 | Site is cited for your target queries in AI answers |
| Real audience (organic traffic) | 15 | Evidence of genuine readers, not a metrics shell |
| Editorial link behaviour | 10 | Links out to sources within real content |
| Domain authority (DR / DA) | 10 | Useful signal — but capped at 10 on purpose |
| Risk / spam signals (subtractive) | −10 | PBN markers, link-selling footprints, thin content |
Thresholds: 70 and above qualifies; 55 to 69 routes to manual review; below 55 is rejected automatically. The deliberate design choice is that authority alone tops out at 10 points, so a high-DR but off-topic site cannot pass on metrics. This mirrors what AI Overviews reward in practice — a niche-pure publication beats a bigger general blog for the queries that matter.
| Why relevance is weighted 40 and authority only 10 When a generative engine answers a query, it does not pull from the highest-DR sites in its index; it pulls from the sites it judges genuinely authoritative on that specific topic. A link from a site the engine already trusts for your topic carries weight a high-DR generalist link does not. Encoding that into the rubric is the difference between an agent that builds rankings and one that builds a vanity backlink count. |
Step 3 — Run the agentic loop
With tools defined and the rubric in the system prompt, the loop from Section 2 does the work. The model calls search_web to discover, fetch_domain_metrics and find_contact to enrich, then score_prospect to render a verdict — looping until every candidate is processed. Use tool_choice of auto so the model decides each turn; with the default auto setting Claude calls a tool when the request maps to it and the answer is not already in context, and responds directly otherwise. For independent look-ups across many candidates, enable parallel tool calls so the model fans out instead of plodding one domain at a time.
Step 4 — Add extended thinking for the hard judgement
Scoring a borderline prospect is a chained decision: read the topic, weigh it against the rubric, check for risk signals, then decide. With extended thinking enabled, the model plans in a thinking block before choosing a tool, and you get visibly better tool selection on multi-step problems. Anthropic’s guidance is to enable thinking for agents that need to chain three or more tools to reach an answer — which describes the score-and-qualify stage exactly. Reserve it for that stage; you do not need it for a flat metrics fetch.
Step 5 — Batch the heavy lifting and cache the rubric
Prospecting at volume is the textbook case for asynchronous processing. Anthropic’s Message Batches API processes large volumes of requests asynchronously at a flat 50% discount on input and output tokens, returning results within a 24-hour window — and for prospecting you almost never need a candidate scored this second. Two architectural moves cut the bill hard:
- Batch the scoring pass. Queue every enriched candidate as a batch request rather than calling the synchronous API one domain at a time; you halve the token cost for free.
- Cache the rubric. Your rubric and system instructions are identical on every call. Prompt caching lets you mark that block as cacheable and read it back at roughly 90% off the normal input price, and the discount stacks with batch. Across a thousand candidates that is the single biggest saving available.
Model routing is the third lever. Use the cheap, fast tier for high-volume mechanical work and reserve the expensive tier for genuine judgement:
| Stage | Suggested model | Why |
| Bulk enrichment + first-pass scoring | Haiku 4.5 | Cheapest tier; classification and extraction are its sweet spot |
| Loop orchestration + pitch drafting | Sonnet 4.6 | Best price/performance for tool-use and writing |
| Borderline / high-stakes judgement | Opus 4.8 | Reserve for the edge cases the rubric cannot settle alone |
This routing reflects the standard division: the lightweight tier for simple classification and high-volume tasks, the mid tier for most production tool-use, and the top tier only for complex reasoning. Most candidates never touch the expensive model.
Step 6 — Install the human gate
The agent prepares; a person decides. Qualified prospects and their draft pitches land in a review queue — a spreadsheet, a CRM, or a simple dashboard — where a human edits, approves or kills each one before anything is sent. This is not bureaucracy; it is the safeguard that keeps the whole operation white-hat. Reputable teams keep human editors for final quality control because AI-written outreach without review is increasingly flagged by Google’s spam systems. The agent’s job is to make the human’s review fast, not to remove the human.
The rubric in action: three prospects scored
Numbers make the rubric concrete. Here are three candidates for a UK B2B SaaS link campaign, scored on the model above. Watch how the relevance weighting changes the verdicts a metrics-only filter would get wrong.
| Criterion (max) | Site A | Site B | Site C | |
| Topical relevance (40) | 38 | 12 | 30 | |
| AI Overview presence (15) | 13 | 4 | 9 | |
| Real audience (15) | 11 | 14 | 8 | |
| Editorial link behaviour (10) | 9 | 6 | 7 | |
| Domain authority (10) | 6 | 10 | 5 | |
| Risk signals (−10) | 0 | −8 | −2 | |
| Total | 77 | 38 | 57 | |
| Verdict | Qualify | Reject | Review |
Site B is the trap. It has the highest domain authority of the three and a real audience — a metrics-first filter would rank it top. But it is off-topic, carries link-selling footprints, and is barely cited for the target queries, so the rubric correctly rejects it. Site A, with a lower DR but tight topical fit and clean signals, qualifies. Site C lands in the human review tier, which is exactly right: a 57 is a genuine maybe, and a person should look. This is the judgement a script cannot make and an agent can.
Discovery in depth: turning three seeds into a candidate pool
Discovery is where most prospecting either succeeds or quietly fails, because the rest of the agent can only qualify what discovery surfaces. The art is breadth without noise: cast wide enough to find the long tail of niche-relevant sites, structured enough that you are not drowning in irrelevant domains. The agent expands a tiny seed input into a large pool through three parallel methods, and it is worth being explicit about each.
The first method is query expansion. You give the agent two or three seed keywords; it generates the variations a skilled prospector would type — informational queries, “best of” and roundup operators, resource-page footprints, “write for us” and contributor patterns, and statistics-led queries that tend to attract editorial citations. From three seeds a well-prompted agent will routinely produce thirty to fifty distinct search queries, each fetched through the server-side web search tool.
The second method is competitor link mining. The agent takes your two or three closest ranking competitors and pulls the domains already linking to them. These are the highest-yield candidates in any campaign because they have already demonstrated willingness to link to a site like yours on a topic like yours — the relevance and the editorial behaviour are pre-validated. A single competitor with a healthy profile can contribute hundreds of candidate referring domains.
The third method is citation mining. For each of your priority queries, the agent checks which publications are cited in AI Overviews and answer engines, because those sites are exactly the ones a generative engine already trusts on your topic. The practical instruction here is the one the field has converged on: run each primary keyword, see whether a publication appears in the AI Overview citations, and prioritise the ones that recur. The agent does this at a scale no human would tolerate.
| Worked example: three seeds to a pool Seeds: “SaaS link building”, “B2B backlinks”, “digital PR SaaS”. Query expansion yields roughly 40 search queries. Competitor mining across three rivals contributes about 600 referring domains. Citation mining across 15 priority queries surfaces another 50 recurring publications. After de-duplication you hold a pool near 550 unique candidate domains — produced in minutes, from three words of input. Discovery has done its job; scoring will now cut that 550 down to perhaps 40 genuinely worth contacting. |
Enrichment in depth: the record the rubric reads
Scoring is only as good as the record it reads, so enrichment quality caps the whole agent’s accuracy. For each candidate the agent assembles a structured record — and the discipline is to capture exactly what the rubric needs and nothing it does not, because every extra field is cost and latency for no decision-making value.
A clean enrichment record contains:
- Domain and canonical URL — the identity of the candidate.
- Primary topic and a short topic profile — what the site is actually about, read from its content rather than inferred from a keyword.
- Domain rating or authority and organic traffic — the metric signals, deliberately secondary.
- Editorial-link evidence — whether recent articles link out to external sources within the body copy.
- AI-citation flag — whether the site appears in answer-engine citations for the target queries.
- Risk markers — sponsored-post footprints, link-selling pages, thin or AI-spun content, sudden link spikes.
- A verified contact — name, role and a deliverable email address.
Two enrichment rules save more grief than any other. First, verify the contact before scoring, not after — there is no point qualifying a site you cannot reach, and a bounced pitch damages your sending reputation. Second, treat missing data as a signal, not a blank. If the agent cannot find a contact, or cannot read a clear topic, that ambiguity should pull the candidate towards the manual-review tier rather than a confident qualify. Agents inherit data problems and amplify them through automated action, so a record full of holes should never produce a high-confidence verdict.
A day in the life of the agent: one batch, end to end
It helps to watch the whole loop run once. Here is a single batch from seed to review queue, with the decisions the agent makes at each turn.
- Kickoff. You hand the agent three seed keywords, three competitor domains and your offer (an original UK SaaS-churn data study). The system prompt already holds the rubric and the campaign rules.
- Discovery. The agent fans out parallel search_web calls across expanded queries and competitor backlink look-ups, returning a raw pool of roughly 550 domains. It de-duplicates and drops obvious non-candidates — social platforms, your own properties, the competitors themselves.
- Enrichment. For each survivor it calls fetch_domain_metrics and find_contact, assembling structured records. Candidates with no reachable contact are tagged for review rather than discarded outright, in case the contact gap is fixable.
- Scoring. Running as a cached, batched job, the agent applies the rubric to every record and calls score_prospect, emitting a 0–100 score, the sub-scores, and a verdict for each. Extended thinking is on for this stage so borderline cases get reasoned through rather than guessed.
- Qualification. Verdicts sort the pool: perhaps 40 qualify, 70 land in review, and the remainder are rejected with logged reasons. The reject log is not waste — it is how you later audit and tune the rubric.
- Drafting. For the 40 qualified targets, the agent drafts a tailored opening pitch each, referencing the publisher’s recent work and offering the churn study.
- Gate. The 40 drafts plus the 70 review cases land in a sheet. A human spends an hour approving, editing and killing — reviewing 110 pre-researched candidates in the time it once took to manually qualify a dozen.
The compression is the point. A task that was a day of tab-juggling becomes an hour of judgement on top of a few pounds of compute. The human still decides everything that matters; the agent removes everything that did not need a human in the first place.
Pitch drafting: what the agent hands the human
A draft is a starting point, not a send-ready email, and prompting the agent accordingly keeps quality high. The brief you give it should demand specificity over flattery: reference one concrete recent piece from the target, state the asset on offer in a line, and make a single clear ask. The failure mode to prompt against is the generic, robotic template — the kind that the field warns against because mass robotic outreach earns nothing and risks your sender reputation. A workable draft structure the agent can fill per target:
| Subject: <specific, plain — references their topic, not yours> Hi <name>, <one line on a specific recent piece they published> <one line: the asset you offer + why it fits their readers> <single clear ask: would you consider linking / covering it?> <source URL> Thanks, <you> |
Feed the agent the enrichment record as the personalisation source: the publisher’s name and role, the title and angle of a recent article, the topic profile, and your one-line offer. The agent stitches those into the structure. The human then does what only a human should — checks the tone is right for that specific editor, confirms the recent-piece reference is accurate and not hallucinated, and sharpens the ask. Two minutes of editing on a well-researched draft beats twenty minutes writing from a blank page, and it keeps a person accountable for every word that goes out under your name.
Reliability: the failure modes that bite in week three
The loop works on day one. The problems surface in week three, at volume, and they are predictable enough to engineer against in advance.
- Rate limits. At volume you will hit them. Wrap tool and model calls in retry-with-backoff so a transient limit becomes a short pause, not a failed batch. This is standard production hygiene for any tool-using Claude workload.
- Tool errors as data. When a tool fails — an API times out, a domain is unreachable — return the error to the model as a tool result rather than crashing the loop. The model can then route around it: skip the candidate, try an alternative source, or flag for review.
- Hallucinated specifics in pitches. The single most damaging failure: a draft that cites an article the publisher never wrote. The human gate catches it, which is one more reason the gate is non-negotiable. You can also constrain the agent to quote only from enrichment data it actually retrieved.
- Context bloat over long runs. A loop that processes hundreds of candidates in one conversation accumulates tokens fast. Process candidates in independent batches rather than one ballooning thread, and lean on caching for the fixed rubric so repeated context is cheap.
- Schema drift. If a downstream system depends on the score_prospect output, use strict mode so the structure never silently changes. A malformed verdict that slips into your database is harder to find than a loud crash.
None of these is exotic; they are the ordinary realities of running an agent in production, and the standard tool-use patterns in Anthropic’s implementation guidance cover the mechanics. Engineer for them up front and your week three is uneventful.
The cost model: what scoring a thousand prospects actually costs
Style demands numbers, so here is the worked economics. Assume you are scoring 1,000 enriched candidates on the cheap tier (Haiku 4.5, around £1 per million input tokens and £5 per million output by Anthropic’s published rates, quoted here in pounds for convenience). Each scoring call carries roughly 2,500 input tokens (the rubric plus one candidate record) and returns about 350 output tokens. The configurations stack like this:
| Configuration | Input | Output | Total |
| Synchronous, no caching | £2.50 | £1.75 | £4.25 |
| Batch API (−50%) | £1.25 | £0.88 | £2.13 |
| Batch + rubric caching | ≈ £0.58 | £0.88 | ≈ £1.45 |
Scoring a thousand prospects for the price of a coffee. The caching line assumes about 1,500 of the 2,500 input tokens are the fixed rubric, read back at the cached rate; figures are illustrative and your mix will differ, but the order of magnitude holds. The strategic point is that batch and caching together can cut effective API spend dramatically on eligible workloads, and prospecting is the definition of an eligible workload — high volume, repeated context, no real-time requirement. Set against the cost of a human doing the same triage, the agent pays for itself inside its first list. For the wider tooling economics, see our comparison of link building tools.
The tooling stack you will wire up
The agent is only as good as the tools behind it. You need four categories of capability, exposed to Claude as client tools (your APIs) or used as server tools (Anthropic-run web search and fetch):
- Discovery. Server-side web search and fetch for query expansion and competitor link sourcing, plus a backlink index API for “who links to my competitors” queries.
- Metrics. An SEO data provider for domain rating, organic traffic and topic classification — the numbers the rubric needs.
- Contacts. An email-finding and verification service so the agent never drafts a pitch it cannot deliver.
- Storage and review. A database or sheet for scored records and a queue for the human gate.
If you would rather not maintain bespoke integrations, the Model Context Protocol is the standardising layer: define a tool once on an MCP server and any client can call it, with the server rather than your application code handling execution. For repeatable campaign types — resource-page outreach, link insertions, list inclusions — standardise the toolset first; standardised asks give you cleaner data on reply and placement rates before you scale into bespoke digital PR.
Guardrails, failure modes and compliance
An autonomous system that contacts publishers on your behalf can damage your domain faster than a human ever could. Build these guardrails in from day one, not after the first incident.
- Never autosend. The human gate is non-negotiable. The agent’s output is a queue, never an outbox.
- Cap relevance, not just volume. A high reject rate is a healthy sign. If the agent is qualifying most of what it sees, the rubric is too loose and you are about to spam marginal sites.
- Garbage in, amplified out. Agents inherit and magnify data problems. Verify every contact and sanity-check enrichment before scoring, because an error in the data becomes an error sent to a real editor.
- Log every decision. Store the score, the reasons and the verdict for each candidate. When a campaign underperforms you want to debug the rubric, not guess.
- Respect contact preferences and law. Honour opt-outs, suppress previously-contacted domains, and keep outreach compliant with the relevant rules in each market you work — including the UK and EU.
These are the same disciplines that separate durable link programmes from the ones that get penalised, just enforced by code. They sit on top of, not instead of, the strategic principles in our link building strategies guide.
Measuring the agent: the KPIs that matter
An agent you cannot measure is an agent you cannot improve. Track five numbers and review them weekly while you tune, monthly once it is stable:
| Metric | What it tells you |
| Qualification rate | Share of candidates that pass — too high means a loose rubric, too low means weak discovery. |
| Human-override rate | How often reviewers reverse the agent’s verdict — your rubric-accuracy gauge. |
| Reply rate on agent-drafted pitches | Whether the drafts actually land with editors. |
| Placement rate per qualified prospect | The bottom line: links won ÷ qualified targets. |
| Cost per qualified prospect | Total API + tool spend ÷ qualified prospects — your efficiency benchmark. |
The most useful of these is the human-override rate. Every time a reviewer flips a verdict, you have a labelled example of where the rubric is wrong — feed a batch of those back into the rubric wording and the agent gets measurably sharper. For benchmarks on what good link velocity and acceptance look like, cross-reference our 2026 link building statistics.
Questions teams ask before building this
Do I need to be a developer to build it?
You need to be comfortable with basic scripting and API calls — the loop itself is short and the heavy lifting sits in tools you mostly call rather than write. A marketer who can read and adapt code can stand up a v1; a campaign at production scale benefits from an engineer to harden the tooling and storage. The conceptual model matters more than deep programming skill: if you understand the six-stage architecture, you can direct the build.
Will an AI-built outreach programme get me penalised?
Not if you keep the human gate and weight relevance over volume. Penalties come from mass, irrelevant, autosent outreach — not from using AI to research and draft. The whole field’s guidance is to let AI reduce manual work while people review websites, tone, compliance and the final message. The agent makes you faster at doing the right thing; it does not change what the right thing is.
How is this different from buying an off-the-shelf outreach tool?
Off-the-shelf tools give you a fixed workflow and someone else’s qualification logic. Building your own agent means the rubric encodes your definition of a good link, the tooling plugs into your data, and the costs are raw compute rather than per-seat licences. For standardised, repeatable campaigns a packaged tool may be enough; for a programme where relevance judgement is your edge, owning the rubric is worth the build.
What does it cost to run at scale?
The compute is close to negligible relative to human time — the worked model above scores a thousand prospects for the price of a coffee once batching and caching are on. Your real costs are the SEO-data and contact APIs the agent calls, plus the human hour per batch at the gate. The economics flip in the agent’s favour the moment your list size exceeds what a person could triage by hand, which is almost immediately.
Which Claude model should the loop run on?
Sonnet 4.6 is the sensible default for the orchestration loop and pitch drafting — it has the price-to-performance balance most production tool-use wants. Push bulk classification and first-pass scoring down to Haiku 4.5 to save money, and reserve Opus 4.8 for the genuinely hard judgement calls the rubric cannot settle. Most candidates should never touch the expensive tier.
Your Monday-morning build checklist
You will not ship the full system in a day, but you can stand up a working v1 this week. Do these six things in order:
- Get an API key and confirm a basic tool-use loop runs end to end with one dummy tool.
- Define your three fact-gathering tool schemas — search_web, fetch_domain_metrics, find_contact — and wire them to real APIs.
- Write your rubric into the system prompt with explicit weights and thresholds; start from the 40/15/15/10/10/−10 model and adapt.
- Add the score_prospect tool with strict mode and an enum verdict so decisions are structured and auditable.
- Run 50 candidates through synchronously to debug, then switch the scoring pass to the Batch API with rubric caching.
- Pipe qualified prospects and draft pitches into a review sheet — and never connect an outbox.
That is a complete, safe, measurable prospecting agent built on a primitive that is deliberately simple. Start narrow — one campaign type, one niche, a tight rubric — prove the qualification rate and human-override rate are good, then widen. The agents that compound are the ones whose owners treated the rubric as a living document and the human gate as sacred. Build it that way and prospecting stops being the bottleneck in your link programme and becomes its engine. From here, the deeper builds — multi-agent orchestration, retrieval-augmented personalisation and vector-based prospect matching — all extend this same loop; this guide is the foundation they stand on.
