CSV batch enrichment
Use when the user wants to upload a lead CSV, enrich many leads overnight, or resume a paused batch after topping up.
Endpoints
| Method | Path | Auth |
|---|---|---|
| POST | `/api/v1/batch/enrich/upload` | Optional bearer; anonymous gets `upload_token` |
| POST | `/api/v1/batch/enrich/{id}/link` | Bearer (bind anon job to agent) |
| POST | `/api/v1/batch/enrich/{id}/start` | Bearer (requires $10 min virtual balance) |
| POST | `/api/v1/batch/enrich/{id}/resume` | Same as start (continues after pause) |
| GET | `/api/v1/batch/enrich/{id}` | Bearer or `Authorization: Batch {upload_token}` |
| GET | `/api/v1/batch/enrich/{id}/preview` | Summary counts only (no PII) |
| GET | `/api/v1/batch/enrich/{id}/download.csv` | Full enriched CSV |
Web UI: `/batch` and the paperclip on `/enrich_agent`. Results: `/batch/{id}/results`.
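A minimal sketch of how a client might build the status request with either credential. The host, function names, and the rule of preferring the bearer when both are supplied are assumptions for illustration; only the path and the two `Authorization` schemes come from the endpoint table.

```python
from typing import Dict, Optional
from urllib.request import Request

BASE_URL = "https://example.invalid"  # hypothetical host, not from the docs


def auth_header(bearer: Optional[str] = None,
                upload_token: Optional[str] = None) -> Dict[str, str]:
    """Return the Authorization header for a batch status call.

    Assumption: when both credentials are present, the bearer wins.
    """
    if bearer:
        return {"Authorization": f"Bearer {bearer}"}
    if upload_token:
        return {"Authorization": f"Batch {upload_token}"}
    raise ValueError("either bearer or upload_token is required")


def status_request(job_id: str, **creds: Optional[str]) -> Request:
    """Build (but do not send) GET /api/v1/batch/enrich/{id}."""
    url = f"{BASE_URL}/api/v1/batch/enrich/{job_id}"
    return Request(url, headers=auth_header(**creds), method="GET")
```

Passing the returned object to `urllib.request.urlopen` would send it; the sketch stops short of that so it can be read without a live server.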
CSV format
Any delimiter and header shape — comma, tab, or semicolon; utility exports, CRM dumps, vendor spreadsheets, etc. The upload path:
- CRM fast path: obvious headers (`company`, `street`, `phone`, …) map without an LLM call.
- Per-agent header cache: when the same header set was mapped before for this agent (fingerprinted by normalized column names), reuse the stored map and skip Crimson Span.
- Crimson Span mapping: when headers are non-standard and uncached, one `data411-crimson-span` call reads the header row plus up to five sample rows and returns a column map to canonical lead fields plus optional `search_variants` (billing names, mailing addresses, secondary phones, account ids, etc.).
- Thin-result hunts: variant columns are stored on each lead as `search_variants` and injected into supplement passes only when core coverage is thin (first-pass hunts found little from the primary company/address).
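The order of the first three steps can be sketched as follows. This is an illustration under stated assumptions, not the service's code: the normalization rule, the SHA-256 fingerprint, the contents of `FAST_PATH`, the "every header is known" fast-path test, and the `llm_map` callable standing in for the Crimson Span call are all hypothetical.

```python
import hashlib
from typing import Callable, Dict, List

# Hypothetical fast-path table; the real list of "obvious" headers is not documented.
FAST_PATH: Dict[str, str] = {
    "company": "company", "street": "street", "city": "city",
    "state": "state", "zip": "zip", "phone": "number1",
}


def normalize(name: str) -> str:
    """Lowercase and collapse non-alphanumeric runs to one underscore (assumed rule)."""
    cleaned = "".join(ch if ch.isalnum() else "_" for ch in name.strip().lower())
    return "_".join(part for part in cleaned.split("_") if part)


def fingerprint(headers: List[str]) -> str:
    """Order-insensitive hash of the normalized header set (assumed scheme)."""
    joined = "|".join(sorted(normalize(h) for h in headers))
    return hashlib.sha256(joined.encode("utf-8")).hexdigest()


def map_headers(headers: List[str],
                cache: Dict[str, Dict[str, str]],
                llm_map: Callable[[List[str]], Dict[str, str]]) -> Dict[str, str]:
    """Return {normalized header: canonical field}, trying the cheapest path first."""
    normalized = [normalize(h) for h in headers]
    # 1. Fast path: every header is already a known name, so no model call.
    if all(n in FAST_PATH for n in normalized):
        return {n: FAST_PATH[n] for n in normalized}
    # 2. Per-agent cache: `cache` is assumed to belong to a single agent.
    key = fingerprint(headers)
    if key in cache:
        return cache[key]
    # 3. One mapping call, remembered for the next upload with the same headers.
    mapping = llm_map(normalized)
    cache[key] = mapping
    return mapping
```

Keying the cache on a sorted, normalized header set means a re-export with reordered or re-cased columns still hits the cache, which matches the "fingerprinted by normalized column names" wording.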
Canonical lead fields (same as enrich/lead-session):
| Field | Typical sources |
|---|---|
| `contact_id` | CRM id, account number |
| `company` | business / account name |
| `street` | address line 1 (+ optional line 2 merged) |
| `city`, `state`, `zip` | locality |
| `number1`, `number2` | primary / secondary phone |
| `dm_name` | contact or owner name |
contact_id is optional — a hash is generated from company + address when missing.
At least company or enough address fields must be present.
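A rough sketch of the fallback id and the minimum-fields check. The hash function, the truncation to 16 hex characters, the whitespace and case normalization, and the reading of "enough address fields" as street plus zip (or street plus city and state) are all assumptions; the source only says that a hash is derived from company + address.

```python
import hashlib


def _canon(value: str) -> str:
    """Lowercase and collapse whitespace so trivial variations hash the same."""
    return " ".join(value.lower().split())


def fallback_contact_id(company: str = "", street: str = "", city: str = "",
                        state: str = "", zip_code: str = "") -> str:
    """Derive a stable id from company + address (assumed scheme: SHA-256, 16 hex chars)."""
    canonical = "|".join(_canon(p) for p in (company, street, city, state, zip_code))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()[:16]


def has_minimum_identity(company: str = "", street: str = "", city: str = "",
                         state: str = "", zip_code: str = "") -> bool:
    """Hypothetical reading of 'company or enough address fields'."""
    if company.strip():
        return True
    has_locality = bool(zip_code.strip()) or bool(city.strip() and state.strip())
    return bool(street.strip()) and has_locality
```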
Quoted CSV (RFC 4180) is best when company names or addresses contain commas — wrap those fields in double quotes. The upload UI checks the first 10 lines before sending.
Example (standard CRM):
contact_id,company,street,city,state,zip,number1
81555753,Haney Furniture Co,100 Main St,Pittsburgh,PA,15213,7246547732
99201,"Smith & Sons, LLC","100 Main St, Ste 2",Tampa,FL,33602,8135550100
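For producing or checking such a file locally, Python's standard `csv` module handles RFC 4180 quoting. The sketch below guesses the delimiter from the header line and flags early rows whose field count differs from the header. The helper names and the exact pre-flight rule are assumptions; the upload UI's actual check is not specified beyond "the first 10 lines".

```python
import csv
import io
from typing import Dict, List, Sequence


def guess_delimiter(header_line: str,
                    candidates: Sequence[str] = (",", "\t", ";")) -> str:
    """Pick the candidate that splits the header into the most fields."""
    def width(delim: str) -> int:
        return len(next(csv.reader([header_line], delimiter=delim)))
    return max(candidates, key=width)


def read_rows(text: str) -> List[Dict[str, str]]:
    """Parse the CSV into dicts, honouring double-quoted fields."""
    delim = guess_delimiter(text.splitlines()[0])
    return list(csv.DictReader(io.StringIO(text, newline=""), delimiter=delim))


def ragged_lines(text: str, limit: int = 10) -> List[int]:
    """Return record numbers (header = 1) within the first `limit` records
    whose field count differs from the header. Hypothetical pre-flight check."""
    delim = guess_delimiter(text.splitlines()[0])
    reader = csv.reader(io.StringIO(text, newline=""), delimiter=delim)
    header = next(reader)
    bad = []
    for number, row in enumerate(reader, start=2):
        if number > limit:
            break
        if len(row) != len(header):
            bad.append(number)
    return bad
```

A row such as `99201,Smith & Sons, LLC,...` without quotes splits into one field too many, so `ragged_lines` reports it, while the quoted form in the example above parses cleanly.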
Env: BATCH_ENRICH_LLM_MAP_ENABLED (default 1),
BATCH_ENRICH_LLM_MAP_MODEL (default data411-crimson-span),
OPENROUTER_API_KEY required for non-standard exports.
Max rows and file size: BATCH_ENRICH_MAX_ROWS (default 10 000) and
BATCH_ENRICH_MAX_UPLOAD_BYTES (default 5 MB).
The optional upload form field row_limit (stored as process_limit on the job) caps how many
rows the worker enriches. The full CSV is still parsed and every mapped row is
stored in batch_enrich_row; rows beyond the limit get status skipped. In the web UI this is
the Rows to process field (default 10). The upload response reports stored_row_count,
row_count (pending rows), and file_row_count. Enrich-agent uploads call /start automatically
when the balance allows. Crimson Span column mapping receives up to min(row_limit, 10) sample
rows (BATCH_ENRICH_LLM_PREVIEW_ROWS, default 10) and returns chat_label
(primary_column plus optional location_columns), so sidebar titles and each row's
chat_title are set at upload; enrichment does not re-derive names per record.
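The row_limit bookkeeping can be illustrated with a short sketch. It assumes that rows are limited in file order, that an absent row_limit means "process everything", and that stored_row_count counts mapped rows while file_row_count counts data lines in the file; the function name and return shape are hypothetical.

```python
from typing import Dict, List, Optional

PREVIEW_ROWS = 10  # documented default of BATCH_ENRICH_LLM_PREVIEW_ROWS


def plan_rows(mapped_rows: List[dict], file_row_count: int,
              row_limit: Optional[int] = None) -> Dict[str, object]:
    """Assign pending/skipped statuses and compute the upload response counts."""
    total = len(mapped_rows)
    # Assumption: no row_limit means every stored row is eligible.
    limit = total if row_limit is None else max(0, min(row_limit, total))
    statuses = ["pending" if i < limit else "skipped" for i in range(total)]
    return {
        "statuses": statuses,
        "stored_row_count": total,          # every mapped row is stored
        "row_count": limit,                 # rows the worker will enrich
        "file_row_count": file_row_count,   # data lines seen in the upload
        "sample_rows": min(limit, PREVIEW_ROWS),  # rows shown to column mapping
    }
```

For a 25-row file uploaded with row_limit=10, this yields 25 stored rows, 10 pending, 15 skipped, and 10 sample rows for mapping.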
How processing works
- Upload creates a `batch_enrich_job` and one `batch_row` per CSV line.
- When a bearer is present, demo sessions are created: one `batch_file` parent and one `batch_row` chat per lead (seeded with the CRM JSON). While the batch worker runs, each row chat mirrors live `enrich_lead_session` hunt progress (tool progress + parallel websearch tails); open Files in /enrich_agent and click any lead to watch.
- Start checks virtual balance ≥ `BATCH_ENRICH_MIN_BALANCE_USD` (default $10).
- Worker claims rows and runs the same path as `enrich/lead-session` + backfill persist; pauses when balance cannot cover the next row.
- Bill-on-match: empty rows auto-credit like single-lead hunts.
- Preview and results pages show counts only (people, phones, emails), never contact names or numbers in the UI cards. The authenticated CSV download is a flattened LEB graph (`backfill_json`): one row per lead with dotted/indexed columns (`contacts.0.display_name`, `phones.0.e164`, `emails.0.address`, `entities.0.legal_name`, `confidence`, `summary_markdown`, …). The internal `tool_trace` is omitted. Columns are the union across all rows, prefixed with `row_index`, `enrich_status`, `enrich_matched`, `enrich_spent_usd`.
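The dotted/indexed column layout of the download can be reproduced with a small recursive flattener. This is a sketch of the general technique, not the service's exporter: the input record shape (a dict carrying the four metadata keys plus `backfill_json`) and the alphabetical ordering of the graph columns are assumptions.

```python
import csv
import io
from typing import Any, Dict, List

META = ["row_index", "enrich_status", "enrich_matched", "enrich_spent_usd"]


def flatten(node: Any, prefix: str = "") -> Dict[str, Any]:
    """Turn nested dicts/lists into {'contacts.0.display_name': ...} style keys."""
    flat: Dict[str, Any] = {}
    if isinstance(node, dict):
        for key, value in node.items():
            flat.update(flatten(value, f"{prefix}{key}."))
    elif isinstance(node, list):
        for index, value in enumerate(node):
            flat.update(flatten(value, f"{prefix}{index}."))
    else:
        flat[prefix[:-1]] = node  # drop the trailing dot
    return flat


def export_csv(leads: List[Dict[str, Any]]) -> str:
    """One CSV row per lead; columns are the union across rows, metadata first."""
    flat_rows = []
    for lead in leads:
        graph = {k: v for k, v in lead.get("backfill_json", {}).items()
                 if k != "tool_trace"}  # internal trace is left out
        row = {name: lead.get(name) for name in META}
        row.update(flatten(graph))
        flat_rows.append(row)
    graph_columns = sorted({c for r in flat_rows for c in r} - set(META))
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=META + graph_columns, restval="")
    writer.writeheader()
    writer.writerows(flat_rows)
    return out.getvalue()
```

Because the header is the union of all flattened keys, a lead with no phones simply gets empty cells under `phones.0.e164` rather than a shorter row.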
Billing order
Same as all paid routes: virtual USD balance first; wallet x402 only when balance is exhausted. Top up at /connect?action=topup.
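The start gate, the pause rule, and bill-on-match can be pictured with a toy simulation over the virtual balance only. Per-row prices, the cents-based accounting, and modelling "auto-credit" as "no net charge for an unmatched row" are assumptions made for the sketch; the wallet x402 fallback is not modelled.

```python
from typing import Dict, List

MIN_BALANCE_CENTS = 1000  # documented default of BATCH_ENRICH_MIN_BALANCE_USD ($10)


def run_batch(balance_cents: int, row_costs: List[int],
              matched: List[bool]) -> Dict[str, object]:
    """Simulate a worker pass; returns final state, rows processed, and balance."""
    if balance_cents < MIN_BALANCE_CENTS:
        # Start (and resume) refuse to run below the minimum balance.
        return {"state": "blocked", "processed": 0, "balance_cents": balance_cents}
    processed = 0
    for cost, hit in zip(row_costs, matched):
        if balance_cents < cost:
            # Cannot cover the next row: pause until the user tops up.
            return {"state": "paused", "processed": processed,
                    "balance_cents": balance_cents}
        if hit:
            balance_cents -= cost  # bill-on-match: only matched rows cost money
        processed += 1
    return {"state": "done", "processed": processed, "balance_cents": balance_cents}
```

With a $10.00 balance and three matched rows at a hypothetical $4.00 each, the run pauses after two rows with $2.00 left; topping up and calling resume would pick up at the third row.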
Anonymous flow
- Upload without bearer → `upload_token` in response.
- Store token in `sessionStorage` (SPA does this automatically).
- Mint / connect with `return=/batch/{id}`.
- `POST …/link` with `{ "upload_token": "…" }`.
- Fund to $10+ → `POST …/start`.
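For a headless client, the link and start steps might be built like this. The host, the helper names, the JSON content type, and the empty body on start are assumptions; the paths and the `upload_token` payload key come from the steps above.

```python
import json
from urllib.request import Request

BASE_URL = "https://example.invalid"  # hypothetical host, not from the docs


def link_request(job_id: str, upload_token: str, bearer: str) -> Request:
    """Bind an anonymous job to the agent that owns `bearer`."""
    body = json.dumps({"upload_token": upload_token}).encode("utf-8")
    return Request(
        f"{BASE_URL}/api/v1/batch/enrich/{job_id}/link",
        data=body,
        headers={
            "Authorization": f"Bearer {bearer}",
            "Content-Type": "application/json",  # assumed; body is shown as JSON
        },
        method="POST",
    )


def start_request(job_id: str, bearer: str) -> Request:
    """Start the linked job; the server rejects this below the minimum balance."""
    return Request(
        f"{BASE_URL}/api/v1/batch/enrich/{job_id}/start",
        data=b"",  # assumed: no request body is needed
        headers={"Authorization": f"Bearer {bearer}"},
        method="POST",
    )
```

Both helpers only construct the request; sending them in order (link, then start once the balance is funded) mirrors the last two steps of the flow.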
Headless parallel REST
For integrations that already have row iterators (no CSV file), see batch_and_concurrency and event_callbacks.