CSV batch enrichment

Use when the user wants to upload a lead CSV, enrich many leads overnight, or resume a paused batch after topping up.

Endpoints

Method  Path                                    Auth / notes
POST    /api/v1/batch/enrich/upload             Optional bearer; anonymous gets upload_token
POST    /api/v1/batch/enrich/{id}/link          Bearer; binds anonymous job to agent
POST    /api/v1/batch/enrich/{id}/start         Bearer; requires $10 min virtual balance
POST    /api/v1/batch/enrich/{id}/resume        Same as start (continues after pause)
GET     /api/v1/batch/enrich/{id}               Bearer or Authorization: Batch {upload_token}
GET     /api/v1/batch/enrich/{id}/preview       Summary counts only (no PII)
GET     /api/v1/batch/enrich/{id}/download.csv  Full enriched CSV
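
As a minimal sketch of the two auth schemes accepted by the status endpoint, the helper below picks an Authorization header and builds a job path. The function names are hypothetical, and preferring the bearer when both credentials are present is an assumption, not documented behaviour.

```python
from typing import Optional


def status_auth_header(bearer: Optional[str] = None,
                       upload_token: Optional[str] = None) -> dict:
    """Pick the Authorization header for GET /api/v1/batch/enrich/{id}."""
    if bearer:
        # Assumption: a bearer wins when both credentials are available.
        return {"Authorization": f"Bearer {bearer}"}
    if upload_token:
        # Anonymous uploads poll with the token returned by /upload.
        return {"Authorization": f"Batch {upload_token}"}
    raise ValueError("status polling needs a bearer or an upload_token")


def batch_path(job_id: str, action: str = "") -> str:
    """Build a path from the endpoint table, e.g. action='start'."""
    base = f"/api/v1/batch/enrich/{job_id}"
    return f"{base}/{action}" if action else base
```

For example, batch_path("42", "preview") gives the preview path for job 42, and status_auth_header(upload_token="tok") gives the anonymous header.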

Web UI: /batch and the paperclip on /enrich_agent. Results: /batch/{id}/results.

CSV format

Any delimiter (comma, tab, or semicolon) and any header shape are accepted: utility exports, CRM dumps, vendor spreadsheets, etc. The upload path tries these steps in order:

  1. CRM fast path: obvious headers (company, street, phone, …) map without an LLM call.
  2. Per-agent header cache: when the same header set was mapped before for this agent (fingerprinted by normalized column names), the stored map is reused and Crimson Span is skipped.
  3. Crimson Span mapping: when headers are non-standard and uncached, one data411-crimson-span call reads the header row plus up to min(row_limit, 10) sample rows (BATCH_ENRICH_LLM_PREVIEW_ROWS, default 10) and returns a column map to canonical lead fields plus optional search_variants (billing names, mailing addresses, secondary phones, account ids, etc.).
  4. Thin-result hunts: variant columns are stored on each lead as search_variants and injected into supplement passes only when core coverage is thin (first-pass hunts found little from the primary company/address).
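
The first three steps amount to a fallback chain, which the sketch below illustrates. Everything here is an assumption made for illustration: the alias table, the normalization rule, the use of SHA-256 over sorted names for the fingerprint, and the rule that the fast path applies only when every header is recognised. The real service may differ on each point.

```python
import hashlib
import re

# Hypothetical alias table for the CRM fast path; the real list is not
# given in this document.
FAST_PATH_ALIASES = {
    "contact_id": "contact_id",
    "company": "company",
    "street": "street",
    "city": "city",
    "state": "state",
    "zip": "zip",
    "phone": "number1",
    "dm_name": "dm_name",
}


def normalize(header: str) -> str:
    """Lower-case a header and collapse punctuation/spaces to underscores."""
    return re.sub(r"[^a-z0-9]+", "_", header.strip().lower()).strip("_")


def fingerprint(headers: list) -> str:
    """Order-insensitive fingerprint of a header set (assumed scheme)."""
    joined = "|".join(sorted(normalize(h) for h in headers))
    return hashlib.sha256(joined.encode("utf-8")).hexdigest()


def choose_mapping_strategy(headers: list, cache: dict) -> str:
    """Return which of the three mapping steps would handle these headers."""
    names = [normalize(h) for h in headers]
    if all(n in FAST_PATH_ALIASES for n in names):
        return "fast_path"      # step 1: no LLM call
    if fingerprint(headers) in cache:
        return "cache"          # step 2: reuse the stored map
    return "crimson_span"       # step 3: one LLM mapping call
```

A header row such as ["Acct Name", "Svc Addr"] falls through to the Crimson Span step on first upload and to the cache step once its fingerprint has been stored for the agent.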

Canonical lead fields (same as enrich/lead-session):

Field               Typical sources
contact_id          CRM id, account number
company             business / account name
street              address line 1 (+ optional line 2 merged)
city, state, zip    locality
number1, number2    primary / secondary phone
dm_name             contact or owner name

contact_id is optional — a hash is generated from company + address when missing. At least company or enough address fields must be present.
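
A rough sketch of how a fallback id and the minimum-fields check could look. The hash function, the 16-character truncation, the normalization, and the definition of "enough address fields" (street plus zip, or street plus city and state) are all assumptions; this document does not specify them.

```python
import hashlib

ADDRESS_FIELDS = ("street", "city", "state", "zip")


def _clean(lead: dict, field: str) -> str:
    """Return a trimmed string for a possibly missing or None field."""
    return str(lead.get(field) or "").strip()


def fallback_contact_id(lead: dict) -> str:
    """Use the supplied contact_id, else hash company + address (assumed scheme)."""
    if lead.get("contact_id"):
        return str(lead["contact_id"])
    parts = [_clean(lead, "company")] + [_clean(lead, f) for f in ADDRESS_FIELDS]
    key = "|".join(p.lower() for p in parts)
    return hashlib.sha256(key.encode("utf-8")).hexdigest()[:16]


def has_minimum_identity(lead: dict) -> bool:
    """Guess at the 'company or enough address fields' rule."""
    street = _clean(lead, "street")
    locality = _clean(lead, "zip") or (_clean(lead, "city") and _clean(lead, "state"))
    return bool(_clean(lead, "company")) or bool(street and locality)
```

Because the id is derived only from the lead's own fields, re-uploading the same row yields the same id under this scheme.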

Quoted CSV (RFC 4180) is best when company names or addresses contain commas — wrap those fields in double quotes. The upload UI checks the first 10 lines before sending.

Example (standard CRM):

contact_id,company,street,city,state,zip,number1
81555753,Haney Furniture Co,100 Main St,Pittsburgh,PA,15213,7246547732
99201,"Smith & Sons, LLC","100 Main St, Ste 2",Tampa,FL,33602,8135550100
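
To check that a file is quoted correctly before uploading, the example above can be round-tripped through Python's standard csv module. This is only a local sanity check, not the validation the upload UI performs.

```python
import csv
import io

SAMPLE = """contact_id,company,street,city,state,zip,number1
81555753,Haney Furniture Co,100 Main St,Pittsburgh,PA,15213,7246547732
99201,"Smith & Sons, LLC","100 Main St, Ste 2",Tampa,FL,33602,8135550100
"""

# DictReader honours double-quoted fields, so embedded commas stay intact.
rows = list(csv.DictReader(io.StringIO(SAMPLE)))


def to_csv_line(fields: list) -> str:
    """Serialize one row; fields containing commas are quoted automatically."""
    buf = io.StringIO()
    csv.writer(buf).writerow(fields)
    return buf.getvalue().rstrip("\r\n")
```

Here rows[1]["company"] is the single value Smith & Sons, LLC, and to_csv_line wraps only the fields that contain a comma in double quotes.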

Env: BATCH_ENRICH_LLM_MAP_ENABLED (default 1), BATCH_ENRICH_LLM_MAP_MODEL (default data411-crimson-span), OPENROUTER_API_KEY required for non-standard exports.

Max rows and file size: BATCH_ENRICH_MAX_ROWS (default 10 000) and BATCH_ENRICH_MAX_UPLOAD_BYTES (default 5 MB).

The optional upload form field row_limit (stored as process_limit on the job) caps how many rows the worker enriches; in the web UI it is Rows to process, default 10. The full CSV is still parsed and every mapped row is stored in batch_enrich_row, with status skipped for rows beyond the limit. The response reports stored_row_count, row_count (pending rows), and file_row_count. Enrich-agent uploads auto-call /start when the balance allows.

Crimson Span column mapping receives up to min(row_limit, 10) sample rows (BATCH_ENRICH_LLM_PREVIEW_ROWS, default 10) and returns chat_label (primary_column plus optional location_columns), so sidebar titles and each row's chat_title are set at upload; enrichment does not re-derive names per record.
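
The interaction between row_limit and the stored rows can be sketched as follows. The function name and return shape are hypothetical, and treating a missing row_limit as "process everything" is an assumption.

```python
from typing import Optional


def plan_upload(mapped_rows: int, row_limit: Optional[int] = None,
                preview_rows: int = 10) -> dict:
    """Estimate row statuses and the LLM sample size for one upload."""
    # Assumption: without row_limit, every mapped row is pending.
    limit = mapped_rows if row_limit is None else min(row_limit, mapped_rows)
    statuses = ["pending"] * limit + ["skipped"] * (mapped_rows - limit)
    return {
        "stored_row_count": len(statuses),       # every mapped row is stored
        "row_count": statuses.count("pending"),  # rows the worker will enrich
        "skipped": statuses.count("skipped"),    # stored but beyond the limit
        "llm_sample_rows": min(limit, preview_rows),
    }
```

With 250 mapped rows and row_limit=3, this plan stores 250 rows, marks 3 as pending, and would send 3 sample rows to the mapping call.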

How processing works

  1. Upload creates a batch_enrich_job and one batch_enrich_row per CSV data row.
  2. When a bearer is present, demo sessions are created: one batch_file parent and one batch_row chat per lead (seeded with the CRM JSON). While the batch worker runs, each row chat mirrors live enrich_lead_session hunt progress (tool progress + parallel websearch tails) — open Files in /enrich_agent and click any lead to watch.
  3. Start checks virtual balance ≥ BATCH_ENRICH_MIN_BALANCE_USD (default $10).
  4. Worker claims rows and runs the same path as enrich/lead-session + backfill persist; pauses when balance cannot cover the next row.
  5. Bill-on-match: rows that come back empty are auto-credited, as with single-lead hunts.
  6. Preview and results pages show counts only (people, phones, emails) — never contact names or numbers in the UI cards. The authenticated CSV download is a flattened LEB graph (backfill_json): one row per lead with dotted/indexed columns (contacts.0.display_name, phones.0.e164, emails.0.address, entities.0.legal_name, confidence, summary_markdown, …). The internal tool_trace is omitted. Columns are the union across all rows, prefixed with row_index, enrich_status, enrich_matched, enrich_spent_usd.
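
The flattened download described in step 6 can be pictured with a short sketch. It assumes each row carries its graph under backfill_json and that tool_trace sits at the top level of that graph; the enrich_status values in the usage note are placeholders, and the service's real column order may differ.

```python
import csv
import io

LEADING = ["row_index", "enrich_status", "enrich_matched", "enrich_spent_usd"]


def flatten(value, prefix: str = "") -> dict:
    """Flatten nested dicts/lists into dotted, indexed keys."""
    out = {}
    if isinstance(value, dict):
        for key, item in value.items():
            out.update(flatten(item, f"{prefix}.{key}" if prefix else str(key)))
    elif isinstance(value, list):
        for index, item in enumerate(value):
            out.update(flatten(item, f"{prefix}.{index}" if prefix else str(index)))
    else:
        out[prefix] = value
    return out


def to_csv(rows: list) -> str:
    """One CSV row per lead; columns are the union across all rows."""
    flat_rows, columns = [], []
    for row in rows:
        # The internal tool_trace is left out of the download.
        graph = {k: v for k, v in row["backfill_json"].items() if k != "tool_trace"}
        flat = flatten(graph)
        for col in flat:
            if col not in columns:
                columns.append(col)  # keep first-seen order
        flat_rows.append({**{k: row[k] for k in LEADING}, **flat})
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=LEADING + columns, restval="")
    writer.writeheader()
    writer.writerows(flat_rows)
    return buf.getvalue()
```

A lead with one phone and a lead with one email therefore produce a header containing both phones.0.e164 and emails.0.address, with blanks where a lead has no value for a column.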

Billing order

Same as all paid routes: virtual USD balance first; wallet x402 only when balance is exhausted. Top up at /connect?action=topup.
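
The start check, the pause rule, and bill-on-match from the processing steps can be combined into a small simulation of the virtual balance. The per-row price and the function name are hypothetical, and the wallet x402 fallback is deliberately not modelled, since the batch worker is described as pausing instead.

```python
from decimal import Decimal

MIN_START_BALANCE = Decimal("10")  # BATCH_ENRICH_MIN_BALANCE_USD default


def run_batch(balance: Decimal, matches: list, row_cost: Decimal) -> dict:
    """Simulate a worker pass; matches[i] is True when row i found data."""
    if balance < MIN_START_BALANCE:
        return {"status": "blocked", "processed": 0, "balance": balance}
    processed = 0
    for matched in matches:
        if balance < row_cost:
            # Balance cannot cover the next row: pause until topped up.
            return {"status": "paused", "processed": processed, "balance": balance}
        balance -= row_cost
        if not matched:
            balance += row_cost  # bill-on-match: empty rows are credited back
        processed += 1
    return {"status": "done", "processed": processed, "balance": balance}
```

Starting at $10 with a hypothetical $4 row price, two matched rows are enriched and the job pauses with $2 left; a later resume would need the balance topped up first.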

Anonymous flow

  1. Upload without bearer → upload_token in response.
  2. Store token in sessionStorage (SPA does this automatically).
  3. Mint / connect with return=/batch/{id}.
  4. POST …/link with { "upload_token": "…" }.
  5. Fund to $10+ → POST …/start.
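
Steps 4 and 5 can be sketched with the standard library by building the requests without sending them. BASE_URL is a placeholder host, the function names are hypothetical, and the empty body on /start is an assumption; the job id and bearer come from the earlier upload and mint/connect steps.

```python
import json
import urllib.request

BASE_URL = "https://api.example.com"  # hypothetical host, replace with the real one


def link_request(job_id: str, bearer: str, upload_token: str) -> urllib.request.Request:
    """Build POST .../link, binding the anonymous job to the agent."""
    body = json.dumps({"upload_token": upload_token}).encode("utf-8")
    return urllib.request.Request(
        f"{BASE_URL}/api/v1/batch/enrich/{job_id}/link",
        data=body,
        headers={"Authorization": f"Bearer {bearer}",
                 "Content-Type": "application/json"},
        method="POST",
    )


def start_request(job_id: str, bearer: str) -> urllib.request.Request:
    """Build POST .../start; the server checks the $10 minimum balance."""
    return urllib.request.Request(
        f"{BASE_URL}/api/v1/batch/enrich/{job_id}/start",
        data=b"",  # assumption: no request body is needed
        headers={"Authorization": f"Bearer {bearer}"},
        method="POST",
    )
```

Passing either request to urllib.request.urlopen would send it; keeping construction separate makes the sequence easy to inspect or test offline.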

Headless parallel REST

For integrations that already have row iterators (no CSV file), see batch_and_concurrency and event_callbacks.