
InvestorLift Scraper API

Pull clean, structured data from any InvestorLift marketplace listing — one URL or thousands at a time. Every example below is a single curl command, ready to paste into a terminal.

1. Save your key

Paste your API_KEY below — every curl snippet auto-fills with it.

2. Pick an endpoint

One property → /api/scrape. Many → /api/bulk-scrape.

3. Get JSON back

Address, price, ARV, beds/baths, photos — direct from the listing's NUXT_DATA.

Your API key

Stored in your browser's localStorage only — never sent anywhere except as the X-API-Key header. Once saved, every curl block below fills it in for you.

Regenerate rotates the server-side key. The old key stops working immediately; the new one is persisted to .api_key and stays fixed across restarts until you regenerate again.

Authentication

Header (recommended)
-H "X-API-Key: {KEY}"

Bearer token (also works)
-H "Authorization: Bearer {KEY}"

Query string (for browser links)
?api_key={KEY}
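
For callers scripting against the API from Python rather than curl, the three auth styles above might be built as follows. This is a minimal sketch using only the standard library; BASE, KEY, and the helper names are placeholders introduced for illustration and are not part of the API.

```python
import urllib.request
from urllib.parse import urlencode

BASE = "http://localhost:8000"  # assumption: substitute your server's base URL
KEY = "YOUR_API_KEY"            # assumption: substitute the key you saved


def header_auth(url: str, key: str) -> urllib.request.Request:
    """Recommended style: key sent in the X-API-Key header."""
    return urllib.request.Request(url, headers={"X-API-Key": key})


def bearer_auth(url: str, key: str) -> urllib.request.Request:
    """Alternative style: key sent as a Bearer token."""
    return urllib.request.Request(url, headers={"Authorization": f"Bearer {key}"})


def query_auth(url: str, key: str) -> str:
    """Browser-link style: key appended as the api_key query parameter."""
    separator = "&" if "?" in url else "?"
    return f"{url}{separator}{urlencode({'api_key': key})}"
```

The header styles keep the key out of URLs, which tend to end up in logs and browser history; that is a likely reason the header form is the recommended one.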

POST /api/scrape

Extract a single property page. Returns the full record from the page's NUXT_DATA blob.

Request body

{
  "url":     "https://investorlift.com/marketplace/p/abc123",   # required
  "mode":    "clean",                                           # "clean" | "raw" (default: clean)
  "filters": { "min_price": 100000, "states": ["TX"] }          # optional, see Filter schema
}

curl example

curl -X POST {BASE}/api/scrape \
  -H "X-API-Key: {KEY}" \
  -H "Content-Type: application/json" \
  -d '{"url":"https://investorlift.com/marketplace/p/abc123","mode":"clean"}'
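
The same single-property call can be sketched in Python. The helper below only builds the request, so it can be inspected before anything is sent; the function name and the example base URL are assumptions made for illustration.

```python
import json
import urllib.request
from typing import Optional


def build_scrape_request(base: str, key: str, listing_url: str,
                         mode: str = "clean",
                         filters: Optional[dict] = None) -> urllib.request.Request:
    """Build (but do not send) a POST /api/scrape request."""
    if mode not in ("clean", "raw"):
        raise ValueError('mode must be "clean" or "raw"')
    body = {"url": listing_url, "mode": mode}
    if filters:
        body["filters"] = filters  # optional, see Filter schema
    return urllib.request.Request(
        f"{base}/api/scrape",
        data=json.dumps(body).encode("utf-8"),
        headers={"X-API-Key": key, "Content-Type": "application/json"},
        method="POST",
    )


# To send it (performs a real HTTP call):
#   with urllib.request.urlopen(request, timeout=30) as response:
#       record = json.load(response)
```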

POST /api/bulk-scrape

Run a parallel extraction over many URLs. Returns immediately with the queued count; poll /api/bulk-scrape/status for live progress.

Max URLs: 5000
Max workers: 20
Default workers: 8
Per-URL timeout: 25s
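
A URL list longer than the 5000-URL limit has to be split on the client side. The sketch below shows one way to do that in Python; the helper names are hypothetical. Because only one bulk run can be in progress at a time (see 409 under Error codes), the batches would be submitted one after another, not in parallel.

```python
from typing import Iterator, List

MAX_URLS_PER_CALL = 5000  # limit stated in this reference
MAX_WORKERS = 20          # limit stated in this reference


def batches(urls: List[str], size: int = MAX_URLS_PER_CALL) -> Iterator[List[str]]:
    """Yield consecutive slices of urls, each at most `size` long."""
    if size < 1:
        raise ValueError("size must be at least 1")
    for start in range(0, len(urls), size):
        yield urls[start:start + size]


def clamp_workers(requested: int) -> int:
    """Keep the workers value inside the documented 1..20 range."""
    return max(1, min(MAX_WORKERS, requested))
```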

Request body

{
  "urls":    ["https://investorlift.com/marketplace/p/abc", "…/xyz"],
  "mode":    "clean",                                # "clean" | "raw"
  "workers": 8,                                      # 1..20
  "filters": {                                       # optional
    "min_price":      100000,
    "max_price":      400000,
    "min_beds":       3,
    "states":         ["TX", "FL"],
    "property_types": ["Single-Family"],
    "exclude_under_contract": true
  }
}

curl example

curl -X POST {BASE}/api/bulk-scrape \
  -H "X-API-Key: {KEY}" \
  -H "Content-Type: application/json" \
  -d '{
    "urls": [
      "https://investorlift.com/marketplace/p/abc",
      "https://investorlift.com/marketplace/p/xyz"
    ],
    "mode": "clean",
    "workers": 8
  }'

Full pipeline (start → poll → download)

# 1) Start the bulk job
curl -X POST {BASE}/api/bulk-scrape \
  -H "X-API-Key: {KEY}" \
  -H "Content-Type: application/json" \
  -d '{"urls":["https://investorlift.com/marketplace/p/abc"],"mode":"clean","workers":8}'

# 2) Poll live status until "running": false
curl -H "X-API-Key: {KEY}" {BASE}/api/bulk-scrape/status

# 3a) Pull results as JSON
curl -H "X-API-Key: {KEY}" {BASE}/api/bulk-scrape/json

# 3b) Or download the Excel workbook (use -OJ to preserve the server filename)
curl -H "X-API-Key: {KEY}" -OJ {BASE}/api/bulk-scrape/excel
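
Step 2 of the pipeline is a polling loop. A minimal Python sketch of it follows, with the HTTP call injected as a function so the loop can be exercised without a server. The function names, the one-second interval, and the poll cap are assumptions, not API features.

```python
import json
import time
import urllib.request
from typing import Callable


def make_status_fetcher(base: str, key: str) -> Callable[[], dict]:
    """Return a function that GETs /api/bulk-scrape/status and parses the JSON."""
    def fetch() -> dict:
        request = urllib.request.Request(
            f"{base}/api/bulk-scrape/status", headers={"X-API-Key": key})
        with urllib.request.urlopen(request, timeout=30) as response:
            return json.load(response)
    return fetch


def wait_until_finished(fetch_status: Callable[[], dict],
                        interval: float = 1.0,
                        max_polls: int = 3600,
                        sleep: Callable[[float], None] = time.sleep) -> dict:
    """Poll until the status payload reports "running": false, then return it."""
    for _ in range(max_polls):
        status = fetch_status()
        if status.get("running") is False:
            return status
        sleep(interval)
    raise TimeoutError(f"bulk run still running after {max_polls} polls")
```

Usage would look like `final = wait_until_finished(make_status_fetcher(BASE, KEY))`, after which the JSON or Excel download in step 3 can be requested.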

GET /api/bulk-scrape/status

Live progress for the current bulk run. Safe to poll up to once per second.

curl

curl -H "X-API-Key: {KEY}" {BASE}/api/bulk-scrape/status

Response shape

{
  "running":      false,
  "done":         42,
  "total":        42,
  "succeeded":    39,
  "failed":       1,
  "filtered_out": 2,
  "results":      [{ /* clean property records */ }],
  "failures":     [{ "url": "…", "error": "HTTP 503" }],
  "log":          [{ "url": "…", "ok": true }],
  "started":      1736380000.123,
  "finished":     1736380012.456,
  "excel_ready":  true,
  "run_dir":      "2026-05-24_..."
}
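
One way to turn a status payload into a short progress summary is sketched below. It assumes that started and finished are Unix timestamps in seconds, which the sample values suggest but this reference does not state; the function name is hypothetical.

```python
from typing import Optional


def summarize_status(status: dict) -> dict:
    """Reduce a /api/bulk-scrape/status payload to a few headline numbers."""
    total = status.get("total", 0)
    done = status.get("done", 0)
    started = status.get("started")
    finished = status.get("finished")
    elapsed: Optional[float] = None
    if started is not None and finished is not None:
        elapsed = finished - started  # assumption: both are epoch seconds
    accounted = (status.get("succeeded", 0)
                 + status.get("failed", 0)
                 + status.get("filtered_out", 0))
    return {
        "percent_done": round(100.0 * done / total, 1) if total else 0.0,
        "elapsed_seconds": elapsed,
        "accounted_for": accounted,
        "failed_urls": [item.get("url") for item in status.get("failures", [])],
    }
```

In the sample response above, succeeded + failed + filtered_out equals done (39 + 1 + 2 = 42); whether that holds for every run is an assumption worth checking against real payloads.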

GET /api/bulk-scrape/excel

Two-sheet workbook (Properties + Failures). Frozen header, AutoFilter, hyperlinked URL cells.

# Save as the server-suggested filename via -OJ:
curl -H "X-API-Key: {KEY}" -OJ {BASE}/api/bulk-scrape/excel

# Or via query-string auth (handy for direct browser download):
curl -OJ "{BASE}/api/bulk-scrape/excel?api_key={KEY}"

GET /api/bulk-scrape/json

Pretty-printed JSON of the last bulk run — includes results, failures, filters, started/finished timestamps.

curl -H "X-API-Key: {KEY}" {BASE}/api/bulk-scrape/json -o investorlift_bulk.json

Job tracking

Every meaningful API call this server handles is persisted to Supabase as a row in api_jobs. Single-property scrapes additionally land in scrape_results; bulk runs land in bulk_scrape_runs. Recording is fire-and-forget on a small background pool — a Supabase outage delays the audit trail but never blocks a real response.

High-frequency polling endpoints (/status, /history, /log) are intentionally not tracked so the audit table stays signal-rich. Toggle the whole layer off with TRACK_API_JOBS=false if needed.

GET /api/jobs

Recent API jobs, newest first. Defaults to 50 rows; ?limit=N goes up to 500.

curl

# 50 most-recent jobs
curl -H "X-API-Key: {KEY}" {BASE}/api/jobs

# Last 10 jobs
# (URL quoted so the shell does not treat ? as a glob character)
curl -H "X-API-Key: {KEY}" "{BASE}/api/jobs?limit=10"

Response shape

{
  "configured": true,                       # false → Supabase not set up yet
  "tracking":   true,                       # TRACK_API_JOBS env toggle
  "count":      3,
  "jobs": [
    {
      "job_id":        "job-20260524T034512-ab12cd34",
      "endpoint":      "/api/scrape",
      "method":        "POST",
      "status":        "succeeded",
      "http_status":   200,
      "remote_addr":   "127.0.0.1",
      "api_key_hint":  "…aB12",
      "request_summary":  { ... },                # redacted body + query
      "response_summary": { "ok": true, "matched": true },
      "started_at":   "2026-05-24T03:45:12Z",
      "finished_at":  "2026-05-24T03:45:13Z",
      "duration_ms":  812
    }
  ]
}
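
A small Python sketch of working with this endpoint follows: one helper builds the URL with limit kept inside the documented 1 to 500 range, and another picks out jobs that failed or ran slowly. Both helper names and the 5000 ms threshold are assumptions for illustration; how the server treats an out-of-range limit is not stated here, so the sketch clamps on the client.

```python
from typing import List
from urllib.parse import urlencode


def jobs_url(base: str, limit: int = 50) -> str:
    """Build the /api/jobs URL, clamping limit to 1..500."""
    limit = max(1, min(500, limit))
    return f"{base}/api/jobs?{urlencode({'limit': limit})}"


def jobs_needing_attention(payload: dict, slow_ms: int = 5000) -> List[dict]:
    """Return jobs with an HTTP error status or a duration above slow_ms."""
    if not payload.get("configured", False):
        return []  # Supabase not set up yet, so there is nothing to inspect
    return [
        job for job in payload.get("jobs", [])
        if job.get("http_status", 0) >= 400 or job.get("duration_ms", 0) > slow_ms
    ]
```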

GET /api/jobs/<job_id>

One job by id, with its persisted scrape result or bulk run attached when present.

curl

curl -H "X-API-Key: {KEY}" {BASE}/api/jobs/job-20260524T034512-ab12cd34

POST /api/admin/regenerate-key

Rotate the server's API key. You must authenticate with the current key. The response contains the new key once — store it immediately. The old key stops working as soon as this endpoint returns. The new key is persisted to .api_key on the server, so it survives restarts and stays fixed until you call this endpoint again.

curl

curl -X POST {BASE}/api/admin/regenerate-key \
  -H "X-API-Key: {KEY}"

Response

{
  "ok":      true,
  "api_key": "il_…new key here…",
  "warning": "Store this key now — it will not be shown again."
}

Marketplace → Supabase sync

Full-marketplace pipeline: Selenium URL discovery → parallel NUXT extraction → bulk write to your Supabase database. Sync endpoints don't require an API key (they're internal-UI routes), but they do require the server's SUPABASE_URL and SUPABASE_SERVICE_ROLE_KEY env vars to be set.

POST /api/marketplace-sync/run

Trigger a sync. Returns 202 immediately; poll /status for progress.

Request body (all optional)

{
  "discovery_mode": "all",            # "active" | "historical" | "all"
  "states":         ["TX", "FL"],     # omit for all 50
  "trigger":        "API"             # shown in scrape_runs.triggered_by
}

curl

# Full marketplace, all 50 states, both active + historical passes
curl -X POST {BASE}/api/marketplace-sync/run \
  -H "Content-Type: application/json" \
  -d '{"discovery_mode":"all","trigger":"API"}'

# Just TX + FL, active listings only
curl -X POST {BASE}/api/marketplace-sync/run \
  -H "Content-Type: application/json" \
  -d '{"discovery_mode":"active","states":["TX","FL"],"trigger":"API"}'
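
Since every field of the request body is optional, a client can build the payload incrementally and validate discovery_mode before sending. The Python sketch below does that; the function name is hypothetical, and rejecting unknown modes on the client is a design choice of this sketch, not documented server behaviour.

```python
from typing import List, Optional

DISCOVERY_MODES = ("active", "historical", "all")


def build_sync_payload(discovery_mode: Optional[str] = None,
                       states: Optional[List[str]] = None,
                       trigger: Optional[str] = None) -> dict:
    """Build the JSON body for POST /api/marketplace-sync/run."""
    payload: dict = {}
    if discovery_mode is not None:
        if discovery_mode not in DISCOVERY_MODES:
            raise ValueError(f"discovery_mode must be one of {DISCOVERY_MODES}")
        payload["discovery_mode"] = discovery_mode
    if states:
        payload["states"] = list(states)  # omitted entirely means all 50 states
    if trigger is not None:
        payload["trigger"] = trigger      # shown in scrape_runs.triggered_by
    return payload
```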

GET /api/marketplace-sync/status

Snapshot of the current/last sync — phase, per-table progress, recent rows.

curl {BASE}/api/marketplace-sync/status

POST /api/marketplace-sync/stop

Ask the in-flight sync to wind down. Kills the scraper subprocess and rolls sync_control back to idle.

curl -X POST {BASE}/api/marketplace-sync/stop

GET /api/marketplace-sync/probe

Cheap connectivity check against Supabase — verifies the URL + service-role key are valid.

curl {BASE}/api/marketplace-sync/probe

GET /api/marketplace-sync/history

Recent sync runs (last 25), newest first.

curl {BASE}/api/marketplace-sync/history
GET

/api/marketplace-sync/log

Last N lines of the live sync log (default 200, max 800). Plain JSON fallback for environments that buffer SSE.

curl "{BASE}/api/marketplace-sync/log?n=200"

GET /api/info

Self-describing JSON: endpoints, auth requirements, current limits, supported filter keys.

curl {BASE}/api/info

GET /api/states

All 50 US state codes + display names + the canonical property-type list. Useful for building dropdowns.

curl {BASE}/api/states

GET /health

Liveness probe — returns {"ok": true}. No auth required.

curl {BASE}/health

Filter schema

Every filters key is optional. Records that don't match are dropped from results and counted under filtered_out. Filters apply after extraction — they don't reduce HTTP calls.

Key                          Type      Example            Matches when
min_price / max_price        number    100000             price within range
min_arv / max_arv            number    300000             ARV estimate within range
min_arv_pct / max_arv_pct    number    70                 price ÷ ARV percentage in range
min_beds / max_beds          number    3                  bedroom count in range
min_baths / max_baths        number    2                  bathroom count in range
min_sqft / max_sqft          number    1200               square footage in range
min_year / max_year          number    1990               year_built in range
min_days / max_days          number    30                 days_published in range
states                       string[]  ["TX","FL"]        state code matches (case-insensitive)
cities                       string[]  ["Houston"]        city matches (case-insensitive)
zips                         string[]  ["33101"]          zip is in the list
property_types               string[]  ["Single-Family"]  type contains/contained-by any item
exclude_under_contract       bool      true               drops sold/pending listings
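
To make the matching rules concrete, here is a rough client-side approximation in Python, useful for re-filtering already downloaded results. It is not the server's implementation. The record field names other than year_built and days_published are guesses, bounds are assumed to be inclusive, a missing value is assumed to fail an active bound, and property_types and exclude_under_contract are left out.

```python
# Assumption: filter suffix -> record field. Only year_built and
# days_published are named in the table; the rest are guesses.
RANGE_FIELDS = {
    "price": "price", "arv": "arv", "beds": "beds", "baths": "baths",
    "sqft": "sqft", "year": "year_built", "days": "days_published",
}


def in_range(value, low, high) -> bool:
    """True when value lies within the inclusive [low, high] bounds that are set."""
    if low is None and high is None:
        return True
    if value is None:
        return False  # assumption: a missing value fails an active bound
    return (low is None or value >= low) and (high is None or value <= high)


def matches(record: dict, filters: dict) -> bool:
    """Approximate the filter table for one record."""
    for suffix, field in RANGE_FIELDS.items():
        if not in_range(record.get(field),
                        filters.get(f"min_{suffix}"), filters.get(f"max_{suffix}")):
            return False
    price, arv = record.get("price"), record.get("arv")
    pct = 100.0 * price / arv if price is not None and arv else None
    if not in_range(pct, filters.get("min_arv_pct"), filters.get("max_arv_pct")):
        return False
    for key, field in (("states", "state"), ("cities", "city")):
        wanted = filters.get(key)
        if wanted and str(record.get(field, "")).lower() not in {w.lower() for w in wanted}:
            return False
    zips = filters.get("zips")
    if zips and record.get("zip") not in zips:
        return False
    return True
```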

Error codes

Every error returns {"error":"…","kind":"…"}.

400 Bad Request

Missing/invalid url or mode; non-investorlift.com URL.

401 Unauthorized

No or invalid API key — pass X-API-Key.

409 Conflict

A bulk run is already in progress — poll /api/bulk-scrape/status first.

413 Payload Too Large

Too many URLs in one bulk call — split into batches of 5,000 or fewer.

422 Unprocessable

Page loaded but no NUXT data was found — the listing may be offline or behind a login.

502 Bad Gateway

Upstream HTTP failure after retries — wait 30s and retry.
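
The 502 advice (wait 30s and retry) can be sketched as a small Python wrapper. The send callable, the attempt count, and the function name are assumptions; send stands in for whatever performs the request and returns the HTTP status with the parsed JSON body.

```python
import time
from typing import Callable, Tuple


def call_with_retry(send: Callable[[], Tuple[int, dict]],
                    attempts: int = 3,
                    wait_seconds: float = 30.0,
                    sleep: Callable[[float], None] = time.sleep) -> Tuple[int, dict]:
    """Call send(), retrying after wait_seconds while it keeps returning 502."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    status, body = send()
    for _ in range(attempts - 1):
        if status != 502:
            break
        sleep(wait_seconds)
        status, body = send()
    return status, body
```

Other statuses are returned to the caller untouched: a 400 or 401 will not succeed on retry, and a 409 is better handled by polling /api/bulk-scrape/status until the current run finishes.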