// OVERVIEW

The Web Crawler API allows you to programmatically crawl web pages and extract structured data. Authenticate using an API key in the Authorization header. Perfect for AI agents, automation scripts, and CLI tools.

Base URL: https://your-domain.com/api/v1

// AUTHENTICATION

Include your API key in the Authorization header:

Authorization: Bearer wc_your_api_key_here

Generate API keys at /keys.

// POST /api/v1/crawl

Execute a crawl job. Each request costs depth x url_count credits (see PRICING).

REQUEST BODY

Field    Type      Required  Description
urls     string[]  yes       Array of URLs to crawl
depth    number    no        Link depth (1-10, default: 1)
format   string    no        json | xml | csv | txt (default: json)
fields   object[]  no        Extraction fields (see below)
options  object    no        Crawl options

FIELDS SCHEMA

{
  "id": "unique_id",
  "name": "field_name",
  "selector": "css_selector",
  "type": "text | link | image | attribute | html",
  "attribute": "attr_name"  // only for type: attribute
}
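
For example, a field that pulls the href attribute from every anchor on a page might look like this (the id, name, and selector here are illustrative):

{
  "id": "hrefs",
  "name": "hrefs",
  "selector": "a",
  "type": "attribute",
  "attribute": "href"
}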

RESPONSE

{
  "success": true,
  "credits_used": 3,
  "credits_remaining": 47,
  "data": [...],
  "total_pages": 3,
  "total_items": 15,
  "duration": 2.4,
  "format": "json",
  "errors": []
}

// GET /api/v1/balance

Check your current credit balance.

{ "credits": 50 }

// GET /api/v1/cost?depth=2&urls=3

Calculate crawl cost without executing. Public endpoint (no API key required).

{ "cost": 6, "depth": 2, "url_count": 3 }

// EXAMPLES

CURL

curl -X POST https://your-domain.com/api/v1/crawl \
  -H "Authorization: Bearer wc_your_key_here" \
  -H "Content-Type: application/json" \
  -d '{
    "urls": ["https://example.com"],
    "depth": 2,
    "format": "json",
    "fields": [
      {"id": "title", "name": "title", "selector": "h1", "type": "text"},
      {"id": "links", "name": "links", "selector": "a", "type": "link"}
    ]
  }'

PYTHON

import requests

API_KEY = "wc_your_key_here"
BASE_URL = "https://your-domain.com/api/v1"

response = requests.post(
    f"{BASE_URL}/crawl",
    headers={"Authorization": f"Bearer {API_KEY}"},
    json={
        "urls": ["https://example.com"],
        "depth": 1,
        "format": "json",
        "fields": [
            {"id": "title", "name": "title", "selector": "h1", "type": "text"}
        ]
    }
)

data = response.json()
print(f"Crawled {data['total_pages']} pages, {data['total_items']} items")
print(f"Credits remaining: {data['credits_remaining']}")

JAVASCRIPT (Node.js / AI Agent)

const API_KEY = "wc_your_key_here";
const BASE_URL = "https://your-domain.com/api/v1";

const response = await fetch(`${BASE_URL}/crawl`, {
  method: "POST",
  headers: {
    "Authorization": `Bearer ${API_KEY}`,
    "Content-Type": "application/json",
  },
  body: JSON.stringify({
    urls: ["https://example.com"],
    depth: 1,
    format: "json",
    fields: [
      { id: "title", name: "title", selector: "h1", type: "text" }
    ]
  }),
});

const data = await response.json();
console.log(`Credits remaining: ${data.credits_remaining}`);

OPENAI / CLAUDE FUNCTION CALLING SCHEMA

{
  "name": "web_crawl",
  "description": "Crawl web pages and extract structured data",
  "parameters": {
    "type": "object",
    "properties": {
      "urls": {
        "type": "array",
        "items": { "type": "string" },
        "description": "URLs to crawl"
      },
      "depth": {
        "type": "integer",
        "minimum": 1,
        "maximum": 10,
        "description": "How many link levels deep to crawl"
      },
      "format": {
        "type": "string",
        "enum": ["json", "xml", "csv", "txt"],
        "description": "Output format"
      },
      "fields": {
        "type": "array",
        "items": {
          "type": "object",
          "properties": {
            "name": { "type": "string" },
            "selector": { "type": "string" },
            "type": { "type": "string", "enum": ["text","link","image","attribute","html"] }
          }
        },
        "description": "What data to extract from each page"
      }
    },
    "required": ["urls"]
  }
}

// ERROR_CODES

HTTP  Error                 Description
401   unauthorized          Invalid or missing API key
402   insufficient_credits  Not enough credits for this crawl
400   validation_error      Invalid request parameters
502   crawl_failed          Crawl engine error (credits refunded)
503   engine_unavailable    Crawl engine offline (credits refunded)
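
One way to branch on these codes from Python (a sketch; the retry guidance follows from the refund notes above, and the exact error body shape is not specified here):

import requests

API_KEY = "wc_your_key_here"
BASE_URL = "https://your-domain.com/api/v1"

def crawl(payload: dict) -> dict:
    resp = requests.post(
        f"{BASE_URL}/crawl",
        headers={"Authorization": f"Bearer {API_KEY}"},
        json=payload,
    )
    if resp.status_code in (502, 503):
        # crawl_failed / engine_unavailable: credits were refunded,
        # so it is safe to retry this request later
        raise RuntimeError(f"engine error {resp.status_code}, retry later")
    resp.raise_for_status()  # raises on 400 / 401 / 402
    return resp.json()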

// PRICING

Cost = depth x url_count credits per crawl request.

Example          Cost
1 URL, depth 1   1 credit
3 URLs, depth 2  6 credits
5 URLs, depth 5  25 credits
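
If you prefer estimating locally instead of calling /cost, the formula maps directly to code:

def estimate_cost(url_count: int, depth: int = 1) -> int:
    # cost = depth x url_count, per the formula and table above
    return depth * url_count

assert estimate_cost(3, depth=2) == 6  # matches the "3 URLs, depth 2" row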