// OVERVIEW

The Web Crawler API allows you to programmatically crawl web pages and extract structured data. Authenticate using an API key in the Authorization header. Perfect for AI agents, automation scripts, and CLI tools.

Base URL: https://your-domain.com/api/v1

// AUTHENTICATION

Include your API key in the Authorization header:

Authorization: Bearer wc_your_api_key_here

Generate API keys at /keys.

// POST /api/v1/crawl

Execute a crawl job. Each request costs depth x url_count credits (see PRICING).

REQUEST BODY

Field    Type      Required  Description
urls     string[]  yes       Array of URLs to crawl
depth    number    no        Link depth (1-10, default: 1)
format   string    no        json | xml | csv | txt (default: json)
fields   object[]  no        Extraction fields (see below)
options  object    no        Crawl options

FIELDS SCHEMA

{
  "id": "unique_id",
  "name": "field_name",
  "selector": "css_selector",
  "type": "text | link | image | attribute | html",
  "attribute": "attr_name"  // only for type: attribute
}
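
For example, a field that pulls the href attribute from every anchor on a page might look like this (the id, name, and selector here are illustrative):

{
  "id": "hrefs",
  "name": "hrefs",
  "selector": "a",
  "type": "attribute",
  "attribute": "href"
}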

RESPONSE

{
  "success": true,
  "credits_used": 3,
  "credits_remaining": 47,
  "data": [...],
  "total_pages": 3,
  "total_items": 15,
  "duration": 2.4,
  "format": "json",
  "errors": []
}

// GET /api/v1/balance

Check your current credit balance.

{ "credits": 50 }

// GET /api/v1/cost?depth=2&urls=3

Calculate crawl cost without executing. Public endpoint (no API key required).

{ "cost": 6, "depth": 2, "url_count": 3 }

// EXAMPLES

CURL

curl -X POST https://your-domain.com/api/v1/crawl \
  -H "Authorization: Bearer wc_your_key_here" \
  -H "Content-Type: application/json" \
  -d '{
    "urls": ["https://example.com"],
    "depth": 2,
    "format": "json",
    "fields": [
      {"id": "title", "name": "title", "selector": "h1", "type": "text"},
      {"id": "links", "name": "links", "selector": "a", "type": "link"}
    ]
  }'

PYTHON

import requests

API_KEY = "wc_your_key_here"
BASE_URL = "https://your-domain.com/api/v1"

response = requests.post(
    f"{BASE_URL}/crawl",
    headers={"Authorization": f"Bearer {API_KEY}"},
    json={
        "urls": ["https://example.com"],
        "depth": 1,
        "format": "json",
        "fields": [
            {"id": "title", "name": "title", "selector": "h1", "type": "text"}
        ]
    }
)

data = response.json()
print(f"Crawled {data['total_pages']} pages, {data['total_items']} items")
print(f"Credits remaining: {data['credits_remaining']}")

JAVASCRIPT (Node.js / AI Agent)

const API_KEY = "wc_your_key_here";
const BASE_URL = "https://your-domain.com/api/v1";

const response = await fetch(`${BASE_URL}/crawl`, {
  method: "POST",
  headers: {
    "Authorization": `Bearer ${API_KEY}`,
    "Content-Type": "application/json",
  },
  body: JSON.stringify({
    urls: ["https://example.com"],
    depth: 1,
    format: "json",
    fields: [
      { id: "title", name: "title", selector: "h1", type: "text" }
    ]
  }),
});

const data = await response.json();
console.log(`Credits remaining: ${data.credits_remaining}`);

OPENAI / CLAUDE FUNCTION CALLING SCHEMA

{
  "name": "web_crawl",
  "description": "Crawl web pages and extract structured data",
  "parameters": {
    "type": "object",
    "properties": {
      "urls": {
        "type": "array",
        "items": { "type": "string" },
        "description": "URLs to crawl"
      },
      "depth": {
        "type": "integer",
        "minimum": 1,
        "maximum": 10,
        "description": "How many link levels deep to crawl"
      },
      "format": {
        "type": "string",
        "enum": ["json", "xml", "csv", "txt"],
        "description": "Output format"
      },
      "fields": {
        "type": "array",
        "items": {
          "type": "object",
          "properties": {
            "name": { "type": "string" },
            "selector": { "type": "string" },
            "type": { "type": "string", "enum": ["text","link","image","attribute","html"] }
          }
        },
        "description": "What data to extract from each page"
      }
    },
    "required": ["urls"]
  }
}

// ERROR_CODES

HTTP  Error                 Description
401   unauthorized          Invalid or missing API key
402   insufficient_credits  Not enough credits for this crawl
400   validation_error      Invalid request parameters
502   crawl_failed          Crawl engine error (credits refunded)
503   engine_unavailable    Crawl engine offline (credits refunded)
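
One way to branch on these codes from Python (a sketch; the retry guidance follows from the refund notes above, and the exact error body shape is not specified here):

import requests

API_KEY = "wc_your_key_here"
BASE_URL = "https://your-domain.com/api/v1"

def crawl(payload: dict) -> dict:
    resp = requests.post(
        f"{BASE_URL}/crawl",
        headers={"Authorization": f"Bearer {API_KEY}"},
        json=payload,
    )
    if resp.status_code in (502, 503):
        # crawl_failed / engine_unavailable: credits were refunded,
        # so it is safe to retry this request later
        raise RuntimeError(f"engine error {resp.status_code}, retry later")
    resp.raise_for_status()  # raises on 400 / 401 / 402
    return resp.json()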

// PRICING

Cost = depth x url_count credits per crawl request.

Example          Cost
1 URL, depth 1   1 credit
3 URLs, depth 2  6 credits
5 URLs, depth 5  25 credits
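
If you prefer estimating locally instead of calling /cost, the formula maps directly to code:

def estimate_cost(url_count: int, depth: int = 1) -> int:
    # cost = depth x url_count, per the formula and table above
    return depth * url_count

assert estimate_cost(3, depth=2) == 6  # matches the "3 URLs, depth 2" row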