
Endpoint

GET https://api.crawlbase.com/storage
# Two operations: store (write) and retrieve (read).
# Storage is implicit — set store=true on a Crawling API call to write.
# Use this endpoint to read.

Storing pages

You don't call this endpoint to store. Instead, add store=true to any Crawling API call. The page gets stored automatically and you receive an rid in the response.

curl 'https://api.crawlbase.com/?token=YOUR_TOKEN' \
  --data-urlencode 'url=https://example.com' \
  --data-urlencode 'store=true' -G

# Response includes rid: a1B2c3D4e5F6
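
The same store call from the Python library, as a sketch: it assumes the crawlbase package's CrawlingAPI and that the rid surfaces in the response headers (inspect a live response to confirm the header key).

from crawlbase import CrawlingAPI

api = CrawlingAPI({'token': 'YOUR_TOKEN'})
# store=true piggybacks on a normal crawl; the page is written as a side effect.
response = api.get('https://example.com', {'store': 'true'})

# Header key is an assumption; check response['headers'] on a live call.
print(response['headers'].get('rid'))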

Retrieving pages

By RID

# cURL
curl 'https://api.crawlbase.com/storage?token=YOUR_TOKEN&rid=a1B2c3D4e5F6'

# Python
from crawlbase import StorageAPI

api = StorageAPI({'token': 'YOUR_TOKEN'})
res = api.get(rid='a1B2c3D4e5F6')
print(res['body'])

// Node.js — top-level await isn't available in CommonJS, so wrap the call
const { StorageAPI } = require('crawlbase');
const api = new StorageAPI({ token: 'YOUR_TOKEN' });

(async () => {
  const res = await api.get({ rid: 'a1B2c3D4e5F6' });
  console.log(res.body);
})();

By URL

Look up a stored page by its original URL. Returns the most recent stored version.

curl 'https://api.crawlbase.com/storage?token=YOUR_TOKEN&url=https%3A%2F%2Fexample.com'
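
Or from Python, a sketch with plain requests; passing the URL through params handles the URL-encoding for you:

import requests

res = requests.get('https://api.crawlbase.com/storage',
                   params={'token': 'YOUR_TOKEN', 'url': 'https://example.com'})
print(res.text[:200])  # most recent stored version of the page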

Parameters

token
string, required
Your Crawlbase token.

rid
string, one of rid | url
Request identifier returned when the page was stored. Pass either rid or url.

url
string, one of rid | url
Original URL. Returns the most recent stored version. URL-encode it.

format
html | json, default: html
Response envelope. json wraps body and metadata.
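
A sketch of reading the json envelope with plain requests. The doc above only promises body plus metadata, so inspect the envelope rather than trusting specific metadata field names:

import requests

res = requests.get('https://api.crawlbase.com/storage',
                   params={'token': 'YOUR_TOKEN',
                           'rid': 'a1B2c3D4e5F6',
                           'format': 'json'})
data = res.json()
print(sorted(data))    # see which metadata fields the envelope carries
html = data['body']    # the stored page itself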

Bulk retrieve

Pull up to 100 stored pages in one round-trip by RID. POST a JSON body with the list and (optionally) ask the server to delete each entry as it's returned — useful for "drain the queue" pipelines that don't need to keep storage warm.

POST https://api.crawlbase.com/storage/bulk
curl -X POST 'https://api.crawlbase.com/storage/bulk?token=YOUR_TOKEN' \
  -H 'Content-Type: application/json' \
  -d '{ "rids": ["RID1","RID2","RID3"], "auto_delete": true }'
rids
string[], required
Array of RIDs to fetch. Maximum 100 per request — anything past 100 is silently dropped.

auto_delete
boolean, default: false
When true, each successfully returned entry is deleted from storage in the same call. Use this when you're draining a queue of one-shot results and don't need them retained.

The response is a JSON array, one object per returned RID. The body field is base64-encoded and gzip-compressed — base64-decode then gzip-inflate to get the original page.

[
  {
    "stored_at": "2021-03-01T14:22:58+02:00",
    "original_status": 200,
    "pc_status": 200,
    "rid": "RID1",
    "url": "https://example.com/a",
    "body": "H4sIAAAA…"  // base64(gzip(html))
  },
  {
    "stored_at": "2021-03-01T14:30:51+02:00",
    "original_status": 200,
    "pc_status": 200,
    "rid": "RID2",
    "url": "https://example.com/b",
    "body": "H4sIAAAA…"
  }
]
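
A sketch of the decode step with plain requests and the standard library (the RIDs are placeholders):

import base64
import gzip
import requests

res = requests.post('https://api.crawlbase.com/storage/bulk',
                    params={'token': 'YOUR_TOKEN'},
                    json={'rids': ['RID1', 'RID2']})
for entry in res.json():
    # Reverse the encoding order: base64-decode first, then gzip-inflate.
    html = gzip.decompress(base64.b64decode(entry['body'])).decode('utf-8')
    print(entry['rid'], entry['stored_at'], len(html))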

Bulk delete

Delete a batch of RIDs in one call. Returns a per-RID status so you can spot the ones that were already gone or failed.

POST https://api.crawlbase.com/storage/bulk_delete
curl -X POST 'https://api.crawlbase.com/storage/bulk_delete?token=YOUR_TOKEN' \
  -H 'Content-Type: application/json' \
  -d '{ "rids": ["RID1","RID2","RID3"] }'

Response is a JSON array, one entry per submitted RID. status: true means the entry was deleted; status: false with result: "Not Found" means the RID didn't exist (already cleaned up, expired, or never written).

[
  { "rid": "RID1", "result": "Deleted",   "status": true  },
  { "rid": "RID2", "result": "Not Found", "status": false },
  { "rid": "RID3", "result": "Failed",    "status": false }
]
Deletion is irreversible

Double-check the RID list before sending. There is no soft-delete or undo — if you need a recoverable workflow, retrieve with auto_delete=false first and only call /bulk_delete once you've persisted the body locally.
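
A sketch of that recoverable workflow with plain requests: fetch with auto_delete off, persist each body, then bulk-delete only what made it to disk and check the per-RID statuses.

import base64
import gzip
import pathlib
import requests

TOKEN = 'YOUR_TOKEN'
rids = ['RID1', 'RID2', 'RID3']

res = requests.post('https://api.crawlbase.com/storage/bulk',
                    params={'token': TOKEN},
                    json={'rids': rids, 'auto_delete': False})

persisted = []
for entry in res.json():
    html = gzip.decompress(base64.b64decode(entry['body']))
    pathlib.Path(entry['rid'] + '.html').write_bytes(html)
    persisted.append(entry['rid'])

# Delete only what's safely on disk, then check each RID's status.
res = requests.post('https://api.crawlbase.com/storage/bulk_delete',
                    params={'token': TOKEN},
                    json={'rids': persisted})
failed = [e['rid'] for e in res.json() if not e['status']]
if failed:
    print('inspect these:', failed)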

Delete a single page

Drop one entry from storage by RID. Use DELETE /storage with the RID on the query string.

DELETE https://api.crawlbase.com/storage
curl -X DELETE 'https://api.crawlbase.com/storage?token=YOUR_TOKEN&rid=RID'

Three response shapes:

Outcome                    Body
Found and deleted          {"success": "The Storage item has been deleted successfully"}
Found but delete failed    {"error": "The Storage item could not be deleted"}
Not in storage             {"error": "Not Found"}
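
A sketch that maps the three shapes onto outcomes, using plain requests:

import requests

res = requests.delete('https://api.crawlbase.com/storage',
                      params={'token': 'YOUR_TOKEN', 'rid': 'RID'})
body = res.json()
if 'success' in body:
    print('deleted')
elif body.get('error') == 'Not Found':
    print('nothing to delete')
else:
    print('delete failed:', body.get('error'))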

List RIDs

Page through the RIDs in your storage area — the inventory call. For datasets larger than a single response, use scroll-based pagination (scroll=true seeds a scroll session and returns a scroll_id you replay on subsequent calls).

GET https://api.crawlbase.com/storage/rids
# First page
curl 'https://api.crawlbase.com/storage/rids?token=YOUR_TOKEN&limit=100&scroll=true'

# Next page — replay the scroll_id from the previous response
curl 'https://api.crawlbase.com/storage/rids?token=YOUR_TOKEN&scroll_id=dXVlcnlUaGVuRmV0Y2g7…'
limit
integer, optional
Maximum number of RIDs to return per call. Cap is 10000. No default — set this explicitly.

scroll
boolean, default: false
When true, the response includes a scroll_id you can replay to fetch the next page. Without it you only get the first page.

scroll_id
string, optional
Token from a previous response. Replay it to advance the scroll. Don't pass scroll=true on follow-ups — only on the first call.

scroll_order
asc | desc, default: desc
Order RIDs by stored timestamp. Default is newest first.
{
  "rids": ["RID1", "RID2", "RID3", "..."],
  "scroll_id": "dXVlcnlUaGVuRmV0Y2g7NTs1NDpDV…"
}
Scroll sessions expire

A scroll_id is good for ~15 seconds of inactivity. If you see "Scroll session has expired or is invalid", start over with a fresh scroll=true request — the cursor's gone.
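
A paging sketch with plain requests: seed the scroll, then replay the scroll_id until a page comes back empty (the stop condition is an assumption; verify how the last page is signalled). If the session expires mid-loop, start over with a fresh scroll=true call.

import requests

ENDPOINT = 'https://api.crawlbase.com/storage/rids'

def all_rids(token, limit=100):
    # First call seeds the scroll session.
    params = {'token': token, 'limit': limit, 'scroll': 'true'}
    while True:
        data = requests.get(ENDPOINT, params=params).json()
        rids = data.get('rids', [])
        if not rids:
            break   # assumed: an empty page means the scroll is drained
        yield from rids
        # Follow-ups replay only the scroll_id, never scroll=true.
        params = {'token': token, 'scroll_id': data['scroll_id']}

for rid in all_rids('YOUR_TOKEN'):
    print(rid)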

Total count

Single integer: how many pages are currently in your storage area.

GET https://api.crawlbase.com/storage/total_count
curl 'https://api.crawlbase.com/storage/total_count?token=YOUR_TOKEN'

# Response
# { "totalCount": 5491078 }

Retention & pricing

  • Stored pages are kept for 14 days by default. Longer retention is available on enterprise plans.
  • Each store=true call counts as a single request — no extra charge.
  • Retrieval (this endpoint) is free. Read as many times as you need.
  • If a page is re-crawled with store=true, the new version replaces the old.
When storage shines

Audit trails ("what did the page say when we crawled it?"), reprocessing pipelines (re-parse stored HTML when your scraper logic improves), and serving cached results to readers without re-crawling.