
Endpoint

GET https://api.crawlbase.com/storage
# Two operations: store (write) and retrieve (read).
# Storage is implicit — set store=true on a Crawling API call to write.
# Use this endpoint to read.

Storing pages

You don't call this endpoint to store. Instead, add store=true to any Crawling API call. The page gets stored automatically and you receive an rid in the response.

curl 'https://api.crawlbase.com/?token=YOUR_TOKEN' \
  --data-urlencode 'url=https://example.com' \
  --data-urlencode 'store=true' -G

# Response includes rid: a1B2c3D4e5F6
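
The same store call from the Python library, as a sketch: it assumes the crawlbase package's CrawlingAPI and that the rid surfaces in the response headers (inspect a live response to confirm the header key).

from crawlbase import CrawlingAPI

api = CrawlingAPI({'token': 'YOUR_TOKEN'})
# store=true piggybacks on a normal crawl; the page is written as a side effect.
response = api.get('https://example.com', {'store': 'true'})

# Header key is an assumption; check response['headers'] on a live call.
print(response['headers'].get('rid'))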

Retrieving pages

By RID

# cURL
curl 'https://api.crawlbase.com/storage?token=YOUR_TOKEN&rid=a1B2c3D4e5F6'

# Python
from crawlbase import StorageAPI

api = StorageAPI({'token': 'YOUR_TOKEN'})
res = api.get(rid='a1B2c3D4e5F6')
print(res['body'])

// Node.js — top-level await isn't available in CommonJS, so wrap the call
const { StorageAPI } = require('crawlbase');
const api = new StorageAPI({ token: 'YOUR_TOKEN' });

(async () => {
  const res = await api.get({ rid: 'a1B2c3D4e5F6' });
  console.log(res.body);
})();

By URL

Look up a stored page by its original URL. Returns the most recent stored version.

curl 'https://api.crawlbase.com/storage?token=YOUR_TOKEN&url=https%3A%2F%2Fexample.com'
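
Or from Python, a sketch with plain requests; passing the URL through params handles the URL-encoding for you:

import requests

res = requests.get('https://api.crawlbase.com/storage',
                   params={'token': 'YOUR_TOKEN', 'url': 'https://example.com'})
print(res.text[:200])  # most recent stored version of the page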

Parameters

token
string, required
Your Crawlbase token.

rid
string, one of rid | url
Request identifier returned when the page was stored. Pass either rid or url.

url
string, one of rid | url
Original URL. Returns the most recent stored version. URL-encode it.

format
html | json, default: html
Response envelope. json wraps body and metadata.
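
A sketch of reading the json envelope with plain requests. The doc above only promises body plus metadata, so inspect the envelope rather than trusting specific metadata field names:

import requests

res = requests.get('https://api.crawlbase.com/storage',
                   params={'token': 'YOUR_TOKEN',
                           'rid': 'a1B2c3D4e5F6',
                           'format': 'json'})
data = res.json()
print(sorted(data))    # see which metadata fields the envelope carries
html = data['body']    # the stored page itself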

Bulk retrieve

Pull up to 100 stored pages in one round-trip by RID. POST a JSON body with the list and (optionally) ask the server to delete each entry as it's returned — useful for "drain the queue" pipelines that don't need to keep storage warm.

POST https://api.crawlbase.com/storage/bulk
curl -X POST 'https://api.crawlbase.com/storage/bulk?token=YOUR_TOKEN' \
  -H 'Content-Type: application/json' \
  -d '{ "rids": ["RID1","RID2","RID3"], "auto_delete": true }'
rids
string[], required
Array of RIDs to fetch. Maximum 100 per request — anything past 100 is silently dropped.

auto_delete
boolean, default: false
When true, each successfully returned entry is deleted from storage in the same call. Use this when you're draining a queue of one-shot results and don't need them retained.

The response is a JSON array, one object per returned RID. The body field is base64-encoded and gzip-compressed — base64-decode then gzip-inflate to get the original page.

[
  {
    "stored_at": "2021-03-01T14:22:58+02:00",
    "original_status": 200,
    "pc_status": 200,
    "rid": "RID1",
    "url": "https://example.com/a",
    "body": "H4sIAAAA…"  // base64(gzip(html))
  },
  {
    "stored_at": "2021-03-01T14:30:51+02:00",
    "original_status": 200,
    "pc_status": 200,
    "rid": "RID2",
    "url": "https://example.com/b",
    "body": "H4sIAAAA…"
  }
]
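
A sketch of the decode step with plain requests and the standard library (the RIDs are placeholders):

import base64
import gzip
import requests

res = requests.post('https://api.crawlbase.com/storage/bulk',
                    params={'token': 'YOUR_TOKEN'},
                    json={'rids': ['RID1', 'RID2']})
for entry in res.json():
    # Reverse the encoding order: base64-decode first, then gzip-inflate.
    html = gzip.decompress(base64.b64decode(entry['body'])).decode('utf-8')
    print(entry['rid'], entry['stored_at'], len(html))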

Bulk delete

Delete a batch of RIDs in one call. Returns a per-RID status so you can spot the ones that were already gone or failed.

POST https://api.crawlbase.com/storage/bulk_delete
curl -X POST 'https://api.crawlbase.com/storage/bulk_delete?token=YOUR_TOKEN' \
  -H 'Content-Type: application/json' \
  -d '{ "rids": ["RID1","RID2","RID3"] }'

Response is a JSON array, one entry per submitted RID. status: true means the entry was deleted; status: false with result: "Not Found" means the RID didn't exist (already cleaned up, expired, or never written).

[
  { "rid": "RID1", "result": "Deleted",   "status": true  },
  { "rid": "RID2", "result": "Not Found", "status": false },
  { "rid": "RID3", "result": "Failed",    "status": false }
]
Deletion is irreversible

Double-check the RID list before sending. There is no soft-delete or undo — if you need a recoverable workflow, retrieve with auto_delete=false first and only call /bulk_delete once you've persisted the body locally.
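
A sketch of that recoverable workflow with plain requests: fetch with auto_delete off, persist each body, then bulk-delete only what made it to disk and check the per-RID statuses.

import base64
import gzip
import pathlib
import requests

TOKEN = 'YOUR_TOKEN'
rids = ['RID1', 'RID2', 'RID3']

res = requests.post('https://api.crawlbase.com/storage/bulk',
                    params={'token': TOKEN},
                    json={'rids': rids, 'auto_delete': False})

persisted = []
for entry in res.json():
    html = gzip.decompress(base64.b64decode(entry['body']))
    pathlib.Path(entry['rid'] + '.html').write_bytes(html)
    persisted.append(entry['rid'])

# Delete only what's safely on disk, then check each RID's status.
res = requests.post('https://api.crawlbase.com/storage/bulk_delete',
                    params={'token': TOKEN},
                    json={'rids': persisted})
failed = [e['rid'] for e in res.json() if not e['status']]
if failed:
    print('inspect these:', failed)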

Delete a single page

Drop one entry from storage by RID. Use DELETE /storage with the RID on the query string.

DELETE https://api.crawlbase.com/storage
curl -X DELETE 'https://api.crawlbase.com/storage?token=YOUR_TOKEN&rid=RID'

Three response shapes:

Outcome                    Body
Found and deleted          {"success": "The Storage item has been deleted successfully"}
Found but delete failed    {"error": "The Storage item could not be deleted"}
Not in storage             {"error": "Not Found"}
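
A sketch that maps the three shapes onto outcomes, using plain requests:

import requests

res = requests.delete('https://api.crawlbase.com/storage',
                      params={'token': 'YOUR_TOKEN', 'rid': 'RID'})
body = res.json()
if 'success' in body:
    print('deleted')
elif body.get('error') == 'Not Found':
    print('nothing to delete')
else:
    print('delete failed:', body.get('error'))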

List RIDs

Page through the RIDs in your storage area — the inventory call. For datasets larger than a single response, use scroll-based pagination (scroll=true seeds a scroll session and returns a scroll_id you replay on subsequent calls).

GET https://api.crawlbase.com/storage/rids
# First page
curl 'https://api.crawlbase.com/storage/rids?token=YOUR_TOKEN&limit=100&scroll=true'

# Next page — replay the scroll_id from the previous response
curl 'https://api.crawlbase.com/storage/rids?token=YOUR_TOKEN&scroll_id=dXVlcnlUaGVuRmV0Y2g7…'
limit
integer, optional
Maximum number of RIDs to return per call. Cap is 10000. No default — set this explicitly.

scroll
boolean, default: false
When true, the response includes a scroll_id you can replay to fetch the next page. Without it you only get the first page.

scroll_id
string, optional
Token from a previous response. Replay it to advance the scroll. Don't pass scroll=true on follow-ups — only on the first call.

scroll_order
asc | desc, default: desc
Order RIDs by stored timestamp. Default is newest first.
{
  "rids": ["RID1", "RID2", "RID3", "..."],
  "scroll_id": "dXVlcnlUaGVuRmV0Y2g7NTs1NDpDV…"
}
Scroll sessions expire

A scroll_id is good for ~15 seconds of inactivity. If you see "Scroll session has expired or is invalid", start over with a fresh scroll=true request — the cursor's gone.
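
A paging sketch with plain requests: seed the scroll, then replay the scroll_id until a page comes back empty (the stop condition is an assumption; verify how the last page is signalled). If the session expires mid-loop, start over with a fresh scroll=true call.

import requests

ENDPOINT = 'https://api.crawlbase.com/storage/rids'

def all_rids(token, limit=100):
    # First call seeds the scroll session.
    params = {'token': token, 'limit': limit, 'scroll': 'true'}
    while True:
        data = requests.get(ENDPOINT, params=params).json()
        rids = data.get('rids', [])
        if not rids:
            break   # assumed: an empty page means the scroll is drained
        yield from rids
        # Follow-ups replay only the scroll_id, never scroll=true.
        params = {'token': token, 'scroll_id': data['scroll_id']}

for rid in all_rids('YOUR_TOKEN'):
    print(rid)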

Total count

Single integer: how many pages are currently in your storage area.

GET https://api.crawlbase.com/storage/total_count
curl 'https://api.crawlbase.com/storage/total_count?token=YOUR_TOKEN'

# Response
# { "totalCount": 5491078 }

Retention & pricing

  • Stored pages are kept for 14 days by default. Longer retention is available on enterprise plans.
  • Each store=true call counts as a single request — no extra charge.
  • Retrieval (this endpoint) is free. Read as many times as you need.
  • If a page is re-crawled with store=true, the new version replaces the old.
When storage shines

Audit trails ("what did the page say when we crawled it?"), reprocessing pipelines (re-parse stored HTML when your scraper logic improves), and serving cached results to readers without re-crawling.