Cloud Storage
Store crawled pages in Crawlbase's managed storage. Fetch them later by URL or RID. Skip the database, skip the S3 bucket, skip the cron job that wires them together.
Endpoint
https://api.crawlbase.com/storage

# Two operations: store (write) and retrieve (read).
# Storage is implicit — set store=true on a Crawling API call to write.
# Use this endpoint to read.

Storing pages
You don't call this endpoint to store. Instead, add store=true to any Crawling API call. The page gets stored automatically and you receive an rid in the response.
curl 'https://api.crawlbase.com/?token=YOUR_TOKEN' \
--data-urlencode 'url=https://example.com' \
--data-urlencode 'store=true' -G
# Response includes rid: a1B2c3D4e5F6
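For symmetry with the SDK snippets below, here is a minimal Python sketch of the write path. It assumes the crawlbase package's CrawlingAPI and that the rid surfaces in the response headers; verify both against your SDK version.

from crawlbase import CrawlingAPI

api = CrawlingAPI({'token': 'YOUR_TOKEN'})

# store=true asks the Crawling API to persist the fetched page
res = api.get('https://example.com', {'store': 'true'})

# assumption: the rid comes back in the response headers; adjust if your
# SDK version exposes it elsewhere (e.g. in a JSON body)
print('stored as', res['headers'].get('rid'))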
Retrieving pages
By RID
curl 'https://api.crawlbase.com/storage?token=YOUR_TOKEN&rid=a1B2c3D4e5F6'

from crawlbase import StorageAPI

api = StorageAPI({'token': 'YOUR_TOKEN'})
res = api.get(rid='a1B2c3D4e5F6')
print(res['body'])

const { StorageAPI } = require('crawlbase');
const api = new StorageAPI({ token: 'YOUR_TOKEN' });

(async () => {
  const res = await api.get({ rid: 'a1B2c3D4e5F6' });
  console.log(res.body);
})();

By URL
Look up a stored page by its original URL. Returns the most recent stored version.
curl 'https://api.crawlbase.com/storage?token=YOUR_TOKEN&url=https%3A%2F%2Fexample.com'
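If you'd rather skip the SDK, the lookup is a plain GET; a sketch in Python with requests:

import requests

# fetch the most recent stored version of this URL
res = requests.get('https://api.crawlbase.com/storage', params={
    'token': 'YOUR_TOKEN',
    'url': 'https://example.com',
})
res.raise_for_status()
print(res.text)  # the stored page body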
Parameters
- token: your account token (required on every call).
- rid: retrieve a specific stored page by RID.
- url: retrieve the most recent stored version of a URL. Pass rid or url, not both.
- format: format=json wraps body and metadata in a single JSON object.

Bulk retrieve
Pull up to 100 stored pages in one round-trip by RID. POST a JSON body with the list and (optionally) ask the server to delete each entry as it's returned — useful for "drain the queue" pipelines that don't need to keep storage warm.
curl -X POST 'https://api.crawlbase.com/storage/bulk?token=YOUR_TOKEN' \
-H 'Content-Type: application/json' \
-d '{ "rids": ["RID1","RID2","RID3"], "auto_delete": true }'true , each successfully-returned entry is deleted from storage in the same call. Use this when you're draining a queue of one-shot results and don't need them retained.The response is a JSON array, one object per returned RID. The body field is base64-encoded and gzip-compressed — base64-decode then gzip-inflate to get the original page.
[
{
"stored_at": "2021-03-01T14:22:58+02:00",
"original_status": 200,
"pc_status": 200,
"rid": "RID1",
"url": "https://example.com/a",
"body": "H4sIAAAA…" // base64(gzip(html))
},
{
"stored_at": "2021-03-01T14:30:51+02:00",
"original_status": 200,
"pc_status": 200,
"rid": "RID2",
"url": "https://example.com/b",
"body": "H4sIAAAA…"
}
]
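To consume that payload end to end, a sketch in Python using requests plus the standard library's base64 and gzip modules; RID1 through RID3 are placeholders for your own RIDs:

import base64
import gzip

import requests

res = requests.post('https://api.crawlbase.com/storage/bulk',
                    params={'token': 'YOUR_TOKEN'},
                    json={'rids': ['RID1', 'RID2', 'RID3'],
                          'auto_delete': False})
res.raise_for_status()

for entry in res.json():
    # body is base64(gzip(html)): base64-decode, then gzip-inflate
    html = gzip.decompress(base64.b64decode(entry['body'])).decode('utf-8')
    print(entry['rid'], entry['url'], len(html), 'bytes')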
Bulk delete
Delete a list of RIDs in one call. Returns a per-RID status so you can spot the ones that were already gone or failed.
curl -X POST 'https://api.crawlbase.com/storage/bulk_delete?token=YOUR_TOKEN' \
-H 'Content-Type: application/json' \
-d '{ "rids": ["RID1","RID2","RID3"] }'Response is a JSON array, one entry per submitted RID. status: true means the entry was deleted; status: false with result: "Not Found" means the RID didn't exist (already cleaned up, expired, or never written).
[
{ "rid": "RID1", "result": "Deleted", "status": true },
{ "rid": "RID2", "result": "Not Found", "status": false },
{ "rid": "RID3", "result": "Failed", "status": false }
]

Double-check the RID list before sending. There is no soft-delete or undo. If you need a recoverable workflow, retrieve with auto_delete=false first and only call /bulk_delete once you've persisted the body locally.
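To act on those statuses programmatically, a short requests sketch; treating "Not Found" as already done and everything else as retryable is a design choice here, not API guidance:

import requests

res = requests.post('https://api.crawlbase.com/storage/bulk_delete',
                    params={'token': 'YOUR_TOKEN'},
                    json={'rids': ['RID1', 'RID2', 'RID3']})
res.raise_for_status()

# "Not Found" means the RID is already gone; anything else without
# status: true is a genuine failure worth retrying
failed = [e['rid'] for e in res.json()
          if not e['status'] and e['result'] != 'Not Found']
print('retry these:', failed)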
Delete a single page
Drop one entry from storage by RID. Use DELETE /storage with the RID on the query string.
curl -X DELETE 'https://api.crawlbase.com/storage?token=YOUR_TOKEN&rid=RID'

Three response shapes:
| Outcome | Body |
|---|---|
| Found and deleted | {"success": "The Storage item has been deleted successfully"} |
| Found but delete failed | {"error": "The Storage item could not be deleted"} |
| Not in storage | {"error": "Not Found"} |
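Branching on those three shapes in Python, as a sketch with requests:

import requests

res = requests.delete('https://api.crawlbase.com/storage',
                      params={'token': 'YOUR_TOKEN', 'rid': 'RID'})
body = res.json()

if 'success' in body:
    print('deleted')
elif body.get('error') == 'Not Found':
    print('already gone')  # expired, cleaned up, or never written
else:
    print('delete failed:', body.get('error'))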
List RIDs
Page through the RIDs in your storage area — the inventory call. For datasets larger than a single response, use scroll-based pagination (scroll=true seeds a scroll session and returns a scroll_id you replay on subsequent calls).
# First page
curl 'https://api.crawlbase.com/storage/rids?token=YOUR_TOKEN&limit=100&scroll=true'
# Next page — replay the scroll_id from the previous response
curl 'https://api.crawlbase.com/storage/rids?token=YOUR_TOKEN&scroll_id=dXVlcnlUaGVuRmV0Y2g7…'

With scroll=true, the response includes a scroll_id you can replay to fetch the next page; without it you only get the first page. Don't pass scroll=true on follow-ups, only on the first call.

{
"rids": ["RID1", "RID2", "RID3", "..."],
"scroll_id": "dXVlcnlUaGVuRmV0Y2g7NTs1NDpDV…"
}

A scroll_id is good for ~15 seconds of inactivity. If you see "Scroll session has expired or is invalid", start over with a fresh scroll=true request; the cursor's gone.
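Putting those rules together, a drain-everything loop in Python; the stop conditions (an empty rids page, or a missing scroll_id) are assumptions about end-of-scroll behavior, so adjust if your responses differ:

import requests

BASE = 'https://api.crawlbase.com/storage/rids'
params = {'token': 'YOUR_TOKEN', 'limit': 100, 'scroll': 'true'}
rids = []

while True:
    res = requests.get(BASE, params=params)
    res.raise_for_status()
    page = res.json()
    if not page.get('rids'):
        break  # assumption: an empty page means the scroll is drained
    rids.extend(page['rids'])
    scroll_id = page.get('scroll_id')
    if not scroll_id:
        break  # assumption: no cursor means nothing left to fetch
    # replay only the scroll_id on follow-ups; no scroll=true, no limit
    params = {'token': 'YOUR_TOKEN', 'scroll_id': scroll_id}

print(len(rids), 'RIDs collected')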
Total count
Single integer: how many pages are currently in your storage area.
curl 'https://api.crawlbase.com/storage/total_count?token=YOUR_TOKEN'
# Response
# { "totalCount": 5491078 }Retention & pricing
- Stored pages are kept for 14 days by default. Extended retention is available on enterprise plans.
- Each store=true call counts as a single request; storage itself adds no extra charge.
- Retrieval (this endpoint) is free. Read as many times as you need.
- If a page is re-crawled with store=true, the new version replaces the old.
Use cases
Audit trails ("what did the page say when we crawled it?"), reprocessing pipelines (re-parse stored HTML when your scraper logic improves), and serving cached results to readers without re-crawling.

