# RIDs

The /rids endpoint is designed to provide clients with a bulk list of RIDs (Request IDs) from the storage area. This API supports pagination through a scroll mechanism, enabling efficient data retrieval for large datasets.

# Parameters

The only required parameter is token. The optional parameters below help you pull RIDs efficiently:

  • limit (optional): Specifies the maximum number of RIDs to return. Defaults to 10k, with a maximum allowable value of 10k. Use this parameter to control the size of the data returned.

  • scroll (optional): When set to true, this parameter enables scroll-based pagination for the request. It initiates a scrolling session that provides a scroll_id used for subsequent requests.

  • scroll_id (optional): An identifier from a previous request's response to fetch the next set of RIDs. This parameter is used for pagination.

  • scroll_order (optional): Determines the order of RIDs returned. Acceptable values are asc (ascending) or desc (descending). The default order is desc.
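The parameters above can be assembled into a request URL as in the following minimal sketch. The function name build_rids_url and the client-side checks are illustrative assumptions, not part of the API; the 10k cap is the one stated in the Notes section, and the server remains the authority on what it accepts.

```python
from urllib.parse import urlencode

BASE_URL = "https://api.crawlbase.com/storage/rids"
MAX_LIMIT = 10_000  # cap as stated in the Notes section


def build_rids_url(token, limit=None, scroll=False, scroll_id=None, scroll_order=None):
    """Return a /rids request URL (hypothetical helper, not an official client)."""
    if not token:
        raise ValueError("token is required")
    params = {"token": token}
    if limit is not None:
        # Assumption: a limit below 1 is rejected client-side for convenience.
        if limit < 1 or limit > MAX_LIMIT:
            raise ValueError(f"limit must be between 1 and {MAX_LIMIT}")
        params["limit"] = limit
    if scroll:
        params["scroll"] = "true"
    if scroll_id is not None:
        params["scroll_id"] = scroll_id
    if scroll_order is not None:
        if scroll_order not in ("asc", "desc"):
            raise ValueError("scroll_order must be 'asc' or 'desc'")
        params["scroll_order"] = scroll_order
    return f"{BASE_URL}?{urlencode(params)}"
```

For example, build_rids_url("_USER_TOKEN_", limit=100) yields the same URL as the curl example in the Request section.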

# Request

To retrieve the latest 100 RIDs:

curl 'https://api.crawlbase.com/storage/rids?token=_USER_TOKEN_&limit=100'

# Response

A successful response returns an array of RIDs and, if applicable, a scroll_id for further pagination:

{
  "rids": ["RID1", "RID2", ...],
  "scroll_id": "dXVlcnlUaGVuRmV0Y2g7NTs1NDpDV..."
}

  • rids: An array containing the requested RIDs.

  • scroll_id: A token for retrieving the next set of results. This value is critical for pagination and is provided when more data is available beyond the current request's limit.
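As a rough illustration of reading these two fields, the sketch below parses a response body with Python's standard json module. It assumes the body is a JSON object with the keys shown above; parse_rids_response is a hypothetical helper name, and scroll_id is treated as optional because it is only provided when more data is available.

```python
import json


def parse_rids_response(body):
    """Return (rids, scroll_id) from a response body; scroll_id is None if absent."""
    payload = json.loads(body)
    rids = payload.get("rids", [])
    scroll_id = payload.get("scroll_id")  # only present when more data is available
    return rids, scroll_id


# Sample body with made-up values, used for illustration only.
sample = '{"rids": ["RID1", "RID2"], "scroll_id": "example-scroll-id"}'
rids, scroll_id = parse_rids_response(sample)
```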

# Scrolling

To efficiently navigate through large datasets, clients can opt for scroll-based pagination by setting the scroll parameter to true. This method is ideal for sequential data retrieval where the total dataset size exceeds the limit parameter's maximum value.

# Initial Request with Scroll

curl 'https://api.crawlbase.com/storage/rids?token=_USER_TOKEN_&limit=100&scroll=true'

This request starts a scroll session and returns the first batch of RIDs along with a scroll_id, which is essential for fetching the next batch.

# Fetching Subsequent Batches

To retrieve additional RIDs, use the provided scroll_id without specifying the scroll parameter again. The scroll_id maintains the state of the pagination.

curl 'https://api.crawlbase.com/storage/rids?token=_USER_TOKEN_&scroll_id=dXVlcnlUaGVuRmV0Y2g7NTs1NDpDV...'
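The two-step flow above (an initial request with scroll=true, then follow-up requests carrying only the scroll_id) could be wrapped in a loop along these lines. This is a sketch under assumptions: iter_rids and http_get_json are hypothetical names, the response is assumed to be a JSON object, and the loop treats an empty batch or a missing scroll_id as the end of the data, which the documentation does not state explicitly.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

BASE_URL = "https://api.crawlbase.com/storage/rids"


def http_get_json(url):
    """Fetch a URL and decode the JSON body (default transport for iter_rids)."""
    with urlopen(url) as resp:
        return json.load(resp)


def iter_rids(token, limit=100, fetch=http_get_json):
    """Yield RIDs one by one, following scroll_id from batch to batch."""
    # First request: start the scroll session.
    query = {"token": token, "limit": limit, "scroll": "true"}
    while True:
        page = fetch(f"{BASE_URL}?{urlencode(query)}")
        rids = page.get("rids", [])
        if not rids:
            break  # assumption: an empty batch means nothing is left
        yield from rids
        scroll_id = page.get("scroll_id")
        if not scroll_id:
            break  # no scroll_id means no further data
        # Follow-up requests: send the scroll_id and omit the scroll parameter.
        query = {"token": token, "scroll_id": scroll_id}
```

Passing fetch as an argument keeps the loop independent of the HTTP client, so it can be exercised with canned pages before being pointed at the real endpoint.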

# Notes

For efficient use of the /rids API, please take note of the following:

  • The limit parameter's cap of 10k ensures optimal server performance and resource management. Use pagination via scroll_id to access larger datasets.

  • The initial request with scroll=true initiates the scroll session. The response includes a scroll_id for subsequent data retrieval.

  • The scroll_id is pivotal for continuous pagination. Be sure to include it in follow-up requests until all desired data has been retrieved.

  • Scrolling sessions expire after 15 seconds of inactivity, after which the scroll_id becomes invalid. To access more data beyond this period, initiate a new request with scroll=true.

  • If you receive the error message "Scroll session has expired or is invalid", the scroll context you're trying to use is no longer available. This usually happens when the scroll timeout has elapsed. In this case, initiate a new scroll request.
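One way to act on the expiry notes above is to restart the scroll session when the error message appears and skip RIDs that were already collected. The sketch below is illustrative only: collect_rids, fetch_text, and max_restarts are hypothetical names, the error message is assumed to appear verbatim somewhere in the response body, and a restarted session is assumed to begin again from the first batch.

```python
import json
from urllib.parse import urlencode

BASE_URL = "https://api.crawlbase.com/storage/rids"
EXPIRED_MESSAGE = "Scroll session has expired or is invalid"


def collect_rids(token, fetch_text, limit=100, max_restarts=3):
    """Collect RIDs in order, restarting the scroll session if it expires.

    fetch_text is a caller-supplied function that takes a URL and returns the
    response body as a string, including for error responses.
    """
    seen, ordered, restarts = set(), [], 0
    first_query = {"token": token, "limit": limit, "scroll": "true"}
    query = first_query
    while True:
        body = fetch_text(f"{BASE_URL}?{urlencode(query)}")
        if EXPIRED_MESSAGE in body:
            if restarts >= max_restarts:
                raise RuntimeError("scroll session kept expiring")
            restarts += 1
            query = first_query  # start a new session with scroll=true
            continue
        page = json.loads(body)
        rids = page.get("rids", [])
        for rid in rids:
            if rid not in seen:  # skip RIDs replayed after a restart
                seen.add(rid)
                ordered.append(rid)
        scroll_id = page.get("scroll_id")
        if not rids or not scroll_id:
            return ordered
        query = {"token": token, "scroll_id": scroll_id}
```

Because a restart replays earlier batches, keeping the work done between requests short is likely the better first defence against the inactivity timeout; the restart logic is a fallback.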

By following these guidelines and using the parameters effectively, you can get the most out of the /rids endpoint for your data retrieval needs.