Generic Extractors
Two universal extractors for sites without a named scraper. Define your own fields and selectors — we'll handle the request, anti-bot, and parsing.
Overview
Generic extractors fill the gaps between named scrapers. When the site you need isn't in the catalog yet — niche marketplaces, regional retailers, internal portals — these two scrapers let you describe the page yourself and we run the extraction.
generic-extractor takes a CSS-selector schema (or falls back to auto-detection) and returns the parsed values. email-extractor is purpose-built for one common task: pulling every email address visible on a page, whether exposed as mailto links, plain text, or lightly obfuscated patterns like name [at] domain.com.
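To illustrate the kind of normalisation involved (a standalone sketch of the general idea, not email-extractor's actual implementation), a regex pass can catch plain addresses as well as bracketed [at]/[dot] obfuscations:

```python
import re

# Hypothetical sketch: undo common obfuscations, then match addresses.
OBFUSCATIONS = [
    (re.compile(r"\s*\[\s*at\s*\]\s*", re.I), "@"),
    (re.compile(r"\s*\[\s*dot\s*\]\s*", re.I), "."),
]
EMAIL = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")

def extract_emails(text: str) -> list[str]:
    for pattern, replacement in OBFUSCATIONS:
        text = pattern.sub(replacement, text)
    # De-duplicate case-insensitively, preserving first-seen order.
    seen, out = set(), []
    for match in EMAIL.findall(text):
        addr = match.lower()
        if addr not in seen:
            seen.add(addr)
            out.append(addr)
    return out

print(extract_emails("Contact: sales [at] example [dot] com"))
# ['sales@example.com']
```

The same matcher covers mailto links, since the address inside `mailto:` is just another textual occurrence.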
Common use cases:
- Long-tail catalog ingestion: drop a schema for a regional retailer, run nightly imports without us shipping a dedicated scraper for it.
- Lead generation: walk a list of company websites, run email-extractor, and build a contactable prospect list (subject to your jurisdiction's outbound-email rules).
- Research pipelines: extract structured fields (titles, headings, meta) from any page for downstream NLP — useful when you need normalised input from heterogeneous sources.
- Site monitoring: define a schema once, monitor a competitor's pricing or copy changes by diffing the parsed JSON over time.
Both scrapers ride the same anti-bot, residential-routing, and JS-rendering stack as the named scrapers — so the auto-detection works on JS-heavy SPAs without you wiring up a separate browser. If a target needs a dedicated parser eventually, the schema you wrote is a good handoff document for our scraper team.
Generic extractors
Two universal building blocks — one for arbitrary structured extraction, one for the always-needed task of pulling emails. Use these when there's no named scraper for the site you care about.
- Generic Extractor — schema-driven HTML extractor. Pass selectors, get back structured JSON.
- Email Extractor — pulls every email address visible on a page.
Example call
Below: a generic-extractor call against Stack Overflow's homepage. With no schema specified, the scraper returns auto-detected metadata — page title, language, and headings grouped by level. Pass a custom selectors object (see the full reference) to extract specific fields.
```shell
curl 'https://api.crawlbase.com/?token=YOUR_TOKEN' \
  --data-urlencode 'url=https://stackoverflow.com/' \
  --data-urlencode 'scraper=generic-extractor' -G
```

Sample response

```json
{
  "url": "https://stackoverflow.com/",
  "title": "Stack Overflow - Where Developers Learn...",
  "language": "en",
  "headings": {
    "h1": ["Where developers grow together"],
    "h2": ["Hot Network Questions"]
  }
}
```

Full reference (parameters, all 4 SDK languages, edge cases): Generic Extractor — full reference
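For orientation, a custom schema maps output field names to CSS selectors. The shape below is a hypothetical sketch with made-up field names; the exact parameter name, encoding, and supported options are covered in the full reference:

```json
{
  "product_title": "h1.product-name",
  "price": "span.price-current",
  "features": "ul.feature-list li"
}
```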

