Open data is one of the quiet advantages of modern analytics. Governments, research institutions, and nonprofits publish enormous, freely usable datasets, and most teams never tap them. Used well, open data feeds trend analysis, measures the effect of public policy, sharpens product decisions, and grounds a model in real numbers instead of guesses.

This guide collects strong, genuinely free open data sources, grouped by subject so you can jump to what your project needs: economic and financial, government, health, science, academic, environmental, crime, business directory, media, social, and a few catch-all repositories. For each one you get what it provides and how people typically use it, plus a short note at the end on using open data responsibly.

What is open data?

Open data is data that anyone can access, use, and share without paying for it or asking special permission. In practice that rests on three ideas. Anyone can access it, so the files are published openly and in a usable format rather than locked behind a request process or an obsolete file type. Anyone can use it, so individuals, companies, and governments may build on it freely, which is also why it excludes sensitive personal information that should never be public. Anyone can share it, so the data can be reused and redistributed.

Government agencies and nonprofit organizations host most of it, because storing and serving data at scale is expensive and they already do it as part of their mission. Much of it carries a license such as Creative Commons, which lets you use the data freely while telling you how to credit the source. That license is the part most people skip, and it is the part that keeps your use clean, so read it before you build on a dataset.

Many sources, one project. Government, scientific, financial, and health portals each publish open datasets you can combine into a single project, as long as you respect each license.

Economic and financial data

Markets, trade flows, and macroeconomic indicators, useful for research, dashboards, and anything that needs a reliable baseline of numbers.

Global Financial Data

GFD offers free subscriptions that open up global market and economic data, alongside periodicals, books, and a deep archive of historical series. It is a good starting point when you need long time horizons rather than just recent figures.

U.N. Comtrade Database

Maintained by the United Nations Statistics Division, this free database covers global trade and exposes it through an API, so you can pull figures programmatically instead of downloading spreadsheets by hand. It also bundles tools for visualizing and extracting the data.

World Bank Open Data

One of the best single sources for GDP, logistics, global energy consumption, and the disbursement and management of global funds. It is updated frequently, and some datasets ship with their own visualization tools.
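As an illustration of pulling an indicator programmatically, here is a minimal sketch. It assumes version 2 of the World Bank Indicators API, a URL pattern of `/country/{code}/indicator/{id}`, and a JSON response shaped as a two-element list of metadata and rows; verify all three against the current API documentation. The helper names and the sample values are made up for the example, and no network request is made.

```python
import json
from urllib.parse import urlencode

# Assumed base URL for version 2 of the World Bank Indicators API.
BASE_URL = "https://api.worldbank.org/v2"


def indicator_url(country, indicator, start_year, end_year):
    """Build a request URL for one country and one indicator (hypothetical helper)."""
    params = {"format": "json", "date": f"{start_year}:{end_year}", "per_page": 100}
    # Keep ":" unescaped so the year range stays readable in the URL.
    return f"{BASE_URL}/country/{country}/indicator/{indicator}?{urlencode(params, safe=':')}"


def parse_series(payload):
    """Return {year: value} from a [metadata, rows] payload, skipping missing values."""
    if len(payload) < 2 or payload[1] is None:
        return {}
    return {
        int(row["date"]): row["value"]
        for row in payload[1]
        if row["value"] is not None
    }


# Made-up payload in the assumed response shape; the numbers are not real data.
SAMPLE_PAYLOAD = json.loads("""
[
  {"page": 1, "pages": 1, "per_page": 100, "total": 3},
  [
    {"date": "2021", "value": null},
    {"date": "2020", "value": 150.5},
    {"date": "2019", "value": 142.0}
  ]
]
""")

if __name__ == "__main__":
    print(indicator_url("BR", "NY.GDP.MKTP.CD", 2019, 2021))
    print(parse_series(SAMPLE_PAYLOAD))
```

Separating URL construction from parsing lets you test the parsing against a saved response before you send any live requests.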

Financial Times

Though it presents as an online newspaper, the Financial Times is also a broad source of information on global markets across the Americas, Europe, Africa, and Asia, which makes it useful for qualitative context around the raw numbers.

Government and global data

National statistics offices and intergovernmental bodies publish some of the richest open data anywhere, covering everything from crime to demographics.

Data.gov.uk

The UK counterpart to the United States data.gov, spanning categories from crime and justice to defense and government spending. A practical first stop for anyone working with British public data.

U.K. Data Service

A complement to Data.gov.uk with recent datasets on social trends, politics, finance, and international relations, aimed more at the research community.

Open Data Network

A search engine for open datasets, with advanced filters that help you find public safety, finance, infrastructure, housing, and development data quickly rather than browsing site by site.

UNICEF

Open datasets that monitor and report on children and women worldwide, including disease outbreaks, gender and education, and attitudes toward social norms.

Data.gov

One of the most comprehensive open data portals in the world, covering science and research through to manufacturing and climate. Data comes in CSV, JSON, and XML, and the metadata is updated frequently so you can trust that what you pull is current. If you plan to load it into a spreadsheet or database, our notes on JSON versus CSV help you pick the right format.
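To make the format choice concrete, the sketch below parses the same records from CSV and from JSON using only the Python standard library. The records are invented placeholders, not taken from any portal, and the column names are assumptions for the example. The point it shows is that CSV delivers every field as a string, while JSON keeps numbers as numbers.

```python
import csv
import io
import json

# Made-up sample records, standing in for a file downloaded from a portal.
CSV_TEXT = """city,year,population
Springfield,2020,116250
Shelbyville,2020,64500
"""

JSON_TEXT = """[
  {"city": "Springfield", "year": 2020, "population": 116250},
  {"city": "Shelbyville", "year": 2020, "population": 64500}
]"""


def rows_from_csv(text):
    """Parse CSV text into dicts, converting the numeric columns by hand."""
    rows = []
    for record in csv.DictReader(io.StringIO(text)):
        # CSV carries no type information: every field arrives as a string.
        rows.append({
            "city": record["city"],
            "year": int(record["year"]),
            "population": int(record["population"]),
        })
    return rows


def rows_from_json(text):
    """Parse JSON text; numeric fields already arrive as ints."""
    return json.loads(text)


if __name__ == "__main__":
    # Both routes should yield identical records once CSV types are converted.
    print(rows_from_csv(CSV_TEXT) == rows_from_json(JSON_TEXT))
```

CSV is usually the smaller download and opens directly in a spreadsheet. JSON saves you the type conversion step and can hold nested fields.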

U.S. Census Bureau

The definitive open source for demographic data on U.S. residents, assembled from federal, state, and local governments plus private companies. Indispensable for population, housing, and economic demography work.

Health data

Public health agencies and research institutes publish detailed health datasets, valuable to researchers, policymakers, and anyone building tools in the health space.

HealthData.gov

A repository of more than 3,000 datasets spanning over 125 years, built to give entrepreneurs, researchers, and policymakers access to high-value health data.

Broad Institute

A clear, well-organized source of open health and scientific research data, with particular depth on various cancers.

Food and Drug Administration

The FDA publishes data on foodborne illnesses and contaminants, recalls, and news about dietary supplements in the United States.

National Cancer Institute

Part of the National Institutes of Health and a complement to the Broad Institute, with advanced filters that let you narrow in on specific cancer-related datasets.

World Health Organization

One of the most comprehensive global repositories for mortality rates, disease outbreaks, mental illness, and health financing.

Centers for Disease Control and Prevention

The CDC offers a wide range of free datasets on chronic illness, cancer, heart disease, congenital conditions, and much more.

NHS Digital

An easy-to-use free service providing high-quality datasets on the state of health and social care systems in England.

Scientific data

Earth, space, and cross-disciplinary research data, often at very large scale.

NASA Earth Data

Free Earth science data from NASA, with measurements across the atmosphere, the cryosphere, land, the ocean, and the calibrated radiance of the sun.

Open Science Data Cloud

OSDC holds more than a petabyte of large datasets and lets scientific researchers manage, share, and analyze open data efficiently across many disciplines.

NASA Planetary Data System

Thousands of open datasets about the planets in our solar system, available to researchers, educators, students, and the general public alike.

Academic data

Education statistics, survey research, and scholarly work, useful for institutions, social scientists, and literature reviews.

National Center for Education Statistics

The NCES publishes datasets that educational institutions use to improve retention and graduation rates and to understand how students learn.

Pew Research Center

One of the largest open data sources in the United States, aggregating high-quality survey datasets. Survey data is typically released two years after the report is published, and you create a free account to access it.

Google Scholar

Searching for datasets here works much like a normal search engine, giving you a wide pool of educational, peer-reviewed sources to draw on.

Environmental data

Energy, climate, and environmental health datasets for sustainability research and reporting.

IEA Atlas of Energy

Open datasets from the International Energy Agency that let you view global consumption rates for energy and electricity.

Climate Data Online

CDO is a strong source for historical and near-real-time climate data worldwide, including daily summaries, marine data, and weather radar.

National Center for Environmental Health

Curated by the CDC, this repository highlights national data systems that collect public health and environmental data from a country-wide perspective.

Crawlbase Crawling API

Many of these portals expose APIs, but plenty of useful public data still lives on ordinary web pages that render with JavaScript or rate-limit heavy traffic. When the source is a site rather than a clean download, the Crawlbase Crawling API requests the page for you and returns the HTML, handling JavaScript rendering, IP rotation, and CAPTCHAs so collection does not stall. You keep your own parsing code and start with 1,000 free requests.

Crime and drug data

Criminal justice and substance-abuse datasets, mostly from U.S. federal agencies and the United Nations.

National Archive of Criminal Justice Data

The NACJD provides public and restricted-access datasets on recidivism, gang violence, terrorism, hate crimes, and more.

National Institute on Drug Abuse

NIDA hosts datasets significant to anyone studying tobacco, alcohol, illicit drugs, and prescription opioid abuse in the United States.

Uniform Crime Reporting Program

Through this program, the FBI aggregates crime statistics reported by more than 18,000 law enforcement agencies at the city, college and university, county, state, and tribal levels.

Bureau of Justice Statistics

Beyond arrest-related deaths, the BJS collects figures such as annual emergency room data and firearm inquiries.

United Nations Office on Drugs and Crime

The UNODC regularly publishes datasets on drug production and trafficking, homicide rates, corruption, and organized crime.

Business directory data

Company records and review data, useful for market research, competitive analysis, and lead generation.

OpenCorporates

One of the world's largest open company databases, with hundreds of millions of corporate records across virtually every country.

Glassdoor

Job-review sites carry a surprising amount of open data. On Glassdoor you can find gender pay analysis, monthly salary reports, and local pay reports.

Yelp

Yelp's open datasets contain millions of business reviews, which you can analyze to surface patterns and trends in business sentiment.

Media and journalism data

News archives and data-journalism sources, often available through developer APIs.

Associated Press Developer

The AP's developer services let you build integrations around news content, polling data, and metadata, in a similar spirit to the New York Times developer network.

FiveThirtyEight

A widely respected data-journalism source covering topics as varied as politics and sports, with datasets that back its published analysis.

The New York Times Developer Network

Register an app and you can access NYT abstracts, links, multimedia, books, listings, and stories, with text reaching as far back as 1851.

Marketing and social media data

Real-time sentiment, search interest, and social-graph data for marketers and analysts.

Social Mention

A search engine for real-time social data, surfacing sentiment, keyword usage, users, and hashtags at a broad scale.

Google Trends

Google Trends shows what the world is searching for, which lets marketers time campaigns around rising interest rather than guessing.

Graph API

The Graph API is a set of APIs that let apps read data from, and write data to, the Facebook social graph. Treat it as a structured window into public Facebook data, and note that access is governed by Facebook's own terms.

Other repositories and aggregators

Catch-all sources that consolidate many of the datasets above and a few unusual ones.

Google Public Data Explorer

Many of the sources on this list are consolidated here, which makes it an excellent place to start when you are not sure where to look. Google Dataset Search complements it as a free way to find datasets across the web.

Datasets subreddit

A Reddit community (r/datasets) where people share interesting datasets, often scraped or compiled with languages such as R, which makes it useful for finding off-the-beaten-path data.

DBpedia

Think of Wikipedia as a database rather than a website. DBpedia lets you explore millions of Wikipedia entries and the relationships between them through a single query interface, which is why it has underpinned AI projects at companies including Apple, Google, and IBM.
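The query interface in question is SPARQL. Here is a minimal sketch of building a request and reading the standard SPARQL JSON results format. It assumes the public endpoint lives at `https://dbpedia.org/sparql` and accepts `query` and `format` parameters, so check the DBpedia documentation before relying on either. The helper names and the sample result are hypothetical, and no request is sent.

```python
import json
from urllib.parse import urlencode

# Assumed public SPARQL endpoint for DBpedia.
ENDPOINT = "https://dbpedia.org/sparql"

# Ask for the English abstract of one resource (the resource name is illustrative).
QUERY = """\
PREFIX dbo: <http://dbpedia.org/ontology/>
PREFIX dbr: <http://dbpedia.org/resource/>
SELECT ?abstract WHERE {
  dbr:Open_data dbo:abstract ?abstract .
  FILTER (lang(?abstract) = "en")
}
LIMIT 1
"""


def sparql_url(query):
    """Encode a SPARQL query into a GET URL (hypothetical helper)."""
    params = {"query": query, "format": "application/sparql-results+json"}
    return f"{ENDPOINT}?{urlencode(params)}"


def binding_values(results, variable):
    """Pull the plain values for one variable out of SPARQL JSON results."""
    bindings = results["results"]["bindings"]
    return [b[variable]["value"] for b in bindings if variable in b]


# Made-up response in the W3C SPARQL JSON results shape.
SAMPLE_RESULTS = json.loads("""
{
  "head": {"vars": ["abstract"]},
  "results": {"bindings": [
    {"abstract": {"type": "literal", "xml:lang": "en", "value": "Placeholder abstract text."}}
  ]}
}
""")

if __name__ == "__main__":
    print(sparql_url(QUERY))
    print(binding_values(SAMPLE_RESULTS, "abstract"))
```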

Is big data open source?

A growing share of big-data tooling is open source, including database systems such as MongoDB, a scalable NoSQL store well suited to big-data workloads. Open-source components also cover the rest of the pipeline, from data collection and ingestion through processing and storage, so you can assemble a full stack without proprietary licensing. Open tools and open data pair naturally: free datasets give you the input, and open-source systems give you somewhere to put it.

Using open data responsibly

Free does not mean unconditional. Most open datasets carry a license, often Creative Commons, that tells you how you may use the data and how to credit the source, so read it and attribute correctly. When a source exposes an API, prefer it over scraping the site and stay within its rate limits and terms. If a dataset touches personal information, handle it under rules like GDPR and CCPA even when the data is technically public. When you collect from web pages rather than downloads, keep your request rate reasonable and respect each site's robots.txt. The goal is to be a good citizen of the open-data commons you are benefiting from. Once the data is in hand, our guides on structuring and cleaning data and analyzing it with pandas help you turn raw files into insight.
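The robots.txt and request-rate advice can be sketched with Python's standard `urllib.robotparser`. The robots.txt content, the `example-bot` user agent, and the example.org URLs below are placeholders invented for the example. A real collector would load the site's actual file instead, for instance with `set_url()` and `read()`.

```python
from urllib.robotparser import RobotFileParser

# Placeholder robots.txt, standing in for the file a real site would serve.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
Crawl-delay: 2
"""


def build_parser(robots_text):
    """Parse robots.txt rules from text instead of fetching them."""
    parser = RobotFileParser()
    parser.parse(robots_text.splitlines())
    return parser


def allowed_urls(parser, user_agent, urls):
    """Keep only the URLs the rules allow this user agent to fetch."""
    return [url for url in urls if parser.can_fetch(user_agent, url)]


def polite_delay(parser, user_agent, default_seconds=1.0):
    """Use the site's Crawl-delay when it declares one, else a default pause."""
    delay = parser.crawl_delay(user_agent)
    return float(delay) if delay is not None else default_seconds


if __name__ == "__main__":
    rules = build_parser(ROBOTS_TXT)
    candidates = [
        "https://example.org/data/file.csv",
        "https://example.org/private/report.html",
    ]
    print(allowed_urls(rules, "example-bot", candidates))
    print(polite_delay(rules, "example-bot"))
```

In a collection loop you would sleep for `polite_delay(...)` seconds between requests. The one-second default is an arbitrary choice for the sketch, not a recommendation from any particular site.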

Recap

Key takeaways

  • Open data is access, use, and share. A dataset counts as open when anyone can reach it, build on it, and redistribute it freely, usually under a license such as Creative Commons.
  • Start from the category you need. Economic, government, health, science, academic, environmental, crime, business, media, and social each have established free sources, so pick the group before the source.
  • Government and intergovernmental portals are the backbone. Data.gov, the U.S. Census Bureau, the World Bank, the WHO, and similar bodies publish some of the deepest, best-maintained open data anywhere.
  • Aggregators save discovery time. Google Public Data Explorer, Google Dataset Search, and DBpedia consolidate many sources, which is the fastest way in when you do not know where to look.
  • Respect the license and the rules. Read the license, attribute the source, prefer APIs over scraping, and apply privacy law when personal data is involved.

Frequently Asked Questions (FAQs)

What is the difference between open data and free data?

All open data is free to obtain, but open data goes further: it must also be free to reuse and redistribute, usually under an explicit license. Some data is free to view yet restricted in how you may use or share it, so check the license rather than assuming that free means open.

Which open data source should I start with?

Start from your subject. For broad U.S. government data, Data.gov is the natural entry point; for global economics, the World Bank; for demographics, the U.S. Census Bureau; for health, the WHO or HealthData.gov. If you are unsure where to look, Google Public Data Explorer and Google Dataset Search aggregate many sources in one place.

Can I use open data in a commercial product?

Often yes, but it depends entirely on the license attached to the specific dataset. Many open licenses, including most Creative Commons variants, permit commercial use as long as you attribute the source correctly. Always read the license terms for each dataset before shipping anything built on it.
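A pipeline can encode that license check rather than leave it to memory. The sketch below maps the standard Creative Commons license families to their headline terms. The `LicenseTerms` class and the function name are hypothetical, the table is a simplification that ignores license versions and the no-derivatives condition, and it is no substitute for reading the actual license text.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LicenseTerms:
    attribution_required: bool
    commercial_use: bool
    share_alike: bool


# Headline terms of the Creative Commons families (simplified; versions ignored).
CC_TERMS = {
    "CC0": LicenseTerms(attribution_required=False, commercial_use=True, share_alike=False),
    "CC BY": LicenseTerms(attribution_required=True, commercial_use=True, share_alike=False),
    "CC BY-SA": LicenseTerms(attribution_required=True, commercial_use=True, share_alike=True),
    "CC BY-ND": LicenseTerms(attribution_required=True, commercial_use=True, share_alike=False),
    "CC BY-NC": LicenseTerms(attribution_required=True, commercial_use=False, share_alike=False),
    "CC BY-NC-SA": LicenseTerms(attribution_required=True, commercial_use=False, share_alike=True),
    "CC BY-NC-ND": LicenseTerms(attribution_required=True, commercial_use=False, share_alike=False),
}


def can_ship_commercially(license_id):
    """Return True when the license family permits commercial use."""
    terms = CC_TERMS.get(license_id.strip().upper())
    if terms is None:
        # Unknown license: fail loudly instead of guessing.
        raise KeyError(f"No terms recorded for license: {license_id!r}")
    return terms.commercial_use


if __name__ == "__main__":
    for name in ("CC BY", "cc by-nc", "CC0"):
        print(name, can_ship_commercially(name))
```

Raising on an unknown license is deliberate: a dataset with unrecorded terms should block the build until someone has read its license.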

Do I need to scrape open data or can I download it?

Most major portals offer direct downloads in CSV, JSON, or XML, or an API, and those should always be your first choice. Scraping only comes into play when useful public data lives on web pages with no download or API. In that case, collect it politely and within the site's terms.

Is big data the same as open data?

No. Big data describes the scale and complexity of datasets, while open data describes their licensing and accessibility. A dataset can be open without being large, and large datasets are frequently proprietary. They overlap when a big dataset is also published openly, as several sources on this list are.

How do I keep my open data use compliant?

Read and follow each dataset's license, attribute the source as it requires, prefer official APIs over scraping, and respect rate limits and robots.txt when you do collect from pages. If the data includes personal information, handle it under applicable privacy laws such as GDPR and CCPA even though the data is public.
