All the data in one single place

Download all the CORE data in a single package.

Matching your needs

Prototype, analyse and process your data directly on your infrastructure.

Largest full text collection

World's largest full text collection of scientific papers for machine processing.

Simple to use

Accessible and easy to understand documentation and processes.

How it works

CORE data can be downloaded as a bulk dataset, allowing you to process it on your own computer or within your own infrastructure. The dataset provides a harmonised and enriched data format for accessing content from across our data providers. This makes it well suited to prototyping new methods, especially when data-intensive processes need to be run, and to data analysis and text mining.
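
As an illustration of processing a bulk download on your own infrastructure, the sketch below streams records one at a time instead of loading the whole corpus into memory. It is only a sketch under stated assumptions: it assumes the extracted dataset is a directory tree of JSON Lines files (`*.jsonl`) and that full text sits in a field named `fullText`. Both the file layout and the field name are hypothetical here, so check the dataset documentation for the actual format.

```python
import json
from pathlib import Path
from typing import Iterator


def iter_records(root: Path) -> Iterator[dict]:
    """Yield one record at a time from every *.jsonl file under root.

    Assumption: one JSON object per line (hypothetical layout).
    """
    for path in sorted(root.rglob("*.jsonl")):
        with path.open(encoding="utf-8") as handle:
            for line in handle:
                line = line.strip()
                if line:  # skip blank lines
                    yield json.loads(line)


def count_with_full_text(root: Path, field: str = "fullText") -> tuple[int, int]:
    """Return (total records, records with a non-empty full-text field).

    The default field name "fullText" is an assumption, not a documented name.
    """
    total = with_text = 0
    for record in iter_records(root):
        total += 1
        if record.get(field):
            with_text += 1
    return total, with_text
```

Calling `count_with_full_text(Path("/data/core"))` on a hypothetical extraction directory would walk every file once while keeping memory use flat, which matters for a corpus measured in terabytes.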

Access documentation

See data statistics
Eric Olson

Consensus, co-founder and CEO

“To build the product we have always envisioned, having a robust and comprehensive dataset of machine-readable, peer-reviewed papers is absolutely essential. We are incredibly grateful to be able to partner with an organization like CORE that not only can meet our data needs, but also shares our vision of making science more accessible and consumable. This unique combination of best-in-class data-offering and mission-alignment makes CORE an ideal partner for Consensus.”

Dataset 2020-03-18

Full dataset (~400 GB compressed, 2.1 TB extracted)

Dataset 2018-03-01

Metadata-only dataset (beta) (127 GB): 123M metadata items, 85.6M items with abstract.

Dataset with full text (beta) (330 GB): 123M metadata items, 85.6M items with abstract, 9.8M items with full text.
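
For a sense of coverage in the 2018-03-01 dataset, the item counts quoted above work out to roughly 70% of items with an abstract and roughly 8% with full text. The short calculation below reproduces that arithmetic; the function and constant names are illustrative only.

```python
def coverage(part_millions: float, total_millions: float) -> float:
    """Return part as a percentage of total, rounded to one decimal place."""
    return round(100 * part_millions / total_millions, 1)


# Figures quoted for the 2018-03-01 dataset, in millions of items.
TOTAL_ITEMS = 123.0
WITH_ABSTRACT = 85.6
WITH_FULL_TEXT = 9.8

print(coverage(WITH_ABSTRACT, TOTAL_ITEMS))   # share of items with an abstract
print(coverage(WITH_FULL_TEXT, TOTAL_ITEMS))  # share of items with full text
```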

Documentation and access to previous datasets.

Older dumps of the CORE Dataset are free and ODC-By licensed. Organisations that register for the CORE Dataset can purchase a licence for the more recent datasets. Sustaining Members receive access to the most recent datasets as a free member benefit.

If you use CORE in your work, we kindly ask that you cite one of our publications.

What’s included

The Dataset provides you with:

  • CORE's entire corpus of metadata and full texts in a machine-processable format.
  • Detailed documentation on how to download the CORE dataset and how data is organised.
  • Access to a very large corpus of research documents at the level of full texts, perfect for training machine learning models, NLP and text mining.
  • Unique content from the network of open repositories, in addition to research papers with a registered DOI.
Datasets by year: Latest, 2023, 2022, 2021, 2020

Register for the CORE Dataset

Enter your email address to register for our datasets or access the download page if you have already registered. Please enter your institutional email if you are registering in an institutional capacity.

We will send the instructions to this address.

The terms of use for the dataset are available on our datasets download page.