CORE’s mission is to aggregate all open access research outputs from repositories and journals
worldwide and make them available to the public. In this way CORE facilitates free unrestricted
access to research for all.
CORE:
supports the right of citizens and the general public to access the results of research towards which they contributed by paying taxes,
facilitates access to open access content for all by offering services to the general public, academic institutions, libraries, software developers, researchers, etc.,
provides support to both content consumers and content providers by working with digital libraries, institutional and subject repositories and journals,
enriches the research content using state-of-the-art technology and provides access to it through a set of services including search, API and analytical tools,
contributes to a cultural change by promoting open access, a fast-growing movement.
CORE harvests openly accessible content available according to the BOAI definition of open access:
By 'open access' to this literature, we mean its free availability on the public internet,
permitting any users to read, download, copy, distribute, print, search, or link to the full texts of these articles,
crawl them for indexing, pass them as data to software, or use them for any other lawful purpose, without financial,
legal, or technical barriers other than those inseparable from gaining access to the internet itself.
The only constraint on reproduction and distribution, and the only role for copyright in this domain,
should be to give authors control over the integrity of their work and the right to be properly acknowledged and cited.
CORE uses information from various registries, such as OpenDOAR and DOAJ, to add new repositories, journals and archives to CORE. If your repository, journal or archive is already registered with an authoritative registry, you don't need to do anything. If your repository, journal or archive has not been registered yet, get in touch.
CORE harvests all metadata records in a repository, but it can currently harvest full text only in PDF. We are working to support other file types, such as HTML web pages and images.
CORE works at the level of repositories and cannot update specific records. You can upload the new record into your institutional repository/journal and CORE will synchronise it at the next scheduled re-harvesting.
Google Scholar is a search engine with scholarly research papers but it is not designed to aggregate repository and journal systems. More specifically:
Google Scholar crawls and indexes research papers that can be found on the web and links to the original source, while CORE harvests and caches the full-text.
Google Scholar provides access at a single level only, i.e. its search engine, whereas CORE, in addition to the CORE search engine, also provides access to the raw data.
Google Scholar offers both closed access and open access resources, while CORE offers open access resources only, enabling immediate access to full-text.
The audience is different. Even though CORE has a search engine where users can retrieve scientific literature, it focuses also on other research stakeholders, such as text miners and repository managers, and offers services designed for them, such as the CORE API and Datasets.
CORE harvests content that is available open access elsewhere, i.e. repositories and open access journals. When the “Download” button is missing, it means that the full-text is not available from the hosting service and CORE displays only the metadata of this record.
To transfer data and keep it regularly updated between your system and CORE, CORE uses a variety of protocols to ingest content. The easiest way to get your content integrated with CORE is the OAI-PMH protocol. If you wish to join CORE, get in touch.
We expect this to take about one hour for a typical repository. In some repository systems, such as EPrints, most of these recommendations are followed by default. You can find more details here.
We mainly support oai_dc, the mainstream metadata format used in the OAI-PMH protocol, which is based on Dublin Core, a popular vocabulary for bibliographic data. We also support RIOXX, a richer metadata profile used mostly by UK repositories.
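As a rough illustration of what an OAI-PMH harvest request looks like, the sketch below builds `ListRecords` URLs. The verb and the `metadataPrefix` and `resumptionToken` arguments come from the OAI-PMH specification; the endpoint `https://repository.example.org/oai`, the token value and the helper name `build_list_records_url` are hypothetical and only serve the example. It is not a description of how CORE's harvester is implemented.

```python
from urllib.parse import urlencode


def build_list_records_url(base_url, metadata_prefix="oai_dc", resumption_token=None):
    """Build an OAI-PMH ListRecords request URL for a repository endpoint."""
    if resumption_token:
        # resumptionToken is an exclusive argument in OAI-PMH:
        # it is sent with the verb only, without metadataPrefix.
        params = {"verb": "ListRecords", "resumptionToken": resumption_token}
    else:
        params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    return f"{base_url}?{urlencode(params)}"


# Hypothetical endpoint and token, for illustration only.
first_page = build_list_records_url("https://repository.example.org/oai")
next_page = build_list_records_url(
    "https://repository.example.org/oai", resumption_token="abc123"
)
print(first_page)
print(next_page)
```

A harvester would typically request the first page, then keep following the resumption token returned in each response until the repository stops sending one.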
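To make the oai_dc format more concrete, here is a minimal sketch that reads a few Dublin Core fields from a single oai_dc record using Python's standard library. The two namespace URIs are the standard ones for oai_dc and Dublin Core; the sample record, its values and the helper name `parse_oai_dc` are invented for illustration.

```python
import xml.etree.ElementTree as ET

# Standard namespaces for oai_dc and Dublin Core elements.
NAMESPACES = {
    "oai_dc": "http://www.openarchives.org/OAI/2.0/oai_dc/",
    "dc": "http://purl.org/dc/elements/1.1/",
}

# Made-up record, for illustration only.
SAMPLE_RECORD = """
<oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
           xmlns:dc="http://purl.org/dc/elements/1.1/">
  <dc:title>An Example Open Access Paper</dc:title>
  <dc:creator>Doe, Jane</dc:creator>
  <dc:creator>Roe, Richard</dc:creator>
  <dc:identifier>https://repository.example.org/id/eprint/1</dc:identifier>
</oai_dc:dc>
"""


def parse_oai_dc(xml_text):
    """Return selected Dublin Core fields as lists (fields may repeat)."""
    root = ET.fromstring(xml_text.strip())

    def values(field):
        elements = root.findall(f"dc:{field}", NAMESPACES)
        return [el.text.strip() for el in elements if el.text]

    return {
        "title": values("title"),
        "creator": values("creator"),
        "identifier": values("identifier"),
    }


record = parse_oai_dc(SAMPLE_RECORD)
print(record)
```

Every Dublin Core element is optional and repeatable, which is why each field is returned as a list rather than a single value.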
To provide its service, it is essential for CORE to be able to store a cached copy of the harvested content. This is needed to verify open access sources and to offer analytical services, text-mining support, recommendation tools, etc. By caching a copy of the harvested resource, CORE is no different from many commercial and non-commercial, academic and non-academic search engines, including Google and CiteSeerX. The primary difference from such systems is that CORE caches only copies of open access content. More information on the benefits of this approach is available in the “CORE: Three Access Levels to Underpin Open Access” article.
CORE uses information from various registries, such as OpenDOAR, to include new repositories, journals and archives into CORE. If the circumstances have changed in your repository, you can restrict harvesting and crawling activities by modifying the rules in your “robots.txt” file, using the Standard for Robots Exclusion. This also instructs search engines and harvesting systems that respect the standard not to fetch, and therefore not to cache, the content. In addition, you could withdraw your repository from all open access registry lists; when this takes place, please notify us.
CORE aggregates content from repositories registered in OpenDOAR, journals registered in DOAJ or those that requested their content to be aggregated. This means that all the content sources aggregated by CORE must be open access as this is a requirement for the providers to be included in these registries. According to the official BOAI definition of open access, CORE is allowed to "distribute, search, or link to the full texts of articles, crawl them for indexing, pass them as data to software, or use them for any other lawful purpose, without financial, legal, or technical barriers other than those inseparable from gaining access to the Internet itself. The only constraint on reproduction and distribution, and the only role for copyright in this domain, should be to give authors control over the integrity of their work and the right to be properly acknowledged and cited."
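The effect of a robots.txt rule can be checked locally before it is deployed. The sketch below uses Python's standard `urllib.robotparser` to test whether a URL would be open to a crawler. The rule set, the `/restricted/` path, the example URLs and the user-agent name `ExampleHarvester` are all hypothetical; the name under which any particular harvester identifies itself is not stated here and should be checked with that service.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical rules: block every crawler from the /restricted/ path.
ROBOTS_TXT = """\
User-agent: *
Disallow: /restricted/
"""


def is_harvestable(robots_txt, user_agent, url):
    """Return True if the given robots.txt rules allow user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


open_ok = is_harvestable(
    ROBOTS_TXT, "ExampleHarvester", "https://repository.example.org/open/paper.pdf"
)
restricted_ok = is_harvestable(
    ROBOTS_TXT, "ExampleHarvester", "https://repository.example.org/restricted/paper.pdf"
)
print(open_ok, restricted_ok)
```

Keeping restricted files under a dedicated path, as assumed here, makes it possible to block them with a single `Disallow` line.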
CORE’s system is fully automated and relies on data made available in a machine readable form. If your repository hosts full-text with a restrictive license that prohibits harvesting, this needs to be properly communicated in a machine readable form. All non open access items should be blocked in the robots.txt file. If this information is provided in the metadata for each record and CORE exposes the full-text, please get in touch with us.
CORE’s system is fully automated and relies on data made available in a machine readable form. Our system understands that the full-text of a record was removed only when the record is marked as deleted in the metadata of your repository. See here how to take down full-text from CORE.
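In OAI-PMH, a deleted record is signalled by a `status="deleted"` attribute on the record's header. The following sketch shows how such records can be spotted in a `ListRecords` response. The response is a trimmed, made-up sample (a real one also carries elements such as `responseDate`, `request` and the record metadata), and the function name `deleted_identifiers` is hypothetical.

```python
import xml.etree.ElementTree as ET

# Standard OAI-PMH 2.0 namespace.
OAI_NS = {"oai": "http://www.openarchives.org/OAI/2.0/"}

# Trimmed, made-up response, for illustration only.
SAMPLE_RESPONSE = """
<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record>
      <header status="deleted">
        <identifier>oai:repository.example.org:1</identifier>
        <datestamp>2020-01-01</datestamp>
      </header>
    </record>
    <record>
      <header>
        <identifier>oai:repository.example.org:2</identifier>
        <datestamp>2020-01-02</datestamp>
      </header>
    </record>
  </ListRecords>
</OAI-PMH>
"""


def deleted_identifiers(xml_text):
    """Return the identifiers of all records whose header is marked deleted."""
    root = ET.fromstring(xml_text.strip())
    deleted = []
    for header in root.iterfind(".//oai:header", OAI_NS):
        if header.get("status") == "deleted":
            deleted.append(header.findtext("oai:identifier", namespaces=OAI_NS))
    return deleted


removed = deleted_identifiers(SAMPLE_RESPONSE)
print(removed)
```

A repository only exposes this information if it keeps track of deletions, which it declares through the `deletedRecord` element of its `Identify` response.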
If there is full-text content that appears in CORE but not in the hosting service, you can take it down via the CORE Repositories Dashboard at any time without notifying us. If you don’t have access to the Dashboard, visit this page.
Yes, this is possible but there is usually a cost associated with it. Please email us the name of your company/organisation, business entity, the number of requests you expect to send and how often you will send them, and we will get back to you with a quote.
Please note the dataset has been created from information that was publicly available on the Internet. Every effort has been made to ensure this dataset contains only open access content. We have included only content from repositories and journals that are listed in registries where the condition for inclusion is the provision of content under an open access compatible license. However, as metadata are often inconsistent, license information is often not machine readable, and repositories from time to time leak information that is not open access, we cannot take any responsibility for the license of the content in the dataset. It is therefore up to the user of this dataset to ensure that the way in which they use the dataset does not breach copyright. The dataset is in no way intended for the purposes of reading the original publications, but for machine processing only.
If you have access to the CORE Repositories Dashboard, log into the Dashboard to get the instructions on how to download the CORE Recommender. Otherwise, visit our recommender registration page, where you will also find the installation instructions.
The CORE Recommender uses content-based filtering, a popular recommendation approach. The similar resources that appear in the CORE Recommender, and their quality, depend heavily on the metadata supplied by the repository of origin. If that information is incorrect or incomplete, you should contact the repository of origin. To improve the recommendations, you can use the feedback button to remove any undesirable articles.
Yes, you can use the CORE API for commercial purposes, but you
need to contact us first and let us know about it. Commercial
use of our API could increase the traffic on our servers and we
would like to be aware of the expected traffic.
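To illustrate the general idea of content-based filtering, and why metadata quality matters for it, here is a toy sketch that ranks documents by the cosine similarity of their word counts. This is not the CORE Recommender's actual algorithm, which is not described here; the function names, the sample titles and the choice of plain term counts are assumptions made for the example.

```python
import math
import re
from collections import Counter


def term_vector(text):
    """Count lowercase alphanumeric tokens in a piece of metadata text."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine_similarity(a, b):
    """Cosine similarity between two term-count vectors (0.0 if either is empty)."""
    dot = sum(a[term] * b[term] for term in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def recommend(target_text, candidates, top_n=2):
    """Return ids of the candidates most similar to target_text, best first."""
    target = term_vector(target_text)
    scored = [
        (cosine_similarity(target, term_vector(text)), doc_id)
        for doc_id, text in candidates.items()
    ]
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    # Candidates sharing no terms with the target are never recommended.
    return [doc_id for score, doc_id in scored[:top_n] if score > 0]


# Made-up titles, for illustration only.
CANDIDATES = {
    "a": "Text mining of open access research papers",
    "b": "Harvesting repository metadata with OAI-PMH",
    "c": "Medieval pottery of northern Europe",
}

result = recommend("Mining open access papers at scale", CANDIDATES)
empty_result = recommend("", CANDIDATES)
print(result, empty_result)
```

In this sketch a record with empty metadata produces no recommendations at all, which mirrors the point that incomplete metadata at the source limits what a content-based recommender can do.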
If you want to support CORE around the world please have a look at our
I recently completed my MSc in Computer Science at The University of Hertfordshire.
I joined the team at KMi as a PhD research student with Dr. Petr Knoth and Professor Zdenek
Zdrahal as my supervisors. My thesis will expand on the original work on Semantometrics
completed by Dr Knoth and Dasha Herrmannova. My areas of interest include semantics,
natural language processing and text mining.
PhD student & Data Analyst
Works with the rest of the team on both front- and back-end development. Enjoys maintaining
and improving CORE as it provides her with useful data for her PhD. Dreams of turning CORE
into a hub for Semantometric research. Book lover and sports enthusiast in her spare time.
Software developer, sports lover, chocolate addict, table tennis world champion.
Works on maintaining the CORE service, continuously enhancing it with new features and
developing new modules for it. Along with the rest of the team, he envisions the creation of the
world's largest full text open access dataset.
While not on his computer coding, he enjoys playing chess, scratching his guitar and dreams
of climbing Alpe d'Huez or Col de la Madone with his bike.
Developer and Continuous Integration secret lover
Works on improving and maintaining CORE services, especially on the frontend side. Once
he was a Super Mario plumber.
A communications guru in the 'dominated by developers' CORE team. Her job is to liaise with
the data providers and the technical group. As an open access advocate she hopes that one
day there will be no access barriers to scientific literature. In her spare time she likes
to swim, read and travel.
Founder; Product & Team Leader
Started by developing the first CORE prototype in 2010 out of sheer frustration over the
difficulty of text-mining open access papers. Responsible for leading the team, development
& research effort, funding and product strategy. Believes in free unrestricted access to
research for all. Excited by the opportunities of machine access to research papers to
extract and create new knowledge. Green tea lover & retired ice skater.
Developer and Data & Harvesting Expert