Joining CORE as a data provider
To check whether your repository or journal is harvested by CORE, go to our data providers list.
CORE uses information from various registries, such as OpenDOAR and DOAJ, to include new repositories and journals in CORE. If your repository or journal is already registered with an authoritative registry, you don't need to do anything. If your repository or journal has not been registered yet, get in touch.
CORE is an international service and harvests repositories from various locations around the world. This information is displayed in a map and can be accessed here.
CORE is a harvesting service, not a research networking site such as ResearchGate or Academia.edu where authors can deposit papers, so please do not email us the full text of your papers. If you have deposited your articles in a repository, let us know the name of the repository. Chances are that we already harvest it, and if we don't, we could start harvesting it.
CORE does not harvest all the repositories that exist in our database with the same frequency. If you have a question regarding a specific repository do get in touch with us.
Depending on the size of the repository and the existing traffic on CORE's servers, harvesting can last from a couple of hours to a couple of weeks. If we experience any technical issues during that period, we will get in touch.
CORE harvests all metadata records in a repository, but it can currently harvest full text only in PDF. We are working to include other file types, such as HTML web pages and images.
Yes it can, provided that the repository offers its content as Open Access.
CORE works at the level of repositories and cannot update specific records. You can upload the new record into your institutional repository or journal and CORE will synchronise it at the next scheduled re-harvesting.
No, CORE follows an automated re-harvesting process and your repository will be re-harvested at the next automated re-harvesting.
Google Scholar is a search engine containing scholarly research papers but it is not designed to aggregate repository and journal systems. More specifically:
- Google Scholar crawls and indexes research papers that can be found on the web and links to the original source, while CORE harvests and caches the full text.
- Google Scholar limits access to a single level, i.e. its search engine, whereas CORE, apart from the CORE search engine, is in a position to extend access to the raw data.
- Google Scholar offers both closed access and open access resources, while CORE offers open access resources only, enabling immediate access to full text.
- The audience is different. Even though CORE has a search engine where users can retrieve scientific literature, it focuses also on other types of research stakeholders, such as text miners and repository managers, and offers services designed for them, such as the CORE API and Dataset.
CORE does not create the metadata, but rather harvests them from its content providers. If the metadata are wrong then contact the repository or journal where you had originally deposited or published your content.
Yes it does. CORE harvests content from repositories and journals. The former do not perform peer review of the deposited content but the latter do. In some cases the content deposited in a repository has already been published in a journal and is peer-reviewed. In addition, repositories may contain grey literature, and these resources are not peer-reviewed.
A journal's OAI base URL looks similar to this http://journaldomain.com/cgi/oai2 or http://journaldomain.com/oai/request. A journal's webpage looks similar to this http://journaldomain.com and CORE cannot harvest the journal's content via its webpage URL. If you are not sure whether your journal has an OAI base URL, do contact the team that provides technical support to your journal.
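As a rough illustration of the difference between the two kinds of URL, the sketch below assumes that the OAI base URL is an OAI-PMH endpoint, which answers an `Identify` request with an XML document rooted at `OAI-PMH`. The function names and the sample response are hypothetical, and nothing here reflects how CORE itself validates a URL.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode, urlsplit, urlunsplit

# Namespace used by OAI-PMH 2.0 responses.
OAI_NS = "http://www.openarchives.org/OAI/2.0/"

# Hypothetical Identify response, shortened for illustration.
SAMPLE_RESPONSE = """
<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <responseDate>2020-01-01T00:00:00Z</responseDate>
  <request verb="Identify">http://journaldomain.com/oai/request</request>
  <Identify>
    <repositoryName>Example Journal</repositoryName>
    <baseURL>http://journaldomain.com/oai/request</baseURL>
    <protocolVersion>2.0</protocolVersion>
  </Identify>
</OAI-PMH>
"""


def identify_url(base_url):
    """Build the OAI-PMH Identify request for a candidate base URL."""
    parts = urlsplit(base_url)
    query = urlencode({"verb": "Identify"})
    return urlunsplit((parts.scheme, parts.netloc, parts.path, query, ""))


def looks_like_identify_response(text):
    """Return True if the text is an OAI-PMH document with an Identify element."""
    try:
        root = ET.fromstring(text)
    except ET.ParseError:
        # An ordinary HTML page is usually not well-formed XML.
        return False
    return (
        root.tag == f"{{{OAI_NS}}}OAI-PMH"
        and root.find(f"{{{OAI_NS}}}Identify") is not None
    )


if __name__ == "__main__":
    print(identify_url("http://journaldomain.com/oai/request"))
    print(looks_like_identify_response(SAMPLE_RESPONSE))
```

Opening the built URL in a browser and seeing an XML response of this shape is a good sign that you have the OAI base URL rather than the journal's webpage.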
We would expect that harvesting could take from one hour to a couple of days for a typical repository. In some repository systems, such as EPrints, most of these recommendations are followed by default. You can find more details here.
To provide its service, it is essential for CORE to be able to store a cached copy of the harvested content. This is needed to verify open access sources, offer analytical services, support text and data mining, recommendation tools, etc. By caching a copy of the harvested resource, CORE is not different from many commercial and non-commercial, academic and non-academic, search engines including Google or CiteSeerX. The primary difference from such systems is that CORE caches only copies of open access content. More information on the benefits of this approach is available in the “CORE: Three Access Levels to Underpin Open Access” article.
CORE can change the way it treats your repository's content. The resources will be indexed and made discoverable via the CORE Search, but the PDFs will not be downloadable directly from the CORE Reader; only a link to the repository of origin will be provided. If you are interested in that option, contact us.
CORE uses information from various registries, such as OpenDOAR, to include new repositories, journals and archives in CORE. If the circumstances have changed in your repository, you can restrict harvesting and crawling activities by modifying the rules in your “robots.txt” file, using the Standard for Robots Exclusion. This also tells compliant search engines and harvesting systems not to cache the content. In addition, you could withdraw your repository from all open access registry lists; when this takes place, please notify us.
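The effect of such rules can be checked locally before you publish them. The sketch below uses Python's standard `urllib.robotparser`. The paths, the repository host and the `ExampleHarvester` user agent are placeholders, since the user agent string that CORE sends is not stated here.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical rules: block everything under /restricted/ for all crawlers.
ROBOTS_TXT = """\
User-agent: *
Disallow: /restricted/
"""


def is_allowed(robots_txt, user_agent, url):
    """Return True if the given rules let user_agent fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    for url in (
        "http://repository.example.org/open/paper.pdf",
        "http://repository.example.org/restricted/paper.pdf",
    ):
        print(url, is_allowed(ROBOTS_TXT, "ExampleHarvester", url))
```

Rules written for `User-agent: *` apply to every compliant crawler, so a narrower block would need the specific user agent name of the harvester in question.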
Removing full text or metadata
CORE aggregates content from repositories registered in OpenDOAR, journals registered in DOAJ or those content providers that requested their content to be aggregated. This means that all the content sources aggregated by CORE must be open access as this is a requirement for the providers to be included in these registries. According to the official BOAI definition of open access, CORE is allowed to, "distribute, search, or link to the full texts of articles, crawl them for indexing, pass them as data to software, or use them for any other lawful purpose, without financial, legal, or technical barriers other than those inseparable from gaining access to the Internet itself. The only constraint on reproduction and distribution, and the only role for copyright in this domain is to give authors control over the integrity of their work and the right to be properly acknowledged and cited."
CORE’s system is fully automated and relies on data made available in a machine-readable form. If your repository hosts full text with a restrictive license that prohibits harvesting, this needs to be properly communicated in a machine-readable form. All items that are not open access should be blocked in the robots.txt file. If this information is provided in the metadata for each record and CORE still exposes the full text, please get in touch with us.
CORE’s system is fully automated and relies on data made available in a machine readable form. Our system understands that the full text of a record was removed only when the record is marked as deleted in the metadata of your repository. See here how to take down full text from CORE.
It is a CORE policy to remove only the full text and not the metadata.
For the metadata to be removed from CORE, they need to be marked as "deleted/removed" in the repository. If the metadata are marked as "restricted", CORE will still display them.
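To see which records a repository currently exposes as deleted, you can inspect its harvesting output. The sketch below assumes the repository is harvested over OAI-PMH, where a deleted record is announced with `status="deleted"` on its header. Whether CORE relies on exactly this signal is an assumption, and the sample response and identifiers are invented for illustration.

```python
import xml.etree.ElementTree as ET

# Namespace prefix mapping for OAI-PMH 2.0 responses.
OAI_NS = {"oai": "http://www.openarchives.org/OAI/2.0/"}

# Hypothetical ListIdentifiers response with one live and one deleted record.
SAMPLE_RESPONSE = """
<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListIdentifiers>
    <header>
      <identifier>oai:repository.example.org:101</identifier>
      <datestamp>2020-01-01</datestamp>
    </header>
    <header status="deleted">
      <identifier>oai:repository.example.org:102</identifier>
      <datestamp>2020-02-01</datestamp>
    </header>
  </ListIdentifiers>
</OAI-PMH>
"""


def deleted_identifiers(xml_text):
    """Return the identifiers of all records whose header is marked deleted."""
    root = ET.fromstring(xml_text)
    deleted = []
    for header in root.iterfind(".//oai:header", OAI_NS):
        if header.get("status") == "deleted":
            deleted.append(header.findtext("oai:identifier", namespaces=OAI_NS))
    return deleted


if __name__ == "__main__":
    print(deleted_identifiers(SAMPLE_RESPONSE))
```

A record that is only flagged as restricted would not carry this status in the sample above, which matches the behaviour described: it stays visible until it is marked as deleted.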
To register a new key for the CORE API, please use the form provided here.
If you plan to use the CORE API we kindly ask the following:
Yes, this is possible, but there is usually a cost associated with it. Please email us the name of your company or organisation, the business entity, the number of requests you estimate you will send and how often you will send them, and we will get back to you with a quote.
Yes, you can use the CORE API for commercial purposes, but you need to contact us first and let us know about it. Commercial use of our API could increase the traffic in our servers and we would like to be aware of the expected traffic.
Please note the dataset has been created from information that was publicly available on the Internet. Every effort has been made to ensure this dataset contains only open access content. We have included only content from repositories and journals that are listed in registries where the condition for inclusion is the provision of content under an open access compatible license. However, as metadata are often inconsistent, license information is often not machine-readable, and repositories from time to time leak information that is not open access, we cannot take any responsibility for the license of the content in the dataset. It is therefore up to the user of this dataset to ensure that the way in which they use the dataset does not breach copyright. The dataset is in no way intended for the purposes of reading the original publications, but for machine processing only.
We aim to generate a new public dataset at least once a year. If you need a more recent dataset, please get in touch with us as we might be able to arrange it.
If you have access to the CORE Repositories Dashboard, log into it and you will find instructions on how to download the CORE Recommender. Otherwise, visit our recommender registration page, where you will also find the installation instructions. Repository managers are strongly encouraged to use the CORE Repositories Dashboard.
The CORE Recommender uses content-based filtering, a popular recommendation approach. The similar resources that appear in the CORE Recommender, and their quality, depend heavily on the metadata supplied by the repository of origin. If that information is incorrect or incomplete, you should contact the repository of origin. To improve the CORE recommendations, you can use the feedback button, with which you can remove any undesirable articles.
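To give a feel for why metadata quality matters so much, here is a minimal sketch of content-based filtering in general: each item is turned into a bag of words and candidates are ranked by cosine similarity to the target. This is a generic textbook illustration, not the CORE Recommender's actual implementation, and all titles and texts are made up.

```python
import math
import re
from collections import Counter

# Hypothetical candidate records: title -> descriptive metadata text.
CANDIDATES = {
    "Harvesting paper": "harvesting open access repositories with OAI",
    "Biology paper": "protein folding dynamics",
    "Policy paper": "open access policy",
}


def tokenize(text):
    """Lowercase the text and split it into alphabetic words."""
    return re.findall(r"[a-z]+", text.lower())


def cosine(a, b):
    """Cosine similarity between two word-count vectors."""
    dot = sum(a[term] * b[term] for term in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def recommend(target_text, candidates, top_n=2):
    """Return up to top_n candidate titles most similar to the target text."""
    target = Counter(tokenize(target_text))
    scored = [
        (cosine(target, Counter(tokenize(text))), title)
        for title, text in candidates.items()
    ]
    scored.sort(reverse=True)
    # Candidates with no words in common are never recommended.
    return [title for score, title in scored[:top_n] if score > 0]


if __name__ == "__main__":
    print(recommend("open access repository harvesting", CANDIDATES))
```

In this toy setup, a record with empty or misleading metadata shares no meaningful words with anything else, so it either drops out of the results or attracts unrelated ones.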
General information about CORE
Download a high resolution logo of CORE.
You can find CORE’s brochure here.
You can find CORE’s flyer here.
Visit our research page.