FAQs
Joining CORE as data provider
Last updated 22.02.2024
To check whether your repository or journal is indexed by CORE, go to our data providers list.
CORE uses information from various registries, such as OpenDOAR and DOAJ, to include new repositories and journals into CORE. If your repository or journal is already registered with an authoritative registry, you don't need to do anything. If your repository or journal has not been registered yet, use the form to add it.
CORE is an international service and indexes repositories from various locations around the world. This information is displayed on a map on our data providers page.
CORE is an indexing service and is not similar to research networking sites, e.g. ResearchGate or Academia.edu, where authors can deposit papers, so please do not email us the full text of your papers. If you have deposited your articles in a repository, let us know the name of the repository. We may already index it, and if we don't, we could start indexing it.
CORE indexes DOAJ as a single entry, which means that each journal title does not appear separately in CORE. If you wish to have a separate entry for your journal in CORE, do send us the journal's OAI base URL and we will create a new entry.
Indexing
Last updated 22.02.2024
General
CORE does not index all the repositories that exist in our database with the same frequency. Repositories are indexed as frequently as our HW infrastructure allows. The specific time of indexing for a repository is determined by the CORE Scheduler. The CORE Scheduler is a software component that ensures that our indexing cluster of machines is close to fully utilised 24/7 for 365 days every year. As soon as some resource is freed, the CORE Scheduler decides which repository needs to be indexed based on several criteria. These criteria include, but are not limited to, the previous time of the repository being indexed, the size of the repository, the location of the repository, the repository's indexing performance and information about potential previous indexing errors. We review the functionality of the scheduler on a regular basis to ensure that its decisions on what to index next maximise the number of ingested documents over a unit of time.
If you have a question regarding a specific repository, do get in touch with us.
Depending on the size of the repository and the existing traffic on CORE's servers, the indexing can last from a couple of hours to a couple of weeks. If we experience any technical issues during that period, we will get in touch.
CORE indexes all metadata records in a repository, but it can currently index full text records in PDF only. We are working to include other file types, such as HTML webpages and images.
Yes it can, provided that the repository offers its content as Open Access.
Log in to the CORE Repository Dashboard and look under the issues tab. Check whether CORE has the correct OAI base URL of your repository and whether any technical issues are listed there. If there are no technical issues, or you do not have an account for the CORE Repository Dashboard, contact us.
CORE works at the level of repositories and cannot update specific records. You can upload the new record into your institutional repository or journal and CORE will synchronise it at the next scheduled re-indexing.
No, CORE follows an automated re-indexing process and your repository will be re-indexed at the next automated re-indexing.
Google Scholar is a search engine containing scholarly research papers but it is not designed to collect information from repository and journal systems. More specifically:
- Google Scholar crawls and indexes the full text of research papers that can be found on the web, while CORE also indexes the metadata as supplied by the repository or journal system.
- The audience is different. Even though CORE has a search engine in the same way as Google Scholar, CORE delivers value by making research information machine readable, delivering an open access scholarly infrastructure which others can build on via the CORE API and Dataset.
CORE does not create the metadata, but rather indexes them from its content providers. If the metadata are wrong, contact the repository or journal where you originally deposited or published your content.
CORE can offer repository or journal full text download statistics per month via the CORE Repository Dashboard. If you do not have access to the Dashboard, you can claim it.
Yes it does. CORE indexes content from repositories and journals. The former do not perform peer review of the deposited content but the latter do. On some occasions the content deposited in a repository is already published in a journal and is peer-reviewed. In addition, repositories may contain grey literature and these resources are not peer-reviewed.
CORE has created definitions with regard to the statistics it provides to OpenDOAR.
Metadata: The total number of metadata records with a unique OAI identifier provided by the repository, as it appears in the application profile that CORE indexes. If CORE indexes from the RIOXX endpoint, CORE will provide RIOXX counts instead of Dublin Core counts.
Full text: Count of metadata records, as above, with at least one attachment provided by the repository that is a PDF file which a) is publicly downloadable (no login required, output not under embargo, etc.) and b) has machine-readable full text, i.e. text that can be extracted without OCR.
CORE does not own any rights of the aggregated content and each resource has its own license, which should be respected by the CORE users.
You cannot access it: a 404 error indicates that the full text has been removed from CORE.
Yes, this is possible. Please contact us and we will enable this for your repository.
Technical
To transfer data and keep it regularly updated between CORE and your system, CORE uses a variety of protocols to ingest the content. The easiest way to get your content integrated with CORE is the OAI-PMH protocol. If you wish to join CORE, get in touch.
An OAI base URL looks similar to http://journaldomain.com/cgi/oai2 or http://journaldomain.com/oai/request when the homepage URL is http://journaldomain.com. CORE cannot index the journal's/repository's content via its webpage URL.
If you are not sure whether your journal/repository has an OAI base URL, contact our team and we will provide technical support to you.
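As a rough illustration of how an OAI base URL is used, the sketch below builds two standard OAI-PMH request URLs (the `Identify` and `ListRecords` verbs) from the example base URL above. The helper name `build_oai_request` is hypothetical, the base URL is only the placeholder from this FAQ, and nothing here describes how CORE's own harvester is implemented.

```python
from urllib.parse import urlencode


def build_oai_request(base_url: str, verb: str, **params: str) -> str:
    """Return an OAI-PMH request URL for the given verb and arguments."""
    query = urlencode({"verb": verb, **params})
    return f"{base_url}?{query}"


# Placeholder base URL taken from the example above (not a real endpoint).
BASE_URL = "http://journaldomain.com/oai/request"

# "Identify" asks the repository to describe itself.
identify_url = build_oai_request(BASE_URL, "Identify")

# "ListRecords" with metadataPrefix=oai_dc asks for Dublin Core records.
list_records_url = build_oai_request(BASE_URL, "ListRecords", metadataPrefix="oai_dc")

print(identify_url)
print(list_records_url)
```

Opening the `Identify` URL of a real repository in a browser is a quick way to confirm that the base URL responds with OAI-PMH XML rather than an ordinary web page.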
A more technical answer for https://support.core.ac.uk
Target audience: repository managers, technical staff.
An OAI Identifier is a unique identifier which distinguishes items in a repository.
It "unambiguously identifies an item within a repository; the unique identifier is used in OAI-PMH requests for extracting metadata from the item".
The identifier contains three parts, separated by colons (":"):
- "oai": the fixed prefix "oai", which describes the type of the identifier
- "website address": where the item is hosted
- "unique identifier": an identifier of the object
For example:
oai:eprints.gla.ac.uk:129357
oai:digitalcommons.odu.edu:oaweek-1012
oai:oro.open.ac.uk:75049
oai:dspace.stir.ac.uk:1893/24654
Not all OAI identifiers look like this, but those that do not are non-standard and their use is discouraged. OAI identifiers must follow the URI (Uniform Resource Identifier) syntax. For more information about how OAI identifiers are formed, visit Specification and XML Schema for the OAI Identifier Format.
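The three-part structure described above can be sketched in a few lines of Python. This is a minimal illustration, not a full validator for the OAI Identifier Format; the names `OaiIdentifier` and `parse_oai_identifier` are hypothetical.

```python
from typing import NamedTuple


class OaiIdentifier(NamedTuple):
    scheme: str    # always "oai" for standard identifiers
    host: str      # where the item is hosted
    local_id: str  # identifier of the object within the repository


def parse_oai_identifier(identifier: str) -> OaiIdentifier:
    """Split a standard OAI identifier into its three parts."""
    # Split on the first two colons only: the local part may itself
    # contain characters such as "/" or ":".
    parts = identifier.split(":", 2)
    if len(parts) != 3 or parts[0] != "oai" or not parts[1] or not parts[2]:
        raise ValueError(f"not a standard OAI identifier: {identifier!r}")
    return OaiIdentifier(*parts)


parsed = parse_oai_identifier("oai:dspace.stir.ac.uk:1893/24654")
print(parsed.host, parsed.local_id)
```

Identifiers that do not start with "oai:" raise a `ValueError`, which mirrors the point that such identifiers are non-standard.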
We would expect that indexing could take from one hour to a couple of days for a typical repository. In some repository systems, such as EPrints, most of these recommendations are followed by default. Find more details on how it works.
We mainly support oai_dc, the mainstream metadata format used in the OAI-PMH Protocol, utilising the Dublin Core vocabulary, a popular vocabulary for bibliographic data. We also support RIOXX, a richer metadata protocol, used mostly by the UK repositories.
To provide its service, it is essential for CORE to be able to store a cached copy of the indexed content. This is needed to verify open access sources, offer analytical services, support text and data mining, recommendation tools, etc. By caching a copy of the indexed resource, CORE is not different from many commercial and non-commercial, academic and non-academic, search engines including Google or CiteSeerX. The primary difference from such systems is that CORE caches only copies of open access content. More information on the benefits of this approach is available in the "CORE: Three Access Levels to Underpin Open Access" article.
CORE uses information from various registries, such as OpenDOAR, to include new repositories, journals and archives into CORE. If the circumstances have changed in your repository, you can restrict indexing and crawling activities by modifying the rules in your "robots.txt" file by using the Standard for Robots Exclusion. This will also guarantee the content cannot be cached by search engines and indexing systems. In addition, you could withdraw your repository from all open access registry lists; when this takes place, please notify us.
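To check what a set of robots.txt rules actually blocks before deploying them, the standard-library `urllib.robotparser` module can evaluate them locally. The sketch below is only an illustration: the `/restricted/` path, the host `repository.example.org` and the helper `is_allowed` are hypothetical, and the rules apply to all crawlers (`User-agent: *`) rather than to any crawler name specific to CORE.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical rules: block every crawler from one directory.
ROBOTS_TXT = """\
User-agent: *
Disallow: /restricted/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())


def is_allowed(url: str, user_agent: str = "*") -> bool:
    """Return True if the rules above let this user agent fetch the URL."""
    return parser.can_fetch(user_agent, url)


print(is_allowed("http://repository.example.org/restricted/paper.pdf"))
print(is_allowed("http://repository.example.org/open/paper.pdf"))
```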
Removing full text or metadata
CORE aggregates content from repositories registered in OpenDOAR, journals registered in DOAJ or those content providers that requested their content to be aggregated. This means that all the content sources aggregated by CORE must be open access as this is a requirement for the providers to be included in these registries. According to the official BOAI definition of open access, CORE is allowed to "distribute, search, or link to the full texts of articles, crawl them for indexing, pass them as data to software, or use them for any other lawful purpose, without financial, legal, or technical barriers other than those inseparable from gaining access to the Internet itself. The only constraint on reproduction and distribution, and the only role for copyright in this domain is to give authors control over the integrity of their work and the right to be properly acknowledged and cited."
CORE's system is fully automated and relies on data made available in a machine readable form. If your repository hosts full text with a restrictive license that prohibits indexing, this needs to be properly communicated in a machine readable form. All non open access items should be blocked in the robots.txt file. If this information is provided in the metadata for each record and CORE exposes the full text, please get in touch with us.
An output's license is not consistently exposed by content providers in a machine readable form. In some circumstances it may be possible to extract this from the "fulltextUrls" field in the CORE API. However, this depends on the license data offered by the data provider.
CORE's system is fully automated and relies on data made available in a machine readable form. Our system understands that the full text of a record was removed only when the record is marked as deleted in the metadata of your repository. See how to take down full text from CORE in the related FAQ.
If full text content appears in CORE but not in the hosting service, your repository manager can take it down via the CORE Repository Dashboard at any time without notifying us. Alternatively, you can use the update or remove article form.
It is CORE policy to remove only the full text and not the metadata. Only in limited cases, e.g. when a publication did not happen, can the metadata be removed. In that case, email us.
In order for the metadata to be removed from CORE they need to be marked as "deleted/removed" in the repository. If the metadata are marked as "restricted", CORE will still display them.
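In OAI-PMH, a repository signals a deleted record by setting `status="deleted"` on the record's `<header>` element. The sketch below shows how such records could be picked out of a `ListRecords` response; the sample XML, the host `repository.example.org` and the function name `deleted_identifiers` are made up for illustration, and how CORE processes these records internally is not described here.

```python
import xml.etree.ElementTree as ET

# Namespace used by all OAI-PMH 2.0 responses.
OAI_NS = {"oai": "http://www.openarchives.org/OAI/2.0/"}

# Hypothetical, heavily shortened ListRecords response.
SAMPLE_RESPONSE = """\
<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record>
      <header status="deleted">
        <identifier>oai:repository.example.org:101</identifier>
        <datestamp>2024-01-15</datestamp>
      </header>
    </record>
    <record>
      <header>
        <identifier>oai:repository.example.org:102</identifier>
        <datestamp>2024-01-16</datestamp>
      </header>
    </record>
  </ListRecords>
</OAI-PMH>
"""


def deleted_identifiers(xml_text: str) -> list[str]:
    """Return the OAI identifiers of records whose header is marked deleted."""
    root = ET.fromstring(xml_text)
    deleted = []
    for header in root.iterfind(".//oai:header", OAI_NS):
        if header.get("status") == "deleted":
            deleted.append(header.findtext("oai:identifier", namespaces=OAI_NS))
    return deleted


print(deleted_identifiers(SAMPLE_RESPONSE))
```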
Membership
Last updated 22.02.2024
We founded CORE to provide free indexing and discovery for institutions. Initially, we were funded to provide these tools, but much of our funding ended in July 2023. As a result, we have created a membership model, with both free and paid membership. Our founding goal remains as before: we index any institutional repository. If you register as a starting member, you can see a freely available dashboard that shows useful information about your repository. We have added optional paid tools that assist in analytics and compliance.
By joining CORE you are supporting the open research community. Moreover, we believe that our paid membership provides tools that save your institution time and money. It enables you to check automatically what would otherwise have to be done by hand, for example finding copies of papers by a member of your institution that were deposited in another repository (what we call the "cross-repository check"). Finally, if you are a paying member, we provide anyone from your institution with fast API access to the full CORE dataset for text and data-mining (TDM) purposes.
Our aim is to make membership available for the widest range of institutions. Accordingly, we have set three tiers of membership, using the widely accepted World Bank criteria for low- and middle-income countries. Our aim is to make membership available to all, because the more institutions that join, the better the service for all members: specifically, we can check other repositories for content relevant to your institution, wherever it appears.
Many institutions don't have the staff or the knowledge to manage their repository effectively. For sustaining members, we provide a free repository health check every year. We go through all the results from our automatic checks and show you how to fix them. In other words, we provide a kind of outsourced technical resource for you.
Both the Recommender and the Discovery tools are free of charge for any institution that provides their data for indexing. They are both benefits of the free starting membership.
In many countries, including the United States, institutions may not track compliance across the institution. However, individual faculties and departments want to ensure that their publications are compliant with federal and funder mandates. Membership of CORE enables you to identify that a paper has been deposited, even if it was not deposited in your local repository.
Dashboard
Last updated 22.02.2024
For any institution, we provide a dashboard that enables them to see their repository from the outside, the way that any external service sees them. We give you statistics about the number of items indexed, how many of them are full text, and the proportion of content that has a DOI identifier. The dashboard is free to access for any institution that signs up for free starting membership.
You can only see the dashboard for your organisation if you have signed up as a member. Becoming a starting member is free of charge for any institution that provides content for CORE to index. Simply contact the administrator at [email protected] to be sent an invitation to open an account.
Open access articles are free to access, but the publisher may not maintain a freely available copy for researchers to access. It makes sense for researchers to self-archive their content. The author (or the institution) posts the article in a repository, typically the institutional repository of the university where they work. However, in many countries, funding requirements are not only that the article should be available open access, but that the article should be available within a certain time period after acceptance. For the UK REF 2021, this time period, known as the deposit delay, was 90 days.
This introduces a reporting requirement for any institution that wants to be compliant with funder mandates: it is necessary to demonstrate that articles are compliant, that is, available open access and deposited within the correct time frame.
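The compliance check described above comes down to simple date arithmetic. The sketch below uses the 90-day REF 2021 figure quoted earlier; the function name `is_within_deposit_delay` is hypothetical, and treating a deposit made exactly on day 90 as compliant is an assumption, since the exact boundary is defined by the relevant funder policy rather than by this example.

```python
from datetime import date

# Deposit delay quoted above for the UK REF 2021.
DEPOSIT_DELAY_DAYS = 90


def is_within_deposit_delay(
    reference_date: date,
    deposit_date: date,
    max_days: int = DEPOSIT_DELAY_DAYS,
) -> bool:
    """Return True if the deposit happened within max_days of the reference date.

    Assumption: a deposit on exactly day max_days counts as compliant.
    """
    return (deposit_date - reference_date).days <= max_days


# Hypothetical dates: 60 days and 152 days after the reference date.
print(is_within_deposit_delay(date(2020, 1, 1), date(2020, 3, 1)))
print(is_within_deposit_delay(date(2020, 1, 1), date(2020, 6, 1)))
```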
For every institutional repository where CORE indexes content, we try to assist you with compliance, identifying the date of deposit wherever possible, as well as the date of publication.
The date of deposit is increasingly added as a metadata field when content is uploaded to repositories. RIOXX, for example, is a metadata protocol that enables metadata to be shared across repositories, particularly date of deposit metadata. If all institutions make the date of deposit information available, CORE can provide a service for every repository by finding compliant versions of papers in other repositories.
Simply put, any CRIS system (Current Research Information System) works for an individual institution. Although it can often show compliance for papers within that institution, a CRIS system (or an individual repository) cannot find duplicate copies of articles that were deposited in another institution, for example if a paper has a co-author from Sheffield when the main author is based at Leeds. CORE is unique in indexing repositories from around the world, and can identify duplicate copies of papers. As a result, CORE and CRIS systems are complements, not substitutes, for each other.
CORE services
Last updated 22.02.2024
Use the registration form to retrieve your personal access key for the CORE API.
If you plan to use the CORE API we kindly ask the following:
- attribute CORE by including in your website this snippet,
- send us an email with a brief summary on how you are using the CORE API,
- grant us permission to present this summary to our funders and/or display it on our website,
- allow us to list your company's name, URL and logo on our website.
Yes, this is possible but there is usually a cost associated with it. Please email us the name of your company or organisation, business entity, the number of requests you estimate to send and how often you will send them, and we will get back to you with a quote.
Yes, you can use the CORE API for commercial purposes; Terms & Conditions apply. We provide a 30-day free trial for Institution and Enterprise. We will ask you for your circumstances during the registration process and we might contact you afterwards to clarify any points and assess your eligibility for a free licence. If your circumstances change, please let us know.
CORE API is free and does not require registration, subject to our rate limits. However, organisations that register get a faster rate that is typically not free. For Supporting and Sustaining Members, the faster rate comes as a free member benefit.
Please note the dataset has been created from information that was publicly available on the Internet. Every effort has been made to ensure this dataset contains only open access content. We have included only content from repositories and journals that are listed in registries where the condition for inclusion is the provision of content under an open access compatible license. However, as metadata are often inconsistent, license information is often not machine readable, and repositories from time to time leak information that is not open access, we cannot take any responsibility for the license of the content in the dataset. It is therefore up to the user of this dataset to ensure that the way in which they use the dataset does not breach copyright. The dataset is in no way intended for the purposes of reading the original publications, but for machine processing only.
We aim to generate a new public dataset at least once a year. If you need a more recent dataset, please get in touch with us as we might be able to arrange it.
If you have access to the CORE Repositories Dashboard, log into the CORE Repository Dashboard and you will get the instructions on how to download the CORE Recommender. Otherwise, visit our recommender registration page, where you will also find the installation instructions. Repository managers are highly recommended to use the CORE Repositories Dashboard.
The CORE recommender uses the popular content-based filtering system. The similar resources that appear in the CORE Recommender and their quality are highly impacted by the metadata information supplied by the repository of origin. If that information is incorrect or incomplete, you should contact the repository of origin. To improve the CORE recommendations, you can use the feedback button, with which you can remove any undesirable articles.
Unfortunately, CORE does not provide any language-specific datasets at the moment. Users can use the CORE API to download individual PDFs.
Non-commercial means:
- The organisation is a registered charity or a not-for-profit AND
- the use of the CORE service will not enable, contribute to or support the use of any paid-for service of the organisation or of another third party organisation linked to this organisation.
General information about CORE
Last updated 22.02.2024
Download a high resolution logo of CORE.
You can find the most recent CORE brochure in our resources list.
Access the CORE flyer in our resources.
CORE badges
You can use the badges below on your website to show that your content is indexed by CORE and that you are a part of the CORE and Open Research community. Please choose the badges according to your membership tier and include them in your system by means of the supplied HTML tags.
Visit our research page.
CORE guidance on REF2021 Audit
Last updated 24.08.2023
Due to cross-university collaborations, some outputs that could be considered non-compliant due to a late deposit may be compliant due to a deposit that was made within the policy timeframe at another institution or subject repository. Individual institutions could benefit as they might not be fully aware of all compliant outputs and might consider some of their outputs non-compliant, while in fact they are compliant.
CORE captures data as explained in the CORE recommendations. By following our recommendations CORE should have the same deposit dates as the repository.
CORE can identify deposits of the same articles from across repositories. By doing so, an output deposited late in repository A could be technically compliant provided that it was deposited within the timeframe at repository B, i.e. the earliest deposit date irrespective of the repository could be used. However, for the time being, CORE has agreed to supply the data to Research England and it will be at the discretion of Research England to interpret the data. We understand that the motivation is to mark outputs as non-compliant only in cases where there is clear evidence that they are truly non-compliant.
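The "earliest deposit date irrespective of the repository" idea can be sketched as follows. The repository names, dates and the function `earliest_deposit` are hypothetical, and this is not a description of how CORE or Research England compute compliance.

```python
from datetime import date


def earliest_deposit(deposits: dict[str, date]) -> tuple[str, date]:
    """Return the (repository, date) pair with the earliest deposit date."""
    if not deposits:
        raise ValueError("no deposits recorded for this output")
    repository = min(deposits, key=lambda name: deposits[name])
    return repository, deposits[repository]


# Hypothetical output deposited late in repository A but early in repository B.
deposits = {
    "repository A": date(2020, 6, 1),
    "repository B": date(2020, 2, 1),
}
reference_date = date(2020, 1, 1)  # hypothetical start of the policy timeframe

repository, deposited_on = earliest_deposit(deposits)
delay_days = (deposited_on - reference_date).days
print(repository, delay_days)
```

In this example the delay measured against repository A alone would be 152 days, while the earliest deposit (repository B) gives 31 days.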
CORE indexes both metadata and full texts, currently only in PDF format but we will include support for other formats in the future. While the presence of the full text is preferred, CORE has all information necessary to support the REF2021 audit as long as the metadata of your outputs are in CORE. To minimise the possibility of some of your outputs not being captured by CORE, please follow the CORE recommendations.
CORE captures data as explained in the CORE recommendations. By following our recommendations CORE should have the same deposit date as the repository. The date you see in the CORE API is the date the document was last seen in your repository and imported to CORE. The date exposed in the CORE Repositories Dashboard uses instead a new indexing system that reads the "deposited date" exposed by your own repository system.
Deposit dates are available via the CORE Repository Edition. Repository managers can access the percentage of papers that are non-compliant, e.g. outputs that were deposited 90 days or more after publication, according to the REF 2021 Open Access Policy.
This is possible via a subscription to the CORE Repository Edition. Repository managers can access the percentage of papers that are non-compliant, e.g. outputs that were deposited 90 days or more after publication, according to the REF 2021 Open Access Policy.
CORE captures data as explained in the CORE recommendations.
How does CORE know that the version of the deposited full text is the correct and compliant version?
The validation that the deposited full text is the first compliant version is currently not in the scope of CORE's support for the REF2021 Audit. Research England might use other alternative methods to check this.