
19 research outputs found

Corpora for the conceptualisation and zoning of scientific papers

Author: Batchelor Colin
Liakata Maria
Siddharthan Advaith
Teufel Simone
Publication date: 01/01/2010

We present two complementary annotation schemes for sentence-based annotation of full scientific papers, CoreSC and AZ-II, which have been applied to primary research articles in chemistry. The AZ scheme is based on the rhetorical structure of a scientific paper and follows the knowledge claims made by the authors. It has been shown to be reliably annotated by independent human coders and has proven useful for various information access tasks. AZ-II is its extended version, which has been successfully applied to chemistry. The CoreSC scheme takes a different view of scientific papers, treating them as the human-readable representations of scientific investigations. It therefore seeks to retrieve the structure of the investigation from the paper as generic, high-level Core Scientific Concepts (CoreSC). CoreSCs have been annotated by 16 chemistry experts over a total of 265 full papers in physical chemistry and biochemistry. We describe the differences and similarities between the two schemes in detail and present the corpus produced with each scheme. The two corpora share 36 papers, which allows us to compare aspects of the annotation schemes quantitatively. We show the correlation between the two schemes and their strengths and weaknesses, and discuss the benefits of combining a rhetorically based analysis of the papers with a content-based one.

CiteSeerX

Open Research Online (The Open University)
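Because both schemes label the same sentences in the 36 shared papers, their correspondence can be inspected by cross-tabulating the two labels each sentence received. The following is a minimal sketch, assuming each corpus can be reduced to a mapping from a hypothetical (paper, sentence) identifier to a single label. The identifiers and label pairs are illustrative toy data, not taken from the corpora, and this is not the authors' analysis code.

```python
from collections import Counter

# Hypothetical toy data: each key identifies one sentence in a paper that
# appears in both corpora; each value is the label that scheme assigned to it.
coresc = {
    ("paper1", 1): "Goal",
    ("paper1", 2): "Method",
    ("paper1", 3): "Result",
    ("paper2", 1): "Goal",
}
az_ii = {
    ("paper1", 1): "AIM",
    ("paper1", 2): "OWN_MTHD",
    ("paper1", 3): "OWN_RES",
    ("paper2", 1): "AIM",
}


def cross_tabulate(a, b):
    """Count (label_a, label_b) pairs over sentences annotated by both schemes."""
    shared = a.keys() & b.keys()  # only sentences present in both corpora
    return Counter((a[s], b[s]) for s in shared)


table = cross_tabulate(coresc, az_ii)
for (c_label, az_label), n in sorted(table.items()):
    print(f"{c_label:<8} {az_label:<9} {n}")
```

A contingency table like this is only the raw material. Any correlation measure reported over it depends on choices the abstract does not specify.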

A Three-Way Perspective on Scientific Discourse Annotation for Knowledge Extraction

Author: Ananiadou S
de Waard A
Liakata M
Nawaz R
Pander Maat H
Thompson P
Publication date: 01/07/2012

E-space: Manchester Metropolitan University's Research Repository

The University of Manchester - Institutional Repository

TechMiner: Extracting Technologies from Academic Publications

Author: A Bandrowski
C Bizer
C Fellbaum
F Osborne
F Ronzano
K Scanning Douw
P Corbett
R Usbeck
S Peroni
T Groza
W Huang
Publication date: 01/01/2016

In recent years we have seen the emergence of a variety of scholarly datasets. Typically these capture ‘standard’ scholarly entities and their connections, such as authors, affiliations, venues, publications, citations, and others. However, as the repositories grow and the technology improves, researchers are adding new entities to these repositories to develop a richer model of the scholarly domain. In this paper, we introduce TechMiner, a new approach that combines NLP, machine learning and semantic technologies to mine technologies from research publications and to generate an OWL ontology describing their relationships with other research entities. The resulting knowledge base can support a number of tasks, such as: richer semantic search, which can exploit the technology dimension to support better retrieval of publications; richer expert search; monitoring the emergence and impact of new technologies, both within and across scientific fields; studying the scholarly dynamics associated with the emergence of new technologies; and others. TechMiner was evaluated on a manually annotated gold standard, and the results indicate that it significantly outperforms alternative NLP approaches and that its semantic features significantly improve both recall and precision.

Crossref

Online Research @ Cardiff

Open Research Online (The Open University)

Meta-Knowledge Annotation at the Event Level: Comparison between Abstracts and Full Papers

Author: Ananiadou S
Nawaz R
Thompson P
Publication date: 01/05/2012

The University of Manchester - Institutional Repository

Domain-independent Extraction of Scientific Concepts from Research Articles

Author: A Constantin
D Jurgens
J Beel
J Cohen
J Lehmann
K Balog
M Liakata
N Lao
O Bodenreider
S Hochreiter
V Pertsas
Publication venue: Springer Science and Business Media LLC
Publication date: 01/01/2020

We examine the novel task of domain-independent scientific concept extraction from abstracts of scholarly articles and present two contributions. First, we suggest a set of generic scientific concepts that have been identified in a systematic annotation process. This set of concepts is used to annotate a corpus of scientific abstracts from 10 domains of Science, Technology and Medicine at the phrasal level, in a joint effort with domain experts. The resulting dataset is used in a set of benchmark experiments to (a) provide baseline performance for this task and (b) examine the transferability of concepts between domains. Second, we present two deep learning systems as baselines. In particular, we propose active learning to deal with the different domains in our task. The experimental results show that (1) substantial agreement is achievable by non-experts after consultation with domain experts, (2) the baseline system achieves a fairly high F1 score, and (3) active learning enables us to nearly halve the amount of required training data. Comment: Accepted for publishing in 42nd European Conference on IR Research, ECIR 202

arXiv.org e-Print Archive

Crossref

Repositorium für Naturwissenschaften und Technik
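The abstract reports that active learning nearly halves the required training data, but it does not name the query strategy. The following is a minimal sketch of one common strategy, least-confidence uncertainty sampling, and is not necessarily the one the authors use. It assumes a model that outputs a probability distribution over labels for each unlabelled item. The function name, the toy probabilities and the batch size are hypothetical.

```python
def least_confident(probabilities, k):
    """Return indices of the k pool items whose top predicted probability is lowest.

    probabilities: one list of class probabilities per unlabelled item.
    """
    # Confidence of an item = probability of its most likely label.
    confidence = [max(p) for p in probabilities]
    # Least confident first; ties broken by original index for determinism.
    ranked = sorted(range(len(probabilities)), key=lambda i: (confidence[i], i))
    return ranked[:k]


# Hypothetical model outputs for four unlabelled items (three labels each).
pool_probs = [
    [0.90, 0.05, 0.05],
    [0.40, 0.35, 0.25],
    [0.60, 0.30, 0.10],
    [0.34, 0.33, 0.33],
]
print(least_confident(pool_probs, 2))  # [3, 1]
```

In a full loop, the selected items would be sent to annotators and added to the training set, and the model would be retrained before the next selection round. The savings come from spending annotation effort on items the current model finds hardest.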

Requirements Analysis for an Open Research Knowledge Graph

Author: A Brack
A Constantin
A Fink
A Hars
A Hoppe
AR Hevner
C Lange
C Okoli
D Vrandečić
G Petasis
IA Klampanos
J Beel
J Lehmann
K Balog
K Degtyarenko
L Bornmann
LN Soldatova
M Färber
M Liakata
M Lubani
M Stocker
MAMA Harris
MY Jaradeh
O Bodenreider
R Braun
S Fathalla
S Mesbah
S Peroni
S Vahdati
V Pertsas
Z Nasar
Publication venue: Springer Science and Business Media LLC
Publication date: 01/01/2020

Current science communication has a number of drawbacks and bottlenecks that have recently been the subject of discussion. Among others, the rising number of published articles makes it nearly impossible to get an overview of the state of the art in a given field, and reproducibility is hampered by fixed-length, document-based publications that normally cannot cover all details of a research work. Recently, several initiatives have proposed knowledge graphs (KGs) for organising scientific information as a solution to many of these issues. The focus of these proposals, however, is usually restricted to very specific use cases. In this paper, we aim to transcend this limited perspective by presenting a comprehensive analysis of requirements for an Open Research Knowledge Graph (ORKG) by (a) collecting the daily core tasks of a scientist, (b) establishing their consequential requirements for a KG-based system, and (c) identifying overlaps and specificities, as well as their coverage in current solutions. As a result, we map necessary and desirable requirements for successful KG-based science communication, derive implications and outline possible solutions. Comment: Accepted for publishing in 24th International Conference on Theory and Practice of Digital Libraries, TPDL 202

arXiv.org e-Print Archive

Crossref

Repositorium für Naturwissenschaften und Technik

NLPContributions: An Annotation Scheme for Machine Reading of Scholarly Contributions in Natural Language Processing Literature

Author: Auer Sören
D'Souza Jennifer
Publication date: 01/01/2020

We describe an annotation initiative to capture the scholarly contributions in natural language processing (NLP) articles, particularly articles that discuss machine learning (ML) approaches for various information extraction tasks. We develop the annotation task based on a pilot annotation exercise on 50 NLP-ML scholarly articles presenting contributions to five information extraction tasks: (1) machine translation, (2) named entity recognition, (3) question answering, (4) relation classification, and (5) text classification. In this article, we describe the outcomes of this pilot annotation phase. Through the exercise we have obtained an annotation methodology and found ten core information units that reflect the contributions of NLP-ML scholarly investigations. The annotation scheme we developed based on these information units is called NLPContributions. The overarching goal of our endeavor is fourfold: (1) to find a systematic set of subject-predicate-object statement patterns for the semantic structuring of scholarly contributions that are more or less generically applicable to NLP-ML research articles; (2) to apply the discovered patterns to create a larger annotated dataset for training machine readers of research contributions; (3) to ingest the dataset into the Open Research Knowledge Graph (ORKG) infrastructure as a showcase for creating user-friendly state-of-the-art overviews; and (4) to integrate the machine readers into the ORKG to assist users in manually curating their respective article contributions. We envision that the NLPContributions methodology will engender a wider discussion on the topic, toward its further refinement and development.
Our pilot dataset of 50 NLP-ML scholarly articles annotated according to the NLPContributions scheme is openly available to the research community at https://doi.org/10.25835/0019761. Comment: In Proceedings of the 1st Workshop on Extraction and Evaluation of Knowledge Entities from Scientific Documents (EEKE 2020) co-located with the ACM/IEEE Joint Conference on Digital Libraries in 2020 (JCDL 2020), Virtual Event, China, August 1. http://ceur-ws.org/Vol-2658

arXiv.org e-Print Archive

Repositorium für Naturwissenschaften und Technik
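NLPContributions structures a paper's contribution as subject-predicate-object statements grouped under information units. The following is a minimal sketch of such a structure in memory. It assumes triples are plain strings grouped by unit name, which is not necessarily how the released dataset or the ORKG stores them. The unit names, predicates and values are hypothetical illustrations, not entries from the dataset.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Triple:
    subject: str
    predicate: str
    obj: str


# Hypothetical contribution of one article, grouped by illustrative unit names.
contribution = {
    "ResearchProblem": [
        Triple("Contribution", "has research problem", "named entity recognition"),
    ],
    "Results": [
        Triple("proposed model", "evaluated on", "example benchmark"),
        Triple("proposed model", "reports metric", "F1"),
    ],
}


def flatten(units):
    """Yield (unit, triple) pairs in insertion order, e.g. for loading into a graph store."""
    for unit, triples in units.items():
        for t in triples:
            yield unit, t


for unit, t in flatten(contribution):
    print(f"[{unit}] ({t.subject}, {t.predicate}, {t.obj})")
```

Keeping the unit name next to each triple preserves the grouping that the scheme's information units provide when the statements are flattened into a single graph.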

NLPContributions: An Annotation Scheme for Machine Reading of Scholarly Contributions in Natural Language Processing Literature

Author: Auer Sören
D'Souza Jennifer
Lu Wei
Mayr Philipp
Zhang Chengzhi
Zhang Yi
Publication venue: Aachen, Germany : RWTH Aachen
Publication date: 01/01/2020

We describe an annotation initiative to capture the scholarly contributions in natural language processing (NLP) articles, particularly articles that discuss machine learning (ML) approaches for various information extraction tasks. We develop the annotation task based on a pilot annotation exercise on 50 NLP-ML scholarly articles presenting contributions to five information extraction tasks: (1) machine translation, (2) named entity recognition, (3) question answering, (4) relation classification, and (5) text classification. In this article, we describe the outcomes of this pilot annotation phase. Through the exercise we have obtained an annotation methodology and found ten core information units that reflect the contributions of NLP-ML scholarly investigations. The annotation scheme we developed based on these information units is called NLPContributions. The overarching goal of our endeavor is fourfold: (1) to find a systematic set of subject-predicate-object statement patterns for the semantic structuring of scholarly contributions that are more or less generically applicable to NLP-ML research articles; (2) to apply the discovered patterns to create a larger annotated dataset for training machine readers [18] of research contributions; (3) to ingest the dataset into the Open Research Knowledge Graph (ORKG) infrastructure as a showcase for creating user-friendly state-of-the-art overviews; and (4) to integrate the machine readers into the ORKG to assist users in manually curating their respective article contributions. We envision that the NLPContributions methodology will engender a wider discussion on the topic, toward its further refinement and development. Our pilot dataset of 50 NLP-ML scholarly articles annotated according to the NLPContributions scheme is openly available to the research community at https://doi.org/10.25835/0019761.

Institutionelles Repositorium der Leibniz Universität Hannover


Sentence, Phrase, and Triple Annotations to Build a Knowledge Graph of Natural Language Processing Contributions - A Trial Dataset

Author: Auer Sören
D’Souza Jennifer
Publication venue: Beijing : National Science Library, Chinese Academy of Sciences
Publication date: 01/01/2021

This work aims to normalize the NlpContributions scheme (henceforth, NlpContributionGraph) to structure the contributions information in Natural Language Processing (NLP) scholarly articles directly from article sentences. It does so through a two-stage annotation methodology: 1) a pilot stage, to define the scheme (described in prior work); and 2) an adjudication stage, to normalize the graphing model (the focus of this paper). We re-annotate the contribution-pertinent information across 50 previously annotated NLP scholarly articles in terms of a data pipeline comprising contribution-centered sentences, phrases, and triple statements. In the adjudication stage, care was taken to reduce annotation noise while formulating the guidelines for our proposed NLP contributions structuring and graphing scheme. Applying NlpContributionGraph to the 50 articles yielded a dataset of 900 contribution-focused sentences, 4,702 contribution-information-centered phrases, and 2,980 surface-structured triples. The intra-annotation agreement between the first and second stages, in terms of F1-score, was 67.92% for sentences, 41.82% for phrases, and 22.31% for triple statements. This indicates that annotation decision variance grows with the granularity of the information. NlpContributionGraph has limited scope for structuring scholarly contributions compared with STEM (Science, Technology, Engineering, and Medicine) scholarly knowledge at large. Further, the annotation scheme in this work is based only on intra-annotator consensus. A single annotator first annotated the data to propose the initial scheme, and the same annotator then re-annotated the data to normalize the annotations in an adjudication stage. However, the expected goal of this work is to achieve a standardized retrospective model of capturing NLP contributions from scholarly articles.
This would entail a larger initiative of enlisting multiple annotators to accommodate different worldviews into a “single” set of structures and relationships as the final scheme. Because the scheme is being proposed for the first time, and because the annotation task is complex relative to a realistic timeframe, our intra-annotation procedure is well-suited. Nevertheless, the model proposed in this work is presently limited, since it does not incorporate multiple annotator worldviews. This is planned as future work to produce a robust model. We demonstrate NlpContributionGraph data integrated into the Open Research Knowledge Graph (ORKG), a next-generation KG-based digital library with intelligent computations enabled over structured scholarly knowledge, as a viable aid to assist researchers in their day-to-day tasks. NlpContributionGraph is a novel scheme for annotating research contributions from NLP articles and integrating them into a knowledge graph. To the best of our knowledge, no such scheme exists in the community. Furthermore, our quantitative evaluations over the two-stage annotation tasks offer insights into task difficulty.

Repositorium für Naturwissenschaften und Technik
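The intra-annotation agreement above is reported as an F1-score between the first- and second-stage annotations. The following is a minimal sketch of how such a figure can be computed. It assumes each stage is reduced to a set of annotated items, such as sentence IDs or normalized phrase strings, and that items match only when they are identical. The paper's actual matching criteria may differ, and the sentence IDs are hypothetical.

```python
def f1_agreement(stage1, stage2):
    """F1 between two annotation passes, treating stage1 as the reference.

    With exact set matching, F1 = 2|A ∩ B| / (|A| + |B|), so the result is the
    same whichever pass is taken as the reference.
    """
    a, b = set(stage1), set(stage2)
    if not a and not b:
        return 1.0  # two empty annotations agree trivially
    overlap = len(a & b)
    precision = overlap / len(b) if b else 0.0
    recall = overlap / len(a) if a else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Hypothetical sentence IDs marked as contribution-related in each stage.
first_pass = {"s3", "s7", "s12", "s15"}
second_pass = {"s3", "s7", "s20"}
print(round(f1_agreement(first_pass, second_pass), 4))  # 0.5714
```

Under exact matching, finer-grained units such as phrases and triples must coincide on more details before they count as agreeing. This is consistent with the lower scores reported for phrases and triples than for sentences.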

Something old, something new: Identifying knowledge source in bio-events

Author: Ananiadou Sophia
Nawaz Raheel
Thompson Paul
Publication venue: Bahri Publications
Publication date: 01/01/2013

Locating new experimental knowledge in biomedical texts is important for several tasks undertaken by biologists. Although several systems can distinguish between new and existing knowledge, they generally do so at the level of text zones. In contrast to text zones, bio-events constitute structured representations of biomedical knowledge. They bridge text with domain knowledge and can be used to develop sophisticated semantic search systems. Typically, event extraction systems locate and classify events and their arguments but ignore interpretative information (meta-knowledge) from their textual context. Since several events (often nested) can occur in a sentence, determining which event(s) are affected by which textual clues can be complex. We have analysed knowledge source annotation in two bio-event corpora, GENIA-MK (abstracts) and FP-MK (full papers), and have developed a system to classify bio-events automatically according to their knowledge source. Our system achieves an accuracy of over 99% on both abstracts and full papers.

E-space: Manchester Metropolitan University's Research Repository

The University of Manchester - Institutional Repository