
Pattern-Based Acquisition of Scientific Entities from Scholarly Article Titles

Author: Auer Sören
D’Souza Jennifer
Ke Hao-Ren
Lee Chei Sian
Sugiyama Kazunari
Publication venue: New York, NY : Springer
Publication date: 01/01/2021

We describe a rule-based approach for the automatic acquisition of salient scientific entities from Computational Linguistics (CL) scholarly article titles. Two observations motivated the approach: (i) titles tend to note salient aspects of an article's contribution; and (ii) the regularities in how these salient terms appear could be captured as a set of lexico-syntactic rules. Only lexico-syntactic patterns that were easily recognizable, occurred frequently, and indicated a scientific entity type by their position were selected. The rules were developed on a collection of 50,237 CL titles covering all articles in the ACL Anthology. In total, 19,799 research problems, 18,111 solutions, 20,033 resources, 1,059 languages, 6,878 tools, and 21,687 methods were extracted at an average precision of 75%.
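As a rough illustration of what "positional" lexico-syntactic rules over titles can look like, the sketch below maps the text on either side of a connective word to an entity type. The two rules (`X for Y`, `X using/with Y`), the `Entity` class, the `extract` function, and the type assignments are all hypothetical examples chosen for this sketch; they are not the rule set described in the paper.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Entity:
    entity_type: str
    text: str


# Hypothetical rules for illustration only; NOT the rule set from the paper.
# Each rule pairs a whole-title regex with a mapping from named groups to entity
# types, so the type of a span is decided by its position around the connective.
RULES = [
    (re.compile(r"^(?P<solution>.+?) for (?P<problem>.+)$", re.IGNORECASE),
     {"solution": "solution", "problem": "research problem"}),
    (re.compile(r"^(?P<problem>.+?) (?:using|with) (?P<method>.+)$", re.IGNORECASE),
     {"problem": "research problem", "method": "method"}),
]


def extract(title: str) -> list[Entity]:
    """Apply the first matching rule to a title; return [] if no rule fires."""
    title = title.strip().rstrip(".")
    for pattern, roles in RULES:
        match = pattern.match(title)
        if match:
            return [Entity(role, match.group(group).strip())
                    for group, role in roles.items()]
    return []


print(extract("Neural Machine Translation for Low-Resource Languages"))
```

The first matching rule wins, which keeps each rule easy to inspect but means rule order matters; a realistic rule set would need many more patterns and a manual precision check per entity type, as the 75% average precision above suggests.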

Institutionelles Repositorium der Leibniz Universität Hannover

Open Information Extraction for Knowledge Graph Construction

Author: A Kearney
AE Jinha
D Weld
DS Weld
F Belleau
L Han
O Bodenreider
S Haussmann
Z Huang
Publication venue: Springer Science and Business Media LLC
Publication date: 01/01/2020

University of Liverpool Repository

Crossref

Requirements Analysis for an Open Research Knowledge Graph

Author: Auer Sören
Brack Arthur
Ewerth Ralph
Hoppe Anett
Stocker Markus
Publication venue: Berlin ; Heidelberg : Springer
Publication date: 01/01/2020

Current science communication has a number of drawbacks and bottlenecks that have recently been the subject of discussion. Among others, the rising number of published articles makes it nearly impossible to get a full overview of the state of the art in a given field, and reproducibility is hampered by fixed-length, document-based publications that normally cannot cover all details of a research work. Recently, several initiatives have proposed knowledge graphs (KGs) for organising scientific information as a solution to many of these issues. However, the focus of these proposals is usually restricted to very specific use cases. In this paper, we aim to transcend this limited perspective by presenting a comprehensive analysis of requirements for an Open Research Knowledge Graph (ORKG) by (a) collecting the daily core tasks of a scientist, (b) establishing their consequential requirements for a KG-based system, and (c) identifying overlaps and specificities, and their coverage in current solutions. As a result, we map necessary and desirable requirements for successful KG-based science communication, derive implications, and outline possible solutions.

Repositorium für Naturwissenschaften und Technik

Domain-independent Extraction of Scientific Concepts from Research Articles

Author: A Constantin
D Jurgens
J Beel
J Cohen
J Lehmann
K Balog
M Liakata
N Lao
O Bodenreider
S Hochreiter
V Pertsas
Publication venue: Springer Science and Business Media LLC
Publication date: 01/01/2020

We examine the novel task of domain-independent scientific concept extraction from abstracts of scholarly articles and present two contributions. First, we suggest a set of generic scientific concepts that have been identified in a systematic annotation process. This set of concepts is used to annotate, at the phrasal level and in a joint effort with domain experts, a corpus of scientific abstracts from 10 domains of Science, Technology and Medicine. The resulting dataset is used in a set of benchmark experiments to (a) provide baseline performance for this task and (b) examine the transferability of concepts between domains. Second, we present two deep learning systems as baselines. In particular, we propose active learning to deal with the different domains in our task. The experimental results show that (1) substantial agreement is achievable by non-experts after consultation with domain experts, (2) the baseline system achieves a fairly high F1 score, and (3) active learning enables us to nearly halve the amount of required training data.

Comment: Accepted for publication at the 42nd European Conference on IR Research, ECIR 202
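The abstract does not say which query strategy the active learning uses, so the sketch below assumes pool-based active learning with least-confidence sampling, a common uncertainty-sampling choice: in each round the model is retrained on the labeled set and the unlabeled items it is least sure about are sent to an annotator. The names `active_learning`, `train`, `predict_proba`, and `oracle` are hypothetical stand-ins for the model and the human annotator. For phrase-level sequence labeling, per-token scores would first have to be aggregated into one score per sentence, which this sketch leaves out.

```python
from typing import Callable


def least_confidence(probs: list[float]) -> float:
    """Uncertainty score: 1 minus the probability of the most likely label."""
    return 1.0 - max(probs)


def active_learning(
    labeled: list[tuple[str, str]],
    pool: list[str],
    train: Callable[[list[tuple[str, str]]], object],
    predict_proba: Callable[[object, str], list[float]],
    oracle: Callable[[str], str],
    batch_size: int,
    rounds: int,
) -> list[tuple[str, str]]:
    """Pool-based active learning with least-confidence sampling (hypothetical API)."""
    labeled = list(labeled)  # copy so the caller's list is not mutated
    pool = list(pool)
    for _ in range(rounds):
        if not pool:
            break
        model = train(labeled)
        # Score every unlabeled item; the least confident ones are queried first.
        scores = {x: least_confidence(predict_proba(model, x)) for x in pool}
        batch = sorted(pool, key=lambda x: scores[x], reverse=True)[:batch_size]
        for x in batch:
            labeled.append((x, oracle(x)))  # ask the annotator for a label
            pool.remove(x)
    return labeled


# Toy run: the "model" ignores training and returns fixed probabilities.
toy_probs = {"s1": [0.9, 0.1], "s2": [0.55, 0.45], "s3": [0.7, 0.3], "s4": [0.51, 0.49]}
result = active_learning([], list(toy_probs), train=lambda data: None,
                         predict_proba=lambda model, x: toy_probs[x],
                         oracle=lambda x: "concept", batch_size=1, rounds=2)
print([x for x, _ in result])
```

In the toy run, `s4` and then `s2` are queried because their top-label probabilities are closest to 0.5. The idea behind the reported savings in training data is that such uncertain items carry more information per annotation than items the model already handles confidently.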

arXiv.org e-Print Archive

Crossref

Repositorium für Naturwissenschaften und Technik