
    MORTY: Structured Summarization for Targeted Information Extraction from Scholarly Articles

    Information extraction from scholarly articles is a challenging task due to the sizable document length and the implicit information hidden in text, figures, and citations. Scholarly information extraction has various applications in exploration, archival, and curation services for digital libraries and knowledge management systems. We present MORTY, an information extraction technique that creates structured summaries of text from scholarly articles. Our approach condenses the article's full text into property-value pairs, rendered as a segmented text snippet called a structured summary. We also present a sizable scholarly dataset combining structured summaries retrieved from a scholarly knowledge graph with the corresponding publicly available scientific articles, which we openly publish as a resource for the research community. Our results show that structured summarization is a suitable approach for targeted information extraction that complements other commonly used methods such as question answering and named entity recognition.
    Comment: Published as a short paper in ICADL 2022.
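
    As a minimal illustration of the structured-summary format described above, the Python sketch below parses a segmented "property: value" text snippet, such as a sequence-to-sequence model might emit, into a dictionary. The "|" segment delimiter and the example properties are assumptions for illustration; they are not taken from the MORTY paper.

    ```python
    # Hypothetical sketch: parsing a structured summary into property-value
    # pairs. The "property: value | ..." format is an assumption, not the
    # format actually used by MORTY.

    def parse_structured_summary(summary: str) -> dict:
        """Split a segmented text snippet into property-value pairs."""
        pairs = {}
        for segment in summary.split("|"):
            if ":" not in segment:
                continue  # skip malformed segments
            prop, value = segment.split(":", 1)
            pairs[prop.strip()] = value.strip()
        return pairs

    # Example output a summarization model might generate for an article:
    generated = ("research problem: scholarly information extraction | "
                 "method: structured summarization | dataset: ORKG-derived pairs")
    print(parse_structured_summary(generated))
    # {'research problem': 'scholarly information extraction', ...}
    ```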

    Comparing research contributions in a scholarly knowledge graph

    Conducting a scientific literature review is a time-consuming activity. This holds both for finding and for comparing the related literature. In this paper, we present a workflow and system designed to, among other things, compare research contributions in a scientific knowledge graph. To compare contributions, multiple tasks are performed, including finding similar contributions, mapping properties, and visualizing the comparison. The presented workflow is implemented in the Open Research Knowledge Graph (ORKG), which enables researchers to find and compare related literature. A preliminary evaluation has been conducted with researchers. The results show that researchers are satisfied with the usability of the user interface but, more importantly, that they acknowledge the need for and usefulness of contribution comparisons.
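
    The abstract does not detail how similarity between contributions is computed; as a rough sketch of the three comparison tasks under stated assumptions, the fragment below finds similar contributions via TF-IDF cosine similarity and aligns their properties into a comparison table. The data, the TF-IDF choice, and all names are illustrative, not the ORKG implementation.

    ```python
    # Illustrative sketch of a contribution-comparison workflow; TF-IDF
    # similarity is an assumption, not the method used by the ORKG.
    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.metrics.pairwise import cosine_similarity

    # Each contribution is a set of property -> value statements.
    contributions = {
        "Paper A": {"research problem": "entity linking", "method": "BERT"},
        "Paper B": {"research problem": "entity linking", "method": "rules"},
        "Paper C": {"research problem": "summarization", "method": "seq2seq"},
    }

    # Task 1: find similar contributions by comparing their statements.
    texts = [" ".join(f"{p} {v}" for p, v in c.items())
             for c in contributions.values()]
    similarities = cosine_similarity(TfidfVectorizer().fit_transform(texts))

    # Task 2: map properties by taking their union across contributions.
    properties = sorted({p for c in contributions.values() for p in c})

    # Task 3: render the comparison as a property-by-contribution table.
    header = ["property"] + list(contributions)
    rows = [[p] + [contributions[c].get(p, "-") for c in contributions]
            for p in properties]
    for row in [header] + rows:
        print(" | ".join(f"{cell:18}" for cell in row))
    ```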

    The STEM-ECR Dataset: Grounding Scientific Entity References in STEM Scholarly Content to Authoritative Encyclopedic and Lexicographic Sources

    We introduce the STEM (Science, Technology, Engineering, and Medicine) Dataset for Scientific Entity Extraction, Classification, and Resolution, version 1.0 (STEM-ECR v1.0). The STEM-ECR v1.0 dataset has been developed to provide a benchmark for the evaluation of scientific entity extraction, classification, and resolution tasks in a domain-independent fashion. It comprises abstracts from 10 STEM disciplines that were found to be the most prolific ones on a major publishing platform. We describe the creation of this multidisciplinary corpus and highlight the obtained findings in terms of the following features: 1) a generic conceptual formalism for scientific entities in a multidisciplinary scientific context; 2) the feasibility of domain-independent human annotation of scientific entities under such a generic formalism; 3) a performance benchmark for the automatic extraction of multidisciplinary scientific entities using BERT-based neural models; 4) a delineated three-step entity resolution procedure for human annotation of the scientific entities via encyclopedic entity linking and lexicographic word sense disambiguation; and 5) human evaluations of the encyclopedic links and lexicographic senses that Babelfy returned for our entities. Our findings cumulatively indicate that human annotation and automatic learning of multidisciplinary scientific concepts, as well as their semantic disambiguation, are feasible in a setting as wide-ranging as STEM.
    Comment: Published in LREC 2020. Publication URL: https://www.aclweb.org/anthology/2020.lrec-1.268/; Dataset DOI: https://doi.org/10.25835/001754
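
    As a hedged sketch of feature 3), the snippet below runs a BERT-based token-classification pipeline over an abstract using the Hugging Face transformers library. The checkpoint name is a generic placeholder, not a model trained on STEM-ECR, so it tags generic entity types rather than the dataset's scientific entity classes.

    ```python
    # Sketch of BERT-based entity extraction; "dslim/bert-base-NER" is a
    # generic placeholder checkpoint, not one fine-tuned on STEM-ECR.
    from transformers import pipeline

    ner = pipeline(
        "token-classification",
        model="dslim/bert-base-NER",
        aggregation_strategy="simple",  # merge word pieces into entity spans
    )

    abstract = ("We evaluate convolutional neural networks for protein "
                "structure prediction on a community benchmark.")

    for entity in ner(abstract):
        print(f"{entity['word']!r:40} {entity['entity_group']:8} "
              f"{entity['score']:.2f}")
    ```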

    Generate FAIR Literature Surveys with Scholarly Knowledge Graphs

    Reviewing scientific literature is a cumbersome, time-consuming, but crucial activity in research. Leveraging a scholarly knowledge graph, we present a methodology and a system for comparing scholarly literature, in particular research contributions described in terms of the addressed problem, utilized materials, employed methods, and yielded results. The system can be used by researchers to quickly get familiar with existing work in a specific research domain (e.g., a concrete research question or hypothesis). Additionally, it can be used to publish literature surveys following the FAIR Data Principles. The methodology for creating a research contribution comparison consists of multiple tasks, specifically: (a) finding similar contributions, (b) aligning contribution descriptions, (c) visualizing the comparison, and finally (d) publishing it. The methodology is implemented within the Open Research Knowledge Graph (ORKG), a scholarly infrastructure that enables researchers to collaboratively describe, find, and compare research contributions. We evaluate the implementation using data extracted from published review articles. The evaluation also addresses the FAIRness of comparisons published with the ORKG.
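
    As a minimal sketch of task (b), aligning contribution descriptions, the fragment below maps differently labeled properties onto one canonical label so the comparison table gets one row per concept, and exports the result as CSV as a first step toward machine-readable (FAIR) publication. The alias table and survey items are invented for illustration, not taken from the ORKG.

    ```python
    # Sketch of aligning contribution descriptions; the alias mapping and
    # data are illustrative assumptions.
    import pandas as pd

    ALIASES = {"material": "materials", "data set": "dataset", "data": "dataset"}

    def canonical(prop: str) -> str:
        """Normalize a property label to its canonical form."""
        label = prop.strip().lower()
        return ALIASES.get(label, label)

    contributions = {
        "Survey item 1": {"Material": "ImageNet", "method": "ResNet-50"},
        "Survey item 2": {"data set": "ImageNet", "Method": "ViT"},
    }

    aligned = {name: {canonical(p): v for p, v in desc.items()}
               for name, desc in contributions.items()}

    # Rows are properties, columns are contributions; gaps become "-".
    table = pd.DataFrame(aligned).fillna("-")
    print(table)
    table.to_csv("survey_comparison.csv")  # machine-readable export
    ```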

    Improving Access to Scientific Literature with Knowledge Graphs

    The transfer of knowledge has not changed fundamentally for many hundreds of years: it is usually document-based, formerly printed on paper as a classic essay and nowadays distributed as PDF. With around 2.5 million new research contributions every year, researchers drown in a flood of pseudo-digitized PDF publications. As a result, research is seriously weakened. In this article, we argue for representing scholarly contributions in a structured and semantic way as a knowledge graph. The advantage is that information represented in a knowledge graph is readable by both machines and humans. As an example, we give an overview of the Open Research Knowledge Graph (ORKG), a service implementing this approach. For creating the knowledge graph representation, we rely on a mixture of manual (crowd/expert sourcing) and (semi-)automated techniques. Only with such a combination of human and machine intelligence can we achieve the quality of representation required to enable novel exploration and assistance services for researchers. As a result, a scholarly knowledge graph such as the ORKG can be used to give a condensed overview of the state of the art addressing a particular research question, for example as a tabular comparison of contributions according to various characteristics of the approaches. Further possible intuitive access interfaces to such scholarly knowledge graphs include domain-specific (chart) visualizations and the answering of natural language questions.
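
    To make "readable by machines and humans" concrete, the snippet below encodes a single research contribution as RDF triples with rdflib. The namespace and property names are invented for illustration; they are not the ORKG vocabulary.

    ```python
    # Sketch: one scholarly contribution as a small RDF graph. The IRIs
    # below are invented placeholders, not the ORKG vocabulary.
    from rdflib import Graph, Literal, Namespace, RDF

    EX = Namespace("http://example.org/scholarly/")

    g = Graph()
    contribution = EX.contribution1
    g.add((contribution, RDF.type, EX.ResearchContribution))
    g.add((contribution, EX.addressesProblem,
           Literal("document-based scholarly communication")))
    g.add((contribution, EX.employsMethod,
           Literal("crowd/expert sourcing plus semi-automated curation")))

    # Turtle is human-readable; the same triples are machine-queryable.
    print(g.serialize(format="turtle"))
    ```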

    The SciQA Scientific Question Answering Benchmark for Scholarly Knowledge

    Knowledge graphs have gained increasing popularity in science and technology over the last decade. However, current knowledge graphs are relatively simple to moderately complex semantic structures that are mainly collections of factual statements. Question answering (QA) benchmarks and systems have so far mainly been geared towards encyclopedic knowledge graphs such as DBpedia and Wikidata. We present SciQA, a scientific QA benchmark for scholarly knowledge. The benchmark leverages the Open Research Knowledge Graph (ORKG), which includes almost 170,000 resources describing research contributions of almost 15,000 scholarly articles from 709 research fields. Following a bottom-up methodology, we first manually developed a set of 100 complex questions that can be answered using this knowledge graph. Furthermore, we devised eight question templates with which we automatically generated a further 2,465 questions that can also be answered with the ORKG. The questions cover a range of research fields and question types and are translated into corresponding SPARQL queries over the ORKG. Based on two preliminary evaluations, we show that the resulting SciQA benchmark represents a challenging task for next-generation QA systems. This task is part of the open competitions at the 22nd International Semantic Web Conference 2023 as the Scholarly Question Answering over Linked Data (QALD) Challenge.
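
    To illustrate what "translated into corresponding SPARQL queries" can look like in practice, the sketch below issues a query with SPARQLWrapper. The endpoint URL, prefix, and predicate are placeholders; consult the ORKG documentation for the real endpoint and vocabulary.

    ```python
    # Sketch of answering a SciQA-style question via SPARQL. The endpoint
    # and vocabulary below are placeholders, not the actual ORKG ones.
    from SPARQLWrapper import JSON, SPARQLWrapper

    ENDPOINT = "https://example.org/orkg/sparql"  # placeholder URL

    QUERY = """
    PREFIX ex: <http://example.org/scholarly/>    # assumed prefix
    SELECT ?paper ?problem WHERE {
        ?paper ex:addressesProblem ?problem .     # assumed predicate
    } LIMIT 10
    """

    sparql = SPARQLWrapper(ENDPOINT)
    sparql.setQuery(QUERY)
    sparql.setReturnFormat(JSON)
    results = sparql.query().convert()
    for row in results["results"]["bindings"]:
        print(row["paper"]["value"], "->", row["problem"]["value"])
    ```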