26,053 research outputs found

    Metadata Extraction from Scientific Papers

    This work compares available scientific search engines with a script for extracting metadata from scientific papers developed by Tomáš Lokaj at FIT BUT in Brno. The results confirm shortcomings in metadata extraction. This bachelor's thesis also presents a comprehensive guide to comparing the various kinds of information.

    Extraction and Evaluation of Statistical Information from Social and Behavioral Science Papers

    With substantial and continuing increases in the number of published papers across the scientific literature, development of reliable approaches for automated discovery and assessment of published findings is increasingly urgent. Tools that can extract critical information from scientific papers and metadata can support representation and reasoning over existing findings, and offer insights into replicability, robustness and generalizability of specific claims. In this work, we present a pipeline for the extraction of statistical information (p-values, sample size, number of hypotheses tested) from full-text scientific documents. We validate our approach on 300 papers selected from the social and behavioral science literatures, and suggest directions for next steps.
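
    As a rough illustration of the kind of values such a pipeline targets, a minimal regex-based sketch (not the authors' system; the patterns and the extract_statistics helper below are hypothetical) might pull reported p-values and sample sizes out of plain text like this:

    import re

    # Hypothetical patterns for statistics as they are commonly reported in papers.
    P_VALUE = re.compile(r"p\s*(<=|>=|=|<|>)\s*(0?\.\d+|\d+(?:\.\d+)?(?:e-?\d+)?)", re.IGNORECASE)
    SAMPLE_SIZE = re.compile(r"\b[nN]\s*=\s*(\d[\d,]*)")

    def extract_statistics(text: str) -> dict:
        """Collect reported p-values and sample sizes from a document's plain text."""
        p_values = [(m.group(1), float(m.group(2))) for m in P_VALUE.finditer(text)]
        sample_sizes = [int(m.group(1).replace(",", "")) for m in SAMPLE_SIZE.finditer(text)]
        return {"p_values": p_values, "sample_sizes": sample_sizes}

    print(extract_statistics("We recruited N = 1,240 participants (p < .001 for the main effect)."))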

    Epistemic logic for metadata modelling from scientific papers on Covid-19

    The field of epistemic logic has developed into an interdisciplinary area focused on explicating epistemic issues in, for example, artificial intelligence, computer security, game theory, economics, multiagent systems and the social sciences. Inspired, in part, by issues in these different ‘application’ areas, in this paper I propose an epistemic logic T for metadata extracted from scientific papers on COVID-19. More specifically, I introduce a structure S for syntactically and semantically modelling metadata extracted by systems that extract structured metadata from born-digital scientific articles. In the logical model, these systems are treated as ‘metadata extraction agents’ (MEA); the MEA considered here are CERMINE and TeamBeam. In an increasingly data-driven world, modelling data and metadata helps systematise existing information and supports the research community in building solutions to the COVID-19 pandemic.
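
    For orientation, the textbook Kripke semantics of a knowledge operator, on which a system such as T is built, reads as follows (this is standard epistemic logic, not a reproduction of the paper's structure S; the agent a may be thought of as one of the MEA):

    $M, w \models K_a\,\varphi \iff M, v \models \varphi \text{ for every } v \text{ with } (w, v) \in R_a$

    In system T the accessibility relation $R_a$ is reflexive, which validates the truth axiom $K_a\,\varphi \rightarrow \varphi$: whatever an agent such as CERMINE or TeamBeam knows about a paper's metadata holds at the actual world of the model.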

    Evaluation of header metadata extraction approaches and tools for scientific PDF documents

    This paper evaluates the performance of tools for the extraction of metadata from scientific articles. Accurate metadata extraction is an important task for automating the management of digital libraries. This comparative study is a guide for developers looking to integrate the most suitable and effective metadata extraction tool into their software. We shed light on the strengths and weaknesses of seven tools in common use. In our evaluation using papers from the arXiv collection, GROBID delivered the best results, followed by Mendeley Desktop. SciPlore Xtract, PDFMeat, and SVMHeaderParse also delivered good results depending on the metadata type to be extracted.
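
    Since GROBID came out on top, a minimal sketch of driving its header-extraction endpoint may be useful (assuming a GROBID service running locally on its default port 8070; the grobid_header helper and the example file name are placeholders, not part of the paper):

    import xml.etree.ElementTree as ET
    import requests

    TEI_NS = {"tei": "http://www.tei-c.org/ns/1.0"}

    def grobid_header(pdf_path: str, server: str = "http://localhost:8070") -> dict:
        """Send a PDF to GROBID's processHeaderDocument service and parse the TEI reply."""
        with open(pdf_path, "rb") as pdf:
            response = requests.post(f"{server}/api/processHeaderDocument", files={"input": pdf})
        response.raise_for_status()
        tei = ET.fromstring(response.text)
        title = tei.findtext(".//tei:titleStmt/tei:title", namespaces=TEI_NS)
        authors = [s.text for s in tei.findall(".//tei:author//tei:surname", namespaces=TEI_NS)]
        return {"title": title, "authors": authors}

    print(grobid_header("example-paper.pdf"))  # path is a placeholder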

    VisualBib(va): A Visual Analytics Platform for Authoring and Reviewing Bibliographies

    Researchers are engaged daily in bibliographic tasks concerning literature search and review, both in the role of authors of scientific papers and when they act as reviewers or evaluators. Current indexing platforms poorly support the visual exploration and comparative analysis of metadata coming from successive searches. To address these issues, we designed and realized VisualBib(va), an online visual analytics solution in which a visual environment combines analysis control, bibliography exploration, automatic metadata extraction, and metrics visualization for real-time scenarios. We introduce and discuss here the relevant functions that VisualBib(va) supports through one usage scenario related to the creation of a bibliography.

    Theory Entity Extraction for Social and Behavioral Sciences Papers Using Distant Supervision

    Theories and models, which are common in scientific papers in almost all domains, usually provide the foundations of theoretical analysis and experiments. Understanding the use of theories and models can shed light on the credibility and reproducibility of research works. Compared with metadata such as title, author and keywords, theory extraction from the scientific literature is rarely explored, especially for the social and behavioral science (SBS) domains. One challenge in applying supervised learning methods is the lack of a large number of labeled samples for training. In this paper, we propose an automated framework based on distant supervision that leverages entity mentions from Wikipedia to build a ground-truth corpus consisting of more than 4500 automatically annotated sentences containing theory/model mentions. We use this corpus to train models for theory extraction in SBS papers. We compared four deep learning architectures and found that RoBERTa-BiLSTM-CRF performs best, with a precision as high as 89.72%. The model shows promise for convenient extension to domains other than SBS. The code and data are publicly available at https://github.com/lamps-lab/theory
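
    The distant-supervision labelling step itself is easy to picture with a minimal sketch (the gazetteer entries, the bio_label helper and the example sentence below are made up; the paper builds its gazetteer from Wikipedia entity mentions rather than a hand-written set):

    # Sentences that contain a known theory name are tagged automatically in BIO format.
    THEORY_NAMES = {"social cognitive theory", "prospect theory"}

    def bio_label(sentence: str) -> list:
        tokens = sentence.split()
        labels = ["O"] * len(tokens)
        lowered = [t.lower().strip(".,;") for t in tokens]
        for name in THEORY_NAMES:
            parts = name.split()
            for i in range(len(lowered) - len(parts) + 1):
                if lowered[i:i + len(parts)] == parts:
                    labels[i] = "B-THEORY"
                    for j in range(i + 1, i + len(parts)):
                        labels[j] = "I-THEORY"
        return list(zip(tokens, labels))

    print(bio_label("The study draws on prospect theory to model risk attitudes."))

    Sentences labelled this way form the ground-truth corpus on which sequence taggers such as RoBERTa-BiLSTM-CRF are then trained.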

    VisualBib(va): A Visual Analytics Platform for Authoring and Reviewing Bibliographies

    Researchers are engaged daily in bibliographic tasks concerning literature search and review, both in the role of authors of scientific papers and when they act as reviewers or evaluators. Current indexing platforms poorly support the visual exploration and comparative analysis of metadata coming from successive searches. To address these issues, we designed and realized VisualBib(va), an online visual analytics solution in which a visual environment combines analysis control, bibliography exploration, automatic metadata extraction, and metrics visualization for real-time scenarios. We introduce and discuss here the relevant functions that VisualBib(va) supports through two usage scenarios related to the creation and the review of a bibliography. Full details about the VisualBib(va) design, implementation and evaluation are available in [3]. A fully interactive environment is available at http://visualbib.uniud.it/ (video demo: http://bit.ly/3fKuZNg).

    Editorial for the First Workshop on Mining Scientific Papers: Computational Linguistics and Bibliometrics

    The workshop "Mining Scientific Papers: Computational Linguistics and Bibliometrics" (CLBib 2015), co-located with the 15th International Society of Scientometrics and Informetrics Conference (ISSI 2015), brought together researchers in Bibliometrics and Computational Linguistics in order to study the ways Bibliometrics can benefit from large-scale text analytics and sense mining of scientific papers, thus exploring the interdisciplinarity of Bibliometrics and Natural Language Processing (NLP). The goals of the workshop were to answer questions such as: How can we enhance author network analysis and Bibliometrics using data obtained by text analytics? What insights can NLP provide on the structure of scientific writing, on citation networks, and on in-text citation analysis? This workshop is a first step towards fostering reflection on this interdisciplinarity and on the benefits that the two disciplines, Bibliometrics and Natural Language Processing, can derive from it. Comment: 4 pages, Workshop on Mining Scientific Papers: Computational Linguistics and Bibliometrics at ISSI 2015

    A-posteriori provenance-enabled linking of publications and datasets via crowdsourcing

    This paper aims to share with the digital library community different opportunities to leverage crowdsourcing for a-posteriori capturing of dataset citation graphs. We describe a practical approach, which exploits one possible crowdsourcing technique to collect these graphs from domain experts and proposes their publication as Linked Data using the W3C PROV standard. Based on our findings from a study we ran during the USEWOD 2014 workshop, we propose a semi-automatic approach that generates metadata by leveraging information extraction as an additional step to crowdsourcing, to generate high-quality data citation graphs. Furthermore, we consider the design implications for our crowdsourcing approach when non-expert participants are involved in the process.
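
    For illustration, a single crowdsourced paper-dataset link expressed with the W3C PROV vocabulary could be published roughly as follows (a sketch using rdflib; the DOIs are placeholders and the paper's actual provenance model is richer than this):

    from rdflib import Graph, Namespace, RDF, URIRef

    PROV = Namespace("http://www.w3.org/ns/prov#")

    g = Graph()
    g.bind("prov", PROV)

    paper = URIRef("https://doi.org/10.9999/example-paper")      # placeholder DOI
    dataset = URIRef("https://doi.org/10.9999/example-dataset")  # placeholder DOI

    g.add((paper, RDF.type, PROV.Entity))
    g.add((dataset, RDF.type, PROV.Entity))
    g.add((paper, PROV.wasDerivedFrom, dataset))  # the publication draws on the dataset
    # In a fuller model the assertion would live in a prov:Bundle attributed
    # (prov:wasAttributedTo) to the crowd participant who contributed the link.

    print(g.serialize(format="turtle"))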