26,053 research outputs found

    Metadata Extraction from Scientific Papers

    This work compares available scientific search engines with a script for extracting metadata from scientific papers developed by Tomáš Lokaj at FIT BUT in Brno. The results confirm shortcomings in metadata extraction. This bachelor's thesis also presents a comprehensive guide to comparing the various kinds of information.

    Extraction and Evaluation of Statistical Information from Social and Behavioral Science Papers

    With substantial and continuing increases in the number of published papers across the scientific literature, development of reliable approaches for automated discovery and assessment of published findings is increasingly urgent. Tools that can extract critical information from scientific papers and metadata can support representation and reasoning over existing findings, and offer insights into replicability, robustness and generalizability of specific claims. In this work, we present a pipeline for the extraction of statistical information (p-values, sample size, number of hypotheses tested) from full-text scientific documents. We validate our approach on 300 papers selected from the social and behavioral science literatures, and suggest directions for next steps.
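
    As a rough illustration of the kind of values such a pipeline targets, a minimal regex-based sketch (not the authors' system; the patterns and the extract_statistics helper below are hypothetical) might pull reported p-values and sample sizes out of plain text like this:

    import re

    # Hypothetical patterns for statistics as they are commonly reported in papers.
    P_VALUE = re.compile(r"p\s*(<=|>=|=|<|>)\s*(0?\.\d+|\d+(?:\.\d+)?(?:e-?\d+)?)", re.IGNORECASE)
    SAMPLE_SIZE = re.compile(r"\b[nN]\s*=\s*(\d[\d,]*)")

    def extract_statistics(text: str) -> dict:
        """Collect reported p-values and sample sizes from a document's plain text."""
        p_values = [(m.group(1), float(m.group(2))) for m in P_VALUE.finditer(text)]
        sample_sizes = [int(m.group(1).replace(",", "")) for m in SAMPLE_SIZE.finditer(text)]
        return {"p_values": p_values, "sample_sizes": sample_sizes}

    print(extract_statistics("We recruited N = 1,240 participants (p < .001 for the main effect)."))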

    Epistemic logic for metadata modelling from scientific papers on Covid-19

    The field of epistemic logic has developed into an interdisciplinary area focused on explicating epistemic issues in, for example, artificial intelligence, computer security, game theory, economics, multiagent systems and the social sciences. Inspired, in part, by issues in these different ‘application’ areas, in this paper I propose an epistemic logic T for metadata extracted from scientific papers on COVID-19. More specifically, I introduce a structure S for syntactically and semantically modelling metadata extracted by systems that extract structured metadata from born-digital scientific articles. In the logical model, these systems are treated as ‘metadata extraction agents’ (MEA); the MEA considered here are CERMINE and TeamBeam. In an increasingly data-driven world, modelling data and metadata helps systematise existing information and supports the research community in building solutions to the COVID-19 pandemic.
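
    For orientation, the textbook Kripke semantics of a knowledge operator, on which a system such as T is built, reads as follows (this is standard epistemic logic, not a reproduction of the paper's structure S; the agent a may be thought of as one of the MEA):

    $M, w \models K_a\,\varphi \iff M, v \models \varphi \text{ for every } v \text{ with } (w, v) \in R_a$

    In system T the accessibility relation $R_a$ is reflexive, which validates the truth axiom $K_a\,\varphi \rightarrow \varphi$: whatever an agent such as CERMINE or TeamBeam knows about a paper's metadata holds at the actual world of the model.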

    Evaluation of header metadata extraction approaches and tools for scientific PDF documents

    This paper evaluates the performance of tools for the extraction of metadata from scientific articles. Accurate metadata extraction is an important task for automating the management of digital libraries. This comparative study is a guide for developers looking to integrate the most suitable and effective metadata extraction tool into their software. We shed light on the strengths and weaknesses of seven tools in common use. In our evaluation using papers from the arXiv collection, GROBID delivered the best results, followed by Mendeley Desktop. SciPlore Xtract, PDFMeat, and SVMHeaderParse also delivered good results depending on the metadata type to be extracted.
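
    Since GROBID came out on top, a minimal sketch of driving its header-extraction endpoint may be useful (assuming a GROBID service running locally on its default port 8070; the grobid_header helper and the example file name are placeholders, not part of the paper):

    import xml.etree.ElementTree as ET
    import requests

    TEI_NS = {"tei": "http://www.tei-c.org/ns/1.0"}

    def grobid_header(pdf_path: str, server: str = "http://localhost:8070") -> dict:
        """Send a PDF to GROBID's processHeaderDocument service and parse the TEI reply."""
        with open(pdf_path, "rb") as pdf:
            response = requests.post(f"{server}/api/processHeaderDocument", files={"input": pdf})
        response.raise_for_status()
        tei = ET.fromstring(response.text)
        title = tei.findtext(".//tei:titleStmt/tei:title", namespaces=TEI_NS)
        authors = [s.text for s in tei.findall(".//tei:author//tei:surname", namespaces=TEI_NS)]
        return {"title": title, "authors": authors}

    print(grobid_header("example-paper.pdf"))  # path is a placeholder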

    VisualBib(va): A Visual Analytics Platform for Authoring and Reviewing Bibliographies

    Researchers are engaged daily in bibliographic tasks concerning literature search and review, both in the role of authors of scientific papers and when they act as reviewers or evaluators. Current indexing platforms poorly support the visual exploration and comparative analysis of metadata coming from successive searches. To address these issues, we designed and realized VisualBib(va), an online visual analytics solution in which a visual environment combines analysis control, bibliography exploration, automatic metadata extraction, and metrics visualization for real-time scenarios. We introduce and discuss here the relevant functions that VisualBib(va) supports through one usage scenario related to the creation of a bibliography.

    Theory Entity Extraction for Social and Behavioral Sciences Papers Using Distant Supervision

    Theories and models, which are common in scientific papers in almost all domains, usually provide the foundations of theoretical analysis and experiments. Understanding the use of theories and models can shed light on the credibility and reproducibility of research works. Compared with metadata such as title, author and keywords, theory extraction from the scientific literature is rarely explored, especially for the social and behavioral science (SBS) domains. One challenge in applying supervised learning methods is the lack of a large number of labeled samples for training. In this paper, we propose an automated framework based on distant supervision that leverages entity mentions from Wikipedia to build a ground-truth corpus consisting of more than 4500 automatically annotated sentences containing theory/model mentions. We use this corpus to train models for theory extraction in SBS papers. We compared four deep learning architectures and found that RoBERTa-BiLSTM-CRF performs best, with a precision as high as 89.72%. The model shows promise for convenient extension to domains other than SBS. The code and data are publicly available at https://github.com/lamps-lab/theory
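
    The distant-supervision labelling step itself is easy to picture with a minimal sketch (the gazetteer entries, the bio_label helper and the example sentence below are made up; the paper builds its gazetteer from Wikipedia entity mentions rather than a hand-written set):

    # Sentences that contain a known theory name are tagged automatically in BIO format.
    THEORY_NAMES = {"social cognitive theory", "prospect theory"}

    def bio_label(sentence: str) -> list:
        tokens = sentence.split()
        labels = ["O"] * len(tokens)
        lowered = [t.lower().strip(".,;") for t in tokens]
        for name in THEORY_NAMES:
            parts = name.split()
            for i in range(len(lowered) - len(parts) + 1):
                if lowered[i:i + len(parts)] == parts:
                    labels[i] = "B-THEORY"
                    for j in range(i + 1, i + len(parts)):
                        labels[j] = "I-THEORY"
        return list(zip(tokens, labels))

    print(bio_label("The study draws on prospect theory to model risk attitudes."))

    Sentences labelled this way form the ground-truth corpus on which sequence taggers such as RoBERTa-BiLSTM-CRF are then trained.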

    VisualBib(va): A Visual Analytics Platform for Authoring and Reviewing Bibliographies

    Researchers are engaged daily in bibliographic tasks concerning literature search and review, both in the role of authors of scientific papers and when they act as reviewers or evaluators. Current indexing platforms poorly support the visual exploration and comparative analysis of metadata coming from successive searches. To address these issues, we designed and realized VisualBib(va), an online visual analytics solution in which a visual environment combines analysis control, bibliography exploration, automatic metadata extraction, and metrics visualization for real-time scenarios. We introduce and discuss here the relevant functions that VisualBib(va) supports through two usage scenarios related to the creation and the review of a bibliography. Full details about the VisualBib(va) design, implementation and evaluation are available in [3]. A fully interactive environment is available at http://visualbib.uniud.it/ (video demo: http://bit.ly/3fKuZNg).

    Editorial for the First Workshop on Mining Scientific Papers: Computational Linguistics and Bibliometrics

    The workshop "Mining Scientific Papers: Computational Linguistics and Bibliometrics" (CLBib 2015), co-located with the 15th International Society of Scientometrics and Informetrics Conference (ISSI 2015), brought together researchers in Bibliometrics and Computational Linguistics in order to study the ways Bibliometrics can benefit from large-scale text analytics and sense mining of scientific papers, thus exploring the interdisciplinarity of Bibliometrics and Natural Language Processing (NLP). The goals of the workshop were to answer questions such as: How can we enhance author network analysis and Bibliometrics using data obtained by text analytics? What insights can NLP provide on the structure of scientific writing, on citation networks, and on in-text citation analysis? This workshop is a first step towards fostering reflection on this interdisciplinarity and on the benefits that the two disciplines, Bibliometrics and Natural Language Processing, can derive from it. Comment: 4 pages, Workshop on Mining Scientific Papers: Computational Linguistics and Bibliometrics at ISSI 2015

    A-posteriori provenance-enabled linking of publications and datasets via crowdsourcing

    This paper aims to share with the digital library community different opportunities to leverage crowdsourcing for a-posteriori capturing of dataset citation graphs. We describe a practical approach, which exploits one possible crowdsourcing technique to collect these graphs from domain experts and proposes their publication as Linked Data using the W3C PROV standard. Based on our findings from a study we ran during the USEWOD 2014 workshop, we propose a semi-automatic approach that generates metadata by leveraging information extraction as an additional step to crowdsourcing, to generate high-quality data citation graphs. Furthermore, we consider the design implications for our crowdsourcing approach when non-expert participants are involved in the process.
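
    For illustration, a single crowdsourced paper-dataset link expressed with the W3C PROV vocabulary could be published roughly as follows (a sketch using rdflib; the DOIs are placeholders and the paper's actual provenance model is richer than this):

    from rdflib import Graph, Namespace, RDF, URIRef

    PROV = Namespace("http://www.w3.org/ns/prov#")

    g = Graph()
    g.bind("prov", PROV)

    paper = URIRef("https://doi.org/10.9999/example-paper")      # placeholder DOI
    dataset = URIRef("https://doi.org/10.9999/example-dataset")  # placeholder DOI

    g.add((paper, RDF.type, PROV.Entity))
    g.add((dataset, RDF.type, PROV.Entity))
    g.add((paper, PROV.wasDerivedFrom, dataset))  # the publication draws on the dataset
    # In a fuller model the assertion would live in a prov:Bundle attributed
    # (prov:wasAttributedTo) to the crowd participant who contributed the link.

    print(g.serialize(format="turtle"))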