199 research outputs found

    Obtaining Reference's Topic Congruity in Indonesian Publications using Machine Learning Approach

    Get PDF
    There are some criteria on how an article is categorized as a good article for publications. It could depend on some aspect like formatting and clarity, but mainly it depends on how the content of the article is constructed. The consistency of the topic that the article was written could show us how the authors construct the main idea in the article content. One indication that shows this consistency is congruity in the article’s topic and the topic of literature or reference cited in the document listed in the bibliography. This works attempting to automate the topic detection on the article’s references then obtain the congruity to the article title’s topic through metadata extraction and text classification. This is done by extracting metadata of an article file to obtain all possible reference title using GROBID than classify the topic using a supervised classification model. We found that some refinements in the whole approach should be considered in the next step of this work

    Bootstrapping Multilingual Metadata Extraction: A Showcase in Cyrillic

    Get PDF

    ORKG-Leaderboards: a systematic workflow for mining leaderboards as a knowledge graph

    Get PDF
    The purpose of this work is to describe the orkg-Leaderboard software designed to extract leaderboards defined as task–dataset–metric tuples automatically from large collections of empirical research papers in artificial intelligence (AI). The software can support both the main workflows of scholarly publishing, viz. as LaTeX files or as PDF files. Furthermore, the system is integrated with the open research knowledge graph (ORKG) platform, which fosters the machine-actionable publishing of scholarly findings. Thus, the systemsss output, when integrated within the ORKG’s supported Semantic Web infrastructure of representing machine-actionable ‘resources’ on the Web, enables: (1) broadly, the integration of empirical results of researchers across the world, thus enabling transparency in empirical research with the potential to also being complete contingent on the underlying data source(s) of publications; and (2) specifically, enables researchers to track the progress in AI with an overview of the state-of-the-art across the most common AI tasks and their corresponding datasets via dynamic ORKG frontend views leveraging tables and visualization charts over the machine-actionable data. Our best model achieves performances above 90% F1 on the leaderboard extraction task, thus proving orkg-Leaderboards a practically viable tool for real-world usage. Going forward, in a sense, orkg-Leaderboards transforms the leaderboard extraction task to an automated digitalization task, which has been, for a long time in the community, a crowdsourced endeavor

    Evaluating the Quality of the Indonesian Scientific Journal References using ParsCit, CERMINE and GROBID

    Get PDF
    There are several open-source tools available to extract the bibliographic references of the Pdf. Those tools based on the various approaches including rule-based approach, knowledge-based approach, machine learning-based approach, and the combination. To improve the services of the Indonesian Scientific Journal Database (ISJD), Center for Scientific Data and Documentation – Indonesian Institute of Sciences (PDDI-LIPI) intends to have an automatic bibliographic references extraction tool. The paper aims to analyze the quality of the reference metadata of the local journals with the three open-source tools, namely ParsCit, CERMINE and GROBID. The accuracy test of the three tools are poor. Those are 0.555, 0.633, and 0.605 for ParsCit, CERMINE, and GROBID respectively. It caused by many authors do not use a reference manager when they write the bibliography section. On the such condition this paper proposed to build an application to identify and correct errors in the bibliographic references of paper in ISJD. This application become a liaison between ISJD and open source tool for the bibliographic reference extraction. This paper proposed the combination of building software and using an open source

    Sequence Labeling for Citation Field Extraction from Cyrillic Script References

    Get PDF
    Extracting structured data from bibliographic references is a crucial task for the creation of scholarly databases. While approaches, tools, and evaluation data sets for the task exist, there is a distinct lack of support for languages other than English and scripts other than the Latin alphabet. A significant portion of the scientific literature that is thereby excluded consists of publications written in Cyrillic script languages. To address this problem, we introduce a new multilingual and multidisciplinary data set of over 100,000 labeled reference strings. The data set covers multiple Cyrillic languages and contains over 700 manually labeled references, while the remaining are generated synthetically. With random samples of varying size of this data, we train multiple well performing sequence labeling BERT models and thus show the usability of our proposed data set. To this end, we showcase an implementation of a multilingual BERT model trained on the synthetic data and evaluated on the manually labeled references. Our model achieves an F1 score of 0.93 and thereby significantly outperforms a state-of-the-art model we retrain and evaluate on our data
    • …
    corecore