
    KEYWORD IDENTIFICATION IN SCIENTIFIC JOURNAL PUBLICATION CONTENT: A CASE STUDY OF PUBLIKASI ONLINE ITS (POMITS) PUBLICATION SEARCH

    Publikasi Online ITS (POMITS) is a journal that serves as a publication venue for ITS undergraduate students. It has published a substantial number of articles, which other students frequently need as references for their own research. Search currently relies on the title, abstract, author names, and keywords, all of which are still entered manually by the authors. This allows poorly chosen keywords, so an approach is needed that makes keyword selection more accurate and more representative of each article. The aim of this research is to identify keywords in articles automatically. The keywords are divided into the software used, the methods, and other representative keywords. With this identification, article search can return more precise results. The problem can be addressed with Named Entity Recognition (NER). However, spaCy does not yet provide an Indonesian NER model, so one had to be built. In this research, the keyword annotations in POMITS content are identified as metadata by using the NER model to detect named entities of three kinds: software, methods, and representative keywords. The NER annotations are stored as triples in the Apache Jena Fuseki triple store, which can then be used to answer queries about software, methods, and keywords. Testing showed that the system detected the NER entities and stored the annotations as triples in Apache Jena Fuseki. Keyword identification achieved an average precision of 84.76% and an average recall of 63.59%.
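The abstract does not describe the RDF vocabulary used in Fuseki. As a minimal sketch, assuming a hypothetical namespace `http://example.org/pomits/`, hypothetical predicates `usesSoftware`, `usesMethod`, and `hasKeyword`, and hypothetical NER labels `SOFTWARE`, `METHOD`, and `KEYWORD`, the entities found in one article could be serialized as N-Triples, a line-based RDF format that Apache Jena Fuseki can load:

```python
from __future__ import annotations

from collections.abc import Iterable

# Hypothetical namespace: the abstract does not name the vocabulary used.
BASE = "http://example.org/pomits/"

# Hypothetical mapping from NER labels to RDF predicates, one per keyword category.
PREDICATES = {
    "SOFTWARE": BASE + "usesSoftware",
    "METHOD": BASE + "usesMethod",
    "KEYWORD": BASE + "hasKeyword",
}


def escape_literal(text: str) -> str:
    """Escape a string so it can appear inside an N-Triples string literal."""
    return (
        text.replace("\\", "\\\\")  # backslashes first, so later escapes are not doubled
        .replace('"', '\\"')
        .replace("\n", "\\n")
        .replace("\r", "\\r")
    )


def entities_to_ntriples(article_id: str, entities: Iterable[tuple[str, str]]) -> list[str]:
    """Turn (entity_text, label) pairs for one article into N-Triples lines.

    Assumes article_id is a simple alphanumeric identifier that is safe inside an IRI.
    """
    subject = f"<{BASE}article/{article_id}>"
    lines = []
    for text, label in entities:
        predicate = PREDICATES.get(label)
        if predicate is None:
            continue  # skip labels outside the three keyword categories
        lines.append(f'{subject} <{predicate}> "{escape_literal(text)}" .')
    return lines


if __name__ == "__main__":
    found = [("spaCy", "SOFTWARE"), ("Named Entity Recognition", "METHOD"), ("Surabaya", "GPE")]
    for line in entities_to_ntriples("123", found):
        print(line)
```

Writing one subject per article and one predicate per category keeps a query such as "which articles use a given software package" to a single triple pattern in SPARQL.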

    Finding datasets in publications: the Syracuse University approach

    Datasets are critical for scientific research, playing a role in replication, reproducibility, and efficiency. Researchers have recently shown that datasets are becoming more important for science to function properly, even serving as artifacts of study themselves. However, citing datasets is not a common or standard practice, in spite of recent efforts by data repositories and funding agencies. This greatly limits our ability to track their usage and importance. A potential solution to this problem is to automatically extract dataset mentions from scientific articles. In this work, we propose to achieve such extraction using a neural network based on a BiLSTM-CRF architecture. Our method achieves F1 = 0.885 on social science articles released as part of the Rich Context Dataset. We discuss future improvements to the model and applications beyond the social sciences.
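The abstract reports an F1 score without stating how it is computed. A common convention for NER, shown here as an assumption rather than the authors' protocol, is entity-level F1 with exact span matching in the style of CoNLL evaluation: BIO tag sequences are converted to typed spans, and a predicted span counts only if its boundaries and type both match the gold span. The `DATASET` label is illustrative.

```python
from __future__ import annotations


def bio_spans(tags: list[str]) -> set[tuple[int, int, str]]:
    """Extract (start, end_exclusive, type) spans from a BIO tag sequence."""
    spans = set()
    start, kind = None, None
    for i, tag in enumerate(tags + ["O"]):  # trailing "O" closes any span still open
        # A span ends at B-, at O, or at an I- whose type differs from the open span.
        if tag.startswith("B-") or tag == "O" or (tag.startswith("I-") and tag[2:] != kind):
            if start is not None:
                spans.add((start, i, kind))
                start, kind = None, None
            if tag.startswith("B-") or tag.startswith("I-"):
                # An I- tag with no matching open span is treated as starting a new span.
                start, kind = i, tag[2:]
    return spans


def span_f1(gold: list[list[str]], pred: list[list[str]]) -> tuple[float, float, float]:
    """Micro-averaged precision, recall and F1 over exact-match typed spans."""
    tp = fp = fn = 0
    for g, p in zip(gold, pred):
        gs, ps = bio_spans(g), bio_spans(p)
        tp += len(gs & ps)  # spans predicted with the correct boundaries and type
        fp += len(ps - gs)  # predicted spans with no exact gold counterpart
        fn += len(gs - ps)  # gold spans the model missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


if __name__ == "__main__":
    gold = [["O", "B-DATASET", "I-DATASET", "O", "B-DATASET"]]
    pred = [["O", "B-DATASET", "I-DATASET", "O", "O"]]
    print(span_f1(gold, pred))
```

In the example the model finds one of the two gold datasets, giving precision 1.0, recall 0.5, and F1 of about 0.667. Because the type is part of each span tuple, a span with correct boundaries but the wrong type counts as both a false positive and a false negative.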

    The Automatic Detection of Dataset Names in Scientific Articles

    We study the task of recognizing named datasets in scientific articles as a Named Entity Recognition (NER) problem. Noticing that available annotated datasets were not adequate for our goals, we annotated 6000 sentences extracted from four major AI conferences, with roughly half of them containing one or more named datasets. A distinguishing feature of this set is the many sentences using enumerations, conjunctions and ellipses, resulting in long BI+ tag sequences. On all measures, the SciBERT NER tagger performed best and most robustly. Our baseline rule-based tagger performed remarkably well, better than several state-of-the-art methods. The gold standard dataset, with links and offsets from each sentence to the (open access) articles, together with the annotation guidelines and all code used in the experiments, is available on GitHub.
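The abstract does not describe how its rule-based baseline works, so the sketch below is not that tagger. It illustrates a generic alternative, greedy longest-match tagging against a gazetteer of known dataset names, and shows how a multi-token name yields a B tag followed by a run of I tags. The gazetteer entries and the `DATASET` label are hypothetical.

```python
from __future__ import annotations

# Hypothetical gazetteer of known dataset names, stored as token tuples.
GAZETTEER = [
    ("ImageNet",),
    ("MS", "COCO"),
    ("Penn", "Treebank"),
    ("CIFAR-10",),
]


def tag_datasets(tokens: list[str], gazetteer: list[tuple[str, ...]] = GAZETTEER) -> list[str]:
    """Assign BIO tags by greedy longest-match lookup in a gazetteer."""
    names = sorted(gazetteer, key=len, reverse=True)  # try longer names first
    tags = ["O"] * len(tokens)
    i = 0
    while i < len(tokens):
        for name in names:
            n = len(name)
            if tuple(tokens[i:i + n]) == name:
                tags[i] = "B-DATASET"  # first token of the name
                for j in range(i + 1, i + n):
                    tags[j] = "I-DATASET"  # remaining tokens continue the span
                i += n  # jump past the matched name
                break
        else:
            i += 1  # no gazetteer entry starts here
    return tags


if __name__ == "__main__":
    sentence = "We evaluate on MS COCO and ImageNet .".split()
    print(list(zip(sentence, tag_datasets(sentence))))
```

Trying longer entries first keeps a short entry from claiming part of a longer name that contains it. A pure gazetteer cannot recognize names it has never seen, which is one reason learned taggers such as SciBERT are usually compared against this kind of baseline.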

    Positioning and power in academic publishing: players, agents and agendas

    The field of electronic publishing has grown exponentially in the last two decades, but we are still in the middle of this digital transformation. With technologies coming and going for all kinds of reasons, the distribution of economic, technological and discursive power continues to be negotiated. This book presents the proceedings of the 20th Conference on Electronic Publishing (Elpub), held in Göttingen, Germany, in June 2016. This year’s conference explores issues of positioning and power in academic publishing, and it brings together world-leading stakeholders such as academics, practitioners, policymakers, students and entrepreneurs from a wide variety of fields to exchange information and discuss innovations in electronic publishing, as well as to reflect on developments in the field over the last 20 years. Topics covered in the papers include how to maintain the quality of electronic publications, modeling processes and the increasingly prevalent issue of open access, as well as new systems, database repositories and datasets. This overview of the field will be of interest to all those who work in or make use of electronic publishing.