
    Joint Workshop on Bibliometric-enhanced Information Retrieval and Natural Language Processing for Digital Libraries (BIRNDL 2017)

    The large scale of scholarly publications poses a challenge for scholars in information seeking and sensemaking. Bibliometrics, information retrieval (IR), text mining and NLP techniques could help in these search and look-up activities, but are not yet widely used. This workshop aims to encourage IR researchers and digital library professionals to develop new approaches in natural language processing, information retrieval, scientometrics, text mining and recommendation techniques that can advance the state of the art in scholarly document understanding, analysis, and retrieval at scale. The BIRNDL workshop at SIGIR 2017 will include an invited talk, paper sessions and the third edition of the Computational Linguistics (CL) Scientific Summarization Shared Task. Comment: 2 pages, workshop paper accepted at the SIGIR 201

    Denmark's Participation in the Search Engine TREC COVID-19 Challenge: Lessons Learned about Searching for Precise Biomedical Scientific Information on COVID-19

    This report describes the participation of two Danish universities, the University of Copenhagen and Aalborg University, in the international search engine competition on COVID-19 (the 2020 TREC-COVID Challenge) organised by the U.S. National Institute of Standards and Technology (NIST) as part of its Text Retrieval Conference (TREC). The aim of the competition was to find the best search engine strategy for retrieving precise biomedical scientific information on COVID-19 from the COVID-19 Open Research Dataset (CORD-19), at that time the largest dataset of curated scientific literature on COVID-19. CORD-19 was created in response to a call to action to the tech community by the U.S. White House in March 2020, and shortly thereafter was posted on Kaggle as an AI competition by the Allen Institute for AI, the Chan Zuckerberg Initiative, Georgetown University's Center for Security and Emerging Technology, Microsoft, and the National Library of Medicine at the U.S. National Institutes of Health. CORD-19 contained over 200,000 scholarly articles (more than 100,000 of them with full text) about COVID-19, SARS-CoV-2, and related coronaviruses, gathered from curated biomedical sources. The TREC-COVID challenge asked for the best way to (a) retrieve accurate and precise scientific information in response to queries formulated by biomedical experts, and (b) rank this information in decreasing order of relevance to the query. In this document, we describe the TREC-COVID competition setup, our participation in it, and our resulting reflections and lessons learned about state-of-the-art technology when faced with the acute task of retrieving precise scientific information from a rapidly growing corpus of literature, in response to highly specialised queries, in the middle of a pandemic.
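
    The abstract does not say which retrieval model the Danish teams used, so the sketch below covers only step (b): ordering one topic's documents by decreasing score and writing them in the six-column TREC run format (topic id, the literal `Q0`, document id, rank, score, run tag). The scores, document ids, run tag, the `to_trec_run` function and the default cutoff of 1000 documents are hypothetical.

```python
def to_trec_run(topic_id, scores, run_tag, depth=1000):
    """Format one topic's results as TREC run lines.

    scores: mapping doc_id -> retrieval score (higher means more relevant).
    Ties are broken by doc_id so the output is deterministic.
    depth:  hypothetical cutoff on the number of documents returned per topic.
    """
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))[:depth]
    return [
        f"{topic_id} Q0 {doc_id} {rank} {score:.4f} {run_tag}"
        for rank, (doc_id, score) in enumerate(ranked, start=1)
    ]


if __name__ == "__main__":
    # Hypothetical scores for three made-up document ids.
    demo = {"doc_a": 7.25, "doc_b": 12.5, "doc_c": 3.0}
    for line in to_trec_run(1, demo, "hypothetical_run"):
        print(line)
    # 1 Q0 doc_b 1 12.5000 hypothetical_run
    # 1 Q0 doc_a 2 7.2500 hypothetical_run
    # 1 Q0 doc_c 3 3.0000 hypothetical_run
```

    Sorting on the negated score gives the "decreasing relevance" order the challenge asked for; evaluation tools read the score column, so the explicit rank is mainly for human readers.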

    Analysis of the Impact of Citation Classics and of Relevance Rankings with Pennant Diagrams

    Citation indexes are important authoritative resources for measuring the contribution of scientists and scientific publications to the literature. Many studies in information retrieval aim to develop retrieval algorithms, and because information retrieval is interdisciplinary, these studies tend to receive citations from different fields. It is therefore important to analyze the so-called “citation classics” retrospectively to determine their impact on other fields. This is not easy to do with citation indexes, however, especially for relatively old papers, because traditional citation analysis tends not to reveal the full impact of a work on other studies, both at the time of publication and in the periods that follow. To see the big picture, it is important to study the contribution of these works to other disciplines as well. In this study, the impact of Maron and Kuhns’ citation classic on “probabilistic retrieval”, published in 1960, was visualized using pennant diagrams, which were developed on the basis of relevance theory, information retrieval and bibliometrics. We hypothesized that “the interdisciplinary relations that are unobservable with traditional citation analysis can be revealed using the pennant diagrams method”. To test the hypothesis, works that cited Maron and Kuhns’ study between 1960 and 2015 were downloaded together with their references (a total of 4,176 unique works), and graphics were prepared with macros written in MS Excel. Of the 4,176 works, 90 were selected using convenience sampling to create static and interactive pennant diagrams for further analysis. Another important output of this study is the relevance rankings. As an alternative to the relevance rankings based on reference similarity that are already used in citation indexes, relevance rankings were created using pennant diagrams, which take into account not only the items that cited the core (seed) paper but also the citations to those citing items. The relevance rankings based on reference similarity and those based on pennant diagrams were then compared. The findings support the hypothesis: pennant diagrams show which papers the core paper on the probabilistic model influenced or was influenced by, directly or indirectly. The relevance ranking based on pennant diagrams revealed the impact of the core paper on the information retrieval field as well as on other disciplines. Furthermore, it identified relations between these somewhat disconnected fields, and between authors, works, and journals, that cannot be readily identified using traditional citation analysis. Relevance rankings using pennant diagrams appear to have been more successful than those based on reference similarity. This is the first study in Turkey to use pennant diagrams for relevance rankings. The data used in the graphs and relevance rankings (the frequencies of total citations and co-citations) are available through citation indexes, so alternative relevance rankings based on pennant diagrams can be offered to users. Pennant diagrams can help researchers track the relevant literature more easily, as well as identify how a core work influences other works in its own field or in other fields.
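
    The abstract says the pennant data come from citation-index frequencies (total citations and co-citations) but does not give the formulas. The sketch below assumes the formulation usually associated with Howard D. White's pennant diagrams: an item's x coordinate is log10 of its co-citation count with the seed (tf, "cognitive effects"), its y coordinate is log10(N / total citations) (idf, "ease of processing"), and the ranking score is the tf-idf style product tf × idf. The index size N, all counts and all function names are hypothetical, and the study's Excel-based procedure may differ. A reference-similarity baseline (number of references shared with the seed) is included for comparison.

```python
import math


def pennant_points(cocitations, total_citations, n_docs):
    """Map each item co-cited with the seed to (x, y, score).

    cocitations:     item -> number of times it is co-cited with the seed (tf)
    total_citations: item -> total citations the item receives in the index (df)
    n_docs:          assumed size N of the citation index (must exceed every df)
    """
    points = {}
    for item, tf in cocitations.items():
        idf = math.log10(n_docs / total_citations[item])  # y axis: "ease of processing"
        x = math.log10(tf)                                # x axis: "cognitive effects"
        points[item] = (x, idf, tf * idf)                 # assumed tf-idf style score
    return points


def rank_by(scores):
    """Sort item ids by descending score, breaking ties by id."""
    return [item for item, _ in sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))]


def shared_reference_scores(seed_refs, candidate_refs):
    """Reference-similarity baseline: references each candidate shares with the seed."""
    seed = set(seed_refs)
    return {item: len(seed & set(refs)) for item, refs in candidate_refs.items()}


if __name__ == "__main__":
    # Hypothetical counts: A is co-cited most often but is also very highly cited overall.
    cocit = {"A": 40, "B": 10, "C": 25}
    totals = {"A": 20_000, "B": 50, "C": 500}
    pts = pennant_points(cocit, totals, n_docs=1_000_000)
    print(rank_by({k: v[2] for k, v in pts.items()}))  # ['C', 'A', 'B']

    coupling = shared_reference_scores(
        ["r1", "r2", "r3"], {"X": ["r1", "r2", "r9"], "Y": ["r3"], "Z": ["r7"]}
    )
    print(rank_by(coupling))  # ['X', 'Y', 'Z']
```

    With raw co-citation counts alone the order would be A, C, B; the idf factor demotes the very widely cited item A, which is the kind of reordering a pennant-based ranking can produce relative to plain counts.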

    Three real-world datasets and neural computational models for classification tasks in patent landscaping

    Patent Landscaping, one of the central tasks of intellectual property management, includes selecting and grouping patents according to user-defined technical or application-oriented criteria. While recent transformer-based models have been shown to be effective for classifying patents into taxonomies such as CPC or IPC, there is still little research on how to support real-world Patent Landscape Studies (PLSs) using natural language processing methods. With this paper, we release three labeled datasets for PLS-oriented classification tasks covering two diverse domains. We provide a qualitative analysis and report detailed corpus statistics. Most research on neural models for patents has been restricted to leveraging titles and abstracts. We compare strong neural and non-neural baselines and propose a novel model that takes into account textual information from the patents’ full texts as well as embeddings created from the patents’ CPC labels. We find that for PLS-oriented classification tasks, going beyond title and abstract is crucial, CPC labels are an effective source of information, and combining all features yields the best results.
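
    The abstract does not describe the architecture of the proposed model. As a hedged illustration of the feature combination it reports (full-text information plus CPC-label embeddings), the sketch below concatenates a hashed bag-of-words vector over title, abstract and full text with the average of per-label CPC embeddings. The hashing scheme, the dimensions, the embedding table and all function names are assumptions; the paper's model is neural and would learn its representations rather than use fixed ones like these.

```python
import hashlib
import math

TEXT_DIM = 16  # hypothetical size of the hashed bag-of-words vector


def hashed_bow(text, dim=TEXT_DIM):
    """L2-normalised bag-of-words vector; md5 gives buckets that are stable across runs."""
    vec = [0.0] * dim
    for token in text.lower().split():
        bucket = int(hashlib.md5(token.encode("utf-8")).hexdigest(), 16) % dim
        vec[bucket] += 1.0
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0  # avoid dividing by zero on empty text
    return [v / norm for v in vec]


def cpc_embedding(labels, table, dim):
    """Average the embeddings of a patent's CPC labels; zeros if none are in the table."""
    rows = [table[label] for label in labels if label in table]
    if not rows:
        return [0.0] * dim
    return [sum(col) / len(rows) for col in zip(*rows)]


def patent_features(title, abstract, full_text, cpc_labels, cpc_table, cpc_dim):
    """Concatenate a full-text vector with a CPC-label vector for a downstream classifier."""
    text_vec = hashed_bow(" ".join([title, abstract, full_text]))
    return text_vec + cpc_embedding(cpc_labels, cpc_table, cpc_dim)


if __name__ == "__main__":
    # Hypothetical 2-d CPC label embeddings; real ones would be learned.
    table = {"H04L": [1.0, 0.0], "G06F": [0.0, 1.0]}
    feats = patent_features(
        "Secure messaging",
        "A method for encrypting messages.",
        "The description discloses a key exchange protocol.",
        ["H04L", "G06F"],
        table,
        cpc_dim=2,
    )
    print(len(feats), feats[-2:])  # 18 [0.5, 0.5]
```

    The resulting vector could be passed to any classifier; md5 is used instead of Python's built-in `hash()` because string hashing is randomized per process, which would make the features differ between runs.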

    Citation recommendation: approaches and datasets

    Citation recommendation describes the task of recommending citations for a given text. Given the flood of scientific works published in recent years on the one hand, and the need to cite the most appropriate publications when writing scientific texts on the other, citation recommendation has emerged as an important research topic. In recent years, several approaches and evaluation data sets have been presented. However, to the best of our knowledge, no literature survey has been conducted explicitly on citation recommendation. In this article, we give a thorough introduction to automatic citation recommendation research. We then present an overview of the approaches and data sets for citation recommendation and identify differences and commonalities along various dimensions. Last but not least, we shed light on the evaluation methods and outline general challenges in evaluation and how to meet them. We restrict ourselves to citation recommendation for scientific publications, as this document type has been studied the most in this area. However, many of the observations and discussions included in this survey are also applicable to other types of text, such as news articles and encyclopedic articles.

    Comment: to be published in the International Journal on Digital Librarie
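
    The survey covers many approaches; as one of the simplest content-based baselines, shown here only as a sketch and not as a method taken from the survey, candidate papers can be ranked by the cosine similarity between term-frequency vectors of the citation context and each candidate's text. The candidate ids, texts and the `recommend` function are hypothetical.

```python
import math
import re
from collections import Counter


def tokens(text):
    """Lower-case alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine(a, b):
    """Cosine similarity between two term-frequency Counters (0.0 if either is empty)."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def recommend(context, candidates, k=3):
    """Return up to k candidate ids most similar to the citation context (score > 0)."""
    q = Counter(tokens(context))
    scored = [(cosine(q, Counter(tokens(text))), pid) for pid, text in candidates.items()]
    scored.sort(key=lambda s: (-s[0], s[1]))  # highest score first, ties broken by id
    return [pid for score, pid in scored[:k] if score > 0]


if __name__ == "__main__":
    candidates = {
        "p1": "probabilistic model for information retrieval",
        "p2": "graph neural networks for molecules",
        "p3": "citation recommendation with neural networks",
    }
    print(recommend("we use a neural citation recommendation model", candidates, k=2))
    # ['p3', 'p1']
```

    Real systems surveyed in this line of work typically replace raw term counts with learned or weighted representations and add citation-graph signals; the sketch only fixes the basic shape of the task (context in, ranked candidate list out).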