    Geographic information extraction from texts

    A large volume of unstructured text containing valuable geographic information is available online. This information, provided either implicitly or explicitly, is useful not only for scientific studies (e.g., spatial humanities) but also for many practical applications (e.g., geographic information retrieval). Although considerable progress has been made in geographic information extraction from texts, unsolved challenges and issues remain, ranging from methods, systems, and data to applications and privacy. This workshop therefore provides a timely opportunity to discuss recent advances, new ideas, and concepts, and to identify research gaps in geographic information extraction from texts.

    Pretrained Transformers for Text Ranking: BERT and Beyond

    The goal of text ranking is to generate an ordered list of texts retrieved from a corpus in response to a query. Although the most common formulation of text ranking is search, instances of the task can also be found in many natural language processing applications. This survey provides an overview of text ranking with neural network architectures known as transformers, of which BERT is the best-known example. The combination of transformers and self-supervised pretraining has been responsible for a paradigm shift in natural language processing (NLP), information retrieval (IR), and beyond. In this survey, we provide a synthesis of existing work as a single point of entry for practitioners who wish to gain a better understanding of how to apply transformers to text ranking problems and researchers who wish to pursue work in this area. We cover a wide range of modern techniques, grouped into two high-level categories: transformer models that perform reranking in multi-stage architectures and dense retrieval techniques that perform ranking directly. Two themes pervade our survey: techniques for handling long documents, beyond typical sentence-by-sentence processing in NLP, and techniques for addressing the tradeoff between effectiveness (i.e., result quality) and efficiency (e.g., query latency, model and index size). Although transformer architectures and pretraining techniques are recent innovations, many aspects of how they are applied to text ranking are relatively well understood and represent mature techniques. However, many open research questions remain, and thus, in addition to laying out the foundations of pretrained transformers for text ranking, this survey also attempts to prognosticate where the field is heading.
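    As a rough illustration of the two categories this abstract distinguishes (dense retrieval that ranks directly versus cross-encoder reranking in a multi-stage pipeline), the sketch below uses the sentence-transformers library. The model names and the toy corpus are illustrative placeholders, not drawn from the survey; any compatible bi-encoder and cross-encoder checkpoints would serve.

    ```python
    # Minimal sketch: dense retrieval (bi-encoder) followed by reranking (cross-encoder).
    # Checkpoints below are illustrative examples, not the survey's prescribed models.
    from sentence_transformers import SentenceTransformer, CrossEncoder, util

    query = "what causes jet lag"
    corpus = [
        "Jet lag occurs when the body's circadian rhythm is out of sync with a new time zone.",
        "BERT is a pretrained transformer encoder for natural language processing.",
        "Melatonin and light exposure can help travellers adjust to time-zone changes.",
    ]

    # Stage 1: dense retrieval -- encode query and documents independently,
    # then rank by embedding similarity (ranking "directly").
    bi_encoder = SentenceTransformer("all-MiniLM-L6-v2")
    doc_emb = bi_encoder.encode(corpus, convert_to_tensor=True)
    query_emb = bi_encoder.encode(query, convert_to_tensor=True)
    dense_scores = util.cos_sim(query_emb, doc_emb)[0]
    candidates = sorted(range(len(corpus)), key=lambda i: -dense_scores[i])[:2]

    # Stage 2: reranking -- a cross-encoder reads each (query, document) pair
    # jointly, trading extra query latency for better result quality.
    reranker = CrossEncoder("cross-encoder/ms-marco-MiniLM-L-6-v2")
    rerank_scores = reranker.predict([(query, corpus[i]) for i in candidates])
    for i, score in sorted(zip(candidates, rerank_scores), key=lambda x: -x[1]):
        print(f"{score:.3f}  {corpus[i]}")
    ```

    The bi-encoder stage scales to large corpora because document embeddings can be precomputed and indexed, while the cross-encoder stage is applied only to a short candidate list, which is the effectiveness/efficiency tradeoff the survey highlights.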

    Information between Data and Knowledge: Information Science and its Neighbors from Data Science to Digital Humanities

    Digital humanities and data science, as neighboring fields, pose new challenges and opportunities for information science. The recent focus on data in the context of big data and deep learning brings new tasks for information scientists, for example in research data management. At the same time, information behavior is changing in light of the increasing digital availability of information in academia as well as in everyday life. In this volume, contributions from fields such as information behavior and information literacy, information retrieval, digital humanities, knowledge representation, emerging technologies, and information infrastructure showcase the development of information science research in recent years. Topics as diverse as social media analytics, fake news on Facebook, collaborative search practices, open educational resources, and recent developments in research data management are among the highlights of this volume. For more than 30 years, the International Symposium of Information Science has been the venue for bringing together information scientists from the German-speaking countries. In addition to the regular scientific contributions, six finalists for the best information science master's thesis award present their work.