6 research outputs found

    Vector Space Proximity Based Document Retrieval For Document Embeddings Built By Transformers

    Internet publications stay on top of local and international events, generating hundreds, sometimes thousands, of news articles per day, which makes it difficult for readers to navigate this stream of information without assistance. Competition for the reader’s attention has never been greater. One strategy to keep readers’ attention on a specific article and help them better understand its content is news recommendation, which automatically provides readers with references to relevant complementary articles. However, to be effective, news recommendation needs to select, from a large collection of candidate articles, only a handful that are relevant yet provide diverse information. In this thesis, we propose and experiment with three methods for news recommendation and evaluate them in the context of the NIST News Track. Our first approach is based on the classic BM25 information retrieval approach and assumes that relevant articles will share common keywords with the current article. Our second approach is based on novel document embedding representations and uses various proximity measures to retrieve the closest documents. For this approach, we experimented with a substantial number of models, proximity measures, and hyperparameters, yielding a total of 47,332 distinct models. Finally, our third approach combines the BM25 and the embedding models to increase the diversity of the results. The results on the 2020 TREC News Track show that the performance of the BM25 model (nDCG@5 of 0.5924) greatly exceeds the TREC median performance (nDCG@5 of 0.5250) and achieves the highest score at the shared task. The performance of the embedding model alone (nDCG@5 of 0.4541) is lower than both the TREC median and BM25. The performance of the combined model (nDCG@5 of 0.5873) is rather close to that of the BM25 model; however, an analysis of the results shows that the recommended articles differ from those proposed by BM25, so the combined model may be a promising way to reach diversity without much loss in relevance.
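    The abstract above reports its results as nDCG@5. As a rough illustration of how such a score is computed, here is a minimal sketch. It assumes a linear gain (the relevance grade itself) and a log2 rank discount; the exact gain mapping used by the TREC News Track is not stated in the abstract, and the function names and sample grades are hypothetical.

```python
import math
from typing import Sequence


def dcg_at_k(gains: Sequence[float], k: int) -> float:
    """Discounted cumulative gain over the first k positions.

    Assumption: linear gain and a log2(rank + 1) discount, ranks starting at 1.
    """
    return sum(g / math.log2(pos + 2) for pos, g in enumerate(gains[:k]))


def ndcg_at_k(ranked_gains: Sequence[float],
              judged_gains: Sequence[float],
              k: int = 5) -> float:
    """nDCG@k of a ranking, normalised by the best ordering of all judged gains.

    ranked_gains: relevance grades of the returned articles, in ranked order.
    judged_gains: relevance grades of every judged article for the query.
    """
    ideal = dcg_at_k(sorted(judged_gains, reverse=True), k)
    if ideal == 0:
        return 0.0  # no relevant article exists, so the score is defined as 0
    return dcg_at_k(ranked_gains, k) / ideal


# Hypothetical grades for one query: the system ranked a grade-2 article first
# but missed the single grade-4 article among the judged candidates.
score = ndcg_at_k([2, 0, 1, 0, 0], [4, 2, 1, 0, 0, 0], k=5)
```

    Averaging this value over all query articles gives a single figure comparable in kind to those quoted above, provided the same gain mapping is used.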

    Big Brother: A Drop-In Website Interaction Logging Service

    Fine-grained logging of interactions in user studies is important for studying user behaviour, among other reasons. However, in many research scenarios, the way interactions are logged is tied to a monolithic system. We present a generic, application-independent service for logging interactions in web pages, specifically targeting user studies. Our service, Big Brother, can be dropped into existing user interfaces with almost no configuration required by researchers. Big Brother has already been used to record interactions in several user studies, in both lab-based and crowdsourcing environments. We further demonstrate the ability of Big Brother to scale to very large user studies through benchmarking experiments. Big Brother also provides a number of additional tools for visualising and analysing interactions. Big Brother significantly lowers the barrier to entry for logging user interactions by providing researchers and practitioners of user studies with a minimal but powerful service that needs no configuration and can scale to thousands of concurrent sessions. We have made the source code and releases for Big Brother available for download at https://github.com/hscells/bigbro

    Geographic information extraction from texts

    A large volume of unstructured text containing valuable geographic information is available online. This information – provided implicitly or explicitly – is useful not only for scientific studies (e.g., spatial humanities) but also for many practical applications (e.g., geographic information retrieval). Although considerable progress has been achieved in geographic information extraction from texts, there are still unsolved challenges and issues, ranging from methods, systems, and data to applications and privacy. This workshop will therefore provide a timely opportunity to discuss recent advances, new ideas, and concepts, and to identify research gaps in geographic information extraction.

    Grammar and Corpora 2016

    In recent years, the availability of large annotated corpora, together with a new interest in the empirical foundation and validation of linguistic theory and description, has sparked a surge of novel work using corpus methods to study the grammar of natural languages. This volume presents recent developments and advances, firstly, in corpus-oriented grammar research with a special focus on Germanic, Slavic, and Romance languages and, secondly, in corpus linguistic methodology as well as the application of corpus methods to grammar-related fields. The volume results from the sixth international conference Grammar and Corpora (GaC 2016), which took place at the Institute for the German Language (IDS) in Mannheim, Germany, in November 2016.