690 research outputs found

    UV follow-up observations of five recently active novae in M31

    Recently we initiated the Transient UV Objects (TUVO) project, in which we search for serendipitous UV transients in near-real time in Swift/UVOT data using a purpose-built pipeline.

    The user perspective in professional information search


    Is de zoekmachine van de toekomst een chatbot?

    Inaugural lecture delivered by Prof. Dr. Suzan Verberne upon accepting the chair of Natural Language Processing at Leiden University on Monday, 3 June 2024. Text also available in English: Is the search engine of the future a chatbot?

    A Test Collection of Synthetic Documents for Training Rankers: ChatGPT vs. Human Experts

    We investigate the usefulness of generative large language models (LLMs) in generating training data for cross-encoder re-rankers in a novel direction: generating synthetic documents instead of synthetic queries. We introduce a new dataset, ChatGPT-RetrievalQA, and compare the effectiveness of strong models fine-tuned on both LLM-generated and human-generated data. We build ChatGPT-RetrievalQA based on an existing dataset, the human ChatGPT comparison corpus (HC3), consisting of multiple public question collections featuring both human- and ChatGPT-generated responses. We fine-tune a range of cross-encoder re-rankers on either human-generated or ChatGPT-generated data. Our evaluation on MS MARCO DEV, TREC DL'19, and TREC DL'20 demonstrates that cross-encoder re-ranking models trained on LLM-generated responses are significantly more effective for out-of-domain re-ranking than those trained on human responses. For in-domain re-ranking, however, the human-trained re-rankers outperform the LLM-trained re-rankers. Our novel findings suggest that generative LLMs have high potential in generating training data for neural retrieval models and can be used to augment training data, especially in domains with less labeled data. ChatGPT-RetrievalQA presents various opportunities for analyzing and improving rankers with both human- and LLM-generated data. Our data, code, and model checkpoints are publicly available.
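    The training setup described in this abstract (fine-tuning a cross-encoder re-ranker on LLM-generated responses) can be sketched with the sentence-transformers CrossEncoder API. The file name, base checkpoint, and hyperparameters below are illustrative assumptions, not the authors' exact configuration; the real ChatGPT-RetrievalQA data and training code are released separately by the authors.

    # Minimal sketch (assumptions noted above): fine-tune a cross-encoder re-ranker on
    # (query, LLM-generated response, label) triples, then use it to score query-passage pairs.
    import csv
    from torch.utils.data import DataLoader
    from sentence_transformers import InputExample
    from sentence_transformers.cross_encoder import CrossEncoder

    train_samples = []
    # "chatgpt_retrievalqa_train.tsv" is a hypothetical file: query \t response \t relevance label
    with open("chatgpt_retrievalqa_train.tsv", encoding="utf-8") as f:
        for query, response, label in csv.reader(f, delimiter="\t"):
            train_samples.append(InputExample(texts=[query, response], label=float(label)))

    train_dataloader = DataLoader(train_samples, shuffle=True, batch_size=32)

    # Start from a public MS MARCO cross-encoder and continue training on the synthetic data.
    model = CrossEncoder("cross-encoder/ms-marco-MiniLM-L-6-v2", num_labels=1)
    model.fit(train_dataloader=train_dataloader, epochs=1, warmup_steps=1000)

    # At re-ranking time, the model scores (query, passage) pairs retrieved by a first-stage ranker.
    scores = model.predict([["what is a cross-encoder?",
                             "A cross-encoder feeds the query and passage jointly through one transformer."]])
    print(scores)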

    Overview of the SBS 2016 Mining Track


    Using skipgrams and POS-based feature selection for patent classification


    Transfer Learning for Health-related Twitter Data


    Citation Metrics for Legal Information Retrieval Systems

    This paper examines citations in legal information retrieval. Citation metrics can be a factor of relevance in the ranking algorithms of legal information retrieval systems. We provide an overview of the Dutch legal publishing culture. To analyze citations in legal publications, we manually analyze a set of documents and record by which (types of) documents they are cited: document type, intended audience, actual audience, and author affiliations. An analysis of 9 cited documents and 217 citing documents shows no strict separation in citations between documents aimed at scholars and documents aimed at practitioners. Our results suggest that citations in legal documents do not measure the impact on scholarly publications and scholars, but measure a broader scope of impact, or relevance, for the legal field.

    CLosER: Conversational Legal Longformer with Expertise-Aware Passage Response Ranker for Long Contexts

    In this paper, we investigate the task of response ranking in conversational legal search. We propose a novel method for conversational passage response retrieval (ConvPR) for long conversations in domains with mixed levels of expertise. Conversational legal search is challenging because the domain includes long, multi-participant dialogues with domain-specific language. Furthermore, as opposed to other domains, there typically is a large knowledge gap between the questioner (a layperson) and the responders (lawyers) participating in the same conversation. We collect and release a large-scale real-world dataset called LegalConv with nearly one million legal conversations from a legal community question answering (CQA) platform. We address the particular challenges of processing legal conversations with our novel Conversational Legal Longformer with Expertise-Aware Response Ranker, called CLosER. The proposed method has two main innovations compared to state-of-the-art methods for ConvPR: (i) Expertise-Aware Post-Training, a learning objective that takes into account the knowledge gap between participants in the conversation; and (ii) a simple but effective strategy for re-ordering the context utterances in long conversations to overcome the limitations of the sparse attention mechanism of the Longformer architecture. Evaluation on LegalConv shows that our proposed method substantially and significantly outperforms existing state-of-the-art models on the response selection task. Our analysis indicates that our Expertise-Aware Post-Training, i.e., continued pre-training or domain/task adaptation, plays an important role in the achieved effectiveness. Our proposed method is generalizable to other tasks with domain-specific challenges and can facilitate future research on conversational search in other domains.
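    As a rough illustration of the re-ranking step, the sketch below scores a candidate answer against a long conversation with a Longformer cross-encoder. The re-ordering shown (placing the most recent utterances closest to the candidate response so they fall inside the local attention window) is only a plausible reading of the context re-ordering idea, and the checkpoint is a generic, not-yet-fine-tuned Longformer; the paper's actual model is trained on LegalConv with Expertise-Aware Post-Training.

    # Illustrative sketch, not the authors' code: score one candidate response against a long
    # conversation. The classification head of this generic checkpoint is untrained and would
    # need fine-tuning (e.g., on LegalConv) before the scores are meaningful.
    import torch
    from transformers import LongformerTokenizerFast, LongformerForSequenceClassification

    tokenizer = LongformerTokenizerFast.from_pretrained("allenai/longformer-base-4096")
    model = LongformerForSequenceClassification.from_pretrained("allenai/longformer-base-4096",
                                                                num_labels=1)
    model.eval()

    def score_response(utterances, candidate_response):
        # Assumed re-ordering: most recent turns first, so they sit nearest the candidate
        # response within Longformer's sliding-window (local) attention.
        context = " </s> ".join(reversed(utterances))
        inputs = tokenizer(context, candidate_response, truncation=True,
                           max_length=4096, return_tensors="pt")
        # Give the first (<s>) token global attention so it can attend to the whole sequence.
        global_attention_mask = torch.zeros_like(inputs["input_ids"])
        global_attention_mask[:, 0] = 1
        with torch.no_grad():
            logits = model(**inputs, global_attention_mask=global_attention_mask).logits
        return logits.squeeze().item()

    conversation = [
        "My landlord kept my deposit without giving any reason.",
        "Did you document the condition of the apartment when you moved out?",
        "Yes, I have dated photos of every room.",
    ]
    print(score_response(conversation, "You can request an itemized statement of deductions in writing."))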