102 research outputs found

    A Comparison of Nuggets and Clusters for Evaluating Timeline Summaries

    There is growing interest in systems that generate timeline summaries by filtering high-volume streams of documents to retain only those that are relevant to a particular event or topic. Continued advances in algorithms and techniques for this task depend on standardized and reproducible evaluation methodologies for comparing systems. However, timeline summary evaluation is still in its infancy, with competing methodologies currently being explored in international evaluation forums such as TREC. One area of active exploration is how to explicitly represent the units of information that should appear in a 'good' summary. Currently, there are two main approaches, one based on identifying nuggets in an external 'ground truth', and the other based on clustering system outputs. In this paper, by building test collections that have both nugget and cluster annotations, we are able to compare these two approaches. Specifically, we address questions related to evaluation effort, differences in the final evaluation products, and correlations between scores and rankings generated by both approaches. We summarize advantages and disadvantages of nuggets and clusters to offer recommendations for future system evaluation

    Hive-Mind Space: A Meta-Design Approach for Cultivating and Supporting Collaborative Design

    The ever-growing complexity of design projects requires more knowledge than any individual can have and, therefore, needs the active engagement of all stakeholders in the design process. Collaborative design exploits synergies from multidisciplinary communities, encourages divergent thinking, and enhances social creativity. The research documented in this thesis supports and deepens the understanding of collaborative design in two dimensions: (1) It developed and evaluated socio-technical systems to support collaborative design projects; and (2) It defined and explored a meta-design framework focused on how these systems enable users, as active contributors, to modify and further develop them. The research is grounded in and simultaneously extends the following major dimensions of meta-design: (1) It exploits the contributions of social media and web 2.0 as innovative information technologies; (2) It facilitates the shift from consumer cultures to cultures of participation; (3) It fosters social creativity by harnessing contributions that occur in cultures of participation; (4) It empowers end-users to be active designers involved in creating situated solutions. In a world where change is the norm, meta-design is a necessity rather than a luxury because it is impossible to design software systems at design time for problems that occur only at use time. The co-evolution of systems and users' social practices pursued in this thesis requires a software environment that can evolve and be tailored continuously. End-user development explores tools and methods to support end users who tailor software artifacts. However, it addresses this objective primarily from a technical perspective and focuses mainly on tailorability. This thesis, centered on meta-design, extends end-user development by creating social conditions and design processes for broad participation in design activities both at design time and at use time. 
It builds on previous research into meta-design that has provided a strategic overview of design opportunities and principles, and it addresses some shortcomings of meta-design, such as the lack of guidelines for building concrete meta-design environments that can be assessed by empirical evaluation. Given the goal of this research, to explore meta-design approaches for cultivating and supporting collaborative design, the overarching research question guiding this work is: How do we provide a socio-technical environment to bring multidisciplinary design communities together to foster creativity, collaboration, and design evolution? To answer this question, my research was carried out in four phases: (1) synthesizing concepts, models, and theories; (2) framing conceptual models; (3) developing several systems in specific application areas; and (4) conducting empirical evaluation studies. The main contributions of this research are: (1) The Hive-Mind Space model, a meta-design framework that is derived from the "software shaping workshop" methodology and integrates the "seeding, evolutionary growth, reseeding" model. The bottom-up approach inherent in this framework breaks down static social structures so as to support richer ecologies of participation. It provides the means for structuring communication and appropriation. The model's open mediation mechanism tackles unanticipated communication gaps among different design communities; (2) MikiWiki, a structured programmable wiki I developed to demonstrate how the Hive-Mind Space model can be implemented as a practical platform that benefits users and how its features and values can be specified so as to be empirically observable and assessable; (3) Empirical insights, such as those based on applying MikiWiki to different collaborative design studies, which provide evidence that different phases of meta-design represent different modes rather than discrete levels.

    Text mining of adverse events in clinical trials: Deep learning approach

    Background: Pharmacovigilance and safety reporting, which involves processes for monitoring the use of medicines in clinical trials, plays a critical role in the identification of previously unrecognized adverse events or changes in the patterns of adverse events. Objective: This study aimed to demonstrate the feasibility of automating the coding of adverse events described in the narrative section of the serious adverse event report forms to enable a statistical analysis of the aforementioned patterns. Methods: We used the Unified Medical Language System (UMLS) as the coding scheme, which integrates 217 source vocabularies, thus enabling coding against other relevant terminologies such as ICD-10, MedDRA and SNOMED. We used MetaMap, a highly configurable dictionary lookup software, to identify mentions of the UMLS concepts. We trained a binary classifier using Bidirectional Encoder Representations from Transformers (BERT), a transformer-based language model that captures contextual relationships, to differentiate between mentions of the UMLS concepts that represent adverse events and those that do not. Results: The model achieved a high F1 score of 0.8080 despite the class imbalance. This is 10.15 percentage points lower than human-like performance, but also 17.45 percentage points higher than the baseline approach. Conclusions: These results confirmed that automated coding of adverse events described in the narrative section of the serious adverse event reports is feasible. Once coded, adverse events can be statistically analyzed so that any correlations with the trialed medicines can be estimated in a timely fashion. Keywords: natural language processing; deep learning; machine learning; classification

    Answering clinical questions with knowledge-based and statistical techniques

    The combination of recent developments in question-answering research and the availability of unparalleled resources developed specifically for automatic semantic processing of text in the medical domain provides a unique opportunity to explore complex question answering in the domain of clinical medicine. This article presents a system designed to satisfy the information needs of physicians practicing evidence-based medicine. We have developed a series of knowledge extractors, which employ a combination of knowledge-based and statistical techniques, for automatically identifying clinically relevant aspects of MEDLINE abstracts. These extracted elements serve as the input to an algorithm that scores the relevance of citations with respect to structured representations of information needs, in accordance with the principles of evidence-based medicine. Starting with an initial list of citations retrieved by PubMed, our system can bring relevant abstracts into higher ranking positions, and from these abstracts generate responses that directly answer physicians' questions. We describe three separate evaluations: one focused on the accuracy of the knowledge extractors, one conceptualized as a document reranking task, and finally, an evaluation of answers by two physicians. Experiments on a collection of real-world clinical questions show that our approach significantly outperforms the already competitive PubMed baseline.

    Towards Context-free Information Importance Estimation

    The amount of information contained in heterogeneous text documents such as news articles, blogs, social media posts, scientific articles, discussion forums, and microblogging platforms is already huge and is going to increase further. It is not possible for humans to cope with this flood of information, so that important information can neither be found nor be utilized. This situation is unfortunate since information is the key driver in many areas of society in the present Information Age. Hence, developing automatic means that can assist people to handle the information overload is crucial. Developing methods for automatic estimation of information importance is an essential step towards this goal. The guiding hypothesis of this work is that prior methods for automatic information importance estimation are inherently limited because they are based on merely correlated signals that are, however, not causally linked with information importance. To resolve this issue, we lay in this work the foundations for a fundamentally new approach for importance estimation. The key idea of context-free information importance estimation is to equip machine learning models with world knowledge so that they can estimate information importance based on causal reasons. In the first part of this work, we lay the theoretical foundations for context-free information importance estimation. First, we discuss how the abstract concept of information importance can be formally defined. So far, a formal definition of this concept is missing in the research community. We close this gap by discussing two information importance definitions, which equate the importance of information with its impact on the behavior and the impact on the course of life of the information recipients, respectively. Second, we discuss how information importance estimation abilities can be assessed. Usually, this is done by performing automatic summarization of text documents. 
However, we find that this approach is not ideal. Instead, we propose to consider ranking, regression, and preference prediction tasks as alternatives in future work. Third, we deduce context-free information importance estimation as a logical consequence of the previously introduced importance definitions. We find that reliable importance estimation, in particular for heterogeneous text documents, is only possible with context-free methods. In the second part, we develop the first machine learning models based on the idea of context-free information importance estimation. To this end, we first tackle the lack of suitable datasets that are required to train and test machine learning models. In particular, large and heterogeneous datasets to investigate automatic summarization of multiple source documents are missing, because their construction is complicated and costly. To solve this problem, we present a simple and cost-efficient corpus construction approach and demonstrate its applicability by creating new multi-document summarization datasets. Second, we develop a new machine learning approach for context-free information importance estimation, implement a concrete realization, and demonstrate its advantages over contextual importance estimators. Third, we develop a new method to evaluate automatic summarization methods. Previous works are based on expensive reference summaries and unreliable semantic comparisons of text documents. In contrast, our approach uses cheap pairwise preference annotations and only much simpler sentence-level similarity estimation. This work lays the foundations for context-free information importance estimation. We hope that future research will explore whether this fundamentally new type of information importance estimation can eventually lead to human-level information importance estimation abilities.

    Evaluating Information Retrieval and Access Tasks

    This open access book summarizes the first two decades of the NII Testbeds and Community for Information access Research (NTCIR). NTCIR is a series of evaluation forums run by a global team of researchers and hosted by the National Institute of Informatics (NII), Japan. The book is unique in that it discusses not just what was done at NTCIR, but also how it was done and the impact it has achieved. For example, in some chapters the reader sees the early seeds of what eventually grew to be the search engines that provide access to content on the World Wide Web, today’s smartphones that can tailor what they show to the needs of their owners, and the smart speakers that enrich our lives at home and on the move. We also get glimpses into how new search engines can be built for mathematical formulae, or for the digital record of a lived human life. Key to the success of the NTCIR endeavor was early recognition that information access research is an empirical discipline and that evaluation therefore lay at the core of the enterprise. Evaluation is thus at the heart of each chapter in this book. They show, for example, how the recognition that some documents are more important than others has shaped thinking about evaluation design. The thirty-three contributors to this volume speak for the many hundreds of researchers from dozens of countries around the world who together shaped NTCIR as organizers and participants. This book is suitable for researchers, practitioners, and students—anyone who wants to learn about past and present evaluation efforts in information retrieval, information access, and natural language processing, as well as those who want to participate in an evaluation task or even to design and organize one

    Democratizing Information Access through Low Overhead Systems

    Despite its importance, accessing information in storage systems or raw data is challenging or impossible for most people due to the sheer amount and heterogeneity of data as well as the overheads and complexities of existing systems. In this thesis, we propose several approaches to improve on that and therefore democratize information access. Data-driven and AI-based approaches make it possible to provide the necessary information access for many tasks at scale. Unfortunately, most existing approaches can only be built and used by IT experts and data scientists, yet the current demand for data scientists is far from being met. Furthermore, their application is expensive. To counter this, approaches with low overhead are needed, i.e., approaches that do not require large amounts of training data, manual annotation or extraction of information, or extensive computation. However, such systems still need to adapt to the special terminology of different domains and to the individual information needs of the users. Moreover, they should be usable without extensive training; we thus aim to create ready-to-use systems that provide intuitive or familiar ways for interaction, e.g., chatbot-like natural language input or graphical user interfaces. In this thesis, we propose a number of contributions to three important subfields of data exploration and processing: Natural Language Interfaces for Data Access & Manipulation, Personalized Summarizations of Text Collections, and Information Extraction & Integration. These approaches allow data scientists, domain experts and end users to access and manipulate information in a quick and easy way. First, we propose two natural language interfaces for data access and manipulation. Natural language is a useful alternative interface for relational databases, since it allows users to formulate complex questions without requiring knowledge of SQL. 
We propose an approach based on weak supervision that augments existing deep learning techniques in order to improve the performance of models for natural language to SQL translation. Moreover, we apply the idea to build a training pipeline for conversational agents (i.e., chatbot-like systems that allow users to interact with a database and perform actions like ticket booking). The pipeline uses weak supervision to generate the training data automatically from a relational database and its set of defined transactions. Our approach is data-aware, i.e., it leverages the data characteristics of the DB at runtime to optimize the dialogue flow and reduce necessary interactions. Additionally, we complement this research by presenting a meta-study on the reproducibility and availability of natural language interfaces for databases (NLIDBs) for real-world applications, and a benchmark to evaluate the linguistic robustness of NLIDBs. Second, we work on personalized summarization and its usage for data exploration. The central idea is to produce summaries that exactly cover the current information need of the users. By creating multiple summaries or shifting the focus during the interactive creation process, these summaries can be used to explore the contents of unknown text collections. We propose an approach to create such personalized summaries at interactive speed; this is achieved by carefully sampling from the inputs. As part of our research on multi-document summarization, we noticed that there is a lack of diverse evaluation corpora for this task. We therefore present a framework that can be used to automatically create new summarization corpora, and apply and validate it. Third, we provide ways to democratize information extraction and integration. This becomes relevant when data is scattered across different sources and there is no tabular representation that already contains all information needed. 
Therefore, it might be necessary to integrate different structured sources, or to even extract the required information pieces from text collections first and then to organize them. To integrate existing structured data sources, we present and evaluate a novel end-to-end approach for schema matching based on neural embeddings. Finally, we tackle the automatic creation of tables from text for situations where no suitable structured source to answer an information need is available. Our proposed approach can execute SQL-like queries on text collections in an ad-hoc manner, both to directly extract facts from text documents, and to produce aggregated tables stating information that is not explicitly mentioned in the documents. Our approach works by generalizing user feedback and therefore does not need domain-specific resources for domain adaptation. It runs at interactive speed even on commodity hardware. Overall, our approaches can provide a quality level comparable to state-of-the-art approaches, but often at a fraction of the associated costs. In other areas, such as table extraction, we even provide functionality that is, to our knowledge, not covered by any generic tooling available to end users. There are still many interesting challenges to solve, and the recent rise of large language models has shifted what seems possible with regard to dealing with human language once more. Yet, we hope that our contributions provide a useful step towards democratization of information access.

    Perspectives on Large Language Models for Relevance Judgment

    When asked, current large language models (LLMs) like ChatGPT claim that they can assist us with relevance judgments. Many researchers think this would not lead to credible IR research. In this perspective paper, we discuss possible ways for LLMs to assist human experts along with concerns and issues that arise. We devise a human-machine collaboration spectrum that allows categorizing different relevance judgment strategies, based on how much the human relies on the machine. For the extreme point of "fully automated assessment", we further include a pilot experiment on whether LLM-based relevance judgments correlate with judgments from trained human assessors. We conclude the paper by providing two opposing perspectives - for and against the use of LLMs for automatic relevance judgments - and a compromise perspective, informed by our analyses of the literature, our preliminary experimental evidence, and our experience as IR researchers. We hope to start a constructive discussion within the community to avoid a stalemate during review, where work is damned if it uses LLMs for evaluation and damned if it doesn't.