
    Biomedical information extraction for matching patients to clinical trials

    Digital medical information has grown at an astonishing rate over the last decades, driven by an unprecedented number of medical writers, which has revolutionised what and how much information is available to health professionals. The problem with this wave of information is that precisely selecting among the documents returned by medical information repositories is exhausting and time consuming for physicians. One of the biggest challenges of the digital era is therefore to reduce the time physicians spend finding the document that best matches a patient (e.g. intervention articles, clinical trials, prescriptions). Precision Medicine (PM) 2017 is a track of the Text REtrieval Conference (TREC) that focuses on exactly this type of challenge, exclusively for oncology. Built around a large collection of clinical trials, the track is a good real-life example of how information retrieval solutions can address such problems, and a good starting point for applying information extraction and retrieval methods in a very complex domain. The purpose of this thesis is to improve a system designed by the NovaSearch team for the TREC PM 2017 Clinical Trials task, which ranked among the top-5 systems of 2017. The NovaSearch team also participated in the 2018 track and achieved a 15% increase in precision compared to 2017. Multiple IR techniques were used for information extraction and data processing, including rank fusion, query expansion (e.g. pseudo-relevance feedback and MeSH term expansion) and experiments with Learning to Rank (LETOR) algorithms. Our goal is to retrieve the best possible set of trials for a given patient, using precise document filters to exclude unwanted clinical trials. This work can open doors to better ways of searching for and understanding the criteria that include or exclude trials, helping physicians even in the most complex and difficult information retrieval tasks.
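
    For illustration, the sketch below shows reciprocal rank fusion (RRF), one common way to implement the rank fusion step mentioned above; the trial IDs, function names, and the constant k=60 are assumptions for the example and are not taken from the NovaSearch system.

# Minimal sketch of reciprocal rank fusion (RRF) for combining ranked trial
# lists produced by several retrieval runs. Names and k=60 are illustrative.
from collections import defaultdict
from typing import Dict, List

def reciprocal_rank_fusion(runs: List[List[str]], k: int = 60) -> List[str]:
    """Fuse several ranked lists of clinical-trial IDs into one ranking.

    Each run is a list of trial IDs ordered from most to least relevant.
    A trial's fused score is the sum over runs of 1 / (k + rank).
    """
    scores: Dict[str, float] = defaultdict(float)
    for run in runs:
        for rank, trial_id in enumerate(run, start=1):
            scores[trial_id] += 1.0 / (k + rank)
    # A higher fused score means the trial ranked well across runs.
    return sorted(scores, key=scores.get, reverse=True)

# Example: fuse a keyword run with a query-expanded (e.g. MeSH-expanded) run.
keyword_run = ["NCT001", "NCT002", "NCT003"]
expanded_run = ["NCT002", "NCT004", "NCT001"]
print(reciprocal_rank_fusion([keyword_run, expanded_run]))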

    An overview of information extraction techniques for legal document analysis and processing

    In the Indian legal system, different courts publish their legal proceedings every month for future reference by legal experts and the general public. Extensive manual labor and time are required to analyze and process the information stored in these lengthy, complex legal documents. Automatic legal document processing overcomes the drawbacks of manual processing and can help the common man better understand the legal domain. In this paper, we explore recent advances in the field of legal text processing and provide a comparative analysis of the approaches used. We divide the approaches into three classes: NLP-based, deep learning-based, and KBP-based approaches. We place special emphasis on the KBP approach, as we strongly believe it can handle the complexities of the legal domain well. Finally, we discuss some possible future research directions for legal document analysis and processing.
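
    As a rough illustration of the simplest (rule-based NLP) class of approaches, the sketch below pulls two structured fields out of a judgment excerpt; the regex patterns and the sample sentence are assumptions made for the example and are not drawn from any of the surveyed systems.

# Minimal rule-based sketch: extract statute-section references and dates
# from a legal judgment excerpt. Patterns and sample text are illustrative.
import re

SECTION_RE = re.compile(
    r"[Ss]ection\s+(\d+[A-Z]?)\s+of\s+the\s+([A-Z][\w\s]+?(?:Act|Code))")
DATE_RE = re.compile(
    r"\b\d{1,2}\s+(?:January|February|March|April|May|June|July|"
    r"August|September|October|November|December)\s+\d{4}\b")

def extract_fields(text: str) -> dict:
    """Return statute references and dates found in a judgment excerpt."""
    return {
        "sections": [f"Section {num} of the {act.strip()}"
                     for num, act in SECTION_RE.findall(text)],
        "dates": DATE_RE.findall(text),
    }

sample = ("The appellant was convicted under Section 302 of the Indian Penal "
          "Code by the trial court on 14 March 2016.")
print(extract_fields(sample))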

    Rating News Claims: Feature Selection and Evaluation

    News claims that travel the Internet and online social networks (OSNs) originate from different, sometimes unknown sources, which raises issues about the credibility of those claims and the drivers behind them. Fact-checking websites such as Snopes, FactCheck, and Emergent use human evaluators to investigate and label news claims, but the process is labor- and time-intensive. Driven by the need to use data analytics and algorithms to assess the credibility of news claims, we focus on what can be generalized from evaluating human-labeled claims. We developed tools to extract claims from Snopes and Emergent and used public datasets collected and published by those websites. Claims extracted from those datasets were supervised, i.e. labeled with claim ratings. We focus on claims with definite ratings (false, mostly false, true, and mostly true), with the goal of identifying distinctive features that distinguish true from false claims; ultimately, those features can be used to predict the labels of future unsupervised, unlabeled claims. We evaluate different methods of extracting features, as well as different feature sets and their ability to predict the correct claim label. We observed that OSN websites report markedly higher rates of false claims than most other website categories, and that in most categories the rate of false claims reported on fact-checking websites exceeds the rate of true claims. At the content-analysis level, false claims tend to carry a more negative sentiment tone and can therefore provide supporting features for predicting the claim label.
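
    To make the idea of feature-based claim rating concrete, the sketch below trains a toy classifier on labeled claim texts; the claims, labels, and the choice of TF-IDF with logistic regression are assumptions for the example and do not reproduce the paper's feature sets or the Snopes/Emergent data.

# Minimal sketch of a supervised claim-rating pipeline: text features
# (TF-IDF here) feeding a classifier that predicts a claim's label.
# The toy claims and labels are illustrative, not real fact-checked data.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

claims = [
    "Vaccine causes outbreak, officials hide the truth",
    "City council approves new budget for public schools",
    "Miracle cure erases disease overnight, doctors stunned",
    "Study finds regular exercise lowers blood pressure",
]
labels = ["false", "true", "false", "true"]

model = make_pipeline(
    TfidfVectorizer(ngram_range=(1, 2)),
    LogisticRegression(max_iter=1000),
)
model.fit(claims, labels)

# Predict the label of a previously unseen (unlabeled) claim.
print(model.predict(["Secret memo proves the moon landing was staged"]))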