38 research outputs found

    Active Learning Stopping Strategies for Technology-Assisted Sensitivity Review

    Active learning strategies are often deployed in technology-assisted review tasks, such as e-discovery and sensitivity review, to learn a classifier that can assist the reviewers with their task. In particular, an active learning strategy selects the documents that are expected to be the most useful for learning an effective classifier, so that these documents can be reviewed before the less useful ones. However, when reviewing for sensitivity, the order in which the documents are reviewed can affect the reviewers' ability to perform the review. Therefore, when deploying active learning in technology-assisted sensitivity review, we want to know when a sufficiently effective classifier has been learned, such that the active learning can stop and the reviewing order of the documents can be selected by the reviewer instead of the classifier. In this work, we propose two active learning stopping strategies for technology-assisted sensitivity review. We evaluate the effectiveness of our proposed approaches in comparison with three state-of-the-art stopping strategies from the literature. We show that our best performing approach results in a significantly more effective sensitivity classifier (+6.6% F2) than the best performing stopping strategy from the literature (McNemar's test, p<0.05).

    Combining active learning and semi-supervised learning techniques to extract protein interaction sentences

    Background: Protein-protein interaction (PPI) extraction has been a focal point of many biomedical research and database curation tools. Both Active Learning (AL) and Semi-supervised SVMs have recently been applied to extract PPIs automatically. In this paper, we explore combining AL with semi-supervised learning (SSL) to improve performance on the PPI extraction task. Methods: We propose a novel PPI extraction technique called PPISpotter, which combines Deterministic Annealing-based SSL with an AL technique to extract protein-protein interactions. In addition, we extract a comprehensive set of features from MEDLINE records using Natural Language Processing (NLP) techniques, which further improve the SVM classifiers. Our feature selection technique incorporates syntactic, semantic, and lexical properties of text, which boosts system performance significantly. Results: By conducting experiments with three different PPI corpora, we show that PPISpotter is superior to the other techniques incorporated into semi-supervised SVMs, such as Random Sampling, Clustering, and Transductive SVMs, in terms of precision, recall, and F-measure. Conclusions: Our system is a novel, state-of-the-art technique for efficiently extracting protein-protein interaction pairs.

    Quantitative modeling of the physiology of ascites in portal hypertension

    Although the factors involved in cirrhotic ascites have been studied for a century, a number of observations are not understood, including the action of diuretics in the treatment of ascites and the ability of the plasma-ascitic albumin gradient to diagnose portal hypertension. This communication presents an explanation of ascites based solely on pathophysiological alterations within the peritoneal cavity. A quantitative model is described based on experimental vascular and intraperitoneal pressures, lymph flow, and peritoneal space compliance. The model's predictions accurately mimic clinical observations in ascites, including the magnitude and time course of changes observed following paracentesis or diuretic therapy.

    Low incidence of SARS-CoV-2, risk factors of mortality and the course of illness in the French national cohort of dialysis patients

    Using long short-term memory neural networks to analyze SEC 13D filings: a recipe for human and machine interaction

    We implement an efficient methodology for extracting themes from Securities and Exchange Commission (SEC) 13D filings using aspects of human‐assisted active learning and long short‐term memory (LSTM) neural networks. Sentences from the ‘Purpose of Transaction’ section of each filing are extracted and a randomly chosen subset is labelled based on six filing themes that the existing literature on shareholder activism has shown to have an impact on stock returns. We find that an LSTM neural network that accepts sentences as input performs significantly better, with precision of 77%, than an alternately specified neural network that uses the common bag‐of‐words approach. This indicates that both sentence structure and vocabulary are important in classifying SEC 13D filings. Our study has important implications, as it addresses the recent cautions raised in the literature that analysis of finance and accounting‐related text sources should move beyond bag‐of‐words approaches to alternatives that incorporate the analysis of word sense and meaning reflecting context.