29 research outputs found

    Statistical learning methods for bias-aware HIV therapy screening

    The human immunodeficiency virus (HIV) is the causative agent of the acquired immunodeficiency syndrome (AIDS), which has claimed nearly 30 million lives and is arguably among the worst plagues in human history. With no cure or vaccine in sight, HIV patients are treated with combinations of antiretroviral drugs. The very large number of possible combinations makes a manual search for an effective therapy practically impossible, especially in advanced stages of the disease. Therapy selection can be supported by statistical methods that predict the outcomes of candidate therapies. However, these methods rely on clinical data sets that are biased in many ways. The main sources of bias are the evolving trends in treating HIV patients, the sparse and uneven representation of therapies, the different treatment backgrounds of the clinical samples, and the differing abundances of the various therapy-experience levels. In this thesis, we focus on devising bias-aware statistical learning methods for HIV therapy screening, that is, predicting the effectiveness of HIV combination therapies. For this purpose, we develop five novel approaches that address these biases in the clinical data sets when predicting the outcomes of HIV therapies. Three of the approaches aim for good prediction performance for every drug combination, independent of its abundance in the HIV clinical data set. To achieve this, they balance the sparse and uneven therapy representation by sharing common knowledge among related therapies in different ways. The remaining two approaches additionally account for the bias originating from the differing treatment histories of the samples in the HIV clinical data sets. For this purpose, both methods predict the response to an HIV combination therapy by taking into account not only the most recent (target) therapy but also available information from preceding therapies. 
In this way, they provide good predictions for advanced patients in mid to late stages of HIV treatment and for rare drug combinations. All our methods use a time-oriented evaluation scenario, in which models are trained on data from the less recent past and evaluated on data from the more recent past. We adopt this approach to account for the evolving treatment trends in HIV clinical practice and thus to offer a realistic model assessment.
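The time-oriented evaluation scenario amounts to splitting the data by date rather than at random. A minimal Python sketch, assuming each sample is a hypothetical `(therapy_start_date, features, label)` tuple and that a single cutoff date separates training from test data (the thesis's actual data layout and cutoff are not given here):

```python
from datetime import date


def time_oriented_split(samples, cutoff):
    """Split samples so that all training data precedes all test data.

    Hypothetical record layout: (therapy_start_date, features, label).
    Samples starting before `cutoff` form the training set (the less
    recent past); samples on or after `cutoff` form the test set (the
    more recent past), so no later information leaks into training.
    """
    train = [s for s in samples if s[0] < cutoff]
    test = [s for s in samples if s[0] >= cutoff]
    return train, test


if __name__ == "__main__":
    samples = [
        (date(2001, 3, 1), "therapy A", 1),
        (date(2004, 7, 15), "therapy B", 0),
        (date(2006, 1, 10), "therapy C", 1),
        (date(2008, 9, 30), "therapy D", 0),
    ]
    train, test = time_oriented_split(samples, cutoff=date(2006, 1, 1))
    print(len(train), len(test))  # 2 2
```

A random split would mix older and newer therapies in both sets and could overstate performance on drug combinations that only became common later.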

    Combining Kernel and Model Based Learning for HIV Therapy Selection

    We present a mixture-of-experts approach for HIV therapy selection. The heterogeneity in patient data makes it difficult for any single model to provide suitable therapy predictions for all patients. This heterogeneity can be addressed by combining kernel-based and model-based techniques, which capture different kinds of information: kernel-based methods are able to identify clusters of similar patients and work well when modelling the viral response for these groups. In contrast, model-based methods capture the sequential process of decision making and are able to find simpler, yet accurate, response patterns for patients outside these groups. We take advantage of this information by proposing a mixture-of-experts model that automatically selects between the methods in order to assign the most appropriate therapy choice to an individual. Overall, we verify that therapy combinations proposed using this approach significantly outperform those of previous methods.
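The abstract does not spell out how the mixture-of-experts model chooses between its experts. The Python sketch below is a hypothetical illustration only: it swaps in a simple hard gate that sends a patient to a kernel-based expert (an RBF-kernel-weighted average of training responses) when the patient is similar enough to known patients, and to a model-based expert otherwise. The names `rbf`, `kernel_expert` and `predict`, the `threshold` and `gamma` parameters, and the stand-in model-based expert are all assumptions, not the authors' method.

```python
import math


def rbf(x, y, gamma=1.0):
    """Gaussian (RBF) kernel between two feature vectors."""
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(x, y)))


def kernel_expert(x, train_X, train_y, gamma=1.0):
    """Kernel-weighted average of training responses (Nadaraya-Watson style).

    Assumes x is close enough to at least one training patient that the
    kernel weights do not all underflow to zero.
    """
    weights = [rbf(x, xi, gamma) for xi in train_X]
    return sum(w * yi for w, yi in zip(weights, train_y)) / sum(weights)


def predict(x, train_X, train_y, model_expert, threshold=0.5, gamma=1.0):
    """Hypothetical hard gate between a kernel expert and a model-based expert.

    Returns (expert_name, prediction). The kernel expert is used only when x
    lies inside a group of similar training patients, i.e. its largest kernel
    similarity to any training patient reaches `threshold`.
    """
    similarity = max(rbf(x, xi, gamma) for xi in train_X)
    if similarity >= threshold:
        return "kernel", kernel_expert(x, train_X, train_y, gamma)
    return "model", model_expert(x)


if __name__ == "__main__":
    train_X = [[0.0, 0.0], [0.1, 0.0]]
    train_y = [1.0, 0.0]

    def fallback(x):
        return 0.25  # stand-in for a model-based expert

    print(predict([0.05, 0.0], train_X, train_y, fallback))  # kernel expert
    print(predict([3.0, 3.0], train_X, train_y, fallback))   # ('model', 0.25)
```

A hard threshold keeps the example short; a learned, soft gate that weights both experts would be a more typical mixture-of-experts design.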

    Improving Low-Resource Question Answering using Active Learning in Multiple Stages

    Neural approaches have become very popular in the domain of Question Answering; however, they require a large amount of annotated data. Furthermore, they often perform very well, but only in the domain they were trained on. In this work, we propose a novel approach that combines data augmentation via question-answer generation with Active Learning to improve performance in low-resource settings, where the target domains are diverse in terms of difficulty and similarity to the source domain. We also investigate Active Learning for question answering in different stages, reducing the overall annotation effort of humans. For this purpose, we consider target domains in realistic settings, with an extremely low number of annotated samples but many unlabeled documents, which we assume can be obtained with little effort. Additionally, we assume that a sufficient amount of labeled data from the source domain is available. We perform extensive experiments to find the best setup for incorporating domain experts. Our findings show that our novel approach, in which humans are incorporated as early as possible in the process, boosts performance in the low-resource, domain-specific setting, allowing for low-labeling-effort question answering systems in new, specialized domains. They further demonstrate how human annotation affects QA performance depending on the stage at which it is performed.
    Comment: 16 pages, 8 figures

    Reinforced active learning for low-resource, domain-specific, multi-label text classification

    Text classification datasets from specialised or technical domains are in high demand, especially in industrial applications. However, due to the high cost of annotation, such datasets are usually expensive to create. While Active Learning (AL) can reduce the labeling cost, the required AL strategies are often tested only on general-knowledge domains and tend to use information sources that are not consistent across tasks. We propose Reinforced Active Learning (RAL), which trains a Reinforcement Learning policy that uses many different aspects of the data and the task to select the most informative unlabeled subset dynamically over the course of the AL procedure. We demonstrate the superior performance of the proposed RAL framework compared with strong AL baselines on four intricate multi-class, multi-label text classification datasets from specialised domains. In addition, we experiment with a unique data augmentation approach to further reduce the number of samples RAL needs to annotate.
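For contrast with a learned selection policy, a conventional AL strategy scores each unlabeled sample with a fixed heuristic and annotates the top-scoring ones. The Python sketch below shows one common heuristic for the multi-label case, uncertainty sampling that averages per-label uncertainty; it is not RAL itself, and the probability inputs and the names `multilabel_uncertainty` and `select_batch` are hypothetical. In RAL, a policy trained by Reinforcement Learning takes the place of such a fixed score.

```python
def multilabel_uncertainty(probs):
    """Mean per-label uncertainty: 1.0 when p = 0.5, 0.0 when p is 0 or 1."""
    return sum(1.0 - abs(2.0 * p - 1.0) for p in probs) / len(probs)


def select_batch(pool_probs, k):
    """Return indices of the k unlabeled samples with the highest uncertainty.

    pool_probs[i] holds the classifier's per-label probabilities for sample i.
    """
    ranked = sorted(
        range(len(pool_probs)),
        key=lambda i: multilabel_uncertainty(pool_probs[i]),
        reverse=True,
    )
    return ranked[:k]


if __name__ == "__main__":
    pool = [
        [0.9, 0.1, 0.95],  # confident on every label
        [0.5, 0.6, 0.4],   # uncertain on every label
        [0.2, 0.5, 0.8],   # mixed
    ]
    print(select_batch(pool, 2))  # [1, 2]
```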

    Evaluating pre-trained Sentence-BERT with class embeddings in active learning for multi-label text classification

    The Transformer language model is a powerful tool that has been shown to excel at various NLP tasks and, thanks to its versatility, has become the de facto standard solution. In this study, we employ pre-trained document embeddings in an Active Learning task on a legal document corpus, aiming to group samples with the same labels in the embedding space. We find that the calculated class embeddings are not close to their respective samples and consequently do not partition the embedding space in a meaningful way. In addition, we explore using the class embeddings as an Active Learning strategy, which yields dramatically worse results than all baselines.
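The abstract does not say how the class embeddings are computed. The sketch below assumes, hypothetically, that a class embedding is the mean of the document embeddings carrying that label, and it measures the property the study found lacking: how often a sample's most similar class embedding (by cosine similarity) is one of its own labels. Low agreement would suggest that the class embeddings do not partition the embedding space meaningfully. The toy two-dimensional vectors stand in for real Sentence-BERT embeddings, and all function names are illustrative.

```python
import math


def cosine(u, v):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def class_embeddings(embeddings, label_sets):
    """Hypothetical class embedding: mean embedding of all samples with that label."""
    dim = len(embeddings[0])
    sums, counts = {}, {}
    for emb, labels in zip(embeddings, label_sets):
        for label in labels:
            acc = sums.setdefault(label, [0.0] * dim)
            for i, x in enumerate(emb):
                acc[i] += x
            counts[label] = counts.get(label, 0) + 1
    return {lab: [x / counts[lab] for x in acc] for lab, acc in sums.items()}


def nearest_class_agreement(embeddings, label_sets, centroids):
    """Fraction of samples whose most similar class embedding is one of their labels."""
    hits = 0
    for emb, labels in zip(embeddings, label_sets):
        best = max(centroids, key=lambda lab: cosine(emb, centroids[lab]))
        hits += best in labels
    return hits / len(embeddings)


if __name__ == "__main__":
    embs = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
    labels = [{"a"}, {"a"}, {"b"}, {"b"}]
    centroids = class_embeddings(embs, labels)
    print(nearest_class_agreement(embs, labels, centroids))  # 1.0
```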

    Stability analysis of mixtures of mutagenetic trees

    Background: Mixture models of mutagenetic trees are evolutionary models that capture several pathways of ordered accumulation of genetic events observed in different subsets of patients. They have been used to model HIV progression by the accumulation of resistance mutations in the viral genome under drug pressure, and cancer progression by the accumulation of chromosomal aberrations in tumor cells. A genetic progression score (GPS) can be derived from the mixture models; it estimates the genetic status of single patients according to their progression along the tree models. GPS values have been shown to have predictive power for estimating drug resistance in HIV or survival time in cancer. Still, the reliability of the exact values of such complex markers derived from graphical models can be questioned.
    Results: In a simulation study, we analyzed various aspects of the stability of estimated mutagenetic trees mixture models. The induced probability distributions and the tree topologies turned out to be recovered with high precision by an EM-like learning algorithm. However, GPS values of single patients can also be reliably estimated only for models with just one major model component.
    Conclusion: It is encouraging that mutagenetic trees mixture models can be estimated with high confidence regarding the induced probability distributions and the general shape of the tree topologies. For a model with only one major disease progression process, even genetic progression scores for single patients can be reliably estimated. However, for models with more than one relevant component, alternative measures should be introduced for estimating the stage of disease progression.
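The abstract does not define the model formally. As a minimal sketch, assuming the standard mutagenetic tree formulation without an observation-error model: an event can occur only if its parent event has occurred, in which case it occurs with a fixed conditional probability, and a mixture assigns a pattern the weighted sum of its probabilities under the component trees. The `parent`/`prob` dictionary encoding and the function names are hypothetical; GPS computation and EM fitting are not shown.

```python
def tree_likelihood(pattern, parent, prob):
    """Probability of a binary event pattern under one mutagenetic tree.

    pattern: dict event -> 0/1; the root event 0 is always present.
    parent:  dict event -> parent event (0 denotes the root).
    prob:    dict event -> probability of the event given that its parent occurred.
    """
    present = dict(pattern)
    present[0] = 1
    likelihood = 1.0
    for event, par in parent.items():
        if present[par]:
            p = prob[event]
            likelihood *= p if present[event] else 1.0 - p
        elif present[event]:
            return 0.0  # event observed without its parent: impossible in this tree
    return likelihood


def mixture_likelihood(pattern, components):
    """Weighted sum of tree likelihoods over (weight, parent, prob) components."""
    return sum(w * tree_likelihood(pattern, par, pr) for w, par, pr in components)


if __name__ == "__main__":
    star = ({1: 0, 2: 0}, {1: 0.3, 2: 0.3})   # both events hang off the root
    chain = ({1: 0, 2: 1}, {1: 0.8, 2: 0.5})  # event 2 requires event 1
    components = [(0.1, *star), (0.9, *chain)]
    print(tree_likelihood({1: 0, 2: 1}, *chain))         # 0.0
    print(mixture_likelihood({1: 0, 2: 1}, components))  # approximately 0.021
```

The star-shaped component gives every pattern a nonzero probability, so patterns that violate the ordering of the chain tree can still be explained by the mixture.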

    Laurent inversion

    There are well-understood methods, going back to Givental and Hori–Vafa, that associate to a Fano toric complete intersection X a Laurent polynomial f that corresponds to X under mirror symmetry. We describe a technique for inverting this process, constructing the toric complete intersection X directly from its Laurent polynomial mirror f. We use this technique to construct a new four-dimensional Fano manifold.
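As a standard illustration of the forward direction (not an example taken from the paper), the projective plane, a toric Fano surface, has the Givental/Hori–Vafa mirror below. The vertices of the Newton polytope of f are exactly the primitive ray generators of the fan of the projective plane, so in this simple toric case X can be read back from f; Laurent inversion carries out such a reverse construction for toric complete intersections.

```latex
X = \mathbb{P}^2, \qquad
f(x, y) = x + y + \frac{1}{xy}, \qquad
\operatorname{Newt}(f) = \operatorname{conv}\{(1,0),\ (0,1),\ (-1,-1)\}.
```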

    Suggests Rgraphviz

    Description: Rtreemix is a package that offers an environment for estimating mutagenetic trees mixture models from cross-sectional data and using them for various predictions. It includes functions for fitting the trees mixture models, likelihood computations, model comparisons, waiting time estimations, stability analysis, etc.
