65 research outputs found

Statistical learning methods for bias-aware HIV therapy screening

Author: Bogojeska Jasmina
Publication venue: Fakultät 6 - Naturwissenschaftlich-Technische Fakultät I. Fachrichtung 6.2 - Informatik
Publication date: 01/01/2011

The human immunodeficiency virus (HIV) is the causative agent of the acquired immunodeficiency syndrome (AIDS) which claimed nearly 30 million lives and is arguably among the worst plagues in human history. With no cure or vaccine in sight, HIV patients are treated by administration of combinations of antiretroviral drugs. The very large number of such combinations makes the manual search for an effective therapy practically impossible, especially in advanced stages of the disease. Therapy selection can be supported by statistical methods that predict the outcomes of candidate therapies. However, these methods are based on clinical data sets that are biased in many ways. The main sources of bias are the evolving trends of treating HIV patients, the sparse, uneven therapy representation, the different treatment backgrounds of the clinical samples and the differing abundances of the various therapy-experience levels. In this thesis we focus on the problem of devising bias-aware statistical learning methods for HIV therapy screening -- predicting the effectiveness of HIV combination therapies. For this purpose we develop five novel approaches that when predicting outcomes of HIV therapies address the aforementioned biases in the clinical data sets. Three of the approaches aim for good prediction performance for every drug combination independent of its abundance in the HIV clinical data set. To achieve this, they balance the sparse and uneven therapy representation by using different routes of sharing common knowledge among related therapies. The remaining two approaches additionally account for the bias originating from the differing treatment histories of the samples making up the HIV clinical data sets. For this purpose, both methods predict the response of an HIV combination therapy by taking not only the most recent (target) therapy but also available information from preceding therapies into account. 
In this way they provide good predictions for advanced patients in mid to late stages of HIV treatment, and for rare drug combinations. All our methods use the time-oriented evaluation scenario, where models are trained on data from the less recent past while their performance is evaluated on data from the more recent past. This is the approach we adopt to account for the evolving treatment trends in the HIV clinical practice and thus offer a realistic model assessment.
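The time-oriented evaluation scenario named in the abstract above (train on older data, evaluate on more recent data) can be illustrated with a minimal sketch. This is not the thesis's code: the record layout, the `start` field, and the cutoff date are hypothetical and chosen only for illustration.

```python
from datetime import date


def time_oriented_split(samples, cutoff):
    """Split samples into (train, test) by therapy start date.

    Samples that started strictly before `cutoff` go to training;
    samples on or after `cutoff` go to testing.
    """
    train = [s for s in samples if s["start"] < cutoff]
    test = [s for s in samples if s["start"] >= cutoff]
    return train, test


# Hypothetical toy records; field names and dates are illustrative only.
samples = [
    {"id": 1, "start": date(2003, 5, 1)},
    {"id": 2, "start": date(2006, 2, 14)},
    {"id": 3, "start": date(2008, 9, 30)},
    {"id": 4, "start": date(2009, 1, 12)},
]

train, test = time_oriented_split(samples, cutoff=date(2008, 1, 1))
print([s["id"] for s in train], [s["id"] for s in test])  # [1, 2] [3, 4]
```

Unlike a random split, this keeps every test sample later in time than every training sample, which is what lets the evaluation reflect shifting treatment trends.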

Universaar

MPG.PuRe

Computer Vision Problems in 3D Plant Phenotyping

Author: Aleksandra Bogojeska (590009)
Kire Trivodaliev (590008)
Ljupco Kocarev (577406)
Publication venue: Scholarship@Western
Publication date: 24/08/2017

In recent years, there has been significant progress in Computer Vision based plant phenotyping (quantitative analysis of biological properties of plants) technologies. Traditional methods of plant phenotyping are destructive, manual and error prone. Due to non-invasiveness and non-contact properties as well as increased accuracy, imaging techniques are becoming state-of-the-art in plant phenotyping. Among several parameters of plant phenotyping, growth analysis is very important for biological inference. Automating the growth analysis can result in accelerating the throughput in crop production. This thesis contributes to the automation of plant growth analysis. First, we present a novel system for automated and non-invasive/non-contact plant growth measurement. We exploit the recent advancements of sophisticated robotic technologies and near infrared laser scanners to build a 3D imaging system and use state-of-the-art Computer Vision algorithms to fully automate growth measurement. We have set up a gantry robot system having 7 degrees of freedom hanging from the roof of a growth chamber. The payload is a range scanner, which can measure dense depth maps (raw 3D coordinate points in mm) on the surface of an object (the plant). The scanner can be moved around the plant to scan from different viewpoints by programming the robot with a specific trajectory. The sequence of overlapping images can be aligned to obtain a full 3D structure of the plant in raw point cloud format, which can be triangulated to obtain a smooth surface (triangular mesh), enclosing the original plant. We show the capability of the system to capture the well known diurnal pattern of plant growth computed from the surface area and volume of the plant meshes for a number of plant species. Second, we propose a technique to detect branch junctions in plant point cloud data. 
We demonstrate that, using these junctions as feature points, the correspondence estimation can be formulated as a subgraph matching problem, and better matching results than the state of the art can be achieved. This idea also removes the requirement of a priori knowledge about rotational angles between adjacent scanning viewpoints imposed by the original registration algorithm for complex plant data. Previously, this angle information had to be approximately known. Third, we present an algorithm to classify partially occluded leaves by their contours. In general, partial contour matching is an NP-hard problem. We propose a suboptimal matching solution and show that our method outperforms the state of the art on 3 public leaf datasets. We anticipate using this algorithm to track growing segmented leaves in our plant range data, even when a leaf becomes partially occluded by other plant matter over time. Finally, we perform some experiments to demonstrate the capability and limitations of the system and highlight the future research directions for Computer Vision based plant phenotyping.

Scholarship@Western

FigShare

Improving Low-Resource Question Answering using Active Learning in Multiple Stages

Author: Bartezzaghi Andrea
Bogojeska Jasmina
Malossi A. Cristiano I.
Schmidt Maximilian
Vu Thang
Publication date: 27/11/2022

Neural approaches have become very popular in the domain of Question Answering; however, they require a large amount of annotated data. Furthermore, they often yield very good performance but only in the domain they were trained on. In this work we propose a novel approach that combines data augmentation via question-answer generation with Active Learning to improve performance in low-resource settings, where the target domains are diverse in terms of difficulty and similarity to the source domain. We also investigate Active Learning for question answering in different stages, overall reducing the annotation effort of humans. For this purpose, we consider target domains in realistic settings, with an extremely low amount of annotated samples but with many unlabeled documents, which we assume can be obtained with little effort. Additionally, we assume that a sufficient amount of labeled data from the source domain is available. We perform extensive experiments to find the best setup for incorporating domain experts. Our findings show that our novel approach, where humans are incorporated as early as possible in the process, boosts performance in the low-resource, domain-specific setting, allowing for low-labeling-effort question answering systems in new, specialized domains. They further demonstrate how human annotation affects the performance of QA depending on the stage at which it is performed.
Comment: 16 pages, 8 figures

arXiv.org e-Print Archive

Combining Kernel and Model Based Learning for HIV Therapy Selection

Author: Parbhoo Sonali
Bogojeska Jasmina
Zazzi Maurizio
Roth Volker
Doshi-Velez Finale
Publication venue: 'Dinamia''cet-IUL'
Publication date: 01/09/1975

We present a mixture-of-experts approach for HIV therapy selection. The heterogeneity in patient data makes it difficult for one particular model to succeed at providing suitable therapy predictions for all patients. An appropriate means for addressing this heterogeneity is through combining kernel and model-based techniques. These methods capture different kinds of information: kernel-based methods are able to identify clusters of similar patients, and work well when modelling the viral response for these groups. In contrast, model-based methods capture the sequential process of decision making, and are able to find simpler, yet accurate patterns in response for patients outside these groups. We take advantage of this information by proposing a mixture-of-experts model that automatically selects between the methods in order to assign the most appropriate therapy choice to an individual. Overall, we verify that therapy combinations proposed using this approach significantly outperform previous methods.

edoc

Activos Digitales IAPH

Reinforced active learning for low-resource, domain-specific, multi-label text classification

Author: Bogojeska Jasmina
Kuhn Jonas
Mirylenka Katsiaryna
Wertz Lukas
Publication venue: Association for Computational Linguistics (ACL)
Publication date: 01/07/2023

Text classification datasets from specialised or technical domains are in high demand, especially in industrial applications. However, due to the high cost of annotation such datasets are usually expensive to create. While Active Learning (AL) can reduce the labeling cost, required AL strategies are often only tested on general knowledge domains and tend to use information sources that are not consistent across tasks. We propose Reinforced Active Learning (RAL) to train a Reinforcement Learning policy that utilizes many different aspects of the data and the task in order to select the most informative unlabeled subset dynamically over the course of the AL procedure. We demonstrate the superior performance of the proposed RAL framework compared to strong AL baselines across four intricate multi-class, multi-label text classification datasets taken from specialised domains. In addition, we experiment with a unique data augmentation approach to further reduce the number of samples RAL needs to annotate.

ZHAW digitalcollection

Evaluating pre-trained Sentence-BERT with class embeddings in active learning for multi-label text classification

Author: Bogojeska Jasmina
Kuhn Jonas
Mirylenka Katsiaryna
Wertz Lukas
Publication venue: Association for Computational Linguistics
Publication date: 01/11/2022

The Transformer Language Model is a powerful tool that has been shown to excel at various NLP tasks and has become the de-facto standard solution thanks to its versatility. In this study, we employ pre-trained document embeddings in an Active Learning task to group samples with the same labels in the embedding space on a legal document corpus. We find that the calculated class embeddings are not close to the respective samples and consequently do not partition the embedding space in a meaningful way. In addition, we explore using the class embeddings as an Active Learning strategy, with dramatically reduced results compared to all baselines.
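The abstract above does not say how its class embeddings are calculated. One common construction, assumed here purely for illustration, is to average the document embeddings that carry a given label and compare new documents to those averages by cosine similarity. The vectors and the label names (`contract`, `tax`) below are hypothetical toy data, not taken from the paper.

```python
import math


def mean_vector(vectors):
    """Component-wise mean of equally sized vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def cosine(u, v):
    """Cosine similarity; returns 0.0 if either vector has zero norm."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def class_embeddings(embeddings, labels):
    """Assumed construction: one mean vector per label (multi-label aware)."""
    by_class = {}
    for vec, labs in zip(embeddings, labels):
        for lab in labs:
            by_class.setdefault(lab, []).append(vec)
    return {lab: mean_vector(vecs) for lab, vecs in by_class.items()}


def nearest_class(vec, centroids):
    """Label whose class embedding is most similar to `vec`."""
    return max(centroids, key=lambda lab: cosine(vec, centroids[lab]))


# Hypothetical 2-d document embeddings with label sets.
embeddings = [[1.0, 0.0], [0.8, 0.2], [0.0, 1.0]]
labels = [{"contract"}, {"contract"}, {"tax"}]

centroids = class_embeddings(embeddings, labels)
print(nearest_class([0.9, 0.3], centroids))
```

On toy data like this the averages separate the labels cleanly; the abstract reports that on its legal corpus the class embeddings did not sit close to their samples, so a sketch of this kind should not be read as evidence that the strategy works in practice.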

ZHAW digitalcollection

Stability analysis of mixtures of mutagenetic trees

Author: A Cayley
AP Dempster
B Efron
BA Larder
CA Boucher
H Prüfer
HW Kuhn
J Edmonds
J Rahnenführer
J Yin
Jasmina Bogojeska
Jörg Rahnenführer
N Beerenwinkel
R Desper
S Rhee
T Hastie
Thomas Lengauer
TV Allen
Publication venue: BioMed Central
Publication date: 01/01/2008

Background: Mixture models of mutagenetic trees are evolutionary models that capture several pathways of ordered accumulation of genetic events observed in different subsets of patients. They were used to model HIV progression by accumulation of resistance mutations in the viral genome under drug pressure and cancer progression by accumulation of chromosomal aberrations in tumor cells. From the mixture models a genetic progression score (GPS) can be derived that estimates the genetic status of single patients according to the corresponding progression along the tree models. GPS values were shown to have predictive power for estimating drug resistance in HIV or the survival time in cancer. Still, the reliability of the exact values of such complex markers derived from graphical models can be questioned. Results: In a simulation study, we analyzed various aspects of the stability of estimated mutagenetic trees mixture models. It turned out that the induced probability distributions and the tree topologies are recovered with high precision by an EM-like learning algorithm. However, GPS values of single patients can be reliably estimated only for models with just one major model component. Conclusion: It is encouraging that the estimation process of mutagenetic trees mixture models can be performed with high confidence regarding induced probability distributions and the general shape of the tree topologies. For a model with only one major disease progression process, even genetic progression scores for single patients can be reliably estimated. However, for models with more than one relevant component, alternative measures should be introduced for estimating the stage of disease progression.

Crossref

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

MPG.PuRe

Laurent inversion

There are well-understood methods, going back to Givental and Hori--Vafa, that to a Fano toric complete intersection X associate a Laurent polynomial f that corresponds to X under mirror symmetry. We describe a technique for inverting this process, constructing the toric complete intersection X directly from its Laurent polynomial mirror f. We use this technique to construct a new four-dimensional Fano manifold.

Infoscience - École polytechnique fédérale de Lausanne

Serveur académique lausannois

edoc

Spiral - Imperial College Digital Repository

Bern Open Repository and Information System (BORIS)

Archive ouverte UNIGE

Suggests Rgraphviz

Author: Jasmina Bogojeska
Maintainer: Jasmina Bogojeska

Description: Rtreemix is a package that offers an environment for estimating mutagenetic trees mixture models from cross-sectional data and using them for various predictions. It includes functions for fitting the trees mixture models, likelihood computations, model comparisons, waiting time estimations, stability analysis, etc.

CiteSeerX

Statistische Lernverfahren für das Bias-bewusste HIV-Therapie Screening

Author: Bogojeska Jasmina
Publication venue: 'Walter de Gruyter GmbH'
Publication date: 12/01/2012
