38 research outputs found

    Characterization of Novel HIV Drug Resistance Mutations Using Clustering, Multidimensional Scaling and SVM-Based Feature Ranking

    We present a case study on the discovery of clinically relevant domain knowledge in the field of HIV drug resistance. Novel mutations in the HIV genome associated with treatment failure were identified by mining a relational clinical database. Hierarchical cluster analysis suggests that two of these mutations form a novel mutational complex, while all others are involved in known resistance-conferring evolutionary pathways. The clustering is shown to be highly stable in a bootstrap procedure. Multidimensional scaling in mutation space indicates that certain mutations can occur within multiple pathways. Feature ranking based on support vector machines and matched genotype-phenotype pairs comprehensively reproduces current domain knowledge. Moreover, it points to a prominent role for novel mutations in determining phenotypic resistance and in resensitization effects. These effects may be exploited deliberately to reopen lost treatment options. Together, these findings provide valuable insight into the interpretation of genotypic resistance tests.

    Genotypic analysis of HIV-1 coreceptor usage

    The acquired immunodeficiency syndrome (AIDS) is one of the biggest medical challenges in the world today. Its causative pathogen, the human immunodeficiency virus (HIV), is responsible for millions of deaths per year. Although about two dozen antiviral drugs are currently available, they can only delay the progression of the disease; they cannot cure patients. In recent years, the new class of coreceptor antagonists has been added to the arsenal of antiretroviral drugs. These drugs block viral cell entry by binding to one of the receptors the virus requires to infect a cell. However, some HIV variants can use a different coreceptor instead, so coreceptor usage has to be tested before the drug is administered. This thesis analyzes the use of statistical learning methods to infer HIV coreceptor usage from the viral genotype. Improvements over existing methods are achieved by using sequence information from previously unused genomic regions, by using next-generation sequencing technologies, and by combining two existing prediction systems. In addition, HIV coreceptor usage prediction is analyzed with respect to clinical outcome in patients treated with coreceptor antagonists. The results demonstrate that inferring HIV coreceptor usage from the viral genotype can be used reliably in daily clinical routine.

    Functional characterization of single amino acid variants

    Single amino acid variants (SAVs) are one of the main causes of Mendelian disorders and play an important role in the development of many complex diseases. At the same time, they are the most common kind of variation affecting coding DNA, and they generally have no damaging effect. With the advent of next-generation sequencing technologies, detecting these variants in patients and in the general population is easier than ever, but characterizing the functional effect of each variant remains an open challenge. Our objective in this work is to tackle this problem by developing machine-learning-based in silico SAV pathology predictors. Taking the classic PMut predictor as a starting point, we rethought the entire supervised learning pipeline, building new training sets, features and classifiers. PMut2017, the first result of these efforts, is a new general-purpose predictor based on SwissVar and trained on 12 different conservation scores. Its performance, evaluated both by cross-validation and by several blind tests, was in line with the best predictors published to date. Continuing our search for more accurate predictors, especially for cases where general predictors tend to fail, we developed PMut-S, a suite of 215 protein-specific predictors. Similar to PMut in nature, PMut-S introduced co-evolution conservation features and balanced training sets, and improved on the performance of PMut2017 by 0.1 in the Matthews correlation coefficient, especially for the proteins that PMut misclassified most often. Comparing PMut-S with other specific predictors, we showed that it is possible to train specific predictors with a single automated pipeline and match the results of most gene-specific predictors released to date. The machine learning pipeline of both PMut and PMut-S was released as an open-source Python module, PyMut, which bundles functions for feature computation and selection, classifier training and evaluation, and plot drawing, among others. The predictions are also available in a rich web portal, which includes a precomputed repository of analyses of more than 700 million variants in over 100,000 human proteins, together with relevant contextual information such as 3D visualizations of protein structures, links to databases, and functional annotations.
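    The PMut-S gain over PMut2017 is reported in units of the Matthews correlation coefficient (MCC), which ranges from -1 (total disagreement) through 0 (chance level) to +1 (perfect prediction). As a reminder of what that scale measures, here is a minimal Python sketch computing MCC from binary confusion-matrix counts. The function name and the example counts are illustrative assumptions; this is not code from PyMut.

```python
import math


def matthews_corrcoef(tp: int, tn: int, fp: int, fn: int) -> float:
    """Matthews correlation coefficient from binary confusion-matrix counts.

    tp/tn: correctly predicted pathological/neutral variants;
    fp/fn: neutral variants called pathological / pathological called neutral.
    Returns 0.0 when any marginal sum is zero (a common convention).
    """
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        return 0.0
    return (tp * tn - fp * fn) / denom


# Hypothetical counts for a predictor evaluated on 200 variants.
print(round(matthews_corrcoef(tp=90, tn=80, fp=20, fn=10), 4))  # 0.7035
```

    Unlike plain accuracy, MCC uses all four cells of the confusion matrix, so it stays informative when pathological and neutral variants are unbalanced in a test set.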

    The role of visual adaptation in cichlid fish speciation

    D. Shane Wright (1), Ole Seehausen (2), Ton G.G. Groothuis (1), Martine E. Maan (1)
    (1) University of Groningen; GELIFES; EGDB. (2) Department of Fish Ecology & Evolution, EAWAG Centre for Ecology, Evolution and Biogeochemistry, Kastanienbaum, and Institute of Ecology and Evolution, Aquatic Ecology, University of Bern.
    In less than 15,000 years, Lake Victoria cichlid fishes have radiated into as many as 500 different species. Ecological and sexual selection are thought to contribute to this ongoing speciation process, but genetic differentiation remains low. However, recent work on visual pigment genes (opsins) has revealed more diversity. Unlike neighbouring Lakes Malawi and Tanganyika, Lake Victoria is highly turbid, so the light spectrum shifts towards longer wavelengths with increasing depth; this environmental gradient makes it possible to explore divergent coevolution of sensory systems and colour signals via sensory drive. Pundamilia pundamilia and Pundamilia nyererei are two sympatric species found at rocky islands across the southern portion of Lake Victoria; they differ in male colouration and in the depths at which they live. Previous work has shown that the species differ in colour discrimination, corresponding to divergent female preferences for conspecific male colouration. A mechanistic link between colour vision and preference would provide a rapid route to reproductive isolation between divergently adapting populations. We test this link by experimentally manipulating colour vision, raising both species and their hybrids under light conditions that mimic shallow and deep habitats. We quantify the expression of retinal opsins and test behaviours important for speciation: mate choice, habitat preference, and foraging performance.

    Seventh Biennial Report: June 2003 - March 2005


    Network biology methods for functional characterization and integrative prioritization of disease genes and proteins

    High-throughput techniques have produced large amounts of experimental data to provide more insight into complex phenotypes and cellular processes. The development of a variety of computational, and in particular network-based, approaches to analyze these data has already shed light on previously unknown mechanisms. However, we are still far from a comprehensive understanding of human diseases and their causes, as well as of appropriate preventive measures and successful therapies. This thesis describes the development of methods and user-friendly software tools for the integrative analysis and interactive visualization of biological networks, as well as their application to biomedical data for understanding diseases. We design an integrative, phenotype-specific framework for prioritizing candidate disease genes and functionally characterizing similar phenotypes. We apply it to identify several disease-relevant genes and processes in inflammatory bowel diseases and primary sclerosing cholangitis, as well as in Parkinson's disease. Since finding the causative disease genes is often not enough to understand diseases, we also concentrate on the molecular characterization of sequence mutations and their effects on protein structure and function. We develop a software suite that supports the interactive, multi-layered visual analysis of molecular interaction mechanisms such as protein binding, allostery and drug resistance. To capture the dynamic nature of proteins, we also devise an approach for visualizing and analyzing ensembles of protein structures, such as those generated by molecular dynamics simulations.

    On quantitative issues pertaining to the detection of epistatic genetic architectures

    Converging empirical evidence portrays epistasis (i.e., gene-gene interaction) as a ubiquitous property of genetic architectures and a protagonist in complex trait variability. Although researchers employ sophisticated technologies to detect epistasis, robust detections in human populations are strikingly scarce. To evaluate the empirical issues pertaining to epistasis detection, we analytically characterize the statistical detection problem and elucidate two candidate explanations. The first is that population-level manifestations of epistasis arising in nature are small, so that, at the sample sizes employed in research, detectors may deliver disadvantageously little power. The second is that gene-environment association biases estimates of genotypic values, diminishing the power of detection. Using a simulation study, we adjudicate the merits of both explanations and assess the power to detect epistasis under four digenic architectures. Consistent with both explanations, our findings implicate small epistatic effect sizes and gene-environment association as mechanisms that obscure the detection of epistasis.