86 research outputs found

    Machine learning and applications in microbiology

    Understanding microorganisms at the molecular level requires making sense of such large volumes of data that it may no longer be humanly possible to detect insightful patterns without machine learning, a branch of artificial intelligence. The application of machine learning to biological problems is expected to grow at an unprecedented rate, yet the uninitiated often perceive it as a mysterious and daunting field reserved for mathematicians and computer scientists. This review aims to identify the key points needed to begin the journey toward becoming an effective machine learning practitioner. These key points are reinforced by an evaluation of how machine learning has been applied so far across a broad range of real-life microbiology examples. These include predicting drug targets or vaccine candidates, diagnosing the microorganisms that cause infectious diseases, classifying drug resistance against antimicrobial medicines, predicting disease outbreaks, and exploring microbial interactions. Our hope is to inspire microbiologists and other related researchers to join the emerging machine learning revolution.

    Predicting and analyzing HIV-1 adaptation to broadly neutralizing antibodies and the host immune system using machine learning

    Thanks to its extraordinarily high mutation and replication rates, human immunodeficiency virus type 1 (HIV-1) can rapidly adapt to the selection pressure imposed by the host immune system or by antiretroviral drug exposure. With neither a cure nor a vaccine at hand, viral control is a major pillar in combating the HIV-1 pandemic. In the absence of drug exposure, interindividual differences in viral control are partly influenced by host genetic factors, such as the human leukocyte antigen (HLA) system, and by viral genetic factors, such as the predominant coreceptor usage of the virus. Close monitoring of the viral population within each patient, adjustment of treatment regimens where necessary, and continuous development of new drug components are therefore indispensable measures to counteract the emergence of viral escape variants. To this end, fast and accurate determination of viral adaptation is essential for successful treatment. This thesis is based on four studies that aim to develop and apply statistical learning methods to (i) predict adaptation of the virus to broadly neutralizing antibodies (bNAbs), a promising new treatment option, (ii) advance antibody-mediated immunotherapy for clinical use, and (iii) predict viral adaptation to the HLA system in order to better understand the switch in HIV-1 coreceptor usage. In total, this thesis comprises several statistical learning approaches to predict HIV-1 adaptation, thereby enabling better control of HIV-1 infections.

    Validation of resistome signatures through the application of a machine learning prediction algorithm on metagenomic data

    Integrated Master's dissertation in Veterinary Medicine, scientific area of Animal Health.
    Abstract: Metagenomic data have been increasingly used in antimicrobial resistance (AMR) studies, but accurate and reliable methods for predicting the relative attribution of AMR determinants to different animal reservoirs are still needed. AMR data availability has increased exponentially over the past few years, as has global awareness of the threat that AMR poses to public health, often referred to as the silent pandemic. This has led to an upsurge in interest in applying machine learning to AMR data. In this study, shotgun sequences from fecal samples of pigs, broilers, turkeys, and veal calves, previously collected during national cross-sectional studies across Europe, were used. The data corresponded to these samples and their associated relative abundances of AMR determinants (FPKM values). A random forest (RF) model was developed to investigate the relative attribution of AMR determinants to these reservoirs. Additionally, a descriptive analysis was performed to further investigate the 15 most important variables in the RF model. Principal component analysis (PCA) and all-subsets regression were performed to identify reservoir-specific AMR determinants. Finally, the reservoir-specific AMR determinants identified here were compared with the resistome signatures identified in a previous study. The results demonstrated that the RF model successfully classified resistomes into the corresponding reservoir classes with high accuracy and reliability. The RF model had more difficulty differentiating pig from veal and broiler from turkey, indicating similar resistome composition within each of these two pairs of species. The analyses validated several AMR determinants as resistome signatures of specific animal reservoirs, such as tet(40) and sul2 for veal; tet(Q), mef(A), and cfxA2 for veal and pig; blaTEM-126 for broiler; and tet(A) for broiler and turkey. This study describes a reliable and accurate method for the relative attribution of AMR determinants to different animal reservoirs using metagenomic data. Such results are essential for effective surveillance and control of AMR in animal and human populations.
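    In metagenomic resistome studies, the relative abundance of AMR determinants is commonly expressed as FPKM (fragments per kilobase of reference gene per million mapped fragments). This normalization corrects raw read counts for both gene length and sequencing depth. The following Python sketch illustrates only that normalization step. It is not the dissertation's pipeline, and the function name `fpkm`, the gene labels, and the counts are hypothetical.

```python
def fpkm(read_counts, gene_lengths_bp, total_mapped_reads):
    """Return FPKM per gene (hypothetical helper for illustration).

    FPKM = count * 10^9 / (gene length in bp * total mapped reads),
    i.e. reads per kilobase of gene per million mapped reads.
    """
    if total_mapped_reads <= 0:
        raise ValueError("total_mapped_reads must be positive")
    values = {}
    for gene, count in read_counts.items():
        length_bp = gene_lengths_bp[gene]
        if length_bp <= 0:
            raise ValueError(f"gene length for {gene!r} must be positive")
        # Divide by length (per kb) and by depth (per million reads) in one step.
        values[gene] = count * 1e9 / (length_bp * total_mapped_reads)
    return values


if __name__ == "__main__":
    # Hypothetical counts and lengths, not data from the study.
    counts = {"gene_A": 100, "gene_B": 50}
    lengths = {"gene_A": 1000, "gene_B": 2000}
    print(fpkm(counts, lengths, total_mapped_reads=1_000_000))
    # prints {'gene_A': 100.0, 'gene_B': 25.0}
```

    A classifier such as a random forest could be trained on values normalized this way, arranged with one row per sample and one column per AMR determinant. The abstract does not describe how the study assembled its own feature matrix.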

    Eighth Biennial Report: April 2005 – March 2007


    Aggregation of biological knowledge for immunological and virological applications

    Ph.D. (Doctor of Philosophy)

    Pacific Symposium on Biocomputing 2023

    The Pacific Symposium on Biocomputing (PSB) 2023 is an international, multidisciplinary conference for the presentation and discussion of current research in the theory and application of computational methods in problems of biological significance. Presentations are rigorously peer reviewed and are published in an archival proceedings volume. PSB 2023 will be held on January 3-7, 2023 in Kohala Coast, Hawaii. Tutorials and workshops will be offered prior to the start of the conference. PSB 2023 will bring together top researchers from the US, the Asian Pacific nations, and around the world to exchange research results and address open issues in all aspects of computational biology. It is a forum for the presentation of work in databases, algorithms, interfaces, visualization, modeling, and other computational methods, as applied to biological problems, with emphasis on applications in data-rich areas of molecular biology. The PSB has been designed to be responsive to the need for critical mass in sub-disciplines within biocomputing. For that reason, it is the only meeting whose sessions are defined dynamically each year in response to specific proposals. PSB sessions are organized by leaders of research in biocomputing's "hot topics." In this way, the meeting provides an early forum for serious examination of emerging methods and approaches in this rapidly changing field.

    Enabling cardiovascular multimodal, high dimensional, integrative analytics

    Traditionally, the understanding of cardiovascular morbidity relied on the acquisition and interpretation of health data. Advances in health technologies now enable the collection of far larger amounts of such data. This thesis explores advanced analytics that integrate health data across different modalities and dimensions into a single, holistic environment to better understand different diseases, with a focus on cardiovascular conditions. Different statistical methodologies are applied across a number of case studies, supported by a novel methodology to integrate and simplify data collection. The work culminates in showing that different dataset modalities explain different effects on morbidity: blood biomarkers, electrocardiogram recordings, RNA-Seq measurements, and population-level effects together build an understanding of a person's morbidity. More specifically, explainable artificial intelligence methods were applied to structured datasets from patients with atrial fibrillation to improve screening for the disease. Omics datasets, including RNA-sequencing and genotype datasets, were examined, and new biomarkers were discovered that allow a better understanding of atrial fibrillation. Electrocardiogram signal data were used for early risk prediction of heart failure, enabling clinicians to use this novel approach to estimate future incidence. Population-level data were used to identify associations and temporal trajectories of diseases in order to better understand disease dependencies in different clinical cohorts.

    Prediction and analysis of the structure and dynamics of biological networks (Previsão e análise da estrutura e dinâmica de redes biológicas)

    Increasing knowledge about the biological processes that govern the dynamics of living organisms has fostered a better understanding of the origin of many diseases, as well as the identification of potential therapeutic targets. Biological systems can be modeled as biological networks, which allows methods from graph theory to be applied in their investigation and characterization. The main motivation of this work was the inference of patterns and rules that underlie the organization of biological networks. Computational methods that can be used to predict and study diseases were developed by integrating different types of data, such as gene expression, protein-protein interactions, and other biomedical concepts. The first contribution was the characterization of a subsystem of the human protein interactome through the topological properties of the networks that model it. As a second contribution, an unsupervised method based on biological criteria and network topology was applied to co-expression networks to improve the understanding of the genetic mechanisms and risk factors of a disease. As a third contribution, a methodology was developed that uses network topology to remove noise from (denoise) protein networks and thereby obtain more accurate models. As a fourth contribution, a supervised methodology was proposed to model protein interactome dynamics using exclusively the topology of the protein interaction networks that form part of the system's dynamic model. The proposed methodologies contribute to more precise static and dynamic biological models through the identification and use of topological patterns in protein interaction networks, which can be used to predict and study diseases.
    Doctoral Programme in Informatics Engineering
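    Node degree and the local clustering coefficient are standard starting points for characterizing the topology of a protein interaction network. The Python sketch below computes both from an undirected edge list using only the standard library. It illustrates the general idea and is not the thesis's method. The function names and protein identifiers are hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def build_adjacency(edges):
    """Build an undirected adjacency map from (protein, protein) pairs."""
    adj = defaultdict(set)
    for u, v in edges:
        if u == v:
            continue  # ignore self-interactions
        adj[u].add(v)
        adj[v].add(u)
    return adj


def degree(adj):
    """Number of interaction partners per protein."""
    return {node: len(nbrs) for node, nbrs in adj.items()}


def local_clustering(adj):
    """Fraction of a node's neighbour pairs that are themselves connected."""
    coeffs = {}
    for node, nbrs in adj.items():
        k = len(nbrs)
        if k < 2:
            coeffs[node] = 0.0  # undefined for k < 2; reported as 0 by convention
            continue
        links = sum(1 for a, b in combinations(nbrs, 2) if b in adj[a])
        coeffs[node] = 2.0 * links / (k * (k - 1))
    return coeffs


if __name__ == "__main__":
    # Hypothetical toy interactome: a triangle P1-P2-P3 plus a pendant P4.
    adj = build_adjacency([("P1", "P2"), ("P1", "P3"), ("P2", "P3"), ("P3", "P4")])
    print(degree(adj))
    print(local_clustering(adj))
```

    Degree highlights hub proteins, while the clustering coefficient measures how densely a protein's interaction partners are connected to one another. For large networks, a dedicated library such as NetworkX provides equivalent functions.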

    2013 Annual Research Symposium Abstract Book

    2013 annual volume of abstracts for science research projects conducted by students at Trinity College