9,060 research outputs found

    Protein Networks as Logic Functions in Development and Cancer

    Get PDF
    Many biological and clinical outcomes are based not on single proteins, but on modules of proteins embedded in protein networks. A fundamental question is how the proteins within each module contribute to the overall module activity. Here, we study the modules underlying three representative biological programs related to tissue development, breast cancer metastasis, or progression of brain cancer, respectively. For each case we apply a new method, called Network-Guided Forests, to identify predictive modules together with logic functions which tie the activity of each module to the activity of its component genes. The resulting modules implement a diverse repertoire of decision logic which cannot be captured using the simple approximations suggested in previous work such as gene summation or subtraction. We show that in cancer, certain combinations of oncogenes and tumor suppressors exert competing forces on the system, suggesting that medical genetics should move beyond cataloguing individual cancer genes to cataloguing their combinatorial logic

    Phenotypic landscape inference reveals multiple evolutionary paths to C4_4 photosynthesis

    Get PDF
    C4_4 photosynthesis has independently evolved from the ancestral C3_3 pathway in at least 60 plant lineages, but, as with other complex traits, how it evolved is unclear. Here we show that the polyphyletic appearance of C4_4 photosynthesis is associated with diverse and flexible evolutionary paths that group into four major trajectories. We conducted a meta-analysis of 18 lineages containing species that use C3_3, C4_4, or intermediate C3_3-C4_4 forms of photosynthesis to parameterise a 16-dimensional phenotypic landscape. We then developed and experimentally verified a novel Bayesian approach based on a hidden Markov model that predicts how the C4_4 phenotype evolved. The alternative evolutionary histories underlying the appearance of C4_4 photosynthesis were determined by ancestral lineage and initial phenotypic alterations unrelated to photosynthesis. We conclude that the order of C4_4 trait acquisition is flexible and driven by non-photosynthetic drivers. This flexibility will have facilitated the convergent evolution of this complex trait

    Previsão e análise da estrutura e dinâmica de redes biológicas

    Get PDF
    Increasing knowledge about the biological processes that govern the dynamics of living organisms has fostered a better understanding of the origin of many diseases as well as the identification of potential therapeutic targets. Biological systems can be modeled through biological networks, allowing to apply and explore methods of graph theory in their investigation and characterization. This work had as main motivation the inference of patterns and rules that underlie the organization of biological networks. Through the integration of different types of data, such as gene expression, interaction between proteins and other biomedical concepts, computational methods have been developed so that they can be used to predict and study diseases. The first contribution, was the characterization a subsystem of the human protein interactome through the topological properties of the networks that model it. As a second contribution, an unsupervised method using biological criteria and network topology was used to improve the understanding of the genetic mechanisms and risk factors of a disease through co-expression networks. As a third contribution, a methodology was developed to remove noise (denoise) in protein networks, to obtain more accurate models, using the network topology. As a fourth contribution, a supervised methodology was proposed to model the protein interactome dynamics, using exclusively the topology of protein interactions networks that are part of the dynamic model of the system. The proposed methodologies contribute to the creation of more precise, static and dynamic biological models through the identification and use of topological patterns of protein interaction networks, which can be used to predict and study diseases.O conhecimento crescente sobre os processos biológicos que regem a dinâmica dos organismos vivos tem potenciado uma melhor compreensão da origem de muitas doenças, assim como a identificação de potenciais alvos terapêuticos. Os sistemas biológicos podem ser modelados através de redes biológicas, permitindo aplicar e explorar métodos da teoria de grafos na sua investigação e caracterização. Este trabalho teve como principal motivação a inferência de padrões e de regras que estão subjacentes à organização de redes biológicas. Através da integração de diferentes tipos de dados, como a expressão de genes, interação entre proteínas e outros conceitos biomédicos, foram desenvolvidos métodos computacionais, para que possam ser usados na previsão e no estudo de doenças. Como primeira contribuição, foi proposto um método de caracterização de um subsistema do interactoma de proteínas humano através das propriedades topológicas das redes que o modelam. Como segunda contribuição, foi utilizado um método não supervisionado que utiliza critérios biológicos e topologia de redes para, através de redes de co-expressão, melhorar a compreensão dos mecanismos genéticos e dos fatores de risco de uma doença. Como terceira contribuição, foi desenvolvida uma metodologia para remover ruído (denoise) em redes de proteínas, para obter modelos mais precisos, utilizando a topologia das redes. Como quarta contribuição, propôs-se uma metodologia supervisionada para modelar a dinâmica do interactoma de proteínas, usando exclusivamente a topologia das redes de interação de proteínas que fazem parte do modelo dinâmico do sistema. As metodologias propostas contribuem para a criação de modelos biológicos, estáticos e dinâmicos, mais precisos, através da identificação e uso de padrões topológicos das redes de interação de proteínas, que podem ser usados na previsão e no estudo doenças.Programa Doutoral em Engenharia Informátic

    Developing statistical and bioinformatic analysis of genomic data from tumours

    Get PDF
    Previous prognostic signatures for melanoma based on tumour transcriptomic data were developed predominantly on cohorts of AJCC (American Joint Committee on Cancer) stages III and IV melanoma. Since 92% of melanoma patients are diagnosed at AJCC stages I and II, there is an urgent need for better prognostic biomarkers to allow patient stratification for receiving early adjuvant therapies. This study uses genome-wide tumour gene expression levels and clinico-histopathological characteristics of patients from the Leeds Melanoma Cohort (LMC). Several unsupervised and supervised classification approaches were applied to the transcriptomic data, to identify biological classes of melanoma, and to develop prognostic classification models respectively. Unsupervised clustering identified six biologically distinct primary melanoma classes (LMC classes). Unlike previous molecular classes of melanoma, the LMC classes were prognostic in both the whole LMC dataset and in stage I tumours. The prognostic value of the LMC classes was replicated in an independent dataset, but insufficient data were available to replicate in an AJCC stage I subset. Supervised classification using the Random Forest (RF) approach provided improved performances when adjustments were made to deal with class imbalance, while this did not improve performance of the Support Vector Machine (SVM). However, RF and SVM had similar results overall, with RF only marginally better. Combining clinical and transcriptomic information in the RF further improved the performance of the prediction model in comparison to using clinical information alone. Finally, the agnostically derived LMC classes and the supervised RF model showed convergence in their association with outcome in some groups of patients, but not in others. In conclusion, this study reports six molecular classes of primary melanoma with prognostic value in stage I disease and overall, and a prognostic classification model that predicts outcome in primary melanoma

    MapReduce based Classification for Microarray data using Parallel Genetic Algorithm

    Get PDF
    Inorder to uncover thousands of genes Microarray   produces high throughput is used. Only few gene expression data out of thousands of data is used for disease predication and also for disease classification in medical environment.  To find such initial coexpressed gene groups of clusters whose joint expression is strongly related with the class label A Supervised attribute clustering is used. By sharing the information between each attributes the Mutual Information uses the information of sample varieties to measure the similarity among the attributes. From this the redundant and irrelevant attributes are removed. After forming the clusters the PGA is used to find the optimal feature and is given as mapper function so as to improve the class separability. Using this method the diagnosis can be made easier and effective since its done parallelly. The predictive accuracy is estimated using all the three classifiers such as K-nearest neighbours including naive bayes and Support Vector machine. Thus the overall approach used reducer function which provides excellent predictive capability for accurate medical diagnosis

    MACHINE LEARNING APPROACHES FOR BIOMARKER IDENTIFICATION AND SUBGROUP DISCOVERY FOR POST-TRAUMATIC STRESS DISORDER

    Get PDF
    Post-traumatic stress disorder (PTSD) is a psychiatric disorder caused by environmental and genetic factors resulting from alterations in genetic variation, epigenetic changes and neuroimaging characteristics. There is a pressing need to identify reliable molecular and physiological biomarkers for accurate diagnosis, prognosis, and treatment, as well to deepen the understanding of PTSD pathophysiology. Machine learning methods are widely used to infer patterns from biological data, identify biomarkers, and make predictions. The objective of this research is to apply machine learning methods for the accurate classification of human diseases from genome-scale datasets, focusing primarily on PTSD.The DoD-funded Systems Biology of PTSD Consortium has recruited combat veterans with and without PTSD for measurement of molecular and physiological data from blood or urine samples with the goal of identifying accurate and specific PTSD biomarkers. As a member of the Consortium with access to these PTSD multiple omics datasets, we first completed a project titled Clinical Subgroup-Specific PTSD Classification and Biomarker Discovery. We applied machine learning approaches to these data to build classification models consisting of molecular and clinical features to predict PTSD status. We also identified candidate biomarkers for diagnosis, which improves our understanding of PTSD pathogenesis. In a second project, entitled Multi-Omic PTSD Subgroup Identification and Clinical Characterization, we applied methods for integrating multiple omics datasets to investigate the complex, multivariate nature of the biological systems underlying PTSD. We identified an optimal 2 PTSD subgroups using two different machine learning approaches from 82 PTSD positive samples, and we found that the subgroups exhibited different remitting behavior as inferred from subjects recalled at a later time point. The results from our association, differential expression, and classification analyses demonstrated the distinct clinical and molecular features characterizing these subgroups.Taken together, our work has advanced our understanding of PTSD biomarkers and subgroups through the use of machine learning approaches. Results from our work should strongly contribute to the precise diagnosis and eventual treatment of PTSD, as well as other diseases. Future work will involve continuing to leverage these results to enable precision medicine for PTSD
    • …
    corecore