Protein Networks as Logic Functions in Development and Cancer
Many biological and clinical outcomes are based not on single proteins, but on modules of proteins embedded in protein networks. A fundamental question is how the proteins within each module contribute to the overall module activity. Here, we study the modules underlying three representative biological programs related to tissue development, breast cancer metastasis, and progression of brain cancer, respectively. For each case we apply a new method, called Network-Guided Forests, to identify predictive modules together with logic functions which tie the activity of each module to the activity of its component genes. The resulting modules implement a diverse repertoire of decision logic which cannot be captured using the simple approximations suggested in previous work, such as gene summation or subtraction. We show that in cancer, certain combinations of oncogenes and tumor suppressors exert competing forces on the system, suggesting that medical genetics should move beyond cataloguing individual cancer genes to cataloguing their combinatorial logic.
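As a rough illustration of why summation cannot stand in for every logic function, the sketch below compares two hypothetical two-gene modules (one following AND logic, one following XOR logic) against the best possible thresholded sum of the two gene activities. The binary activities, the module definitions, and the threshold rule are assumptions made for illustration; this is not the Network-Guided Forests method itself.

```python
from itertools import product


def and_module(a: int, b: int) -> int:
    """Hypothetical module: active only when both genes are active."""
    return a & b


def xor_module(a: int, b: int) -> int:
    """Hypothetical module: active when exactly one gene is active."""
    return a ^ b


def best_sum_threshold_accuracy(logic) -> float:
    """Best accuracy reachable by any rule of the form (a + b >= t) or its negation."""
    states = list(product([0, 1], repeat=2))
    best = 0.0
    for t in range(4):  # thresholds 0..3 cover every possible split of the sum
        for flip in (False, True):
            correct = 0
            for a, b in states:
                pred = int(a + b >= t)
                if flip:
                    pred = 1 - pred
                correct += int(pred == logic(a, b))
            best = max(best, correct / len(states))
    return best


AND_ACC = best_sum_threshold_accuracy(and_module)
XOR_ACC = best_sum_threshold_accuracy(xor_module)
print("AND logic, best summation accuracy:", AND_ACC)
print("XOR logic, best summation accuracy:", XOR_ACC)
```

In this toy setting a summation rule reproduces AND logic exactly but misclassifies at least one of the four input states under XOR logic, which is the kind of gap the abstract attributes to summation-style approximations.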
Phenotypic landscape inference reveals multiple evolutionary paths to C4 photosynthesis
C4 photosynthesis has independently evolved from the ancestral C3 pathway in at least 60 plant lineages, but, as with other complex traits, how it evolved is unclear. Here we show that the polyphyletic appearance of C4 photosynthesis is associated with diverse and flexible evolutionary paths that group into four major trajectories. We conducted a meta-analysis of 18 lineages containing species that use C3, C4, or intermediate C3-C4 forms of photosynthesis to parameterise a 16-dimensional phenotypic landscape. We then developed and experimentally verified a novel Bayesian approach based on a hidden Markov model that predicts how the C4 phenotype evolved. The alternative evolutionary histories underlying the appearance of C4 photosynthesis were determined by ancestral lineage and initial phenotypic alterations unrelated to photosynthesis. We conclude that the order of C4 trait acquisition is flexible and driven by non-photosynthetic drivers. This flexibility will have facilitated the convergent evolution of this complex trait.
Previsão e análise da estrutura e dinâmica de redes biológicas (Prediction and analysis of the structure and dynamics of biological networks)
Increasing knowledge about the biological processes that govern the
dynamics of living organisms has fostered a better understanding of the
origin of many diseases as well as the identification of potential therapeutic
targets. Biological systems can be modeled as biological networks,
which makes it possible to apply and explore graph-theoretic methods in their
investigation and characterization. The main motivation of this work was the
inference of patterns and rules that underlie the organization of biological
networks. By integrating different types of data, such as gene expression,
protein interactions, and other biomedical concepts, computational
methods were developed that can be used to predict and study
diseases.
The first contribution was the characterization of a subsystem of the human
protein interactome through the topological properties of the networks that
model it. As a second contribution, an unsupervised method using biological
criteria and network topology was used to improve the understanding of
the genetic mechanisms and risk factors of a disease through co-expression
networks. As a third contribution, a methodology was developed to remove
noise (denoise) in protein networks, to obtain more accurate models, using
the network topology. As a fourth contribution, a supervised methodology
was proposed to model the protein interactome dynamics, using exclusively
the topology of the protein interaction networks that are part of the dynamic
model of the system.
The proposed methodologies contribute to the creation of more precise
static and dynamic biological models through the identification and use of
topological patterns of protein interaction networks, which can be used to
predict and study diseases.
Programa Doutoral em Engenharia Informática (Doctoral Programme in Informatics Engineering)
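The topological properties mentioned in this abstract are not specified, so the following is only a minimal sketch of two standard ones, node degree and the local clustering coefficient, computed on a hypothetical four-protein interaction network. The node names and edges are invented for illustration.

```python
from itertools import combinations


def build_adjacency(edges):
    """Build an undirected adjacency map from a list of (u, v) interactions."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def local_clustering(adj, node):
    """Fraction of a node's neighbour pairs that are themselves connected."""
    nbrs = adj[node]
    k = len(nbrs)
    if k < 2:
        return 0.0  # undefined for fewer than two neighbours; 0.0 by convention
    links = sum(1 for u, v in combinations(sorted(nbrs), 2) if v in adj[u])
    return 2 * links / (k * (k - 1))


# Hypothetical toy interactome: a triangle A-B-C with D attached to C.
edges = [("A", "B"), ("A", "C"), ("B", "C"), ("C", "D")]
adj = build_adjacency(edges)
degree = {node: len(nbrs) for node, nbrs in adj.items()}

for node in sorted(adj):
    print(node, degree[node], round(local_clustering(adj, node), 3))
```

Per-node features of this kind are one plausible input for the topology-based characterization, denoising, and supervised modelling steps the abstract lists, though the thesis may rely on different measures.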
Developing statistical and bioinformatic analysis of genomic data from tumours
Previous prognostic signatures for melanoma based on tumour transcriptomic data were developed predominantly on cohorts of AJCC (American Joint Committee on Cancer) stages III and IV melanoma. Since 92% of melanoma patients are diagnosed at AJCC stages I and II, there is an urgent need for better prognostic biomarkers to allow patient stratification for receiving early adjuvant therapies.
This study uses genome-wide tumour gene expression levels and clinico-histopathological characteristics of patients from the Leeds Melanoma Cohort (LMC). Several unsupervised and supervised classification approaches were applied to the transcriptomic data to identify biological classes of melanoma and to develop prognostic classification models, respectively.
Unsupervised clustering identified six biologically distinct primary melanoma classes (LMC classes). Unlike previous molecular classes of melanoma, the LMC classes were prognostic in both the whole LMC dataset and in stage I tumours. The prognostic value of the LMC classes was replicated in an independent dataset, but insufficient data were available to replicate in an AJCC stage I subset.
Supervised classification using the Random Forest (RF) approach performed better when adjustments were made to deal with class imbalance, whereas such adjustments did not improve the performance of the Support Vector Machine (SVM). Overall, however, RF and SVM gave similar results, with RF only marginally better. Combining clinical and transcriptomic information in the RF further improved the performance of the prediction model compared with using clinical information alone. Finally, the agnostically derived LMC classes and the supervised RF model showed convergence in their association with outcome in some groups of patients, but not in others.
In conclusion, this study reports six molecular classes of primary melanoma with prognostic value in stage I disease and overall, and a prognostic classification model that predicts outcome in primary melanoma.
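The abstract does not say which adjustment for class imbalance was applied. As one common possibility, the sketch below computes inverse-frequency class weights, so that errors on the rarer outcome count more during training. The label names and counts are hypothetical.

```python
from collections import Counter


def balanced_class_weights(labels):
    """Weight each class by n_samples / (n_classes * class_count)."""
    counts = Counter(labels)
    n_samples = len(labels)
    n_classes = len(counts)
    return {cls: n_samples / (n_classes * n) for cls, n in counts.items()}


# Hypothetical imbalanced outcome labels: 8 patients without an event, 2 with.
labels = ["no_event"] * 8 + ["event"] * 2
weights = balanced_class_weights(labels)
print(weights)
```

Weights of this form can be passed to a classifier that supports per-class weighting. Resampling the minority class is an alternative, and the study may have used either approach or something else.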
MapReduce based Classification for Microarray data using Parallel Genetic Algorithm
Microarray technology is a high-throughput method used to profile thousands of genes. Only a few of these thousands of gene expression values are useful for disease prediction and disease classification in a medical setting. Supervised attribute clustering is used to find initial groups (clusters) of co-expressed genes whose joint expression is strongly related to the class label. Mutual information, which uses the information of the sample categories, measures the similarity among attributes, and on this basis redundant and irrelevant attributes are removed. After the clusters are formed, a parallel genetic algorithm (PGA) is used to find the optimal features and is supplied as the mapper function so as to improve class separability. Because the work is done in parallel, diagnosis can be made easier and more effective. The predictive accuracy is estimated using three classifiers: K-nearest neighbours, naive Bayes, and support vector machine. The overall approach, which uses a reducer function, provides excellent predictive capability for accurate medical diagnosis.
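The mutual information measure mentioned above can be sketched as follows for a discretized gene attribute and a class label. The binary discretization, the toy expression vectors, and the variable names are assumptions for illustration and are not taken from the paper.

```python
import math
from collections import Counter


def mutual_information(xs, ys):
    """Mutual information (in bits) between two equal-length discrete sequences."""
    n = len(xs)
    px = Counter(xs)
    py = Counter(ys)
    pxy = Counter(zip(xs, ys))
    mi = 0.0
    for (x, y), count in pxy.items():
        p_joint = count / n
        p_indep = (px[x] / n) * (py[y] / n)
        mi += p_joint * math.log2(p_joint / p_indep)
    return mi


# Hypothetical class labels and two discretized gene attributes.
labels = [0, 0, 0, 0, 1, 1, 1, 1]
gene_a = [0, 0, 0, 0, 1, 1, 1, 1]  # tracks the class label exactly
gene_b = [0, 1, 0, 1, 0, 1, 0, 1]  # statistically independent of the label

MI_A = mutual_information(gene_a, labels)
MI_B = mutual_information(gene_b, labels)
print("gene_a vs label:", MI_A)
print("gene_b vs label:", MI_B)
```

An attribute with near-zero mutual information with the class label, like `gene_b` here, would be a candidate for removal as irrelevant, while two attributes with high mutual information with each other would be candidates for grouping into one cluster.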
MACHINE LEARNING APPROACHES FOR BIOMARKER IDENTIFICATION AND SUBGROUP DISCOVERY FOR POST-TRAUMATIC STRESS DISORDER
Post-traumatic stress disorder (PTSD) is a psychiatric disorder caused by environmental and genetic factors resulting from alterations in genetic variation, epigenetic changes and neuroimaging characteristics. There is a pressing need to identify reliable molecular and physiological biomarkers for accurate diagnosis, prognosis, and treatment, as well as to deepen the understanding of PTSD pathophysiology. Machine learning methods are widely used to infer patterns from biological data, identify biomarkers, and make predictions. The objective of this research is to apply machine learning methods for the accurate classification of human diseases from genome-scale datasets, focusing primarily on PTSD.
The DoD-funded Systems Biology of PTSD Consortium has recruited combat veterans with and without PTSD for measurement of molecular and physiological data from blood or urine samples, with the goal of identifying accurate and specific PTSD biomarkers. As a member of the Consortium with access to these PTSD multiple omics datasets, we first completed a project titled Clinical Subgroup-Specific PTSD Classification and Biomarker Discovery. We applied machine learning approaches to these data to build classification models consisting of molecular and clinical features to predict PTSD status. We also identified candidate biomarkers for diagnosis, which improves our understanding of PTSD pathogenesis. In a second project, entitled Multi-Omic PTSD Subgroup Identification and Clinical Characterization, we applied methods for integrating multiple omics datasets to investigate the complex, multivariate nature of the biological systems underlying PTSD. Using two different machine learning approaches on 82 PTSD-positive samples, we identified two as the optimal number of PTSD subgroups, and we found that the subgroups exhibited different remitting behavior, as inferred from subjects recalled at a later time point.
The results from our association, differential expression, and classification analyses demonstrated the distinct clinical and molecular features characterizing these subgroups.
Taken together, our work has advanced our understanding of PTSD biomarkers and subgroups through the use of machine learning approaches. Results from our work should strongly contribute to the precise diagnosis and eventual treatment of PTSD, as well as other diseases. Future work will involve continuing to leverage these results to enable precision medicine for PTSD.