Prediction and analysis of the structure and dynamics of biological networks (Previsão e análise da estrutura e dinâmica de redes biológicas)
Increasing knowledge about the biological processes that govern the
dynamics of living organisms has fostered a better understanding of the
origin of many diseases as well as the identification of potential therapeutic
targets. Biological systems can be modeled as biological networks,
which allows graph-theory methods to be applied and explored in their investigation
and characterization. The main motivation of this work was the inference of
patterns and rules that underlie the organization of biological networks.
Through the integration of different types of data, such as gene expression,
interaction between proteins and other biomedical concepts, computational
methods have been developed so that they can be used to predict and study
diseases.
The first contribution was the characterization of a subsystem of the human
protein interactome through the topological properties of the networks that
model it. As a second contribution, an unsupervised method using biological
criteria and network topology was used to improve the understanding of
the genetic mechanisms and risk factors of a disease through co-expression
networks. As a third contribution, a methodology was developed to remove
noise (denoise) in protein networks, to obtain more accurate models, using
the network topology. As a fourth contribution, a supervised methodology
was proposed to model the protein interactome dynamics, using exclusively
the topology of the protein interaction networks that are part of the dynamic
model of the system.
The proposed methodologies contribute to the creation of more precise
static and dynamic biological models through the identification and use of
topological patterns of protein interaction networks, which can be used to
predict and study diseases.
Doctoral Program in Informatics Engineering (Programa Doutoral em Engenharia Informática)
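The topological characterization mentioned in this abstract can be illustrated with two standard graph measures, node degree and the local clustering coefficient. The sketch below is a generic illustration and not the thesis's method; the toy network and the protein names `A` to `D` are hypothetical placeholders.

```python
from itertools import combinations


def degree(adj, node):
    """Number of interaction partners of a node."""
    return len(adj[node])


def clustering_coefficient(adj, node):
    """Fraction of pairs of neighbours that are themselves connected."""
    neighbors = adj[node]
    k = len(neighbors)
    if k < 2:
        return 0.0  # undefined for fewer than two neighbours; use 0 by convention
    links = sum(1 for u, v in combinations(neighbors, 2) if v in adj[u])
    return 2.0 * links / (k * (k - 1))


# Hypothetical toy interaction network (assumption, for illustration only).
edges = [("A", "B"), ("A", "C"), ("B", "C"), ("C", "D")]
adj = {}
for u, v in edges:
    adj.setdefault(u, set()).add(v)
    adj.setdefault(v, set()).add(u)

for node in sorted(adj):
    print(node, degree(adj, node), round(clustering_coefficient(adj, node), 3))
```

In this toy graph, `C` has the highest degree but a low clustering coefficient, which is the kind of contrast such measures are meant to expose.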
Genetic and environmental prediction of opioid cessation using machine learning, GWAS, and a mouse model
The United States is currently experiencing an epidemic of opioid use, use disorder, and overdose-related deaths. While studies have identified several loci that are associated with opioid use disorder (OUD) risk, the genetic basis for the ability to discontinue opioid use has not been investigated. Furthermore, very few studies have investigated the non-genetic factors that are predictive of opioid cessation or their predictive ability.
In this thesis, I studied a novel phenotype, opioid cessation, defined as the time since last use of illicit opioids (1 year ago as cease) among persons meeting lifetime DSM-5 criteria for opioid use disorder (OUD).
In chapter two, I identified novel genetic variants and biological pathways that potentially regulate opioid cessation success through a genome-wide study, as well as genetic overlap between opioid cessation and other substance cessation traits.
In chapter three, I identified multiple non-genetic risk factors, specific to each racial group, that are predictive of opioid cessation in the same individuals analyzed in chapter two, by applying several linear and non-linear machine learning techniques to a set of more than 3,000 variables assessed by a structured psychiatric interview. Factors identified by this atheoretical approach can be grouped into opioid use activities, other drug use, health conditions, and demographics, and predictive accuracy reached nearly 80%. The findings from this research generated further hypotheses for future studies.
In chapter four, I performed differential gene expression and network analysis on mice with different oxycodone (an opioid receptor agonist)-induced behaviors and compared the significantly associated genes and network modules with top-ranked genes identified in humans. The pathway cross-talks and gene homologs identified from both species illuminate the potential molecular mechanism of opioid behaviors.
In summary, this thesis utilized statistical genetics, machine learning, and a computational biology framework to address factors that are associated with opioid cessation in humans, and cross-referenced the genetic findings in a mouse model. These findings serve as references for future studies and provide a framework for personalizing the treatment of OUD.
Multi-omic biomarker discovery and network analyses to elucidate the molecular mechanisms of lung cancer premalignancy
Lung cancer (LC) is the leading cause of cancer death in the US, claiming over 160,000 lives annually. Although CT screening has been shown to be efficacious in reducing mortality, the limited access to screening programs among high-risk individuals and the high number of false positives contribute to low survival rates and increased healthcare costs. As a result, there is an urgent need for preventative therapeutics and novel interception biomarkers that would enhance current methods for detection of early-stage LC.
This thesis addresses this challenge by examining the hypothesis that transcriptomic changes preceding the onset of LC can be identified by studying bronchial premalignant lesions (PMLs) and the normal-appearing airway epithelial cells altered in their presence (i.e., the PML-associated airway field of injury). PMLs are the presumed precursors of lung squamous cell carcinoma (SCC) whose presence indicates an increased risk of developing SCC and other subtypes of LC. Here, I leverage high-throughput mRNA and miRNA sequencing data from bronchial brushings and lesion biopsies to develop biomarkers of PML presence and progression, and to understand regulatory mechanisms driving early carcinogenesis.
First, I utilized mRNA sequencing data from normal-appearing airway brushings to build a biomarker predictive of PML presence. After verifying the power of the 200-gene biomarker to detect the presence of PMLs, I evaluated its capacity to predict PML progression and detect the presence of LC (Aim 1). Next, I identified likely regulatory mechanisms associated with PML severity and progression by evaluating miRNA expression and gene coexpression modules containing their targets in bronchial lesion biopsies (Aim 2). Lastly, I investigated the preservation of the PML-associated miRNAs and gene modules in the airway field of injury, highlighting an emergent link between the airway field and the PMLs (Aim 3).
Overall, this thesis suggests a multi-faceted utility of PML-associated genomic signatures as markers for stratification of high-risk smokers in chemoprevention trials, markers for early detection of lung cancer, and novel chemopreventive targets, and yields valuable insights into early lung carcinogenesis by characterizing mRNA and miRNA expression alterations that contribute to premalignant disease progression towards LC.
2020-01-2
Transcriptomic Profiling in Mild Cognitive Impairment and Alzheimer's Disease Using Neuroimaging Endophenotypes
Indiana University-Purdue University Indianapolis (IUPUI)
Alzheimer’s disease (AD) is a devastating neurodegenerative disease currently affecting more than 6 million Americans and 50 million people worldwide. It is irreversible and causes a decline in memory, cognition, personality, and other functions that eventually leads to death due to complete brain failure.
Much recent research has focused on enabling early intervention and disease prevention in AD, which could have a significant impact on the disease and be crucial for life management, assessment of risk for future generations, and assistance in end-of-life preparation. For a late-life complex multifactorial disease such as AD, where both genetic and environmental factors are involved, integrating multiple layers of genetic, imaging, and other biomarker data is a critical step for therapeutic discovery and building predictive risk assessment tools.
The multifactorial nature of AD suggests that multiple therapeutic targets need to be identified and tested together. Hence, we need a systems-level approach to build biomarker profiles which can be used for drug discovery and screening/risk assessment. The research presented in this dissertation focuses on utilizing a systems-level approach to identify promising imaging genetics biomarkers that provide insight into dysregulated biological pathways in AD pathogenesis, and to identify critical mRNA measures that can be investigated further within the scope of novel therapeutics, as well as used as input variables in predictive models for AD risk, screening, and diagnosis. The overall research goal was the development of systems-level imaging genetics biomarker signatures to serve as tools for risk analysis and therapeutic discovery in AD. The specific outcomes of the analyses were the characterization of patterns in gene expression at the systems level using neuroimaging endophenotypes, and the identification of specific driver genes and genotypic variants, which can inform predictive modeling for diagnosis, risk, and pathogenic profiling in AD.
Identifying therapeutic targets in glioma using integrated network analysis
Gliomas are the most common brain tumours in the adult population, with rapid progression and poor prognosis. Survival among patients diagnosed with the most aggressive histopathological subtype of glioma, glioblastoma, is a mere 12.6 months given the current standard of care. While glioblastomas mostly occur in people over 60, the lower-grade gliomas affect individuals in their third and fourth decades of life. Collectively, gliomas are one of the major causes of cancer-related death in individuals under forty in the UK. Over the past twenty years, little has changed in the standard of glioma treatment and the disease has remained incurable. This study focuses on identifying potential therapeutic targets in gliomas using systems-level approaches and large-scale data integration.
I used publicly available transcriptomic data to identify gene co-expression networks associated with the progression of IDH1-mutant 1p/19q euploid astrocytomas from grade II to grade III and highlighted hub genes of these networks, which could be targeted to modulate their biological function. I also studied the changes in co-expression patterns between grade II and grade III gliomas and identified a cluster of genes with differential co-expression in different disease states (module M2). By data integration and adaptation of reverse-engineering methods, I elucidated master regulators of the module M2. I then sought to counteract the regulatory activity by using a drug-induced gene expression dataset to find compounds inducing gene expression in the opposite direction of the disease signature. I proposed resveratrol as a potentially disease-modifying compound, which, when administered to patients with low-grade disease, could potentially delay glioma progression.
Finally, I applied an ensemble-learning algorithm to a large-scale loss-of-function viability screen in cancer cell lines with different genetic backgrounds to identify gene dependencies associated with chromosomal copy-number losses common in glioblastomas. I propose five novel target predictions to be validated in future experiments.
Open access
Functional Analysis of Human Long Non-coding RNAs and Their Associations with Diseases
Within this study, we sought to leverage knowledge from well-characterized protein-coding genes to characterize the lesser-known long non-coding RNA (lncRNA) genes, using computational methods to find functional annotations and disease associations. Functional genome annotation is an essential step to a systems-level view of the human genome. With this knowledge, we can gain a deeper understanding of how humans develop and function, and a better understanding of human disease. LncRNAs are transcripts greater than 200 nucleotides that do not code for proteins. LncRNAs have been found to regulate development, tissue and cell differentiation, and organ formation. Their dysregulation has been linked to several diseases, including autism spectrum disorder (ASD) and cancer. While a great deal of research has been dedicated to protein-coding genes, the relatively recently discovered lncRNA genes have yet to be characterized. LncRNA function is tied closely to when and where they are expressed. Co-expression network analysis offers a means of functional annotation of uncharacterized genes through a guilt-by-association approach. We have constructed two co-expression networks using known disease-associated protein-coding genes and lncRNA genes. Through clustering of the networks, gene set enrichment analysis, and centrality measures, we found enrichment for disease association and functions and identified high-confidence lncRNA disease gene targets. We present a novel approach to the identification of disease state associations by demonstrating that genes associated with the same disease states share patterns that can be discerned from transcriptomes of healthy tissues. Using a machine learning algorithm, we built a model to classify ASD versus non-ASD genes using their expression profiles from healthy developing human brain tissues.
Feature selection during the model-building process also identified critical temporospatial points for the determination of ASD genes. We constructed a webserver tool for the prioritization of genes for ASD association. The webserver tool has a database containing prioritization and co-expression information for nearly every gene in the human genome.
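The guilt-by-association idea described in this abstract can be sketched as follows: connect genes whose expression profiles are strongly correlated, then transfer annotations from characterized neighbours to an uncharacterized gene. This is a minimal illustration, not the study's pipeline; the gene names, expression values, annotation labels, and the 0.9 correlation threshold are all hypothetical assumptions.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length expression profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        return 0.0  # a constant profile carries no co-expression signal
    return cov / (sx * sy)


def coexpression_neighbors(expr, gene, threshold=0.9):
    """Genes whose absolute correlation with `gene` meets the threshold."""
    return sorted(
        g for g in expr
        if g != gene and abs(pearson(expr[gene], expr[g])) >= threshold
    )


def guilt_by_association(expr, annotations, gene, threshold=0.9):
    """Rank annotation labels by how many co-expressed neighbours carry them."""
    votes = {}
    for nb in coexpression_neighbors(expr, gene, threshold):
        for label in annotations.get(nb, ()):
            votes[label] = votes.get(label, 0) + 1
    return sorted(votes, key=lambda label: (-votes[label], label))


# Hypothetical expression profiles across five samples (illustration only).
expr = {
    "GENE_A": [1.0, 2.0, 3.0, 4.0, 5.0],
    "GENE_B": [2.1, 3.9, 6.2, 8.0, 9.9],
    "GENE_C": [5.0, 1.0, 4.0, 2.0, 3.0],
    "LNC_X": [0.9, 2.1, 2.9, 4.2, 5.1],  # uncharacterized lncRNA
}
# Hypothetical annotations for the characterized protein-coding genes.
annotations = {
    "GENE_A": ["neurodevelopment"],
    "GENE_B": ["neurodevelopment", "synapse"],
    "GENE_C": ["metabolism"],
}

predicted = guilt_by_association(expr, annotations, "LNC_X")
print(predicted)
```

A real analysis would use many more samples and a principled way of choosing the threshold; the simple vote count here only shows the direction of inference, from annotated neighbours to the unannotated gene.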
Toxicogenomic Biomarkers Associated with PAH Carcinogenic Potential in a 3D in Vitro Bronchial Epithelial Model
The environmental health science community recognizes polycyclic aromatic hydrocarbons (PAHs) as a re-emerging class of environmental pollutants due to their persistence and prominence in mixtures of concern. Due to their widespread distribution in the environment, exposure to PAHs often occurs as complex chemical mixtures. Exposures are linked to numerous adverse health outcomes in humans, with cancer as the greatest concern. Current assessment of cancer risk for PAHs involves testing individual compounds in a two-year rodent bioassay. These studies are time- and resource-intensive, and often lack reproducibility or concordance. Furthermore, they require extrapolation of effects to humans, leading to further uncertainties regarding species-specific biology and chemical mode of action (MOA). The primary method for estimating cancer risk of PAH mixtures is the relative potency factor (RPF) approach, in which mixtures are evaluated based on a subset of individual component PAHs compared to benzo[a]pyrene (BAP) as a surrogate or reference. However, we and others have found this approach to be inadequate for predicting carcinogenicity of PAH mixtures and certain individual PAHs, particularly those that function through alternate pathways or exhibit greater promotional capacity compared to BAP. Furthermore, the specific mechanisms by which environmental exposures to PAHs may cause cardio-respiratory diseases and increase cancer risk remain poorly understood.
In this dissertation, we employed a 3D, organotypic human in vitro bronchial epithelial culture (HBEC) model to address these gaps in knowledge. First, a comparative transcriptomic evaluation was conducted to assess potential differences in mechanism of toxicity for two PAHs, benzo[a]pyrene (BAP) and dibenzo[def,p]chrysene (DBC), compared to a complex PAH mixture based on short-term biosignatures identified from global gene expression profiling. Comparison of BAP and DBC gene signatures showed that a majority of genes (~60%) were uniquely regulated by treatment, including those enriched for cell cycle, hypoxia, oxidative stress, and inflammation. Gene networks involved in NRF2-mediated oxidative stress detoxification were upregulated by BAP, while DBC downregulated these same targets, suggesting a chemical-specific pattern in transcriptional regulation involved in antioxidant response, potentially contributing to differences in PAH potency. These findings support research scrutinizing the applicability of the RPF, where assumptions of similar MOA are necessary for quantitative PAH cancer risk assessment. Next, we developed and refined an approach to utilize chemical-specific transcriptional patterns towards accurate classification of carcinogenic potency of PAHs and PAH mixtures. Systems biology information was collected from a human in vitro airway epithelial model exposed to a range of non-carcinogenic and carcinogenic PAHs and PAH mixtures. These transcriptional changes were evaluated for differentially enriched biological functions. Individual pathway-based gene sets were tested for optimal classification performance.
Posterior probabilities of the best-performing gene sets were selected and integrated via Bayesian integration, resulting in a 91% accurate classifier with four gene sets, including aryl hydrocarbon receptor signaling, regulation of epithelial mesenchymal transition, regulation of angiogenesis, and cell cycle G2-M. In addition, transcriptional benchmark dose modeling of BAP showed that the most sensitive gene sets were largely dissimilar from those that best classified PAH carcinogenicity, challenging current assumptions that BAP carcinogenicity (and subsequent mode of action) is reflective of overall PAH carcinogenicity. Lastly, we evaluated molecular mechanisms related to PAH cancer risk through a two-tiered weighted gene co-expression network analysis (WGCNA) approach, first to identify gene sets co-modulated with RPF cancer risk and then to link genes to a more comprehensive list of regulatory values, including inhalation-specific risk values. Over 3,000 genes associated with cell cycle regulation, inflammation, DNA damage, and cell adhesion processes were found to be co-modulated with increasing RPF, with pathways for cell cycle S phase and cytoskeleton actin identified as the most significantly enriched biological networks correlated with RPF. These gene sets represent potential biomarkers that can be used to evaluate cancer risk associated with PAH mixtures. In this study, the results illustrated the utility of systems toxicology approaches in analyzing global gene expression towards chemical hazard assessment, and information obtained from these analyses could be used towards future predictive model development. This work expanded current understanding of early mechanisms involved in PAH toxicity and provided novel applications utilizing toxicogenomics and organotypic cell culture models for classification, modelling, and biomarker identification. Together, these advances support further development of alternative approaches for use in predictive and mechanistic toxicology towards chemical hazard assessment.
An Integrated, Module-based Biomarker Discovery Framework
Identification of biomarkers that contribute to complex human disorders is a principal and challenging task in computational biology. Prognostic biomarkers are useful for risk assessment of disease progression and patient stratification. Since treatment plans often hinge on patient stratification, better disease subtyping has the potential to significantly improve survival for patients. Additionally, a thorough understanding of the roles of biomarkers in cancer pathways facilitates insights into complex disease formation, and provides potential druggable targets in the pathways.
Many statistical methods have been applied toward biomarker discovery, often combining feature selection with classification methods. Traditional approaches are mainly concerned with statistical significance and fail to consider the clinical relevance of the selected biomarkers. Two additional problems impede meaningful biomarker discovery: gene multiplicity (several maximally predictive solutions exist) and instability (inconsistent gene sets from different experiments or cross validation runs).
Motivated by the need for a more biologically informed, stable biomarker discovery method, I introduce an integrated module-based biomarker discovery framework for analyzing high-throughput genomic disease data. The proposed framework addresses the aforementioned challenges in three components. First, a recursive spectral clustering algorithm specifically tailored toward high-dimensional, heterogeneous data (ReKS) is developed to partition genes into clusters that are treated as single entities for subsequent analysis. Next, the problems of gene multiplicity and instability are addressed through a group variable selection algorithm (T-ReCS) based on local causal discovery methods. Guided by the tree-like partition created from the clustering algorithm, this algorithm selects gene clusters that are predictive of a clinical outcome. We demonstrate that the group feature selection method facilitates the discovery of biologically relevant genes through their association with a statistically predictive driver. Finally, we elucidate the biological relevance of the biomarkers by leveraging available prior information to identify regulatory relationships between genes and between clusters, and deliver the information in the form of a user-friendly web server, mirConnX.
Ant Colony Optimization
Ant Colony Optimization (ACO) is the best example of how studies aimed at understanding and modeling the behavior of ants and other social insects can provide inspiration for the development of computational algorithms for the solution of difficult mathematical problems. Introduced by Marco Dorigo in his PhD thesis (1992) and initially applied to the travelling salesman problem, the ACO field has experienced tremendous growth, standing today as an important nature-inspired stochastic metaheuristic for hard optimization problems. This book presents state-of-the-art ACO methods and is divided into two parts: (I) Techniques, which includes parallel implementations, and (II) Applications, where recent contributions of ACO to diverse fields, such as traffic congestion and control, structural optimization, manufacturing, and genomics, are presented.
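To make the idea concrete, here is a minimal sketch of a basic ant-system variant applied to a tiny travelling salesman instance. It is an illustration under stated assumptions, not any of the book's methods: the function name `aco_tsp`, the parameter values (`alpha`, `beta`, `rho`, ant and iteration counts), and the four-city instance are all chosen for the example, and the cities are assumed to be distinct points.

```python
import math
import random


def tour_length(tour, dist):
    """Length of the closed tour that returns to its starting city."""
    n = len(tour)
    return sum(dist[tour[i]][tour[(i + 1) % n]] for i in range(n))


def aco_tsp(points, n_ants=10, n_iters=30, alpha=1.0, beta=2.0, rho=0.5, seed=0):
    """Basic ant-system sketch; all parameter defaults are illustrative."""
    rng = random.Random(seed)
    n = len(points)
    dist = [[math.dist(p, q) for q in points] for p in points]
    pher = [[1.0] * n for _ in range(n)]  # uniform initial pheromone
    best_tour, best_len = None, float("inf")

    for _ in range(n_iters):
        tours = []
        for _ in range(n_ants):
            tour = [rng.randrange(n)]
            unvisited = set(range(n)) - {tour[0]}
            while unvisited:
                i = tour[-1]
                cands = sorted(unvisited)
                # Attractiveness: pheromone^alpha * (1/distance)^beta.
                weights = [
                    pher[i][j] ** alpha * (1.0 / dist[i][j]) ** beta
                    for j in cands
                ]
                nxt = rng.choices(cands, weights=weights)[0]
                tour.append(nxt)
                unvisited.remove(nxt)
            length = tour_length(tour, dist)
            tours.append((tour, length))
            if length < best_len:
                best_tour, best_len = tour, length

        # Evaporate pheromone on every edge, then let each ant deposit
        # an amount inversely proportional to its tour length.
        for i in range(n):
            for j in range(n):
                pher[i][j] *= 1.0 - rho
        for tour, length in tours:
            for k in range(n):
                a, b = tour[k], tour[(k + 1) % n]
                pher[a][b] += 1.0 / length
                pher[b][a] += 1.0 / length

    return best_tour, best_len


# Hypothetical instance: four cities on the corners of a unit square.
points = [(0.0, 0.0), (0.0, 1.0), (1.0, 1.0), (1.0, 0.0)]
best_tour, best_len = aco_tsp(points)
print(best_tour, best_len)
```

Shorter tours receive more pheromone per edge, so later ants are biased toward the edges those tours used, while evaporation keeps early choices from dominating indefinitely.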