4,563 research outputs found

    Statistical Modeling of MicroRNA Expression with Human Cancers

    Get PDF
    MicroRNAs (miRNAs) are small non-coding RNAs (containing about 22 nucleotides) that regulate gene expression. MiRNAs are involved in many different biological processes such as cell proliferation, differentiation, apoptosis, fat metabolism, and human cancer genes; while miRNAs may function as candidates for diagnostic and prognostic biomarkers and predictors of drug response. This paper emphasizes the statistical methods in the analysis of the associations of miRNA gene expression with human cancers and related clinical phenotypes: 1) simple statistical methods include chi-square test, correlation analysis, t-test and one-way ANOVA; 2) regression models include linear and logistic regression; 3) survival analysis approaches such as non-parametric Kaplan-Meier method and log-rank test as well as semi-parametric Cox proportional hazards models have been used for time to event data; 4) multivariate method such as cluster analysis has been used for clustering samples and principal component analysis (PCA) has been used for data mining; 5) Bayesian statistical methods have recently made great inroads into many areas of science, including the assessment of association between miRNA expression and human cancers; and 6) multiple testing

    Computational Models for Transplant Biomarker Discovery.

    Get PDF
    Translational medicine offers a rich promise for improved diagnostics and drug discovery for biomedical research in the field of transplantation, where continued unmet diagnostic and therapeutic needs persist. Current advent of genomics and proteomics profiling called "omics" provides new resources to develop novel biomarkers for clinical routine. Establishing such a marker system heavily depends on appropriate applications of computational algorithms and software, which are basically based on mathematical theories and models. Understanding these theories would help to apply appropriate algorithms to ensure biomarker systems successful. Here, we review the key advances in theories and mathematical models relevant to transplant biomarker developments. Advantages and limitations inherent inside these models are discussed. The principles of key -computational approaches for selecting efficiently the best subset of biomarkers from high--dimensional omics data are highlighted. Prediction models are also introduced, and the integration of multi-microarray data is also discussed. Appreciating these key advances would help to accelerate the development of clinically reliable biomarker systems

    Pathway-Based Multi-Omics Data Integration for Breast Cancer Diagnosis and Prognosis.

    Get PDF
    Ph.D. Thesis. University of Hawaiʻi at Mānoa 2017

    Smart breeding driven by big data, artificial intelligence, and integrated genomic-enviromic prediction

    Get PDF
    The first paradigm of plant breeding involves direct selection-based phenotypic observation, followed by predictive breeding using statistical models for quantitative traits constructed based on genetic experimental design and, more recently, by incorporation of molecular marker genotypes. However, plant performance or phenotype (P) is determined by the combined effects of genotype (G), envirotype (E), and genotype by environment interaction (GEI). Phenotypes can be predicted more precisely by training a model using data collected from multiple sources, including spatiotemporal omics (genomics, phenomics, and enviromics across time and space). Integration of 3D information profiles (G-P-E), each with multidimensionality, provides predictive breeding with both tremendous opportunities and great challenges. Here, we first review innovative technologies for predictive breeding. We then evaluate multidimensional information profiles that can be integrated with a predictive breeding strategy, particularly envirotypic data, which have largely been neglected in data collection and are nearly untouched in model construction. We propose a smart breeding scheme, integrated genomic-enviromic prediction (iGEP), as an extension of genomic prediction, using integrated multiomics information, big data technology, and artificial intelligence (mainly focused on machine and deep learning). We discuss how to implement iGEP, including spatiotemporal models, environmental indices, factorial and spatiotemporal structure of plant breeding data, and cross-species prediction. A strategy is then proposed for prediction-based crop redesign at both the macro (individual, population, and species) and micro (gene, metabolism, and network) scales. Finally, we provide perspectives on translating smart breeding into genetic gain through integrative breeding platforms and open-source breeding initiatives. We call for coordinated efforts in smart breeding through iGEP, institutional partnerships, and innovative technological support

    Bioinformatics applied to human genomics and proteomics: development of algorithms and methods for the discovery of molecular signatures derived from omic data and for the construction of co-expression and interaction networks

    Get PDF
    [EN] The present PhD dissertation develops and applies Bioinformatic methods and tools to address key current problems in the analysis of human omic data. This PhD has been organised by main objectives into four different chapters focused on: (i) development of an algorithm for the analysis of changes and heterogeneity in large-scale omic data; (ii) development of a method for non-parametric feature selection; (iii) integration and analysis of human protein-protein interaction networks and (iv) integration and analysis of human co-expression networks derived from tissue expression data and evolutionary profiles of proteins. In the first chapter, we developed and tested a new robust algorithm in R, called DECO, for the discovery of subgroups of features and samples within large-scale omic datasets, exploring all feature differences possible heterogeneity, through the integration of both data dispersion and predictor-response information in a new statistic parameter called h (heterogeneity score). In the second chapter, we present a simple non-parametric statistic to measure the cohesiveness of categorical variables along any quantitative variable, applicable to feature selection in all types of big data sets. In the third chapter, we describe an analysis of the human interactome integrating two global datasets from high-quality proteomics technologies: HuRI (a human protein-protein interaction network generated by a systematic experimental screening based on Yeast-Two-Hybrid technology) and Cell-Atlas (a comprehensive map of subcellular localization of human proteins generated by antibody imaging). This analysis aims to create a framework for the subcellular localization characterization supported by the human protein-protein interactome. In the fourth chapter, we developed a full integration of three high-quality proteome-wide resources (Human Protein Atlas, OMA and TimeTree) to generate a robust human co-expression network across tissues assigning each human protein along the evolutionary timeline. In this way, we investigate how old in evolution and how correlated are the different human proteins, and we place all them in a common interaction network. As main general comment, all the work presented in this PhD uses and develops a wide variety of bioinformatic and statistical tools for the analysis, integration and enlighten of molecular signatures and biological networks using human omic data. Most of this data corresponds to sample cohorts generated in recent biomedical studies on specific human diseases

    Ontology-Based Meta-Analysis of Global Collections of High-Throughput Public Data

    Get PDF
    The investigation of the interconnections between the molecular and genetic events that govern biological systems is essential if we are to understand the development of disease and design effective novel treatments. Microarray and next-generation sequencing technologies have the potential to provide this information. However, taking full advantage of these approaches requires that biological connections be made across large quantities of highly heterogeneous genomic datasets. Leveraging the increasingly huge quantities of genomic data in the public domain is fast becoming one of the key challenges in the research community today.We have developed a novel data mining framework that enables researchers to use this growing collection of public high-throughput data to investigate any set of genes or proteins. The connectivity between molecular states across thousands of heterogeneous datasets from microarrays and other genomic platforms is determined through a combination of rank-based enrichment statistics, meta-analyses, and biomedical ontologies. We address data quality concerns through dataset replication and meta-analysis and ensure that the majority of the findings are derived using multiple lines of evidence. As an example of our strategy and the utility of this framework, we apply our data mining approach to explore the biology of brown fat within the context of the thousands of publicly available gene expression datasets.Our work presents a practical strategy for organizing, mining, and correlating global collections of large-scale genomic data to explore normal and disease biology. Using a hypothesis-free approach, we demonstrate how a data-driven analysis across very large collections of genomic data can reveal novel discoveries and evidence to support existing hypothesis

    INTEGRATIVE ANALYSIS OF OMICS DATA IN ADULT GLIOMA AND OTHER TCGA CANCERS TO GUIDE PRECISION MEDICINE

    Get PDF
    Transcriptomic profiling and gene expression signatures have been widely applied as effective approaches for enhancing the molecular classification, diagnosis, prognosis or prediction of therapeutic response towards personalized therapy for cancer patients. Thanks to modern genome-wide profiling technology, scientists are able to build engines leveraging massive genomic variations and integrating with clinical data to identify “at risk” individuals for the sake of prevention, diagnosis and therapeutic interventions. In my graduate work for my Ph.D. thesis, I have investigated genomic sequencing data mining to comprehensively characterise molecular classifications and aberrant genomic events associated with clinical prognosis and treatment response, through applying high-dimensional omics genomic data to promote the understanding of gene signatures and somatic molecular alterations contributing to cancer progression and clinical outcomes. Following this motivation, my dissertation has been focused on the following three topics in translational genomics. 1) Characterization of transcriptomic plasticity and its association with the tumor microenvironment in glioblastoma (GBM). I have integrated transcriptomic, genomic, protein and clinical data to increase the accuracy of GBM classification, and identify the association between the GBM mesenchymal subtype and reduced tumorpurity, accompanied with increased presence of tumor-associated microglia. Then I have tackled the sole source of microglial as intrinsic tumor bulk but not their corresponding neurosphere cells through both transcriptional and protein level analysis using a panel of sphere-forming glioma cultures and their parent GBM samples.FurthermoreI have demonstrated my hypothesis through longitudinal analysis of paired primary and recurrent GBM samples that the phenotypic alterations of GBM subtypes are not due to intrinsic proneural-to-mesenchymal transition in tumor cells, rather it is intertwined with increased level of microglia upon disease recurrence. Collectively I have elucidated the critical role of tumor microenvironment (Microglia and macrophages from central nervous system) contributing to the intra-tumor heterogeneity and accurate classification of GBM patients based on transcriptomic profiling, which will not only significantly impact on clinical perspective but also pave the way for preclinical cancer research. 2) Identification of prognostic gene signatures that stratify adult diffuse glioma patientsharboring1p/19q co-deletions. I have compared multiple statistical methods and derived a gene signature significantly associated with survival by applying a machine learning algorithm. Then I have identified inflammatory response and acetylation activity that associated with malignant progression of 1p/19q co-deleted glioma. In addition, I showed this signature translates to other types of adult diffuse glioma, suggesting its universality in the pathobiology of other subset gliomas. My efforts on integrative data analysis of this highly curated data set usingoptimizedstatistical models will reflect the pending update to WHO classification system oftumorsin the central nervous system (CNS). 3) Comprehensive characterization of somatic fusion transcripts in Pan-Cancers. I have identified a panel of novel fusion transcripts across all of TCGA cancer types through transcriptomic profiling. Then I have predicted fusion proteins with kinase activity and hub function of pathway network based on the annotation of genetically mobile domains and functional domain architectures. I have evaluated a panel of in -frame gene fusions as potential driver mutations based on network fusion centrality hypothesis. I have also characterised the emerging complexity of genetic architecture in fusion transcripts through integrating genomic structure and somatic variants and delineating the distinct genomic patterns of fusion events across different cancer types. Overall my exploration of the pathogenetic impact and clinical relevance of candidate gene fusions have provided fundamental insights into the management of a subset of cancer patients by predicting the oncogenic signalling and specific drug targets encoded by these fusion genes. Taken together, the translational genomic research I have conducted during my Ph.D. study will shed new light on precision medicine and contribute to the cancer research community. The novel classification concept, gene signature and fusion transcripts I have identified will address several hotly debated issues in translational genomics, such as complex interactions between tumor bulks and their adjacent microenvironments, prognostic markers for clinical diagnostics and personalized therapy, distinct patterns of genomic structure alterations and oncogenic events in different cancer types, therefore facilitating our understanding of genomic alterations and moving us towards the development of precision medicine

    A BAYESIAN MODEL FOR CROSS-STUDY DIFFERENTIAL GENE EXPRESSION

    Get PDF
    In this paper we define a hierarchical Bayesian model for microarray expression data collected from several studies and use it to identify genes that show differential expression between two conditions. Key features include shrinkage across both genes and studies; flexible modeling that allows for interactions between platforms and the estimated effect, and for both concordant and discordant differential expression across studies. We evaluated the performance of our model in a comprehensive fashion, using both artificial data, and a split-sample validation approach that provides an agnostic assessment of the model\u27s behavior not only under the null hypothesis but also under a realistic alternative. The simulation results from the artificial data demonstrate the advantages of a Bayesian model. Compared to a more direct combination of t- or SAM-statistics, the 1-AUC values for the Bayesian model is roughly half of the corresponding values for the t- and SAM-statistics. Furthermore, the simulations provide guidelines for when the Bayesian model is most likely to be useful. Most noticeably, in small studies the Bayesian model generally Outperforms other methods when evaluated by AUC, FDR, and MDR across a range of simulation parameters, and this difference diminishes for larger sample sizes in the individual studies. The split-study validation illustrates appropriate shrinkage of the Bayesian model in the absence of platform-, sample-, and annotation-differences that otherwise complicate experimental data analyses. Finally, we fit our model to four breast cancer studies employing different technologies (cDNA and Affymetrix) to estimate differential expression in estrogen receptor positive tumors versus negative ones. Software and data for reproducing our analysis are publicly available
    • 

    corecore