
    Bioinformatics applied to human genomics and proteomics: development of algorithms and methods for the discovery of molecular signatures derived from omic data and for the construction of co-expression and interaction networks

    [EN] This PhD dissertation develops and applies bioinformatic methods and tools to address key current problems in the analysis of human omic data. It is organised by main objective into four chapters, focused on: (i) development of an algorithm for the analysis of changes and heterogeneity in large-scale omic data; (ii) development of a method for non-parametric feature selection; (iii) integration and analysis of human protein-protein interaction networks; and (iv) integration and analysis of human co-expression networks derived from tissue expression data and evolutionary profiles of proteins. In the first chapter, we developed and tested a new robust algorithm in R, called DECO, for the discovery of subgroups of features and samples within large-scale omic datasets. It explores all feature differences and possible heterogeneity by integrating both data dispersion and predictor-response information in a new statistic called h (heterogeneity score). In the second chapter, we present a simple non-parametric statistic to measure the cohesiveness of categorical variables along any quantitative variable, applicable to feature selection in all types of big data sets. In the third chapter, we describe an analysis of the human interactome that integrates two global datasets from high-quality proteomics technologies: HuRI (a human protein-protein interaction network generated by systematic experimental screening based on Yeast-Two-Hybrid technology) and Cell-Atlas (a comprehensive map of the subcellular localization of human proteins generated by antibody imaging). This analysis aims to create a framework for characterizing subcellular localization, supported by the human protein-protein interactome. 
In the fourth chapter, we developed a full integration of three high-quality proteome-wide resources (Human Protein Atlas, OMA and TimeTree) to generate a robust human co-expression network across tissues, assigning each human protein a position along the evolutionary timeline. In this way, we investigate how old in evolution and how correlated the different human proteins are, and we place all of them in a common interaction network. Overall, the work presented in this PhD uses and develops a wide variety of bioinformatic and statistical tools for the analysis, integration and elucidation of molecular signatures and biological networks using human omic data. Most of these data correspond to sample cohorts generated in recent biomedical studies on specific human diseases.
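
As a rough illustration of what building a co-expression network across tissues can involve, the sketch below links two proteins when their tissue expression profiles are strongly correlated. The abstract does not state which correlation measure or cut-off the dissertation uses, so the Pearson correlation, the `threshold` value, the function names and the protein labels are all assumptions made for this example. The evolutionary-age assignment from OMA and TimeTree is not covered.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation between two equal-length numeric profiles."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant profile carries no co-expression signal
    return cov / sqrt(var_x * var_y)


def coexpression_edges(profiles, threshold=0.8):
    """Return (protein_a, protein_b, r) for pairs with |r| >= threshold.

    `profiles` maps a protein name to its expression values across the
    same ordered list of tissues. The threshold is an illustrative choice.
    """
    edges = []
    for (name_a, prof_a), (name_b, prof_b) in combinations(sorted(profiles.items()), 2):
        r = pearson(prof_a, prof_b)
        if abs(r) >= threshold:
            edges.append((name_a, name_b, r))
    return edges


# Hypothetical expression values across four tissues (not real data).
profiles = {
    "PROT_A": [1.0, 2.0, 3.0, 4.0],
    "PROT_B": [2.0, 4.1, 6.2, 7.9],
    "PROT_C": [4.0, 1.0, 3.5, 1.2],
}
edges = coexpression_edges(profiles)
for name_a, name_b, r in edges:
    print(f"{name_a} -- {name_b}: r = {r:.3f}")
```

With these toy profiles only the PROT_A and PROT_B pair passes the cut-off. A real analysis would also need to handle multiple testing and the choice of threshold.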

    Identification of expression patterns in the progression of disease stages by integration of transcriptomic data

    From Statistical Methods for Omics Data Integration and Analysis 2015 - Valencia, Spain. 14-16 September 2015. [Background]: In the study of complex diseases using genome-wide expression data from clinical samples, a difficult case is the identification and mapping of the gene signatures associated with the stages that occur in the progression of a disease. The stages usually correspond to different subtypes or classes of the disease, and the difficulty in identifying them often comes from patient heterogeneity and sample variability, which can hide the biomedically relevant changes that characterize each stage and make standard differential analysis inadequate or inefficient. [Results]: We propose a methodology to study diseases or disease stages ordered in a sequential manner (e.g. from early stages with good prognosis to more acute or serious stages associated with poor prognosis). The methodology is applied to diseases for which genome-wide expression profiles have been obtained from cohorts of patients at different stages. The approach allows searching for consistent expression patterns along the progression of the disease through two major steps: (i) identifying genes with increasing or decreasing trends in the progression of the disease; (ii) clustering the increasing/decreasing gene expression patterns using an unsupervised approach to reveal whether there are consistent patterns and to find genes altered at specific disease stages. The first step is carried out using Gamma rank correlation to identify genes whose expression correlates with a categorical variable that represents the stages of the disease. The second step is done using a Self Organizing Map (SOM) to cluster the genes according to their progressive profiles and identify specific patterns. Both steps are done after normalization of the genomic data to allow the integration of multiple independent datasets. 
In order to validate the results and evaluate their consistency and biological relevance, the methodology is applied to datasets of three different diseases: myelodysplastic syndrome, colorectal cancer and Alzheimer's disease. A software script written in R, named genediseasePatterns, is provided to allow the use and application of the methodology. [Conclusion]: The method presented allows the analysis of the progression of complex and heterogeneous diseases that can be divided into pathological stages. It identifies gene groups whose expression patterns change as the disease advances, and it can be applied to different types of genomic data from cohorts of patients in different states. SA and FJCL were supported by a Research Grant to young scientists given by the Consejeria de Educacion (Junta de Castilla y León, Spain) and co-funded by the European Social Fund (ESF). MA was supported by a Research Grant for the PhD of the Spanish National Research Council (Junta para Ampliación de Estudios, Consejo Superior de Investigaciones Cientificas, CSIC; ref. 09-02402), also co-funded by the European Social Fund (ESF). We also acknowledge the funding provided to Dr. J. De Las Rivas's research group by the Consejeria de Sanidad (Junta de Castilla y León, Spain) with a project grant on Biomedicine: BIO/SA08/14; and by the Spanish Ministry of Economy and Competitiveness (MINECO) through the National Institute of Health Carlos III (ISCiii) with project grants co-funded by FEDER: PI12/00624 and PI15/00328. Peer Reviewed
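
The first step described above, correlating a gene's expression with ordered disease stages, can be illustrated with a small sketch. It assumes that "Gamma rank correlation" refers to Goodman and Kruskal's gamma, computed as (C - D) / (C + D) from concordant (C) and discordant (D) sample pairs. The function name and the toy data are hypothetical, and this is not the genediseasePatterns script itself.

```python
from itertools import combinations


def goodman_kruskal_gamma(stages, expression):
    """Goodman-Kruskal gamma between ordinal stages and expression values.

    gamma = (C - D) / (C + D), where C and D count concordant and
    discordant sample pairs. Pairs tied on either variable are ignored.
    """
    if len(stages) != len(expression):
        raise ValueError("stages and expression must have the same length")
    concordant = discordant = 0
    for (s1, e1), (s2, e2) in combinations(zip(stages, expression), 2):
        direction = (s1 - s2) * (e1 - e2)
        if direction > 0:
            concordant += 1
        elif direction < 0:
            discordant += 1
    total = concordant + discordant
    return 0.0 if total == 0 else (concordant - discordant) / total


# Hypothetical data: six samples in three ordered disease stages.
stages = [1, 1, 2, 2, 3, 3]
gene_up = [2.1, 2.4, 3.0, 3.3, 4.8, 5.1]    # rises with stage
gene_down = [7.0, 6.5, 5.2, 5.0, 3.1, 2.9]  # falls with stage

gamma_up = goodman_kruskal_gamma(stages, gene_up)
gamma_down = goodman_kruskal_gamma(stages, gene_down)
print(gamma_up, gamma_down)
```

Genes with gamma close to +1 or -1 would be candidates for the increasing or decreasing trends, which the methodology then clusters with a SOM. The significance testing and p-value adjustment mentioned in the supplementary tables are not shown here.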

    Additional file 1: Table S1. of Identification of expression patterns in the progression of disease stages by integration of transcriptomic data

    Significant genes found in the patterns associated with the progression of MDS: a total of 189 genes were found. Each gene is included in one of the 4 patterns identified (marked in red for the 2 increasing trends or in green for the 2 decreasing trends). The Gamma correlation factor and the adjusted p-value of that correlation are included for each gene in its pattern. (XLSX 79 kb)

    Additional file 2: Table S2. of Identification of expression patterns in the progression of disease stages by integration of transcriptomic data

    Significant genes found in the patterns associated with the progression of AD: in total, 74 genes were found to be significant and assigned to one of the 4 AD patterns. The patterns are marked in red for increasing trends or in green for decreasing trends. The Gamma correlation factor and the adjusted p-value of that correlation are included for each gene in its pattern. (XLSX 13 kb)