
    Computational analysis of a plant receptor interaction network

    Master's thesis in Bioinformatics and Computational Biology. In all organisms, complex protein-protein interaction (PPI) networks control major biological functions, yet studying their structural features presents a major analytical challenge. In plants, leucine-rich-repeat receptor kinases (LRR-RKs) are key in sensing and transmitting non-self as well as self signals from the cell surface. As such, LRR-RKs have both developmental and immune functions that allow plants to make the most of their environments. In the model organism of plant molecular biology, Arabidopsis thaliana, most LRR-RKs remain biochemically and genetically uncharacterized. To address this, an LRR-based Cell Surface Interaction (CSILRR) network was obtained in 2018: a protein-protein interaction network of the extracellular domains of 170 LRR-RKs containing 567 bidirectional interactions. Several network analyses have been performed with CSILRR. However, these analyses have so far not considered the spatial and temporal expression of its proteins, nor has the role of extracellular domain (ECD) size in the network structure been characterized in detail. The objective of the present work is therefore to carry out more in-depth analyses of the CSILRR network, providing insights that will facilitate the functional characterization of LRR-RKs.
    The first aim of this work is to test the fit of the CSILRR network to a scale-free topology. To accomplish that, the degree distribution of the CSILRR network was compared with the degree distributions of the known scale-free and random network models. Additionally, three network attack algorithms were implemented and applied to these two network models and to the CSILRR network to compare their behavior. However, since the CSILRR interaction data come from an in vitro screening, there is no direct evidence that its protein-protein interactions occur inside plant cells. To gain insight into how the network composition changes depending on transcriptional regulation, the CSILRR interaction data were integrated with four different RNA-Seq datasets related to the network's biological functions; a Python script was written to automate this task. Furthermore, the role of the LRR-RKs in the network structure was evaluated depending on the size of their extracellular domain (large or small): centrality parameters were measured and size-targeted attacks performed. Finally, gene regulatory information was integrated into the CSILRR network to classify its proteins according to the function of the transcription factors that regulate their expression.
    The results show that CSILRR fits a power-law degree distribution and approximates a scale-free topology. Moreover, CSILRR displays high resistance to random attacks and reduced resistance to hub/bottleneck-directed attacks, similarly to the scale-free network model. The integration of CSILRR interaction data and RNA-Seq data suggests that the transcriptional regulation of the network is more relevant for developmental programs than for defense responses. In addition, LRR-RKs with a small ECD play a major role in maintaining CSILRR integrity. Lastly, it is hypothesized that integrating CSILRR interaction data with predicted gene regulatory networks could shed light on the functioning of growth-immunity signaling crosstalk.
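    The degree-distribution fit and attack simulations described above can be illustrated with a minimal sketch in Python using networkx. The snippet below is an assumed, generic illustration, not the thesis's actual scripts: the random graph is only a stand-in for the CSILRR edge list (170 receptors, 567 interactions), and the log-log slope and the hub-targeted attack are simplified versions of the analyses named in the abstract.

        import random
        import networkx as nx
        import numpy as np

        def largest_component_fraction(G):
            """Fraction of nodes in the largest connected component (network integrity)."""
            if G.number_of_nodes() == 0:
                return 0.0
            giant = max(nx.connected_components(G), key=len)
            return len(giant) / G.number_of_nodes()

        def attack(G, order):
            """Remove nodes in the given order, recording integrity after each removal."""
            H = G.copy()
            integrity = [largest_component_fraction(H)]
            for node in order:
                H.remove_node(node)
                integrity.append(largest_component_fraction(H))
            return integrity

        # Hypothetical stand-in for the CSILRR network: 170 nodes, 567 edges.
        G = nx.gnm_random_graph(170, 567, seed=1)

        # Degree distribution on a log-log scale: an approximately straight line
        # is consistent with a power-law (scale-free-like) topology.
        degrees = np.array([d for _, d in G.degree()])
        values, counts = np.unique(degrees, return_counts=True)
        mask = values > 0
        log_k = np.log10(values[mask])
        log_pk = np.log10(counts[mask] / len(degrees))
        slope, _ = np.polyfit(log_k, log_pk, 1)
        print(f"estimated power-law exponent ~ {-slope:.2f}")

        # Random attack vs hub-targeted (degree-ordered) attack.
        random_order = random.sample(list(G.nodes()), G.number_of_nodes())
        hub_order = [n for n, _ in sorted(G.degree(), key=lambda x: x[1], reverse=True)]
        print("random attack integrity:", attack(G, random_order)[:5])
        print("hub attack integrity:   ", attack(G, hub_order)[:5])

    In a scale-free-like network, the hub-directed curve should collapse much faster than the random-removal curve, which is the behavior the abstract reports for CSILRR.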

    Machine Learning and Integrative Analysis of Biomedical Big Data.

    Recent developments in high-throughput technologies have accelerated the accumulation of massive amounts of omics data from multiple sources: genome, epigenome, transcriptome, proteome, metabolome, etc. Traditionally, data from each source (e.g., the genome) are analyzed in isolation using statistical and machine learning (ML) methods. Integrative analysis of multi-omics and clinical data is key to new biomedical discoveries and advancements in precision medicine. However, data integration poses new computational challenges and exacerbates those associated with single-omics studies. Specialized computational approaches are required to effectively and efficiently perform integrative analysis of biomedical data acquired from diverse modalities. In this review, we discuss state-of-the-art ML-based approaches for tackling five specific computational challenges associated with integrative analysis: the curse of dimensionality, data heterogeneity, missing data, class imbalance, and scalability issues.
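    As a concrete illustration of several of these challenges, the hypothetical sketch below uses scikit-learn to perform early (concatenation-based) integration of two toy omics blocks, impute missing values, rescale heterogeneous features, and counter class imbalance with class weighting. The data, dimensions, and model choice are assumptions for illustration only and are not taken from the review.

        import numpy as np
        from sklearn.impute import SimpleImputer
        from sklearn.linear_model import LogisticRegression
        from sklearn.pipeline import make_pipeline
        from sklearn.preprocessing import StandardScaler

        rng = np.random.default_rng(0)

        # Toy stand-ins for two omics layers measured on the same 100 samples:
        # a wide transcriptome block (curse of dimensionality) and a smaller
        # methylation block with missing values (heterogeneity, missing data).
        expression = rng.normal(size=(100, 2000))
        methylation = rng.uniform(size=(100, 300))
        methylation[rng.random(methylation.shape) < 0.1] = np.nan   # ~10% missing
        labels = (rng.random(100) < 0.2).astype(int)                # imbalanced classes

        # Early integration: concatenate the two modalities feature-wise.
        X = np.hstack([expression, methylation])

        model = make_pipeline(
            SimpleImputer(strategy="median"),            # missing data
            StandardScaler(),                            # heterogeneous scales
            LogisticRegression(penalty="l1", solver="liblinear", C=0.1,
                               class_weight="balanced"), # sparsity + class imbalance
        )
        model.fit(X, labels)
        print("non-zero coefficients:", np.count_nonzero(model[-1].coef_))

    The L1 penalty keeps only a small subset of the concatenated features, which is one common (though simplistic) response to high dimensionality in this setting.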

    Epigenetic characterization of human hepatocyte subpopulations in context of complex metabolic diseases and during in vitro differentiation of hepatocyte-like cells

    The comprehensive transcriptional and epigenetic characterization of human hepatocyte subpopulations is necessary to achieve a better understanding of regulatory processes in health and in complex metabolic diseases, as well as during in vitro differentiation. Based on integrative analysis of genome-wide sequencing data, this thesis aims to unravel hepatocyte heterogeneity in different biological contexts. A deeper understanding of the spatial organization of cells in human tissues is an important challenge. Using a unique experimental set-up based on laser capture microdissection coupled to next-generation sequencing, which preserves spatial orientation while still providing genome-wide data on well-defined subpopulations, the first combined spatial analysis of transcriptomes and methylomes across three micro-dissected zones of the human liver provides a wealth of new positional insights, both in health and in the context of fatty liver disease. In addition, these spatial maps serve as a reference for the projection of single-cell data into hepatic pseudospace, which is still a major challenge. Hence, a novel pseudospace inference approach, which considerably improves the spatial reconstruction of single cells into their tissue context, is demonstrated for the human liver. Finally, the identification of underlying regulatory networks by integrative epigenomic analysis of in vitro differentiated hepatocyte-like cells contributes to the development of reasonable cell culture interventions to improve differentiation.
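    The general idea of projecting single cells into hepatic pseudospace can be sketched simply: correlate each cell's expression profile with reference profiles from the three micro-dissected zones and place the cell along a periportal-to-pericentral axis. The snippet below is a hypothetical illustration of that strategy, not the thesis's actual inference method; the zone names, random data, and correlation-weighted scoring are assumptions.

        import numpy as np
        import pandas as pd

        rng = np.random.default_rng(42)
        genes = [f"gene_{i}" for i in range(200)]

        # Hypothetical zonal reference profiles (genes x 3 liver zones) from
        # laser-capture microdissection, plus a single-cell expression matrix.
        zones = pd.DataFrame(rng.gamma(2.0, size=(200, 3)), index=genes,
                             columns=["periportal", "midzonal", "pericentral"])
        cells = pd.DataFrame(rng.gamma(2.0, size=(200, 50)), index=genes,
                             columns=[f"cell_{i}" for i in range(50)])

        def pseudospace_position(cell, zone_profiles):
            """Place a cell on a 0 (periportal) .. 1 (pericentral) axis by weighting
            zone coordinates with the cell's correlation to each zonal profile."""
            cors = zone_profiles.corrwith(cell, method="spearman")
            weights = cors.clip(lower=0)
            if weights.sum() == 0:
                return np.nan
            zone_coords = pd.Series([0.0, 0.5, 1.0], index=zone_profiles.columns)
            return float((weights * zone_coords).sum() / weights.sum())

        positions = cells.apply(lambda c: pseudospace_position(c, zones))
        print(positions.head())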

    Transcriptome profiling of grapevine seedless segregants during berry development reveals candidate genes associated with berry weight

    Indexed in: Web of Science; PubMed.
    Background: Berry size is considered one of the main selection criteria in table grape breeding programs. However, this is a quantitative and polygenic trait, and its genetic determination is still poorly understood. Considering its economic importance, it is relevant to determine its genetic architecture and elucidate the mechanisms involved in its expression. To approach this issue, an RNA-Seq experiment based on the Illumina platform was performed (14 libraries), including seedless segregants with contrasting phenotypes for berry weight at the fruit setting (FST) and 6–8 mm berry (B68) phenological stages.
    Results: A group of 526 differentially expressed (DE) genes was identified by comparing seedless segregants with contrasting phenotypes for berry weight: 101 genes from the FST stage and 463 from the B68 stage. We also integrated differential expression, principal component analysis (PCA), correlation and co-expression network analyses to characterize the transcriptome profiles observed in segregants with contrasting phenotypes for berry weight. From these analyses, 68 DE genes were selected as candidate genes, and seven candidate genes were validated by real-time PCR, confirming their expression profiles.
    Conclusions: We have carried out the first transcriptome analysis focused on table grape seedless segregants with contrasting phenotypes for berry weight. Our findings contribute to the understanding of the mechanisms involved in berry weight determination. This comparative transcriptome profiling also revealed candidate genes for berry weight that could be evaluated as selection tools in table grape breeding programs.
    http://bmcplantbiol.biomedcentral.com/articles/10.1186/s12870-016-0789-
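    The candidate-gene selection strategy (combining prior differential-expression results with sample-level PCA and trait correlations) can be illustrated with a small hypothetical sketch; the expression matrix, berry weights, DE list, and correlation cutoff below are made-up placeholders, not the study's data or pipeline.

        import numpy as np
        import pandas as pd
        from sklearn.decomposition import PCA

        rng = np.random.default_rng(7)

        # Hypothetical expression matrix (genes x 14 libraries) and berry weight per library.
        genes = [f"VIT_{i:05d}" for i in range(500)]
        expr = pd.DataFrame(rng.lognormal(mean=2, size=(500, 14)), index=genes,
                            columns=[f"lib_{i}" for i in range(14)])
        berry_weight = pd.Series(rng.normal(2.5, 0.8, size=14), index=expr.columns)

        # Hypothetical set of differentially expressed genes from a prior DE analysis.
        de_genes = set(rng.choice(genes, size=80, replace=False))

        # Sample-level PCA to check how libraries separate by phenotype/stage.
        pcs = PCA(n_components=2).fit_transform(np.log1p(expr.T))
        print("library coordinates on PC1/PC2:\n", pcs[:3])

        # Per-gene Spearman correlation with berry weight; candidates are DE genes
        # whose expression also tracks the trait.
        cor = expr.T.corrwith(berry_weight, method="spearman")
        candidates = cor[cor.index.isin(de_genes) & (cor.abs() > 0.5)].sort_values()
        print(f"{len(candidates)} candidate genes")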

    A model validation pipeline for healthy tissue genome-scale metabolic models

    Master's dissertation in Bioinformatics. In the past few years, high-throughput experimental methods have made omics data available for several layers of biological organization, enabling the integration of knowledge from individual components into complex models such as genome-scale metabolic models (GSMMs). These can be analysed by constraint-based modelling (CBM) methods, which facilitate in silico predictive approaches. Human metabolic models have been used to study healthy human tissues and their associated metabolic diseases, such as obesity, diabetes, and cancer. Generic human models can be integrated with contextual data through reconstruction algorithms to produce context-specific models (CSMs), which are typically better at capturing the variation between different tissues and cell types. As the human body contains a multitude of tissues and cell types, CSMs are frequently adopted as a means of obtaining accurate metabolic models of healthy human tissues.
    However, unlike microorganism or cancer models, which allow several methods of validation, such as comparing in silico fluxes or gene essentiality predictions to experimental data, the validation methods easily applicable to CSMs of healthy human tissue are more limited. Consequently, despite continued efforts to update generic human models and reconstruction algorithms to extract high-quality CSMs, their validation remains a concern. This work presents a pipeline for the extraction and basic validation of CSMs of normal human tissues derived from the integration of transcriptomics data with a generic human model. All CSMs were extracted from the Human-GEM generic model recently published by Robinson et al. (2020), relying on the open-source Troppo Python package and on the fastCORE and tINIT reconstruction algorithms implemented therein. CSMs were extracted for 11 healthy tissues available in the GTEx v8 dataset. Prior to extraction, machine learning methods were applied to threshold selection for the conversion of expression values into gene scores. The highest-quality models were obtained with a global threshold applied directly to the omics data. The CSM validation strategy focused on the total number of metabolic tasks passed as a performance indicator. Lastly, this work is accompanied by Jupyter Notebooks, which include a beginner-friendly model extraction guide.
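    The global-threshold conversion of expression values into gene scores mentioned above can be sketched very simply. The snippet below is a generic, assumed illustration (placeholder data, a single global threshold, and a log-ratio score), not the exact scoring used in this pipeline or in Troppo.

        import numpy as np
        import pandas as pd

        rng = np.random.default_rng(3)

        # Hypothetical TPM matrix: genes x 11 tissues (stand-in for GTEx-derived data).
        tpm = pd.DataFrame(rng.lognormal(mean=1.0, sigma=1.5, size=(1000, 11)),
                           index=[f"ENSG{i:011d}" for i in range(1000)],
                           columns=[f"tissue_{i}" for i in range(11)])

        # A single global threshold applied directly to the omics data: genes above it
        # receive positive scores (keep), genes below it receive negative scores (drop).
        flat = tpm.values.flatten()
        global_threshold = np.percentile(flat[flat > 0], 50)

        gene_scores = np.log2((tpm + 1e-6) / global_threshold)

        # Per-tissue score columns would then be handed to a reconstruction algorithm
        # (e.g. fastCORE or tINIT) to extract one context-specific model per tissue.
        print("global threshold:", round(global_threshold, 3))
        print(gene_scores.iloc[:3, :3])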