57 research outputs found

    Alternative splicing: regulation, function and evolution

    Unpublished doctoral thesis defended at the Universidad Autónoma de Madrid, Facultad de Medicina, Departamento de Bioquímica. Date of defense: 13-01-2021. Introns populate eukaryotic genes to a variable extent across species, being widespread in vertebrates and mammals. While the evolutionary advantages of introns, if any, remain unclear, their expansion has provided the opportunity to splice genes in more than one way, allowing the production of different mRNAs from a single gene through alternative splicing (AS). AS patterns change during the development of complex organisms and diverge across different tissues and experimental conditions. These highly reproducible changes are evidence of a regulatory network that ensures repeatable responses to certain stimuli, and suggest that at least some of them play a role in the overall physiological response or adaptation. Not surprisingly, perturbation of some elements of this network is often associated with pathological conditions. However, not only are we far from a complete characterization of the molecular mechanisms that drive AS changes in most pathologies, such as those affecting the heart, but the computational tools currently used to study these regulatory networks also limit our ability to extract all the information hidden in the data. It has long been hypothesized that AS contributes to a great expansion of the proteome and facilitates the evolution of new functions from pre-existing ones without gene duplication. While there are very well-known examples of how AS enables the production of different functional proteins or mRNAs, the proportion of AS isoforms that are actually functional remains largely unknown.
Indeed, recent studies from different perspectives, including transcriptomic, proteomic and sequence evolution analyses, suggest that this percentage may be rather small and that much of the observed transcriptomic diversity is driven by non-functional noise in the splicing process. In this thesis, we have studied global AS patterns through computational analysis of large RNA-seq datasets to characterize the causes and consequences of AS changes from different perspectives. First, we analyzed how global AS patterns change during heart development and disease using data from a variety of mouse models. We found that AS changes modulate different biological processes from those modulated by gene expression changes and are associated with isoform-specific protein-protein interactions. Disease patterns partially recapitulate developmental patterns, probably through the upregulation of PTBP1, which is sufficient to induce pathological changes in the heart. Second, in an attempt to improve computational tools for the identification of regulatory elements, we developed dSreg. This tool leverages Bayesian inference and hierarchical models to pool information across the whole transcriptome and infer not only the changes in the activities of the underlying regulatory elements but also the changes in inclusion rates, outperforming competing methods and tools designed for each purpose separately. Finally, we studied the evolutionary process driving AS divergence during mammalian evolution using models of phenotypic evolution in a phylogenetic framework. We found that AS patterns have evolved under weak stabilizing selection that allows widespread variability in AS patterns across species, with only about 5% of genes probably encoding AS isoforms with different functions. Rates of neutral evolution are high, preventing the identification of adaptive changes at this long evolutionary scale.
In summary, this thesis provides new computational tools and knowledge about the evolution and regulation of AS in different biological conditions and helps to better understand its relevance from different perspectives.

    Statistical methods for biological sequence analysis for DNA binding motifs and protein contacts

    Over the last decades, a revolution in novel measurement techniques has permeated the biological sciences, filling the databases with unprecedented amounts of data ranging from genomics, transcriptomics, proteomics and metabolomics to structural and ecological data. To extract insights from this vast quantity of data, computational and statistical methods are now crucial tools in the toolbox of every biological researcher. In this thesis I summarize my contributions to two data-rich fields in the biological sciences: transcription factor binding to DNA and protein structure prediction from protein sequences with shared evolutionary ancestry. In the first part of my thesis I introduce our work towards a web server for analysing transcription factor binding data with Bayesian Markov models. In contrast to classical PWM or di-nucleotide models, Bayesian Markov models can capture complex inter-nucleotide dependencies that can arise from shape readout and alternative binding modes. In addition to giving access to our methods through an easy-to-use, intuitive web interface, we provide our users with novel tools and visualizations to better evaluate the biological relevance of the inferred binding motifs. We hope that our tools will prove useful for investigating weak and complex transcription factor binding motifs that cannot be predicted accurately with existing tools. The second part discusses a statistical attempt to correct for the phylogenetic bias arising in co-evolution methods applied to the contact prediction problem. Co-evolution methods revolutionized the protein structure prediction field more than 10 years ago and, until very recently, retained their importance as crucial input features to deep neural networks.
As the co-evolution information is extracted from evolutionarily related sequences, we investigated whether the phylogenetic bias in the signal can be corrected in a principled way, using a variation of Felsenstein's tree-pruning algorithm in combination with an independent-pair assumption to derive pairwise amino acid counts that are corrected for the evolutionary history. Unfortunately, the contact prediction derived from our corrected pairwise amino acid counts did not yield competitive performance. 2021-09-2

    Inverse Problems in data-driven multi-scale Systems Medicine: application to cancer physiology

    Systems Medicine is an interdisciplinary framework involving reciprocal feedback between clinical investigation and mathematical modeling/analysis. Its aim is to improve the understanding of complex diseases by integrating knowledge and data across multiple levels of biological organization. This Thesis focuses on three inverse problems, arising from three kinds of data and related to cancer physiology, at different scales: tissues, cells, molecules. The general assumption of this research is that cancer is associated with pathological glucose consumption and that its functional behavior can therefore be assessed by nuclear medicine experiments using [18F]-fluorodeoxyglucose (FDG) as a radioactive tracer mimicking the properties of glucose. At tissue-scale, this Thesis considers the Positron Emission Tomography (PET) imaging technique, and deals with two distinct issues within compartmental analysis. First, this Thesis presents a compartmental approach, referred to as the reference tissue model, for the estimation of FDG kinetics inside cancer tissues when the arterial blood input of the system is unknown. Then, this Thesis proposes an efficient and reliable method for recovering the compartmental kinetic parameters for each PET image pixel in the context of parametric imaging, exploiting information on the tissue physiology. Standard models in compartmental analysis assume that phosphorylation and dephosphorylation of FDG occur in the same intracellular cytosolic volume. Advances in cell biochemistry have shown that the appropriate location of dephosphorylation is the endoplasmic reticulum (ER). Therefore, at cell-scale, this Thesis formalizes a biochemically-driven compartmental model accounting for the specific role played by the ER, and applies it to the analysis of in vitro experiments on FDG uptake by cancer cell cultures obtained with a LigandTracer (LT) device.
Finally, at molecule-scale, this Thesis provides a preliminary mathematical investigation of a chemical reaction network (CRN), represented by a huge Molecular Interaction Map (MIM), describing the biochemical interactions occurring between signaling proteins in specific pathways within a cancer cell. The main issue addressed in this case is the network parameterization problem, i.e. how to determine the reaction rate coefficients from protein concentration data.

    Multimodal Biomedical Data Visualization: Enhancing Network, Clinical, and Image Data Depiction

    In this dissertation, we present visual analytics tools for several biomedical applications. Our research spans three types of biomedical data: reaction networks, longitudinal multidimensional clinical data, and biomedical images. For each data type, we present intuitive visual representations and efficient data exploration methods to facilitate visual knowledge discovery. Rule-based simulation has been used for studying complex protein interactions. In a rule-based model, the relationships of interacting proteins can be represented as a network. Nevertheless, understanding and validating the intended behaviors in large network models is ineffective and error-prone. We have developed a tool that first shows a network overview with concise visual representations and then shows relevant rule-specific details on demand. This strategy significantly improves visualization comprehensibility and disentangles the complex protein-protein relationships by showing them selectively alongside the global context of the network. Next, we present a tool for analyzing longitudinal multidimensional clinical datasets, which we developed for understanding Parkinson's disease progression. Detecting patterns involving multiple time-varying variables is especially challenging for clinical data. Conventional computational techniques, such as cluster analysis and dimension reduction, do not always generate interpretable, actionable results. Using our tool, users can select and compare patient subgroups by filtering patients with multiple symptoms simultaneously and interactively. Unlike conventional visualizations that use local features, many targets in biomedical images are characterized by high-level features. We present our research characterizing such high-level features through multiscale texture segmentation and deep-learning strategies.
First, we present an efficient hierarchical texture segmentation approach that scales well to gigapixel images and is used to colorize electron microscopy (EM) images. This enhances the visual comprehensibility of gigapixel EM images across a wide range of scales. Second, we use convolutional neural networks (CNNs) to automatically derive high-level features that distinguish cell states in live-cell imagery and voxel types in 3D EM volumes. In addition, we present a CNN-based 3D segmentation method for biomedical volume datasets with limited training samples. We use factorized convolutions and feature-level augmentations to improve model generalization and avoid overfitting.

    Modularity-based approaches to community detection in multilayer networks with applications toward precision medicine

    Networks have become an important tool for the analysis of complex systems across many different disciplines including computer science, biology, chemistry, social sciences, and, importantly, cancer medicine. Networks in the real world typically exhibit many forms of higher-order organization. The subfield of network analysis known as community detection aims to provide tools for discovering and interpreting the global structure of a network based on the connectivity patterns of its edges. In this thesis, we provide an overview of the methods for community detection in networks with an emphasis on modularity-based approaches. We discuss several caveats and drawbacks of currently available methods. We also review the success that network analyses have had in interpreting large-scale 'omics' data in the context of cancer biology. In the second and third chapters, we present CHAMP and multimodbp, two community detection tools that seek to overcome several of the deficiencies in modularity-based community detection. In the final chapter, we develop a network-based significance test for addressing an important question in the field of oncology: are mutations in DNA damage repair genes associated with elevated levels of tumor mutational burden (TMB)? We apply the tools of network analysis to this question and showcase how this approach yields new insight into the structure of the problem, revealing what we call the TMB Paradox. We close by demonstrating the clinical utility of our findings in predicting patient response to novel immunotherapies. Doctor of Philosophy

    A Review of the Role of Causality in Developing Trustworthy AI Systems

    State-of-the-art AI models largely lack an understanding of the cause-effect relationships that govern human understanding of the real world. Consequently, these models do not generalize to unseen data, often produce unfair results, and are difficult to interpret. This has led to efforts to improve the trustworthiness of AI models. Recently, causal modeling and inference methods have emerged as powerful tools. This review aims to provide the reader with an overview of causal methods that have been developed to improve the trustworthiness of AI models. We hope that our contribution will motivate future research on causality-based solutions for trustworthy AI. Comment: 55 pages, 8 figures. Under review