15 research outputs found

    Functional profiling of genome-scale experiments: new approaches leading to a systemic analysis

    Unpublished doctoral thesis. Universidad Autónoma de Madrid, Facultad de Ciencias, Departamento de Biología Molecular. Date of defense: 31-10-200

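    Computational methods for the analysis and localization of conserved gene regulatory elements in DNA (Laskennallisia menetelmiä säilyneiden geenisäätelyelementtien analyysiin ja paikallistamiseen DNA:sta)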

    This thesis presents methods for locating and analyzing cis-regulatory DNA elements involved in the regulation of gene expression in multicellular organisms. The regulation of gene expression is carried out by the combined effort of several transcription factor proteins collectively binding the DNA on the cis-regulatory elements. Only sparse knowledge of the 'genetic code' of these elements exists today. An automatic tool for discovery of putative cis-regulatory elements could help their experimental analysis, which would result in a more detailed view of the cis-regulatory element structure and function. We have developed a computational model for the evolutionary conservation of cis-regulatory elements. The elements are modeled as evolutionarily conserved clusters of sequence-specific transcription factor binding sites. We give an efficient dynamic programming algorithm that locates the putative cis-regulatory elements and scores them according to the conservation model. A notable proportion of the high-scoring DNA sequences show transcriptional enhancer activity in transgenic mouse embryos. The conservation model includes four parameters whose optimal values are estimated with simulated annealing. With good parameter values the model discriminates well between the DNA sequences with evolutionarily conserved cis-regulatory elements and the DNA sequences that have evolved neutrally. In further inquiry, the set of highest-scoring putative cis-regulatory elements was found to be sensitive to small variations in the parameter values. The statistical significance of the putative cis-regulatory elements is estimated with the Two Component Extreme Value Distribution. The p-values grade the conservation of the cis-regulatory elements above the neutral expectation. The parameter values for the distribution are estimated by simulating the neutral DNA evolution. 
The conservation of the transcription factor binding sites can be used in the upstream analysis of regulatory interactions. This approach may provide mechanistic insight into transcription-level data from, e.g., microarray experiments. Here we give a method to predict shared transcriptional regulators for a set of co-expressed genes. The EEL (Enhancer Element Locator) software implements the method for locating putative cis-regulatory elements. The software facilitates both interactive use and distributed batch processing. We have used it to analyze the non-coding regions around all human genes with respect to the orthologous regions in various other species including mouse. The data from these genome-wide analyses are stored in a relational database which is used in the publicly available web services for upstream analysis and visualization of the putative cis-regulatory elements in the human genome.
When the human genome was sequenced at the turn of the millennium, that is, when the human genes had been found and catalogued, scientists were surprised by how few genes humans have. Humans were found to have only slightly more genes than a simple nematode. Because the number of genes could not explain the outward differences between humans and nematodes, the explanation was sought in differences in how the genes function. In multicellular organisms, gene activity is regulated very precisely to a particular place and a particular part of the body. Certain proteins carry out this regulation by binding to specific sites on the DNA in the vicinity of the regulated gene. Finding these DNA binding sites in the genome is experimentally very challenging: they may lie very far from the regulated gene, and the binding rules of the proteins are not yet well understood. This thesis develops computational methods for locating DNA binding sites involved in gene regulation by comparing the genomes of different mammals. 
For example, by comparing the human and mouse genomes one can locate stretches of DNA that were present in the last common ancestor of mice and humans about 65 million years ago and that also appear to be possible protein binding sites. Thousands of such candidate DNA binding sites have been found in the human genome, and some of them have been shown experimentally to regulate a nearby gene. For the analysis of binding sites, the thesis developed a method that predicts regulatory proteins for sets of genes. Modern high-throughput screening methods quickly find sets of genes that share some property of interest whose regulation one wants to understand. With the developed method one can easily predict, for example, the regulator of genes associated with a particular disease. Once a candidate regulatory protein is known, a drug can be developed against it. The results provide new methods especially for the analysis of genes regulated during development, which are difficult to study. Medical applications of the methods relate, for example, to tissue-specific growth regulation and the tumor specificity of cancer genes. These applications aim to explain, among other things, why humans have disproportionately large brains and small muscles, and to open new approaches to, for example, cancer diagnostics and treatment.

    Services for biological network feature detection

    The complex environment of a living cell contains many molecules interacting in a variety of ways. Examples include the physical interaction between two proteins, or the biochemical interaction between an enzyme and its substrate. A challenge of systems biology is to understand the network of interactions between biological molecules, derived experimentally or computationally. Sophisticated dynamic modelling approaches provide detailed knowledge about single processes or individual pathways. However, such methods are far less tractable for holistic cellular models, which are instead represented at the level of network topology. Current network analysis packages tend to be standalone desktop tools which rely on local resources and whose operations are not easily integrated with other software and databases. A key contribution of this thesis is an extensible toolkit of biological network construction and analysis operations, developed as web services. Web services are a distributed technology that enables machine-to-machine interaction over a network and promotes interoperability by allowing tools deployed on heterogeneous systems to interface. A conceptual framework has been created, which is realised practically through the proposal of a common graph format to standardise network data, and the investigation of open-source deployment technologies. A workflow is a graph of web services, allowing analyses to be carried out as part of a bigger software pipeline. Workflows may be constructed using web services within the toolkit together with those from other providers, and can be saved, shared and reused, allowing biologists to construct their own complex queries over various tools and datasets, or execute pre-constructed workflows designed by expert bioinformaticians. This approach has produced biologically relevant results. 
One hypothesis concerns the regulation of yeast glycolysis by a protein found to interact with seven glycolytic enzymes. This implies a potentially novel regulatory mechanism whereby the protein in question binds these enzymes to form an 'energy production unit'. Also of interest are workflows which identify termini (system inputs and outputs) and cycles, which are crucial for acquiring a physiological perspective on network behaviour.
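The abstract does not say how its workflows detect termini and cycles. As a rough illustration only, the sketch below treats a network as a list of directed edges, reports nodes with no incoming edges as inputs and nodes with no outgoing edges as outputs, and finds one directed cycle by depth-first search. The function names `find_termini` and `find_cycle` and the toy edge list are hypothetical and are not taken from the thesis or its toolkit.

```python
from collections import defaultdict


def find_termini(edges):
    """Return (inputs, outputs): nodes without incoming / outgoing edges."""
    nodes, has_in, has_out = set(), set(), set()
    for src, dst in edges:
        nodes.update((src, dst))
        has_out.add(src)
        has_in.add(dst)
    return sorted(nodes - has_in), sorted(nodes - has_out)


def find_cycle(edges):
    """Return one directed cycle as a list of nodes, or None if acyclic."""
    graph = defaultdict(list)
    for src, dst in edges:
        graph[src].append(dst)

    WHITE, GREY, BLACK = 0, 1, 2      # unvisited, on current path, finished
    colour = defaultdict(int)         # every node starts WHITE
    path = []                         # nodes on the current DFS path

    def visit(node):
        colour[node] = GREY
        path.append(node)
        for nxt in graph[node]:
            if colour[nxt] == GREY:   # back edge: nxt is already on the path
                return path[path.index(nxt):]
            if colour[nxt] == WHITE:
                found = visit(nxt)
                if found:
                    return found
        path.pop()
        colour[node] = BLACK
        return None

    for node in list(graph):          # copy keys: visit() may add new ones
        if colour[node] == WHITE:
            found = visit(node)
            if found:
                return found
    return None


# Toy network (hypothetical): A feeds a B <-> C loop that drains into D.
toy_edges = [("A", "B"), ("B", "C"), ("C", "B"), ("C", "D")]
inputs, outputs = find_termini(toy_edges)
cycle = find_cycle(toy_edges)
```

The recursive search is fine for small examples; a large network would need an iterative traversal to stay within Python's recursion limit.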

    27th Fungal Genetics Conference

    Program and abstracts from the 27th Fungal Genetics Conference Asilomar, March 12-17, 2013

    Genetic mapping of metabolic biomarkers of cardiometabolic diseases

    Cardiometabolic disorders (CMDs) are a major public health problem worldwide. The main goal of this thesis is to characterize the genetic architecture of CMD-related metabolites in a Lebanese cohort. In order to maximise the extraction of meaningful biological information from this dataset, an important part of this thesis focuses on the evaluation and subsequent improvement of the standard methods currently used for molecular epidemiology studies. First, I describe MetaboSignal, a novel network-based approach to explore the genetic regulation of the metabolome. Second, I comprehensively compare the recovery of metabolic information in the different 1H NMR strategies routinely used for metabolic profiling of plasma (standard 1D, spin-echo and JRES). Third, I describe a new method for dimensionality reduction of 1H NMR datasets prior to statistical modelling. Finally, I use all this methodological knowledge to search for molecular biomarkers of CMDs in a Lebanese population. Metabolome-wide association analyses identified a number of metabolites associated with CMDs, as well as several associations involving N-glycan units from acute-phase glycoproteins. Genetic mapping of these metabolites validated previously reported gene-metabolite associations, and revealed two novel loci associated with CMD-related metabolites. Collectively, this work contributes to the ongoing efforts to characterize the molecular mechanisms underlying complex human diseases.

    Improved imbalanced classification through convex space learning

    Imbalanced datasets for classification problems, characterised by an unequal distribution of samples across classes, are abundant in practical scenarios. Oversampling algorithms generate synthetic data to improve classification performance on such datasets. In this thesis, I discuss two algorithms, LoRAS and ProWRAS, that improve on the state of the art, as shown through rigorous benchmarking on publicly available datasets. A biological application, the detection of rare cell types from single-cell transcriptomics data, is also discussed. The thesis also provides a better theoretical understanding of oversampling.
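The title refers to convex space learning, but the abstract gives no algorithmic detail. The sketch below is therefore not LoRAS or ProWRAS. It only illustrates the general idea of oversampling by convex combination: each synthetic point is a random weighted average of a few minority-class points, so it lies inside their convex hull. The function name `convex_oversample` and the parameters `k` and `seed` are assumptions made for this example.

```python
import random


def convex_oversample(minority, n_new, k=3, seed=0):
    """Generate n_new synthetic points, each a random convex combination
    of k minority-class points (every point is a list of floats)."""
    rng = random.Random(seed)
    dim = len(minority[0])
    synthetic = []
    for _ in range(n_new):
        chosen = rng.sample(minority, k)      # k distinct parent points
        # Strictly positive raw weights, normalised so they sum to 1.
        raw = [1.0 - rng.random() for _ in range(k)]
        total = sum(raw)
        weights = [w / total for w in raw]
        point = [sum(w * p[d] for w, p in zip(weights, chosen))
                 for d in range(dim)]
        synthetic.append(point)
    return synthetic


# Toy minority class (hypothetical): the four corners of the unit square.
minority_points = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
new_points = convex_oversample(minority_points, n_new=5)
```

Because the weights are non-negative and sum to one, every generated point stays within the region spanned by its parents. `rng.sample` raises `ValueError` if `k` exceeds the number of minority points.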

    Identification of novel genes interacting with DVAP, the causative gene of ALS8 in humans

    Amyotrophic lateral sclerosis (ALS) is a major neurodegenerative disease caused by the death of motor neurons, leading to paralysis. The mechanisms underlying the pathogenesis of the disease remain unknown, but the identification of causative genes in ALS patients has linked some processes to the disease. One of these genes is VAPB, which encodes a highly conserved protein involved in lipid transfer, vesicle metabolism and synaptic morphology. We modeled the disease-linked P56S mutation in Drosophila (DVAP-P58S) and observed that expression of this allele causes neurodegeneration in the eye and loss of motor performance. These phenotypes provide an excellent opportunity to use fly genetics to find novel genetic interactors of DVAP and to understand the ALS pathomechanism. We therefore carried out a large-scale genetic screen by crossing the ALS model with a collection of P-element overexpression lines. After the analysis of 1183 lines, we obtained 71 modifier lines that suppress DVAP-induced neurodegeneration and 14 lines that enhance this phenotype, further decreasing the eye size and viability of the offspring. To confirm that the effect of each modifier line was caused by a specific gene, we validated the lines with independent alleles of those genes. Using different sources, we were able to confirm the effect of 63 of the 85 modifiers. When we studied the effect of the modifier genes co-expressed with DVAP-P58S in the nervous system, 46 lines showed the same modifying effect on adult viability and 58 on the motor performance of the adult offspring. Considering the stronger readouts, we obtained 42 genes as novel high-confidence DVAP genetic interactors. To further understand how they affect DVAP neurodegeneration, we carried out a series of bioinformatic analyses using Drosophila and human databases. 
Lipid droplets, vesicle metabolism and cell proliferation appear as the most important categories found in the screen, and all of these processes are conserved when analysed with human orthologs of the modifiers. Further characterisation of the endocytosis-linked modifier Rab5 and the predicted DVAP interactors Rab7 and Rab11 showed that the suppression effect is not only confirmed in vivo but is also conserved in human tissue from ALS patients. These data validate our genetic screen and at the same time open novel opportunities to understand ALS mechanisms and find possible therapeutic targets.

    Readings in Targeted Maximum Likelihood Estimation

    This is a compilation of current and past work on targeted maximum likelihood estimation. It features the original targeted maximum likelihood learning paper as well as chapters on super (machine) learning using cross validation, randomized controlled trials, realistic individualized treatment rules in observational studies, biomarker discovery, case-control studies, and time-to-event outcomes with censored data, among others. We hope this collection is helpful to the interested reader and stimulates additional research in this important area.