
    Reliable transfer of transcriptional gene regulatory networks between taxonomically related organisms

    Baumbach J, Rahmann S, Tauch A. Reliable transfer of transcriptional gene regulatory networks between taxonomically related organisms. BMC Systems Biology. 2009;3(1):8. Background: Transcriptional regulation of gene activity is essential for any living organism. Transcription factors recognize specific binding sites within the DNA to regulate the expression of particular target genes. The genome-scale reconstruction of the emerging regulatory networks is important for biotechnology and human medicine but is cost-intensive, time-consuming, and impossible to perform for every species separately. Using bioinformatics methods, one can partially transfer networks from well-studied model organisms to closely related species. However, the prediction quality is limited by the low level of evolutionary conservation of the transcription factor binding sites, even within organisms of the same genus. Results: Here we present an integrated bioinformatics workflow that assures the reliability of transferred gene regulatory networks. Our approach combines three methods that can be applied on a large scale: re-assessment of annotated binding sites, subsequent binding site prediction, and homology detection. A gene regulatory interaction is considered to be conserved if (1) the transcription factor, (2) the adjusted binding site, and (3) the target gene are conserved. The power of the approach is demonstrated by transferring gene regulations from the model organism Corynebacterium glutamicum to the human pathogens C. diphtheriae and C. jeikeium and to the biotechnologically relevant C. efficiens. For these three organisms we identified reliable transcriptional regulations for approximately 40% of the common transcription factors, compared to approximately 5% for which knowledge was available before.
    Conclusion: Our results suggest that trustworthy genome-scale transfer of gene regulatory networks between organisms is feasible in general but still limited by the level of evolutionary conservation.
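The three-part conservation criterion in the abstract above can be illustrated with a minimal sketch. It is not the authors' workflow: the function name, the data layout, and all gene identifiers are hypothetical, and homology detection and binding site prediction are assumed to have been run beforehand, with their results supplied as plain lookups.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Interaction:
    """A regulatory interaction: a transcription factor regulating a target gene."""
    tf: str
    target: str


def transfer_interactions(source_interactions, orthologs, predicted_sites):
    """Keep only interactions that satisfy all three conservation criteria.

    orthologs       -- hypothetical map: source gene -> homolog in the target organism
    predicted_sites -- hypothetical set of (tf, target) pairs in the target organism
                       for which a binding site was predicted upstream of the target
    """
    transferred = []
    for interaction in source_interactions:
        tf_homolog = orthologs.get(interaction.tf)          # criterion (1): TF conserved
        target_homolog = orthologs.get(interaction.target)  # criterion (3): target gene conserved
        if tf_homolog is None or target_homolog is None:
            continue
        # criterion (2): binding site conserved in the target organism
        if (tf_homolog, target_homolog) in predicted_sites:
            transferred.append(Interaction(tf_homolog, target_homolog))
    return transferred


# Toy data with made-up identifiers, for illustration only.
source = [
    Interaction("tfA", "gene1"),  # fully conserved
    Interaction("tfA", "gene2"),  # no predicted binding site in the target organism
    Interaction("tfB", "gene3"),  # transcription factor has no homolog
]
orthologs = {"tfA": "tfA_t", "gene1": "gene1_t", "gene2": "gene2_t", "gene3": "gene3_t"}
predicted_sites = {("tfA_t", "gene1_t")}

result = transfer_interactions(source, orthologs, predicted_sites)
print(result)
```

Requiring all three conditions at once trades coverage for reliability: an interaction is dropped as soon as any single piece of evidence is missing.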

    On the power and limits of evolutionary conservation—unraveling bacterial gene regulatory networks

    The National Center for Biotechnology Information (NCBI) recently announced ‘1000 prokaryotic genomes are now completed and available in the Genome database’. This increasing trend will provide us with thousands of sequenced microbial organisms over the coming years. However, this is only the first step in understanding how cells survive, reproduce and adapt their behavior while being exposed to changing environmental conditions. One major control mechanism is transcriptional gene regulation. What is striking here is the contrast between the handful of bacterial model organisms and the 1000 prokaryotic genomes. Next-generation sequencing technologies will further widen this gap drastically. However, several computational approaches have proven to be helpful. The main idea is to use the known transcriptional regulatory network of reference organisms as a template in order to unravel evolutionarily conserved gene regulations in newly sequenced species. This transfer essentially depends on the reliable identification of several types of conserved DNA sequences. We decompose this problem into three computational processes, review the state of the art and illustrate future perspectives.

    Selected Works in Bioinformatics

    This book consists of nine chapters covering a variety of bioinformatics subjects, ranging from database resources for protein allergens, unravelling genetic determinants of complex disorders, characterization and prediction of regulatory motifs, computational methods for identifying the best classifiers and key disease genes in large-scale transcriptomic and proteomic experiments, and functional characterization of inherently unfolded proteins/regions, to protein interaction networks and flexible protein-protein docking. The computational algorithms are in general presented in a way that is accessible to advanced undergraduate students, graduate students and researchers in molecular biology and genetics. The book should also serve as a stepping stone for mathematicians, biostatisticians, and computational scientists to cross their academic boundaries into the dynamic and ever-expanding field of bioinformatics.

    X-exome sequencing of 405 unresolved families identifies seven novel intellectual disability genes

    X-linked intellectual disability (XLID) is a clinically and genetically heterogeneous disorder. During the past two decades, in excess of 100 X-chromosome ID genes have been identified. Yet a large number of families mapping to the X-chromosome remained unresolved, suggesting that more XLID genes or loci are yet to be identified. Here, we have investigated 405 unresolved families with XLID. We employed massively parallel sequencing of all X-chromosome exons in the index males. The majority of these males had previously tested negative for copy number variations and for mutations in a subset of known XLID genes by Sanger sequencing. In total, 745 X-chromosomal genes were screened. After stringent filtering, a total of 1297 non-recurrent exonic variants remained for prioritization. Co-segregation analysis of potential clinically relevant changes revealed that 80 families (20%) carried pathogenic variants in established XLID genes. In 19 families, we detected likely causative protein truncating and missense variants in 7 novel and validated XLID genes (CLCN4, CNKSR2, FRMPD4, KLHL15, LAS1L, RLIM and USP27X) and potentially deleterious variants in 2 novel candidate XLID genes (CDK16 and TAF1). We show that the CLCN4 and CNKSR2 variants impair protein functions as indicated by electrophysiological studies and altered differentiation of cultured primary neurons from Clcn4−/− mice or after mRNA knock-down. The newly identified and candidate XLID proteins belong to pathways and networks with established roles in cognitive function and intellectual disability in particular. We suggest that systematic sequencing of all X-chromosomal genes in a cohort of patients with genetic evidence for X-chromosome locus involvement may resolve up to 58% of Fragile X-negative cases.

    Active transitivity clustering of large-scale biomedical datasets

    Clustering is a popular computational approach for partitioning data sets into groups of objects that share common traits. Due to recent advances in wet-lab technology, the amount of available biological data grows exponentially and increasingly poses problems in terms of computational complexity for current clustering approaches. In this thesis, we introduce two novel approaches, TransClustMV and ActiveTransClust, that enable the handling of large-scale datasets by drastically reducing the amount of required information through the exploitation of missing values. Furthermore, there exists a plethora of different clustering tools and standards, making it very difficult for researchers to choose the correct methods for a given problem. In order to clarify this multifarious field, we developed ClustEval, which streamlines the clustering process and enables practitioners to conduct large-scale cluster analyses in a standardized and bias-free manner. We conclude the thesis by demonstrating the power of clustering tools and the need for the previously developed methods by conducting real-world analyses. We transferred the regulatory network of E. coli K-12 to pathogenic EHEC organisms based on evolutionary conservation, thereby avoiding tedious and potentially dangerous wet-lab experiments. In another example, we identify pathogenicity-specific core genomes of actinobacteria in order to identify potential drug targets.

    High resolution DNA copy number analysis of constitutional chromosomal aberrations in human genomic disorders

    About one to three percent of the human population is afflicted by mild to severe mental retardation, often in association with congenital abnormalities (MR/CA). These abnormalities in normal human morphogenesis may express themselves as subtle dysmorphic signs not causing any harm or present as severe disabling and life-threatening malformations such as congenital heart defects. It is well established that constitutional chromosomal aberrations are an important cause of MR/CA. The screening for such chromosomal rearrangements is done by widely used routine analysis of banded metaphase chromosomes (karyotyping). Given the limited resolution of such analyses (5-10 Mb), it was anticipated that a significant number of submicroscopic deletions or duplications (DNA copy number variations, CNV) were overlooked in patients with idiopathic mental retardation with or without congenital anomalies. This thesis represents one of the first exhaustive studies of this patient group using a new and more sensitive method for detection of CNVs. This technique, termed array comparative genomic hybridization (array CGH), allows genome-wide screening for submicroscopic aberrations in one single experiment. Array CGH uses reporter DNA molecules, more or less evenly spread throughout the entire genome, which are spotted or synthesized in an array on a glass slide. Each reporter is used to interrogate the DNA copy number of a specific genomic region through the competitive hybridization of differentially fluorescently labeled patient and control DNA. Alongside the tedious optimization of the technique, a web-based open source (MySQL) database platform was developed for the analysis and visualization of large amounts of array CGH data (medgen.ugent.be/arrayCGHbase) (paper 6).
    A total of 140 carefully clinically selected patients with mental retardation and/or congenital abnormalities were analyzed for hidden chromosomal aberrations in a collaborative effort with the Center for Medical Genetics Leuven (KUL). This initial study, together with a review of other published investigations, made it possible for the first time to establish a reliable figure for the number of submicroscopic CNVs in this patient population. When excluding patients with subtelomeric imbalances which could be identified through FISH or MLPA analyses, array CGH still allowed the detection of CNVs in an additional ~8% of patients (paper 2). A major challenge resulting from this new flow of information is the search for and description of new microdeletion/microduplication syndromes. Although most CNVs seemed to be scattered across the entire genome, we were able to describe a new microdeletion syndrome characterized by osteopoikilosis, mental retardation and short stature. This observation was facilitated by the identification of LEMD3 as the causal gene for osteopoikilosis, Buschke-Ollendorff syndrome (BOS) and melorheostosis in the 12q14.3 deleted interval and by the subsequent finding of two additional patients with a 12q14.3 microdeletion (paper 3). The present work also illustrates the possible contribution of array CGH to the delineation of the critical region for recurrent deletion syndromes. In this study we identified a small interstitial deletion on chromosome 18q12.3 in a patient with clinical features of the del(18)(q12.1q21.1) syndrome. We were able to narrow the critical region for this syndrome to an interval of 1.8 Mb, thereby enabling the determination of the crucial genes for this microdeletion syndrome (paper 4). This thesis also further illustrates the power of combined flow cytometry and array CGH for rapid identification of translocation breakpoints.
    Using this approach we were able to identify OPHN1 as the causal gene for the observed mental retardation and overgrowth in a girl with an apparently balanced t(X;9) translocation (paper 5). In conclusion, the presented work clearly illustrates several important applications of array CGH in the field of clinical cytogenetics. The use of this new and powerful methodology will greatly improve the diagnostic yield in patients with unexplained mental retardation, provide more insights into genotype-phenotype correlations and ultimately lead to the identification of the causal genes. Functional studies of these gene products will enhance our understanding of the genetic regulation of normal human morphogenesis, embryogenesis and brain functioning. Finally, it is my belief that implementation of array CGH will represent a major and perhaps last wave of innovation in cytogenetics, as the latter may become largely redundant. Ultimately, and perhaps earlier than we can anticipate, sequencing of the whole genome of a patient may emerge as the method of choice.

    Application of massively parallel sequencing for the diagnosis of developmental and epileptic encephalopathies.

    Developmental and Epileptic Encephalopathies (DEE) are characterised by severe early-onset seizures and have poor developmental outcomes, significant co-morbidities, premature mortality and substantial psychosocial and economic impacts on families and society. Most DEE have a Mendelian genetic cause. Clarifying the underlying molecular diagnosis can permit targeted therapeutics and accurate genetic counselling. Massively parallel sequencing (MPS) has the potential to be a transformative technology for diagnostic evaluation of DEE. This study set out to (i) evaluate the diagnostic yield, cost-effectiveness and clinical utility of MPS panels and exome sequencing (ES), to guide clinical utilization, (ii) explore the potential of the more expensive but more comprehensive whole genome sequencing (WGS) to improve diagnostic yield for DEE, and (iii) delineate novel genetic causes of DEE. Using a trio ES approach, we first analysed a cohort of 30 children with DEE with negative comprehensive first-tier testing. The diagnostic yield was 47% (14/30) and ES was demonstrated to be cost-effective when integrated into a diagnostic pathway for DEE. Next, we evaluated the role of WGS in diagnosis of DEE by analysis of 15 children remaining undiagnosed after the ES study and 15 additional children undiagnosed after MPS panel. In the 15 ES-negative patients, 8 additional diagnoses were made (cumulative diagnostic rate 73%), including 3 patients with complex or copy-neutral structural variants. A diagnosis was confirmed or suspected in 11 of the 15 panel-negative patients (73%), with several additional novel findings under evaluation. Most (n=10) were in genes excluded from the panel, and 1 variant was missed by the panel for technical reasons. This work led to the identification and clinical characterisation of 4 novel causes of DEE: variants in KCNT2, ARV1, PUM1 and ATN1.
    Our study demonstrates that ES or MPS gene panels are cost-effective for the diagnosis of DEE and that WGS further increases diagnostic yield. The importance of reanalysis and collaborative multidisciplinary evaluation of novel findings is demonstrated. Approaches to further optimise diagnostic yield and ways that genomic diagnoses can be leveraged to allow a targeted therapeutic and management approach are discussed.
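The percentages quoted in the abstract above can be checked with a few lines of arithmetic. This is a sketch under one stated assumption: the cumulative rate of 73% is taken to be the ES diagnoses plus the additional WGS diagnoses over the original cohort of 30, which reproduces the reported figure but is not spelled out in the abstract. The helper name is hypothetical.

```python
def yield_percent(diagnosed: int, cohort: int) -> int:
    """Diagnostic yield as a whole-number percentage."""
    return round(100 * diagnosed / cohort)


# Counts taken from the abstract.
es_yield = yield_percent(14, 30)              # trio exome sequencing, 14 of 30
# Assumption: cumulative rate = (ES diagnoses + WGS diagnoses) / original ES cohort.
cumulative_yield = yield_percent(14 + 8, 30)
panel_negative_yield = yield_percent(11, 15)  # WGS in panel-negative patients

print(es_yield, cumulative_yield, panel_negative_yield)
```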