29 research outputs found

    UCNEbase—a database of ultraconserved non-coding elements and genomic regulatory blocks

    UCNEbase (http://ccg.vital-it.ch/UCNEbase) is a free, web-accessible information resource on the evolution and genomic organization of ultra-conserved non-coding elements (UCNEs). It currently covers 4351 such elements in 18 different species. The majority of UCNEs are thought to be transcriptional regulators of key developmental genes. As most of them occur as clusters near potential target genes, the database is organized along two hierarchical levels: individual UCNEs and ultra-conserved genomic regulatory blocks (UGRBs). UCNEbase introduces a coherent nomenclature for UCNEs reflecting their respective associations with likely target genes. Orthologous and paralogous UCNEs share components of their names and are systematically cross-linked. Detailed synteny maps between the human and other genomes are provided for all UGRBs. UCNEbase is managed by a relational database system and can be accessed through a variety of web-based query pages. As it relies on the UCSC genome browser as its visualization platform, a large part of its data content is also available as browser-viewable custom track files. UCNEbase is potentially useful to any computational, experimental or evolutionary biologist interested in conserved non-coding DNA elements in vertebrates.

    An Evolutionary Cancer Epigenetic Approach Revealed DNA Hypermethylation of Ultra-Conserved Non-Coding Elements in Squamous Cell Carcinoma of Different Mammalian Species

    BACKGROUND: Ultra-conserved non-coding elements (UCNEs) are genomic sequences that exhibit > 95% sequence identity between humans, mammals, birds, reptiles, and fish. Recent findings reported their functional role in cancer. The aim of this study was to evaluate the DNA methylation modifications of UCNEs in squamous cell carcinoma (SCC) from different mammal species. METHODS: Fifty SCCs from 26 humans, 17 cats, 3 dogs, 1 horse, 1 bovine, 1 badger, and 1 porcupine were investigated. Fourteen feline stomatitis samples and normal samples from 36 healthy human donors, 7 cats, 5 dogs, 5 horses, 2 bovines and 1 badger were collected as normal controls. Bisulfite next-generation sequencing was used to evaluate the DNA methylation level of seven UCNEs (uc.160, uc.283, uc.416, uc.339, uc.270, uc.299, and uc.328). RESULTS: 57/59 CpGs were significantly different according to the Kruskal-Wallis test (p < 0.05) comparing normal samples with SCC. A common DNA hypermethylation pattern was observed in SCCs from all the species evaluated in this study, with an increasing trend of hypermethylation starting from normal mucosa, through stomatitis to SCC. CONCLUSIONS: Our findings indicate that UCNEs are hypermethylated in human SCC, and this behavior is also conserved among different species of mammals.

    Allele frequencies of variants in Ultra Conserved Elements identify selective pressure on transcription factor binding

    Ultra-conserved genes or elements (UCGs/UCEs) in the human genome are extreme examples of conservation. We characterized natural variation in 2884 UCEs and UCGs in two distinct populations, Singaporean Chinese (n=280) and Italian (n=501), using a pooled-sample, targeted-capture sequencing approach. We identify, with high confidence, an abundance of rare SNVs (MAF<0.5%) in these regions, of which 75% are not present in dbSNP137. UCE association studies for complex human traits can use this information to model the expected background variation and thus the power needed for association studies. By combining our data with 1000 Genomes Project data, we show in three independent datasets that prevalent UCE variants (MAF>5%) are more often found in relatively less-conserved nucleotides within UCEs, compared to rare variants. Moreover, prevalent variants are less likely to overlap transcription factor binding sites. Using SNPfold we found no significant influence of RNA secondary structure on UCE conservation. Altogether, these results suggest UCEs are not under selective pressure as a stretch of DNA but are under differential evolutionary pressure at the single-nucleotide level.

    ANGIOGENES: knowledge database for protein-coding and noncoding RNA genes in endothelial cells

    Increasing evidence indicates that the presence of long noncoding RNAs (lncRNAs) is specific to various cell types. Although lncRNAs are speculated to be more numerous than protein-coding genes, the annotations of lncRNAs remain primitive due to the lack of well-structured schemes for their identification and description. Here, we introduce a new knowledge database "ANGIOGENES" (http://angiogenes.uni-frankfurt.de) to allow for in silico screening of protein-coding genes and lncRNAs expressed in various types of endothelial cells, which are present in all tissues. Using the latest annotations of protein-coding genes and lncRNAs, publicly available RNA-seq data were analyzed to identify transcripts that are expressed in endothelial cells of human, mouse and zebrafish. The analyzed data were incorporated into ANGIOGENES to provide a one-stop shop for transcriptomics data to facilitate further biological validation. ANGIOGENES is an intuitive and easy-to-use database that allows in silico screening of expressed, enriched and/or specific endothelial transcripts under various conditions. We anticipate that ANGIOGENES will serve as a starting point for functional studies to elucidate the roles of protein-coding genes and lncRNAs in angiogenesis.

    Computational genomics of regulatory elements and regulatory territories

    Whole-genome comparison of metazoan genomes reveals an extremely high level of noncoding conservation over tens to hundreds of base pairs across distant species. These sequences are termed conserved noncoding elements (CNEs). Arrays of CNEs span the loci of developmental regulatory genes, and their span defines genomic regulatory blocks (GRBs). CNEs are currently known to be involved in transcriptional regulation and development as long-range enhancers. However, no molecular mechanism can yet explain their exceptional degree of conservation. As a first step towards the genome-wide study of these elements, I developed two R/Bioconductor packages, CNEr and TFBSTools, to detect and analyse regulatory elements. Next, I designed a novel CNE detection pipeline for duplicated regions in the ameiotic Adineta vaga genome. Identification of CNEs in this genome suggests that the principal function of CNEs is regulation of developmental gene expression rather than copy number sensing. In addition, I performed a de novo genome annotation of the European common carp Cyprinus carpio. This genome is an ideal candidate for comparative study with the zebrafish genome. Its analysis revealed a wealth of previously undetected fish regulatory elements and an unexpectedly high level of conservation between the two genomes. Finally, I presented a computational method for the identification of GRB boundaries and prediction of the corresponding target genes under long-range regulation. The predicted target genes are implicated in development, transcriptional regulation and axon guidance. The disruption of regulation of these target genes is likely to cause complex diseases, including cancer. The GRB boundaries and predicted target genes are a valuable resource for investigating developmental regulation and interpreting genome-wide association studies.

    Classification of selectively constrained DNA elements using feature vectors and rule-based classifiers

    Little work has been done on the composition of conserved non-coding elements (CNEs), which are identified by comparisons of two or more genomes and are found in all metazoan genomes. Here we present an analysis of CNEs with a methodology that takes into account word occurrence at various length scales, in the form of a feature vector representation, combined with rule-based classifiers. We apply our approach to both protein-coding exons and CNEs, originating from human, insect (Drosophila melanogaster) and worm (Caenorhabditis elegans) genomes, that are either identified in the present study or obtained from the literature. Alignment-free feature vector representation of sequences combined with rule-based classification methods leads to successful classification of the different CNE classes. Biologically meaningful results are derived by comparison with the genomic signatures approach, and classification rates for a variety of functional elements of the genomes along with surrogates are presented. © 2014 Elsevier Inc. All rights reserved.
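The word-occurrence feature vectors mentioned in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `kmer_feature_vector`, the choice of word lengths 1 to 3, and the per-length frequency normalisation are assumptions made only for illustration.

```python
from collections import Counter
from itertools import product


def kmer_feature_vector(seq, ks=(1, 2, 3), alphabet="ACGT"):
    """Concatenate word frequencies for each word length k in ks.

    Hypothetical helper: for every k, each possible word over the alphabet
    gets one slot holding its relative frequency in seq. Windows that
    contain other characters (for example N) are ignored.
    """
    seq = seq.upper()
    features = []
    for k in ks:
        # Count every overlapping window of length k.
        counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
        # Fixed word order (A, C, G, T, AA, AC, ...) keeps vectors comparable.
        words = ["".join(p) for p in product(alphabet, repeat=k)]
        total = sum(counts[w] for w in words)
        features.extend(counts[w] / total if total else 0.0 for w in words)
    return features


if __name__ == "__main__":
    vec = kmer_feature_vector("ACGTTGCAACGT")
    print(len(vec))  # 4 + 16 + 64 slots for k = 1, 2, 3
```

Vectors of this kind have a fixed length regardless of sequence length, which is what lets an alignment-free classifier compare sequences of different sizes.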


    In this dissertation, we worked on several algorithmic problems in bioinformatics using mainly three approaches: (a) a streaming model, (b) suffix-tree based indexing, and (c) minwise hashing (minhash) and locality-sensitive hashing (LSH). Streaming models are useful for large data problems where a good approximation needs to be achieved with limited space usage. We developed an approximation algorithm (Kmer-Estimate) using the streaming approach to obtain a better estimate of k-mer frequency counts. A k-mer, a subsequence of length k, plays an important role in many bioinformatics analyses such as genome distance estimation. We also developed new methods that use the suffix tree, a trie data structure, for alignment-free, non-pairwise algorithms for the conserved non-coding sequence (CNS) identification problem. We provided two different algorithms: STAG-CNS to identify exact-matched CNSs and DiCE to identify CNSs with mismatches. Using our algorithms, CNSs among various grass species were identified. A different approach was employed for identification of longer CNSs ( 100 bp, mostly found in animals). In our new method (MinCNE), the minhash approach was used to estimate the Jaccard similarity. Using LSH as well, k-mers extracted from genomic sequences were clustered and CNSs were identified. Another new algorithm (MinIsoClust) that also uses minhash and LSH techniques was developed for an isoform clustering problem. Isoforms are generated from the same gene by alternative splicing. As the isoform sequences share some exons but in different combinations, regular sequence clustering methods do not work well. Our algorithm generates clusters for isoform sequences based on their shared minhash signatures. Finally, we discuss de novo transcriptome assembly algorithms and how to improve the assembly accuracy using ensemble approaches. First, we did a comprehensive performance analysis of different transcriptome assemblers using simulated benchmark datasets. Then, we developed a new ensemble approach (Minsemble) for the de novo transcriptome assembly problem that integrates isoform clustering using the minhash technique to identify potentially correct transcripts from various de novo transcriptome assemblers. Minsemble identified more correctly assembled transcripts as well as genes compared to other de novo and ensemble methods. Adviser: Jitender S. Deogu
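As an illustration of the minhash idea referred to in this abstract, the following minimal sketch estimates the Jaccard similarity between the k-mer sets of two sequences. It is a generic textbook-style minhash, not the MinCNE or MinIsoClust code; the function names, the use of seeded BLAKE2b hashes, and the default of 128 hash functions are all assumptions.

```python
import hashlib


def kmers(seq, k):
    """Return the set of all overlapping length-k substrings of seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def _hash(kmer, seed):
    # One seeded 64-bit hash per (seed, k-mer) pair; the seed stands in
    # for an independent hash function (an assumption of this sketch).
    data = f"{seed}:{kmer}".encode()
    return int.from_bytes(hashlib.blake2b(data, digest_size=8).digest(), "big")


def minhash_signature(kmer_set, num_hashes=128):
    """Signature = minimum hash value of the set under each hash function.

    The set must be non-empty, i.e. the sequence must be at least k long.
    """
    return [min(_hash(km, seed) for km in kmer_set) for seed in range(num_hashes)]


def estimate_jaccard(sig_a, sig_b):
    """Fraction of signature positions that agree estimates |A & B| / |A | B|."""
    matches = sum(a == b for a, b in zip(sig_a, sig_b))
    return matches / len(sig_a)


def exact_jaccard(set_a, set_b):
    """Exact Jaccard similarity, for comparison with the estimate."""
    return len(set_a & set_b) / len(set_a | set_b)


if __name__ == "__main__":
    a = kmers("ACGTACGTTAGCCGATAGCTAGGATCCA", 4)
    b = kmers("ACGTACGTTAGCCGATCGCTAGGATCCA", 4)
    print(exact_jaccard(a, b))
    print(estimate_jaccard(minhash_signature(a), minhash_signature(b)))
```

The signatures have a fixed size, so comparing two sequences costs the same regardless of their length; LSH would then bucket similar signatures together, a step this sketch leaves out.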

    Next-Generation Sequencing — An Overview of the History, Tools, and “Omic” Applications

    Next-generation sequencing (NGS) technologies using DNA, RNA, or methylation sequencing have had an enormous impact on the life sciences. NGS is the method of choice for large-scale genomic and transcriptomic sequencing because of its high-throughput production, its output of sequencing data in the gigabase range per instrument run, and its lower cost compared to the traditional Sanger first-generation sequencing method. The vast amounts of data generated by NGS have broadened our understanding of structural and functional genomics through the concepts of “omics” ranging from basic genomics to integrated systeomics, providing new insight into the workings and meaning of genetic conservation and diversity of living things. NGS today is more than ever about how different organisms use genetic information and molecular biology to survive and reproduce with and without mutations, disease, and diversity within their population networks and changing environments. In this chapter, the advances, applications, and challenges of NGS are reviewed, starting with a history of first-generation sequencing followed by the major NGS platforms, the bioinformatics issues confronting NGS data storage and analysis, and the impacts made in the fields of genetics, biology, agriculture, and medicine in the brave, new world of “omics.”

    A quantitative-PCR based method to estimate ranavirus viral load following normalisation by reference to an ultraconserved vertebrate target

    Ranaviruses are important pathogens of amphibians, reptiles and fish. To meet the need for an analytical method for generating normalised and comparable infection data for these diverse host species, two standard-curve based quantitative-PCR (qPCR) assays were developed enabling viral load estimation across these host groups. A viral qPCR targeting the major capsid protein (MCP) gene was developed which was specific to amphibian-associated ranaviruses with high analytical sensitivity (lower limit of detection: 4.23 plasmid standard copies per reaction) and high reproducibility across a wide dynamic range (coefficient of variation below 3.82% from 3 to 3 × 10^8 standard copies per reaction). The comparative sensitivity of the viral qPCR was 100% (n = 78) based on agreement with an established end-point PCR. Comparative specificity with the end-point PCR was also 100% (n = 94) using samples from sites with no history of ranavirus infection. To normalise viral quantities, a host qPCR was developed which targeted a single-copy, ultra-conserved non-coding element (UCNE) of vertebrates. Viral and host qPCRs were applied to track ranavirus growth in culture. The two assays offer a robust approach to viral load estimation and the host qPCR can be paired with assays targeting other pathogens to study infection burdens.
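The arithmetic behind a standard-curve qPCR normalisation of this kind can be sketched as follows. This is the generic standard-curve calculation, not the assay's published analysis code: it assumes a linear curve of the form Cq = slope × log10(copies) + intercept for each assay, and the function names and the slope and intercept values in the usage example are hypothetical.

```python
def copies_from_cq(cq, slope, intercept):
    """Invert the standard curve Cq = slope * log10(copies) + intercept."""
    return 10 ** ((cq - intercept) / slope)


def amplification_efficiency(slope):
    """Efficiency implied by the curve slope; about 1.0 (100%) near -3.32."""
    return 10 ** (-1.0 / slope) - 1.0


def normalised_viral_load(viral_cq, host_cq, viral_curve, host_curve):
    """Viral copies per copy of the host reference target.

    Each curve is a (slope, intercept) pair fitted from that assay's own
    dilution series of standards.
    """
    viral_copies = copies_from_cq(viral_cq, *viral_curve)
    host_copies = copies_from_cq(host_cq, *host_curve)
    return viral_copies / host_copies


if __name__ == "__main__":
    # Hypothetical curve parameters, for illustration only.
    viral_curve = (-3.3, 38.0)
    host_curve = (-3.4, 39.5)
    print(normalised_viral_load(24.8, 29.3, viral_curve, host_curve))
```

Dividing by the host target quantity is what makes loads comparable between samples that contain different amounts of host material.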