
UNT Digital Library

Multiple whole genome alignments and novel biomedical applications at the VISTA portal

Author: Brudno Michael
Dubchak Inna
Minovitsky Simon
Poliakov Alexander
Ratnere Igor
Publication venue: Oxford University Press
Publication date: 01/02/2007

The VISTA portal for comparative genomics is designed to give biomedical scientists a unified set of tools to lead them from the raw DNA sequences through the alignment and annotation to the visualization of the results. The VISTA portal also hosts the alignments of a number of genomes computed by our group, allowing users to study the regions of their interest without having to manually download the individual sequences. Here we describe various algorithmic and functional improvements implemented in the VISTA portal over the last 2 years. The VISTA Portal is accessible at http://genome.lbl.gov/vista.

UNT Digital Library

Short sequence motifs, overrepresented in mammalian conserved non-coding sequences

Author: Dubchak Inna
Kel Alexander
Kondrashov Alexey S
Minovitsky Simon
Stegmaier Philip
Publication venue: BioMed Central
Publication date: 01/01/2007

Background: A substantial fraction of non-coding DNA sequences of multicellular eukaryotes is under selective constraint. In particular, ~5% of the human genome consists of conserved non-coding sequences (CNSs). CNSs differ from other genomic sequences in their nucleotide composition and must play important functional roles, which mostly remain obscure. Results: We investigated relative abundances of short sequence motifs in all human CNSs present in the human/mouse whole-genome alignments vs. three background sets of sequences: (i) weakly conserved or unconserved non-coding sequences (non-CNSs); (ii) near-promoter sequences (located between nucleotides -500 and -1500, relative to the start of transcription); and (iii) random sequences with the same nucleotide composition as that of CNSs. When compared to non-CNSs and near-promoter sequences, CNSs possess an excess of AT-rich motifs, often containing runs of identical nucleotides. In contrast, when compared to random sequences, CNSs contain an excess of GC-rich motifs which, however, lack CpG dinucleotides. Thus, the abundance of short sequence motifs in human CNSs, taken as a whole, is mostly determined by their overall compositional properties and not by overrepresentation of any specific short motifs. These properties are: (i) high AT-content of CNSs, (ii) a tendency, probably due to context-dependent mutation, of A's and T's to clump, (iii) presence of short GC-rich regions, and (iv) avoidance of CpG contexts, due to their hypermutability. Only a small number of the short motifs overrepresented in all human CNSs are similar to binding sites of transcription factors from the FOX family. Conclusion: Human CNSs as a whole appear to be too broad a class of sequences to possess strong footprints of any short sequence-specific functions. Such footprints should be studied at the level of functional subclasses of CNSs, such as those which flank genes with a particular pattern of expression. Overall properties of CNSs are affected by patterns in mutation, suggesting that the selection which causes their conservation is not always very strong.
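The comparison of CNSs against random sequences of the same nucleotide composition can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' pipeline: the composition-matched background is built by shuffling each sequence (which keeps its exact base composition), k-mer frequencies are smoothed with a pseudocount, and every function name and default (copies, seed, pseudocount) is hypothetical. The CpG observed/expected ratio is the standard way to quantify the CpG avoidance described in the abstract.

```python
import random
from collections import Counter

VALID = set("ACGT")

def kmer_counts(seqs, k):
    """Count overlapping k-mers across sequences, skipping k-mers with non-ACGT characters."""
    counts = Counter()
    for s in seqs:
        s = s.upper()
        for i in range(len(s) - k + 1):
            kmer = s[i:i + k]
            if set(kmer) <= VALID:
                counts[kmer] += 1
    return counts

def shuffled_background(seqs, copies=10, seed=0):
    """Composition-matched background (hypothetical helper): each shuffle keeps a
    sequence's exact base composition but scrambles the order of its bases."""
    rng = random.Random(seed)
    background = []
    for s in seqs:
        for _ in range(copies):
            chars = list(s)
            rng.shuffle(chars)
            background.append("".join(chars))
    return background

def enrichment(fg_seqs, bg_seqs, k, pseudocount=1.0):
    """Foreground/background ratio of pseudocount-smoothed k-mer frequencies
    (values > 1 mean the k-mer is overrepresented in the foreground)."""
    fg, bg = kmer_counts(fg_seqs, k), kmer_counts(bg_seqs, k)
    kmers = set(fg) | set(bg)
    fg_norm = sum(fg.values()) + pseudocount * len(kmers)
    bg_norm = sum(bg.values()) + pseudocount * len(kmers)
    return {m: ((fg[m] + pseudocount) / fg_norm) / ((bg[m] + pseudocount) / bg_norm)
            for m in kmers}

def cpg_obs_exp(seq):
    """Observed/expected CpG ratio, count(CG) * length / (count(C) * count(G));
    values well below 1 indicate CpG depletion."""
    s = seq.upper()
    c, g = s.count("C"), s.count("G")
    if c == 0 or g == 0:
        return 0.0
    cg = sum(1 for i in range(len(s) - 1) if s[i:i + 2] == "CG")
    return cg * len(s) / (c * g)
```

For example, enrichment(cns_seqs, shuffled_background(cns_seqs), k=6), with cns_seqs a hypothetical list of CNS strings, would rank 6-mers by their excess over a composition-matched background; a real analysis would add a significance test on top of these ratios.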

Directory of Open Access Journals

UNT Digital Library

Deep Blue Documents

RegPrecise web services interface: programmatic access to the transcriptional regulatory interactions in bacteria reconstructed by comparative genomics.

Author: Arkin Adam P
Brettin Thomas S
Dehal Paramvir S
Dubchak Inna
Novichkov Pavel S
Novichkova Elena S
Rodionov Dmitry A
Publication venue: eScholarship, University of California
Publication date: 01/01/2012

A web services application programming interface (API) was developed to provide programmatic access to the regulatory interactions accumulated in the RegPrecise database (http://regprecise.lbl.gov), a core resource on transcriptional regulation for the microbial domain of the Department of Energy (DOE) Systems Biology Knowledgebase. RegPrecise captures and visualizes regulogs (sets of genes controlled by orthologous regulators in several closely related bacterial genomes) reconstructed by comparative genomics. The current release of RegPrecise 2.0 includes >1400 regulogs controlled either by protein transcription factors or by conserved ribonucleic acid regulatory motifs in >250 genomes from 24 taxonomic groups of bacteria. The reference regulons accumulated in RegPrecise can serve as a basis for automatic annotation of regulatory interactions in newly sequenced genomes. The developed API provides efficient access to the RegPrecise data through a comprehensive set of 14 web service resources. The RegPrecise web services API is freely accessible at http://regprecise.lbl.gov/RegPrecise/services.jsp with no login requirements.
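The idea that reference regulons can seed the annotation of a newly sequenced genome can be illustrated with a naive orthology-based transfer. This is a hypothetical sketch, not the RegPrecise procedure and not a client for its web services (no endpoints are assumed): the Regulon class, the propagate function and the ortholog map are illustrative only, and a real annotation would also need to check for conserved regulatory sites upstream of each candidate target.

```python
from dataclasses import dataclass, field

@dataclass
class Regulon:
    genome: str
    regulator: str                              # regulator gene id (hypothetical ids)
    targets: set = field(default_factory=set)   # ids of genes controlled by the regulator

def propagate(reference, orthologs, new_genome):
    """Naively transfer a reference regulon to a new genome using an ortholog map
    (reference gene id -> new-genome gene id). Returns None when the regulator itself
    has no ortholog; targets without orthologs are dropped."""
    if reference.regulator not in orthologs:
        return None
    targets = {orthologs[g] for g in reference.targets if g in orthologs}
    return Regulon(new_genome, orthologs[reference.regulator], targets)
```

In this representation, a regulog would simply be a list of Regulon objects whose regulators are orthologs of one another across closely related genomes.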

CiteSeerX

The splicing regulatory element, UGCAUG, is phylogenetically and spatially conserved in introns that flank tissue-specific alternative exons

Author: Conboy John G.
Dubchak Inna
Gee Sherry L.
Minovitsky Simon
Schokrpur Shiruyeh
Publication venue: Oxford University Press
Publication date: 03/02/2005

Previous studies have identified UGCAUG as an intron splicing enhancer that is frequently located adjacent to tissue-specific alternative exons in the human genome. Here, we show that UGCAUG is phylogenetically and spatially conserved in introns that flank brain-enriched alternative exons from fish to man. Analysis of sequence from the mouse, rat, dog, chicken and pufferfish genomes revealed a strongly statistically significant association of UGCAUG with the proximal intron region downstream of brain-enriched alternative exons. The number, position and sequence context of intronic UGCAUG elements were highly conserved among mammals and in chicken, but more divergent in fish. Control datasets, including constitutive exons and non-tissue-specific alternative exons, exhibited a much lower incidence of closely linked UGCAUG elements. We propose that the high sequence specificity of the UGCAUG element, and its unique association with tissue-specific alternative exons, mark it as a critical component of splicing switch mechanism(s) designed to activate a limited repertoire of splicing events in cell type-specific patterns. We further speculate that highly conserved UGCAUG-binding protein(s) related to the recently described Fox-1 splicing factor play a critical role in mediating this specificity.
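Locating UGCAUG in the proximal intron downstream of an exon can be sketched as a simple motif scan. This is an illustrative sketch, not the authors' code: the element is searched in its DNA spelling TGCATG, the 200-nt "proximal" window is an assumed value rather than the paper's cutoff, and the function names are hypothetical. Comparing fraction_with_hit between a tissue-specific exon set and a control set mirrors the kind of contrast described above, though a real analysis would add a statistical test.

```python
import re

MOTIF = "TGCATG"  # DNA spelling of the RNA element UGCAUG

def motif_positions(seq, motif=MOTIF):
    """0-based start positions of all motif matches, including overlapping ones.
    RNA input is accepted: U is converted to T first."""
    s = seq.upper().replace("U", "T")
    # A zero-width lookahead lets re.finditer report overlapping matches.
    return [m.start() for m in re.finditer(f"(?={motif})", s)]

def proximal_hits(downstream_intron, window=200, motif=MOTIF):
    """Matches lying entirely within the first `window` nt of the intron downstream
    of an exon. window=200 is an illustrative assumption."""
    return [p for p in motif_positions(downstream_intron, motif)
            if p + len(motif) <= window]

def fraction_with_hit(introns, window=200):
    """Fraction of downstream introns with at least one proximal match."""
    if not introns:
        return 0.0
    return sum(1 for s in introns if proximal_hits(s, window)) / len(introns)
```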

arXiv.org e-Print Archive

benchNGS: An approach to benchmark short reads alignment tools

Author: Alexandrov Nickolai
Dubchak Inna
Hassan Mehedi
Kryshchenko Alona
Rahman Farzana
Tatarinova Tatiana V.
Publication date: 24/04/2015

In the last decade, a number of algorithms and associated software tools have been developed to align next generation sequencing (NGS) reads with relevant reference genomes. The accuracy of these programs may vary significantly, especially when the NGS reads are quite different from the available reference genome. In this paper we propose a benchmark to assess the accuracy of short-read mapping based on the pre-computed global alignment of closely related genome sequences. We outline the method and also present a short report of an experiment performed on five popular alignment tools based on the pairwise alignments of the Escherichia coli O6 CFT073 genome with the genomes of seven other bacteria. Comment: 1 figure
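One plausible reading of a benchmark "based on the pre-computed global alignment" is sketched below: reads taken from one genome have known start positions, the global alignment converts each into the expected position on the reference, and the aligner's reported position is checked against it. This is a sketch under assumptions, not benchNGS itself: the alignment is given as two gapped rows of equal length, reverse-strand reads are ignored, reads starting opposite a gap are skipped, and the function names and the tolerance parameter are hypothetical.

```python
def coordinate_map(aligned_query, aligned_ref):
    """Map each 0-based query position to the aligned 0-based reference position
    (None where the query base is aligned to a gap)."""
    if len(aligned_query) != len(aligned_ref):
        raise ValueError("aligned rows must have equal length")
    mapping = {}
    qi = ri = 0  # current ungapped positions in query and reference
    for q, r in zip(aligned_query, aligned_ref):
        if q != "-":
            mapping[qi] = ri if r != "-" else None
            qi += 1
        if r != "-":
            ri += 1
    return mapping

def mapping_accuracy(reads, reported, cmap, tolerance=0):
    """reads: read id -> start position in the query genome.
    reported: read id -> start position reported by the aligner on the reference.
    A read is correct if its reported start is within `tolerance` of the position
    implied by the global alignment; unreported reads count as wrong."""
    correct = evaluated = 0
    for rid, qstart in reads.items():
        expected = cmap.get(qstart)
        if expected is None:
            continue  # start falls opposite an alignment gap; no ground truth
        evaluated += 1
        got = reported.get(rid)
        if got is not None and abs(got - expected) <= tolerance:
            correct += 1
    return correct / evaluated if evaluated else 0.0
```

For instance, coordinate_map("AC-GT", "ACTG-") yields {0: 0, 1: 1, 2: 3, 3: None}: the inserted T in the reference shifts later query bases by one, and the final query base has no reference counterpart.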

Kingston University Research Repository

Extensive parallelism in protein evolution

Author: Bazykin Georgii A
Brudno Michael
Dubchak Inna
Kondrashov Alexey S
Kondrashov Fyodor A
Poliakov Alexander
Publication venue: BioMed Central
Publication date: 01/01/2007

This is an Open Access article distributed under the terms of the Creative Commons Attribution License.

University of Toronto Research Repository

CiteSeerX

Directory of Open Access Journals

Functionally conserved enhancers with divergent sequences in distant vertebrates

Author: Ahituv Nadav
Boffelli Dario
Dubchak Inna
Heo Seok-Jin
Oksenberg Nir
Poliakov Alexander
Takayama Sachiko
Yang Song
Publication venue: Springer Nature
Publication date: 01/10/2015

Conserved transcription factor binding motifs in the five zebrafish/mouse syntenic enhancers. Identical n-mers (n ≥ 7) identified in the zebrafish, mouse, and human sequences of the five syntenic CNSs were examined for the presence of transcription factor binding motifs; only motifs with E-value ≤ 0.1 are shown. (XLSX 15 kb)
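Finding n-mers that occur identically in the zebrafish, mouse and human sequences reduces to intersecting their n-mer sets. The sketch below is an assumption-laden illustration, not the authors' procedure: only the given strand is searched (reverse complements are ignored), the function names are hypothetical, and the subsequent motif scanning and E-value filtering are not shown. Seeding at n = 7 is enough to catch longer shared stretches, because any identical n-mer with n > 7 contains identical 7-mers.

```python
def nmers(seq, n):
    """Set of all length-n substrings of seq (case-insensitive)."""
    s = seq.upper()
    return {s[i:i + n] for i in range(len(s) - n + 1)}

def shared_nmers(seqs, n=7):
    """n-mers found identically in every sequence (same strand only)."""
    if not seqs:
        return set()
    common = nmers(seqs[0], n)
    for s in seqs[1:]:
        common &= nmers(s, n)  # keep only n-mers also present in this sequence
    return common
```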

FigShare

SNP-VISTA: An interactive SNP visualization tool

Author: Dubchak Inna L
Hamann Bernd
Hugenholtz Philip
Minovitsky Simon
Pennacchio Len A
Shah Nameeta
Teplitsky Michael V
Publication venue: BioMed Central
Publication date: 01/12/2005

BACKGROUND: Recent advances in sequencing technologies promise to provide a better understanding of the genetics of human disease as well as the evolution of microbial populations. Single Nucleotide Polymorphisms (SNPs) are established genetic markers that aid in the identification of loci affecting quantitative traits and/or disease in a wide variety of eukaryotic species. With today's technological capabilities, it has become possible to re-sequence a large set of appropriate candidate genes in individuals with a given disease in an attempt to identify causative mutations. In addition, SNPs have been used extensively in efforts to study the evolution of microbial populations, and the recent application of random shotgun sequencing to environmental samples enables more extensive SNP analysis of co-occurring and co-evolving microbial populations. The program is available at [1]. RESULTS: We have developed, and present here, two modifications of an interactive visualization tool, SNP-VISTA, to aid in the analyses of the following types of data: A. Large-scale re-sequence data of disease-related genes for discovery of associated and/or causative alleles (GeneSNP-VISTA). B. Massive amounts of ecogenomics data for studying homologous recombination in microbial populations (EcoSNP-VISTA). The main features and capabilities of SNP-VISTA are: 1) mapping of SNPs to gene structure; 2) classification of SNPs, based on their location in the gene, frequency of occurrence in samples and allele composition; 3) clustering, based on user-defined subsets of SNPs, highlighting haplotypes as well as recombinant sequences; 4) integration of protein evolutionary conservation visualization; and 5) display of automatically calculated recombination points that are user-editable. CONCLUSION: The main strength of SNP-VISTA is its graphical interface and use of visual representations, which support interactive exploration and hence better understanding of large-scale SNP data by the user.
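The location and frequency classifications (feature 2) and haplotype grouping over a user-chosen SNP subset (feature 3) can be sketched in a few lines. This is a hypothetical illustration, not SNP-VISTA's implementation: exons are assumed to be inclusive (start, end) intervals on the same coordinate system as the SNPs, each SNP stores one allele per sample, the 5% rare/common cutoff is an assumed convention, and recombination-point detection is not shown.

```python
from collections import Counter, defaultdict
from dataclasses import dataclass

@dataclass
class Snp:
    pos: int       # position on the reference sequence
    alleles: list  # one allele per sample, e.g. ["A", "A", "G"]

def location_class(pos, exons, gene_start, gene_end):
    """Classify a position relative to a gene given inclusive exon intervals."""
    if not gene_start <= pos <= gene_end:
        return "intergenic"
    if any(start <= pos <= end for start, end in exons):
        return "exonic"
    return "intronic"

def minor_allele_frequency(alleles):
    """Frequency of the second most common allele (0.0 if monomorphic)."""
    ranked = Counter(alleles).most_common(2)
    return ranked[1][1] / len(alleles) if len(ranked) > 1 else 0.0

def frequency_class(alleles, rare_below=0.05):
    """Rare vs common split; the 5% cutoff is an assumed convention."""
    maf = minor_allele_frequency(alleles)
    if maf == 0.0:
        return "monomorphic"
    return "rare" if maf < rare_below else "common"

def group_haplotypes(snps, selected_positions):
    """Group sample indices that share identical alleles at the selected SNPs."""
    selected = sorted((s for s in snps if s.pos in selected_positions), key=lambda s: s.pos)
    n_samples = len(selected[0].alleles) if selected else 0
    groups = defaultdict(list)
    for i in range(n_samples):
        groups["".join(s.alleles[i] for s in selected)].append(i)
    return dict(groups)
```

Changing selected_positions regroups the samples, which is the kind of user-driven clustering the feature list describes.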

Directory of Open Access Journals