Non-B DB: a database of predicted non-B DNA-forming motifs in mammalian genomes

Author: A. Bacolla
Akgun
B. T. Luke
Chandrasekhar
Collier
Courey
D'Angelo
Dayn
Glickman
Gordenin
Hill
Ho
Ho
Ho
J. R. Collins
K. H. Bruce
Kamenetskii
Kouzine
Krasilnikov
Kurahashi
Kuroda-Kawaguchi
Lange
Leonard
M. Yi
Mirkin
N. Volfovsky
R. M. Stephens
R. Z. Cer
Rich
Ristic
Rohs
Schroth
Sheridan
Singleton
Stajich
Stein
U. S. Mudunuri
Wang
Wells
Wittig
Zhang
Zhao
Publication venue: Oxford University Press
Publication date: 01/01/2011

Although the capability of DNA to form a variety of non-canonical (non-B) structures has long been recognized, the overall significance of these alternate conformations in biology has only recently become widely accepted. To provide access to the genome-wide locations of these classes of predicted structures, we have developed non-B DB, a database integrating annotations and analysis of non-B DNA-forming sequence motifs. The database provides the most complete list of alternative DNA structure predictions available. It includes Z-DNA motifs, quadruplex-forming motifs, inverted repeats, mirror repeats and direct repeats, together with their associated subsets of cruciforms, triplex and slipped structures, respectively. It also contains motifs predicted to form static DNA bends, short tandem repeats and homo(purine•pyrimidine) tracts that have been associated with disease. The database has been built using the latest releases of the human, chimpanzee, dog, macaque and mouse genomes, so that the results can be compared directly with other data sources. To make the data interpretable in a genomic context, features such as genes, single-nucleotide polymorphisms and repetitive elements (SINE, LINE, etc.) have also been incorporated. The database is accessed through query pages that produce results with links to the UCSC browser and a GBrowse-based genomic viewer. It is freely accessible at http://nonb.abcc.ncifcrf.gov.
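The following Python sketch shows what predicting a quadruplex-forming motif can look like: a regular-expression scan over a sequence. The sketch assumes a common heuristic definition: four runs of at least three G separated by loops of 1 to 7 nt, with C-rich runs standing for G-rich runs on the opposite strand. This is not necessarily the exact rule non-B DB applies. `find_g4_motifs` is a hypothetical helper name.

```python
import re

# Assumed quadruplex-forming motif: four runs of >= 3 G separated by loops of
# 1-7 nt. This heuristic is an assumption, not necessarily non-B DB's rule.
G4_PLUS = re.compile(r"(?:G{3,}[ACGTN]{1,7}){3}G{3,}")
# A C-rich motif on the given strand is a G-rich motif on the opposite strand.
G4_MINUS = re.compile(r"(?:C{3,}[ACGTN]{1,7}){3}C{3,}")


def find_g4_motifs(seq):
    """Return sorted (start, end, strand) tuples for candidate motifs.

    Coordinates are 0-based and end-exclusive on the input strand; matches of
    each pattern are non-overlapping (re.finditer semantics).
    """
    seq = seq.upper()
    hits = []
    for strand, pattern in (("+", G4_PLUS), ("-", G4_MINUS)):
        for m in pattern.finditer(seq):
            hits.append((m.start(), m.end(), strand))
    return sorted(hits)
```

For example, `find_g4_motifs("ttGGGaGGGaGGGaGGGtt")` returns `[(2, 17, '+')]`. A regular expression keeps the sketch short. Because `finditer` reports only non-overlapping matches, it does not enumerate overlapping alternative motifs separately.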

ReRep: Computational detection of repetitive sequences in genome survey sequences (GSS)

Author: AL Pedrosa
Antonio B de Miranda
B Clift
B Ewing
B Wickstead
CS Peacock
D Gordon
E Arner
EC Laurentino
ES Lander
EW Myers
G Fu
IHGS Consortium
J Healy
J Jurka
J Wang
JD Thompson
K Reinert
K Swaminathan
Leonardo HF Gomes
M Margulies
Marcelo Alves-Ferreira
N Rodriguez
N Volfovsky
NM El-Sayed
P Rice
PA Pevzner
R Szklarczyk
RA Hoskins
S Kurtz
SF Altschul
SM Sunkin
TD Otto
Thomas D Otto
Wim M Degrave
X Huang
Z Bao
Publication venue: BioMed Central
Publication date: 01/01/2008

Background: Genome survey sequences (GSS) offer a preliminary global view of a genome because, unlike ESTs, they cover coding as well as non-coding DNA and include repetitive regions of the genome. A more precise estimate of the nature, quantity and variability of repetitive sequences very early in a genome sequencing project is of considerable importance, as such data strongly influence the estimation of genome coverage, library quality and progress in scaffold construction. Eliminating repetitive sequences from the initial assembly process is also important to avoid errors and unnecessary complexity. Repetitive sequences are of interest in a variety of other studies as well, for instance as molecular markers. Results: We designed and implemented a straightforward pipeline called ReRep, which combines bioinformatics tools for identifying repetitive structures in a GSS dataset. In a case study, we first applied the pipeline to a set of 970 GSSs, sequenced in our laboratory from the human pathogen Leishmania braziliensis, the causative agent of leishmaniosis, an important public health problem in Brazil. We also verified the applicability of ReRep to new sequencing technologies using a set of 454 reads from Escherichia coli. The behaviour of several parameters in the algorithm is evaluated and suggestions are made for tuning the analysis. Conclusion: The ReRep approach for identifying repetitive elements in GSS datasets proved to be straightforward and efficient. Several potential repetitive sequences were found in an L. braziliensis GSS dataset generated in our laboratory and were further validated by analysis of a more complete genomic dataset from the EMBL and Sanger Centre databases. ReRep also identified most of the E. coli K12 repeats prior to assembly in an example dataset obtained by automated sequencing using 454 technology. The parameters controlling the algorithm behaved consistently and may be tuned to the properties of the dataset, in particular to the length of sequencing reads and the genome coverage. ReRep is freely available for academic use at http://bioinfo.pdtis.fiocruz.br/ReRep/.
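ReRep itself combines existing bioinformatics tools, and the Python sketch below does not reproduce that pipeline. Instead, it substitutes a much simpler, hypothetical technique to convey the underlying idea: a k-mer that appears in several distinct reads hints at a repeat. The following are all assumptions for illustration:

- the function names;
- the defaults `k=5` and `min_reads=2`;
- the use of canonical (strand-independent) k-mers.

```python
from collections import defaultdict

_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def canonical_kmer(kmer):
    """Return the lexicographically smaller of a k-mer and its reverse complement."""
    rc = kmer.translate(_COMPLEMENT)[::-1]
    return min(kmer, rc)


def flag_repetitive_reads(reads, k=5, min_reads=2):
    """Return indices of reads containing a k-mer found in >= min_reads reads.

    Hypothetical rule: a canonical k-mer shared by at least `min_reads`
    distinct reads is treated as evidence of a repeat.
    """
    reads_with_kmer = defaultdict(set)  # canonical k-mer -> read indices
    for idx, read in enumerate(reads):
        read = read.upper()
        for i in range(len(read) - k + 1):
            reads_with_kmer[canonical_kmer(read[i:i + k])].add(idx)

    flagged = set()
    for indices in reads_with_kmer.values():
        if len(indices) >= min_reads:
            flagged.update(indices)
    return flagged
```

With realistic read lengths, k would be much larger than 5. As the abstract notes for ReRep's own parameters, sensible thresholds depend on read length and genome coverage.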

How repetitive are genomes?

Author: A Faiella
AE Mirsky
B Haubold
Bernhard Haubold
CA Thomas Jn
D Gusfield
D Tautz
EA Bennett
EPC Rocha
G Achaz
International Human Genome Sequencing Consortium
J Liu
JI Jordan
JM Hancock
L Zhou
LE Orgel
M Hofnung
MA Nóbrega
Mouse Genome Sequencing Consortium
N Volfovsky
OG Troyanskaya
R Development Core Team
RA Aras
Rat Genome Sequencing Consortium
RJ Britten
S Kurtz
SS Shapiro
The Chimpanzee Sequencing and Analysis Consortium
Thomas Wiehe
TR Gregory
WF Doolittle
Y Tian
YL Orlov
Publication venue: BioMed Central
Publication date: 01/01/2006

BACKGROUND: Genome sequences vary strongly in their repetitiveness, and the causes of this variation are still debated. Here we propose a novel measure of genome repetitiveness, the index of repetitiveness, I(r), which can be computed in time proportional to the length of the sequences analyzed. We apply it to 336 genomes from all three domains of life. RESULTS: The expected value of I(r) is zero for random sequences of any G/C content and greater than zero for sequences with excess repeats. We find that the I(r) of archaea is significantly smaller than that of eubacteria, which in turn is smaller than that of eukaryotes. Mouse chromosomes have a significantly higher I(r) than human chromosomes, and within each genome the Y chromosome is the most repetitive. A sliding window analysis reveals that the human HOXA cluster and two surrounding genes are characterized by local minima in I(r). A program for calculating I(r) is freely available at . CONCLUSION: The general measure of DNA repetitiveness proposed in this paper can be efficiently computed on a genomic scale. This reveals a broad spectrum of repetitiveness among diverse genomes, which agrees qualitatively with previous studies of repeat content. A sliding window analysis helps to analyze the intragenomic distribution of repeats.
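The paper defines I(r). As we understand it, I(r) builds on the lengths of shortest unique substrings (shustrings); treat that link as an assumption. The Python sketch below computes only that ingredient, naively and in far worse than linear time. It does not compute I(r) itself. `shustring_lengths` and `count_occurrences` are hypothetical helper names.

```python
def count_occurrences(text, pattern):
    """Count possibly overlapping occurrences of pattern in text."""
    count, start = 0, text.find(pattern)
    while start != -1:
        count += 1
        start = text.find(pattern, start + 1)
    return count


def shustring_lengths(seq):
    """For each position, the length of the shortest substring starting there
    that occurs exactly once in seq, or None if no such substring exists.

    Naive (roughly cubic time); for illustration only, not the paper's
    linear-time computation.
    """
    seq = seq.upper()
    n = len(seq)
    lengths = []
    for i in range(n):
        length = None
        for size in range(1, n - i + 1):
            if count_occurrences(seq, seq[i:i + size]) == 1:
                length = size
                break
        lengths.append(length)
    return lengths
```

For `"ACGTACGT"` the function returns `[5, 4, 3, 2, None, None, None, None]`. The repeated `ACGT` lengthens the shustrings that start in the first copy, and no substring starting in the second copy is unique.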

Guanine Holes Are Prominent Targets for Mutation in Cancer and Inherited Disease

Author: Bacolla Albino
Ball Edward V.
Cer Regina Z.
Collins Jack R.
Cooper David N.
Donohue Duncan E.
Ivanic Joseph
Jain Aklank
Luke Brian T.
Mudunuri Uma S.
Stephens Robert M.
Temiz Nuri A.
Vasquez Karen M.
Volfovsky Natalia
Wang Guliang
Yi Ming
Publication venue: 'Public Library of Science (PLoS)'
Publication date: 01/01/2013

Albino Bacolla, Guliang Wang, Aklank Jain, Karen M. Vasquez: Division of Pharmacology and Toxicology, The University of Texas at Austin, Dell Pediatric Research Institute, Austin, Texas, United States of America
Albino Bacolla, Nuri A. Temiz, Ming Yi, Joseph Ivanic, Regina Z. Cer, Duncan E. Donohue, Uma S. Mudunuri, Natalia Volfovsky, Brian T. Luke, Robert M. Stephens, Jack R. Collins: Advanced Biomedical Computing Center, SAIC-Frederick, Inc., Frederick National Laboratory for Cancer Research, Frederick, Maryland, United States of America
Edward V. Ball, David N. Cooper: Institute of Medical Genetics, School of Medicine, Cardiff University, Cardiff, United Kingdom

Single base substitutions constitute the most frequent type of human gene mutation and are a leading cause of cancer and inherited disease. These alterations occur non-randomly in DNA and are strongly influenced by the local nucleotide sequence context. However, the molecular mechanisms underlying such sequence context-dependent mutagenesis are not fully understood. Using bioinformatics, computational and molecular modeling analyses, we have determined the frequencies of mutation at G•C bp in the context of all 64 5′-NGNN-3′ motifs that contain the mutation at the second position. Twenty-four datasets were employed, comprising >530,000 somatic single base substitutions from 21 cancer genomes, >77,000 germline single base substitutions causing or associated with human inherited disease, and 16.7 million benign germline single-nucleotide variants. In several cancer types, the number of mutated motifs correlated both with the free energies of base stacking and with the energies required to abstract an electron from the target guanines (ionization potentials). Similar correlations were also evident for the pathological missense and nonsense germline mutations, but only when the target guanines were located on the non-transcribed DNA strand. Likewise, pathogenic splicing mutations predominantly affected positions in which a purine was located on the non-transcribed DNA strand. Novel candidate driver mutations and tissue-specific mutational patterns were also identified in the cancer datasets. We conclude that electron transfer reactions within the DNA molecule contribute to sequence context-dependent mutagenesis, involving both somatic driver and passenger mutations in cancer, as well as germline alterations causing or associated with inherited disease.

This work was supported by grants from the NIH (CA097175 and CA093729) to KMV, NCI/NIH contract HHSN261200800001E to AB and the Frederick National Laboratory for Cancer Research, and CBIIT/caBIG ISRCE yellow task #09-260 to the Frederick National Laboratory for Cancer Research. DNC and EVB received financial support from BIOBASE GmbH through a license agreement (for HGMD) with Cardiff University. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.
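The count of 64 motifs follows from fixing G at the second position and letting the other three positions take any of the four bases (4^3 = 64). The Python sketch below enumerates these motifs. It also tallies the context of substituted G•C base pairs in a sequence, reading a substituted C as a G on the opposite strand. This is a hypothetical illustration of motif counting, not the authors' analysis pipeline. The helper names and the 0-based position input are assumptions.

```python
from collections import Counter
from itertools import product

BASES = "ACGT"
_COMPLEMENT = str.maketrans("ACGT", "TGCA")

# All 64 motifs 5'-NGNN-3' with the target G at the second position.
ALL_NGNN = ["".join((a, "G", b, c)) for a, b, c in product(BASES, repeat=3)]


def revcomp(seq):
    """Reverse complement of an ACGT string."""
    return seq.translate(_COMPLEMENT)[::-1]


def count_ngnn_contexts(seq, positions):
    """Tally the 5'-NGNN-3' context of each substituted G-C base pair.

    `positions` are 0-based indices of substituted bases on `seq` (an assumed
    input format). A G is read on the given strand; a C is read as the G on
    the opposite strand, so its surrounding 4-mer is reverse-complemented.
    """
    seq = seq.upper()
    counts = Counter()
    for p in positions:
        if not 0 <= p < len(seq):
            continue  # position outside the sequence
        if seq[p] == "G" and p >= 1 and p + 3 <= len(seq):
            motif = seq[p - 1:p + 3]
        elif seq[p] == "C" and p >= 2 and p + 2 <= len(seq):
            motif = revcomp(seq[p - 2:p + 2])
        else:
            continue  # not a G-C pair, or too close to either end
        if set(motif) <= set(BASES):  # skip contexts containing N etc.
            counts[motif] += 1
    return counts
```

For example, `count_ngnn_contexts("TAGCAT", [2, 3])` assigns the G at index 2 to `AGCA`. It assigns the C at index 3 to `TGCT`, the reverse complement of that base's surrounding 4-mer.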

A clade uniting the green algae Mesostigma viride and Chlorokybus atmophyticus represents the deepest branch of the Streptophyta in chloroplast genome-based phylogenies

Author: A Simon
AM Nedelcu
AS Bélanger
B Marin
C Lemieux
CC Chang
CE Rogers
CF Delwiche
Christian Otis
Claude Lemieux
D Bhattacharya
D Maddison
DE Soltis
DL Swofford
F Delsuc
G Glockner
G Tesler
GM Lokhorst
H Philippe
H Shimodaira
HA Schmidt
HS Yoon
J Adachi
J Felsenstein
J Leebens-Mack
J Petersen
JC de Cambiaire
JC Hagopian
JF Pombert
JS Farris
JT Harper
K Bremer
KG Karol
KR Mattox
KV Kowallik
L Geitler
LA Lewis
LE Graham
M Melkonian
M Reith
M Steel
M Thollesson
M Turmel
MB Rogers
Monique Turmel
MV Puerta
N Ohta
N Volfovsky
NJ Patron
NM Fast
P Rice
PG Wolf
PJ Lockhart
PS Soltis
RA Andersen
RM McCourt
S Guindon
S Kurtz
S Stefanovic
S Watanabe
SE Douglas
T Cavalier-Smith
T Nishiyama
TR Bachvaroff
V Stirewalt
VV Goremykin
W Martin
Y-L Qiu
YL Qiu
Z Cai
Publication venue: BioMed Central
Publication date: 01/01/2007

BACKGROUND: The Viridiplantae comprise two major phyla: the Streptophyta, containing the charophycean green algae and all land plants, and the Chlorophyta, containing the remaining green algae. Despite recent progress in unravelling phylogenetic relationships among major green plant lineages, problematic nodes still remain in the green tree of life. One of the major issues concerns the scaly biflagellate Mesostigma viride, which is regarded either as representing the earliest divergence of the Streptophyta or as a separate lineage that diverged before the Chlorophyta and Streptophyta. Phylogenies based on chloroplast and mitochondrial genomes support the latter view. Because some green plant lineages are not represented in these phylogenies, sparse taxon sampling has been suspected to yield misleading topologies. Here, we describe the complete chloroplast DNA (cpDNA) sequence of the early-diverging charophycean alga Chlorokybus atmophyticus and present chloroplast genome-based phylogenies with an expanded taxon sampling. RESULTS: The 152,254 bp Chlorokybus cpDNA closely resembles its Mesostigma homologue in gene content and gene order. Using various methods of phylogenetic inference, we analyzed amino acid and nucleotide data sets derived from 45 protein-coding genes common to the cpDNAs of 37 green algal/land plant taxa and eight non-green algae. Unexpectedly, all best trees recovered a robust clade uniting Chlorokybus and Mesostigma. In protein trees, this clade was sister to all streptophytes and chlorophytes, and this placement received moderate support. In contrast, gene trees provided unequivocal support for the notion that the Mesostigma + Chlorokybus clade represents the earliest-diverging branch of the Streptophyta. Independent analyses of structural data (gene content and/or gene order) and of subsets of amino acid data progressively enriched in slow-evolving sites led us to conclude that the latter topology reflects the true organismal relationships. CONCLUSION: In disclosing a sister relationship between the Mesostigmatales and Chlorokybales, our study resolves the long-standing debate about the nature of the unicellular flagellated ancestors of land plants and significantly alters our concepts regarding the evolution of streptophyte algae. Moreover, by predicting a richer chloroplast gene repertoire than previously inferred for the common ancestor of all streptophytes, our study contributes to a better understanding of chloroplast genome evolution in the Viridiplantae.
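The abstract mentions subsets of amino acid data progressively enriched in slow-evolving sites. The Python sketch below shows the general shape of such a slow-fast procedure, with one loud simplification. Per-site rate is approximated by the number of distinct character states in an alignment column, whereas published analyses typically estimate site rates on a tree. All function names and the default fractions are hypothetical.

```python
def site_variability(alignment):
    """Number of distinct character states in each alignment column.

    Crude stand-in for per-site evolutionary rate (an assumption); gap
    characters, if present, are counted as ordinary states.
    """
    n_sites = len(alignment[0])
    if any(len(s) != n_sites for s in alignment):
        raise ValueError("sequences must be aligned (equal length)")
    return [len({s[j] for s in alignment}) for j in range(n_sites)]


def slow_site_subset(alignment, keep):
    """Keep the `keep` least variable columns, preserving column order."""
    scores = site_variability(alignment)
    # Rank by variability, breaking ties by position so results are deterministic.
    ranked = sorted(range(len(scores)), key=lambda j: (scores[j], j))
    kept = sorted(ranked[:keep])
    return ["".join(s[j] for j in kept) for s in alignment]


def progressive_subsets(alignment, fractions=(1.0, 0.75, 0.5, 0.25)):
    """Yield (fraction, sub-alignment) pairs progressively enriched in slow sites."""
    n_sites = len(alignment[0])
    for f in fractions:
        yield f, slow_site_subset(alignment, max(1, round(f * n_sites)))
```

Each yielded sub-alignment would then be passed to a tree-inference program. The point of the exercise is to track how support for competing topologies changes as fast sites are removed.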

Lessons from non-canonical splicing

Author: A Ameur
A Bachmayr-Heyda
A Corvelo
A Dhir
A Dobin
A Ferlini
A Goyenvalle
A Ivanov
A McPherson
A Rybak-Wolf
AG Matera
AG Sowalsky
AP de Koning
AR Grosso
AR Hatton
B Brouha
B Raj
BL Robberson
C DeBoever
C Feschotte
C Schindewolf
C Trapnell
CA Maher
CD Malone
CE Nelson
Christopher R. Sibley
CJ McManus
CM Koh
CR Sibley
CS Wu
D Bergeron
D Liang
D Meili
D Merico
DA Bitton
DL Black
DS Rickman
E Buratti
E Daguenet
F Jacob
F Pagani
F Qin
F Supek
FM Menzies
G McClorey
GE Crooks
GE Parada
H Chao
H Dvinge
H He
H Jung
H Keren
H Li
H Ner-Gaon
H Suzuki
H Uchikawa
H Yoshida
H Yuan
HY Xiong
I Pulyakhina
I Vorechovsky
J Argente
J Konig
J Li
J Salzman
J Ule
J Wu
Jernej Ule
JJ Wong
JK Pickrell
JM Burnette
JM Nigro
JO Ilagan
JP Ling
JU Guo
JZ Ni
K Greer
K Jividen
K Milde-Langosch
K Sathasivam
K Yap
K Yoshida
K Zarnack
L Blazquez
L Davidson
L De Conti
L Gorman
L Herzel
L Szabo
L Xu
LF Lareau
LL Chen
Lorea Blazquez
M Cowley
M Danan
M Dergai
M Gabler
M Irimia
M Jangi
M Jens
M Kellis
M Puttaraju
M Quesnel-Vallieres
M Romano
M Roy
M Tabebordbar
M Tajnik
MA Allen
MA Garcia-Blanco
MC Kramer
MK Parra
MM Scotti
MO Duff
MT Lovci
N Gal-Mark
N Liu
N Lopez-Bigas
N Sheth
N Volfovsky
NJ Sakabe
O Rossbach
O Solomon
P Akiva
P Edery
PA Galante
PE Wright
PL Boutz
PT Buckley
Q Wu
Q Yan
R Ashwal-Fluss
R Dorn
R Hayashi
R Martinez-Contreras
R Vaz-Drago
RB Darman
RC Dietrich
RE Sutton
RK Singh
S Alsafadi
S Bonnal
S Braun
S Ghosal
S Kelly
S Lualdi
S Memczak
S Naftelberg
S Petkovic
S Shen
SA Akker
SJ Conn
SM Rueter
T Derrien
T Eom
TB Hansen
TD Brunet
TJ Chuang
TR Mercer
TS Alioto
TY Hsu
U Braunschweig
U Koller
V Madan
V Marinescu
VO Wickramasinghe
W Filipowicz
WE Highsmith
WR Jeck
X Chen
X Roca
X Shen
X You
XD Fu
XO Zhang
Y Hua
Y Kapustin
Y Kong
Y Marquez
Y Quentin
Y Zhang
YH Wang
YI Li
Z Dominski
Z Kan
Z Pasman
Z Wang
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/05/2016

Recent improvements in the experimental and computational techniques used to study the transcriptome have enabled an unprecedented view of RNA processing, revealing many previously unknown non-canonical splicing events. These include cryptic events located far from the currently annotated exons and unconventional splicing mechanisms that have important roles in regulating gene expression. These non-canonical splicing events are a major source of newly emerging transcripts during evolution, especially when they involve sequences derived from transposable elements. They are therefore under precise regulation and quality control, which minimizes their potential to disrupt gene expression. We explain how non-canonical splicing can lead to aberrant transcripts that cause many diseases, and also how it can be exploited for new therapeutic strategies.
