62 research outputs found

Minimal Absent Words in Prokaryotic and Eukaryotic Genomes

Author: A Gentles
AJ Pinho
Armando J. Pinho
C Acquisti
C Simillion
Carlos A. C. Bastos
Christian Schönbach
D Gusfield
E Margulies
G Hampikian
I Ulitsky
J Herold
João M. O. S. Rodrigues
M Burrows
MI Abouelhoda
Paulo J. S. G. Ferreira
R Sokal
S Karlin
S Pietrokovski
Sara P. Garcia
T Kasai
V Brendel
Publication venue: Public Library of Science
Publication date: 31/01/2011

Minimal absent words have been computed in genomes of organisms from all domains of life. Here, we explore different sets of minimal absent words in the genomes of 22 organisms (one archaeota, thirteen bacteria and eight eukaryotes). We investigate whether the mutational biases that may explain the deficit of the shortest absent words in vertebrates are also pervasive in other absent words, namely minimal absent words, as well as in other organisms. We find that the compositional biases observed for the shortest absent words in vertebrates are not uniform throughout different sets of minimal absent words. We further investigate the hypothesis of the inheritance of minimal absent words through common ancestry from the similarity in dinucleotide relative abundances of different sets of minimal absent words, and find that this inheritance may be exclusive to vertebrates.
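As an illustration of the central notion in this abstract, the following minimal sketch enumerates minimal absent words by brute force. It assumes the usual definition (a word that does not occur in the sequence although its longest proper prefix and longest proper suffix both occur). The function name `minimal_absent_words` and the length cap `max_len` are hypothetical choices made for this example, and the approach is only practical for short sequences; it is not the method used in the paper.

```python
def minimal_absent_words(seq, max_len, alphabet="ACGT"):
    """Return the sorted minimal absent words of `seq` up to length `max_len`.

    A word w is treated as a minimal absent word if w does not occur in
    `seq` while both w[:-1] and w[1:] do occur (the empty word always occurs).
    """
    # Collect every factor (substring) of length 0..max_len.
    present = {""}
    for k in range(1, max_len + 1):
        for i in range(len(seq) - k + 1):
            present.add(seq[i:i + k])

    maws = []
    for prefix in present:
        if len(prefix) >= max_len:
            continue  # extending this factor would exceed the length cap
        for letter in alphabet:
            word = prefix + letter
            # word[:-1] is `prefix`, which occurs by construction;
            # only the longest proper suffix still needs checking.
            if word not in present and word[1:] in present:
                maws.append(word)
    return sorted(maws)


if __name__ == "__main__":
    print(minimal_absent_words("AACA", 3, alphabet="AC"))
```

For the toy sequence `AACA` over the alphabet `{A, C}`, the word `CC` is absent while `C` occurs, so it qualifies; `ACC` is absent too but is not minimal, because its suffix `CC` is already absent.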

Public Library of Science (PLOS)

Crossref

Directory of Open Access Journals

PubMed Central

Fast Mapping of Short Sequences with Mismatches, Insertions and Deletions Using Index Structures

Author: B Langmead
Christian Otto
Cynthia M. Sharma
David B. Searls
G Myers
H Li
H Lin
JC Dohm
JM Rothberg
Jörg Hackermüller
Jörg Vogel
K Prüfer
M Crochemore
MI Abouelhoda
P Ferragina
Peter F. Stadler
Philipp Khaitovich
R Li
S Bennett
S Huse
S Karlin
SM Rumble
Stefan Kurtz
Steve Hoffmann
W Chang
Publication venue: Public Library of Science
Publication date: 01/01/2009

With few exceptions, current methods for short read mapping make use of simple seed heuristics to speed up the search. Most of the underlying matching models neglect the necessity to allow not only mismatches, but also insertions and deletions. Current evaluations indicate, however, that very different error models apply to the novel high-throughput sequencing methods. While the most frequent error-type in Illumina reads are mismatches, reads produced by 454's GS FLX predominantly contain insertions and deletions (indels). Even though 454 sequencers are able to produce longer reads, the method is frequently applied to small RNA (miRNA and siRNA) sequencing. Fast and accurate matching in particular of short reads with diverse errors is therefore a pressing practical problem. We introduce a matching model for short reads that can, besides mismatches, also cope with indels. It addresses different error models. For example, it can handle the problem of leading and trailing contaminations caused by primers and poly-A tails in transcriptomics or the length-dependent increase of error rates. In these contexts, it thus simplifies the tedious and error-prone trimming step. For efficient searches, our method utilizes index structures in the form of enhanced suffix arrays. In a comparison with current methods for short read mapping, the presented approach shows significantly increased performance not only for 454 reads, but also for Illumina reads. Our approach is implemented in the software segemehl available at http://www.bioinf.uni-leipzig.de/Software/segemehl/
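To make the matching model concrete, here is a minimal sketch of approximate matching that tolerates mismatches, insertions and deletions. It uses a plain dynamic program over the whole reference (semi-global edit distance), not the enhanced suffix arrays that the abstract describes, so it illustrates the error model only and says nothing about how segemehl is implemented. The function name `best_approximate_match` is hypothetical.

```python
def best_approximate_match(read, reference):
    """Return (distance, end) for the best match of `read` inside `reference`.

    `distance` is the smallest unit-cost edit distance (mismatches,
    insertions, deletions) between `read` and any substring of `reference`;
    `end` is the first reference position at which such a match ends.
    """
    # Row 0 is all zeros: a match may start anywhere in the reference for free.
    prev = [0] * (len(reference) + 1)
    for i in range(1, len(read) + 1):
        curr = [i] + [0] * len(reference)
        for j in range(1, len(reference) + 1):
            cost = 0 if read[i - 1] == reference[j - 1] else 1
            curr[j] = min(
                prev[j - 1] + cost,  # match or mismatch
                prev[j] + 1,         # read base missing from the reference
                curr[j - 1] + 1,     # reference base missing from the read
            )
        prev = curr
    end = min(range(len(reference) + 1), key=prev.__getitem__)
    return prev[end], end


if __name__ == "__main__":
    print(best_approximate_match("ACGT", "TTACGTTT"))  # exact occurrence
    print(best_approximate_match("ACGT", "TTAGTTT"))   # one indel needed
```

This runs in time proportional to the product of read and reference lengths, which is why index structures such as those named in the abstract matter for genome-scale references.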

Public Library of Science (PLOS)

Crossref

Fraunhofer-ePrints

Directory of Open Access Journals

PubMed Central

Longest Common Prefixes with $k$-Errors and Applications

Author: A Apostolico
AF Smit
B Bollobás
C Leimeister
C Pizzi
DE Willard
G Kucherov
G Manzini
G Navarro
H Alamro
I Ulitsky
J Fischer
KR Rasmussen
M Alzamel
MA Bender
MI Abouelhoda
N Välimäki
P Eades
R Kolpakov
S Faro
S Grabowski
S Karlin
SV Thankachan
T Derrien
T Flouri
TH Cormen
U Manber
Publication date: 01/01/2018

Although real-world text datasets, such as DNA sequences, are far from being uniformly random, average-case string searching algorithms perform significantly better than worst-case ones in most applications of interest. In this paper, we study the problem of computing the longest prefix of each suffix of a given string of length $n$ over a constant-sized alphabet that occurs elsewhere in the string with $k$-errors. This problem has already been studied under the Hamming distance model. Our first result is an improvement upon the state-of-the-art average-case time complexity for non-constant $k$ and using only linear space under the Hamming distance model. Notably, we show that our technique can be extended to the edit distance model with the same time and space complexities. Specifically, our algorithms run in $\mathcal{O}(n \log^k n \log \log n)$ time on average using $\mathcal{O}(n)$ space. We show that our technique is applicable to several algorithmic problems in computational biology and elsewhere.
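The problem statement can be pinned down with a naive reference implementation for the Hamming distance case. This is only a sketch for checking small inputs: it compares every pair of positions directly and takes quadratic time or worse, so it is unrelated to the average-case technique of the paper. The function name `lcp_with_k_mismatches` is hypothetical, and occurrences are allowed to overlap the suffix they are compared with, which is an assumption of this example.

```python
def lcp_with_k_mismatches(s, k):
    """For each position i, return the length of the longest prefix of s[i:]
    that also occurs at some other position j != i with at most k mismatches.
    """
    n = len(s)
    result = [0] * n
    for i in range(n):
        best = 0
        for j in range(n):
            if j == i:
                continue
            mismatches = 0
            length = 0
            # Extend the comparison until a string end or the (k+1)-th mismatch.
            while i + length < n and j + length < n:
                if s[i + length] != s[j + length]:
                    mismatches += 1
                    if mismatches > k:
                        break
                length += 1
            best = max(best, length)
        result[i] = best
    return result


if __name__ == "__main__":
    print(lcp_with_k_mismatches("aaab", 0))
    print(lcp_with_k_mismatches("aaab", 1))
```

With `k = 0` this reduces to the classical longest repeated prefix for each suffix; raising `k` can only lengthen the reported prefixes.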

arXiv.org e-Print Archive

Crossref

King's Research Portal

High quality copy number and genotype data from FFPE samples using Molecular Inversion Probe (MIP) microarrays

BACKGROUND: A major challenge facing DNA copy number (CN) studies of tumors is that most banked samples with extensive clinical follow-up information are Formalin-Fixed Paraffin Embedded (FFPE). DNA from FFPE samples generally underperforms or suffers high failure rates compared to fresh frozen samples because of DNA degradation and cross-linking during FFPE fixation and processing. As FFPE protocols may vary widely between labs and samples may be stored for decades at room temperature, an ideal FFPE CN technology should work on diverse sample sets. Molecular Inversion Probe (MIP) technology has been applied successfully to obtain high quality CN and genotype data from cell line and frozen tumor DNA. Since the MIP probes require only a small (~40 bp) target binding site, we reasoned they may be well suited to assess degraded FFPE DNA. We assessed CN with a MIP panel of 50,000 markers in 93 FFPE tumor samples from 7 diverse collections. For 38 FFPE samples from three collections we were also able to assess CN in matched fresh frozen tumor tissue. RESULTS: Using an input of 37 ng genomic DNA, we generated high quality CN data with MIP technology in 88% of FFPE samples from seven diverse collections. When matched fresh frozen tissue was available, the performance of FFPE DNA was comparable to that of DNA obtained from matched frozen tumor (genotype concordance averaged 99.9%), with only a modest loss in performance in FFPE. CONCLUSION: MIP technology can be used to generate high quality CN and genotype data in FFPE as well as fresh frozen samples.

Crossref

Springer - Publisher Connector

PubMed Central

The University of Arizona

UNT Digital Library

Cooperative Binding

Author: A Karlin
AV Hill
BA Mello
C Bohr
DE Koshland
DT Gallagher
GK Ackers
GS Adair
H Aramaki
IM Klotz
J Monod
J Wyman
JC Gerhart
JP Changeux
K Brejc
L Pauling
M Ptashne
MD Seo
Melanie I. Stefan
MF Perutz
MI Stefan
MM Rubin
Nicolas Le Novère
P Cluzel
RB Honzatko
Shoshana Wodak
SJ Edelstein
T Krell
T Meyer
TA Duke
TS Najdi
TS Teo
V Sourjik
YS Babu
Publication venue: Public Library of Science (PLoS)
Publication date: 27/06/2013

Molecular binding is an interaction between molecules that results in a stable association between those molecules. Cooperative binding occurs if the number of binding sites of a macromolecule that are occupied by a specific type of ligand is a nonlinear function of this ligand’s concentration. This can be due, for instance, to an affinity for the ligand that depends on the amount of ligand bound. Cooperativity can be positive (supralinear) or negative (infralinear). Cooperative binding is most often observed in proteins, but nucleic acids can also exhibit cooperative binding, for instance of transcription factors. Cooperative binding has been shown to be the mechanism underlying a large range of biochemical and physiological processes
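One classical way to express the nonlinear dependence of occupancy on ligand concentration is the Hill equation, in which fractional occupancy is L^n / (K^n + L^n). The abstract does not name a specific model, so using the Hill equation here is an assumption made for illustration, as are the function and parameter names. A Hill coefficient `n` above 1 gives the positive cooperativity described above, and a value below 1 gives negative cooperativity.

```python
def hill_occupancy(ligand, k_half, n):
    """Fractional occupancy of binding sites under the Hill equation.

    ligand -- free ligand concentration (non-negative)
    k_half -- ligand concentration giving half-maximal occupancy (positive)
    n      -- Hill coefficient (positive); n == 1 means no cooperativity
    """
    if ligand < 0 or k_half <= 0 or n <= 0:
        raise ValueError("ligand must be >= 0; k_half and n must be > 0")
    return ligand ** n / (k_half ** n + ligand ** n)


if __name__ == "__main__":
    # Compare a non-cooperative curve (n = 1) with a cooperative one (n = 2).
    for concentration in (0.25, 0.5, 1.0, 2.0, 4.0):
        print(concentration,
              round(hill_occupancy(concentration, 1.0, 1.0), 3),
              round(hill_occupancy(concentration, 1.0, 2.0), 3))
```

Both curves pass through 0.5 at `ligand == k_half`; the cooperative curve lies below the non-cooperative one at lower concentrations and above it at higher ones, which is the sigmoidal shape associated with positive cooperativity.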

Crossref

Directory of Open Access Journals

PubMed Central

Edinburgh Research Explorer

Caltech Authors

FigShare

Organizational Heterogeneity of Vertebrate Genomes

Author: A Nekrutenko
A Paz
A Porceddu
A Woolfe
Abraham Korol
AE Vinogradov
AV Smith
B Deng
B-Y Liao
BS Weir
C Chapus
C Dufraigne
C McLean
C Melodelima
C Nusbaum
C Schmegner
CM Malcom
CP Ponting
D Sellis
E Bingham
E Buschiazzo
E Lieberman-Aiden
EN Trifonov
ET Dermitzakis
F Larsen
G Bejerano
G Bernardi
G Rosen
GE Sims
GL Rosen
H Caron
H Wu
J van Helden
I Dunham
J Grimwood
J Healy
J Jurka
JR Chubb
K Jabbari
K Sivaraman
K Yamada
KJ Meaburn
L Chen
L Duret
L Eory
L Mariño-Ramírez
LW Hillier
M Costantini
M Csurös
M Gardiner-Garden
M Hattori
M Höhl
M Sémon
M Touchon
MC Zody
MI Jensen-Seaman
MJ Lercher
P Carpena
R Nussinov
R Versteeg
RK Azad
S De
S Karlin
S Katzman
S Pietrokovski
S Vinga
SB Hedges
SJ Bell
Svetlana Frenkel
T Abe
T Cremer
T Ryba
V Kirzhner
Valery Kirzhner
Vincent Laudet
W Li
WJ Kent
Publication venue: Public Library of Science
Publication date: 01/01/2012

Genomes of higher eukaryotes are mosaics of segments with various structural, functional, and evolutionary properties. The availability of whole-genome sequences allows the investigation of their structure as “texts” using different statistical and computational methods. One such method, referred to as Compositional Spectra (CS) analysis, is based on scoring the occurrences of fixed-length oligonucleotides (k-mers) in the target DNA sequence. CS analysis allows generating species- or region-specific characteristics of the genome, regardless of their length and the presence of coding DNA. In this study, we consider the heterogeneity of vertebrate genomes as a joint effect of regional variation in sequence organization superimposed on the differences in nucleotide composition. We estimated compositional and organizational heterogeneity of genome and chromosome sequences separately and found that both heterogeneity types vary widely among genomes as well as among chromosomes in all investigated taxonomic groups. The high correspondence of heterogeneity scores obtained on three genome fractions, coding, repetitive, and the remaining part of the noncoding DNA (the genome dark matter - GDM) allows the assumption that CS-heterogeneity may have functional relevance to genome regulation. Of special interest for such interpretation is the fact that natural GDM sequences display the highest deviation from the corresponding reshuffled sequences
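The core step of scoring fixed-length oligonucleotide occurrences can be sketched as a k-mer count over a sliding window. This simplified version counts exact occurrences only and skips windows containing characters other than A, C, G and T; the paper's Compositional Spectra method may score occurrences differently, so the functions `kmer_spectrum` and `kmer_frequencies` are hypothetical illustrations rather than a reimplementation.

```python
from collections import Counter


def kmer_spectrum(seq, k):
    """Count exact occurrences of every k-mer in `seq` (overlapping windows)."""
    seq = seq.upper()
    valid = set("ACGT")
    counts = Counter()
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        if set(kmer) <= valid:  # skip windows with ambiguous bases such as N
            counts[kmer] += 1
    return counts


def kmer_frequencies(seq, k):
    """Normalize the k-mer counts so that they sum to 1 (empty dict if none)."""
    counts = kmer_spectrum(seq, k)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {kmer: count / total for kmer, count in counts.items()}


if __name__ == "__main__":
    print(kmer_spectrum("ACGTACGT", 2))
    print(kmer_frequencies("ACGTNACGT", 2))
```

Normalized frequency vectors of this kind can be compared between genomes or genome regions with any vector distance, which is the general idea behind composition-based heterogeneity scores.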

CiteSeerX

Public Library of Science (PLOS)

Crossref

Directory of Open Access Journals

PubMed Central

The use of genomic signature distance between bacteriophages and their hosts displays evolutionary relationships and phage growth cycle determination

Author: A Campbell
A Wietzorrek
AJ Clark
AM Kropinski
BE Blaisdell
C Canchaya
C Desplats
C Dufraigne
C Halling
C Regeard
CA Suttle
Christophe Regeard
D Savalia
D Scholl
D Vybiral
D Zou
DM Kristensen
DT Pride
ES Miller
F Sanger
FE Angly
G Plunkett
GF Hatfull
GJ German
GJ Morgan
H Brüssow
H Mertens
H Miyamoto
H Teeling
HW Ackermann
J Becq
J Dorscht
J Hong
J Kaneko
J Lawrence
J Mediavilla
J Recktenwald
J Uchiyama
J Wang
JG Lawrence
JJ Dunn
JJ Iandolo
K Creuzburg
K Hertveldt
K Nakayama
K Stummeyer
KE Wommack
KV Srividhya
M Byrne
M Kuroda
M Mulet
M Ohnishi
MB Lobocka
MB Sullivan
MD Braid
MD Roberts
ME Ford
ME Zegans
MG Weinbauer
MI Pajunen
Michael S DuBow
MK Waldor
ML Pedulla
MW van Passel
N Jamalludeen
NY Cho
P Morris
P Serwer
van Passel
Patrick Deschavanne
PJ Ceyssens
PJ Deschavanne
PR Marri
PW Wang
R Lavigne
RJ Juhala
S Karlin
S Labrie
S Matsuzaki
S Narita
S O'Flaherty
S Zuber
SJ Williamson
SM Tallent
SR Casjens
T Bae
T Coenye
T Kwan
T Sato
T Yamaguchi
TF Thingstad
TT Pham
V Rosas-Magallanes
VV Mesyanzhinov
WB Whitman
XX Ma
Y Tan
YJ Heo
Publication venue: BioMed Central
Publication date: 01/01/2010

Background: Bacteriophage classification is mainly based on morphological traits and genome characteristics combined with host information and in some cases on phage growth lifestyle. A lack of molecular tools can impede more precise studies on phylogenetic relationships or even a taxonomic classification. The use of methods to analyze genome sequences without the requirement for homology has allowed advances in classification. Results: Here, we propose to use genome sequence signature to characterize bacteriophages and to compare them to their host genome signature in order to obtain host-phage relationships and information on their lifestyle. We analyze the host-phage relationships in the four most representative groups of Caudoviridae, the dsDNA group of phages. We demonstrate that the use of phage genomic signature and its comparison with that of the host allows a grouping of phages and is also able to predict the host-phage relationships (lytic vs. temperate). Conclusions: We can thus condense, in relatively simple figures, this phage information dispersed over many publications.

Crossref

Springer - Publisher Connector

Directory of Open Access Journals

HAL-Inserm

PubMed Central

Hal-Diderot

Fine-structured multi-scaling long-range correlations in completely sequenced genomes—features, origin, and classification

Author: A Grossberg
A Mira
A Provata
AI Lamond
AK Mohanty
Anis Abuseiris
AS Borovik
B Hao
C Ambrose
CA Chatzidimitriou-Dreismann
CK Peng
F Reif
FHC Crick
FNH Robinson
Frank G. Grosveld
G Bernardi
HE Stanley
I Amato
J Maddox
JCW Shephard
K Liu
KA Bailey
KJ Hsü
L Luo
LP Lefkovith
M Dundr
M Eigen
M Takahashi
Markus Göker
MI Rabinovich
P Allegrini
P Mackiewicz
PMC de Oliveira
PT Lowary
RF Voss
Rudolf Lohner
S Karlin
S Kirkpatrick
S Nee
SV Buldyrev
T Cremer
TA Blank
Tobias A. Knoch
U Francke
VV Prabhu
W Li
Z Yu
ZG Yu
Publication venue: Springer-Verlag
Publication date: 01/01/2009

The sequential organization of genomes, i.e. the relations between distant base pairs and regions within sequences, and its connection to the three-dimensional organization of genomes is still a largely unresolved problem. Long-range power-law correlations were found using correlation analysis on almost the entire observable scale of 132 completely sequenced chromosomes of 0.5 × 10^6 to 3.0 × 10^7 bp from Archaea, Bacteria, Arabidopsis thaliana, Saccharomyces cerevisiae, Schizosaccharomyces pombe, Drosophila melanogaster, and Homo sapiens. The local correlation coefficients show a species-specific multi-scaling behaviour: close to random correlations on the scale of a few base pairs, a first maximum from 40 to 3,400 bp (for Arabidopsis thaliana and Drosophila melanogaster divided in two submaxima), and often a region of one or more second maxima from 10^5 to 3 × 10^5 bp. Within this multi-scaling behaviour, an additional fine-structure is present and attributable to codon usage in all except the human sequences, where it is related to nucleosomal binding. Computer-generated random sequences assuming a block organization of genomes, the codon usage, and nucleosomal binding explain these results. Mutation by sequence reshuffling destroyed all correlations. Thus, the stability of correlations seems to be evolutionarily tightly controlled and connected to the spatial genome organization, especially on large scales. In summary, genomes show a complex sequential organization related closely to their three-dimensional organization.

Crossref

Springer - Publisher Connector

PubMed Central

EUR Research Repository

Erasmus University Digital Repository

A biologically plausible model of time-scale invariant interval timing

The temporal durations between events often exert a strong influence over behavior. The details of this influence have been extensively characterized in behavioral experiments in different animal species. A remarkable feature of the data collected in these experiments is that they are often time-scale invariant. This means that response measurements obtained under intervals of different durations coincide when plotted as functions of relative time. Here we describe a biologically plausible model of an interval timing device and show that it is consistent with time-scale invariant behavior over a substantial range of interval durations. The model consists of a set of bistable units that switch from one state to the other at random times. We first use an abstract formulation of the model to derive exact expressions for some key quantities and to demonstrate time-scale invariance for any range of interval durations. We then show how the model could be implemented in the nervous system through a generic and biologically plausible mechanism. In particular, we show that any system that can display noise-driven transitions from one stable state to another can be used to implement the timing device. Our work demonstrates that a biologically plausible model can qualitatively account for a large body of data and thus provides a link between the biology and behavior of interval timing

Crossref

Springer - Publisher Connector

PubMed Central

Comparative Genomic Analysis of Drosophila melanogaster and Vector Mosquito Developmental Genes

Author: A Clemons
A Kusserow
A Pires-daSilva
A Stathopoulos
AC Koutsos
AK Mueller
AP McGregor
B Bryant
B Lilly
BS Baker
BS Emerald
C Nassif
C Nusslein-Volhard
C Scali
Charles R. Tessier
CI Jones
CL Campbell
CWC Davis
D Jhaveri
D Lawson
D Smedley
DA Wassarman
David W. Severson
DE Klein
DG McHaffey
DJ Andrew
DM Cooper
E Calvo
E Hornstein
E Wienholds
E Zuckerkandl
EA Mead
Ellen Flannery
EM Zdobnov
EV Kriventseva
EW Abrams
F Catteruccia
F Feiguin
F Hirth
F Tajima
G Schwank
GB Craig Jr
GJ Bashaw
GK Davis
GL Grossman
H Hing
H Li
H McNeill
H Noguchi
H Steller
J Curtiss
J Jiang
J Juhn
J Mohler
JA Lynch
Joseph Sarro
JR Terman
K Hoshijima
K Senti
K Tamura
KA Wharton
KC Burtis
KJ Mitchell
KP O'Brien
L Almeras
L Zhou
LN Raminani
LNCE Raminani
M Beye
M Haugen
M Nei
M Noll
M Orme
M Somel
M Van der Zee
MA Huntley
MA Larkin
MC Alonso
MI Salazar
MJ Sonnenfeld
MK Abbott
ML Spletter
Molly Duman-Scheel
Morgan Haugen
MS Chen
N Fuse
N Posnien
NA Jones
NH Patel
P Arensburger
P Huang
PA Rossignol
Pedro Lagerblad Oliveira
PK Dearden
Q Liu
R Aguilar
R Dasgupta
R Harris
R Kofler
R Lehmann
R Schroder
R Schweitzer
RA Holt
RF Stocker
RT Boggs
S Artavanis-Tsakonas
S Griffiths-Jones
S Iwai
S Karlin
S Li
S Shigenobu
S Tweedie
SD Podos
SE Goulding
SF Altschul
SK Behura
SM Cohen
SN Kim
Susanta K. Behura
T Brody
T Gempe
T Komiyama
T Thomson
T Volk
TW Cline
U Hinz
U Lammel
V Nene
V Pirrotta
W Simanton
WCt Black
WH Xu
WR Horsfall
WS Romoser
Y Goltsev
Y Rao
Z Jin
Z Kaprielian
Z Song
ZN Adelman
Publication venue: Public Library of Science
Publication date: 06/07/2011

Genome sequencing projects have presented the opportunity for analysis of developmental genes in three vector mosquito species: Aedes aegypti, Culex quinquefasciatus, and Anopheles gambiae. A comparative genomic analysis of developmental genes in Drosophila melanogaster and these three important vectors of human disease was performed in this investigation. While the study was comprehensive, special emphasis centered on genes that 1) are components of developmental signaling pathways, 2) regulate fundamental developmental processes, 3) are critical for the development of tissues of vector importance, 4) function in developmental processes known to have diverged within insects, and 5) encode microRNAs (miRNAs) that regulate developmental transcripts in Drosophila. While most fruit fly developmental genes are conserved in the three vector mosquito species, several genes known to be critical for Drosophila development were not identified in one or more mosquito genomes. In other cases, mosquito lineage-specific gene gains with respect to D. melanogaster were noted. Sequence analyses also revealed that numerous repetitive sequences are a common structural feature of Drosophila and mosquito developmental genes. Finally, analysis of predicted miRNA binding sites in fruit fly and mosquito developmental genes suggests that the repertoire of developmental genes targeted by miRNAs is species-specific. The results of this study provide insight into the evolution of developmental genes and processes in dipterans and other arthropods, serve as a resource for those pursuing analysis of mosquito development, and will promote the design and refinement of functional analysis experiments

Public Library of Science (PLOS)

Crossref

Directory of Open Access Journals

PubMed Central