
552 research outputs found

Using cascading Bloom filters to improve the memory usage for de Brujin graphs

Author: A. Bowe
A. Kirsch
E. Porat
F.R. Blattner
J. Pell
J.R. Miller
M.G. Grabherr
P.A. Pevzner
R. Chikhi
T.C. Conway
Y. Peng
Z. Iqbal
Publication date: 01/01/2013

De Bruijn graphs are widely used in bioinformatics for processing next-generation sequencing (NGS) data. Due to the very large size of NGS datasets, it is essential to represent de Bruijn graphs compactly, and several approaches to this problem have been proposed recently. In this work, we show how to reduce the memory required by the algorithm of [3] that represents de Bruijn graphs using Bloom filters. Our method requires 30% to 40% less memory than the method of [3], with insignificant impact on construction time. At the same time, our experiments showed a better query time compared to [3]. This is, to our knowledge, the best practical representation for de Bruijn graphs. Comment: 12 pages, submitte
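The abstract above builds on storing the k-mers of a de Bruijn graph in a Bloom filter. The following is a minimal sketch of that basic idea only, not the cascading scheme of the paper or the algorithm of [3]; the class `BloomFilter`, the helpers `kmers` and `successors`, the SHA-256 based hashing, and all parameter values are assumptions chosen for illustration.

```python
import hashlib


class BloomFilter:
    """Minimal Bloom filter over strings (illustrative, not tuned)."""

    def __init__(self, num_bits, num_hashes):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, item):
        # Derive num_hashes bit positions by salting a SHA-256 digest.
        for seed in range(self.num_hashes):
            digest = hashlib.sha256(f"{seed}:{item}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, item):
        # May report false positives, never false negatives.
        return all(
            self.bits[pos // 8] & (1 << (pos % 8))
            for pos in self._positions(item)
        )


def kmers(read, k):
    """All length-k substrings of a read, in order."""
    return [read[i:i + k] for i in range(len(read) - k + 1)]


def successors(bloom, kmer):
    """Candidate out-neighbours of kmer in the implicit de Bruijn graph."""
    return [kmer[1:] + base for base in "ACGT" if kmer[1:] + base in bloom]


# Hypothetical toy read; real NGS data would contain millions of reads.
bloom = BloomFilter(num_bits=1024, num_hashes=3)
for km in kmers("ACGTACGGA", 4):
    bloom.add(km)
```

Because a Bloom filter can return false positives, `successors` may list k-mers that were never inserted; handling those spurious neighbours compactly is the kind of problem the listed paper addresses.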

arXiv.org e-Print Archive

CiteSeerX

Crossref

Springer - Publisher Connector

INRIA a CCSD electronic archive server

PubMed Central

Hal-Diderot

HAL-Ecole des Ponts ParisTech

HAL - UPEC / UPEM

Comparison of Spectra in Unsequenced Species

Author: A.M. Frank
B. Habermann
D. Tsur
D.N. Perkins
E. Pitzer
J. Eng
J. Grossmann
J.R. Yates
P.A. Pevzner
S. Pevtsov
V. Dancik
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2009

We introduce a new algorithm for the mass spectrometric identification of proteins. Experimental spectra obtained by tandem MS/MS are directly compared to theoretical spectra generated from proteins of evolutionarily closely related organisms. This work is motivated by the need for a method that allows the identification of proteins of unsequenced species against a database containing proteins of related organisms. The idea is that matching spectra of unknown peptides to very similar MS/MS spectra generated from this database of annotated proteins can lead to the annotation of unknown proteins. This process is similar to ortholog annotation in protein sequence databases. The difficulty with such an approach is that two similar peptides, even with just one modification (i.e. insertion, deletion or substitution of one or several amino acids) between them, usually generate very dissimilar spectra. In this paper, we present a new dynamic programming based algorithm: PacketSpectralAlignment. Our algorithm is tolerant to modifications and fully exploits two important properties that are usually not considered: the notion of inner symmetry, a relation linking pairs of spectrum peaks, and the notion of packet inside each spectrum to keep related peaks together. Our algorithm, PacketSpectralAlignment, is then compared to SpectralAlignment [1] on a dataset of simulated spectra. Our tests show that PacketSpectralAlignment behaves better, in terms of results and execution tim

Crossref

Safe and complete contig assembly via omnitigs

Author: A Bankevich
A Guénoche
AR Rubinov
AS Motahari
C Kingsford
D Haussler
DR Zerbino
E Kapun
ES Lander
G Bresler
G Narzisi
I Lysov
JD Kececioglu
JR Miller
JT Simpson
K Lam
K Sahlin
L Salmela
M Boetzer
N Nagarajan
N Vyahhi
P Medvedev
PA Pevzner
R Chikhi
R Luo
R Uricaru
RM Idury
SL Salzberg
Publication date: 16/08/2016

Contig assembly is the first stage that most assemblers solve when reconstructing a genome from a set of reads. Its output consists of contigs -- a set of strings that are promised to appear in any genome that could have generated the reads. From the introduction of contigs 20 years ago, assemblers have tried to obtain longer and longer contigs, but the following question was never solved: given a genome graph $G$ (e.g. a de Bruijn, or a string graph), what are all the strings that can be safely reported from $G$ as contigs? In this paper we finally answer this question, and also give a polynomial time algorithm to find them. Our experiments show that these strings, which we call omnitigs, are 66% to 82% longer on average than the popular unitigs, and 29% of dbSNP locations have more neighbors in omnitigs than in unitigs. Comment: Full version of the paper in the proceedings of RECOMB 201

arXiv.org e-Print Archive

Crossref

Thermodynamics of protein folding: a random matrix formulation

Author: Betancourt M R
Creighton T E
Frauenfelder H
Kleinberg J Istrail S Pevzner P Waterman M
Lee S
Mehta M L
Pragya Shukla
Richards F M
Shortle D
Shukla P
van den Berg B
Publication venue: 'IOP Publishing'
Publication date: 16/10/2010

The process of protein folding from an unfolded state to a biologically active, folded conformation is governed by many parameters, e.g. the sequence of amino acids, intermolecular interactions, the solvent, temperature and chaperone molecules. Our study, based on random matrix modeling of the interactions, shows however that the evolution of the statistical measures, e.g. Gibbs free energy, heat capacity and entropy, is governed by a single parameter. This information can explain the selection of specific folding pathways from an infinite number of possible ways as well as other folding characteristics observed in computer simulation studies. Comment: 21 Pages, no figure

arXiv.org e-Print Archive

Crossref

Group testing with Random Pools: Phase Transitions and Optimal Strategy

Author: A. Macula
C. Toninelli
C.M. Fortuin
D. Dorfman
D. Gupta
D.J. Balding
D.Z. Du
E. Barillot
E. Knill
E.H. Hong
J. Lu
M. Mézard
M. Sobel
M. Tarzia
M.M. Mézard
P.A. Pevzner
S.A. Zenios
T. Berger
T.J. Richardson
W.H. Kautz
W.J. Bruno
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 14/11/2007

The problem of Group Testing is to identify defective items out of a set of objects by means of pool queries of the form "Does the pool contain at least a defective?". The aim is of course to perform detection with the fewest possible queries, a problem which has relevant practical applications in different fields including molecular biology and computer science. Here we study GT in the probabilistic setting, focusing on the regime of small defective probability and large number of objects, $p \to 0$ and $N \to \infty$. We construct and analyze one-stage algorithms for which we establish the occurrence of a non-detection/detection phase transition resulting in a sharp threshold, $\bar M$, for the number of tests. By optimizing the pool design we construct algorithms whose detection threshold follows the optimal scaling $\bar M \propto Np|\log p|$. Then we consider two-stage algorithms and analyze their performance for different choices of the first-stage pools. In particular, via a proper random choice of the pools, we construct algorithms which attain the optimal value (previously determined in Ref. [16]) for the mean number of tests required for complete detection. We finally discuss the optimal pool design in the case of finite $p$.
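The pool queries described in the abstract above can be illustrated with a small simulation. This is a minimal sketch under assumptions, not the algorithms analyzed in the paper: pools are drawn uniformly at random with a fixed size, and the decoder only applies the elementary rule that every item in a negative pool is non-defective. The names `run_pools` and `decode` and all numeric values are hypothetical.

```python
import random


def run_pools(defective, pools):
    """Answer each pool query: does the pool contain at least one defective?"""
    return [any(item in defective for item in pool) for pool in pools]


def decode(n_items, pools, outcomes):
    """Clear every item seen in a negative pool; the rest stay candidates."""
    cleared = set()
    for pool, positive in zip(pools, outcomes):
        if not positive:
            cleared.update(pool)
    return set(range(n_items)) - cleared


# Hypothetical instance: 100 items, 2 defectives, 40 random pools of size 10.
rng = random.Random(0)
n_items, n_pools, pool_size = 100, 40, 10
defective = {3, 57}
pools = [rng.sample(range(n_items), pool_size) for _ in range(n_pools)]
outcomes = run_pools(defective, pools)
candidates = decode(n_items, pools, outcomes)
```

The set `candidates` always contains every defective item, since a defective item can never appear in a negative pool. How many non-defective items remain in it depends on the number and design of the pools, which is the quantity the threshold $\bar M$ in the abstract refers to.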

arXiv.org e-Print Archive

Crossref

Hal-Diderot

Parking functions, labeled trees and DCJ sorting scenarios

Author: A. Bergeron
A. McLysaght
A.C. Siepel
A.G. Konheim
A.W. Xu
D. Sankoff
E. Barcucci
I. Miklós
M. Ozery-flato
M.D.V. Braga
P. Pevzner
R.P. Stanley
S. Bérard
S. Yancopoulos
Y. Ajana
Y. Diekmann
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2009

In genome rearrangement theory, one of the elusive questions raised in recent years is the enumeration of rearrangement scenarios between two genomes. This problem is related to the uniform generation of rearrangement scenarios, and the derivation of tests of statistical significance of the properties of these scenarios. Here we give an exact formula for the number of double-cut-and-join (DCJ) rearrangement scenarios of co-tailed genomes. We also construct effective bijections between the set of scenarios that sort a cycle and well studied combinatorial objects such as parking functions and labeled trees. Comment: 12 pages, 3 figure
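The parking functions and labeled trees mentioned in the abstract above are counted by the same classical formula: there are $(n+1)^{n-1}$ parking functions of length $n$, which by Cayley's formula is also the number of labeled trees on $n+1$ vertices. The sketch below checks the parking-function count by brute force for small $n$. It illustrates these combinatorial objects only and does not implement the DCJ bijections of the paper; the function names are hypothetical.

```python
from itertools import product


def is_parking_function(seq):
    """A sequence is a parking function if its sorted i-th entry is at most i."""
    return all(
        value <= index
        for index, value in enumerate(sorted(seq), start=1)
    )


def count_parking_functions(n):
    """Count parking functions of length n by checking all n**n sequences."""
    return sum(
        is_parking_function(seq)
        for seq in product(range(1, n + 1), repeat=n)
    )


# Compare the brute-force count with the closed form (n + 1) ** (n - 1).
counts = {n: count_parking_functions(n) for n in range(1, 5)}
```

The brute-force enumeration grows as $n^n$, so it is only practical for very small $n$.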

arXiv.org e-Print Archive

CiteSeerX

HAL - Lille 3

Crossref

INRIA a CCSD electronic archive server

Comparison of Percutaneous Nucleoplasty and Open Discectomy in Patients with Lumbar Disc Protrusions

Author: D Adam
E Pevzner
MD Danil Adam
R Gepstein
Publication date: 24/04/2020

Abstract. Introduction: Coblation nucleoplasty is a minimally invasive method that lies midway between conservative and open surgical treatment of patients with degenerative disc disease and lumbar disc protrusion. The authors compare the outcomes of patients treated by the two methods. Material and results: Two groups of 80 patients each were treated by open discectomy and nucleoplasty. Patients with radicular symptoms caused by disc protrusions with an antero-posterior diameter of the herniated disc < 6 mm, resistant to conservative treatment, were operated on using nucleoplasty. When the antero-posterior diameter of the disc herniation was > 6 mm, the classical discectomy method was applied. The classical surgeries (discectomies) were performed by the senior author (D.A.), while all three authors participated equally in the nucleoplasty procedures. In the open discectomy group, improvement of radicular pain was immediate, but at 1 year after the procedure only one third of the patients had returned to work. In the nucleoplasty group, improvement of pain was slower but progressive. At 1 year postoperatively, the VAS scores of patients treated by the two methods were very close. All patients returned to work 3 days after nucleoplasty. In this group there were no intraoperative or postoperative complications. One patient was afterwards operated on by open discectomy. Conclusion: Coblation nucleoplasty is a safe and efficient method to treat patients with lumbar disc protrusion. Keywords: nucleoplasty, coblation, discectomy

CiteSeerX

Limited Lifespan of Fragile Regions in Mammalian Evolution

Author: A. Bergeron
A. Bhutkar
A. Kulemzina
A. Ruiz-Herrera
A.E. Wind van der
C. Webber
D. Larkin
D. Misceo
D. San Mauro
D. Sankoff
D.M. Larkin
E. Mlynarski
E. Mongin
E.E. Eichler
G. Fertin
H. Hinsch
H. Kikuta
H. Zhao
J. Ma
J.H. Nadeau
L. Armengol
L. Gordon
M. Caceres
M. Longo
M.A. Alekseyev
M.R. Mehan
O. Lecompte
P. Pevzner
P.A. Pevzner
R. Koszul
S. Myers
S. Ohno
S. Yancopoulos
S. Zhao
W.J. Kent
W.J. Murphy
Y. Yue
Z. Jiang
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2010

An important question in genome evolution is whether there exist fragile regions (rearrangement hotspots) where chromosomal rearrangements are happening over and over again. Although nearly all recent studies supported the existence of fragile regions in mammalian genomes, the most comprehensive phylogenomic study of mammals (Ma et al. (2006) Genome Research 16, 1557-1565) raised some doubts about their existence. We demonstrate that fragile regions are subject to a "birth and death" process, implying that fragility has limited evolutionary lifespan. This finding implies that fragile regions migrate to different locations in different mammals, explaining why there exist only a few chromosomal breakpoints shared between different lineages. The birth and death of fragile regions phenomenon reinforces the hypothesis that rearrangements are promoted by matching segmental duplications and suggests putative locations of the currently active fragile regions in the human genome

arXiv.org e-Print Archive

CiteSeerX

Crossref

Stationary Distribution and Eigenvalues for a de Bruijn Process

Author: Abbas Alhakim
Anthony Ralston
B. Nooten Van
Donald E. Knuth
Haiyan Chen
Herbert S. Wilf
J. Sherman
N. G. Bruijn
P. Flajolet
Pavel A. Pevzner
R A Blythe
R. Dawson
T. Aardenne-Ehrenfest van
T. Mori
V. V. Strok
W. T. Tutte
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 29/08/2011

We define a de Bruijn process with parameters n and L as a certain continuous-time Markov chain on the de Bruijn graph with words of length L over an n-letter alphabet as vertices. We determine explicitly its steady state distribution and its characteristic polynomial, which turns out to decompose into linear factors. In addition, we examine the stationary state of two specializations in detail. In the first one, the de Bruijn-Bernoulli process, this is a product measure. In the second one, the Skin-deep de Bruijn process, the distribution has constant density but nontrivial correlation functions. The two-point correlation function is determined using generating function techniques. Comment: Dedicated to Herb Wilf on the occasion of his 80th birthda
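The state space named in the abstract above, the de Bruijn graph with words of length L over an n-letter alphabet as vertices, can be built in a few lines. This sketch constructs the graph only; the transition rates of the Markov chain are not given in the abstract and are not modeled here. The function name `de_bruijn_graph` and the encoding of letters as integers 0..n-1 are assumptions.

```python
from itertools import product


def de_bruijn_graph(n, L):
    """Map each length-L word over {0..n-1} to its n successors.

    A successor is obtained by dropping the first letter and appending one.
    """
    alphabet = range(n)
    return {
        word: [word[1:] + (letter,) for letter in alphabet]
        for word in product(alphabet, repeat=L)
    }


# Binary words of length 3: 8 vertices, each with 2 outgoing edges.
graph = de_bruijn_graph(2, 3)
in_degree = {word: 0 for word in graph}
for successors in graph.values():
    for word in successors:
        in_degree[word] += 1
```

Every vertex has exactly n outgoing and n incoming edges, which the `in_degree` tally confirms for this small case.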

arXiv.org e-Print Archive

Crossref

eScholarship - University of California

Teaching Bioinformatics at the Secondary School Level

Author: Fran Lewitter
G Pavesi
I Neil Sarkar
JL Ditty
JR Jungck
P Pevzner
Philip E. Bourne
R Altman
RJ Zauhar
S Ranganathan
SH Wefer
Publication venue: Public Library of Science
Publication date: 01/10/2011

Crossref

Directory of Open Access Journals

PubMed Central