552 research outputs found
Using cascading Bloom filters to improve the memory usage for de Bruijn graphs
De Bruijn graphs are widely used in bioinformatics for processing
next-generation sequencing (NGS) data. Due to the very large size of NGS datasets, it
is essential to represent de Bruijn graphs compactly, and several approaches to
this problem have been proposed recently. In this work, we show how to reduce
the memory required by the algorithm of [3] that represents de Bruijn graphs
using Bloom filters. Our method requires 30% to 40% less memory than
the method of [3], with insignificant impact on construction time. At the same
time, our experiments showed a better query time compared to [3]. This is, to
our knowledge, the best practical representation for de Bruijn graphs. Comment: 12 pages, submitte
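To make the idea of storing de Bruijn graph nodes in a Bloom filter concrete, here is a minimal sketch. It is not the cascading structure of the paper or the algorithm of [3]; the class `BloomFilter`, the helpers `kmers` and `successors`, the salted SHA-256 hashing, and the filter sizes are all assumptions chosen for illustration. It only shows that k-mer membership queries never miss an inserted k-mer, while false positives remain possible.

```python
import hashlib


class BloomFilter:
    """Minimal Bloom filter over strings (illustrative only)."""

    def __init__(self, num_bits, num_hashes):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, item):
        # Derive num_hashes bit positions from salted SHA-256 digests.
        for seed in range(self.num_hashes):
            digest = hashlib.sha256(f"{seed}:{item}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, item):
        # True for every inserted item; may also be True for others.
        return all(
            self.bits[pos // 8] & (1 << (pos % 8))
            for pos in self._positions(item)
        )


def kmers(read, k):
    """All length-k substrings of a read, in order."""
    return [read[i:i + k] for i in range(len(read) - k + 1)]


def successors(bloom, kmer):
    """Candidate out-neighbours: drop the first base, append each base."""
    return [kmer[1:] + base for base in "ACGT" if kmer[1:] + base in bloom]


read = "ACGTACGGTCA"
k = 4
bloom = BloomFilter(num_bits=1024, num_hashes=3)
for km in kmers(read, k):
    bloom.add(km)

print(successors(bloom, "ACGT"))
```

Because the filter can report k-mers that were never inserted, a practical representation needs some way to deal with false positives during traversal; this sketch makes no attempt to do so.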
Comparison of Spectra in Unsequenced Species
We introduce a new algorithm for the mass spectrometric identification of proteins. Experimental spectra obtained by tandem MS/MS are directly compared to theoretical spectra generated from proteins of evolutionarily closely related organisms. This work is motivated by the need for a method that allows the identification of proteins of unsequenced species against a database containing proteins of related organisms. The idea is that matching spectra of unknown peptides to very similar MS/MS spectra generated from this database of annotated proteins can help annotate unknown proteins. This process is similar to ortholog annotation in protein sequence databases. The difficulty with such an approach is that two similar peptides, even with just one modification (i.e. insertion, deletion or substitution of one or several amino acid(s)) between them, usually generate very dissimilar spectra. In this paper, we present a new dynamic programming based algorithm: PacketSpectralAlignment. Our algorithm is tolerant to modifications and fully exploits two important properties that are usually not considered: the notion of inner symmetry, a relation linking pairs of spectrum peaks, and the notion of packet inside each spectrum to keep related peaks together. Our algorithm, PacketSpectralAlignment, is then compared to SpectralAlignment [1] on a dataset of simulated spectra. Our tests show that PacketSpectralAlignment behaves better, in terms of results and execution tim
Safe and complete contig assembly via omnitigs
Contig assembly is the first stage that most assemblers solve when
reconstructing a genome from a set of reads. Its output consists of contigs --
a set of strings that are promised to appear in any genome that could have
generated the reads. From the introduction of contigs 20 years ago, assemblers
have tried to obtain longer and longer contigs, but the following question was
never solved: given a genome graph (e.g. a de Bruijn graph or a string graph),
what are all the strings that can be safely reported from it as contigs? In
this paper we finally answer this question, and also give a polynomial time
algorithm to find them. Our experiments show that these strings, which we call
omnitigs, are 66% to 82% longer on average than the popular unitigs, and 29% of
dbSNP locations have more neighbors in omnitigs than in unitigs. Comment: Full version of the paper in the proceedings of RECOMB 201
Thermodynamics of protein folding: a random matrix formulation
The process of protein folding from an unfolded state to a biologically
active, folded conformation is governed by many parameters, e.g. the sequence of
amino acids, intermolecular interactions, the solvent, temperature and chaperone
molecules. Our study, based on random matrix modeling of the interactions,
shows, however, that the evolution of the statistical measures, e.g. Gibbs free
energy, heat capacity, entropy is single parametric. The information can
explain the selection of specific folding pathways from an infinite number of
possible ways as well as other folding characteristics observed in computer
simulation studies. Comment: 21 Pages, no figure
Group testing with Random Pools: Phase Transitions and Optimal Strategy
The problem of Group Testing is to identify defective items out of a set of
objects by means of pool queries of the form "Does the pool contain at least one
defective item?". The aim is of course to perform detection with the fewest possible
queries, a problem which has relevant practical applications in different
fields including molecular biology and computer science. Here we study GT in
the probabilistic setting, focusing on the regime of small defective probability
and large number of objects. We construct and
analyze one-stage algorithms for which we establish the occurrence of a
non-detection/detection phase transition resulting in a sharp threshold for the number of tests. By optimizing the pool design we construct
algorithms whose detection threshold follows the optimal scaling. Then we consider two-stage algorithms and analyze their
performance for different choices of the first stage pools. In particular, via
a proper random choice of the pools, we construct algorithms which attain the
optimal value (previously determined in Ref. [16]) for the mean number of tests
required for complete detection. We finally discuss the optimal pool design in
the case of finite
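The one-stage setting described in this abstract can be illustrated with a small simulation. This is a sketch under stated assumptions, not the pool design or decoder analyzed in the paper: the function names (`random_pools`, `run_tests`, `decode`), the fixed pool size, and the parameter values are all hypothetical. The decoder uses only the elementary fact that every item in a negative pool is non-defective.

```python
import random


def random_pools(num_items, num_pools, pool_size, rng):
    """Draw each pool as a uniformly random subset of fixed size (assumed design)."""
    return [set(rng.sample(range(num_items), pool_size)) for _ in range(num_pools)]


def run_tests(pools, defectives):
    """A pool tests positive iff it contains at least one defective item."""
    return [bool(pool & defectives) for pool in pools]


def decode(num_items, pools, outcomes):
    """Return the items not cleared by any negative pool."""
    cleared = set()
    for pool, positive in zip(pools, outcomes):
        if not positive:
            cleared |= pool
    return set(range(num_items)) - cleared


rng = random.Random(0)
num_items = 200
defectives = set(rng.sample(range(num_items), 4))
pools = random_pools(num_items, num_pools=60, pool_size=20, rng=rng)
outcomes = run_tests(pools, defectives)
candidates = decode(num_items, pools, outcomes)

# Every true defective survives; extra survivors are unresolved items.
print(len(candidates), defectives <= candidates)
```

With too few pools many non-defective items remain uncleared, and with enough pools the candidate set shrinks toward the true defectives; the sketch makes no claim about where that transition occurs.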
Parking functions, labeled trees and DCJ sorting scenarios
In genome rearrangement theory, one of the elusive questions raised in recent
years is the enumeration of rearrangement scenarios between two genomes. This
problem is related to the uniform generation of rearrangement scenarios, and
the derivation of tests of statistical significance of the properties of these
scenarios. Here we give an exact formula for the number of double-cut-and-join
(DCJ) rearrangement scenarios of co-tailed genomes. We also construct effective
bijections between the set of scenarios that sort a cycle and well-studied
combinatorial objects such as parking functions and labeled trees. Comment: 12 pages, 3 figure
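As background for the combinatorial objects named in this abstract, the following sketch enumerates parking functions by brute force and checks the classical count (n+1)^(n-1), which by Cayley's formula is also the number of labeled trees on n+1 vertices. It does not implement the paper's DCJ formula or its bijections; the function names are chosen here for illustration.

```python
from itertools import product


def is_parking_function(seq):
    """A sequence over 1..n is a parking function iff its sorted form b satisfies b_i <= i."""
    return all(value <= index for index, value in enumerate(sorted(seq), start=1))


def count_parking_functions(n):
    """Count parking functions of length n by exhaustive enumeration."""
    return sum(
        1
        for seq in product(range(1, n + 1), repeat=n)
        if is_parking_function(seq)
    )


for n in range(1, 6):
    # Brute-force count next to the closed form (n + 1) ** (n - 1).
    print(n, count_parking_functions(n), (n + 1) ** (n - 1))
```

The enumeration grows as n^n, so it is only practical for very small n; it serves as a sanity check on the closed form rather than as a counting method.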
Comparison of Percutaneous Nucleoplasty and Open Discectomy in Patients with Lumbar Disc Protrusions
Introduction: Coblation nucleoplasty is a minimally invasive method that lies midway between conservative and open surgical treatment of patients with degenerative disc disease and lumbar disc protrusion. The authors compare the outcomes of patients treated with the two methods. Material and results: Two groups of 80 patients each were treated with open discectomy and nucleoplasty. Patients with radicular symptoms caused by disc protrusions with an antero-posterior diameter < 6 mm, resistant to conservative treatment, underwent nucleoplasty. When the antero-posterior diameter of the disc herniation was > 6 mm, classical discectomy was applied. The classical surgeries (discectomies) were performed by the senior author (D.A.), while all three authors participated equally in the nucleoplasty procedures. In the open discectomy group, improvement of radicular pain was immediate, but at 1 year after the procedure only one third of the patients had returned to work. In the nucleoplasty group, improvement of pain was slower but progressive. At 1 year postoperatively, the VAS scores of patients treated with the two methods were very close. All patients returned to work 3 days after nucleoplasty. In this group there were no intraoperative or postoperative complications. One patient subsequently underwent open discectomy. Conclusion: Coblation nucleoplasty is a safe and efficient method to treat patients with lumbar disc protrusion. Keywords: nucleoplasty, coblation, discectomy
Limited Lifespan of Fragile Regions in Mammalian Evolution
An important question in genome evolution is whether there exist fragile
regions (rearrangement hotspots) where chromosomal rearrangements are happening
over and over again. Although nearly all recent studies supported the existence
of fragile regions in mammalian genomes, the most comprehensive phylogenomic
study of mammals (Ma et al. (2006) Genome Research 16, 1557-1565) raised some
doubts about their existence. We demonstrate that fragile regions are subject
to a "birth and death" process, implying that fragility has limited
evolutionary lifespan. This finding implies that fragile regions migrate to
different locations in different mammals, explaining why there exist only a few
chromosomal breakpoints shared between different lineages. The birth and death
of fragile regions phenomenon reinforces the hypothesis that rearrangements are
promoted by matching segmental duplications and suggests putative locations of
the currently active fragile regions in the human genome
Stationary Distribution and Eigenvalues for a de Bruijn Process
We define a de Bruijn process with parameters n and L as a certain
continuous-time Markov chain on the de Bruijn graph with words of length L over
an n-letter alphabet as vertices. We determine explicitly its steady state
distribution and its characteristic polynomial, which turns out to decompose
into linear factors. In addition, we examine the stationary state of two
specializations in detail. In the first one, the de Bruijn-Bernoulli process,
this is a product measure. In the second one, the Skin-deep de Bruijn process,
the distribution has constant density but nontrivial correlation functions. The
two-point correlation function is determined using generating function
techniques. Comment: Dedicated to Herb Wilf on the occasion of his 80th birthda
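The state space mentioned in this abstract, the de Bruijn graph with words of length L over an n-letter alphabet as vertices, can be built in a few lines. The sketch below constructs only the graph, using the usual shift-and-append edge convention as an assumption; the transition rates of the Markov chain are not given in the abstract and are not modeled here. The function name `de_bruijn_graph` is illustrative.

```python
from itertools import product


def de_bruijn_graph(n, L):
    """Map each length-L word over {0, ..., n-1} to its n successors.

    An edge goes from word w to w[1:] + (a,) for every letter a
    (assumed convention: drop the first letter, append a new one).
    """
    alphabet = range(n)
    return {
        word: [word[1:] + (letter,) for letter in alphabet]
        for word in product(alphabet, repeat=L)
    }


graph = de_bruijn_graph(2, 3)

# Count incoming edges to confirm the graph is n-regular in both directions.
in_degree = {word: 0 for word in graph}
for targets in graph.values():
    for target in targets:
        in_degree[target] += 1

print(len(graph), set(in_degree.values()))
```

Each vertex has exactly n outgoing and n incoming edges, so the graph has n^L vertices and n^(L+1) edges.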