
    SEED: efficient clustering of next-generation sequences.

    Motivation: Similarity clustering of next-generation sequences (NGS) is an important computational problem for studying the population sizes of DNA/RNA molecules and for reducing the redundancies in NGS data. Currently, most sequence clustering algorithms are limited by their speed and scalability, and thus cannot handle data with tens of millions of reads.
    Results: Here, we introduce SEED, an efficient algorithm for clustering very large NGS sets. It joins sequences into clusters that can differ by up to three mismatches and three overhanging residues from their virtual center. It is based on a modified spaced seed method, called block spaced seeds. Its clustering component operates on the hash tables by first identifying virtual center sequences and then finding all their neighboring sequences that meet the similarity parameters. SEED can cluster 100 million short read sequences in <4 h with linear time and memory performance. When using SEED as a preprocessing tool on genome/transcriptome assembly data, it reduced the time and memory requirements of the Velvet/Oases assembler for the datasets used in this study by 60-85% and 21-41%, respectively. In addition, the assemblies contained longer contigs than those from non-preprocessed data, as indicated by 12-27% larger N50 values. Compared with other clustering tools, SEED showed the best performance in generating clusters of NGS data similar to true cluster results, with a 2- to 10-fold better time performance. While most of SEED's utilities fall into the preprocessing area of NGS data, our tests also demonstrate its efficiency as a stand-alone tool for discovering clusters of small RNA sequences in NGS data from unsequenced organisms.
    Availability: The SEED software can be downloaded for free from this site: http://manuals.bioinformatics.ucr.edu/home/
    Contact: [email protected]
    Supplementary information: Supplementary data are available at Bioinformatics online
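    The core idea described above (bucket reads by a seed-derived key, then compare each read against a bucket representative under a small mismatch budget) can be illustrated with a toy sketch. The sketch below is not the SEED implementation: the seed pattern, read length, and the use of the first read of a bucket as a stand-in for the virtual center are all simplifying assumptions, and true block spaced seeds, which tolerate mismatches inside the key itself, are omitted.

```python
# Toy sketch of spaced-seed bucketing plus a mismatch check (not the SEED tool).
from collections import defaultdict

SEED_MASK = "1101101101101101"   # hypothetical spaced-seed pattern, 16 positions
MAX_MISMATCHES = 3               # mismatch budget from the abstract

def seed_key(read: str) -> str:
    """Project the read onto the '1' positions of the spaced seed."""
    return "".join(base for base, m in zip(read, SEED_MASK) if m == "1")

def hamming(a: str, b: str) -> int:
    return sum(x != y for x, y in zip(a, b))

def cluster(reads):
    buckets = defaultdict(list)  # seed key -> candidate cluster members
    for r in reads:
        buckets[seed_key(r)].append(r)
    clusters = []
    for members in buckets.values():
        center, rest = members[0], members[1:]   # first read stands in for the virtual center
        group = [center] + [r for r in rest if hamming(center, r) <= MAX_MISMATCHES]
        clusters.append((center, group))
    return clusters

if __name__ == "__main__":
    reads = ["ACGTACGTACGTACGT", "ACGTACGTACGAACGT", "TTTTACGTACGTACGT"]
    for center, group in cluster(reads):
        print(center, len(group))
```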

    Random Sampling in Computational Algebra: Helly Numbers and Violator Spaces

    This paper transfers a randomized algorithm, originally used in geometric optimization, to computational problems in commutative algebra. We show that Clarkson's sampling algorithm can be applied to two problems in computational algebra: solving large-scale polynomial systems and finding small generating sets of graded ideals. The cornerstone of our work is showing that the theory of violator spaces of Gärtner et al. applies to polynomial ideal problems. To show this, we use a Helly-type result for algebraic varieties. The resulting algorithms have expected runtime linear in the number of input polynomials, making the ideas interesting for handling systems with very large numbers of polynomials but whose rank in the vector space of polynomials is small (e.g., when the number of variables and the degree are constant).
    Comment: Minor edits, added two references; results unchanged
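    For readers unfamiliar with Clarkson's sampling scheme, the sketch below shows the generic iterative-reweighting loop on a deliberately tiny violator-space instance (the smallest interval enclosing points on a line, combinatorial dimension 2). The sample size and the toy problem are illustrative assumptions; none of the polynomial-ideal machinery of the paper is reproduced here.

```python
# Sketch of Clarkson-style iterative reweighting on a toy LP-type problem.
import random

def basis(sample):
    """Brute-force solver on a small sample: the interval is fixed by min/max."""
    return (min(sample), max(sample))

def violators(points, interval):
    lo, hi = interval
    return [p for p in points if p < lo or p > hi]

def clarkson(points, dim=2, seed=0):
    rng = random.Random(seed)
    weights = {p: 1 for p in points}
    r = 6 * dim * dim                          # sample size in the spirit of the analysis
    while True:
        population = list(weights)
        w = [weights[p] for p in population]
        sample = rng.choices(population, weights=w, k=min(r, len(population)))
        sol = basis(sample)                    # solve the small sampled subproblem
        V = violators(points, sol)
        if not V:
            return sol                         # no constraint is violated: done
        if sum(weights[p] for p in V) <= sum(weights.values()) / (3 * dim):
            for p in V:                        # double the weight of violated constraints
                weights[p] *= 2

if __name__ == "__main__":
    pts = [random.uniform(-10, 10) for _ in range(1000)]
    print(clarkson(pts))
```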

    Cayley graphs of order kp are hamiltonian for k < 48

    We provide a computer-assisted proof that if G is any finite group of order kp, where k < 48 and p is prime, then every connected Cayley graph on G is hamiltonian (unless kp = 2). As part of the proof, it is verified that every connected Cayley graph of order less than 48 is either hamiltonian connected or hamiltonian laceable (or has valence less than three).
    Comment: 16 pages. GAP source code is available in the ancillary file
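    As a small illustration of the objects being verified (not of the paper's GAP computations), the sketch below builds the Cayley graph of S3 with two transpositions as generators and searches for a Hamiltonian cycle by plain backtracking, which is only feasible for very small groups.

```python
# Toy Cayley-graph construction and Hamiltonian-cycle search by backtracking.
from itertools import permutations

def compose(p, q):
    """Composition of permutations given as tuples: (p*q)(i) = p[q[i]]."""
    return tuple(p[i] for i in q)

def cayley_graph(elements, generators):
    adj = {g: set() for g in elements}
    for g in elements:
        for s in generators:
            h = compose(g, s)
            adj[g].add(h)
            adj[h].add(g)                      # undirected Cayley graph
    return adj

def hamiltonian_cycle(adj, start):
    n, path, seen = len(adj), [start], {start}
    def extend():
        if len(path) == n:
            return start in adj[path[-1]]      # close the cycle back to the start
        for v in adj[path[-1]]:
            if v not in seen:
                path.append(v); seen.add(v)
                if extend():
                    return True
                path.pop(); seen.remove(v)
        return False
    return path + [start] if extend() else None

if __name__ == "__main__":
    S3 = list(permutations(range(3)))          # symmetric group on 3 points
    gens = [(1, 0, 2), (0, 2, 1)]              # two transpositions generating S3
    print(hamiltonian_cycle(cayley_graph(S3, gens), S3[0]))
```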

    Finding k Simple Shortest Paths and Cycles

    The problem of finding multiple simple shortest paths in a weighted directed graph G = (V, E) has many applications, and is considerably more difficult than the corresponding problem when cycles are allowed in the paths. Even for a single source-sink pair, it is known that two simple shortest paths cannot be found in time polynomially smaller than n^3 (where n = |V|) unless the All-Pairs Shortest Paths problem can be solved in a similar time bound. The latter is a well-known open problem in algorithm design. We consider the all-pairs version of the problem, and we give a new algorithm to find k simple shortest paths for all pairs of vertices. For k = 2, our algorithm runs in O(mn + n^2 log n) time (where m = |E|), which is almost the same bound as for the single-pair case, and for k = 3 we improve earlier bounds. Our approach is based on forming suitable path extensions to find simple shortest paths; this method is different from the 'detour finding' technique used in most of the prior work on simple shortest paths, replacement paths, and distance sensitivity oracles. Enumerating simple cycles is a well-studied classical problem. We present new algorithms for generating simple cycles and simple paths in G in non-decreasing order of their weights; the algorithm for generating simple paths is much faster, and uses another variant of path extensions. We also give hardness results for sparse graphs, relative to the complexity of computing a minimum weight cycle in a graph, for several variants of problems related to finding k simple paths and cycles.
    Comment: The current version includes new results for undirected graphs. In Section 4, the notion of an (m,n) reduction is generalized to an f(m,n) reduction
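    A much simpler relative of the path-extension idea is best-first enumeration of simple s-t paths by total weight: keep a priority queue of simple partial paths and always extend the cheapest one by edges that do not revisit a vertex. The sketch below implements that simplification (exponential in the worst case), not the paper's O(mn + n^2 log n) all-pairs algorithm; the example graph is made up.

```python
# Best-first enumeration of simple s-t paths in non-decreasing weight order.
import heapq

def k_simple_shortest_paths(adj, s, t, k):
    """adj: {u: [(v, w), ...]} with nonnegative edge weights w."""
    heap = [(0.0, (s,))]                       # (path weight, path as a tuple)
    out = []
    while heap and len(out) < k:
        w, path = heapq.heappop(heap)
        u = path[-1]
        if u == t:
            out.append((w, list(path)))        # completed paths pop in weight order
            continue
        for v, wt in adj.get(u, []):
            if v not in path:                  # keep the path simple
                heapq.heappush(heap, (w + wt, path + (v,)))
    return out

if __name__ == "__main__":
    G = {"s": [("a", 1), ("b", 2)], "a": [("t", 1), ("b", 1)],
         "b": [("t", 1)], "t": []}
    for w, p in k_simple_shortest_paths(G, "s", "t", 3):
        print(w, "->".join(p))
```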

    Admissibility in Finitely Generated Quasivarieties

    Checking the admissibility of quasiequations in a finitely generated (i.e., generated by a finite set of finite algebras) quasivariety Q amounts to checking validity in a suitable finite free algebra of the quasivariety, and is therefore decidable. However, since free algebras may be large even for small sets of small algebras and very few generators, this naive method for checking admissibility in Q is not computationally feasible. In this paper, algorithms are introduced that generate a minimal (with respect to a multiset well-ordering on their cardinalities) finite set of algebras such that the validity of a quasiequation in this set corresponds to admissibility of the quasiequation in Q. In particular, structural completeness (validity and admissibility coincide) and almost structural completeness (validity and admissibility coincide for quasiequations with unifiable premises) can be checked. The algorithms are illustrated with a selection of well-known finitely generated quasivarieties, and are also adapted to handle admissibility of rules in finite-valued logics
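    The basic subroutine the paper builds on, checking whether a quasiequation holds in a given finite algebra by exhausting all assignments, is easy to state in code. The algebra (a two-element meet-semilattice) and the quasiequation in the sketch below are illustrative choices, not examples from the paper.

```python
# Brute-force validity check of a quasiequation in a small finite algebra.
from itertools import product

# Two-element algebra with one binary operation (Boolean meet).
UNIVERSE = [0, 1]
MEET = {(a, b): a & b for a in UNIVERSE for b in UNIVERSE}

def evaluate(term, env):
    """Terms are variables (strings) or ('meet', t1, t2)."""
    if isinstance(term, str):
        return env[term]
    _, t1, t2 = term
    return MEET[(evaluate(t1, env), evaluate(t2, env))]

def holds(premises, conclusion, variables):
    """Quasiequation (s1=t1 & ... & sn=tn) => s=t, checked on every assignment."""
    for values in product(UNIVERSE, repeat=len(variables)):
        env = dict(zip(variables, values))
        if all(evaluate(s, env) == evaluate(t, env) for s, t in premises):
            s, t = conclusion
            if evaluate(s, env) != evaluate(t, env):
                return False
    return True

if __name__ == "__main__":
    # x meet y = x  =>  x meet y = y  fails (take x=0, y=1), so this prints False.
    print(holds([(("meet", "x", "y"), "x")], (("meet", "x", "y"), "y"), ["x", "y"]))
```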

    A method for dense packing discovery

    The problem of packing a system of particles as densely as possible is foundational in the field of discrete geometry and is a powerful model in the material and biological sciences. As packing problems retreat from the reach of solution by analytic constructions, the importance of an efficient numerical method for conducting de novo (from-scratch) searches for dense packings becomes crucial. In this paper, we use the divide and concur framework to develop a general search method for the solution of periodic constraint problems, and we apply it to the discovery of dense periodic packings. An important feature of the method is the integration of the unit cell parameters with the other packing variables in the definition of the configuration space. The method we present led to improvements in the densest-known tetrahedron packing which are reported in [arXiv:0910.5226]. Here, we use the method to reproduce the densest known lattice sphere packings and the best known lattice kissing arrangements in up to 14 and 11 dimensions respectively (the first such numerical evidence for their optimality in some of these dimensions). For non-spherical particles, we report a new dense packing of regular four-dimensional simplices with density φ = 128/219 ≈ 0.5845 and with a similar structure to the densest known tetrahedron packing.
    Comment: 15 pages, 5 figures
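    In its simplest convex form, divide and concur maintains one replica of the variables per constraint, projects each replica onto its own constraint (divide) and then averages all replicas (concur). The sketch below applies that bare scheme to a toy feasibility problem, finding a point common to three discs; the difference map, periodic cells, and packing constraints used in the paper are omitted, and the disc data are made up.

```python
# Divide-and-concur in its plainest form: averaged projections onto discs.
import numpy as np

discs = [((0.0, 0.0), 1.5), ((2.0, 0.0), 1.5), ((1.0, 1.5), 1.2)]  # (center, radius)

def project_onto_disc(x, center, radius):
    c = np.asarray(center)
    d = x - c
    norm = np.linalg.norm(d)
    return x if norm <= radius else c + d * (radius / norm)

def divide(replicas):
    # Each replica is projected onto its own constraint set independently.
    return np.array([project_onto_disc(r, c, rad)
                     for r, (c, rad) in zip(replicas, discs)])

def concur(replicas):
    # All replicas are replaced by their common average.
    mean = replicas.mean(axis=0)
    return np.tile(mean, (len(replicas), 1))

replicas = np.random.default_rng(0).normal(size=(len(discs), 2)) * 5
for _ in range(200):
    replicas = concur(divide(replicas))

print("candidate point:", replicas[0])
print("constraint violations:",
      [max(0.0, np.linalg.norm(replicas[0] - np.array(c)) - r) for c, r in discs])
```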

    A Novel Approach for Ellipsoidal Outer-Approximation of the Intersection Region of Ellipses in the Plane

    In this paper, a novel technique for tight outer-approximation of the intersection region of a finite number of ellipses in 2-dimensional (2D) space is proposed. First, the vertices of a tight polygon that contains the convex intersection of the ellipses are found in an efficient manner. To do so, the intersection points of the ellipses that fall on the boundary of the intersection region are determined, and a set of points is generated on the elliptic arcs connecting every two neighbouring intersection points. By finding the tangent lines to the ellipses at the extended set of points, a set of half-planes is obtained, whose intersection forms a polygon. To find the polygon more efficiently, the points are given an order and the intersection of the half-planes corresponding to every two neighbouring points is calculated. If the polygon is convex and bounded, these calculated points together with the initially obtained intersection points will form its vertices. If the polygon is non-convex or unbounded, we can detect this situation and then generate additional discrete points only on the elliptical arc segment causing the issue, and restart the algorithm to obtain a bounded and convex polygon. Finally, the smallest area ellipse that contains the vertices of the polygon is obtained by solving a convex optimization problem. Through numerical experiments, it is illustrated that the proposed technique returns a tighter outer-approximation of the intersection of multiple ellipses, compared to conventional techniques, with only slightly higher computational cost
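    The final step described above, fitting the smallest-area (Löwner-John) ellipse around the polygon vertices, is a standard convex program. The sketch below assumes cvxpy is available and uses a made-up vertex list in place of the tangent-line polygon construction from the abstract.

```python
# Minimum-area enclosing ellipse of a point set as a convex program (cvxpy).
import cvxpy as cp
import numpy as np

vertices = np.array([[1.0, 0.2], [0.8, 0.9], [-0.3, 1.1],
                     [-1.0, 0.1], [-0.4, -0.8], [0.7, -0.6]])

# Parametrize the ellipse as {x : ||A x + b|| <= 1}; its area is pi / det(A),
# so maximizing log det(A) minimizes the enclosed area.
A = cp.Variable((2, 2), PSD=True)
b = cp.Variable(2)
constraints = [cp.norm(A @ v + b) <= 1 for v in vertices]
problem = cp.Problem(cp.Maximize(cp.log_det(A)), constraints)
problem.solve()

center = -np.linalg.solve(A.value, b.value)   # A x + b = 0 at the center
print("ellipse matrix A:\n", A.value)
print("center:", center)
```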