653 research outputs found

    Pattern Matching and Consensus Problems on Weighted Sequences and Profiles

    We study pattern matching problems on two major representations of uncertain sequences used in molecular biology: weighted sequences (also known as position weight matrices, PWM) and profiles (i.e., scoring matrices). In the simple version, in which only the pattern or only the text is uncertain, we obtain efficient algorithms with theoretically provable running times using a variation of the lookahead scoring technique. We also consider a general variant of the pattern matching problems in which both the pattern and the text are uncertain. Central to our solution is a special case where the sequences have equal length, called the consensus problem. We propose algorithms for the consensus problem parameterized by the number of strings that match one of the sequences. As our basic approach, a careful adaptation of the classic meet-in-the-middle algorithm for the knapsack problem is used. On the lower bound side, we prove that our dependence on the parameter is optimal up to lower-order terms conditioned on the optimality of the original algorithm for the knapsack problem. Comment: 22 pages
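
    The abstract above builds on the classic meet-in-the-middle algorithm for the knapsack problem. As a rough illustration of that textbook technique only (not the paper's consensus algorithm), the sketch below solves 0/1 knapsack by enumerating subsets of each half of the items. The function names and the (weight, value) item encoding are assumptions made for this example, and weights and values are assumed to be non-negative.

```python
from bisect import bisect_right


def subset_totals(items):
    """Return (total_weight, total_value) for every subset of items."""
    totals = [(0, 0)]
    for w, v in items:
        # Each existing subset is extended by the current item.
        totals += [(tw + w, tv + v) for tw, tv in totals]
    return totals


def knapsack_mitm(items, capacity):
    """0/1 knapsack via meet-in-the-middle (assumes non-negative weights and values)."""
    half = len(items) // 2
    left = subset_totals(items[:half])
    right = sorted(subset_totals(items[half:]))

    # For the right half, record the best value among subsets up to each weight.
    weights = [w for w, _ in right]
    best_prefix = []
    best = 0
    for _, v in right:
        best = max(best, v)
        best_prefix.append(best)

    answer = 0
    for w, v in left:
        if w > capacity:
            continue
        # Heaviest right-half subset that still fits next to this left subset.
        idx = bisect_right(weights, capacity - w) - 1
        answer = max(answer, v + best_prefix[idx])
    return answer


items = [(12, 4), (2, 2), (1, 1), (1, 2), (4, 10)]
print(knapsack_mitm(items, 15))
```

    Splitting the items in half means roughly 2^(n/2) subsets are enumerated per side instead of 2^n overall, which is the trade-off the technique is known for.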

    Linear-time Computation of Minimal Absent Words Using Suffix Array

    An absent word of a word y of length n is a word that does not occur in y. It is a minimal absent word if all its proper factors occur in y. Minimal absent words have been computed in genomes of organisms from all domains of life; their computation provides a fast alternative for measuring approximation in sequence comparison. There exists an O(n)-time and O(n)-space algorithm for computing all minimal absent words on a fixed-sized alphabet based on the construction of suffix automata (Crochemore et al., 1998). No implementation of this algorithm is publicly available. There also exists an O(n^2)-time and O(n)-space algorithm for the same problem based on the construction of suffix arrays (Pinho et al., 2009). An implementation of this algorithm was also provided by the authors and is currently the fastest available. In this article, we bridge this unpleasant gap by presenting an O(n)-time and O(n)-space algorithm for computing all minimal absent words based on the construction of suffix arrays. Experimental results using real and synthetic data show that the respective implementation outperforms the one by Pinho et al.
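
    To make the definition above concrete, here is a brute-force sketch that lists the minimal absent words of a short word directly from the definition. It is not the suffix-array algorithm the abstract describes and is far from linear time; the function name and the explicit alphabet argument are assumptions of this example.

```python
def minimal_absent_words(y, alphabet):
    """Brute-force minimal absent words of y over the given alphabet."""
    n = len(y)
    # All factors (substrings) of y, including the empty word.
    factors = {y[i:j] for i in range(n + 1) for j in range(i, n + 1)}

    maws = set()
    # A single letter that never occurs is minimal absent (its only proper
    # factor is the empty word).
    for a in alphabet:
        if a not in factors:
            maws.add(a)

    # A longer word w is minimal absent when w is absent while both its
    # longest proper prefix and its longest proper suffix occur in y.
    for prefix in factors:
        if not prefix:
            continue
        for b in alphabet:
            w = prefix + b
            if w not in factors and w[1:] in factors:
                maws.add(w)
    return sorted(maws)


print(minimal_absent_words("abaab", "ab"))
```

    Checking only the longest proper prefix and suffix is enough, because every other proper factor of w is a factor of one of those two.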

    Direct laser printing of thin-film polyaniline devices

    We report the fabrication of electrically functional polyaniline thin-film microdevices. Polyaniline films were printed in the solid phase by Laser Induced Forward Transfer directly between Au electrodes on a Si/SiO2 substrate. To apply solid-phase deposition, aniline was in situ polymerized on quartz substrates. Laser deposition preserves the morphology of the films and delivers sharp features with controllable dimensions. The electrical characteristics of printed polyaniline present ohmic behavior, allowing for electroactive applications. Results on gas sensing of ammonia are presented. Comment: In Press

    Reverse-Safe Data Structures for Text Indexing

    We introduce the notion of reverse-safe data structures. These are data structures that prevent the reconstruction of the data they encode (i.e., they cannot be easily reversed). A data structure D is called z-reverse-safe when there exist at least z datasets with the same set of answers as the ones stored by D. The main challenge is to ensure that D stores as many answers to useful queries as possible, is constructed efficiently, and has size close to the size of the original dataset it encodes. Given a text of length n and an integer z, we propose an algorithm which constructs a z-reverse-safe data structure that has size O(n) and answers pattern matching queries of length at most d optimally, where d is maximal for any such z-reverse-safe data structure. The construction algorithm takes O(n^ω log d) time, where ω is the matrix multiplication exponent. We show that, despite the n^ω factor, our engineered implementation takes only a few minutes to finish for million-letter texts. We further show that plugging our method in data analysis applications gives insignificant or no data utility loss. Finally, we show how our technique can be extended to support applications under a realistic adversary model.

    Linear-Time Superbubble Identification Algorithm for Genome Assembly

    DNA sequencing is the process of determining the exact order of the nucleotide bases of an individual's genome in order to catalogue sequence variation and understand its biological implications. Whole-genome sequencing techniques produce masses of data in the form of short sequences known as reads. Assembling these reads into a whole genome constitutes a major algorithmic challenge. Most assembly algorithms utilize de Bruijn graphs constructed from reads for this purpose. A critical step of these algorithms is to detect typical motif structures in the graph caused by sequencing errors and genome repeats, and filter them out; one such complex subgraph class is a so-called superbubble. In this paper, we propose an O(n+m)-time algorithm to detect all superbubbles in a directed acyclic graph with n nodes and m (directed) edges, improving the best-known O(m log m)-time algorithm by Sung et al.

    Towards Distance-Based Phylogenetic Inference in Average-Case Linear-Time

    Computing genetic evolution distances among a set of taxa dominates the running time of many phylogenetic inference methods. Most genetic evolution distance definitions rely, even if indirectly, on computing the pairwise Hamming distance among sequences or profiles. We propose here an average-case linear-time algorithm to compute pairwise Hamming distances among a set of taxa under a given Hamming distance threshold. This article includes both a theoretical analysis and extensive experimental results concerning the proposed algorithm. We further show how this algorithm can be successfully integrated into a well-known phylogenetic inference method.
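
    For orientation, the task in the abstract above (pairwise Hamming distances under a threshold) can be stated as a naive baseline. The sketch below compares every pair of equal-length sequences and stops counting once the threshold is exceeded; it is the straightforward quadratic approach, not the average-case linear-time algorithm the article proposes, and the function names are assumptions of this example.

```python
from itertools import combinations


def hamming_within(a, b, threshold):
    """Return the Hamming distance of a and b, or None if it exceeds threshold."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    distance = 0
    for x, y in zip(a, b):
        if x != y:
            distance += 1
            if distance > threshold:
                return None  # early exit: the pair is too far apart
    return distance


def close_pairs(seqs, threshold):
    """Map each index pair (i, j), i < j, within the threshold to its distance."""
    result = {}
    for (i, s), (j, t) in combinations(enumerate(seqs), 2):
        d = hamming_within(s, t, threshold)
        if d is not None:
            result[(i, j)] = d
    return result


print(close_pairs(["ACGT", "ACGA", "TCGA", "GGGG"], 1))
```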

    Efficient Computation of Sequence Mappability

    Sequence mappability is an important task in genome re-sequencing. In the $(k,m)$-mappability problem, for a given sequence $T$ of length $n$, our goal is to compute a table whose $i$th entry is the number of indices $j \ne i$ such that the length-$m$ substrings of $T$ starting at positions $i$ and $j$ have at most $k$ mismatches. Previous works on this problem focused on heuristic approaches to compute a rough approximation of the result or on the case of $k=1$. We present several efficient algorithms for the general case of the problem. Our main result is an algorithm that works in $\mathcal{O}(n \min\{m^k,\log^{k+1} n\})$ time and $\mathcal{O}(n)$ space for $k=\mathcal{O}(1)$. It requires a careful adaptation of the technique of Cole et al. [STOC 2004] to avoid multiple counting of pairs of substrings. We also show $\mathcal{O}(n^2)$-time algorithms to compute all results for a fixed $m$ and all $k=0,\ldots,m$ or a fixed $k$ and all $m=k,\ldots,n-1$. Finally we show that the $(k,m)$-mappability problem cannot be solved in strongly subquadratic time for $k,m = \Theta(\log n)$ unless the Strong Exponential Time Hypothesis fails. Comment: Accepted to SPIRE 201
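
    The problem statement above can be spelled out as a direct, definition-level computation. The sketch below is a naive baseline that compares every pair of length-m substrings, so it is much slower than the algorithms the abstract announces; the function name and the 0-based indexing are assumptions of this example.

```python
def mappability(T, k, m):
    """Naive (k, m)-mappability table of T, using 0-based start positions."""
    starts = range(len(T) - m + 1)
    table = []
    for i in starts:
        count = 0
        for j in starts:
            if j == i:
                continue
            # Number of mismatching positions between the two length-m substrings.
            mismatches = sum(a != b for a, b in zip(T[i:i + m], T[j:j + m]))
            if mismatches <= k:
                count += 1
        table.append(count)
    return table


print(mappability("aabaab", 1, 3))
```

    For "aabaab" with m = 3, the substrings are aab, aba, baa, aab; only the two copies of aab are within one mismatch of each other, so the table has a 1 at the first and last positions.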

    Mode Coupling relaxation scenario in a confined glass former

    Molecular dynamics simulations of a Lennard-Jones binary mixture confined in a disordered array of soft spheres are presented. The single particle dynamical behavior of the glass former is examined upon supercooling. Predictions of mode coupling theory are satisfied by the confined liquid. Estimates of the crossover temperature are obtained by power law fit to the diffusion coefficients and relaxation times of the late $\alpha$ region. The $b$ exponent of the von Schweidler law is also evaluated. Similarly to the bulk, different values of the exponent $\gamma$ are extracted from the power law fit to the diffusion coefficients and relaxation times. Comment: 5 pages, 4 figures, changes in the text, accepted for publication in Europhysics Letters