    Some Relationships Between Sequences and Their Kmer Profiles

    This paper explores kmer profiles in bioinformatics and their two applications: first as a model for the reads used in genome assembly, and second as a convenient representation of DNA sequences. Kmer profiles are simply unordered collections of the fixed-length substrings (of length k) of DNA sequences; they resemble an idealized form of the input that genome assemblers receive, and they have been used in the literature as a fast way to approximate the otherwise expensive edit distance. The obvious question is the choice of k. Using the theory of metric embedding, de Bruijn assembly, and to some extent algebra, the familiar conclusion for genome assembly is recovered: k should be as large as permitted. The conclusion for edit distance approximation is more subtle. Small k loses nice mathematical properties while retaining good computational ones. Large k has good mathematical properties (with a proper metric distortion) but becomes computationally unwieldy due to the curse of dimensionality. Bachelor of Science
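
As a rough illustration of the definition in the abstract above, the sketch below builds a kmer profile as a multiset of length-k substrings and compares two profiles. The function names and the choice of an L1 distance between count vectors are assumptions made for this example; the abstract does not say which distance the paper uses.

```python
from collections import Counter


def kmer_profile(seq: str, k: int) -> Counter:
    """Return the multiset of all length-k substrings of seq (order is discarded)."""
    if k <= 0:
        raise ValueError("k must be positive")
    # A sequence shorter than k has no kmers, so the profile is empty.
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def profile_distance(a: str, b: str, k: int) -> int:
    """L1 distance between the kmer count vectors of a and b.

    Assumption: L1 is used here only as one plausible, cheap stand-in
    for edit distance; it is not taken from the paper.
    """
    pa, pb = kmer_profile(a, k), kmer_profile(b, k)
    # Counter returns 0 for missing keys, so the union of keys is safe.
    return sum(abs(pa[key] - pb[key]) for key in pa.keys() | pb.keys())
```

For example, `kmer_profile("ACGTAC", 2)` counts `AC` twice and `CG`, `GT`, `TA` once each. The number of possible kmers grows as 4^k, which is one way to see the dimensionality cost of large k that the abstract mentions.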

    Complex dynamics emerging in Rule 30 with majority memory

    In cellular automata with memory, the unchanged maps of the conventional cellular automata are applied to cells endowed with memory of their past states in some specified interval. We implement Rule 30 automata with a majority memory and show that using the memory function we can transform quasi-chaotic dynamics of classical Rule 30 into domains of travelling structures with predictable behaviour. We analyse morphological complexity of the automata and classify dynamics of gliders (particles, self-localizations) in memory-enriched Rule 30. We provide formal ways of encoding and classifying glider dynamics using de Bruijn diagrams, soliton reactions and quasi-chemical representations.
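
The mechanism described in the abstract above, applying the unchanged Rule 30 map to a majority summary of each cell's recent states, can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the periodic boundary, the tie-breaking rule (fall back to the most recent state), the handling of the first few steps when fewer than `tau` states are stored, and all function names are choices made for this example.

```python
from collections import deque


def rule30(left: int, center: int, right: int) -> int:
    """Classical Rule 30 local map: left XOR (center OR right)."""
    return left ^ (center | right)


def majority(history) -> int:
    """Majority of the stored states; ties fall back to the latest state (assumption)."""
    ones = sum(history)
    zeros = len(history) - ones
    if ones == zeros:
        return history[-1]
    return 1 if ones > zeros else 0


def step_with_memory(histories):
    """Advance one step: summarise each cell's memory, then apply Rule 30."""
    n = len(histories)
    summary = [majority(h) for h in histories]
    # Periodic boundary conditions (assumption).
    new = [rule30(summary[(i - 1) % n], summary[i], summary[(i + 1) % n])
           for i in range(n)]
    for h, x in zip(histories, new):
        h.append(x)  # deque(maxlen=tau) drops the oldest state automatically
    return new


def run(initial, tau: int, steps: int):
    """Return the list of configurations, starting with the initial one."""
    histories = [deque([x], maxlen=tau) for x in initial]
    rows = [list(initial)]
    for _ in range(steps):
        rows.append(step_with_memory(histories))
    return rows
```

With `tau=1` each cell remembers only its current state, so the sketch reduces to classical Rule 30; larger `tau` values let the majority filter damp rapid state changes.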

    HYPA: Efficient Detection of Path Anomalies in Time Series Data on Networks

    The unsupervised detection of anomalies in time series data has important applications in user behavioral modeling, fraud detection, and cybersecurity. Anomaly detection has, in fact, been extensively studied in categorical sequences. However, we often have access to time series data that represent paths through networks. Examples include transaction sequences in financial networks, click streams of users in networks of cross-referenced documents, or travel itineraries in transportation networks. To reliably detect anomalies, we must account for the fact that such data contain a large number of independent observations of paths constrained by a graph topology. Moreover, the heterogeneity of real systems rules out frequency-based anomaly detection techniques, which do not account for highly skewed edge and degree statistics. To address this problem, we introduce HYPA, a novel framework for the unsupervised detection of anomalies in large corpora of variable-length temporal paths in a graph. HYPA provides an efficient analytical method to detect paths with anomalous frequencies that result from nodes being traversed in unexpected chronological order. Comment: 11 pages with 8 figures and supplementary material. To appear at SIAM Data Mining (SDM 2020)

    Optimal Watermark Embedding and Detection Strategies Under Limited Detection Resources

    An information-theoretic approach is proposed to watermark embedding and detection under limited detector resources. First, we consider the attack-free scenario under which asymptotically optimal decision regions in the Neyman-Pearson sense are proposed, along with the optimal embedding rule. Later, we explore the case of zero-mean i.i.d. Gaussian covertext distribution with unknown variance under the attack-free scenario. For this case, we propose a lower bound on the exponential decay rate of the false-negative probability and prove that the optimal embedding and detecting strategy is superior to the customary linear, additive embedding strategy in the exponential sense. Finally, these results are extended to the case of memoryless attacks and general worst case attacks. Optimal decision regions and embedding rules are offered, and the worst attack channel is identified. Comment: 36 pages, 5 figures. Revised version. Submitted to IEEE Transactions on Information Theory