
8,091 research outputs found

Wavelet analysis on symbolic sequences and two-fold de Bruijn sequences

Author: Osipov Vladimir Al.
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 09/01/2016

The concept of symbolic sequences plays an important role in the study of complex systems. In this work we are interested in the ultrametric structure of the set of cyclic sequences that arise naturally in the theory of dynamical systems. Aiming to construct analytic and numerical methods for investigating clusters, we introduce an operator language on the space of symbolic sequences and propose an approach based on wavelet analysis for studying the cluster hierarchy. The analytic power of the approach is demonstrated by deriving a formula for counting two-fold de Bruijn sequences, an extension of the notion of de Bruijn sequences. Possible advantages of the developed description are also discussed in the context of applied
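To make the counted object concrete, here is a small Python sketch that checks whether a cyclic sequence is a two-fold de Bruijn sequence. It assumes the common definition: a cyclic word of length $2q^n$ over a $q$-letter alphabet in which every word of length $n$ occurs exactly twice. The helper names are hypothetical. The brute-force counter enumerates linear strings with all rotations counted separately, which may not match the normalization used by the paper's formula. It is only a sanity check for tiny parameters.

```python
from collections import Counter
from itertools import product


def cyclic_windows(seq, n):
    """All len(seq) windows of length n, reading seq cyclically."""
    m = len(seq)
    return ["".join(seq[(i + j) % m] for j in range(n)) for i in range(m)]


def is_two_fold_de_bruijn(seq, alphabet, n):
    """Assumed definition: every length-n word over `alphabet`
    occurs exactly twice among the cyclic windows of `seq`."""
    if len(seq) != 2 * len(alphabet) ** n:
        return False
    counts = Counter(cyclic_windows(seq, n))
    return all(counts["".join(w)] == 2 for w in product(alphabet, repeat=n))


def count_two_fold_brute_force(alphabet, n):
    """Exhaustive count of linear strings that are two-fold de Bruijn
    when read cyclically; exponential, so only for tiny q and n."""
    length = 2 * len(alphabet) ** n
    return sum(
        is_two_fold_de_bruijn("".join(s), alphabet, n)
        for s in product(alphabet, repeat=length)
    )


print(is_two_fold_de_bruijn("00110011", "01", 2))  # True
print(count_two_fold_brute_force("01", 1))  # 6
```

For $q = 2$, $n = 1$, the six strings are exactly the arrangements of two 0s and two 1s.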

arXiv.org e-Print Archive

Lund University Publications

Partitioning de Bruijn Graphs into Fixed-Length Cycles for Robot Identification and Tracking

Author: Grubman Tony
Wood David R.
Şekercioğlu Y. Ahmet
Publication venue: 'Elsevier BV'
Publication date: 11/01/2016

We propose a new camera-based method of robot identification, tracking and orientation estimation. The system utilises coloured lights mounted in a circle around each robot to create unique colour sequences that are observed by a camera. The number of robots that can be uniquely identified is limited by the number of colours available, $q$, the number of lights on each robot, $k$, and the number of consecutive lights the camera can see, $\ell$. For a given set of parameters, we would like to maximise the number of robots that we can use. We model this as a combinatorial problem and show that it is equivalent to finding the maximum number of disjoint $k$-cycles in the de Bruijn graph $\text{dB}(q,\ell)$. We provide several existence results that give the maximum number of cycles in $\text{dB}(q,\ell)$ in various cases. For example, we give an optimal solution when $k=q^{\ell-1}$. Another construction yields many cycles in larger de Bruijn graphs using cycles from smaller de Bruijn graphs: if $\text{dB}(q,\ell)$ can be partitioned into $k$-cycles, then $\text{dB}(q,\ell)$ can be partitioned into $tk$-cycles for any divisor $t$ of $k$. The methods used are based on finite field algebra and the combinatorics of words. Comment: 16 pages, 4 figures. Accepted for publication in Discrete Applied Mathematics.
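The reduction to disjoint cycles can be illustrated with a short Python sketch. It assumes that the vertices of $\text{dB}(q,\ell)$ are the $q^\ell$ words of length $\ell$. Under that assumption, a robot's ring of $k$ lights traces a $k$-cycle exactly when its $k$ cyclic windows are distinct, and robots are vertex-disjoint when no window is shared. Any observed window then identifies both the robot and its orientation. The function names are hypothetical, and this is a checker, not the paper's construction.

```python
def cyclic_windows(colours, ell):
    """The k windows of ell consecutive lights on a ring of k lights."""
    k = len(colours)
    return [tuple(colours[(i + j) % k] for j in range(ell)) for i in range(k)]


def is_disjoint_cycle_family(robots, q, ell):
    """True if each robot's ring traces a k-cycle in dB(q, ell) and no two
    robots share a vertex, i.e. every observed window is unambiguous."""
    seen = set()
    for colours in robots:
        if any(not 0 <= c < q for c in colours):
            return False
        for window in cyclic_windows(colours, ell):
            if window in seen:  # repeated vertex: not a cycle, or not disjoint
                return False
            seen.add(window)
    return True


def identify(robots, ell, observed):
    """Map an observed window to (robot index, starting light index)."""
    lookup = {}
    for r, colours in enumerate(robots):
        for pos, window in enumerate(cyclic_windows(colours, ell)):
            lookup[window] = (r, pos)
    return lookup.get(tuple(observed))


# q = 2 colours, ell = 3 visible lights, k = 4 = q**(ell - 1) lights per robot
robots = [[0, 0, 0, 1], [0, 1, 1, 1]]
print(is_disjoint_cycle_family(robots, q=2, ell=3))  # True
print(identify(robots, 3, [1, 1, 0]))  # (1, 2)
```

In this small example the two rings together use all $2^3 = 8$ windows, so they partition $\text{dB}(2,3)$ into two 4-cycles.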

arXiv.org e-Print Archive

CiteSeerX

Constant-Weight Gray Codes for Local Rank Modulation

Author: Eyal En Gad
Jehoshua Bruck
Michael Langberg
Moshe Schwartz
Publication venue: 'California Institute of Technology Library'
Publication date: 01/01/2010

We consider the local rank-modulation scheme, in which a sliding window going over a sequence of real-valued variables induces a sequence of permutations. Local rank-modulation is a generalization of the rank-modulation scheme, which has recently been suggested as a way of storing information in flash memory. We study constant-weight Gray codes for the local rank-modulation scheme in order to simulate conventional multi-level flash cells while retaining the benefits of rank modulation. We provide necessary conditions for the existence of cyclic and cyclic optimal Gray codes. We then specifically study codes of weight 2 and upper-bound their efficiency, thus proving that there are no such asymptotically-optimal cyclic codes. In contrast, we study codes of weight 3 and efficiently construct codes which are asymptotically-optimal. We conclude with a construction of codes with asymptotically-optimal rate and weight asymptotically half the length, thus having an asymptotically-optimal charge difference between adjacent cells.
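How a sliding window induces a sequence of permutations can be sketched in a few lines of Python. For each window of $t$ consecutive cells, the sketch lists the window-relative cell indices from the highest to the lowest charge. The stride of 1, the cyclic wrap-around, and the charge values are assumptions of this sketch, not parameters taken from the paper. The Gray-code construction itself is not shown.

```python
def induced_permutations(values, t):
    """For each cyclic window of t consecutive cells (stride 1), list the
    window-relative cell indices from highest to lowest value."""
    n = len(values)
    perms = []
    for start in range(n):
        window = [values[(start + j) % n] for j in range(t)]
        order = sorted(range(t), key=window.__getitem__, reverse=True)
        perms.append(tuple(order))
    return perms


charges = [0.3, 1.2, 0.5, 2.0]  # hypothetical cell charge levels
for perm in induced_permutations(charges, t=3):
    print(perm)
```

With these charges, the four windows induce the permutations (1, 2, 0), (2, 0, 1), (1, 0, 2) and (0, 2, 1). Because each cell belongs to several overlapping windows, changing one charge can alter several consecutive permutations.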

CiteSeerX

Caltech Authors

MSPKmerCounter: A Fast and Memory Efficient Approach for K-mer Counting

Author: Li Yang
Xifeng Yan
Publication date: 25/05/2015

A major challenge in next-generation genome sequencing (NGS) is to assemble massive numbers of overlapping short reads that are randomly sampled from DNA fragments. Many leading assembly algorithms depend on a fundamental task: counting the number of occurrences of k-mers (length-k substrings in sequences). The counting results are critical for many components of assembly (e.g. variant detection and read error correction). For large genomes, k-mer counting can easily consume a huge amount of memory, which makes large-scale parallel assembly on commodity servers infeasible. In this paper, we develop MSPKmerCounter, a disk-based approach that efficiently performs k-mer counting for large genomes using a small amount of memory. Our approach is based on a novel technique called Minimum Substring Partitioning (MSP). MSP breaks short reads into multiple disjoint partitions such that each partition can be loaded into memory and processed individually. By leveraging the overlaps among the k-mers derived from the same short read, MSP achieves a very high compression ratio, so the I/O cost can be significantly reduced. For the task of k-mer counting, MSPKmerCounter offers a very fast and memory-efficient solution. Experimental results on large real-life short-read data sets demonstrate that MSPKmerCounter can achieve better overall performance than state-of-the-art k-mer counting approaches. MSPKmerCounter is available at http://www.cs.ucsb.edu/~yangli/MSPKmerCounte
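The partitioning idea can be illustrated with a simplified, in-memory Python sketch. This is not MSPKmerCounter's code. Each read is cut into runs of consecutive k-mers that share the same lexicographically smallest length-p substring, and each run is stored once as a "super k-mer" in the partition chosen by that substring. Identical k-mers have the same minimum substring, so each partition can be counted on its own. The dictionary standing in for disk files, the crc32 bucketing, the parameter defaults, and the omission of reverse complements are all assumptions of this sketch.

```python
import zlib
from collections import Counter, defaultdict


def min_substring(s, p):
    """Lexicographically smallest length-p substring of s."""
    return min(s[i:i + p] for i in range(len(s) - p + 1))


def bucket(key, num_partitions):
    # Deterministic hash (Python's built-in str hash is salted per process).
    return zlib.crc32(key.encode()) % num_partitions


def split_read(read, k, p, num_partitions):
    """Cut a read into super k-mers: maximal runs of consecutive k-mers that
    share the same minimum p-substring, each routed to one partition."""
    parts = defaultdict(list)
    if len(read) < k:
        return parts
    start, current = 0, min_substring(read[:k], p)
    for i in range(1, len(read) - k + 1):
        m = min_substring(read[i:i + k], p)
        if m != current:
            # k-mers start..i-1 span read[start : i - 1 + k]
            parts[bucket(current, num_partitions)].append(read[start:i + k - 1])
            start, current = i, m
    parts[bucket(current, num_partitions)].append(read[start:])
    return parts


def count_kmers(reads, k, p=3, num_partitions=4):
    """Count k-mers partition by partition; requires p <= k. Identical k-mers
    share a minimum substring, hence a partition, so local counts are complete."""
    partitions = defaultdict(list)  # stands in for on-disk partition files
    for read in reads:
        for b, supers in split_read(read, k, p, num_partitions).items():
            partitions[b].extend(supers)
    total = Counter()
    for b in sorted(partitions):  # one partition "in memory" at a time
        local = Counter(
            sk[i:i + k] for sk in partitions[b] for i in range(len(sk) - k + 1)
        )
        total.update(local)
    return total


reads = ["ACGTACGTGA", "TTACGTACGG"]
print(count_kmers(reads, k=4)["ACGT"])  # 3
```

Storing a run of consecutive k-mers as one string of length (run length + k - 1) is what saves space and I/O compared with writing each k-mer separately.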

arXiv.org e-Print Archive

CiteSeerX

Cellular Probabilistic Automata - A Novel Method for Uncertainty Propagation

Author: Kohler Dominic
Müller Johannes
Wever Utz
Publication date: 05/02/2013

We propose a novel density-based numerical method for uncertainty propagation under certain partial differential equation dynamics. The main idea is to translate them into objects that we call cellular probabilistic automata and to evolve the latter. The translation is achieved by state discretization, as in set-oriented numerics, and by the locality concept from cellular automata theory. We develop the method using the example of initial value uncertainties under deterministic dynamics and prove a consistency result. As an application, we discuss arsenate transport and adsorption in drinking water pipes and compare our results to Monte Carlo computations.

arXiv.org e-Print Archive

PuSH

HYPA: Efficient Detection of Path Anomalies in Time Series Data on Networks

Author: Casiraghi Giona
Eliassi-Rad Tina
LaRock Timothy
Nanumyan Vahan
Scholtes Ingo
Schweitzer Frank
Publication venue: 'Society for Industrial & Applied Mathematics (SIAM)'
Publication date: 29/01/2020

The unsupervised detection of anomalies in time series data has important applications in user behavioral modeling, fraud detection, and cybersecurity. Anomaly detection has, in fact, been extensively studied in categorical sequences. However, we often have access to time series data that represent paths through networks. Examples include transaction sequences in financial networks, click streams of users in networks of cross-referenced documents, or travel itineraries in transportation networks. To reliably detect anomalies, we must account for the fact that such data contain a large number of independent observations of paths constrained by a graph topology. Moreover, the heterogeneity of real systems rules out frequency-based anomaly detection techniques, which do not account for highly skewed edge and degree statistics. To address this problem, we introduce HYPA, a novel framework for the unsupervised detection of anomalies in large corpora of variable-length temporal paths in a graph. HYPA provides an efficient analytical method to detect paths with anomalous frequencies that result from nodes being traversed in unexpected chronological order. Comment: 11 pages with 8 figures and supplementary material. To appear at SIAM Data Mining (SDM 2020).

arXiv.org e-Print Archive

Crossref

Rates of DNA Sequence Profiles for Practical Values of Read Lengths

Author: Chang Zuling
Chrisnata Johan
Ezerman Martianus Frederic
Kiah Han Mao
Publication date: 08/07/2016

A recent study by one of the authors has demonstrated the importance of profile vectors in DNA-based data storage. We provide exact values and lower bounds on the number of profile vectors for finite values of alphabet size $q$, read length $\ell$, and word length $n$. Consequently, we demonstrate that for $q\ge 2$ and $n\le q^{\ell/2-1}$, the number of profile vectors is at least $q^{\kappa n}$ with $\kappa$ very close to one. In addition to enumeration results, we provide a set of efficient encoding and decoding algorithms for each of two particular families of profile vectors.
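To show what is being enumerated, the following Python sketch computes a profile vector and counts distinct profile vectors by brute force. It assumes a profile vector records how often each length-$\ell$ word occurs as a (non-cyclic) substring of a length-$n$ word over a $q$-letter alphabet. The function names are hypothetical. Exhaustive enumeration over all $q^n$ words is only feasible for tiny cases and is no substitute for the exact values and bounds described above.

```python
from collections import Counter
from itertools import product


def profile_vector(word, alphabet, ell):
    """Occurrence count of every length-ell word (in the order produced by
    itertools.product over `alphabet`) as a substring of `word`."""
    counts = Counter(word[i:i + ell] for i in range(len(word) - ell + 1))
    return tuple(counts["".join(g)] for g in product(alphabet, repeat=ell))


def count_profile_vectors(alphabet, ell, n):
    """Number of distinct profile vectors among all q**n words (brute force)."""
    return len({profile_vector("".join(w), alphabet, ell)
                for w in product(alphabet, repeat=n)})


print(profile_vector("0101", "01", 2))    # (0, 2, 1, 0)
print(count_profile_vectors("01", 2, 4))  # 12
```

For $q = 2$, $\ell = 2$, $n = 4$, the 16 binary words yield only 12 profile vectors. For example, 0010, 0100 and 1001 all have one occurrence each of 00, 01 and 10.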

arXiv.org e-Print Archive

DR-NTU (Digital Repository of NTU)