Permutation Complexity via Duality between Values and Orderings
We study the permutation complexity of finite-state stationary stochastic
processes based on a duality between values and orderings between values.
First, we establish a duality between the set of all words of a fixed length
and the set of all permutations of the same length. Second, on this basis, we
give an elementary alternative proof of the equality between the permutation
entropy rate and the entropy rate for finite-state stationary stochastic
processes, first proved in [Amigo, J.M., Kennel, M. B., Kocarev, L., 2005.
Physica D 210, 77-95]. Third, we show that further information on the
relationship between the structure of values and the structure of orderings for
finite-state stationary stochastic processes beyond the entropy rate can be
obtained from the established duality. In particular, we prove that the
permutation excess entropy is equal to the excess entropy, which is a measure
of global correlation present in a stationary stochastic process, for
finite-state stationary ergodic Markov processes. Comment: 26 pages
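The mapping from words to permutations that the abstract above relies on can be illustrated with a minimal Python sketch. It assumes the common convention that ties between equal values are broken by position (earlier index first). The function names `ordinal_pattern` and `block_entropy` are hypothetical, and the entropies computed here are empirical block entropies of a finite sample, not the entropy rates treated in the paper.

```python
from collections import Counter
from math import log2


def ordinal_pattern(word):
    """Return the positions of `word` sorted by value.

    Ties are broken by position (an assumed convention), so every word
    over a finite alphabet maps to exactly one permutation.
    """
    return tuple(sorted(range(len(word)), key=lambda i: (word[i], i)))


def block_entropy(sequence, length, transform=tuple):
    """Empirical Shannon entropy (in bits) of sliding windows of `length`.

    With the default transform the windows are counted as words; passing
    `ordinal_pattern` counts their permutations instead.
    """
    windows = [transform(sequence[i:i + length])
               for i in range(len(sequence) - length + 1)]
    counts = Counter(windows)
    total = len(windows)
    return -sum(c / total * log2(c / total) for c in counts.values())


sample = [0, 1, 0, 1, 0, 1]
word_entropy = block_entropy(sample, 2)
perm_entropy = block_entropy(sample, 2, transform=ordinal_pattern)
print(ordinal_pattern([2, 1, 2]), word_entropy, perm_entropy)
```

Because the permutation is a function of the word, the empirical permutation entropy of a sample can never exceed its word entropy at the same window length.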
Permutation Complexity and Coupling Measures in Hidden Markov Models
In [Haruna, T. and Nakajima, K., 2011. Physica D 240, 1370-1377], the authors
introduced the duality between values (words) and orderings (permutations) as a
basis to discuss the relationship between information theoretic measures for
finite-alphabet stationary stochastic processes and their permutation
analogues. It has been used to give a simple proof of the equality between the
entropy rate and the permutation entropy rate for any finite-alphabet
stationary stochastic process and show some results on the excess entropy and
the transfer entropy for finite-alphabet stationary ergodic Markov processes.
In this paper, we extend our previous results to hidden Markov models and show
the equalities between various information theoretic complexity and coupling
measures and their permutation analogues. In particular, we show the following
two results within the realm of hidden Markov models with ergodic internal
processes: the two permutation analogues of the transfer entropy, the symbolic
transfer entropy and the transfer entropy on rank vectors, are both equivalent
to the transfer entropy if they are considered as the rates, and the directed
information theory can be captured by the permutation entropy approach. Comment: 26 pages
Thermodynamic Analysis of Interacting Nucleic Acid Strands
Motivated by the analysis of natural and engineered DNA and RNA systems, we present the first algorithm for calculating the partition function of an unpseudoknotted complex of multiple interacting nucleic acid strands. This dynamic program is based on a rigorous extension of secondary structure models to the multistranded case, addressing representation and distinguishability issues that do not arise for single-stranded structures. We then derive the form of the partition function for a fixed volume containing a dilute solution of nucleic acid complexes. This expression can be evaluated explicitly for small numbers of strands, allowing the calculation of the equilibrium population distribution for each species of complex. Alternatively, for large systems (e.g., a test tube), we show that the unique complex concentrations corresponding to thermodynamic equilibrium can be obtained by solving a convex programming problem. Partition function and concentration information can then be used to calculate equilibrium base-pairing observables. The underlying physics and mathematical formulation of these problems lead to an interesting blend of approaches, including ideas from graph theory, group theory, dynamic programming, combinatorics, convex optimization, and Lagrange duality.
Convex Relaxations for Permutation Problems
Seriation seeks to reconstruct a linear order between variables using
unsorted, pairwise similarity information. It has direct applications in
archeology and shotgun gene sequencing, for example. We write seriation as an
optimization problem by proving the equivalence between the seriation and
combinatorial 2-SUM problems on similarity matrices (2-SUM is a quadratic
minimization problem over permutations). The seriation problem can be solved
exactly by a spectral algorithm in the noiseless case and we derive several
convex relaxations for 2-SUM to improve the robustness of seriation solutions
in noisy settings. These convex relaxations also allow us to impose structural
constraints on the solution, and hence to solve semi-supervised seriation problems. We
derive new approximation bounds for some of these relaxations and present
numerical experiments on archeological data, Markov chains and DNA assembly
from shotgun gene sequencing data. Comment: Final journal version, a few typos and references fixed
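The 2-SUM objective named in the abstract above, a quadratic function of the positions assigned by a permutation, can be made concrete with a small Python sketch. This is an exhaustive search over all permutations, usable only for tiny inputs, and is neither the spectral algorithm nor any of the convex relaxations from the paper. The function names and the toy similarity matrix (similarity decreasing linearly with distance in a hidden order) are assumptions made for illustration.

```python
from itertools import permutations


def two_sum(similarity, position):
    """2-SUM objective: sum over i, j of A[i][j] * (pos[i] - pos[j])**2."""
    n = len(similarity)
    return sum(similarity[i][j] * (position[i] - position[j]) ** 2
               for i in range(n) for j in range(n))


def brute_force_seriation(similarity):
    """Return a position vector minimizing 2-SUM by trying every permutation."""
    n = len(similarity)
    return min(permutations(range(n)),
               key=lambda position: two_sum(similarity, position))


# Hypothetical toy data: item i sits at hidden position hidden[i], and
# similarity decays linearly with the distance between hidden positions.
hidden = [2, 0, 3, 1]
n = len(hidden)
similarity = [[n - abs(hidden[i] - hidden[j]) for j in range(n)]
              for i in range(n)]

best = brute_force_seriation(similarity)
print(best)  # the hidden order or its reversal
```

A linear order can be identified only up to reversal, since reversing the positions leaves every squared difference unchanged.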
The Complexity of Order Type Isomorphism
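The order type of a point set in $\mathbb{R}^d$ maps each $(d+1)$-tuple of points to
its orientation (e.g., clockwise or counterclockwise in $\mathbb{R}^2$). Two point sets
$X$ and $Y$ have the same order type if there exists a mapping $f$ from $X$ to $Y$
for which every $(d+1)$-tuple $(a_1, a_2, \ldots, a_{d+1})$ of $X$ and the
corresponding tuple $(f(a_1), f(a_2), \ldots, f(a_{d+1}))$ in $Y$ have the same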
orientation. In this paper we investigate the complexity of determining whether
two point sets have the same order type. We provide an algorithm for
this task, thereby improving upon the algorithm
of Goodman and Pollack (1983). The algorithm uses only order type queries and
also works for abstract order types (or acyclic oriented matroids). Our
algorithm is optimal, both in the abstract setting and for realizable point
sets if the algorithm only uses order type queries. Comment: Preliminary version of paper to appear at ACM-SIAM Symposium on
Discrete Algorithms (SODA14)
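For points in the plane, the orientation of a triple is the sign of a 2x2 determinant, and the definition above can be checked directly for one given mapping. The Python sketch below does only that: it verifies a fixed index-wise correspondence with one orientation query per triple. It is not the paper's algorithm, which must also find a suitable mapping. The function names are hypothetical.

```python
from itertools import combinations


def orientation(a, b, c):
    """Orientation of the ordered triple (a, b, c) in the plane.

    Returns +1 for counterclockwise, -1 for clockwise, 0 for collinear.
    """
    det = (b[0] - a[0]) * (c[1] - a[1]) - (b[1] - a[1]) * (c[0] - a[0])
    return (det > 0) - (det < 0)


def mapping_preserves_order_type(points_x, points_y):
    """Check whether points_x[i] -> points_y[i] preserves every triple orientation."""
    if len(points_x) != len(points_y):
        return False
    return all(
        orientation(points_x[i], points_x[j], points_x[k])
        == orientation(points_y[i], points_y[j], points_y[k])
        for i, j, k in combinations(range(len(points_x)), 3)
    )


square = [(0, 0), (1, 0), (1, 1), (0, 1)]
shifted_square = [(10, 10), (12, 10), (12, 12), (10, 12)]
triangle_with_inner_point = [(0, 0), (4, 0), (0, 4), (1, 1)]

print(mapping_preserves_order_type(square, shifted_square))
print(mapping_preserves_order_type(triangle_with_inner_point, square))
```

Translating and uniformly scaling a point set leaves every orientation unchanged, so the first check succeeds. The second fails because at least one triple changes orientation under the index-wise mapping.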
A Dynamic Data Structure to Efficiently Find the Points below a Line and Estimate Their Number
A basic question in computational geometry is how to find the relationship between a set of points and a line in the real plane. In this paper, we present multidimensional data structures for N points that allow answering the following queries for any given input line: (1) estimate in O(log N) time the number of points below the line; (2) return in O(log N + k) time the k ≤ N points that are below the line; and (3) return in O(log N) time the point that is closest to the line. We illustrate the utility of this computational question with GIS applications in air defense and traffic control.
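The abstract does not describe the data structure itself, so the following Python sketch only pins down what queries (2) and (3) return, using a linear scan in O(N) time per query. Such a baseline can serve as a reference when testing a faster structure. It assumes a non-vertical line given as y = slope * x + intercept, and the function names are hypothetical.

```python
from math import hypot


def points_below(points, slope, intercept):
    """Return the points strictly below the line y = slope * x + intercept."""
    return [(x, y) for x, y in points if y < slope * x + intercept]


def closest_point(points, slope, intercept):
    """Return the point with the smallest perpendicular distance to the line."""
    norm = hypot(slope, 1.0)
    return min(points,
               key=lambda p: abs(slope * p[0] - p[1] + intercept) / norm)


points = [(0, 0), (1, 3), (2, 1), (3, 4.5)]
below = points_below(points, slope=1.0, intercept=1.0)
nearest = closest_point(points, slope=1.0, intercept=1.0)
print(len(below), below, nearest)
```

Here the exact count for query (1) is simply `len(below)`, whereas the paper's structure estimates that number in O(log N) time.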