Pattern Discovery in Colored Strings
In this paper, we consider the problem of identifying patterns of interest in
colored strings. A colored string is a string where each position is assigned
one of a finite set of colors. Our task is to find substrings of the colored
string that always occur followed by the same color at the same distance. The
problem is motivated by applications in embedded systems verification, in
particular, assertion mining. The goal there is to automatically find
properties of the embedded system from the analysis of its simulation traces.
We show that, in our setting, the number of patterns of interest is
upper-bounded by O(n^2), where n is the length of the string. We
introduce a baseline algorithm, running in O(n^2) time, which
identifies all patterns of interest satisfying certain minimality conditions,
for all colors in the string. For the case where one is interested in patterns
related to one color only, we also provide a second algorithm which runs in
O(n^2 log n) time in the worst case but is faster than the baseline
algorithm in practice. Both solutions use suffix trees, and the second
algorithm also uses an appropriately defined priority queue, which allows us to
reduce the number of computations. We performed an experimental evaluation of
the proposed approaches over both synthetic and real-world datasets, and found
that the second algorithm outperforms the first algorithm on all simulated
data, while on the real-world data, the performance varies between a slight
slowdown (on half of the datasets) and a speedup by a factor of up to 11.
Comment: 22 pages, 5 figures, 2 tables, published in ACM Journal of
Experimental Algorithmics. This is the journal version of the paper with the
same title at SEA 2020 (18th Symposium on Experimental Algorithms, Catania,
Italy, June 16-18, 2020
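The pattern definition in the abstract above can be made concrete with a brute-force sketch. It is not the suffix-tree algorithm of the paper; it only checks the stated condition directly. Several details are assumptions made for illustration: colors are given as a second string aligned with the text, the distance is counted from the last character of an occurrence, an occurrence too close to the end of the string disqualifies the pattern, and the function names are hypothetical.

```python
def occurrences(text, pattern):
    """Return all start indices at which pattern occurs in text (overlaps included)."""
    return [i for i in range(len(text) - len(pattern) + 1)
            if text.startswith(pattern, i)]


def always_followed_by(text, colors, pattern, distance, color):
    """True if every occurrence of pattern has `color` exactly `distance`
    positions after its last character (assumed convention)."""
    starts = occurrences(text, pattern)
    if not starts:
        return False
    for start in starts:
        target = start + len(pattern) - 1 + distance
        # Assumption: an occurrence whose target falls off the string fails.
        if target >= len(text) or colors[target] != color:
            return False
    return True


def brute_force_patterns(text, colors, color):
    """Enumerate all (substring, distance) pairs satisfying the condition.
    Deliberately naive; only meant to illustrate the definition."""
    n = len(text)
    substrings = {text[i:j] for i in range(n) for j in range(i + 1, n + 1)}
    return {(p, d)
            for p in substrings
            for d in range(1, n)
            if always_followed_by(text, colors, p, d, color)}


if __name__ == "__main__":
    # Text "abab" with colors x, y, x, y at positions 0..3.
    print(sorted(brute_force_patterns("abab", "xyxy", "y")))
```

In this toy input, every occurrence of "a" is followed by color y at distance 1, so ("a", 1) is reported, while "b" is not, because its second occurrence has no position after it.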
Pattern Discovery in Colored Strings
Publisher Copyright: © 2020 Association for Computing Machinery.
In this article, we consider the problem of identifying patterns of interest in colored strings. A colored string is a string where each position is assigned one of a finite set of colors. Our task is to find substrings of the colored string that always occur followed by the same color at the same distance. The problem is motivated by applications in embedded systems verification, in particular, assertion mining. The goal there is to automatically find properties of the embedded system from the analysis of its simulation traces. We show that, in our setting, the number of patterns of interest is upper-bounded by O(n^2), where n is the length of the string. We introduce a baseline algorithm, running in O(n^2) time, which identifies all patterns of interest satisfying certain minimality conditions for all colors in the string. For the case where one is interested in patterns related to one color only, we also provide a second algorithm that runs in O(n^2 log n) time in the worst case but is faster than the baseline algorithm in practice. Both solutions use suffix trees, and the second algorithm also uses an appropriately defined priority queue, which allows us to reduce the number of computations. We performed an experimental evaluation of the proposed approaches over both synthetic and real-world datasets, and found that the second algorithm outperforms the first algorithm on all simulated data, while on the real-world data, the performance varies between a slight slowdown (on half of the datasets) and a speedup by a factor of up to 11.
Peer reviewed
Compressed Subsequence Matching and Packed Tree Coloring
We present a new algorithm for subsequence matching in grammar compressed
strings. Given a grammar of size compressing a string of size and a
pattern string of size over an alphabet of size , our algorithm
uses space and or time. Here
is the word size and is the number of occurrences of the pattern. Our
algorithm uses less space than previous algorithms and is also faster for
occurrences. The algorithm uses a new data structure
that allows us to efficiently find the next occurrence of a given character
after a given position in a compressed string. This data structure in turn is
based on a new data structure for the tree color problem, where the node colors
are packed in bit strings.
Comment: To appear at CPM '1
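The role of a "next occurrence of a given character after a given position" query in subsequence matching can be sketched on an ordinary, uncompressed string. This is only an illustration of the query and the greedy matching it supports: it does not operate on a grammar, does not use packed colors, and has none of the space bounds of the paper. The function names and the table layout are assumptions.

```python
def build_next_table(text):
    """Build nxt so that nxt[i].get(c) is the smallest index j >= i with
    text[j] == c, or None if c does not occur at or after position i."""
    n = len(text)
    nxt = [dict() for _ in range(n + 1)]
    for i in range(n - 1, -1, -1):
        nxt[i] = dict(nxt[i + 1])   # inherit answers from position i + 1
        nxt[i][text[i]] = i         # the character at i is its own next occurrence
    return nxt


def match_subsequence(nxt, pattern, start=0):
    """Greedily embed pattern as a subsequence at or after `start`.
    Returns the list of matched positions, or None if no embedding exists."""
    positions = []
    pos = start
    for ch in pattern:
        j = nxt[pos].get(ch)
        if j is None:
            return None
        positions.append(j)
        pos = j + 1
    return positions


if __name__ == "__main__":
    table = build_next_table("abracadabra")
    print(match_subsequence(table, "ard"))
    print(match_subsequence(table, "dc"))
```

The plain table stores one entry per position and character seen so far, which is exactly the kind of cost a compressed representation is meant to avoid; the sketch only shows why a fast next-occurrence query is the key primitive.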
Local Causal States and Discrete Coherent Structures
Coherent structures form spontaneously in nonlinear spatiotemporal systems
and are found at all spatial scales in natural phenomena from laboratory
hydrodynamic flows and chemical reactions to ocean, atmosphere, and planetary
climate dynamics. Phenomenologically, they appear as key components that
organize the macroscopic behaviors in such systems. Despite a century of
effort, they have eluded rigorous analysis and empirical prediction, with
progress being made only recently. As a step in this, we present a formal
theory of coherent structures in fully-discrete dynamical field theories. It
builds on the notion of structure introduced by computational mechanics,
generalizing it to a local spatiotemporal setting. The analysis' main tool
employs the local causal states, which are used to uncover a system's hidden
spatiotemporal symmetries and which identify coherent structures as
spatially-localized deviations from those symmetries. The approach is
behavior-driven in the sense that it does not rely on directly analyzing
spatiotemporal equations of motion, rather it considers only the spatiotemporal
fields a system generates. As such, it offers an unsupervised approach to
discover and describe coherent structures. We illustrate the approach by
analyzing coherent structures generated by elementary cellular automata,
comparing the results with an earlier, dynamic-invariant-set approach that
decomposes fields into domains, particles, and particle interactions.
Comment: 27 pages, 10 figures;
http://csc.ucdavis.edu/~cmg/compmech/pubs/dcs.ht
String theory and the Kauffman polynomial
We propose a new, precise integrality conjecture for the colored Kauffman
polynomial of knots and links inspired by large N dualities and the structure
of topological string theory on orientifolds. According to this conjecture, the
natural knot invariant in an unoriented theory involves both the colored
Kauffman polynomial and the colored HOMFLY polynomial for composite
representations, i.e. it involves the full HOMFLY skein of the annulus. The
conjecture sheds new light on the relationship between the Kauffman and the
HOMFLY polynomials, and it implies for example Rudolph's theorem. We provide
various non-trivial tests of the conjecture and we sketch the string theory
arguments that lead to it.
Comment: 36 pages, many figures; references and examples added, typos
corrected, final version to appear in CM
An optimized TOPS+ comparison method for enhanced TOPS models
This article has been made available through the Brunel Open Access Publishing Fund.
Background
Although methods based on highly abstract descriptions of protein structures, such as VAST and TOPS, can perform very fast protein structure comparison, the results can lack a high degree of biological significance. Previously we have discussed the basic mechanisms of our novel method for structure comparison based on our TOPS+ model (Topological descriptions of Protein Structures Enhanced with Ligand Information). In this paper we show how these results can be significantly improved using parameter optimization, and we call the resulting optimised method the advanced TOPS+ comparison method (advTOPS+).
Results
We have developed a TOPS+ string model as an improvement to the TOPS [1-3] graph model by considering loops as secondary structure elements (SSEs) in addition to helices and strands, representing ligands as first class objects, and describing interactions between SSEs, and SSEs and ligands, by incoming and outgoing arcs, annotating SSEs with the interaction direction and type. Benchmarking results of an all-against-all pairwise comparison using a large dataset of 2,620 non-redundant structures from the PDB40 dataset [4] demonstrate the biological significance, in terms of SCOP classification at the superfamily level, of our TOPS+ comparison method.
Conclusions
Our advanced TOPS+ comparison shows better performance on the PDB40 dataset [4] compared to our basic TOPS+ method, giving 90 percent accuracy for SCOP alpha+beta; a 6 percent increase in accuracy compared to the TOPS and basic TOPS+ methods. It also outperforms the TOPS, basic TOPS+ and SSAP comparison methods on the Chew-Kedem dataset [5], achieving 98 percent accuracy. Software Availability: The TOPS+ comparison server is available at http://balabio.dcs.gla.ac.uk/mallika/WebTOPS/.
An Attempt to Construct the Standard Model with Monopoles
We construct a model in which stable magnetic monopoles have magnetic charges
that are identical to the electric charges on leptons and quarks and the
colored monopoles are confined by strings in color singlets.
Comment: 10 pages; LaTeX. Added clarifying remarks, a comment on the scattering
of particles, acknowledgements and references. Version to be publishe
Macroscopic Strings and "Quirks" at Colliders
We consider extensions of the standard model containing additional heavy
particles ("quirks") charged under a new unbroken non-abelian gauge group as
well as the standard model. We assume that the quirk mass m is in the
phenomenologically interesting range 100 GeV--TeV, and that the new gauge group
gets strong at a scale Lambda < m. In this case breaking of strings is
exponentially suppressed, and quirk production results in strings that are long
compared to 1/Lambda. The existence of these long stable strings leads to
highly exotic events at colliders. For 100 eV < Lambda < keV the strings are
macroscopic, giving rise to events with two separated quirk tracks with
measurable curvature toward each other due to the string interaction. For keV <
Lambda < MeV the typical strings are mesoscopic: too small to resolve in the
detector, but large compared to atomic scales. In this case, the bound state
appears as a single particle, but its mass is the invariant mass of a quirk
pair, which has an event-by-event distribution. For MeV < Lambda < m the
strings are microscopic, and the quirks annihilate promptly within the
detector. For colored quirks, this can lead to hadronic fireball events with
10^3 hadrons with energy of order GeV emitted in conjunction with hard decay
products from the final annihilation.
Comment: Added discussion of photon-jet decay, fixed minor typo