106 research outputs found
Compact Recognizers of Episode Sequences
Abstract Mikhail J. Atallah t Purdue University Given two strings T = at ... an and P = hI .. .h m over an alphabet E, the problem of testing whether P occurs as a subsequence of T is trivially solved in linear time. It is also known that a simple D(nlog lEI) time preprocessing ofT makes it easy to decide subsequently for any P and in at most IPJIog lEI character comparisons, whether P is a subsequence of T. These problems become more complicated if onc asks instead whether P occurs as a subsequence of some substring Y of T of bounded length. This paper presents an automaton built on the textstring T and capable of identifying all distinct minimal substrings Y of X having P as a subsequence. By a substring Y being minimal with respect to P, it is meant that P is not a subsequence of any proper substring of Y. For every minimal substring Y, the automaton recognizes the occurrence of P having lexicographically smallest sequence of symbol positions in Y. It is not difficult to realize such an automaton in time and space 0(n 2 ) for a text of n characters. One result of this paper consists of bringing those bounds down to linear or O(nlogn), respectively, depending on whether the alphabet is bounded or of arbitrary size, thereby matching the respective complexities of off-line exact string searching. Having built the automaton, the search for all lexicographically earliest occurrences of P in X is carried out in time O(n + k l rocc, . i . log n . log I~I), where rocc, is the number of distinct minimal substrings of T having b 1 ... b; as a subsequence. All log factors appearing in the above bounds can be further reduced to log log by resort to known integer-handling data structures. Index Terms -Algorithms, pattern matching, subsequence and episode searching, DAWG, suffix automaton, compact subsequence automaton, skip-edge DAWG, forward failure function, skip-link
Spoken content retrieval: A survey of techniques and technologies
Speech media, that is, digital audio and video containing spoken content, has blossomed in recent years. Large collections are accruing on the Internet as well as in private and enterprise settings. This growth has motivated extensive research on techniques and technologies that facilitate reliable indexing and retrieval. Spoken content retrieval (SCR) requires the combination of audio and speech processing technologies with methods from information retrieval (IR). SCR research initially investigated planned speech structured in document-like units, but has subsequently shifted focus to more informal spoken content produced spontaneously, outside of the studio and in conversational settings. This survey provides an overview of the field of SCR encompassing component technologies, the relationship of SCR to text IR and automatic speech recognition and user interaction issues. It is aimed at researchers with backgrounds in speech technology or IR who are seeking deeper insight on how these fields are integrated to support research and development, thus addressing the core challenges of SCR
A generating function for bit strings with no Grand Dyck pattern matching
Abstract
We study the construction and the enumeration of bit strings, or binary words in {0, 1}*, having more 1's than 0's and avoiding a set of Grand Dyck patterns which form a cross-bifix-free set. We give a particular jumping and marked succession rule which describes the growth of such words according to the number of 1's. Then, we give the enumeration of the class by means of generating function
Off-Policy Actor-Critic
This paper presents the first actor-critic algorithm for off-policy
reinforcement learning. Our algorithm is online and incremental, and its
per-time-step complexity scales linearly with the number of learned weights.
Previous work on actor-critic algorithms is limited to the on-policy setting
and does not take advantage of the recent advances in off-policy gradient
temporal-difference learning. Off-policy techniques, such as Greedy-GQ, enable
a target policy to be learned while following and obtaining data from another
(behavior) policy. For many problems, however, actor-critic methods are more
practical than action value methods (like Greedy-GQ) because they explicitly
represent the policy; consequently, the policy can be stochastic and utilize a
large action space. In this paper, we illustrate how to practically combine the
generality and learning potential of off-policy learning with the flexibility
in action selection given by actor-critic methods. We derive an incremental,
linear time and space complexity algorithm that includes eligibility traces,
prove convergence under assumptions similar to previous off-policy algorithms,
and empirically show better or comparable performance to existing algorithms on
standard reinforcement-learning benchmark problems.Comment: Full version of the paper, appendix and errata included; Proceedings
of the 2012 International Conference on Machine Learnin
Recommended from our members
Classification of time series patterns from complex dynamic systems
An increasing availability of high-performance computing and data storage media at decreasing cost is making possible the proliferation of large-scale numerical databases and data warehouses. Numeric warehousing enterprises on the order of hundreds of gigabytes to terabytes are a reality in many fields such as finance, retail sales, process systems monitoring, biomedical monitoring, surveillance and transportation. Large-scale databases are becoming more accessible to larger user communities through the internet, web-based applications and database connectivity. Consequently, most researchers now have access to a variety of massive datasets. This trend will probably only continue to grow over the next several years. Unfortunately, the availability of integrated tools to explore, analyze and understand the data warehoused in these archives is lagging far behind the ability to gain access to the same data. In particular, locating and identifying patterns of interest in numerical time series data is an increasingly important problem for which there are few available techniques. Temporal pattern recognition poses many interesting problems in classification, segmentation, prediction, diagnosis and anomaly detection. This research focuses on the problem of classification or characterization of numerical time series data. Highway vehicles and their drivers are examples of complex dynamic systems (CDS) which are being used by transportation agencies for field testing to generate large-scale time series datasets. Tools for effective analysis of numerical time series in databases generated by highway vehicle systems are not yet available, or have not been adapted to the target problem domain. However, analysis tools from similar domains may be adapted to the problem of classification of numerical time series data
Simulating and Reconstructing Neurodynamics with Epsilon-Automata Applied to Electroencephalography (EEG) Microstate Sequences
We introduce new techniques to the analysis of neural spatiotemporal dynamics
via applying -machine reconstruction to electroencephalography (EEG)
microstate sequences. Microstates are short duration quasi-stable states of the
dynamically changing electrical field topographies recorded via an array of
electrodes from the human scalp, and cluster into four canonical classes. The
sequence of microstates observed under particular conditions can be considered
an information source with unknown underlying structure. -machines
are discrete dynamical system automata with state-dependent probabilities on
different future observations (in this case the next measured EEG microstate).
They artificially reproduce underlying structure in an optimally predictive
manner as generative models exhibiting dynamics emulating the behaviour of the
source. Here we present experiments using both simulations and empirical data
supporting the value of associating these discrete dynamical systems with
mental states (e.g. mind-wandering, focused attention, etc.) and with clinical
populations. The neurodynamics of mental states and clinical populations can
then be further characterized by properties of these dynamical systems,
including: i) statistical complexity (determined by the number of states of the
corresponding -automaton); ii) entropy rate; iii) characteristic
sequence patterning (syntax, probabilistic grammars); iv) duration, persistence
and stability of dynamical patterns; and v) algebraic measures such as
Krohn-Rhodes complexity or holonomy length of the decompositions of these. The
potential applications include the characterization of mental states in
neurodynamic terms for mental health diagnostics, well-being interventions,
human-machine interface, and others on both subject-specific and
group/population-level
- …