106 research outputs found

    Compact Recognizers of Episode Sequences

    Get PDF
    Abstract Mikhail J. Atallah t Purdue University Given two strings T = at ... an and P = hI .. .h m over an alphabet E, the problem of testing whether P occurs as a subsequence of T is trivially solved in linear time. It is also known that a simple D(nlog lEI) time preprocessing ofT makes it easy to decide subsequently for any P and in at most IPJIog lEI character comparisons, whether P is a subsequence of T. These problems become more complicated if onc asks instead whether P occurs as a subsequence of some substring Y of T of bounded length. This paper presents an automaton built on the textstring T and capable of identifying all distinct minimal substrings Y of X having P as a subsequence. By a substring Y being minimal with respect to P, it is meant that P is not a subsequence of any proper substring of Y. For every minimal substring Y, the automaton recognizes the occurrence of P having lexicographically smallest sequence of symbol positions in Y. It is not difficult to realize such an automaton in time and space 0(n 2 ) for a text of n characters. One result of this paper consists of bringing those bounds down to linear or O(nlogn), respectively, depending on whether the alphabet is bounded or of arbitrary size, thereby matching the respective complexities of off-line exact string searching. Having built the automaton, the search for all lexicographically earliest occurrences of P in X is carried out in time O(n + k l rocc, . i . log n . log I~I), where rocc, is the number of distinct minimal substrings of T having b 1 ... b; as a subsequence. All log factors appearing in the above bounds can be further reduced to log log by resort to known integer-handling data structures. Index Terms -Algorithms, pattern matching, subsequence and episode searching, DAWG, suffix automaton, compact subsequence automaton, skip-edge DAWG, forward failure function, skip-link

    Spoken content retrieval: A survey of techniques and technologies

    Get PDF
    Speech media, that is, digital audio and video containing spoken content, has blossomed in recent years. Large collections are accruing on the Internet as well as in private and enterprise settings. This growth has motivated extensive research on techniques and technologies that facilitate reliable indexing and retrieval. Spoken content retrieval (SCR) requires the combination of audio and speech processing technologies with methods from information retrieval (IR). SCR research initially investigated planned speech structured in document-like units, but has subsequently shifted focus to more informal spoken content produced spontaneously, outside of the studio and in conversational settings. This survey provides an overview of the field of SCR encompassing component technologies, the relationship of SCR to text IR and automatic speech recognition and user interaction issues. It is aimed at researchers with backgrounds in speech technology or IR who are seeking deeper insight on how these fields are integrated to support research and development, thus addressing the core challenges of SCR

    Classification of time series patterns from complex dynamic systems

    Full text link

    A generating function for bit strings with no Grand Dyck pattern matching

    Get PDF
    Abstract We study the construction and the enumeration of bit strings, or binary words in {0, 1}*, having more 1's than 0's and avoiding a set of Grand Dyck patterns which form a cross-bifix-free set. We give a particular jumping and marked succession rule which describes the growth of such words according to the number of 1's. Then, we give the enumeration of the class by means of generating function

    Off-Policy Actor-Critic

    Get PDF
    This paper presents the first actor-critic algorithm for off-policy reinforcement learning. Our algorithm is online and incremental, and its per-time-step complexity scales linearly with the number of learned weights. Previous work on actor-critic algorithms is limited to the on-policy setting and does not take advantage of the recent advances in off-policy gradient temporal-difference learning. Off-policy techniques, such as Greedy-GQ, enable a target policy to be learned while following and obtaining data from another (behavior) policy. For many problems, however, actor-critic methods are more practical than action value methods (like Greedy-GQ) because they explicitly represent the policy; consequently, the policy can be stochastic and utilize a large action space. In this paper, we illustrate how to practically combine the generality and learning potential of off-policy learning with the flexibility in action selection given by actor-critic methods. We derive an incremental, linear time and space complexity algorithm that includes eligibility traces, prove convergence under assumptions similar to previous off-policy algorithms, and empirically show better or comparable performance to existing algorithms on standard reinforcement-learning benchmark problems.Comment: Full version of the paper, appendix and errata included; Proceedings of the 2012 International Conference on Machine Learnin

    Simulating and Reconstructing Neurodynamics with Epsilon-Automata Applied to Electroencephalography (EEG) Microstate Sequences

    Get PDF
    We introduce new techniques to the analysis of neural spatiotemporal dynamics via applying ϵ\epsilon-machine reconstruction to electroencephalography (EEG) microstate sequences. Microstates are short duration quasi-stable states of the dynamically changing electrical field topographies recorded via an array of electrodes from the human scalp, and cluster into four canonical classes. The sequence of microstates observed under particular conditions can be considered an information source with unknown underlying structure. ϵ\epsilon-machines are discrete dynamical system automata with state-dependent probabilities on different future observations (in this case the next measured EEG microstate). They artificially reproduce underlying structure in an optimally predictive manner as generative models exhibiting dynamics emulating the behaviour of the source. Here we present experiments using both simulations and empirical data supporting the value of associating these discrete dynamical systems with mental states (e.g. mind-wandering, focused attention, etc.) and with clinical populations. The neurodynamics of mental states and clinical populations can then be further characterized by properties of these dynamical systems, including: i) statistical complexity (determined by the number of states of the corresponding ϵ\epsilon-automaton); ii) entropy rate; iii) characteristic sequence patterning (syntax, probabilistic grammars); iv) duration, persistence and stability of dynamical patterns; and v) algebraic measures such as Krohn-Rhodes complexity or holonomy length of the decompositions of these. The potential applications include the characterization of mental states in neurodynamic terms for mental health diagnostics, well-being interventions, human-machine interface, and others on both subject-specific and group/population-level

    Searching Spontaneous Conversational Speech:Proceedings of ACM SIGIR Workshop (SSCS2008)

    Get PDF
    corecore