Search CORE

106 research outputs found

Compact Recognizers of Episode Sequences

Author: Mikhail J. Atallah (Purdue University)
Publication date: 01/01/1997

Abstract: Given two strings $T = a_1 \ldots a_n$ and $P = b_1 \ldots b_m$ over an alphabet $\Sigma$, the problem of testing whether $P$ occurs as a subsequence of $T$ is trivially solved in linear time. It is also known that a simple $O(n \log |\Sigma|)$-time preprocessing of $T$ makes it easy to decide subsequently, for any $P$ and in at most $|P| \log |\Sigma|$ character comparisons, whether $P$ is a subsequence of $T$. These problems become more complicated if one asks instead whether $P$ occurs as a subsequence of some substring $Y$ of $T$ of bounded length. This paper presents an automaton built on the text string $T$ and capable of identifying all distinct minimal substrings $Y$ of $T$ having $P$ as a subsequence. A substring $Y$ is minimal with respect to $P$ if $P$ is not a subsequence of any proper substring of $Y$. For every minimal substring $Y$, the automaton recognizes the occurrence of $P$ having the lexicographically smallest sequence of symbol positions in $Y$. It is not difficult to realize such an automaton in time and space $O(n^2)$ for a text of $n$ characters. One result of this paper consists of bringing those bounds down to linear or $O(n \log n)$, respectively, depending on whether the alphabet is bounded or of arbitrary size, thereby matching the respective complexities of off-line exact string searching. Having built the automaton, the search for all lexicographically earliest occurrences of $P$ in $T$ is carried out in time $O\left(n + \sum_{i=1}^{m} \mathit{rocc}_i \cdot i \cdot \log n \cdot \log |\Sigma|\right)$, where $\mathit{rocc}_i$ is the number of distinct minimal substrings of $T$ having $b_1 \ldots b_i$ as a subsequence. All log factors appearing in the above bounds can be further reduced to $\log \log$ by resorting to known integer-handling data structures. Index Terms: Algorithms, pattern matching, subsequence and episode searching, DAWG, suffix automaton, compact subsequence automaton, skip-edge DAWG, forward failure function, skip-link.
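The elementary problems this abstract starts from can be illustrated with a short Python sketch. It is not the paper's compact automaton, and all function names are hypothetical. `is_subsequence` is the trivial linear-time greedy scan. `build_position_index` plus `is_subsequence_indexed` is a common alternative preprocessing (per-symbol position lists with binary search) that answers a query in $O(|P| \log n)$ comparisons, which is weaker than the $|P| \log |\Sigma|$ bound quoted above. `minimal_windows` lists the minimal substrings of $T$ containing $P$ as a subsequence, optionally of bounded length, using a naive forward/backward greedy pass per start position (quadratic in the worst case). `leftmost_embedding` returns the lexicographically smallest position sequence of $P$ inside a window.

```python
from bisect import bisect_right
from collections import defaultdict


def is_subsequence(p, t):
    """Linear-time greedy scan: consume p left to right while walking t once."""
    k = 0
    for ch in t:
        if k < len(p) and ch == p[k]:
            k += 1
    return k == len(p)


def build_position_index(t):
    """Map each symbol to the increasing list of positions where it occurs in t."""
    index = defaultdict(list)
    for pos, ch in enumerate(t):
        index[ch].append(pos)
    return index


def is_subsequence_indexed(p, index):
    """One binary search per pattern symbol: O(|p| log n) comparisons."""
    cur = -1  # position of the most recently matched symbol
    for ch in p:
        positions = index.get(ch, [])
        k = bisect_right(positions, cur)  # first occurrence strictly after cur
        if k == len(positions):
            return False
        cur = positions[k]
    return True


def minimal_windows(p, t, max_len=None):
    """All (start, end) pairs, end inclusive, such that p is a subsequence of
    t[start:end + 1] but of no proper substring of it; optionally keep only
    windows of length <= max_len."""
    if not p:
        return []
    windows = set()
    for i, ch in enumerate(t):
        if ch != p[0]:
            continue
        # Forward greedy pass: earliest end j such that p fits in t[i..j].
        k, j = 0, i
        while j < len(t):
            if t[j] == p[k]:
                k += 1
                if k == len(p):
                    break
            j += 1
        if k < len(p):
            break  # if start i fails, every later start fails too
        # Backward greedy pass: latest start s such that p fits in t[s..j].
        k, s = len(p) - 1, j
        while True:
            if t[s] == p[k]:
                k -= 1
                if k < 0:
                    break
            s -= 1
        if max_len is None or j - s + 1 <= max_len:
            windows.add((s, j))
    return sorted(windows)


def leftmost_embedding(p, t, start, end):
    """Lexicographically smallest position sequence of p inside t[start..end]."""
    positions, k = [], 0
    for pos in range(start, end + 1):
        if k < len(p) and t[pos] == p[k]:
            positions.append(pos)
            k += 1
    return positions if k == len(p) else None
```

For example, `minimal_windows("ab", "abcab")` returns `[(0, 1), (3, 4)]`: the window `"cab"` is excluded because its proper substring `"ab"` already contains the pattern. The two greedy passes work because the forward pass fixes the earliest possible end for a given start and the backward pass then shrinks the start as far as possible, which yields exactly the minimal windows.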

CiteSeerX

Spoken content retrieval: A survey of techniques and technologies

Author: Ani Nenkova
C A. Nenkova
K. Mckeown
Kathleen Mckeown
Publication venue: 'Now Publishers'
Publication date: 01/01/2012

Speech media, that is, digital audio and video containing spoken content, has blossomed in recent years. Large collections are accruing on the Internet as well as in private and enterprise settings. This growth has motivated extensive research on techniques and technologies that facilitate reliable indexing and retrieval. Spoken content retrieval (SCR) requires the combination of audio and speech processing technologies with methods from information retrieval (IR). SCR research initially investigated planned speech structured in document-like units, but has subsequently shifted focus to more informal spoken content produced spontaneously, outside of the studio and in conversational settings. This survey provides an overview of the field of SCR, encompassing component technologies, the relationship of SCR to text IR and automatic speech recognition, and user interaction issues. It is aimed at researchers with backgrounds in speech technology or IR who are seeking deeper insight into how these fields are integrated to support research and development, thus addressing the core challenges of SCR.
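The combination of speech recognition output with IR indexing described above can be pictured with a toy Python sketch. It assumes a hypothetical transcript format (for each recording, a list of `(start_seconds, word)` pairs as some ASR system might emit); it performs no recognition, uses no term weighting, and is not a technique taken from the survey. Documents are ranked by the number of query-term hits, and the earliest hit time is returned as a jump-in point into the audio.

```python
from collections import defaultdict


def build_index(transcripts):
    """transcripts: {doc_id: [(start_seconds, word), ...]} (hypothetical ASR format).
    Returns an inverted index: word -> list of (doc_id, start_seconds)."""
    index = defaultdict(list)
    for doc_id, words in transcripts.items():
        for start, word in words:
            index[word.lower()].append((doc_id, start))
    return index


def search(index, query):
    """Rank recordings by number of query-term hits (ties broken by doc_id) and
    return (doc_id, earliest_hit_time) so playback can start near the match."""
    hits = defaultdict(list)
    for term in query.lower().split():
        for doc_id, start in index.get(term, []):
            hits[doc_id].append(start)
    ranked = sorted(hits.items(), key=lambda kv: (-len(kv[1]), kv[0]))
    return [(doc_id, min(times)) for doc_id, times in ranked]
```

Returning a time offset rather than a whole document reflects the fact that spoken collections often lack document-like units, so pointing the listener to a position inside a long recording is usually more useful than ranking entire files.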

CiteSeerX

Crossref

Irish Universities

DCU Online Research Access Service

Classification of time series patterns from complex dynamic systems

Publication venue: 'Office of Scientific and Technical Information (OSTI)'

Crossref

A generating function for bit strings with no Grand Dyck pattern matching

Author: Antonio Bernini
Elisa Pergola
Renzo Pinzani
Stefano Bilotta
Publication date: 01/09/2015

Abstract: We study the construction and the enumeration of bit strings, or binary words in $\{0, 1\}^*$, having more 1's than 0's and avoiding a set of Grand Dyck patterns which form a cross-bifix-free set. We give a particular jumping and marked succession rule which describes the growth of such words according to the number of 1's. Then, we give the enumeration of the class by means of a generating function.
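For small cases, the counts such a generating function should produce can be checked by brute force. The Python sketch below does not implement the paper's jumping and marked succession rule or its generating function. It assumes that "avoiding" a pattern means the pattern never occurs as a contiguous factor, and it takes the pattern set from the caller because the abstract does not list the set studied. A Grand Dyck word is taken to have equally many 1's and 0's. Counts are grouped by the number of 1's, mirroring the growth parameter named in the abstract. All function names are hypothetical.

```python
from itertools import combinations


def is_grand_dyck(word):
    """A Grand Dyck word has equally many 1s and 0s (its path may dip below the axis)."""
    return word.count("1") == word.count("0")


def is_cross_bifix_free(patterns):
    """True if no proper nonempty prefix of any pattern equals a suffix of any pattern."""
    for u in patterns:
        for v in patterns:
            for k in range(1, min(len(u), len(v))):
                if u[:k] == v[-k:]:
                    return False
    return True


def count_words(num_ones, patterns):
    """Count binary words with exactly num_ones 1s, strictly fewer 0s, and no
    pattern from `patterns` occurring as a contiguous factor."""
    total = 0
    for num_zeros in range(num_ones):  # more 1s than 0s
        n = num_ones + num_zeros
        for zero_positions in combinations(range(n), num_zeros):
            word = ["1"] * n
            for pos in zero_positions:
                word[pos] = "0"
            w = "".join(word)
            if not any(p in w for p in patterns):
                total += 1
    return total
```

With the single illustrative pattern `"1100"` (a Grand Dyck word whose set is trivially cross-bifix-free), `count_words(3, ["1100"])` returns 13, versus 15 with no restriction: of the ten words with three 1's and two 0's, only `11001` and `11100` contain the factor. Such brute-force counts are a convenient way to test the first coefficients of any closed-form generating function.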

Open Access Repository

Off-Policy Actor-Critic

Author: Degris Thomas
Sutton Richard S.
White Martha
Publication date: 01/01/2012

This paper presents the first actor-critic algorithm for off-policy reinforcement learning. Our algorithm is online and incremental, and its per-time-step complexity scales linearly with the number of learned weights. Previous work on actor-critic algorithms is limited to the on-policy setting and does not take advantage of the recent advances in off-policy gradient temporal-difference learning. Off-policy techniques, such as Greedy-GQ, enable a target policy to be learned while following and obtaining data from another (behavior) policy. For many problems, however, actor-critic methods are more practical than action value methods (like Greedy-GQ) because they explicitly represent the policy; consequently, the policy can be stochastic and utilize a large action space. In this paper, we illustrate how to practically combine the generality and learning potential of off-policy learning with the flexibility in action selection given by actor-critic methods. We derive an incremental, linear time and space complexity algorithm that includes eligibility traces, prove convergence under assumptions similar to previous off-policy algorithms, and empirically show better or comparable performance to existing algorithms on standard reinforcement-learning benchmark problems. Comment: Full version of the paper, appendix and errata included; Proceedings of the 2012 International Conference on Machine Learning
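The core idea of learning a target policy $\pi$ from data generated by a behavior policy $b$ can be shown with a deliberately simplified Python sketch. It is not the Off-PAC algorithm of the paper: it assumes a tabular softmax actor, a tabular importance-weighted TD(0) critic instead of a gradient-TD critic with linear features, and no eligibility traces. Each update is reweighted by the per-step ratio $\rho = \pi(a \mid s) / b(a \mid s)$. The function names and the toy one-state problem in `demo` are hypothetical.

```python
import math
import random


def softmax_probs(theta, s):
    """Tabular softmax target policy: theta[s][a] holds the preference for action a."""
    m = max(theta[s])  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in theta[s]]
    z = sum(exps)
    return [e / z for e in exps]


def off_policy_ac_step(theta, v, s, a, r, s_next, behavior_prob,
                       alpha_v=0.1, alpha_theta=0.05, gamma=0.9):
    """One simplified off-policy actor-critic update from a transition
    (s, a, r, s_next) generated by a behavior policy with b(a|s) = behavior_prob."""
    pi = softmax_probs(theta, s)
    rho = pi[a] / behavior_prob            # importance ratio pi(a|s) / b(a|s)
    delta = r + gamma * v[s_next] - v[s]   # critic's TD error
    v[s] += alpha_v * rho * delta          # importance-weighted TD(0) critic
    for act in range(len(theta[s])):
        # Partial derivative of log pi(a|s) w.r.t. theta[s][act] for a softmax policy
        grad_log = (1.0 if act == a else 0.0) - pi[act]
        theta[s][act] += alpha_theta * rho * delta * grad_log
    return delta


def demo(steps=3000, seed=0):
    """One state, two actions; the behavior policy picks uniformly at random,
    action 1 pays 1 and action 0 pays 0. gamma=0 makes each step its own episode."""
    rng = random.Random(seed)
    theta = [[0.0, 0.0]]
    v = [0.0]
    for _ in range(steps):
        a = rng.randrange(2)
        r = 1.0 if a == 1 else 0.0
        off_policy_ac_step(theta, v, 0, a, r, 0, behavior_prob=0.5, gamma=0.0)
    return softmax_probs(theta, 0)
```

Running `demo()` concentrates the target policy on action 1 even though every action was chosen by the uniform behavior policy: the ratio $\rho$ makes the expected update under $b$ equal to the on-policy update under $\pi$. The paper's contribution lies in making this work with linear function approximation and eligibility traces while retaining convergence guarantees, which this sketch does not attempt.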

arXiv.org e-Print Archive

CiteSeerX

INRIA a CCSD electronic archive server

Recommended from our members

Classification of time series patterns from complex dynamic systems

Author: Rao N.
Schryver J.C.
Publication venue: 'Office of Scientific and Technical Information (OSTI)'
Publication date: 01/07/1998

An increasing availability of high-performance computing and data storage media at decreasing cost is making possible the proliferation of large-scale numerical databases and data warehouses. Numeric warehousing enterprises on the order of hundreds of gigabytes to terabytes are a reality in many fields such as finance, retail sales, process systems monitoring, biomedical monitoring, surveillance and transportation. Large-scale databases are becoming more accessible to larger user communities through the Internet, web-based applications and database connectivity. Consequently, most researchers now have access to a variety of massive datasets. This trend will probably only continue to grow over the next several years. Unfortunately, the availability of integrated tools to explore, analyze and understand the data warehoused in these archives is lagging far behind the ability to gain access to the same data. In particular, locating and identifying patterns of interest in numerical time series data is an increasingly important problem for which there are few available techniques. Temporal pattern recognition poses many interesting problems in classification, segmentation, prediction, diagnosis and anomaly detection. This research focuses on the problem of classification or characterization of numerical time series data. Highway vehicles and their drivers are examples of complex dynamic systems (CDS) which are being used by transportation agencies for field testing to generate large-scale time series datasets. Tools for effective analysis of numerical time series in databases generated by highway vehicle systems are not yet available, or have not been adapted to the target problem domain. However, analysis tools from similar domains may be adapted to the problem of classification of numerical time series data.

UNT Digital Library

Simulating and Reconstructing Neurodynamics with Epsilon-Automata Applied to Electroencephalography (EEG) Microstate Sequences

Author: Antonova E
Nehaniv CL
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/01/2017

We introduce new techniques to the analysis of neural spatiotemporal dynamics by applying $\epsilon$-machine reconstruction to electroencephalography (EEG) microstate sequences. Microstates are short-duration, quasi-stable states of the dynamically changing electrical field topographies recorded via an array of electrodes from the human scalp, and they cluster into four canonical classes. The sequence of microstates observed under particular conditions can be considered an information source with unknown underlying structure. $\epsilon$-machines are discrete dynamical system automata with state-dependent probabilities on different future observations (in this case the next measured EEG microstate). They artificially reproduce underlying structure in an optimally predictive manner, as generative models exhibiting dynamics that emulate the behaviour of the source. Here we present experiments using both simulations and empirical data supporting the value of associating these discrete dynamical systems with mental states (e.g. mind-wandering, focused attention) and with clinical populations. The neurodynamics of mental states and clinical populations can then be further characterized by properties of these dynamical systems, including: i) statistical complexity (determined by the number of states of the corresponding $\epsilon$-automaton); ii) entropy rate; iii) characteristic sequence patterning (syntax, probabilistic grammars); iv) duration, persistence and stability of dynamical patterns; and v) algebraic measures such as Krohn-Rhodes complexity or holonomy length of their decompositions. The potential applications include the characterization of mental states in neurodynamic terms for mental health diagnostics, well-being interventions, human-machine interfaces, and others, at both the subject-specific and group/population level.
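Two of the properties listed above, statistical complexity and entropy rate, can be computed directly once an $\epsilon$-machine has been reconstructed. The Python sketch below assumes a hypothetical representation of a unifilar machine as `{state: {symbol: (probability, next_state)}}` with a single recurrent component; it does not perform reconstruction from microstate data. It uses the standard computational-mechanics definitions: statistical complexity $C_\mu$ is the Shannon entropy of the stationary distribution over causal states (at most $\log_2$ of the number of states), and entropy rate $h_\mu$ is the stationary average of the next-symbol entropy in each state.

```python
import math


def stationary_distribution(machine, iters=10_000, tol=1e-12):
    """Stationary distribution over causal states via lazy power iteration.
    machine: {state: {symbol: (prob, next_state)}} (hypothetical format)."""
    states = list(machine)
    p = {s: 1.0 / len(states) for s in states}
    for _ in range(iters):
        nxt = {s: 0.5 * p[s] for s in states}  # lazy step avoids periodicity issues
        for s in states:
            for prob, t in machine[s].values():
                nxt[t] += 0.5 * p[s] * prob
        if max(abs(nxt[s] - p[s]) for s in states) < tol:
            return nxt
        p = nxt
    return p


def statistical_complexity(machine):
    """C_mu: Shannon entropy (bits) of the stationary causal-state distribution."""
    pi = stationary_distribution(machine)
    return -sum(q * math.log2(q) for q in pi.values() if q > 0)


def entropy_rate(machine):
    """h_mu (bits/symbol) for a unifilar machine: stationary average of the
    entropy of the next symbol given the current causal state."""
    pi = stationary_distribution(machine)
    return sum(
        pi[s] * -sum(prob * math.log2(prob)
                     for prob, _ in machine[s].values() if prob > 0)
        for s in machine
    )
```

As a check, an i.i.d. source emitting four equiprobable symbols (labelled here, hypothetically, `A` to `D` for the four microstate classes) needs a single causal state, so $C_\mu = 0$ and $h_\mu = 2$ bits per symbol. A source with memory, such as the two-state "golden mean" process, has $C_\mu > 0$ because predicting it requires tracking which state it is in.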

arXiv.org e-Print Archive

Brunel University Research Archive

Searching Spontaneous Conversational Speech: Proceedings of ACM SIGIR Workshop (SSCS2008)

Author: Kraaij W.
Larson M.
Publication venue: Centre for Telematics and Information Technology (CTIT)
Publication date: 24/07/2008

University of Twente Research Information