Learning Tree Distributions by Hidden Markov Models
Hidden tree Markov models allow learning distributions over tree-structured data while remaining interpretable as nondeterministic automata. We provide a concise summary of the main approaches in the literature, focusing in particular on the causality assumptions introduced by the choice of a specific tree visit direction. We then sketch a novel non-parametric generalization of the bottom-up hidden tree Markov model, together with its interpretation as a nondeterministic tree automaton with infinitely many states.
Comment: Accepted at the LearnAut 2018 workshop.
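As a rough illustration of the bottom-up direction the abstract refers to, the following sketch propagates state messages from the leaves toward the root of a binary tree. The parameter names (pi, A, B), the binary-tree restriction, and the exact normalization are illustrative assumptions, not the paper's formulation.

```python
import numpy as np

# Illustrative upward (inside) recursion for a bottom-up hidden tree
# Markov model over binary trees. pi, A, B and the binary branching
# factor are assumptions made for the sake of a runnable sketch.

rng = np.random.default_rng(0)
S, O = 3, 4                        # number of hidden states / node labels

pi = rng.dirichlet(np.ones(S))     # prior over leaf states, P(i)
# A[i, j, k] = P(state i | left child state j, right child state k)
A = rng.dirichlet(np.ones(S), size=(S, S)).transpose(2, 0, 1)
B = rng.dirichlet(np.ones(O), size=S)   # B[i, o] = P(label o | state i)

class Node:
    def __init__(self, label, left=None, right=None):
        self.label, self.left, self.right = label, left, right

def inside(node):
    """Upward message: emission at this node combined with its children."""
    if node.left is None:                       # leaf: prior * emission
        return pi * B[:, node.label]
    bl, br = inside(node.left), inside(node.right)
    # merge child messages through the state-transition tensor
    trans = np.einsum('ijk,j,k->i', A, bl, br)
    return trans * B[:, node.label]

tree = Node(2, Node(0), Node(1, Node(3), Node(0)))
print("tree score:", inside(tree).sum())        # sum over root states
```

Reversing this visit direction (root to leaves) changes which conditional independencies the model encodes, which is exactly the causality issue the survey discusses.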
DNA Analysis Using Grammatical Inference
An accurate language definition capable of distinguishing between coding and non-coding DNA has important applications and analytical significance in computational biology. The method proposed here uses positive-sample grammatical inference together with statistical information to infer languages for coding DNA.
An algorithm is proposed for searching for an optimal subset of the input sequences to use when inferring regular grammars, optimizing a relevant accuracy metric. The algorithm does not guarantee that the optimal subset is found; however, testing shows improvements in accuracy and performance over the base algorithm.
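The subset search could take a greedy form along the following lines. This is a hedged sketch only: `infer_grammar` below is a deliberately crude stand-in (a 3-gram acceptor) for a real regular-grammar inference algorithm such as RPNI, and the paper's actual metric and search order are not specified here.

```python
import random

def infer_grammar(seqs, k=3):
    """Stand-in inference: the 'grammar' is the set of k-grams seen."""
    grams = set()
    for s in seqs:
        grams.update(s[i:i + k] for i in range(len(s) - k + 1))
    return grams

def accepts(grammar, s, k=3):
    """A string is accepted if all of its k-grams occur in the grammar."""
    return all(s[i:i + k] in grammar for i in range(len(s) - k + 1))

def accuracy(grammar, labeled):
    return sum(accepts(grammar, s) == y for s, y in labeled) / len(labeled)

def greedy_subset(pool, labeled):
    """Keep a candidate sequence only if it improves validation accuracy."""
    chosen, best = [], 0.0
    for s in pool:
        score = accuracy(infer_grammar(chosen + [s]), labeled)
        if score > best:
            chosen.append(s)
            best = score
    return chosen, best

random.seed(0)
alphabet = "acgt"
pool = ["".join(random.choices(alphabet, k=12)) for _ in range(30)]
labeled = [(s, True) for s in pool[:5]] + \
          [("".join(random.choices(alphabet, k=12)), False) for _ in range(5)]
subset, score = greedy_subset(pool, labeled)
print(f"kept {len(subset)} of {len(pool)} sequences, accuracy {score:.2f}")
```

As the abstract notes, such a greedy pass offers no optimality guarantee; it only ensures the validation metric never decreases as sequences are added.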
Testing shows that the inferred languages for components of DNA are consistently accurate. Using the proposed algorithm, languages are inferred for coding DNA with an average conditional probability above 80%. This shows that languages for components of DNA can be inferred, and that they are useful independently of the process that created them. These languages can then be analyzed or used for other tasks in computational biology.
To illustrate potential applications of regular grammars for DNA components, an inferred language for exon sequences is applied as a post-processing step to hidden Markov exon prediction, reducing the number of wrongly detected exons and significantly improving the specificity of the model.
Learning probability distributions generated by finite-state machines
We review methods for inferring probability distributions generated by probabilistic automata and related models for sequence generation. We focus on methods that can be proved to learn in the inference-in-the-limit and PAC formal models. The methods we review are state-merging and state-splitting methods for probabilistic deterministic automata and the recently developed spectral method for nondeterministic probabilistic automata. In both cases, we derive them from a high-level algorithm described in terms of the Hankel matrix of the distribution to be learned, given as an oracle, and then describe how to adapt that algorithm to account for the error introduced by a finite sample.
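The Hankel-matrix-as-oracle view can be made concrete with a small exercise: build the exact Hankel matrix of a toy probabilistic automaton, factor it by SVD, and recover an equivalent operator model. The 2-state machine, the prefix/suffix basis, and the rank choice below are illustrative assumptions in the spirit of the survey's high-level algorithm.

```python
import numpy as np
from itertools import product

alphabet = "ab"
alpha = np.array([1.0, 0.0])                   # initial weights
T = {"a": np.array([[0.2, 0.3], [0.1, 0.2]]),  # transition weight matrices
     "b": np.array([[0.1, 0.2], [0.3, 0.2]])}
omega = np.array([0.2, 0.2])                   # stopping weights

def f(w):                                      # true string probability
    v = alpha.copy()
    for c in w:
        v = v @ T[c]
    return float(v @ omega)

# Exact Hankel matrices over all prefixes/suffixes of length <= 2.
basis = [""] + ["".join(p) for k in (1, 2) for p in product(alphabet, repeat=k)]
H = np.array([[f(p + s) for s in basis] for p in basis])
Hs = {c: np.array([[f(p + c + s) for s in basis] for p in basis])
      for c in alphabet}
hP = np.array([f(p) for p in basis])           # Hankel column for suffix ""
hS = np.array([f(s) for s in basis])           # Hankel row for prefix ""

# Rank-2 factorization via SVD, then recover the operator model.
U, sv, Vt = np.linalg.svd(H)
V = Vt[:2].T                                   # top right-singular vectors
pinv = np.linalg.pinv(H @ V)
a1 = hS @ V                                    # learned initial vector
ainf = pinv @ hP                               # learned stopping vector
A = {c: pinv @ Hs[c] @ V for c in alphabet}    # learned operators

def f_hat(w):
    v = a1.copy()
    for c in w:
        v = v @ A[c]
    return float(v @ ainf)

for w in ["", "ab", "baa"]:
    print(w or "ε", round(f(w), 6), round(f_hat(w), 6))
```

With an exact Hankel matrix the recovered model reproduces f perfectly; the survey's point is how to adapt this step when H is only estimated from a finite sample.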
Fuzzy Automata: A Quantitative Review
Classical automata theory cannot deal with system uncertainty; the concept of a fuzzy finite automaton was proposed to address this. Fuzzy automata are used in diverse applications such as fault detection, pattern matching, measuring the fuzziness between strings, description of natural languages, neural networks, lexical analysis, image processing, scheduling problems, and more. In this paper, a methodical literature review of research in the field of fuzzy automata is carried out, and the challenging open issues in the field are discussed.
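For readers unfamiliar with the model, a minimal sketch of a max-min fuzzy finite automaton follows: transitions carry membership degrees in [0, 1], and a string's acceptance degree is the strength of the strongest path from an initial to a final state. The two-state machine below is an illustrative assumption, not an example from the review.

```python
import numpy as np

delta = {                      # delta[c][i][j] = degree of moving i -> j on c
    "a": np.array([[0.9, 0.4], [0.0, 0.6]]),
    "b": np.array([[0.3, 0.8], [0.2, 0.7]]),
}
init = np.array([1.0, 0.0])    # initial membership degrees per state
final = np.array([0.0, 1.0])   # final membership degrees per state

def degree(word):
    """Acceptance degree via max-min composition along the word."""
    mu = init
    for c in word:
        # mu'[j] = max_i min(mu[i], delta[c][i][j])
        mu = np.maximum.reduce(np.minimum(mu[:, None], delta[c]), axis=0)
    return float(np.max(np.minimum(mu, final)))

for w in ["ab", "ba", "aab"]:
    print(w, degree(w))
```

Swapping max-min for other t-norm/t-conorm pairs yields the different fuzzy automata variants compared in such reviews.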
Computation of distances for regular and context-free probabilistic languages
Several mathematical distances between probabilistic languages have been investigated in the literature, motivated by applications in language modeling, computational biology, syntactic pattern matching, and machine learning. In most cases, only pairs of probabilistic regular languages were considered. In this paper we extend the previous results to pairs of languages generated by a probabilistic context-free grammar and a probabilistic finite automaton.
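One classical computation in the regular-regular case is the L2 distance via the "co-emission" sum over all strings, which reduces to a linear system built from Kronecker products. The sketch below is a hedged illustration of that idea with two random toy PFAs; it does not reproduce the paper's context-free extension.

```python
import numpy as np

def pfa(seed, n=2):
    """Random n-state PFA over {a, b} with proper stopping probabilities."""
    rng = np.random.default_rng(seed)
    M = rng.dirichlet(np.ones(2 * n + 1), size=n)  # rows: [T_a | T_b | stop]
    return {"alpha": np.eye(n)[0],
            "T": {"a": M[:, :n], "b": M[:, n:2 * n]},
            "omega": M[:, -1]}

def coemission(p, q):
    """sum_w p(w)q(w) = (a⊗a')ᵀ (I − Σ_c T_c⊗T'_c)⁻¹ (ω⊗ω')."""
    K = sum(np.kron(p["T"][c], q["T"][c]) for c in "ab")
    rhs = np.kron(p["omega"], q["omega"])
    x = np.linalg.solve(np.eye(K.shape[0]) - K, rhs)
    return float(np.kron(p["alpha"], q["alpha"]) @ x)

def l2_distance(p, q):
    d2 = coemission(p, p) - 2 * coemission(p, q) + coemission(q, q)
    return np.sqrt(max(0.0, d2))   # guard against rounding below zero

p, q = pfa(1), pfa(2)
print("L2 distance:", l2_distance(p, q))
print("self distance:", l2_distance(p, p))   # ~0 up to rounding
```

The geometric series behind the inverse converges because the product automaton is strictly substochastic; the context-free case studied in the paper requires different machinery.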
Fast Lexically Constrained Viterbi Algorithm (FLCVA): Simultaneous Optimization of Speed and Memory
Lexical constraints on the input of speech and on-line handwriting systems improve the performance of such systems. A significant gain in speed can be achieved by integrating into a single digraph structure the different hidden Markov models (HMMs) corresponding to the words of the relevant lexicon. This integration avoids redundant computation by sharing intermediate results between the HMMs corresponding to different words of the lexicon. In this paper, we introduce a token-passing method that simultaneously computes the a posteriori probabilities of all the words of the lexicon. The coding scheme that we introduce for the tokens is optimal in the information-theoretic sense: the tokens use the minimum possible number of bits. Overall, we simultaneously optimize the execution speed and the memory requirements of the recognition system.
Comment: 5 pages, 2 figures, 4 tables.
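The prefix-sharing idea can be shown in miniature with token passing over a lexicon trie, where word models share prefix states and one pass scores every word at once. This is a drastic simplification: real systems attach a multi-state HMM per character and use the paper's optimally coded tokens, whereas here each trie node is a single state with a toy emission model, purely for illustration.

```python
import math

def build_trie(lexicon):
    trie = {}
    for word in lexicon:
        node = trie
        for ch in word:
            node = node.setdefault(ch, {})
        node["$"] = word                     # end-of-word marker
    return trie

def emission_logp(ch, obs):
    """Toy emission: full score on a match, a penalty otherwise."""
    return 0.0 if ch == obs else math.log(0.05)

def token_passing(trie, observations):
    # a token = (log score, trie node); start with one token at the root
    tokens = [(0.0, trie)]
    for obs in observations:
        best = {}                            # keep only the best token per node
        for score, node in tokens:
            for ch, child in node.items():
                if ch == "$":
                    continue
                s = score + emission_logp(ch, obs)
                if id(child) not in best or s > best[id(child)][0]:
                    best[id(child)] = (s, child)
        tokens = list(best.values())
    # read out completed words among surviving tokens
    return sorted(((s, node["$"]) for s, node in tokens if "$" in node),
                  reverse=True)

trie = build_trie(["cat", "car", "cart", "dog"])
print(token_passing(trie, "car"))            # best-scoring lexicon words
```

Keeping only the best token per shared node is what eliminates the redundant per-word computation; the memory side of the paper's contribution lies in how compactly each token can be encoded.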