12,381 research outputs found
Bayesian Structural Inference for Hidden Processes
We introduce a Bayesian approach to discovering patterns in structurally
complex processes. The proposed method of Bayesian Structural Inference (BSI)
relies on a set of candidate unifilar HMM (uHMM) topologies for inference of
process structure from a data series. We employ a recently developed exact
enumeration of topological epsilon-machines. (A sequel then removes the
topological restriction.) This subset of the uHMM topologies has the added
benefit that inferred models are guaranteed to be epsilon-machines,
irrespective of estimated transition probabilities. Properties of
epsilon-machines and uHMMs allow for the derivation of analytic expressions for
estimating transition probabilities, inferring start states, and comparing the
posterior probability of candidate model topologies, despite process internal
structure being only indirectly present in data. We demonstrate BSI's
effectiveness in estimating a process's randomness, as reflected by the Shannon
entropy rate, and its structure, as quantified by the statistical complexity.
We also compare using the posterior distribution over candidate models and the
single, maximum a posteriori model for point estimation and show that the
former more accurately reflects uncertainty in estimated values. We apply BSI
to in-class examples of finite- and infinite-order Markov processes, as well to
an out-of-class, infinite-state hidden process.Comment: 20 pages, 11 figures, 1 table; supplementary materials, 15 pages, 11
figures, 6 tables; http://csc.ucdavis.edu/~cmg/compmech/pubs/bsihp.ht
Recommended from our members
Machine learning : techniques and foundations
The field of machine learning studies computational methods for acquiring new knowledge, new skills, and new ways to organize existing knowledge. In this paper we present some of the basic techniques and principles that underlie AI research on learning, including methods for learning from examples, learning in problem solving, learning by analogy, grammar acquisition, and machine discovery. In each case, we illustrate the techniques with paradigmatic examples
Structure induction by lossless graph compression
This work is motivated by the necessity to automate the discovery of
structure in vast and evergrowing collection of relational data commonly
represented as graphs, for example genomic networks. A novel algorithm, dubbed
Graphitour, for structure induction by lossless graph compression is presented
and illustrated by a clear and broadly known case of nested structure in a DNA
molecule. This work extends to graphs some well established approaches to
grammatical inference previously applied only to strings. The bottom-up graph
compression problem is related to the maximum cardinality (non-bipartite)
maximum cardinality matching problem. The algorithm accepts a variety of graph
types including directed graphs and graphs with labeled nodes and arcs. The
resulting structure could be used for representation and classification of
graphs.Comment: 10 pages, 7 figures, 2 tables published in Proceedings of the Data
Compression Conference, 200
The Effect of Non-tightness on Bayesian Estimation of PCFGs
Probabilistic context-free grammars have the unusual property of not always defining tight distributions (i.e., the sum of the âprobabilitiesâ of the trees the grammar generates can be less than one). This paper reviews how this non-tightness can arise and discusses its impact on Bayesian estimation of PCFGs. We begin by presenting the notion of âalmost everywhere tight grammars â and show that linear CFGs follow it. We then propose three different ways of reinterpreting non-tight PCFGs to make them tight, show that the Bayesian estimators in Johnson et al. (2007) are correct under one of them, and provide MCMC samplers for the other two. We conclude with a discussion of the impact of tightness empirically.
SkILL - a Stochastic Inductive Logic Learner
Probabilistic Inductive Logic Programming (PILP) is a rel- atively unexplored
area of Statistical Relational Learning which extends classic Inductive Logic
Programming (ILP). This work introduces SkILL, a Stochastic Inductive Logic
Learner, which takes probabilistic annotated data and produces First Order
Logic theories. Data in several domains such as medicine and bioinformatics
have an inherent degree of uncer- tainty, that can be used to produce models
closer to reality. SkILL can not only use this type of probabilistic data to
extract non-trivial knowl- edge from databases, but it also addresses
efficiency issues by introducing a novel, efficient and effective search
strategy to guide the search in PILP environments. The capabilities of SkILL
are demonstrated in three dif- ferent datasets: (i) a synthetic toy example
used to validate the system, (ii) a probabilistic adaptation of a well-known
biological metabolism ap- plication, and (iii) a real world medical dataset in
the breast cancer domain. Results show that SkILL can perform as well as a
deterministic ILP learner, while also being able to incorporate probabilistic
knowledge that would otherwise not be considered
- âŠ