A Method for Compressing Parameters in Bayesian Models with Application to Logistic Sequence Prediction Models
Bayesian classification and regression with high order interactions is
largely infeasible because Markov chain Monte Carlo (MCMC) would need to be
applied with a great many parameters, whose number increases rapidly with the
order. In this paper we show how to make it feasible by effectively reducing
the number of parameters, exploiting the fact that many interactions have the
same values for all training cases. Our method uses a single ``compressed''
parameter to represent the sum of all parameters associated with a set of
patterns that have the same value for all training cases. Using symmetric
stable distributions as the priors of the original parameters, we can easily
find the priors of these compressed parameters. We therefore need to deal only
with a much smaller number of compressed parameters when training the model
with MCMC. The number of compressed parameters may converge before the highest
possible order is considered. After training the model, we can split
these compressed parameters into the original ones as needed to make
predictions for test cases. We show in detail how to compress parameters for
logistic sequence prediction models. Experiments on both simulated and real
data demonstrate that a huge number of parameters can indeed be reduced by our
compression method. Comment: 29 pages
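The compression idea in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the toy indicator columns, and the scale parametrization (characteristic function exp(-|scale * t|^alpha)) are assumptions made for illustration. Under that parametrization, a sum of independent symmetric stable variables with the same index alpha is again symmetric stable, with scale equal to the alpha-norm of the individual scales, which is why the prior of a compressed parameter is easy to obtain.

```python
from collections import defaultdict


def group_patterns(columns):
    """Group pattern names whose indicator columns agree on every training case."""
    groups = defaultdict(list)
    for name, column in columns.items():
        groups[tuple(column)].append(name)
    return list(groups.values())


def compressed_scale(scales, alpha):
    """Scale of a sum of independent symmetric stable variables with index alpha.

    Assumes each variable has characteristic function exp(-|scale * t|**alpha).
    """
    if not 0 < alpha <= 2:
        raise ValueError("alpha must lie in (0, 2]")
    return sum(s ** alpha for s in scales) ** (1.0 / alpha)


# Hypothetical indicator columns, one entry per training case.
columns = {
    "A": [1, 0, 1, 0],
    "AB": [1, 0, 1, 0],
    "ABC": [1, 0, 1, 0],
    "B": [0, 1, 1, 0],
}

# Patterns A, AB and ABC are indistinguishable on the training data,
# so one compressed parameter can stand for the sum of their parameters.
groups = group_patterns(columns)

# With Cauchy priors (alpha = 1) of width 1 each, the widths simply add.
cauchy_width = compressed_scale([1.0, 1.0, 1.0], alpha=1.0)
```

With alpha = 2 the same function combines scales in quadrature, as expected for Gaussian priors.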
Conflict and Computation on Wikipedia: a Finite-State Machine Analysis of Editor Interactions
What is the boundary between a vigorous argument and a breakdown of
relations? What drives a group of individuals across it? Taking Wikipedia as a
test case, we use a hidden Markov model to approximate the computational
structure and social grammar of more than a decade of cooperation and conflict
among its editors. Across a wide range of pages, we discover a bursty war/peace
structure where the systems can become trapped, sometimes for months, in a
computational subspace associated with significantly higher levels of
conflict-tracking "revert" actions. Distinct patterns of behavior characterize
the lower-conflict subspace, including tit-for-tat reversion. While a fraction
of the transitions between these subspaces are associated with top-down actions
taken by administrators, the effects are weak. Surprisingly, we find no
statistical signal that transitions are associated with the appearance of
particularly anti-social users, and only weak association with significant news
events outside the system. These findings are consistent with transitions being
driven by decentralized processes with no clear locus of control. Models of
belief revision in the presence of a common resource for information-sharing
predict the existence of two distinct phases: a disordered high-conflict phase,
and a frozen phase with spontaneously-broken symmetry. The bistability we
observe empirically may be a consequence of editor turn-over, which drives the
system to a critical point between them. Comment: 23 pages, 3 figures. Matches published version. Code for HMM fitting
available at http://bit.ly/sfihmm ; time series and derived finite state
machines at bit.ly/wiki_hm
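To make the idea of a system becoming "trapped" in a high-conflict subspace concrete, here is a toy two-state hidden Markov model decoded with the Viterbi algorithm. This is only a sketch: the paper's fitted models are not reproduced here, and the state names, the two-symbol alphabet, and every probability below are hypothetical values chosen for illustration.

```python
import math

# Hypothetical two-state model: all numbers are illustrative assumptions.
STATES = ("low", "high")
START = {"low": 0.5, "high": 0.5}
TRANS = {
    "low": {"low": 0.9, "high": 0.1},   # states are "sticky"
    "high": {"low": 0.1, "high": 0.9},
}
EMIT = {
    "low": {"edit": 0.9, "revert": 0.1},
    "high": {"edit": 0.4, "revert": 0.6},
}


def viterbi(observations):
    """Return the most probable hidden-state path for a list of symbols."""
    if not observations:
        return []
    score = {
        s: math.log(START[s]) + math.log(EMIT[s][observations[0]])
        for s in STATES
    }
    back = []  # back[t][s] = best predecessor of state s at step t + 1
    for obs in observations[1:]:
        new_score, pointers = {}, {}
        for s in STATES:
            best_prev = max(
                STATES, key=lambda p: score[p] + math.log(TRANS[p][s])
            )
            new_score[s] = (
                score[best_prev]
                + math.log(TRANS[best_prev][s])
                + math.log(EMIT[s][obs])
            )
            pointers[s] = best_prev
        score = new_score
        back.append(pointers)
    # Trace back from the best final state.
    state = max(STATES, key=lambda s: score[s])
    path = [state]
    for pointers in reversed(back):
        state = pointers[state]
        path.append(state)
    path.reverse()
    return path


calm_path = viterbi(["edit"] * 5)
war_path = viterbi(["revert"] * 5)
```

Because the self-transition probabilities are high, a sustained run of reverts is decoded as a stay in the high-conflict state rather than as isolated events.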
An audio-based sports video segmentation and event detection algorithm
In this paper, we present an audio-based event detection algorithm shown to be effective when applied to soccer video. The main benefit of this approach is the ability to recognise patterns that display high levels of crowd response correlated to key events. The soundtrack from a soccer sequence is first parameterised using Mel-frequency cepstral coefficients (MFCCs). It is then segmented into homogeneous components using a windowing algorithm with a decision process based on Bayesian model selection. This decision process eliminates the need to define a heuristic set of rules for segmentation. Each audio segment is then labelled using a series of hidden Markov model (HMM) classifiers, each representing one of six predefined semantic content classes found in soccer video. Exciting events are identified as those segments belonging to a crowd cheering class. Experiments indicated that the algorithm classified crowd response more effectively than traditional model-based segmentation and classification techniques.
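The segmentation decision described above can be sketched with a BIC comparison, one common form of Bayesian model selection for change-point detection. The abstract does not specify the exact criterion, so this is an assumption: the sketch uses a single one-dimensional feature (rather than a full MFCC vector), Gaussian models, and the hypothetical names `delta_bic` and `penalty_weight`.

```python
import math


def _ml_variance(values, floor=1e-8):
    """Maximum-likelihood variance, floored to keep the logarithm finite."""
    mean = sum(values) / len(values)
    var = sum((v - mean) ** 2 for v in values) / len(values)
    return max(var, floor)


def delta_bic(window, split, penalty_weight=1.0):
    """BIC difference between 'two Gaussians' and 'one Gaussian' for a window.

    A positive value favours placing a segment boundary at index `split`.
    """
    left, right = window[:split], window[split:]
    if len(left) < 2 or len(right) < 2:
        raise ValueError("each side of the split needs at least two frames")
    n = len(window)
    # Log-likelihood gain from modelling the two halves separately.
    gain = 0.5 * (
        n * math.log(_ml_variance(window))
        - len(left) * math.log(_ml_variance(left))
        - len(right) * math.log(_ml_variance(right))
    )
    # The two-segment model has one extra mean and one extra variance.
    extra_params = 2
    penalty = 0.5 * penalty_weight * extra_params * math.log(n)
    return gain - penalty


# Hypothetical feature values: a quiet stretch followed by a loud one.
quiet = [0.0, 0.1, -0.1, 0.05, -0.05] * 4
loud = [v + 5.0 for v in quiet]

change_score = delta_bic(quiet + loud, split=20)
no_change_score = delta_bic(quiet + quiet, split=20)
```

The appeal of this kind of criterion is that the boundary decision reduces to the sign of one score, so no hand-tuned rule set is needed beyond the optional penalty weight.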
Algebraic statistical models
Many statistical models are algebraic in that they are defined in terms of
polynomial constraints, or in terms of polynomial or rational parametrizations.
The parameter spaces of such models are typically semi-algebraic subsets of the
parameter space of a reference model with nice properties, such as for example
a regular exponential family. This observation leads to the definition of an
`algebraic exponential family'. This new definition provides a unified
framework for the study of statistical models with algebraic structure. In this
paper we review the ingredients to this definition and illustrate in examples
how computational algebraic geometry can be used to solve problems arising in
statistical inference in algebraic models.
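A standard textbook illustration, not taken from this paper, is the independence model for two binary random variables. It can be written either through a polynomial parametrization or through a polynomial constraint on the joint probabilities; the symbols $p_{ij}$, $\alpha_i$ and $\beta_j$ are notation assumed here.

```latex
% Parametric description: joint probabilities factor into marginals.
\[
  p_{ij} = \alpha_i \beta_j, \qquad i, j \in \{1, 2\}, \qquad
  \alpha_1 + \alpha_2 = 1, \quad \beta_1 + \beta_2 = 1, \quad
  \alpha_i, \beta_j \ge 0 .
\]
% Implicit description: a single polynomial equation inside the simplex.
\[
  p_{11}\, p_{22} - p_{12}\, p_{21} = 0, \qquad
  p_{ij} \ge 0, \qquad \sum_{i,j} p_{ij} = 1 .
\]
```

Both descriptions define the same subset of the probability simplex, which is the kind of semi-algebraic parameter space the abstract refers to.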