Universality of Bayesian mixture predictors
The problem is that of sequential probability forecasting for finite-valued
time series. The data is generated by an unknown probability distribution over
the space of all one-way infinite sequences. It is known that this measure
belongs to a given set C, but the latter is completely arbitrary (uncountably
infinite, without any structure given). The performance is measured by the
asymptotic average log loss. In this work it is shown that the minimax
asymptotic performance is always attainable, and it is attained by a convex
combination of countably many measures from the set C (a Bayesian mixture).
This was previously known only for the case when the best achievable asymptotic
error is 0. This also contrasts with previous results showing that in the
non-realizable case all Bayesian mixtures may be suboptimal, while there is a
predictor that achieves the optimal performance.
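To make "Bayesian mixture predictor" concrete, here is a minimal sketch for binary sequences. It assumes a hypothetical family C of i.i.d. Bernoulli(theta) measures truncated to nine members, with prior weights proportional to 1/(k(k+1)); all function names are illustrative. The abstract's result concerns arbitrary C and shows which countably many measures to combine; this sketch does not reproduce that construction, it only shows how a mixture rho = sum_k w_k nu_k predicts and how its average log loss is computed.

```python
from math import exp, log

def bernoulli_log_prob(theta, seq):
    """log nu_theta(seq): probability of a binary sequence under i.i.d. Bernoulli(theta)."""
    ones = sum(seq)
    return ones * log(theta) + (len(seq) - ones) * log(1 - theta)

def log_sum_exp(values):
    """Numerically stable log(sum(exp(v) for v in values))."""
    m = max(values)
    return m + log(sum(exp(v - m) for v in values))

def mixture_log_prob(thetas, weights, seq):
    """log rho(seq) for the Bayesian mixture rho = sum_k w_k * nu_{theta_k}."""
    return log_sum_exp([log(w) + bernoulli_log_prob(t, seq)
                        for t, w in zip(thetas, weights)])

def predict(thetas, weights, history, symbol):
    """rho(x_{n+1} = symbol | history) = rho(history + [symbol]) / rho(history)."""
    return exp(mixture_log_prob(thetas, weights, history + [symbol])
               - mixture_log_prob(thetas, weights, history))

def average_log_loss(thetas, weights, seq):
    """-(1/n) log rho(x_1..x_n); by the chain rule this equals the cumulative
    log loss of the sequential predictions divided by n."""
    return -mixture_log_prob(thetas, weights, seq) / len(seq)

# Hypothetical countable family, truncated to 9 members; prior weights
# proportional to 1 / (k (k + 1)), normalized to sum to 1.
THETAS = [k / 10 for k in range(1, 10)]
_raw = [1 / (k * (k + 1)) for k in range(1, 10)]
WEIGHTS = [w / sum(_raw) for w in _raw]
```

For example, after fifty 1s, `predict(THETAS, WEIGHTS, [1] * 50, 1)` is close to 0.9, because the posterior concentrates on theta = 0.9. Since rho(x) >= w_k nu_k(x) for every k, the mixture's average log loss exceeds that of any single member by at most (-log w_k)/n, which vanishes asymptotically.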
Hypotheses testing on infinite random graphs
Drawing on some recent results that provide the formalism necessary to
define stationarity for infinite random graphs, this paper initiates the
study of statistical and learning questions pertaining to these objects.
Specifically, a criterion for the existence of a consistent test for complex
hypotheses is presented, generalizing the corresponding results on time series.
As an application, it is shown how one can test whether a tree has the Markov
property or, more generally, estimate its memory.
On sample complexity for computational pattern recognition
In the statistical setting of the pattern recognition problem, the number of
examples required to approximate an unknown labelling function is linear in the
VC dimension of the target learning class. In this work we consider the
question of whether such bounds exist if we restrict our attention to computable
pattern recognition methods, assuming that the unknown labelling function is
also computable. We find that in this case the number of examples required for
a computable method to approximate the labelling function is not only nonlinear
but grows faster (in the VC dimension of the class) than any computable
function. No time or space constraints are put on the predictors or target
functions; the only resource we consider is the training examples.
The task of pattern recognition is considered in conjunction with another
learning problem: data compression. An impossibility result for the task of
data compression allows us to estimate the sample complexity for pattern
recognition.
Independence clustering (without a matrix)
The independence clustering problem is considered in the following
formulation: given a set of random variables, it is required to find the
finest partitioning of this set into clusters such that the
clusters are mutually independent. Since mutual independence is
the target, pairwise similarity measurements are of no use, and thus
traditional clustering algorithms are inapplicable. The distribution of the
random variables in the set is, in general, unknown, but a sample is available.
Thus, the problem is cast in terms of time series. Two forms of sampling are
considered: i.i.d. and stationary time series, with the main emphasis being on
the latter, more general, case. A consistent, computationally tractable
algorithm for each of the settings is proposed, and a number of open directions
for further research are outlined.
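Why pairwise similarity is of no use can be seen in a small example: if X and Y are independent fair bits and Z = X xor Y, every pair is independent, yet the triple is not. The sketch below assumes discrete variables and i.i.d. samples, measures dependence between clusters by the empirical multi-information (sum of cluster entropies minus joint entropy), and finds the finest partition by brute force over all set partitions. This is a hypothetical illustration, not the paper's algorithm: it is exponential in the number of variables and does not handle the stationary time-series case. The variable names X, Y, Z, W and all function names are illustrative.

```python
from collections import Counter
from itertools import product
from math import log2

def entropy(rows, cols):
    """Empirical Shannon entropy, in bits, of the variables `cols`."""
    counts = Counter(tuple(row[c] for c in cols) for row in rows)
    n = len(rows)
    return -sum(k / n * log2(k / n) for k in counts.values())

def mutual_information(rows, a, b):
    """Empirical pairwise mutual information I(a; b), in bits."""
    return entropy(rows, [a]) + entropy(rows, [b]) - entropy(rows, [a, b])

def blocks_independent(rows, blocks, tol=1e-9):
    """True if the blocks are mutually independent under the empirical
    distribution, i.e. the sum of block entropies equals the joint entropy."""
    joint = entropy(rows, [c for block in blocks for c in block])
    return sum(entropy(rows, block) for block in blocks) - joint < tol

def set_partitions(items):
    """Yield every partition of the list `items` as a list of blocks."""
    if not items:
        yield []
        return
    first, rest = items[0], items[1:]
    for part in set_partitions(rest):
        # Put `first` into each existing block in turn...
        for i in range(len(part)):
            yield part[:i] + [[first] + part[i]] + part[i + 1:]
        # ...or into a block of its own.
        yield [[first]] + part

def finest_independent_partition(rows, variables):
    """Brute force: among partitions whose blocks are mutually independent,
    return one with the most blocks (the one-block partition always qualifies)."""
    candidates = [p for p in set_partitions(list(variables))
                  if blocks_independent(rows, p)]
    return max(candidates, key=len)

# Hypothetical data: X, Y, W are independent fair bits and Z = X xor Y.
# Each of the 8 equally likely outcomes appears once, so the empirical
# distribution equals the true one.
ROWS = [{"X": x, "Y": y, "Z": x ^ y, "W": w}
        for x, y, w in product((0, 1), repeat=3)]
```

Here every pairwise mutual information among X, Y and Z is 0, so a similarity matrix would see no structure, while {X}, {Y}, {Z} are not mutually independent (1 bit of multi-information). The finest independent partition returned is {X, Y, Z}, {W}.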
- …