Minimum Description Length Induction, Bayesianism, and Kolmogorov Complexity
The relationship between the Bayesian approach and the minimum description
length approach is established. We sharpen and clarify the general modeling
principles MDL and MML, abstracted as the ideal MDL principle and defined from
Bayes's rule by means of Kolmogorov complexity. The basic condition under which
the ideal principle should be applied is encapsulated as the Fundamental
Inequality, which in broad terms states that the principle is valid when the
data are random relative to every contemplated hypothesis, and these
hypotheses are in turn random relative to the (universal) prior. Basically, the ideal
principle states that the prior probability associated with the hypothesis
should be given by the algorithmic universal probability, and the sum of the
log universal probability of the model plus the log of the probability of the
data given the model should be minimized. If we restrict the model class to the
finite sets then application of the ideal principle turns into Kolmogorov's
minimal sufficient statistic. In general we show that data compression is
almost always the best strategy, both in hypothesis identification and
prediction.
Comment: 35 pages, LaTeX. Submitted IEEE Trans. Inform. Theor
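The selection rule described in this abstract can be written compactly. The following is a sketch in assumed notation that the abstract does not itself use: $D$ is the data, $H$ a hypothesis, $\mathbf{m}$ the universal probability, and $K$ prefix Kolmogorov complexity. The logarithms are taken with a negative sign so that both terms are code lengths to be minimized.

```latex
H_{\mathrm{MDL}} \;=\; \arg\min_{H} \bigl\{ -\log \mathbf{m}(H) \;-\; \log \Pr(D \mid H) \bigr\},
\qquad
-\log \mathbf{m}(H) \;=\; K(H) + O(1).
```

The second identity is the coding theorem, which is what lets the first term be read as the length of the shortest description of the hypothesis.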
A sequential sampling strategy for extreme event statistics in nonlinear dynamical systems
We develop a method for the evaluation of extreme event statistics associated
with nonlinear dynamical systems, using a small number of samples. From an
initial dataset of design points, we formulate a sequential strategy that
provides the 'next-best' data point (set of parameters) that when evaluated
results in improved estimates of the probability density function (pdf) for a
scalar quantity of interest. The approach utilizes Gaussian process regression
to perform Bayesian inference on the parameter-to-observation map describing
the quantity of interest. We then approximate the desired pdf along with
uncertainty bounds utilizing the posterior distribution of the inferred map.
The 'next-best' design point is sequentially determined through an optimization
procedure that selects the point in parameter space that maximally reduces
uncertainty between the estimated bounds of the pdf prediction. Since the
optimization process utilizes only information from the inferred map it has
minimal computational cost. Moreover, the special form of the metric emphasizes
the tails of the pdf. The method is practical for systems where the
dimensionality of the parameter space is of moderate size, i.e. order O(10). We
apply the method to estimate the extreme event statistics for a very
high-dimensional system with millions of degrees of freedom: an offshore
platform subjected to three-dimensional irregular waves. It is demonstrated
that the developed approach can accurately determine the extreme event
statistics using a limited number of samples.
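The sequential idea in this abstract (fit a Gaussian process, then pick the next design point from the posterior alone) can be illustrated with a minimal sketch. It deliberately swaps in a simpler selection rule, maximum posterior variance, instead of the tail-emphasizing pdf-uncertainty metric the abstract describes. All function names, the kernel, and its parameters are assumptions made for illustration.

```python
import math

def rbf(a, b, length=1.0):
    """Squared-exponential kernel between two scalar inputs (assumed kernel)."""
    return math.exp(-0.5 * ((a - b) / length) ** 2)

def cholesky(A):
    """Lower-triangular Cholesky factor of a symmetric positive-definite matrix."""
    n = len(A)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = A[i][j] - sum(L[i][k] * L[j][k] for k in range(j))
            L[i][j] = math.sqrt(s) if i == j else s / L[j][j]
    return L

def forward_solve(L, b):
    """Solve L y = b for lower-triangular L."""
    y = []
    for i in range(len(b)):
        y.append((b[i] - sum(L[i][k] * y[k] for k in range(i))) / L[i][i])
    return y

def posterior_variance(x_new, xs, length=1.0, noise=1e-6):
    """GP posterior variance at x_new given already-sampled inputs xs."""
    K = [[rbf(a, b, length) + (noise if i == j else 0.0)
          for j, b in enumerate(xs)] for i, a in enumerate(xs)]
    L = cholesky(K)
    k_star = [rbf(x_new, a, length) for a in xs]
    v = forward_solve(L, k_star)
    # var = k(x*, x*) - k*^T K^{-1} k* = k(x*, x*) - ||L^{-1} k*||^2
    return rbf(x_new, x_new, length) - sum(t * t for t in v)

def next_design_point(xs, candidates, length=1.0):
    """Stand-in criterion: the candidate where the GP is most uncertain."""
    return max(candidates, key=lambda c: posterior_variance(c, xs, length))

sampled = [0.0, 1.0, 2.0]
print(next_design_point(sampled, [0.5, 1.5, 4.0]))
```

Like the method in the abstract, the selection step only queries the inferred map and never the expensive system, which is why it is cheap. The variance depends only on where samples were taken, not on the observed values, so this simplified rule cannot emphasize the tails of the pdf.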
Semantic Concept Co-Occurrence Patterns for Image Annotation and Retrieval.
Describing visual image contents by semantic concepts is an effective and straightforward way to facilitate various high-level applications. Inferring semantic concepts from low-level pictorial feature analysis is challenging due to the semantic gap problem, while manually labeling concepts is impractical because of the large number of images in both online and offline collections. In this paper, we present a novel approach to automatically generate intermediate image descriptors by exploiting concept co-occurrence patterns in the pre-labeled training set, which makes it possible to depict complex scene images semantically. Our work is motivated by the fact that multiple concepts that frequently co-occur across images form patterns which could provide contextual cues for individual concept inference. We discover the co-occurrence patterns as hierarchical communities by graph modularity maximization in a network with nodes and edges representing concepts and co-occurrence relationships, respectively. A random walk process working on the inferred concept probabilities with the discovered co-occurrence patterns is applied to acquire the refined concept signature representation. Through experiments in automatic image annotation and semantic image retrieval on several challenging datasets, we demonstrate the effectiveness of the proposed concept co-occurrence patterns as well as the concept signature representation in comparison with state-of-the-art approaches.
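The refinement step in this abstract can be sketched as a random walk with restart over a concept co-occurrence graph. This is a minimal sketch under stated assumptions: it walks on raw co-occurrence counts and omits the hierarchical community detection by modularity maximization that the abstract describes. The function names, the restart weight, and the example labels are all hypothetical.

```python
from itertools import combinations

def cooccurrence_counts(label_sets):
    """Count how often each pair of concepts labels the same training image."""
    counts = {}
    for labels in label_sets:
        for a, b in combinations(sorted(set(labels)), 2):
            counts.setdefault(a, {}).setdefault(b, 0.0)
            counts.setdefault(b, {}).setdefault(a, 0.0)
            counts[a][b] += 1.0
            counts[b][a] += 1.0
    return counts

def refine_concept_scores(initial, counts, restart=0.5, steps=50):
    """Random walk with restart: scores flow along co-occurrence edges,
    while `restart` keeps each score anchored to its initial estimate."""
    scores = dict(initial)
    row_totals = {
        c: sum(counts.get(c, {}).get(m, 0.0) for m in initial) for c in initial
    }
    for _ in range(steps):
        updated = {}
        for c in initial:
            # Mass arriving at c from every neighbour n, with n's row normalised.
            incoming = sum(
                scores[n] * counts.get(n, {}).get(c, 0.0) / row_totals[n]
                for n in initial
                if row_totals[n] > 0
            )
            updated[c] = restart * initial[c] + (1.0 - restart) * incoming
        scores = updated
    return scores

# Hypothetical training labels and per-concept detector outputs for one image.
training = [["sky", "sea", "beach"], ["sky", "sea"], ["car", "road"]]
detector = {"sky": 0.9, "sea": 0.1, "beach": 0.1, "car": 0.1, "road": 0.1}
refined = refine_concept_scores(detector, cooccurrence_counts(training))
print(sorted(refined, key=refined.get, reverse=True))
```

In this toy case a confident "sky" detection raises the score of "sea", which frequently co-occurs with it, while "car" gains nothing because it shares no edges with the confident concept.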
The use of data-mining for the automatic formation of tactics
This paper discusses the use of data-mining for the automatic formation of tactics. It was presented at the Workshop on Computer-Supported Mathematical Theory Development held at IJCAR in 2004. The aim of this project is to evaluate the applicability of data-mining techniques to the automatic formation of tactics from large corpora of proofs. We data-mine information from large proof corpora to find commonly occurring patterns. These patterns are then evolved into tactics using genetic programming techniques.
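The pattern-finding step in this abstract can be illustrated by counting contiguous runs of proof steps that recur across a corpus. This is only a sketch: the abstract does not say which mining technique the authors use, the step names below are hypothetical, and the genetic programming stage that evolves patterns into tactics is not shown.

```python
from collections import Counter

def frequent_step_patterns(proofs, length=2, min_support=2):
    """Return contiguous step sequences of the given length that occur
    in at least `min_support` distinct proofs, with their support counts."""
    support = Counter()
    for steps in proofs:
        seen = {tuple(steps[i:i + length])
                for i in range(len(steps) - length + 1)}
        support.update(seen)  # each pattern counts once per proof
    return {p: c for p, c in support.items() if c >= min_support}

# Hypothetical corpus: each proof is a list of tactic or rule names.
corpus = [
    ["intro", "simp", "auto"],
    ["intro", "simp", "blast"],
    ["induct", "simp", "auto"],
]
print(frequent_step_patterns(corpus))
```

Counting support per proof rather than per occurrence keeps one long, repetitive proof from dominating the result.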
Automated Classification of Periodic Variable Stars detected by the Wide-field Infrared Survey Explorer
We describe a methodology to classify periodic variable stars identified
using photometric time-series measurements constructed from the Wide-field
Infrared Survey Explorer (WISE) full-mission single-exposure Source Databases.
This will assist in the future construction of a WISE Variable Source Database
that assigns variables to specific science classes as constrained by the WISE
observing cadence with statistically meaningful classification probabilities.
We have analyzed the WISE light curves of 8273 variable stars identified in
previous optical variability surveys (MACHO, GCVS, and ASAS) and show that
Fourier decomposition techniques can be extended into the mid-IR to assist with
their classification. Combined with other periodic light-curve features, this
sample is then used to train a machine-learned classifier based on the random
forest (RF) method. Consistent with previous classification studies of variable
stars in general, the RF machine-learned classifier is superior to other
methods in terms of accuracy, robustness against outliers, and relative
immunity to features that carry little or redundant class information. For the
three most common classes identified by WISE: Algols, RR Lyrae, and W Ursae
Majoris type variables, we obtain classification efficiencies of 80.7%, 82.7%,
and 84.5% respectively using cross-validation analyses, with 95% confidence
intervals of approximately +/-2%. These accuracies are achieved at purity (or
reliability) levels of 88.5%, 96.2%, and 87.8% respectively, similar to those
achieved in previous automated classification studies of periodic variable
stars.
Comment: 48 pages, 17 figures, 1 table, accepted by A
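The two per-class figures quoted in this abstract can be computed from true and predicted labels as sketched below. The definitions are assumptions, since the abstract does not state them: efficiency is taken to be the fraction of true members of a class that the classifier recovers (recall), and purity the fraction of objects assigned to the class that truly belong to it (precision). The function name and example labels are hypothetical.

```python
def efficiency_and_purity(true_labels, predicted_labels, target):
    """Per-class efficiency (assumed: recall) and purity (assumed: precision)."""
    pairs = list(zip(true_labels, predicted_labels))
    correct = sum(1 for t, p in pairs if t == target and p == target)
    actual = sum(1 for t, _ in pairs if t == target)
    assigned = sum(1 for _, p in pairs if p == target)
    efficiency = correct / actual if actual else float("nan")
    purity = correct / assigned if assigned else float("nan")
    return efficiency, purity

# Hypothetical labels for four stars.
truth = ["Algol", "Algol", "RR Lyrae", "W UMa"]
guess = ["Algol", "RR Lyrae", "RR Lyrae", "W UMa"]
print(efficiency_and_purity(truth, guess, "Algol"))     # (0.5, 1.0)
print(efficiency_and_purity(truth, guess, "RR Lyrae"))  # (1.0, 0.5)
```

The toy example shows why both numbers are reported: the Algol class is perfectly pure but recovers only half its members, while the RR Lyrae class recovers every member at the cost of contamination.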
Bayesian stochastic blockmodeling
This chapter provides a self-contained introduction to the use of Bayesian
inference to extract large-scale modular structures from network data, based on
the stochastic blockmodel (SBM), as well as its degree-corrected and
overlapping generalizations. We focus on nonparametric formulations that allow
their inference in a manner that prevents overfitting, and enables model
selection. We discuss aspects of the choice of priors, in particular how to
avoid underfitting via increased Bayesian hierarchies, and we contrast the task
of sampling network partitions from the posterior distribution with finding the
single point estimate that maximizes it, while describing efficient algorithms
to perform either one. We also show how inferring the SBM can be used to
predict missing and spurious links, and shed light on the fundamental
limitations of the detectability of modular structures in networks.
Comment: 44 pages, 16 figures. Code is freely available as part of graph-tool
at https://graph-tool.skewed.de . See also the HOWTO at
https://graph-tool.skewed.de/static/doc/demos/inference/inference.htm
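To make the notion of fitting a stochastic blockmodel concrete, the sketch below scores a node partition under a plain Bernoulli SBM, with each block-pair edge probability set to its maximum-likelihood value. This is not the chapter's method and does not use graph-tool: it is a pure-Python illustration with hypothetical names, and it omits the priors and nonparametric formulation the abstract emphasizes.

```python
import math
from itertools import combinations

def sbm_log_likelihood(n_nodes, edges, partition):
    """Maximised log-likelihood of an undirected simple graph under a
    Bernoulli SBM, where partition[i] is the block of node i."""
    edge_set = {frozenset(e) for e in edges}
    possible, present = {}, {}
    for u, v in combinations(range(n_nodes), 2):
        key = tuple(sorted((partition[u], partition[v])))
        possible[key] = possible.get(key, 0) + 1
        if frozenset((u, v)) in edge_set:
            present[key] = present.get(key, 0) + 1
    total = 0.0
    for key, n_pairs in possible.items():
        e = present.get(key, 0)
        if 0 < e < n_pairs:  # p = 0 or p = 1 contributes exactly zero
            p = e / n_pairs
            total += e * math.log(p) + (n_pairs - e) * math.log(1.0 - p)
    return total

# Two disjoint triangles: the natural split explains the graph perfectly.
edges = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5)]
print(sbm_log_likelihood(6, edges, [0, 0, 0, 1, 1, 1]))  # 0.0
print(sbm_log_likelihood(6, edges, [0, 1, 0, 1, 0, 1]))
```

A likelihood-only score of this kind never decreases when blocks are split further, so on its own it favors putting every node in its own block. That is the overfitting problem that a Bayesian treatment with priors, as described in the abstract, is meant to prevent.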
Security Evaluation of Support Vector Machines in Adversarial Environments
Support Vector Machines (SVMs) are among the most popular classification
techniques adopted in security applications like malware detection, intrusion
detection, and spam filtering. However, if SVMs are to be incorporated in
real-world security systems, they must be able to cope with attack patterns
that can either mislead the learning algorithm (poisoning), evade detection
(evasion), or gain information about their internal parameters (privacy
breaches). The main contributions of this chapter are twofold. First, we
introduce a formal general framework for the empirical evaluation of the
security of machine-learning systems. Second, according to our framework, we
demonstrate the feasibility of evasion, poisoning and privacy attacks against
SVMs in real-world security problems. For each attack technique, we evaluate
its impact and discuss whether (and how) it can be countered through an
adversary-aware design of SVMs. Our experiments are easily reproducible thanks
to open-source code that we have made available, together with all the employed
datasets, on a public repository.
Comment: 47 pages, 9 figures; chapter accepted into book 'Support Vector
Machine Applications
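Of the three attack types named in this abstract, evasion is the easiest to illustrate. The sketch below assumes a linear decision function f(x) = w·x + b with a positive score meaning "malicious", and moves a sample against the weight vector until it is scored as benign. The weights, step size, and function names are hypothetical and not taken from the chapter, which also covers nonlinear kernels and constraints on what an attacker can change.

```python
def decision(w, b, x):
    """Linear decision function; a positive value is classified as malicious."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b

def evade(w, b, x, step=0.1, max_iter=1000):
    """Gradient-descent evasion for a linear classifier (assumes w is nonzero).

    The gradient of f with respect to x is w, so stepping along -w/||w||
    is the direction that lowers the score fastest per unit of change."""
    x = list(x)
    norm = sum(wi * wi for wi in w) ** 0.5
    for _ in range(max_iter):
        if decision(w, b, x) < 0:
            break
        x = [xi - step * wi / norm for xi, wi in zip(x, w)]
    return x

# Hypothetical trained weights and a sample currently flagged as malicious.
w, b = [1.0, 2.0], -1.0
sample = [2.0, 1.0]
adversarial = evade(w, b, sample)
print(decision(w, b, sample), decision(w, b, adversarial))
```

The modified sample ends up just across the decision boundary, which shows why a defender cannot rely on the margin alone when the attacker knows or can estimate the model.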