Kernels on Sample Sets via Nonparametric Divergence Estimates
Most machine learning algorithms, such as classification or regression, treat
the individual data point as the object of interest. Here we consider extending
machine learning algorithms to operate on groups of data points. We suggest
treating a group of data points as an i.i.d. sample set from an underlying
feature distribution for that group. Our approach employs kernel machines with
a kernel on i.i.d. sample sets of vectors. We define certain kernel functions
on pairs of distributions, and then use a nonparametric estimator to
consistently estimate those functions based on sample sets. The projection of
the estimated Gram matrix to the cone of symmetric positive semi-definite
matrices enables us to use kernel machines for classification, regression,
anomaly detection, and low-dimensional embedding in the space of distributions.
We present several numerical experiments both on real and simulated datasets to
demonstrate the advantages of our new approach.
Comment: Substantially updated version as submitted to T-PAMI. 15 pages including appendix.
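
As a rough illustration of the last step above, here is a minimal numpy sketch (not the paper's code) of projecting an estimated Gram matrix onto the cone of symmetric positive semi-definite matrices by symmetrizing and clipping negative eigenvalues; the function name project_psd and the toy matrix are invented for the example.

```python
import numpy as np

def project_psd(K):
    """Project a square matrix onto the PSD cone (nearest PSD matrix in
    Frobenius norm) by symmetrizing and clipping negative eigenvalues."""
    K_sym = (K + K.T) / 2.0                      # symmetrize first
    eigvals, eigvecs = np.linalg.eigh(K_sym)
    eigvals_clipped = np.clip(eigvals, 0.0, None)
    return (eigvecs * eigvals_clipped) @ eigvecs.T

# Example: an estimated "Gram" matrix that is not quite PSD.
K_hat = np.array([[1.0, 0.9, 0.2],
                  [0.9, 1.0, 0.95],
                  [0.2, 0.95, 1.0]])
K_psd = project_psd(K_hat)
print(np.linalg.eigvalsh(K_psd))  # all eigenvalues >= 0 (up to rounding)
```
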
Scaling Multidimensional Inference for Structured Gaussian Processes
Exact Gaussian Process (GP) regression has O(N^3) runtime for data size N,
making it intractable for large N. Many algorithms for improving GP scaling
approximate the covariance with lower rank matrices. Other work has exploited
structure inherent in particular covariance functions, including GPs with
implied Markov structure, and equispaced inputs (both enable O(N) runtime).
However, these GP advances have not been extended to the multidimensional input
setting, despite the preponderance of multidimensional applications. This paper
introduces and tests novel extensions of structured GPs to multidimensional
inputs. We present new methods for additive GPs, showing a novel connection
between the classic backfitting method and the Bayesian framework. To achieve
optimal accuracy-complexity tradeoff, we extend this model with a novel variant
of projection pursuit regression. Our primary result -- projection pursuit
Gaussian Process Regression -- shows orders of magnitude speedup while
preserving high accuracy. The natural second and third steps include
non-Gaussian observations and higher dimensional equispaced grid methods. We
introduce novel techniques to address both of these necessary directions. We
thoroughly illustrate the power of these three advances on several datasets,
achieving performance close to that of the naive full GP at orders of magnitude less
cost.
Comment: 14 pages
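
For readers unfamiliar with the classic backfitting method referenced above, the following is a minimal numpy sketch of ordinary backfitting for an additive model with a crude binned smoother. It illustrates only the classical algorithm, not the paper's Bayesian or projection pursuit variants, and the helper names (bin_smoother, backfit) are invented for this example.

```python
import numpy as np

def bin_smoother(x, r, n_bins=20):
    """A crude 1-D smoother: average the residuals r within bins of x."""
    edges = np.linspace(x.min(), x.max(), n_bins + 1)
    idx = np.clip(np.digitize(x, edges) - 1, 0, n_bins - 1)
    means = np.array([r[idx == b].mean() if np.any(idx == b) else 0.0
                      for b in range(n_bins)])
    return means[idx]

def backfit(X, y, n_iter=20):
    """Classic backfitting for an additive model y ~ alpha + sum_d f_d(X[:, d])."""
    n, d = X.shape
    alpha = y.mean()
    f = np.zeros((n, d))
    for _ in range(n_iter):
        for j in range(d):
            # Fit component j to the partial residuals of the other components.
            partial_residual = y - alpha - f.sum(axis=1) + f[:, j]
            f[:, j] = bin_smoother(X[:, j], partial_residual)
            f[:, j] -= f[:, j].mean()        # identifiability: center each component
    return alpha, f

# Toy additive data.
rng = np.random.default_rng(0)
X = rng.uniform(-2, 2, size=(500, 2))
y = np.sin(X[:, 0]) + X[:, 1] ** 2 + 0.1 * rng.standard_normal(500)
alpha, f = backfit(X, y)
print(np.corrcoef(alpha + f.sum(axis=1), y)[0, 1])  # should be close to 1
```
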
Budgeted Batch Bayesian Optimization With Unknown Batch Sizes
Parameter settings profoundly impact the performance of machine learning
algorithms and laboratory experiments. The classical grid search or trial-error
methods are exponentially expensive in large parameter spaces, and Bayesian
optimization (BO) offers an elegant alternative for global optimization of
black box functions. In situations where the black box function can be
evaluated at multiple points simultaneously, batch Bayesian optimization is
used. Current batch BO approaches are restrictive in that they fix the number
of evaluations per batch, and this can be wasteful when the number of specified
evaluations is larger than the number of real maxima in the underlying
acquisition function. We present the Budgeted Batch Bayesian Optimization (B3O)
for hyper-parameter tuning and experimental design - we identify the
appropriate batch size for each iteration in an elegant way. To set the batch
size flexibly, we use the infinite Gaussian mixture model (IGMM) for
automatically identifying the number of peaks in the underlying acquisition
functions. We solve the intractability of estimating the IGMM directly from the
acquisition function by formulating the batch generalized slice sampling to
efficiently draw samples from the acquisition function. We perform extensive
experiments for both synthetic functions and two real world applications -
machine learning hyper-parameter tuning and experimental design for alloy
hardening. We show empirically that the proposed B3O outperforms the existing
fixed batch BO approaches in finding the optimum whilst requiring fewer
evaluations, thus saving cost and time.
Comment: 24 pages
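
A rough sketch of the core idea, under the assumption that one already has samples distributed according to the acquisition function (drawn here by crude grid sampling rather than the paper's batch generalized slice sampler): fit a Dirichlet-process Gaussian mixture and let the number of active components set the batch size. The toy acquisition function and the 0.05 weight cutoff are illustrative choices, not from the paper.

```python
import numpy as np
from sklearn.mixture import BayesianGaussianMixture

# A toy 1-D "acquisition function" with three peaks (stand-in for EI/UCB).
def acquisition(x):
    return (np.exp(-(x - 1.0) ** 2 / 0.05)
            + 0.8 * np.exp(-(x - 3.0) ** 2 / 0.05)
            + 0.6 * np.exp(-(x - 4.5) ** 2 / 0.05))

rng = np.random.default_rng(0)
grid = np.linspace(0, 6, 2000)
a = acquisition(grid)
# Crude stand-in for the paper's sampler: draw grid points with probability
# proportional to the acquisition value.
samples = rng.choice(grid, size=500, p=a / a.sum()).reshape(-1, 1)

# Infinite (Dirichlet-process) Gaussian mixture, truncated at 10 components.
igmm = BayesianGaussianMixture(
    n_components=10,
    weight_concentration_prior_type="dirichlet_process",
    random_state=0,
).fit(samples)

# Batch size = number of components that actually carry weight;
# the corresponding means are the candidate evaluation points.
active = igmm.weights_ > 0.05
print("batch size:", active.sum())
print("batch points:", np.sort(igmm.means_[active].ravel()))
```
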
Classification using log Gaussian Cox processes
McCullagh and Yang (2006) suggest a family of classification algorithms based
on Cox processes. We further investigate the log Gaussian variant which has a
number of appealing properties. Conditioned on the covariates, the distribution
over labels is given by a type of conditional Markov random field. In the
supervised case, computation of the predictive probability of a single test
point scales linearly with the number of training points and the multiclass
generalization is straightforward. We show new links between the supervised
method and classical nonparametric methods. We give a detailed analysis of the
pairwise graph representable Markov random field, which we use to extend the
model to semi-supervised learning problems, and propose an inference method
based on graph min-cuts. We give the first experimental analysis on supervised
and semi-supervised datasets and show good empirical performance.
Comment: 17 pages, 6 figures
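
The exact predictive rule is in the paper; purely as a loosely related illustration of the "classical nonparametric" flavour mentioned above, here is a kernel-weighted class-mass classifier whose prediction for a single test point costs one pass over the training set (linear in its size). This is an analogue for intuition, not the log Gaussian Cox process predictive probability itself, and all names are invented.

```python
import numpy as np

def rbf_weights(x, X, lengthscale=0.5):
    """RBF kernel weights between a test point x and training points X."""
    d2 = np.sum((X - x) ** 2, axis=1)
    return np.exp(-d2 / (2.0 * lengthscale ** 2))

def class_probabilities(x, X_train, y_train, n_classes):
    """Class score = kernel-weighted mass of each class at x, then normalize.
    One pass over the training set, so the cost is linear in its size."""
    w = rbf_weights(x, X_train)
    scores = np.array([w[y_train == k].sum() for k in range(n_classes)])
    return scores / scores.sum()

# Toy two-class data.
rng = np.random.default_rng(0)
X0 = rng.normal(loc=[-1, 0], scale=0.5, size=(50, 2))
X1 = rng.normal(loc=[+1, 0], scale=0.5, size=(50, 2))
X_train = np.vstack([X0, X1])
y_train = np.array([0] * 50 + [1] * 50)
print(class_probabilities(np.array([0.8, 0.1]), X_train, y_train, 2))
```
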
Online Multivariate Anomaly Detection and Localization for High-dimensional Settings
This paper considers the real-time detection of anomalies in high-dimensional
systems. The goal is to detect anomalies quickly and accurately so that the
appropriate countermeasures could be taken in time, before the system possibly
gets harmed. We propose a sequential and multivariate anomaly detection method
that scales well to high-dimensional datasets. The proposed method follows a
nonparametric (i.e., data-driven) and semi-supervised (i.e., trained only on
nominal data) approach. Thus, it is applicable to a wide range of applications
and data types. Thanks to its multivariate nature, it can quickly and
accurately detect challenging anomalies, such as changes in the correlation
structure and stealth low-rate cyberattacks. Its asymptotic optimality and
computational complexity are comprehensively analyzed. In conjunction with the
detection method, an effective technique for localizing the anomalous data
dimensions is also proposed. We further extend the proposed detection and
localization methods to a supervised setup where an additional anomaly dataset
is available, and combine the proposed semi-supervised and supervised
algorithms to obtain an online learning algorithm under the semi-supervised
framework. The practical use of the proposed algorithms is demonstrated in DDoS
attack mitigation, and their performance is evaluated using a real IoT-botnet
dataset and simulations.
Comment: 16 pages, LaTeX; references added
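
The paper's detector uses its own statistic; purely as a generic illustration of the setup described above (train on nominal data only, then accumulate evidence sequentially), here is a kNN-distance score combined with a CUSUM-style recursion. The class name, the offset choice, and the threshold are illustrative assumptions, not the paper's method.

```python
import numpy as np

class SequentialKNNDetector:
    """Generic sketch: semi-supervised (nominal-only) training, with kNN-distance
    anomaly evidence accumulated in a CUSUM-like statistic."""

    def __init__(self, X_nominal, k=5, threshold=5.0):
        self.X = X_nominal
        self.k = k
        # Baseline: typical kNN distance on nominal data, used as a drift offset.
        d_nom = np.array([self._knn_dist(x, exclude_self=True) for x in X_nominal])
        self.offset = np.percentile(d_nom, 90)
        self.threshold = threshold
        self.stat = 0.0

    def _knn_dist(self, x, exclude_self=False):
        d = np.sort(np.linalg.norm(self.X - x, axis=1))
        return d[self.k] if exclude_self else d[self.k - 1]

    def update(self, x):
        """Process one new observation; return (statistic, alarm?)."""
        evidence = self._knn_dist(x) - self.offset
        self.stat = max(0.0, self.stat + evidence)   # CUSUM-style recursion
        return self.stat, self.stat > self.threshold

rng = np.random.default_rng(0)
detector = SequentialKNNDetector(rng.standard_normal((200, 3)))
for t in range(50):
    x = rng.standard_normal(3) + (2.5 if t >= 30 else 0.0)  # anomaly after t=30
    stat, alarm = detector.update(x)
    if alarm:
        print("alarm at t =", t)
        break
```
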
Clustering Via Finite Nonparametric ICA Mixture Models
We propose an extension of non-parametric multivariate finite mixture models
by dropping the standard conditional independence assumption and incorporating
the independent component analysis (ICA) structure instead. We formulate an
objective function in terms of a penalized smoothed Kullback-Leibler distance and
introduce the nonlinear smoothed majorization-minimization independent
component analysis (NSMM-ICA) algorithm for optimizing this function and
estimating the model parameters. We have implemented a practical version of
this algorithm, which utilizes the FastICA algorithm, in the R package icamix.
We illustrate this new methodology using several applications in unsupervised
learning and image processing.
Comment: 23 pages, 5 figures, Adv Data Anal Classif (2018)
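
The NSMM-ICA algorithm itself is implemented in the R package icamix; as a very rough Python analogue of the "mixture components with ICA structure" idea (and not the penalized smoothed Kullback-Leibler objective or the MM updates), one could cluster first and then run FastICA within each component:

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.decomposition import FastICA

rng = np.random.default_rng(0)
# Two synthetic clusters, each generated by a different mixing of
# independent non-Gaussian (uniform) sources.
S1 = rng.uniform(-1, 1, size=(300, 2))
S2 = rng.uniform(-1, 1, size=(300, 2))
A1 = np.array([[2.0, 1.0], [0.5, 1.5]])
A2 = np.array([[1.0, -1.0], [1.0, 1.0]])
X = np.vstack([S1 @ A1.T + np.array([5, 5]), S2 @ A2.T - np.array([5, 5])])

# Stage 1: assign points to mixture components (here simply via k-means).
labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)

# Stage 2: recover an ICA representation within each component.
for k in range(2):
    ica = FastICA(n_components=2, random_state=0)
    sources = ica.fit_transform(X[labels == k])
    print(f"component {k}: recovered sources shape {sources.shape}")
```
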
FLASH: Fast Bayesian Optimization for Data Analytic Pipelines
Modern data science relies on data analytic pipelines to organize
interdependent computational steps. Such analytic pipelines often involve
different algorithms across multiple steps, each with its own hyperparameters.
To achieve the best performance, it is often critical to select optimal
algorithms and to set appropriate hyperparameters, which requires large
computational efforts. Bayesian optimization provides a principled way for
searching optimal hyperparameters for a single algorithm. However, many
challenges remain in solving pipeline optimization problems with
high-dimensional and highly conditional search spaces. In this work, we propose
Fast LineAr SearcH (FLASH), an efficient method for tuning analytic pipelines.
FLASH is a two-layer Bayesian optimization framework, which firstly uses a
parametric model to select promising algorithms, then fits a nonparametric
model to fine-tune the hyperparameters of those algorithms. FLASH also
includes an effective caching algorithm which can further accelerate the search
process. Extensive experiments on a number of benchmark datasets have
demonstrated that FLASH significantly outperforms previous state-of-the-art
methods in both search speed and accuracy. Using 50% of the time budget, FLASH
achieves up to 20% improvement on test error rate compared to the baselines.
FLASH also yields state-of-the-art performance on a real-world application for
healthcare predictive modeling.
Comment: 21 pages, KDD 201
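
A much-simplified sketch of the two-layer idea (without FLASH's caching, and with the parametric layer reduced to the mean observed loss per algorithm): the outer layer picks an algorithm, the inner layer runs GP-based expected-improvement search over that algorithm's hyperparameter. The toy pipelines, loss surrogates, and all function names are invented for the example.

```python
import numpy as np
from scipy.stats import norm
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import Matern

# Toy "pipeline": two candidate algorithms, each with one hyperparameter in [0, 1].
# Lower "loss" is better; these surrogates stand in for real training runs.
def run_pipeline(algo, h):
    if algo == "algo_A":
        return (h - 0.3) ** 2 + 0.05
    return 0.5 * (h - 0.8) ** 2 + 0.10

def expected_improvement(gp, H_cand, best):
    mu, sigma = gp.predict(H_cand, return_std=True)
    sigma = np.maximum(sigma, 1e-9)
    z = (best - mu) / sigma
    return (best - mu) * norm.cdf(z) + sigma * norm.pdf(z)

rng = np.random.default_rng(0)
history = {a: [] for a in ["algo_A", "algo_B"]}   # (h, loss) pairs per algorithm

for it in range(15):
    # Layer 1 (parametric, heavily simplified): score algorithms by mean observed
    # loss; untried algorithms are explored first. A real system would also
    # keep exploring suboptimal algorithms.
    scores = {a: (np.mean([l for _, l in v]) if v else -np.inf)
              for a, v in history.items()}
    algo = min(scores, key=scores.get)

    obs = history[algo]
    if len(obs) < 2:
        h = rng.uniform()                          # too little data: sample at random
    else:
        # Layer 2 (nonparametric): GP over this algorithm's hyperparameter + EI.
        H = np.array([h_ for h_, _ in obs]).reshape(-1, 1)
        L = np.array([l for _, l in obs])
        gp = GaussianProcessRegressor(kernel=Matern(nu=2.5), normalize_y=True).fit(H, L)
        H_cand = rng.uniform(size=(200, 1))
        ei = expected_improvement(gp, H_cand, L.min())
        h = float(H_cand[np.argmax(ei), 0])

    history[algo].append((h, run_pipeline(algo, h)))

best = min((l, a, h) for a, v in history.items() for h, l in v)
print("best loss %.3f with %s at h=%.2f" % best)
```
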
Measuring and Understanding Sensory Representations within Deep Networks Using a Numerical Optimization Framework
A central challenge in sensory neuroscience is describing how the activity of
populations of neurons can represent useful features of the external
environment. However, while neurophysiologists have long been able to record
the responses of neurons in awake, behaving animals, it is another matter
entirely to say what a given neuron does. A key problem is that in many sensory
domains, the space of all possible stimuli that one might encounter is
effectively infinite; in vision, for instance, natural scenes are
combinatorially complex, and an organism will only encounter a tiny fraction of
possible stimuli. As a result, even describing the response properties of
sensory neurons is difficult, and investigations of neuronal functions are
almost always critically limited by the number of stimuli that can be
considered. In this paper, we propose a closed-loop, optimization-based
experimental framework for characterizing the response properties of sensory
neurons, building on past efforts in closed-loop experimental methods, and
leveraging recent advances in artificial neural networks to serve as a
proving ground for our techniques. Specifically, using deep convolutional
neural networks, we asked whether modern black-box optimization techniques can
be used to interrogate the "tuning landscape" of an artificial neuron in a
deep, nonlinear system, without imposing significant constraints on the space
of stimuli under consideration. We introduce a series of measures to quantify
the tuning landscapes, and show how these relate to the performance of the
networks in an object recognition task. To the extent that deep convolutional
neural networks increasingly serve as de facto working hypotheses for
biological vision, we argue that developing a unified approach for studying
both artificial and biological systems holds great potential to advance both
fields together.
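
To make the closed-loop, black-box flavour concrete, here is a toy sketch: a hypothetical scalar "unit response" (a stand-in for a unit deep in a CNN) is maximized by a simple (1+1) evolution strategy that only queries responses, never gradients. The response function, stimulus dimensionality, and step-size rule are all illustrative assumptions, not the paper's setup.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical stand-in for a neuron's response to a stimulus: a fixed random
# "preferred stimulus" with a smooth, nonlinear tuning curve around it.
D = 64                                     # stimulus dimensionality (e.g., pixels)
preferred = rng.standard_normal(D)
def unit_response(stim):
    return np.exp(-np.sum((stim - preferred) ** 2) / (2.0 * D))

# (1+1) evolution strategy: a simple black-box optimizer over stimulus space.
stim = rng.standard_normal(D)
best = unit_response(stim)
step = 0.5
for t in range(2000):
    candidate = stim + step * rng.standard_normal(D)
    r = unit_response(candidate)
    if r > best:                           # keep the candidate only if it helps
        stim, best = candidate, r
        step *= 1.1                        # expand the step on success
    else:
        step *= 0.97                       # shrink it on failure
print("final response:", round(best, 3))   # should climb toward the peak of 1.0
```
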
Review of Functional Data Analysis
With the advance of modern technology, more and more data are being recorded
continuously during a time interval or intermittently at several discrete time
points. They are both examples of "functional data", which have become a
prevailing type of data. Functional Data Analysis (FDA) encompasses the
statistical methodology for such data. Broadly interpreted, FDA deals with the
analysis and theory of data that are in the form of functions. This paper
provides an overview of FDA, starting with simple statistical notions such as
mean and covariance functions, then covering some core techniques, the most
popular of which is Functional Principal Component Analysis (FPCA). FPCA is an
important dimension reduction tool and in sparse data situations can be used to
impute functional data that are sparsely observed. Other dimension reduction
approaches are also discussed. In addition, we review another core technique,
functional linear regression, as well as clustering and classification of
functional data. Beyond linear and single or multiple index methods we touch
upon a few nonlinear approaches that are promising for certain applications.
They include additive and other nonlinear functional regression models, such as
time warping, manifold learning, and dynamic modeling with empirical
differential equations. The paper concludes with a brief discussion of future
directions.
Comment: 47 pages
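
For densely and regularly observed curves, FPCA reduces to an eigendecomposition of the sample covariance evaluated on the observation grid. Below is a minimal sketch that ignores quadrature weights and smoothing, which a careful implementation would include; the simulated curves are invented for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
# Simulate n densely observed curves on a common grid: a mean function
# plus two random components and measurement noise.
t = np.linspace(0, 1, 100)
n = 200
scores_true = rng.standard_normal((n, 2)) * np.array([2.0, 0.7])
basis = np.vstack([np.sin(2 * np.pi * t), np.cos(4 * np.pi * t)])
curves = np.sin(np.pi * t) + scores_true @ basis + 0.1 * rng.standard_normal((n, 100))

# FPCA for dense curves: eigendecompose the discretized covariance operator.
mean_curve = curves.mean(axis=0)
centered = curves - mean_curve
cov = centered.T @ centered / (n - 1)                # 100 x 100 covariance surface
eigvals, eigvecs = np.linalg.eigh(cov)
eigvals, eigvecs = eigvals[::-1], eigvecs[:, ::-1]   # sort in descending order

# Fraction of variance explained and FPC scores of each curve.
fve = eigvals / eigvals.sum()
print("variance explained by first 2 components: %.2f" % fve[:2].sum())
scores = centered @ eigvecs[:, :2]                   # project onto eigenfunctions
print("score matrix shape:", scores.shape)
```
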
Model Selection Techniques -- An Overview
In the era of big data, analysts usually explore various statistical models
or machine learning methods for observed data in order to facilitate scientific
discoveries or gain predictive power. Whatever data and fitting procedures are
employed, a crucial step is to select the most appropriate model or method from
a set of candidates. Model selection is a key ingredient in data analysis for
reliable and reproducible statistical inference or prediction, and thus central
to scientific studies in fields such as ecology, economics, engineering,
finance, political science, biology, and epidemiology. There has been a long
history of model selection techniques that arise from research in statistics,
information theory, and signal processing. A considerable number of methods
have been proposed, following different philosophies and exhibiting varying
performances. The purpose of this article is to bring a comprehensive overview
of them, in terms of their motivation, large sample performance, and
applicability. We provide integrated and practically relevant discussions on
theoretical properties of state-of-the-art model selection approaches. We also
share our thoughts on some controversial views on the practice of model
selection.
Comment: accepted by IEEE SIGNAL PROCESSING MAGAZINE
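
As a small worked example of the information-criterion end of this literature, the sketch below compares polynomial regression models by AIC (2k - 2 log L) and BIC (k log n - 2 log L) under a Gaussian likelihood; the data-generating model and the helper name are made up for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100
x = rng.uniform(-2, 2, n)
y = 1.0 + 0.5 * x - 0.8 * x ** 2 + 0.3 * rng.standard_normal(n)  # true degree: 2

def aic_bic(degree):
    X = np.vander(x, degree + 1)                 # polynomial design matrix
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    rss = np.sum((y - X @ beta) ** 2)
    k = degree + 2                               # coefficients + noise variance
    # -2 log-likelihood of a Gaussian linear model at the MLE.
    neg2loglik = n * np.log(rss / n) + n * (1 + np.log(2 * np.pi))
    return 2 * k + neg2loglik, k * np.log(n) + neg2loglik

for d in range(1, 7):
    aic, bic = aic_bic(d)
    print(f"degree {d}: AIC={aic:7.1f}  BIC={bic:7.1f}")
# Both criteria should be minimized at (or near) the true degree of 2.
```
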