Incremental Sparse Bayesian Ordinal Regression
Ordinal Regression (OR) aims to model the ordering information between
different data categories, which is a crucial topic in multi-label learning. An
important class of approaches to OR models the problem as a linear combination
of basis functions that map features to a high dimensional non-linear space.
However, most basis function-based algorithms are time-consuming. We
propose an incremental sparse Bayesian approach to OR tasks and introduce an
algorithm to sequentially learn the relevant basis functions in the ordinal
scenario. Our method, called Incremental Sparse Bayesian Ordinal Regression
(ISBOR), automatically optimizes the hyper-parameters via the type-II maximum
likelihood method. By exploiting fast marginal likelihood optimization, ISBOR
can avoid big matrix inverses, which is the main bottleneck in applying basis
function-based algorithms to OR tasks on large-scale datasets. We show that
ISBOR can make accurate predictions with parsimonious basis functions while
offering automatic estimates of the prediction uncertainty. Extensive
experiments on synthetic and real-world datasets demonstrate the efficiency and
effectiveness of ISBOR compared to other basis function-based OR approaches.
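The basis-function view of OR described in this abstract can be illustrated with a small sketch. It is not the ISBOR algorithm: the Gaussian (RBF) basis, the fixed weights, and the ordered thresholds that cut the latent score into ranks are all assumptions chosen for illustration (thresholding is a common device in ordinal regression, but the abstract does not state it), and nothing here is learned from data. The names `rbf`, `latent_score`, and `predict_rank` are hypothetical.

```python
import math
from bisect import bisect_right


def rbf(x, center, width=1.0):
    """Gaussian basis function centred at `center` (an assumed choice of basis)."""
    return math.exp(-((x - center) ** 2) / (2.0 * width ** 2))


def latent_score(x, centers, weights, width=1.0):
    """Latent score as a linear combination of basis functions."""
    return sum(w * rbf(x, c, width) for w, c in zip(weights, centers))


def predict_rank(score, thresholds):
    """Map a latent score to an ordinal rank using ascending thresholds.

    Rank k means thresholds[k-1] <= score < thresholds[k] (assumed convention).
    """
    return bisect_right(thresholds, score)


# Hand-picked illustrative values, not learned parameters.
centers = [0.0, 2.0, 4.0]
weights = [-1.0, 0.0, 1.0]  # a zero weight stands in for a pruned (irrelevant) basis function
thresholds = [-0.5, 0.5]    # two thresholds give three ordered categories: 0, 1, 2

ranks = [predict_rank(latent_score(x, centers, weights), thresholds)
         for x in (0.0, 2.0, 4.0)]
print(ranks)
```

A sparse method in this setting would aim to drive most weights to zero, so that only a few basis functions contribute to the score; the zero weight above merely mimics that outcome.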
Compression and Classification Methods for Galaxy Spectra in Large Redshift Surveys
Methods for compression and classification of galaxy spectra, which are
useful for large galaxy redshift surveys (such as the SDSS, 2dF, 6dF and
VIRMOS), are reviewed. In particular, we describe and contrast three methods:
(i) Principal Component Analysis, (ii) Information Bottleneck, and (iii) Fisher
Matrix. We show applications to 2dF galaxy spectra and to mock semi-analytic
spectra, and we discuss how these methods can be used to study physical
processes of galaxy formation, clustering and galaxy biasing in the new large
redshift surveys.
Comment: Review talk, proceedings of MPA/MPE/ESO Conference "Mining the Sky",
2000, Garching, Germany; 20 pages, 5 figures
Bayesian inference for queueing networks and modeling of internet services
Modern Internet services, such as those at Google, Yahoo!, and Amazon, handle
billions of requests per day on clusters of thousands of computers. Because
these services operate under strict performance requirements, a statistical
understanding of their performance is of great practical interest. Such
services are modeled by networks of queues, where each queue models one of the
computers in the system. A key challenge is that the data are incomplete,
because recording detailed information about every request to a heavily used
system can require unacceptable overhead. In this paper we develop a Bayesian
perspective on queueing models in which the arrival and departure times that
are not observed are treated as latent variables. Underlying this viewpoint is
the observation that a queueing model defines a deterministic transformation
between the data and a set of independent variables called the service times.
With this viewpoint in hand, we sample from the posterior distribution over
missing data and model parameters using Markov chain Monte Carlo. We evaluate
our framework on data from a benchmark Web application. We also present a
simple technique for selection among nested queueing models. We are unaware of
any previous work that considers inference in networks of queues in the
presence of missing data.
Comment: Published at http://dx.doi.org/10.1214/10-AOAS392 in the Annals of
Applied Statistics (http://www.imstat.org/aoas/) by the Institute of
Mathematical Statistics (http://www.imstat.org)
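The deterministic relationship between the data and the service times mentioned in this abstract can be made concrete in the simplest case. The sketch below assumes a single-server FIFO queue, which is a simplification of the networks of queues the paper treats; the function names `departures_from` and `service_times` are hypothetical. In that setting a job starts service once it has arrived and the previous job has departed, so departures follow from arrivals and service times, and the map can be inverted.

```python
from typing import List


def departures_from(arrivals: List[float], services: List[float]) -> List[float]:
    """Forward map: arrival and service times -> departure times (single FIFO server)."""
    departures = []
    prev_departure = float("-inf")
    for a, s in zip(arrivals, services):
        start = max(a, prev_departure)  # wait until arrived and server is free
        d = start + s
        departures.append(d)
        prev_departure = d
    return departures


def service_times(arrivals: List[float], departures: List[float]) -> List[float]:
    """Inverse map: arrival and departure times -> service times."""
    services = []
    prev_departure = float("-inf")
    for a, d in zip(arrivals, departures):
        start = max(a, prev_departure)
        services.append(d - start)
        prev_departure = d
    return services


# Illustrative values only.
arrivals = [0.0, 1.0, 1.5, 5.0]
services = [2.0, 1.0, 0.5, 1.0]
departures = departures_from(arrivals, services)
recovered = service_times(arrivals, departures)
print(departures, recovered)
```

When some arrival or departure times are unobserved, this inverse cannot be evaluated directly, which is why the abstract treats the missing times as latent variables to be sampled.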