Relax, no need to round: integrality of clustering formulations
We study exact recovery conditions for convex relaxations of point cloud
clustering problems, focusing on two of the most common optimization problems
for unsupervised clustering: k-means and k-median clustering. Motivations
for focusing on convex relaxations are: (a) they come with a certificate of
optimality, and (b) they are generic tools which are relatively parameter-free,
not tailored to specific assumptions over the input. More precisely, we
consider the distributional setting where there are k clusters in R^m
and data from each cluster consists of n points sampled from a
symmetric distribution within a ball of unit radius. We ask: what is the
minimal separation distance between cluster centers needed for convex
relaxations to exactly recover these clusters as the optimal integral
solution? For the k-median linear programming relaxation we show a tight
bound: exact recovery is obtained given arbitrarily small pairwise separation
ε > 0 between the balls. In other words, the pairwise center
separation is Δ > 2 + ε. Under the same distributional model, the
k-means LP relaxation fails to recover such clusters at separation as large
as Δ = 4. Yet, if we enforce PSD constraints on the k-means LP, we get
exact cluster recovery at center separation Δ > 2√2(1 + √(1/m)).
In contrast, common heuristics such as Lloyd's algorithm (a.k.a. the k-means
algorithm) can fail to recover clusters in this setting; even with arbitrarily
large cluster separation, k-means++ with overseeding by any constant factor
fails with high probability at exact cluster recovery. To complement the
theoretical analysis, we provide an experimental study of the recovery
guarantees for these various methods, and discuss several open problems which
these experiments suggest.
Comment: 30 pages, ITCS 2015
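The k-median LP relaxation studied above can be sketched numerically. The following is a minimal illustration (a toy instance built here for this listing, not the paper's experiments): it sets up the standard assignment/opening LP with SciPy and checks that, for two well-separated clusters, the optimal solution assigns every point within its own cluster.

```python
import numpy as np
from scipy.optimize import linprog

rng = np.random.default_rng(0)
# Two clusters of 10 points each, centers 6 apart (well above the
# recovery threshold).
pts = np.vstack([rng.uniform(-1, 1, (10, 2)) + c for c in ([0, 0], [6, 0])])
n, k = len(pts), 2
d = np.linalg.norm(pts[:, None] - pts[None, :], axis=2)  # pairwise distances

# LP variables: x[i, j] (point j assigned to candidate median i), then y[i]
# (median i opened).  Flattened as x[i * n + j] followed by y[i].
c = np.concatenate([d.ravel(), np.zeros(n)])

A_eq = np.zeros((n + 1, n * n + n))
for j in range(n):
    A_eq[j, j:n * n:n] = 1.0        # each point fully assigned: sum_i x[i,j] = 1
A_eq[n, n * n:] = 1.0               # exactly k medians opened: sum_i y[i] = k
b_eq = np.concatenate([np.ones(n), [k]])

A_ub = np.zeros((n * n, n * n + n))
for i in range(n):
    for j in range(n):
        A_ub[i * n + j, i * n + j] = 1.0
        A_ub[i * n + j, n * n + i] = -1.0   # x[i, j] <= y[i]
b_ub = np.zeros(n * n)

res = linprog(c, A_ub, b_ub, A_eq, b_eq, bounds=(0, 1))
x = res.x[:n * n].reshape(n, n)
# With enough separation the LP optimum keeps all assignment mass
# inside each planted cluster.
```

At this separation the relaxation needs no rounding: the optimal fractional solution already places (essentially) all of each point's assignment mass on a median in its own cluster.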
Approximation algorithms for facility location problems and other supply chain problems
Advisors: Flávio Keidi Miyazawa, Maxim Sviridenko. Thesis (PhD), Universidade Estadual de Campinas, Instituto de Computação. The abstract is available with the full electronic document. PhD in Computer Science.
Decision Making Under Uncertainty Using Machine Learning
We propose a supervised-learning-based algorithm to obtain good primal solutions for two-stage stochastic integer programming (2SIP) problems with constraints in the first and second stages. The goal of the algorithm is to predict a representative scenario (RS) for the problem such that deterministically solving the 2SIP with the random realization fixed to the RS gives a near-optimal solution to the original 2SIP. Predicting an RS, instead of directly predicting a solution, ensures first-stage feasibility of the solution. If the problem is known to have complete recourse, second-stage feasibility is also guaranteed. We perform computational tests on two problems, namely, the stochastic capacitated facility location problem (S-CFLP) and the stochastic generalized assignment problem (S-GAP). Both problems have integer variables and linear constraints in the first and second stages. The proposed method is able to produce good primal solutions for the S-CFLP when tested on the sizes on which it was trained, and our computing time is competitive with that taken by Gurobi to achieve similar solution quality. However, our models are not able to generalize and produce good primal solutions when tested on sizes on which they were not trained. In the case of the S-GAP, as of now, our method struggles to find good primal solutions. We discuss the challenges and the potential solutions we would pursue to alleviate them.
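The representative-scenario idea can be made concrete on a toy problem. The sketch below uses a hypothetical newsvendor-style 2SIP (not the S-CFLP or S-GAP instances above) and computes, by enumeration, the label a supervised model would be trained to predict: the single scenario whose deterministic solution performs best on the true stochastic objective.

```python
import numpy as np

# Toy 2SIP: choose an integer stock level q; demand d is random.
# cost(q, d) = c*q + p*max(d - q, 0)   (purchase cost + shortage penalty)
c, p = 1.0, 3.0
scenarios = np.array([2, 4, 5, 9])          # equiprobable demand realizations

def expected_cost(q):
    """True two-stage objective: first-stage cost plus expected recourse."""
    return c * q + p * np.maximum(scenarios - q, 0).mean()

def deterministic_solution(d):
    # With the randomness fixed to a single scenario d and p > c, the
    # optimal deterministic first-stage decision is simply q = d.
    return d

# The representative scenario (RS) is the realization whose deterministic
# solution is best for the true stochastic objective -- here found by
# enumeration, which is what a trained model would learn to predict.
rs = min(scenarios, key=lambda d: expected_cost(deterministic_solution(d)))
q_rs = deterministic_solution(rs)

# Compare against the true stochastic optimum, also by enumeration.
q_opt = min(range(0, 12), key=expected_cost)
```

In this toy instance the RS solution coincides with the true optimum; in general it is only near-optimal, but it is always first-stage feasible by construction, which is the point of the approach.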
Derivative-free online learning of inverse dynamics models
This paper discusses online algorithms for inverse dynamics modelling in
robotics. Several model classes including rigid body dynamics (RBD) models,
data-driven models and semiparametric models (which are a combination of the
previous two classes) are placed in a common framework. While model classes
used in the literature typically exploit joint velocities and accelerations,
which need to be approximated by resorting to numerical differentiation
schemes, this paper proposes a new `derivative-free' framework that does not
require this preprocessing step. An extensive experimental study with real data
from the right arm of the iCub robot is presented, comparing different model
classes and estimation procedures, showing that the proposed `derivative-free'
methods outperform existing methodologies.
Comment: 14 pages, 11 figures
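The derivative-free idea can be illustrated with a deliberately simple linear toy (synthetic data and a made-up linear dynamics law, not the iCub data or the paper's model classes): instead of feeding numerically differentiated velocities and accelerations to the learner, one feeds a short window of raw joint positions, which carries the same information.

```python
import numpy as np

# Synthetic 1-DoF joint trajectory (a stand-in for real encoder logs).
dt = 0.01
t = np.arange(0.0, 10.0, dt)
q = np.sin(t) + 0.3 * np.sin(2.7 * t)                 # joint positions

# A linear toy inverse-dynamics law built from backward differences:
# tau = 2*qdd + 0.5*qd + 1.5*q, with qd, qdd finite-differenced from q.
qd = (q[2:] - q[1:-1]) / dt
qdd = (q[2:] - 2.0 * q[1:-1] + q[:-2]) / dt ** 2
tau = 2.0 * qdd + 0.5 * qd + 1.5 * q[2:]

# Derivative-free features: the model never sees qd or qdd, only the raw
# position window (q[t-2], q[t-1], q[t]), which spans the same information
# as (q, qd, qdd) for any model linear in those quantities.
X = np.stack([q[:-2], q[1:-1], q[2:]], axis=1)
w, *_ = np.linalg.lstsq(X, tau, rcond=None)
rmse = np.sqrt(np.mean((X @ w - tau) ** 2))
```

In this linear toy the raw-position window reproduces the torque essentially exactly, with no differentiation (and hence no noise amplification) in the preprocessing; the paper's nonparametric estimators extend the same idea to the nonlinear case.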
Quantum-Gravity Decoherence Effects in Neutrino Oscillations: Expected Constraints from CNGS and J-PARC
Quantum decoherence, the evolution of pure states into mixed states, may be a
feature of quantum-gravity models. In most cases, such models lead to fewer
neutrinos of all active flavours being detected in a long baseline experiment
as compared to three-flavour standard neutrino oscillations. We discuss the
potential of the CNGS and J-PARC beams in constraining models of
quantum-gravity induced decoherence using neutrino oscillations as a probe. We
use model-independent parameterizations as far as possible, even though they
are motivated by specific microscopic models, for fits to the expected
experimental data, which yield bounds on quantum-gravity decoherence parameters.
Comment: 40 pages, 8 figures, minor corrections
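For intuition, a common model-independent parameterization multiplies the oscillatory term of the survival probability by an exponential damping factor. The sketch below is an illustrative two-flavour version with assumed CNGS-like numbers (732 km baseline, roughly 17 GeV mean energy), not the paper's full three-flavour fits.

```python
import numpy as np

def p_survive(L_km, E_GeV, dm2=2.5e-3, s2_2th=1.0, gamma_per_km=0.0):
    """Two-flavour nu_mu survival probability with an exponential
    decoherence damping factor exp(-gamma*L) on the oscillatory term
    (an illustrative parameterization, not the paper's fit).
    dm2 in eV^2, L in km, E in GeV."""
    phase = 2.0 * 1.267 * dm2 * L_km / E_GeV       # Delta m^2 L / (2E)
    damp = np.exp(-gamma_per_km * L_km)
    return 1.0 - 0.5 * s2_2th * (1.0 - damp * np.cos(phase))

L, E = 732.0, 17.0          # CNGS-like baseline (km) and mean energy (GeV)
p_std = p_survive(L, E)                         # standard oscillations
p_dec = p_survive(L, E, gamma_per_km=1e-2)      # strong decoherence
# Full decoherence washes out the oscillation, driving the survival
# probability toward 1 - s2_2th/2 = 0.5 at maximal mixing.
```

The signature of decoherence is thus a flavour-independent depletion relative to the standard oscillation prediction, which is what a long-baseline experiment can constrain.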
Sinusoidal Modeling Applied to Spatially Variant Tropospheric Ozone Air Pollution
This paper demonstrates how parsimonious models of sinusoidal functions can be used to fit spatially variant time series in which there is considerable variation of a periodic type. A typical shortcoming of such tools relates to the difficulty in capturing idiosyncratic variation in periodic models. The strategy developed here addresses this deficiency. While previous work has sought to overcome the shortcoming by augmenting sinusoids with other techniques, the present approach employs station-specific sinusoids to supplement a common regional component, which succeeds in capturing local idiosyncratic behavior in a parsimonious manner. The experiments conducted herein reveal that a semi-parametric approach enables such models to fit spatially varying time series with periodic behavior in a remarkably tight fashion. The methods are applied to a panel data set consisting of hourly air pollution measurements. The augmented sinusoidal models produce an excellent fit to these data at three different levels of spatial detail.
Keywords: air pollution, idiosyncratic component, regional variation, semiparametric model, sinusoidal function, spatial-temporal data, tropospheric ozone
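The common-plus-idiosyncratic sinusoid structure can be sketched as an ordinary least-squares fit on synthetic panel data (an illustration of the modeling idea only; the station count, frequencies, and amplitudes below are made up, not the paper's ozone data or estimator).

```python
import numpy as np

rng = np.random.default_rng(2)
t = np.arange(0, 24 * 30)                   # 30 days of hourly observations
w = 2 * np.pi / 24                          # diurnal frequency

# Two synthetic stations: a shared regional cycle plus station-specific
# idiosyncratic amplitude/phase, with small observation noise.
regional = 3.0 * np.sin(w * t)
y = np.stack([regional + a * np.sin(w * t + ph)
              + 0.1 * rng.standard_normal(len(t))
              for a, ph in [(1.0, 0.7), (0.5, -1.2)]])

# Design matrix: common sin/cos columns shared by all stations, plus
# per-station sin/cos columns active only on that station's rows.
S, T = y.shape
common = np.column_stack([np.sin(w * t), np.cos(w * t)])
X = np.zeros((S * T, 2 + 2 * S))
for s in range(S):
    X[s * T:(s + 1) * T, :2] = common                    # regional part
    X[s * T:(s + 1) * T, 2 + 2 * s:4 + 2 * s] = common   # idiosyncratic part
# The design is rank-deficient (regional vs. idiosyncratic split is not
# identified without a constraint); lstsq's minimum-norm solution still
# yields the least-squares fit.
beta, *_ = np.linalg.lstsq(X, y.ravel(), rcond=None)
fit = (X @ beta).reshape(S, T)
r2 = 1 - ((y - fit) ** 2).sum() / ((y - y.mean()) ** 2).sum()
```

With only a handful of coefficients per station the fit is nearly exact, mirroring the "remarkably tight" parsimonious fits reported above.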
A Survey of Contextual Optimization Methods for Decision Making under Uncertainty
Recently there has been a surge of interest in the operations research (OR)
and machine learning (ML) communities in combining prediction algorithms and
optimization techniques to solve decision-making problems in the face of
uncertainty. This gave rise to the field of contextual optimization, under
which data-driven procedures are developed to prescribe actions to the
decision-maker that make the best use of the most recently updated information.
A large variety of models and methods have been presented in both OR and ML
literature under a variety of names, including data-driven optimization,
prescriptive optimization, predictive stochastic programming, policy
optimization, (smart) predict/estimate-then-optimize, decision-focused
learning, (task-based) end-to-end learning/forecasting/optimization, etc.
Focusing on single and two-stage stochastic programming problems, this review
article identifies three main frameworks for learning policies from data and
discusses their strengths and limitations. We present the existing models and
methods under a uniform notation and terminology and classify them according to
the three main frameworks identified. Our objective with this survey is to both
strengthen the general understanding of this active field of research and
stimulate further theoretical and algorithmic advancements in integrating ML
and stochastic programming.
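As a minimal concrete instance of one such framework, the sketch below implements a toy predict-then-optimize pipeline on made-up data (the context dimension, action set, and cost model are all hypothetical): first fit a contextual cost model by least squares, then prescribe the action minimizing the predicted cost.

```python
import numpy as np

rng = np.random.default_rng(3)

# Toy contextual problem: pick one of 3 actions; the true cost of each
# action is a linear function of a 2-d context x, observed with noise
# in the historical data.
W_true = np.array([[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]])
X = rng.standard_normal((200, 2))                        # observed contexts
C = X @ W_true.T + 0.05 * rng.standard_normal((200, 3))  # observed costs

# Prediction step: least-squares cost model per action.
W_hat, *_ = np.linalg.lstsq(X, C, rcond=None)            # shape (2, 3)

# Optimization step: prescribe the argmin of the predicted costs.
def decide(x):
    return int(np.argmin(x @ W_hat))

x_new = np.array([2.0, -1.0])
a = decide(x_new)   # true costs here are [2.0, -1.0, 0.5]
```

This sequential estimate-then-optimize scheme is the simplest of the frameworks surveyed; the decision-focused alternatives instead train the predictor against the downstream decision loss.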
Bayesian semiparametric GARCH models
This paper aims to investigate a Bayesian sampling approach to parameter estimation in the semiparametric GARCH model with an unknown conditional error density, which we approximate by a mixture of Gaussian densities centered at individual errors and scaled by a common standard deviation. This mixture density has the form of a kernel density estimator of the errors, with its bandwidth being the standard deviation. The proposed investigation is motivated by the lack of robustness in GARCH models with any parametric assumption on the error density for the purpose of error-density-based inference such as value-at-risk (VaR) estimation. The contribution of the paper is to construct the likelihood and posterior of model and bandwidth parameters under the proposed mixture error density, and to forecast the one-step out-of-sample density of asset returns. The resulting VaR measure is therefore distribution-free. Applying the semiparametric GARCH(1,1) model to daily stock-index returns in eight stock markets, we find that this semiparametric GARCH model is favoured over the GARCH(1,1) model with Student t errors for five indices, and that the GARCH model underestimates VaR compared to its semiparametric counterpart. We also investigate the use and benefit of localized bandwidths in the proposed mixture density of the errors.
Keywords: Bayes factors, kernel-form error density, localized bandwidths, Markov chain Monte Carlo, value-at-risk
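The kernel-form error density at the heart of this model can be sketched as follows, on simulated returns with the true GARCH parameters and a rule-of-thumb plug-in bandwidth (a simplified illustration; the paper instead samples the model and bandwidth parameters from the posterior).

```python
import numpy as np

rng = np.random.default_rng(4)

def garch_filter(r, omega, alpha, beta):
    """GARCH(1,1) conditional variances and standardized errors."""
    h = np.empty(len(r))
    h[0] = r.var()
    for t in range(1, len(r)):
        h[t] = omega + alpha * r[t - 1] ** 2 + beta * h[t - 1]
    return h, r / np.sqrt(h)

# Simulate returns from a GARCH(1,1), then filter with the true parameters.
T, omega, alpha, beta = 2000, 0.05, 0.1, 0.85
r = np.empty(T)
h = np.empty(T)
h[0] = omega / (1 - alpha - beta)           # unconditional variance
z = rng.standard_normal(T)
r[0] = np.sqrt(h[0]) * z[0]
for t in range(1, T):
    h[t] = omega + alpha * r[t - 1] ** 2 + beta * h[t - 1]
    r[t] = np.sqrt(h[t]) * z[t]

h_hat, eps = garch_filter(r, omega, alpha, beta)

# Kernel-form error density: a Gaussian mixture centered at the fitted
# standardized errors with a common bandwidth b (Silverman plug-in here).
b = 1.06 * eps.std() * len(eps) ** (-0.2)
def error_density(x):
    return np.mean(np.exp(-0.5 * ((x - eps[:, None]) / b) ** 2), axis=0) \
           / (b * np.sqrt(2 * np.pi))

# One-step-ahead 1% VaR from the mixture density via its quantile.
grid = np.linspace(-6, 6, 2001)
cdf = np.cumsum(error_density(grid))
cdf /= cdf[-1]
q01 = grid[np.searchsorted(cdf, 0.01)]
h_next = omega + alpha * r[-1] ** 2 + beta * h_hat[-1]
var_1pct = q01 * np.sqrt(h_next)
```

Because the VaR quantile comes from the estimated mixture rather than an assumed parametric error law, the resulting measure is distribution-free in the sense described above.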