Have Econometric Analyses of Happiness Data Been Futile? A Simple Truth About Happiness Scales
Econometric analyses in the happiness literature typically use subjective
well-being (SWB) data to compare the mean of observed or latent happiness
across samples. Recent critiques show that comparing the mean of ordinal data
is only valid under strong assumptions that are usually rejected by SWB data.
This raises the open question of whether many of the empirical studies in the
economics of happiness literature have been futile. In order to salvage some of
the prior results and avoid future issues, we suggest regression analysis of
SWB (and other ordinal data) should focus on the median rather than the mean.
Median comparisons using parametric models such as the ordered probit and logit
can be readily carried out using familiar statistical software such as Stata. We
also show that estimating a semiparametric median ordered-response model, a task
previously assumed to be impractical, is possible using a novel constrained
mixed integer optimization technique. We use GSS data to show that the famous
Easterlin Paradox from the happiness literature holds for the US independent of
any parametric assumption.
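As a rough illustration of the median comparison mentioned in the abstract above, the sketch below finds the median response category implied by an ordered probit or logit. It is not taken from the paper. It assumes the textbook setup in which the latent variable is a linear index plus a symmetric error with median zero, and the observed category is j when the latent variable falls between cutpoints j-1 and j. The function name `median_category` and the numeric values are hypothetical.

```python
from bisect import bisect_left


def median_category(linear_index, cutpoints):
    """Return the 0-based median response category.

    Assumes an ordered probit/logit with latent y* = linear_index + e,
    where e is symmetric around zero, and sorted cutpoints c_0 < c_1 < ...
    Then P(y <= j) = F(c_j - linear_index) >= 0.5 exactly when
    c_j >= linear_index, so the median is the first category whose upper
    cutpoint is at least the linear index (the smallest such category is
    taken when the cumulative probability equals 0.5 exactly).
    """
    return bisect_left(cutpoints, linear_index)


# Hypothetical cutpoints for a four-category happiness scale.
cutpoints = [-1.0, 0.5, 2.0]
print(median_category(0.0, cutpoints))   # 1
print(median_category(3.0, cutpoints))   # 3 (top category)
```

Under these assumptions, comparing medians across two groups reduces to comparing where each group's fitted linear index falls relative to the estimated cutpoints, which does not depend on the cardinal labels attached to the categories.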
A comprehensive literature classification of simulation optimisation methods
Simulation Optimization (SO) provides a structured approach to system design and configuration when analytical expressions for input/output relationships are unavailable. Several excellent surveys have been written on this topic, but each concentrates on only a few classification criteria. This paper presents a literature survey that classifies SO techniques by the full set of criteria, according to problem characteristics such as the shape of the response surface (global as compared to local optimization), objective functions (single or multiple objectives) and parameter spaces (discrete or continuous parameters). The survey focuses specifically on SO problems that involve a single performance measure.
Keywords: Simulation Optimization, classification methods, literature survey
Estimation from Pairwise Comparisons: Sharp Minimax Bounds with Topology Dependence
Data in the form of pairwise comparisons arises in many domains, including
preference elicitation, sporting competitions, and peer grading among others.
We consider parametric ordinal models for such pairwise comparison data
involving a latent vector that represents the
"qualities" of the items being compared; this class of models includes the
two most widely used parametric models--the Bradley-Terry-Luce (BTL) and the
Thurstone models. Working within a standard minimax framework, we provide tight
upper and lower bounds on the optimal error in estimating the quality score
vector under this class of models. The bounds depend on the topology of
the comparison graph induced by the subset of pairs being compared via its
Laplacian spectrum. Thus, in settings where the subset of pairs may be chosen,
our results provide principled guidelines for making this choice. Finally, we
compare these error rates to those under cardinal measurement models and show
that the error rates in the ordinal and cardinal settings have identical
scalings apart from constant pre-factors.
Comment: 39 pages, 5 figures. Significant extension of arXiv:1406.661
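The two objects named in the abstract above, the BTL model and the Laplacian of the comparison graph, can be sketched as follows. This is a minimal illustration and not the authors' code. It assumes the common parametrization of BTL in which the probability that item i beats item j is the logistic function of the difference in their quality scores, and it builds the unweighted graph Laplacian (degree matrix minus adjacency matrix) from a list of compared pairs. The function names and the example pairs are hypothetical.

```python
import math


def btl_win_prob(w_i, w_j):
    """P(item i beats item j) under BTL, assumed here to be the
    logistic function of the quality difference w_i - w_j."""
    return 1.0 / (1.0 + math.exp(-(w_i - w_j)))


def comparison_laplacian(n_items, pairs):
    """Laplacian L = D - A of the comparison graph.

    `pairs` lists the (i, j) item pairs that are compared; each pair
    adds one edge. Returned as a list of rows.
    """
    lap = [[0] * n_items for _ in range(n_items)]
    for i, j in pairs:
        lap[i][i] += 1
        lap[j][j] += 1
        lap[i][j] -= 1
        lap[j][i] -= 1
    return lap


# Three items compared along a path: 0 vs 1 and 1 vs 2.
for row in comparison_laplacian(3, [(0, 1), (1, 2)]):
    print(row)
print(btl_win_prob(1.0, 1.0))  # 0.5 for equal qualities
```

The eigenvalues of this matrix are the Laplacian spectrum the abstract refers to; computing them would need a linear algebra routine, which is left out here to keep the sketch dependency-free.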
Semi-parametric analysis of multi-rater data
Datasets that are subjectively labeled by a number of experts are becoming more common in tasks such as biological text annotation, where class definitions are necessarily somewhat subjective. Standard classification and regression models are not suited to multiple labels, and typically a pre-processing step (normally assigning the majority class) is performed. We propose Bayesian models for classification and ordinal regression that naturally incorporate multiple expert opinions in defining predictive distributions. The models make use of Gaussian process priors, resulting in great flexibility and particular suitability to text-based problems where the number of covariates can be far greater than the number of data instances. We show that using all labels rather than just the majority improves performance on a recent biological dataset.
Encrypted statistical machine learning: new privacy preserving methods
We present two new statistical machine learning methods designed to learn on
fully homomorphic encrypted (FHE) data. The introduction of FHE schemes
following Gentry (2009) opens up the prospect of privacy preserving statistical
machine learning analysis and modelling of encrypted data without compromising
security constraints. We propose tailored algorithms for applying extremely
random forests, involving a new cryptographic stochastic fraction estimator,
and naïve Bayes, involving a semi-parametric model for the class decision
boundary, and show how they can be used to learn and predict from encrypted
data. We demonstrate that these techniques perform competitively on a variety
of classification data sets and provide detailed information about the
computational practicalities of these and other FHE methods.
Comment: 39 page