Measurement Error in Lasso: Impact and Correction
Regression with the lasso penalty is a popular tool for performing dimension
reduction when the number of covariates is large. In many applications of the
lasso, such as genomics, covariates are subject to measurement error. We study
the impact of measurement error on linear regression with the lasso penalty,
both analytically and in simulation experiments. A simple method of correction
for measurement error in the lasso is then considered. In the large sample
limit, the corrected lasso yields sign consistent covariate selection under
conditions very similar to the lasso with perfect measurements, whereas the
uncorrected lasso requires much more stringent conditions on the covariance
structure of the data. Finally, we suggest methods to correct for measurement
error in generalized linear models with the lasso penalty, which we study
empirically in simulation experiments with logistic regression, and also apply
to a classification problem with microarray data. We see that the corrected
lasso selects fewer false positives than the standard lasso, at a similar level
of true positives. The corrected lasso can therefore be used to obtain more
conservative covariate selection in genomic analysis.
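As a rough illustration of why measurement error matters here, the sketch below simulates covariates observed with additive noise and compares the naive Gram matrix with one corrected by subtracting the noise covariance. It is not the estimator studied in the abstract above: the additive-error model W = X + U, the assumption that the noise variance `sigma_u**2` is known, and all function and variable names are assumptions made for this example.

```python
import numpy as np

def gram_matrices(n=20000, p=5, sigma_u=0.5, seed=0):
    """Simulate W = X + U and return true, naive and corrected Gram matrices.

    Assumption: the measurement error U is independent of X, with known
    covariance sigma_u**2 * I. Real applications must estimate it.
    """
    rng = np.random.default_rng(seed)
    X = rng.standard_normal((n, p))            # true (unobserved) covariates
    U = sigma_u * rng.standard_normal((n, p))  # additive measurement error
    W = X + U                                  # what is actually observed
    true_gram = X.T @ X / n
    naive_gram = W.T @ W / n                   # diagonal inflated by about sigma_u**2
    corrected_gram = naive_gram - sigma_u**2 * np.eye(p)
    return true_gram, naive_gram, corrected_gram

true_gram, naive_gram, corrected_gram = gram_matrices()
naive_err = np.abs(naive_gram - true_gram).max()
corrected_err = np.abs(corrected_gram - true_gram).max()
print(f"max abs error, naive:     {naive_err:.3f}")
print(f"max abs error, corrected: {corrected_err:.3f}")
```

The naive Gram matrix overstates the variance of each covariate, which is one way to see why an uncorrected penalized fit can behave differently from a fit on perfectly measured data.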
Pair-copula constructions of multiple dependence
Building on the work of Bedford, Cooke and Joe, we show how multivariate data, which exhibit complex patterns of dependence in the tails, can be modelled using a cascade of pair-copulae, acting on two variables at a time. We use the pair-copula decomposition of a general multivariate distribution and propose a method to perform inference. The model construction is hierarchical in nature, the various levels corresponding to the incorporation of more variables in the conditioning sets, using pair-copulae as simple building blocks. Pair-copula decomposed models also represent a very flexible way to construct higher-dimensional copulae. We apply the methodology to a financial data set. Our approach represents the first step towards the development of an unsupervised algorithm that explores the space of possible pair-copula models and that can also be applied to huge data sets automatically.
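To make the idea of a pair-copula decomposition concrete, the sketch below builds a three-dimensional density from standard normal margins, two unconditional Gaussian pair-copulae and one conditional pair-copula, then checks it against the trivariate normal density. The Gaussian choice, the correlation values and all function names are assumptions for this example only. The abstract targets tail dependence, which a Gaussian copula does not have, so this serves purely as a check of the decomposition.

```python
import math
from statistics import NormalDist

import numpy as np

N = NormalDist()  # standard normal: cdf, pdf, inv_cdf

def gauss_copula_density(u, v, rho):
    """Density of the bivariate Gaussian copula with correlation rho."""
    x, y = N.inv_cdf(u), N.inv_cdf(v)
    r2 = rho * rho
    expo = -(r2 * (x * x + y * y) - 2 * rho * x * y) / (2 * (1 - r2))
    return math.exp(expo) / math.sqrt(1 - r2)

def h(x, x_cond, rho):
    """Conditional cdf F(x | x_cond) for a standard bivariate normal pair."""
    return N.cdf((x - rho * x_cond) / math.sqrt(1 - rho * rho))

def pair_copula_density(x1, x2, x3, r12, r23, r13_given_2):
    """f = f1 f2 f3 * c12 * c23 * c13|2, conditioning on variable 2."""
    u1, u2, u3 = N.cdf(x1), N.cdf(x2), N.cdf(x3)
    margins = N.pdf(x1) * N.pdf(x2) * N.pdf(x3)
    level1 = gauss_copula_density(u1, u2, r12) * gauss_copula_density(u2, u3, r23)
    level2 = gauss_copula_density(h(x1, x2, r12), h(x3, x2, r23), r13_given_2)
    return margins * level1 * level2

def mvn_density(x, corr):
    """Zero-mean multivariate normal density with correlation matrix corr."""
    x = np.asarray(x, dtype=float)
    quad = float(x @ np.linalg.solve(corr, x))
    norm = math.sqrt((2 * math.pi) ** len(x) * np.linalg.det(corr))
    return math.exp(-0.5 * quad) / norm

r12, r23, r13 = 0.5, 0.3, 0.4
# Partial correlation of variables 1 and 3 given variable 2.
r13_given_2 = (r13 - r12 * r23) / math.sqrt((1 - r12**2) * (1 - r23**2))
corr = np.array([[1, r12, r13], [r12, 1, r23], [r13, r23, 1]])
point = (0.3, -0.5, 1.0)

pcc_value = pair_copula_density(*point, r12, r23, r13_given_2)
mvn_value = mvn_density(point, corr)
print(pcc_value, mvn_value)
```

Swapping `gauss_copula_density` for a copula family with tail dependence at either level would keep the same cascade structure while changing the dependence pattern, which is the flexibility the abstract refers to.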
Diverse personalized recommendations with uncertainty from implicit preference data with the Bayesian Mallows Model
Clicking data, which exists in abundance and contains objective user
preference information, is widely used to produce personalized recommendations
in web-based applications. Current popular recommendation algorithms, typically
based on matrix factorizations, often have high accuracy and achieve good
clickthrough rates. However, diversity of the recommended items, which can
greatly enhance user experiences, is often overlooked. Moreover, most
algorithms do not produce interpretable uncertainty quantifications of the
recommendations. In this work, we propose the Bayesian Mallows for Clicking
Data (BMCD) method, which augments clicking data into compatible full ranking
vectors by enforcing all the clicked items to be top-ranked. User preferences
are learned using a Mallows ranking model. Bayesian inference leads to
interpretable uncertainties of each individual recommendation, and we also
propose a method to make personalized recommendations based on such
uncertainties. With a simulation study and a real-life data example, we
demonstrate that compared to state-of-the-art matrix factorization, BMCD makes
personalized recommendations with similar accuracy, while achieving a much higher
level of diversity, and producing interpretable and actionable uncertainty
estimation.
Comment: 27 pages
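The augmentation step described in this abstract, turning a user's clicks into a full ranking in which every clicked item is top-ranked, can be sketched as follows. This is only a minimal illustration: the uniform random ordering within the clicked and unclicked groups, the footrule distance, the `-alpha / n` scaling of the Mallows kernel and all function names are assumptions for this example and are not taken from the BMCD paper.

```python
import numpy as np

def augment_clicks(clicked, n_items, rng):
    """Return a rank vector (1 = best) compatible with the clicks.

    Clicked items receive ranks 1..c in random order; the remaining items
    receive ranks c+1..n_items in random order.
    """
    clicked = np.asarray(sorted(set(clicked)), dtype=int)
    unclicked = np.setdiff1d(np.arange(n_items), clicked)
    ranks = np.empty(n_items, dtype=int)
    ranks[rng.permutation(clicked)] = np.arange(1, len(clicked) + 1)
    ranks[rng.permutation(unclicked)] = np.arange(len(clicked) + 1, n_items + 1)
    return ranks

def footrule(r, rho):
    """Footrule distance: sum of absolute rank differences."""
    return int(np.abs(np.asarray(r) - np.asarray(rho)).sum())

def mallows_log_kernel(r, rho, alpha):
    """Unnormalized Mallows log-density of ranking r around consensus rho."""
    return -alpha / len(r) * footrule(r, rho)

rng = np.random.default_rng(1)
ranks = augment_clicks(clicked=[2, 5], n_items=6, rng=rng)
consensus = np.arange(1, 7)  # hypothetical consensus ranking
print(ranks, mallows_log_kernel(ranks, consensus, alpha=2.0))
```

In a Bayesian treatment the augmented ranking would be resampled repeatedly rather than drawn once, so that uncertainty about the unobserved part of the ranking carries through to the recommendations.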
Unsupervised empirical Bayesian multiple testing with external covariates
In an empirical Bayesian setting, we provide a new multiple testing method,
useful when an additional covariate is available, that influences the
probability of each null hypothesis being true. We measure the posterior
significance of each test conditionally on the covariate and the data, leading
to greater power. Using covariate-based prior information in an unsupervised
fashion, we produce a list of significant hypotheses which differs in length
and order from the list obtained by methods not taking covariate-information
into account. Covariate-modulated posterior probabilities of each null
hypothesis are estimated using a fast approximate algorithm. The new method is
applied to expression quantitative trait loci (eQTL) data.
Comment: Published at http://dx.doi.org/10.1214/08-AOAS158 in the Annals of
Applied Statistics (http://www.imstat.org/aoas/) by the Institute of
Mathematical Statistics (http://www.imstat.org)
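The idea of a covariate-modulated posterior probability of the null can be illustrated with a two-group mixture in which the prior null probability depends on the covariate. The sketch below is a simplified assumption, not the estimation algorithm of the paper: it treats the prior null probability `pi0` as known for each covariate value, and it uses a standard normal null density and a hypothetical unit-variance normal alternative centred at `alt_mean`.

```python
from statistics import NormalDist

def posterior_null(z, pi0, alt_mean=3.0):
    """Posterior probability that the null is true given test statistic z.

    pi0 is the prior null probability implied by the covariate; in practice
    it would be estimated from the data rather than supplied.
    """
    f0 = NormalDist(0.0, 1.0).pdf(z)       # null density
    f1 = NormalDist(alt_mean, 1.0).pdf(z)  # assumed alternative density
    return pi0 * f0 / (pi0 * f0 + (1.0 - pi0) * f1)

z = 2.5
p_unfavourable = posterior_null(z, pi0=0.95)  # covariate suggests a likely null
p_favourable = posterior_null(z, pi0=0.50)    # covariate suggests a likely effect
print(p_unfavourable, p_favourable)
```

The same test statistic yields a smaller posterior null probability when the covariate makes the null less plausible a priori, which is how the covariate can change both the length and the order of the list of significant hypotheses.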
Controls and Solvency II: insurance fraud risk between national and European regulation
The article analyses the European and national measures aimed at combating fraud in the insurance sector, also in view of the recent introduction of the anti-fraud function by IVASS.