Classification of high dimensional data using LASSO ensembles
Urda, D., Franco, L. and Jerez, J.M. (2017). Classification of high dimensional data using LASSO ensembles. Proceedings IEEE SSCI'17, Symposium Series on Computational Intelligence, Honolulu, Hawaii, U.S.A. (2017). ISBN: 978-1-5386-2726-6
The estimation of multivariable predictors with good performance in high-dimensional settings is a crucial task in biomedical contexts. Usually, solutions based on a single machine learning model are provided, while ensemble methods are often overlooked in this area despite
the well-known benefits they offer in terms of predictive performance. In this paper, we describe four ensemble approaches that use LASSO base learners to predict the vital status of a patient from RNA-Seq gene expression data. The results of the analysis carried out on a public breast invasive cancer (BRCA) dataset show that the ensemble approaches statistically significantly outperform the standard LASSO model
considered as the baseline. We also analyse the computational costs of each approach,
providing usage recommendations according to the available computational power.
Universidad de Málaga. Campus de Excelencia Internacional Andalucía Tec
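The abstract does not say which four ensemble schemes are used, so the following is only one generic illustration, not the authors' method: bagging of LASSO-penalised (L1) logistic regressions, with the base learners' probabilities averaged. All names, the penalty lam, the learning rate and the soft-voting rule are our own assumptions; each base learner is fitted by proximal gradient descent (ISTA), where the soft-thresholding step is what sets small coefficients exactly to zero.

```python
import math
import random
from typing import List, Sequence, Tuple

Model = Tuple[List[float], float]  # (coefficients, intercept)


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def soft_threshold(z: float, t: float) -> float:
    """Proximal operator of t*|.|: shrink z toward 0 by t (exact zeros give sparsity)."""
    if z > t:
        return z - t
    if z < -t:
        return z + t
    return 0.0


def fit_lasso_logistic(X: Sequence[Sequence[float]], y: Sequence[int],
                       lam: float = 0.05, lr: float = 0.1, n_iter: int = 200) -> Model:
    """L1-penalised logistic regression via proximal gradient descent (ISTA).

    Minimises mean log-loss + lam * ||w||_1; the intercept is not penalised.
    """
    n, p = len(X), len(X[0])
    w, b = [0.0] * p, 0.0
    for _ in range(n_iter):
        grad_w, grad_b = [0.0] * p, 0.0
        for xi, yi in zip(X, y):
            r = sigmoid(b + sum(wj * xj for wj, xj in zip(w, xi))) - yi
            grad_b += r / n
            for j in range(p):
                grad_w[j] += r * xi[j] / n
        b -= lr * grad_b  # plain gradient step for the unpenalised intercept
        w = [soft_threshold(wj - lr * gj, lr * lam) for wj, gj in zip(w, grad_w)]
    return w, b


def bagged_lasso(X, y, n_models: int = 10, seed: int = 0, **fit_kwargs) -> List[Model]:
    """Fit one LASSO base learner per bootstrap resample of the rows (hypothetical helper)."""
    rng = random.Random(seed)
    n = len(X)
    models = []
    for _ in range(n_models):
        idx = [rng.randrange(n) for _ in range(n)]
        models.append(fit_lasso_logistic([X[i] for i in idx], [y[i] for i in idx], **fit_kwargs))
    return models


def predict_proba(models: List[Model], x: Sequence[float]) -> float:
    """Soft voting: average the base learners' predicted probabilities of class 1."""
    return sum(sigmoid(b + sum(wj * xj for wj, xj in zip(w, x))) for w, b in models) / len(models)
```

Bagging is chosen here only because it is the simplest ensemble to write down; averaging over bootstrap fits also stabilises which genes the L1 penalty selects, a known weakness of a single LASSO when predictors are many and correlated. On real RNA-Seq data the features would normally be standardised first, and a library solver would replace the hand-written ISTA loop.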
High Dimensional Probability
About forty years ago it was realized by several researchers that the
essential features of certain objects of Probability theory, notably Gaussian
processes and limit theorems, may be better understood if they are considered
in settings that do not impose structures extraneous to the problems at hand.
For instance, in the case of sample continuity and boundedness of Gaussian
processes, the essential feature is the metric or pseudometric structure
induced on the index set by the covariance structure of the process, regardless
of what the index set may be. This point of view ultimately led to the
Fernique-Talagrand majorizing measure characterization of sample boundedness
and continuity of Gaussian processes, thus solving an important problem posed
by Kolmogorov. Similarly, separable Banach spaces provided a minimal setting
for the law of large numbers, the central limit theorem and the law of the
iterated logarithm, and this led to the elucidation of the minimal (necessary
and/or sufficient) geometric properties of the space under which different
forms of these theorems hold. However, in light of renewed interest in
Empirical processes, a subject that has considerably influenced modern
Statistics, one had to deal with a non-separable Banach space, namely
$\ell_\infty(\mathcal{F})$. With separability discarded, the techniques developed
for Gaussian processes and for limit theorems and inequalities in separable
Banach spaces, together with combinatorial techniques, led to powerful
inequalities and limit theorems for sums of independent bounded processes over
general index sets, or, in other words, for general empirical processes.
Comment: Published at http://dx.doi.org/10.1214/074921706000000905 in the IMS
Lecture Notes Monograph Series
(http://www.imstat.org/publications/lecnotes.htm) by the Institute of
Mathematical Statistics (http://www.imstat.org
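For orientation, the objects the paragraph refers to can be written out. The notation below ($X_t$, $d_X$, $B(t,\varepsilon)$, $\mu$, $P_n$) is our own choice, not the text's: for a centred Gaussian process $(X_t)_{t\in T}$ the covariance induces the pseudometric $d_X$; the Fernique-Talagrand theorem bounds the expected supremum above and below, up to universal constants ($\asymp$), by the majorizing-measure functional, with the infimum over probability measures $\mu$ on $T$ and $B(t,\varepsilon)$ the $d_X$-ball; and for i.i.d. $X_1,\dots,X_n$ with law $P$ and empirical measure $P_n$, the empirical process lives in $\ell_\infty(\mathcal{F})$ with the sup-norm shown last.

```latex
d_X(s,t) = \left(\mathbb{E}\,(X_s - X_t)^2\right)^{1/2}, \qquad s,t \in T,

\mathbb{E}\sup_{t\in T} X_t \;\asymp\; \inf_{\mu}\,\sup_{t\in T}\int_0^{\infty}\sqrt{\log\frac{1}{\mu\big(B(t,\varepsilon)\big)}}\;d\varepsilon,

\|P_n - P\|_{\mathcal{F}} = \sup_{f\in\mathcal{F}}\left|\frac{1}{n}\sum_{i=1}^{n} f(X_i) - \mathbb{E}\,f(X_1)\right|.
```

The integrand vanishes once $\varepsilon$ exceeds the $d_X$-diameter of $T$ (the ball is then all of $T$ and $\log 1 = 0$), so the integral is effectively over $[0, \operatorname{diam} T]$.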
High-dimensional sequence transduction
We investigate the problem of transforming an input sequence into a
high-dimensional output sequence in order to transcribe polyphonic audio music
into symbolic notation. We introduce a probabilistic model based on a recurrent
neural network that is able to learn realistic output distributions given the
input and we devise an efficient algorithm to search for the global mode of
that distribution. The resulting method produces musically plausible
transcriptions even under high levels of noise and drastically outperforms
previous state-of-the-art approaches on five datasets of synthesized sounds and
real recordings, approximately halving the test error rate.
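The paper's own mode-search algorithm is not described in enough detail here to reproduce, so as a point of reference only, the following sketches standard beam search, a common approximate way to find a high-probability output sequence under an autoregressive model. The model interface `step_logprobs(prefix)` and the two-note piano-roll model are hypothetical stand-ins for an RNN's conditional output distribution.

```python
import heapq
import math
from typing import Callable, Iterable, List, Tuple

Prefix = Tuple[int, ...]  # partial output sequence (one symbol per time step)


def beam_search(
    step_logprobs: Callable[[Prefix], Iterable[Tuple[int, float]]],
    length: int,
    beam_width: int = 3,
) -> Tuple[Prefix, float]:
    """Approximate the most probable output sequence of a fixed length.

    step_logprobs(prefix) yields (symbol, log p(symbol | prefix)) pairs,
    playing the role of a conditional model such as an RNN output layer.
    """
    beam: List[Tuple[float, Prefix]] = [(0.0, ())]
    for _ in range(length):
        candidates = [
            (score + lp, prefix + (sym,))
            for score, prefix in beam
            for sym, lp in step_logprobs(prefix)
        ]
        # Keep only the beam_width highest-scoring partial sequences.
        beam = heapq.nlargest(beam_width, candidates, key=lambda c: c[0])
    best_score, best_seq = max(beam, key=lambda c: c[0])
    return best_seq, best_score


def toy_piano_roll_model(prefix: Prefix) -> Iterable[Tuple[int, float]]:
    """Hypothetical 2-note piano roll: each frame is a bitmask in {0, 1, 2, 3}.

    The first frame is uniform; afterwards each note independently keeps its
    previous on/off state with probability 0.8.
    """
    for frame in range(4):
        if not prefix:
            p = 0.25
        else:
            kept = 2 - bin(frame ^ prefix[-1]).count("1")  # notes unchanged
            p = 0.8 ** kept * 0.2 ** (2 - kept)
        yield frame, math.log(p)
```

For example, `beam_search(toy_piano_roll_model, length=3, beam_width=2)` returns a sequence and its log-probability. Note that enumerating every frame at each step, as done here, costs $2^k$ candidates for $k$ notes, which is only feasible for tiny $k$; with realistic high-dimensional outputs this is exactly why a more specialised search is needed.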
High-dimensional neutrino masses
For Majorana neutrino masses the lowest-dimensional operator possible is the
Weinberg operator at $d=5$. Here we discuss the possibility that neutrino
masses originate from higher-dimensional operators. Specifically, we consider
all tree-level decompositions of the $d=9$, $d=11$ and $d=13$ neutrino mass
operators. With renormalizable interactions only, we find 18 topologies and 66
diagrams for $d=9$, and 92 topologies plus 504 diagrams at the $d=11$ level. At
$d=13$ there are already 576 topologies and 4199 diagrams. However, among all
these there are only very few genuine neutrino mass models: at $d=(9,11,13)$ we
find only (2,2,2) genuine diagrams and a total of (2,2,6) models. Here, a model
is considered genuine at level $d$ if it automatically forbids lower-order
neutrino masses without the use of additional symmetries. We also briefly
discuss how neutrino masses and angles can easily be fitted in these
high-dimensional models.
Comment: Coincides with published version in JHE
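Why higher $d$ matters can be seen from a back-of-the-envelope estimate. Assuming (our assumption, not stated in the abstract) that the dimension-$d$ operator has the generic form $LLHH(H^\dagger H)^{(d-5)/2}$ with coefficient $c/\Lambda^{d-4}$, electroweak symmetry breaking with $\langle H\rangle = v \approx 174$ GeV gives $m_\nu \sim c\,v^{d-3}/\Lambda^{d-4}$, so a given neutrino mass requires a lower new-physics scale $\Lambda$ as $d$ grows. The helper name `cutoff_scale`, the target $m_\nu = 0.05$ eV and $c = 1$ are illustrative choices; loop factors and $O(1)$ coefficients are ignored.

```python
def cutoff_scale(d: int, m_nu_gev: float = 5e-11, v_gev: float = 174.0, c: float = 1.0) -> float:
    """Hypothetical helper: scale Lambda (GeV) solving m_nu = c * v**(d-3) / Lambda**(d-4).

    Pure dimensional analysis for an operator LLHH(H^dag H)^((d-5)/2);
    O(1) factors and loop suppression are ignored.
    """
    if d < 5 or d % 2 == 0:
        raise ValueError("operators LLHH(H^dag H)^k have odd dimension d >= 5")
    return (c * v_gev ** (d - 3) / m_nu_gev) ** (1.0 / (d - 4))


if __name__ == "__main__":
    for d in (5, 7, 9, 11, 13):
        print(f"d = {d:2d}: Lambda ~ {cutoff_scale(d):.1e} GeV")
```

At $d=5$ this reproduces the familiar seesaw-like scale $v^2/m_\nu$; each step up in $d$ brings $\Lambda$ down by orders of magnitude, which is what makes genuine high-dimensional models potentially testable.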
On high-dimensional sign tests
Sign tests are among the most successful procedures in multivariate
nonparametric statistics. In this paper, we consider several testing problems
in multivariate analysis, directional statistics and multivariate time series
analysis, and we show that, under appropriate symmetry assumptions, the
fixed-$p$ multivariate sign tests remain valid in the high-dimensional case.
Remarkably, our asymptotic results are universal, in the sense that, unlike in
most previous works in high-dimensional statistics, $p$ may go to infinity in
an arbitrary way as $n$ does. We conduct simulations that (i) confirm our
asymptotic results, (ii) reveal that, even for relatively large $p$, chi-square
critical values are to be favoured over the (asymptotically equivalent)
Gaussian ones and (iii) show that, for testing i.i.d.-ness against serial
dependence in the high-dimensional case, Portmanteau sign tests outperform
their competitors in terms of validity-robustness.
Comment: Published at http://dx.doi.org/10.3150/15-BEJ710 in the Bernoulli
(http://isi.cbs.nl/bernoulli/) by the International Statistical
Institute/Bernoulli Society (http://isi.cbs.nl/BS/bshome.htm
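As one concrete instance of a fixed-$p$ sign test (not necessarily one of the problems treated in the paper), the classical spatial sign test of location can be sketched as follows. It assumes that under the null the distribution is spherically symmetric about the origin, so the spatial signs $U_i = X_i/\|X_i\|$ have mean zero and covariance $I_p/p$; then $Q = (p/n)\,\|\sum_i U_i\|^2$ has exactly mean $p$ and is asymptotically $\chi^2_p$ for fixed $p$, while $z = (Q-p)/\sqrt{2p}$ is the Gaussian standardisation. The function name is our own.

```python
import math
from typing import Sequence, Tuple


def spatial_sign_test(X: Sequence[Sequence[float]]) -> Tuple[float, float]:
    """Spatial sign statistic for H0: spherical symmetry about the origin.

    Returns (Q, z) with Q = (p/n) * ||sum_i X_i/||X_i|| ||^2, to be compared
    with chi-square(p) critical values, and z = (Q - p) / sqrt(2p).
    """
    n, p = len(X), len(X[0])
    total = [0.0] * p
    for x in X:
        norm = math.sqrt(sum(v * v for v in x))
        if norm == 0.0:
            continue  # the spatial sign of the origin is taken to be 0
        for j in range(p):
            total[j] += x[j] / norm
    Q = p / n * sum(t * t for t in total)
    z = (Q - p) / math.sqrt(2 * p)
    return Q, z
```

Only the directions of the observations enter, so the statistic is unchanged by rescaling any $X_i$. In practice $Q$ would be compared with a $\chi^2_p$ quantile, for instance `scipy.stats.chi2.ppf(0.95, p)` if SciPy is available; the simulations summarised above suggest preferring this to the Gaussian critical value for $z$ even when $p$ is fairly large.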