    Classification of high dimensional data using LASSO ensembles

    Urda, D., Franco, L. and Jerez, J.M. (2017). Classification of high dimensional data using LASSO ensembles. Proceedings IEEE SSCI'17, Symposium Series on Computational Intelligence, Honolulu, Hawaii, U.S.A. (2017). ISBN: 978-1-5386-2726-6
    The estimation of multivariable predictors with good performance in high dimensional settings is a crucial task in biomedical contexts. Usually, solutions based on a single machine learning model are provided, while ensemble methods are often overlooked in this area despite their well-known benefits in terms of predictive performance. In this paper, four ensemble approaches are described that use LASSO base learners to predict the vital status of a patient from RNA-Seq gene expression data. The results of the analysis carried out on a public breast invasive cancer (BRCA) dataset show that the ensemble approaches outperform, with statistical significance, the standard LASSO model considered as the baseline case. We also analyse the computational costs involved in each of the approaches and provide usage recommendations according to the available computational power.
    Universidad de Málaga. Campus de Excelencia Internacional Andalucía Tec

    High Dimensional Probability

    About forty years ago it was realized by several researchers that the essential features of certain objects of Probability theory, notably Gaussian processes and limit theorems, may be better understood if they are considered in settings that do not impose structures extraneous to the problems at hand. For instance, in the case of sample continuity and boundedness of Gaussian processes, the essential feature is the metric or pseudometric structure induced on the index set by the covariance structure of the process, regardless of what the index set may be. This point of view ultimately led to the Fernique-Talagrand majorizing measure characterization of sample boundedness and continuity of Gaussian processes, thus solving an important problem posed by Kolmogorov. Similarly, separable Banach spaces provided a minimal setting for the law of large numbers, the central limit theorem and the law of the iterated logarithm, and this led to the elucidation of the minimal (necessary and/or sufficient) geometric properties of the space under which different forms of these theorems hold. However, in light of renewed interest in Empirical processes, a subject that has considerably influenced modern Statistics, one had to deal with a non-separable Banach space, namely $\mathcal{L}_{\infty}$. With separability discarded, the techniques developed for Gaussian processes and for limit theorems and inequalities in separable Banach spaces, together with combinatorial techniques, led to powerful inequalities and limit theorems for sums of independent bounded processes over general index sets, or, in other words, for general empirical processes.
    Comment: Published at http://dx.doi.org/10.1214/074921706000000905 in the IMS Lecture Notes Monograph Series (http://www.imstat.org/publications/lecnotes.htm) by the Institute of Mathematical Statistics (http://www.imstat.org

    High-dimensional sequence transduction

    We investigate the problem of transforming an input sequence into a high-dimensional output sequence in order to transcribe polyphonic audio music into symbolic notation. We introduce a probabilistic model based on a recurrent neural network that is able to learn realistic output distributions given the input and we devise an efficient algorithm to search for the global mode of that distribution. The resulting method produces musically plausible transcriptions even under high levels of noise and drastically outperforms previous state-of-the-art approaches on five datasets of synthesized sounds and real recordings, approximately halving the test error rate.

    High-dimensional neutrino masses

    For Majorana neutrino masses the lowest dimensional operator possible is the Weinberg operator at $d=5$. Here we discuss the possibility that neutrino masses originate from higher dimensional operators. Specifically, we consider all tree-level decompositions of the $d=9$, $d=11$ and $d=13$ neutrino mass operators. With renormalizable interactions only, we find 18 topologies and 66 diagrams for $d=9$, and 92 topologies plus 504 diagrams at the $d=11$ level. At $d=13$ there are already 576 topologies and 4199 diagrams. However, among all these there are only very few genuine neutrino mass models: at $d=(9,11,13)$ we find only (2,2,2) genuine diagrams and a total of (2,2,6) models. Here, a model is considered genuine at level $d$ if it automatically forbids lower order neutrino masses without the use of additional symmetries. We also briefly discuss how neutrino masses and angles can be easily fitted in these high-dimensional models.
    Comment: Coincides with published version in JHE

    On high-dimensional sign tests

    Sign tests are among the most successful procedures in multivariate nonparametric statistics. In this paper, we consider several testing problems in multivariate analysis, directional statistics and multivariate time series analysis, and we show that, under appropriate symmetry assumptions, the fixed-$p$ multivariate sign tests remain valid in the high-dimensional case. Remarkably, our asymptotic results are universal, in the sense that, unlike in most previous works in high-dimensional statistics, $p$ may go to infinity in an arbitrary way as $n$ does. We conduct simulations that (i) confirm our asymptotic results, (ii) reveal that, even for relatively large $p$, chi-square critical values are to be favoured over the (asymptotically equivalent) Gaussian ones and (iii) show that, for testing i.i.d.-ness against serial dependence in the high-dimensional case, Portmanteau sign tests outperform their competitors in terms of validity-robustness.
    Comment: Published at http://dx.doi.org/10.3150/15-BEJ710 in the Bernoulli (http://isi.cbs.nl/bernoulli/) by the International Statistical Institute/Bernoulli Society (http://isi.cbs.nl/BS/bshome.htm
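
    As a rough illustration of what a fixed-$p$ multivariate sign test looks like, the sketch below computes the classical spatial sign statistic for location, Q = (p/n) ||U_1 + ... + U_n||^2 with U_i = X_i/||X_i||, which for fixed p and spherical symmetry about the origin is asymptotically chi-square with p degrees of freedom. The choice of this particular test, the function names and the sample sizes are assumptions made for illustration; the abstract does not say which statistics the paper studies. The standardized value (Q - p)/sqrt(2p) is the quantity one would compare with Gaussian critical values.

```python
import math
import random


def spatial_sign(x):
    """Return x / ||x||, the projection of x onto the unit sphere."""
    norm = math.sqrt(sum(v * v for v in x))
    if norm == 0.0:
        raise ValueError("the spatial sign of the zero vector is undefined")
    return [v / norm for v in x]


def sign_test_statistic(sample):
    """Spatial sign statistic Q = (p / n) * ||sum_i U_i||^2 for H0: centre at 0."""
    n = len(sample)
    p = len(sample[0])
    total = [0.0] * p
    for x in sample:
        for j, u in enumerate(spatial_sign(x)):
            total[j] += u
    return (p / n) * sum(t * t for t in total)


def standardized_statistic(q, p):
    """(Q - p) / sqrt(2p): Q centred and scaled by the chi-square(p) mean and sd."""
    return (q - p) / math.sqrt(2.0 * p)


if __name__ == "__main__":
    rng = random.Random(0)
    n, p = 50, 200  # hypothetical sizes, chosen so that p exceeds n
    sample = [[rng.gauss(0.0, 1.0) for _ in range(p)] for _ in range(n)]
    q = sign_test_statistic(sample)
    print(q, standardized_statistic(q, p))
```

    Only the directions U_i enter the statistic, so rescaling any observation leaves Q unchanged; this is the sense in which such tests rely on symmetry assumptions rather than on moments.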