
    Estimation of extended mixed models using latent classes and latent processes: the R package lcmm

    The R package lcmm provides a series of functions to estimate statistical models based on linear mixed model theory. It includes the estimation of mixed models and latent class mixed models for Gaussian longitudinal outcomes (hlme), curvilinear and ordinal univariate longitudinal outcomes (lcmm), and curvilinear multivariate outcomes (multlcmm), as well as joint latent class mixed models (Jointlcmm) for a (Gaussian or curvilinear) longitudinal outcome and a time-to-event outcome that can be possibly left-truncated, right-censored, and defined in a competing risks setting. Maximum likelihood estimators are obtained using a modified Marquardt algorithm with strict convergence criteria based on parameter and likelihood stability and on the negativity of the second derivatives. The package also provides various post-fit functions, including goodness-of-fit analyses, classification, plots, predicted trajectories, individual dynamic prediction of the event, and predictive accuracy assessment. This paper constitutes a companion paper to the package, introducing each family of models, the estimation technique, and some implementation details, and giving examples through a dataset on cognitive aging.
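
The Marquardt-style iteration and the three stopping rules named in the abstract (parameter stability, likelihood stability, negativity of the second derivative) can be sketched on a toy one-parameter problem. This is a hypothetical Poisson-rate example, not code from the lcmm package:

```python
import math

# Toy data: Poisson counts; estimate the rate lambda by maximum likelihood
# with a Marquardt-style damped Newton iteration that stops only when the
# parameter is stable, the log-likelihood is stable, and the second
# derivative is negative.
y = [2, 3, 1, 4, 2, 3, 5, 2]
n, s = len(y), sum(y)

def loglik(lam):  # Poisson log-likelihood up to an additive constant
    return s * math.log(lam) - n * lam

lam, ridge = 1.0, 1e-3            # starting value and damping factor
for _ in range(100):
    score = s / lam - n           # first derivative
    hess = -s / lam ** 2          # second derivative (negative near optimum)
    step = -score / (hess - ridge)  # damped Newton (Marquardt) step
    new_lam = lam + step
    if (abs(step) < 1e-8
            and abs(loglik(new_lam) - loglik(lam)) < 1e-8
            and hess < 0):
        lam = new_lam
        break
    lam = new_lam

print(round(lam, 6))  # MLE of a Poisson rate is the sample mean, 2.75
```

The damping term keeps the step well-defined even where the Hessian is near singular, which is the role the modified Marquardt algorithm plays in the package.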

    Central subspaces review: methods and applications

    Central subspaces have long been a key concept for sufficient dimension reduction. Initially, central subspace methods were constructed for solving problems in the p ≤ n setting. In this article we review the theory of central subspaces and give an updated overview of central subspace methods for the p ≤ n, p > n and big data settings. We also develop a new classification system for these techniques and list some R and MATLAB packages that can be used for estimating the central subspace. Finally, we develop a central subspace framework for bioinformatics applications and show, using two distinct data sets, how this framework can be applied in practice.

    A semiparametric approach for a multivariate sample selection model

    Most of the common estimation methods for sample selection models rely heavily on parametric and normality assumptions. We consider in this paper a multivariate semiparametric sample selection model and develop a geometric approach to the estimation of the slope vectors in the outcome equation and in the selection equation. Contrary to most existing methods, we deal symmetrically with both slope vectors. Moreover, the estimation method is link-free and distribution-free. It works in two main steps: a multivariate sliced inverse regression step and a canonical analysis step. We establish √n-consistency and asymptotic normality of the estimates. We describe how to estimate the observation and selection link functions. The theory is illustrated with a simulation study.

    Choice of estimators based on the Kullback-Leibler risk

    Estimator choice is a crucial topic in statistics. The most famous criterion is the Akaike information criterion (AIC). It was constructed as an approximation, up to a constant, of the Kullback-Leibler risk. However, an AIC value in isolation has no direct interpretation, and its variability is often ignored. We propose several approaches to estimate Kullback-Leibler risks. The criteria defined can be used in parametric, non-parametric or semi-parametric contexts. An extension of these criteria to incomplete data is presented, and the issue of choosing estimators in the presence of incomplete data is discussed. Several applications in the survival framework are described: choosing smooth estimators of the hazard function, choosing between the proportional hazards model and the stratified model, and choosing between Markov and non-Markov models. Finally, several criteria are defined for selecting estimators based on different observations.
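
As a minimal illustration of AIC-based estimator choice on hypothetical data (only AIC *differences* carry meaning, which is the point the abstract makes about a value in isolation):

```python
import math

# Compare two Gaussian regression estimators on toy data via AIC.
# Data and model choices are illustrative, not from the thesis.
x = [0, 1, 2, 3, 4, 5, 6, 7]
y = [1.0, 1.4, 2.1, 2.9, 3.6, 4.4, 5.0, 5.7]
n = len(y)

def aic(rss, k):
    # Gaussian AIC up to an additive constant: n*log(RSS/n) + 2*(k + 1),
    # the +1 counting the estimated error variance.
    return n * math.log(rss / n) + 2 * (k + 1)

# Estimator 1: constant mean.
ybar = sum(y) / n
rss1 = sum((v - ybar) ** 2 for v in y)

# Estimator 2: simple linear regression, closed-form least squares.
xbar = sum(x) / n
beta = (sum((a - xbar) * (b - ybar) for a, b in zip(x, y))
        / sum((a - xbar) ** 2 for a in x))
alpha = ybar - beta * xbar
rss2 = sum((b - (alpha + beta * a)) ** 2 for a, b in zip(x, y))

# Lower AIC wins; here the linear fit is preferred.
print(aic(rss1, 1) > aic(rss2, 2))
```

Neither AIC value means anything by itself; only the comparison between the two candidates does.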

    Penalized Partial Least Square applied to structured data

    Nowadays, high-dimensional data analysis is widespread. High-dimensional datasets are often built by gathering different independent data sets. However, each independent set can introduce its own bias. We can cope with this bias by introducing the observation-set structure into our model. The goal of this article is to build the theoretical background for the dimension-reduction method sparse Partial Least Squares (sPLS) in the context of data presenting such an observation-set structure. The innovation consists in building different sPLS models and linking them through a common Lasso penalization. This theory could be applied to any field where observations present this kind of structure and could therefore improve sPLS in domains where it is competitive. Furthermore, it can be extended to the particular case where variables can be gathered into a priori groups, in which case sPLS becomes sparse group Partial Least Squares.

    COMBSS: Best Subset Selection via Continuous Optimization

    The problem of best subset selection in linear regression is considered, with the aim of finding a fixed-size subset of features that best fits the response. This is particularly challenging when the total number of available features is very large compared to the number of data samples. Existing optimal methods for solving this problem tend to be slow, while fast methods tend to have low accuracy. Ideally, new methods would perform best subset selection faster than existing optimal methods but with comparable accuracy, or be more accurate than methods of comparable computational speed. Here, we propose a novel continuous optimization method that identifies a subset solution path, a small set of models of varying size consisting of candidates for the single best subset of features, that is optimal in a specific sense in linear regression. Our method turns out to be fast, making best subset selection possible when the number of features is well in excess of thousands. Because of its outstanding overall performance, framing the best subset selection challenge as a continuous optimization problem opens new research directions for feature extraction for a large variety of regression models.
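
For contrast, the exact exhaustive search that continuous relaxations such as COMBSS are designed to avoid can be written directly for tiny p. This is the slow baseline on illustrative data, not the COMBSS algorithm itself:

```python
import itertools

# Exhaustive best subset selection: try every feature subset of a fixed
# size and keep the one with the smallest residual sum of squares.
# Cost grows combinatorially in p, which is why it breaks down for large p.
X = [[1, 0, 2], [2, 1, 1], [3, 0, 0], [4, 1, 3], [5, 0, 1], [6, 1, 2]]
y = [2 * r[0] + 3 * r[2] for r in X]   # truth uses features 0 and 2 only

def solve(A, b):
    """Gauss-Jordan elimination with partial pivoting for a small system."""
    k = len(A)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for c in range(k):
        p = max(range(c, k), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(k):
            if r != c:
                f = M[r][c] / M[c][c]
                M[r] = [a - f * v for a, v in zip(M[r], M[c])]
    return [M[i][k] / M[i][i] for i in range(k)]

def rss(cols):
    """Least-squares RSS for the subset via the normal equations."""
    Xs = [[row[j] for j in cols] for row in X]
    k = len(cols)
    A = [[sum(r[i] * r[j] for r in Xs) for j in range(k)] for i in range(k)]
    b = [sum(r[i] * t for r, t in zip(Xs, y)) for i in range(k)]
    beta = solve(A, b)
    return sum((t - sum(c * v for c, v in zip(beta, r))) ** 2
               for r, t in zip(Xs, y))

best = min(itertools.combinations(range(3), 2), key=rss)
print(best)  # → (0, 2)
```

With p features and subset size k this evaluates C(p, k) least-squares fits, which is exactly the exponential wall a continuous subset solution path sidesteps.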

    Decision-making analysis with AHP/ANP of renewable energies in the Dominican Republic

    The Dominican Republic offers a foreign-investment opportunity by establishing an advantageous legal framework for developing the energy sector through renewable energies. This situation leads us to believe that the number of renewable-energy projects will grow, and with it the need to use different decision-making methods in order to select suitable alternatives to invest in. In this study, we compare the Analytic Hierarchy Process and the Analytic Network Process methodologies to select the most suitable renewable-energy system for self-consumption in residential buildings in Santo Domingo.

    ClustOfVar: An R Package for the Clustering of Variables

    Clustering of variables is a way to arrange variables into homogeneous clusters, i.e., groups of variables that are strongly related to each other and thus carry the same information. These approaches can be useful for dimension reduction and variable selection. Several specific methods have been developed for the clustering of numerical variables. However, for qualitative variables or mixtures of quantitative and qualitative variables, far fewer methods have been proposed. The R package ClustOfVar was specifically developed for this purpose. The homogeneity criterion of a cluster is defined as the sum of correlation ratios (for qualitative variables) and squared correlations (for quantitative variables) with a synthetic quantitative variable that summarizes "as well as possible" the variables in the cluster. This synthetic variable is the first principal component obtained with the PCAMIX method. Two algorithms for the clustering of variables are proposed: an iterative relocation algorithm and ascendant hierarchical clustering. We also propose a bootstrap approach to determine a suitable number of clusters. We illustrate the methodologies and the associated package on small datasets.
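
The iterative relocation idea can be sketched for the purely quantitative case. This is a simplified illustration on made-up data, not the ClustOfVar implementation: each cluster's synthetic variable is its first principal component (here found by power iteration on the cluster's correlation matrix), and variables move to the cluster whose synthetic variable gives them the largest squared correlation; the qualitative-variable and PCAMIX machinery is omitted.

```python
import math

def standardize(v):
    m = sum(v) / len(v)
    s = math.sqrt(sum((x - m) ** 2 for x in v) / len(v))
    return [(x - m) / s for x in v]

def corr(u, v):  # correlation of two standardized variables
    return sum(a * b for a, b in zip(u, v)) / len(u)

def synthetic(members):
    """First principal component of a (non-empty) cluster of standardized variables."""
    k = len(members)
    C = [[corr(members[i], members[j]) for j in range(k)] for i in range(k)]
    w = [1.0 / (j + 1) for j in range(k)]            # asymmetric start
    for _ in range(100):                             # power iteration
        w = [sum(C[i][j] * w[j] for j in range(k)) for i in range(k)]
        norm = math.sqrt(sum(x * x for x in w))
        w = [x / norm for x in w]
    return standardize([sum(wi * m[t] for wi, m in zip(w, members))
                        for t in range(len(members[0]))])

# Four variables over six observations; 0,1 are monotone, 2,3 oscillate.
data = [[1, 2, 3, 4, 5, 6], [2, 3, 3, 5, 6, 6],
        [5, 1, 4, 2, 5, 1], [6, 2, 5, 1, 6, 2]]
Z = [standardize(v) for v in data]
labels = [0, 0, 0, 1]                                # poor starting partition

for _ in range(20):
    synth = [synthetic([z for z, l in zip(Z, labels) if l == c])
             for c in (0, 1)]
    new = [max((0, 1), key=lambda c: corr(z, synth[c]) ** 2) for z in Z]
    if new == labels:
        break
    labels = new

print(labels)  # → [0, 0, 1, 1]: the two hidden blocks are recovered
```

Squared correlation makes the criterion sign-invariant, so anti-correlated variables can still share a cluster, which is one reason the package uses a principal component rather than a plain average as the synthetic variable.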