Estimation of extended mixed models using latent classes and latent processes: the R package lcmm
The R package lcmm provides a series of functions to estimate statistical
models based on linear mixed model theory. It includes the estimation of mixed
models and latent class mixed models for Gaussian longitudinal outcomes (hlme),
curvilinear and ordinal univariate longitudinal outcomes (lcmm) and curvilinear
multivariate outcomes (multlcmm), as well as joint latent class mixed models
(Jointlcmm) for a (Gaussian or curvilinear) longitudinal outcome and a
time-to-event that can be left-truncated, right-censored, and defined in a
competing-risks setting. Maximum likelihood estimators are obtained using a modified
Marquardt algorithm with strict convergence criteria based on the parameters
and likelihood stability, and on the negativity of the second derivatives. The
package also provides various post-fit functions including goodness-of-fit
analyses, classification, plots, predicted trajectories, individual dynamic
prediction of the event and predictive accuracy assessment. This paper
constitutes a companion paper to the package, introducing each family of
models, the estimation technique, and some implementation details, and giving
examples through a dataset on cognitive aging.
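The abstract above mentions that convergence is declared only when three criteria hold simultaneously: stability of the parameters, stability of the log-likelihood, and a criterion on the second derivatives. As a rough illustration (in Python, not the package's actual R/Fortran implementation; the function name and thresholds are hypothetical), the three checks can be sketched as:

```python
import numpy as np

def converged(theta_old, theta_new, ll_old, ll_new, grad, hess_inv,
              eps_param=1e-4, eps_ll=1e-4, eps_deriv=1e-3):
    """Illustrative version of a three-part stopping rule: parameter
    stability, log-likelihood stability, and a derivative-based criterion
    that is small only near a genuine maximum (negative-definite Hessian)."""
    crit_param = np.sum((theta_new - theta_old) ** 2)   # parameter stability
    crit_ll = abs(ll_new - ll_old)                      # likelihood stability
    # Relative distance to the optimum: g' H^{-1} g / p
    crit_deriv = grad @ hess_inv @ grad / len(grad)
    return crit_param < eps_param and crit_ll < eps_ll and crit_deriv < eps_deriv
```

A derivative-based criterion of this kind is stricter than watching the likelihood alone, since the likelihood can plateau far from a maximum.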
Central subspaces review: methods and applications
Central subspaces have long been a key concept for sufficient dimension reduction, initially constructed for solving problems in the p ≤ n setting. In this article we review the theory of central subspaces and give an updated overview of central subspace methods for the p ≤ n, p > n and big data settings. We also develop a new classification system for these techniques and list some R and MATLAB packages that can be used for estimating the central subspace. Finally, we develop a central subspace framework for bioinformatics applications and show, using two distinct data sets, how this framework can be applied in practice.
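The classical estimator of the central subspace in the p ≤ n setting is sliced inverse regression (SIR): standardize X, slice on the response, and take the top eigenvectors of the between-slice covariance of the slice means. A minimal sketch (illustrative Python, not taken from any of the packages the review lists):

```python
import numpy as np

def sir_directions(X, y, n_slices=5, n_dirs=1):
    """Sliced inverse regression: estimate central-subspace directions as
    top eigenvectors of Cov(E[Z | y]) in the standardized scale Z."""
    n, p = X.shape
    mu = X.mean(axis=0)
    Sigma = np.cov(X, rowvar=False)
    L = np.linalg.cholesky(Sigma)
    Z = np.linalg.solve(L, (X - mu).T).T          # standardized predictors
    # Slice on y and average Z within each slice
    order = np.argsort(y)
    M = np.zeros((p, p))
    for idx in np.array_split(order, n_slices):
        m = Z[idx].mean(axis=0)
        M += len(idx) / n * np.outer(m, m)
    # Top eigenvectors, mapped back to the original X scale
    vals, vecs = np.linalg.eigh(M)
    B = np.linalg.solve(L.T, vecs[:, ::-1][:, :n_dirs])
    return B / np.linalg.norm(B, axis=0)
```

With a single-index response y = f(β'x) + ε, the leading estimated direction aligns with β up to sign.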
A semiparametric approach for a multivariate sample selection model
Most of the common estimation methods for sample selection models rely heavily on parametric and normality assumptions. We consider in this paper a multivariate semiparametric sample selection model and develop a geometric approach to the estimation of the slope vectors in the outcome equation and in the selection equation. Contrary to most existing methods, we deal symmetrically with both slope vectors. Moreover, the estimation method is link-free and distribution-free. It works in two main steps: a multivariate sliced inverse regression step and a canonical analysis step. We establish √n-consistency and asymptotic normality of the estimates. We describe how to estimate the observation and selection link functions. The theory is illustrated with a simulation study.
Estimator selection based on the Kullback-Leibler risk
Estimator choice is a crucial topic in statistics. The most famous criterion is the Akaike information criterion (AIC), which was constructed as an approximation, up to a constant, of the Kullback-Leibler risk. However, a precise value of the Akaike criterion has no direct interpretation, and its variability is often ignored. We propose several approaches to estimate Kullback-Leibler risks. The criteria defined can be used in a parametric, non-parametric or semi-parametric context. An extension of these criteria to incomplete data is presented, and the issue of choosing estimators in the presence of incomplete data is described. Several applications in the survival framework are described: choice of smooth estimators for the hazard function, choice between the proportional hazards model and the stratified model, and choice between Markov and non-Markov models. Finally, several criteria are defined for selecting estimators based on different observations.
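To make the AIC/Kullback-Leibler connection concrete, here is a minimal Gaussian linear-model example (illustrative Python, not from the work summarized above): AIC = -2 log-likelihood + 2k, where k counts the free parameters, estimates twice the KL risk up to a model-independent constant, so lower AIC is preferred.

```python
import numpy as np

def aic_gaussian(y, X):
    """AIC for an OLS Gaussian linear model: -2 * loglik + 2 * k.
    Up to a constant, this estimates twice the Kullback-Leibler risk."""
    n, p = X.shape
    beta = np.linalg.lstsq(X, y, rcond=None)[0]
    resid = y - X @ beta
    sigma2 = resid @ resid / n                        # ML variance estimate
    loglik = -0.5 * n * (np.log(2 * np.pi * sigma2) + 1)
    k = p + 1                                         # slopes + error variance
    return -2 * loglik + 2 * k
```

Comparing the AIC of two candidate models on the same data implements the estimator choice the abstract discusses, while its point about variability still applies: the difference of two AIC values is itself a noisy estimate.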
Penalized Partial Least Square applied to structured data
High-dimensional data analysis has become widespread. High-dimensional datasets are often built by gathering several independent data sets; however, each independent set can introduce its own bias. We can cope with this bias by introducing the observation-set structure into the model. The goal of this article is to build the theoretical background for the dimension reduction method sparse Partial Least Squares (sPLS) in the context of data presenting such an observation-set structure. The innovation consists in building different sPLS models and linking them through a common Lasso penalization. This theory could be applied to any field where observations present this kind of structure, and could therefore improve sPLS in domains where it is competitive. Furthermore, it can be extended to the particular case where variables can be gathered into a priori groups, in which case sPLS is defined as a sparse group Partial Least Squares.
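The core sPLS building block that the abstract refers to is simple: the first direction maximizes covariance with the response under an ℓ1 penalty, which reduces to soft-thresholding the covariance vector X'y. A minimal sketch (illustrative Python; the linked common-Lasso construction of the paper is not reproduced here):

```python
import numpy as np

def spls_direction(X, y, lam):
    """First sparse-PLS direction: soft-threshold the covariance vector X'y
    at level lam, then normalize. Entries whose covariance with y is below
    lam in absolute value are set exactly to zero."""
    c = X.T @ y
    w = np.sign(c) * np.maximum(np.abs(c) - lam, 0.0)   # soft thresholding
    norm = np.linalg.norm(w)
    return w / norm if norm > 0 else w
```

The Lasso-type penalty is what makes linking several per-set sPLS models through a common penalization possible, since the sparsity pattern can be shared across sets.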
COMBSS: Best Subset Selection via Continuous Optimization
The problem of best subset selection in linear regression is considered with
the aim to find a fixed size subset of features that best fits the response.
This is particularly challenging when the total available number of features is
very large compared to the number of data samples. Existing optimal methods for
solving this problem tend to be slow while fast methods tend to have low
accuracy. Ideally, new methods would perform best subset selection faster than
existing optimal methods with comparable accuracy, or be more accurate than
methods of comparable computational speed. Here, we propose a novel
continuous optimization method that identifies a subset solution path: a small
set of models of varying size that are candidates for the single best
subset of features, optimal in a specific sense, in linear regression.
Our method turns out to be fast, making the best subset selection possible when
the number of features is well in excess of thousands. Because of the
outstanding overall performance, framing the best subset selection challenge as
a continuous optimization problem opens new research directions for feature
extraction for a large variety of regression models.
Decision-making analysis with AHP/ANP for renewable energies in the Dominican Republic
The Dominican Republic offers an opportunity for foreign investment by establishing an advantageous legal framework for the development of the energy sector through renewable energies. This situation leads us to believe that there will be growth in the number of renewable energy projects, and with it the need to use different decision-making methods in order to select suitable alternatives in which to invest. In this study, we compare the Analytic Hierarchy Process (AHP) and the Analytic Network Process (ANP) methodologies to select the most suitable renewable energy system for self-consumption in residential buildings in Santo Domingo.
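The AHP step that both methodologies share derives priority weights from a pairwise comparison matrix as its principal eigenvector. A minimal sketch (illustrative Python; not taken from the study above):

```python
import numpy as np

def ahp_priorities(A):
    """AHP priority vector: principal eigenvector of the pairwise
    comparison matrix A (A[i, j] = importance of i relative to j),
    normalized so the weights sum to 1."""
    vals, vecs = np.linalg.eig(A)
    w = np.abs(np.real(vecs[:, np.argmax(np.real(vals))]))
    return w / w.sum()
```

For a perfectly consistent matrix (A[i, j] = w_i / w_j), this recovers the underlying weights exactly; ANP generalizes the idea to networks of interdependent criteria via a supermatrix.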
ClustOfVar: An R Package for the Clustering of Variables
Clustering of variables is a way to arrange variables into homogeneous
clusters, i.e., groups of variables which are strongly related to each other
and thus bring the same information. These approaches can then be useful for
dimension reduction and variable selection. Several specific methods have been
developed for the clustering of numerical variables. However, concerning
qualitative variables or mixtures of quantitative and qualitative variables,
far fewer methods have been proposed. The R package ClustOfVar was specifically
developed for this purpose. The homogeneity criterion of a cluster is defined
as the sum of correlation ratios (for qualitative variables) and squared
correlations (for quantitative variables) to a synthetic quantitative variable,
summarizing "as well as possible" the variables in the cluster. This synthetic
variable is the first principal component obtained with the PCAMIX method. Two
algorithms for the clustering of variables are proposed: an iterative relocation
algorithm and ascendant hierarchical clustering. We also propose a bootstrap
approach to determine suitable numbers of clusters. We illustrate the
methodologies and the associated package on small datasets.
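For quantitative variables, the homogeneity criterion described above has a closed form: the sum of squared correlations with the first principal component equals the largest eigenvalue of the cluster's correlation matrix. A minimal sketch (illustrative Python, not the ClustOfVar R implementation, which also handles qualitative variables via PCAMIX):

```python
import numpy as np

def cluster_homogeneity(Z):
    """Homogeneity of a cluster of quantitative variables (columns of Z):
    sum of squared correlations with the synthetic variable (first
    principal component), i.e. the top eigenvalue of the correlation
    matrix. Maximum value = number of variables (perfect redundancy)."""
    R = np.corrcoef(Z, rowvar=False)
    return np.linalg.eigvalsh(R)[-1]
```

The clustering algorithms then seek a partition maximizing the sum of cluster homogeneities.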