Post Selection Shrinkage Estimation for High Dimensional Data Analysis
In high-dimensional data settings, where the number of covariates can exceed
the sample size, many penalized regularization approaches have been studied for
simultaneous variable selection and estimation. However, in the presence of
covariates with weak effects, many existing variable selection methods,
including the Lasso and its generalizations, cannot distinguish covariates with
weak contributions from those with none. Thus, prediction based only on a
subset model of selected covariates can be inefficient. In this
paper, we propose a post selection shrinkage estimation strategy to improve the
prediction performance of a selected subset model. Such a post selection
shrinkage estimator (PSE) is data-adaptive and constructed by shrinking a post
selection weighted ridge estimator in the direction of a selected candidate
subset. Under an asymptotic distributional quadratic risk criterion, its
prediction performance is explored analytically. We show that the proposed PSE
performs better than the post selection weighted ridge estimator. More
importantly, it significantly improves the prediction performance of any
candidate subset model selected by most existing Lasso-type variable selection
methods. The relative performance of the PSE is demonstrated by both simulation
studies and real data analysis.
Comment: 40 pages, 2 figures, discussion paper
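The shrinkage construction described in this abstract can be sketched numerically. The following is a hedged illustration, not the authors' exact estimator: the coordinate-descent Lasso, the plain ridge penalty, the Wald-type distance statistic, and the positive-part Stein weight below are simplified stand-ins for the paper's post selection weighted ridge estimator and its data-adaptive shrinkage.

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 100, 40
beta = np.concatenate([np.full(5, 2.0),      # strong effects
                       np.full(10, 0.1),     # weak effects
                       np.zeros(p - 15)])    # no contribution
X = rng.standard_normal((n, p))
y = X @ beta + rng.standard_normal(n)

def soft_threshold(z, t):
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

def lasso_cd(X, y, lam, iters=200):
    """Plain coordinate-descent Lasso, used here only for subset selection."""
    n, p = X.shape
    b = np.zeros(p)
    col_ss = (X ** 2).sum(axis=0)
    for _ in range(iters):
        for j in range(p):
            r = y - X @ b + X[:, j] * b[j]   # partial residual excluding j
            b[j] = soft_threshold(X[:, j] @ r, n * lam) / col_ss[j]
    return b

S = np.flatnonzero(lasso_cd(X, y, lam=0.1))  # selected candidate subset

# Restricted estimate: least squares on the selected covariates only.
b_sub = np.zeros(p)
b_sub[S], *_ = np.linalg.lstsq(X[:, S], y, rcond=None)

# Ordinary ridge on all covariates (stand-in for the weighted ridge step).
b_ridge = np.linalg.solve(X.T @ X + np.eye(p), X.T @ y)

# Positive-part Stein-type shrinkage of the ridge fit toward the selected
# subset fit, driven by a Wald-type distance between the two estimates.
diff = b_ridge - b_sub
sigma2 = ((y - X @ b_ridge) ** 2).sum() / (n - len(S))
T = diff @ (X.T @ X) @ diff / sigma2
q = p - len(S)                               # dimension shrunk away
w = max(1.0 - (q - 2) / T, 0.0)
b_pse = b_sub + w * (b_ridge - b_sub)
```

When the selected subset captures the signal well, the distance statistic is large and the combined estimate stays close to the subset fit; otherwise it borrows more from the ridge fit.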
Longitudinal High-Dimensional Data Analysis
We develop a flexible framework for modeling high-dimensional functional and imaging data observed longitudinally. The approach decomposes the observed variability of high-dimensional observations measured at multiple visits into three additive components: a subject-specific functional random intercept that quantifies the cross-sectional variability, a subject-specific functional slope that quantifies the dynamic irreversible deformation over multiple visits, and a subject-visit-specific functional deviation that quantifies exchangeable or reversible visit-to-visit changes. The proposed method is very fast, scalable to studies including ultra-high-dimensional data, and can easily be adapted to and executed on modest computing infrastructures. The method is applied to the longitudinal analysis of diffusion tensor imaging (DTI) data of the corpus callosum of multiple sclerosis (MS) subjects. The study includes 176 subjects observed at 466 visits. For each subject and visit, the study contains a registered DTI scan of the corpus callosum at roughly 30,000 voxels.
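The three-component decomposition above can be illustrated with a toy simulation. Everything below (the dimensions, variable names, and the per-subject least-squares recovery step) is an illustrative stand-in, not the paper's actual scalable estimation procedure; the voxel count is shrunk drastically from the roughly 30,000 in the DTI study.

```python
import numpy as np

rng = np.random.default_rng(1)
n_subj, n_visits, n_vox = 20, 4, 50          # tiny stand-in for ~30,000 voxels
t = np.arange(n_visits, dtype=float)         # visit times

U0 = rng.standard_normal((n_subj, n_vox))        # functional random intercept
U1 = 0.5 * rng.standard_normal((n_subj, n_vox))  # functional random slope
W = 0.3 * rng.standard_normal((n_subj, n_visits, n_vox))  # visit deviations

# Observed data: intercept + slope * visit time + visit-specific deviation.
Y = U0[:, None, :] + t[None, :, None] * U1[:, None, :] + W

# Recover each subject's functional intercept and slope by least squares
# over the visit times -- a fast, embarrassingly parallel step per voxel.
design = np.column_stack([np.ones_like(t), t])   # (n_visits, 2)
hat = np.einsum('kj,ijv->ikv', np.linalg.pinv(design), Y)
U0_hat, U1_hat = hat[:, 0, :], hat[:, 1, :]
```

Even with only four visits, the subject-level intercepts and slopes are recovered accurately because the visit-to-visit deviation is the only noise term in each per-voxel regression.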
Viewpoints: A high-performance high-dimensional exploratory data analysis tool
Scientific data sets continue to increase in both size and complexity. In the
past, dedicated graphics systems at supercomputing centers were required to
visualize large data sets, but as the price of commodity graphics hardware has
dropped and its capability has increased, it is now possible, in principle, to
view large complex data sets on a single workstation. To do this in practice,
an investigator will need software that is written to take advantage of the
relevant graphics hardware. The Viewpoints visualization package described
herein is an example of such software. Viewpoints is an interactive tool for
exploratory visual analysis of large, high-dimensional (multivariate) data. It
leverages the capabilities of modern graphics boards (GPUs) to run on a single
workstation or laptop. Viewpoints is minimalist: it attempts to do a small set
of useful things very well (or at least very quickly) in comparison with
similar packages today. Its basic feature set includes linked scatter plots
with brushing, dynamic histograms, normalization and outlier detection/removal.
Viewpoints was originally designed for astrophysicists, but it has since been
used in a variety of fields that range from astronomy, quantum chemistry, fluid
dynamics, machine learning, bioinformatics, and finance to information
technology server log mining. In this article, we describe the Viewpoints
package and show examples of its usage.
Comment: 18 pages, 3 figures, PASP in press; this version corresponds more
closely to that to be published
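Two of the basic features the abstract lists, normalization and outlier detection/removal, can be sketched on a multivariate table in plain numpy. This is not how Viewpoints is implemented (it is a GPU-accelerated interactive tool); it is only a minimal illustration of the operations themselves, with a sigma-clipping threshold chosen arbitrarily here.

```python
import numpy as np

rng = np.random.default_rng(2)
data = rng.standard_normal((1000, 3))        # 1000 rows, 3 variables
data[0] = [25.0, -30.0, 40.0]                # plant an obvious outlier row

# Normalize each column to zero mean and unit variance.
z = (data - data.mean(axis=0)) / data.std(axis=0)

# Flag rows where any column deviates by more than 5 sigma, then drop them.
mask = np.all(np.abs(z) < 5.0, axis=1)
clean = data[mask]
```

In an interactive tool these operations are applied live while linked scatter plots and histograms update; the arithmetic, however, is exactly this simple.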
Distribution-free factor analysis - Estimation theory and applicability to high-dimensional data
We here provide a distribution-free approach to the random factor analysis
model. We show that it leads to the same estimating equations as for the
classical ML estimates under normality, but more easily derived, and valid also
in the case of more variables than observations (p > n). For this case we also
advocate a simple iteration method. In an illustration of this case it was seen
to lead to convergence after just a few iterations. We show that
there is no reason to expect Heywood cases to appear, and that the factor
scores will typically be precisely estimated/predicted as soon as the number of
variables is large. We state as a general conjecture that this nice behaviour
holds not despite the large number of variables, but because of it.
Comment: 12 pages, 2 figures
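The abstract does not spell out the advocated iteration, so the following is a hedged sketch of one generic alternating scheme for a random one-factor model with more variables than observations: estimate scores given loadings, then loadings given scores. It also illustrates the claim that factor scores are precisely recovered once the number of variables is large.

```python
import numpy as np

rng = np.random.default_rng(3)
n, p = 30, 200                               # more variables than observations
lam_true = rng.standard_normal(p)            # true loadings
f_true = rng.standard_normal(n)              # true factor scores
X = np.outer(f_true, lam_true) + 0.5 * rng.standard_normal((n, p))

lam = X[0].copy()                            # crude starting value
for _ in range(20):
    f = X @ lam / (lam @ lam)                # score estimates given loadings
    lam = X.T @ f / (f @ f)                  # loading estimates given scores

# Scores are identified only up to sign and scale; align before comparing.
sign = np.sign(f @ f_true)
corr = np.corrcoef(sign * f, f_true)[0, 1]
```

Each score estimate averages information across all p variables, so its error shrinks as p grows: this is the sense in which large p helps rather than hurts.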
Statistical Methods in Topological Data Analysis for Complex, High-Dimensional Data
The utilization of statistical methods and their applications within the new
field of study known as Topological Data Analysis has tremendous potential
for broadening our exploration and understanding of complex, high-dimensional
data spaces. This paper provides an introductory overview of the mathematical
underpinnings of Topological Data Analysis, the workflow to convert samples of
data to topological summary statistics, and some of the statistical methods
developed for performing inference on these topological summary statistics. The
intention of this non-technical overview is to motivate statisticians who are
interested in learning more about the subject.
Comment: 15 pages, 7 figures, 27th Annual Conference on Applied Statistics in
Agriculture
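One concrete step in the workflow the abstract describes, converting a sample of data to a topological summary, can be sketched without any TDA library: 0-dimensional persistence (connected-component lifetimes) of a point cloud follows from Kruskal-style merging of points by distance. The example data and thresholds are illustrative.

```python
import numpy as np

rng = np.random.default_rng(4)
# Two well-separated clusters: one component should die only at a large radius.
pts = np.vstack([rng.normal(0, 0.1, (10, 2)), rng.normal(5, 0.1, (10, 2))])

n = len(pts)
dists = np.linalg.norm(pts[:, None] - pts[None, :], axis=2)
edges = sorted((dists[i, j], i, j) for i in range(n) for j in range(i + 1, n))

parent = list(range(n))
def find(a):
    """Union-find root with path halving."""
    while parent[a] != a:
        parent[a] = parent[parent[a]]
        a = parent[a]
    return a

deaths = []                                  # merge radii = death times
for d, i, j in edges:
    ri, rj = find(i), find(j)
    if ri != rj:
        parent[ri] = rj
        deaths.append(d)                     # a component dies at radius d

# All 0-dimensional features are born at radius 0; a large gap in the death
# times is the topological signature of distinct clusters.
deaths = np.array(deaths)
```

Statistical inference in TDA then operates on summaries like these death times (or full persistence diagrams), rather than on the raw coordinates.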
Projection Pursuit for Exploratory Supervised Classification
In high-dimensional data, one often seeks a few interesting low-dimensional projections that reveal important features of the data. Projection pursuit is a procedure for searching high-dimensional data for interesting low-dimensional projections via the optimization of a criterion function called the projection pursuit index. Very few projection pursuit indices incorporate class or group information in the calculation. Hence, they cannot be adequately applied in supervised classification problems to provide low-dimensional projections revealing class differences in the data. We introduce new indices derived from linear discriminant analysis that can be used for exploratory supervised classification.
Keywords: Data mining, Exploratory multivariate data analysis, Gene expression data, Discriminant analysis
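An LDA-derived index of the kind the abstract describes can be sketched as the ratio of between-class to within-class variance of the data projected onto a candidate direction; the pursuit step would then maximize this index over directions. The index formula and example directions below are illustrative, not the authors' specific proposals.

```python
import numpy as np

rng = np.random.default_rng(5)
n_per = 50
# Two classes separated only along the first coordinate.
X = np.vstack([rng.normal([0.0, 0.0, 0.0], 1.0, (n_per, 3)),
               rng.normal([4.0, 0.0, 0.0], 1.0, (n_per, 3))])
y = np.repeat([0, 1], n_per)

def lda_index(a, X, y):
    """Between-class over within-class variance of X projected onto a."""
    a = a / np.linalg.norm(a)
    z = X @ a
    grand = z.mean()
    between = sum((y == g).sum() * (z[y == g].mean() - grand) ** 2
                  for g in np.unique(y))
    within = sum(((z[y == g] - z[y == g].mean()) ** 2).sum()
                 for g in np.unique(y))
    return between / within

sep = lda_index(np.array([1.0, 0.0, 0.0]), X, y)   # class-separating axis
rand = lda_index(np.array([0.0, 1.0, 1.0]), X, y)  # direction blind to classes
```

A class-blind index (e.g. one based on non-normality alone) could score both directions similarly; incorporating the labels makes the separating direction stand out sharply.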
