The alternating least-squares algorithm for CDPCA
Clustering and Disjoint Principal Component Analysis (CDPCA) is a constrained principal component analysis, recently proposed for simultaneously clustering objects and partitioning variables, which we have implemented in the R language. In this paper, we deal in detail with the alternating least-squares algorithm for CDPCA and highlight its algebraic features for constructing both interpretable principal components and clusters of objects. Two applications are given to illustrate the capabilities of this new methodology.
Statistical Methods and Optimization in Data Mining
The main objective of this work is to test the ability of the new technique
CDPCA (Clustering and Disjoint Principal Component Analysis) to produce, on biological data sets, visual representations of
the characteristics relevant for data interpretation. For this purpose, we
implemented CDPCA in the R language and conducted several experiments. Numerical results show its efficiency.
Geographic Distribution of Environmental Relative Moldiness Index Molds in USA Homes
Objective. The objective of this study was to quantify and describe the distribution of the 36 molds that make up the Environmental Relative Moldiness Index (ERMI).
Materials and Methods. As part of the 2006 American Healthy Homes Survey, settled dust samples were analyzed by mold-specific quantitative PCR (MSQPCR) for the 36 ERMI molds. Each species' geographical distribution pattern was examined individually, followed by partitioning analysis in order to identify spatially meaningful patterns. For mapping, the 36 mold populations were divided into disjoint clusters on the basis of their standardized concentrations, and First Principal Component (FPC) scores were computed.
Results and Conclusions. The partitioning analyses failed to uncover a valid partitioning that yielded compact, well-separated partitions with systematic spatial distributions, either on global or local criteria. Disjoint variable clustering resulted in seven mold clusters. The 36 molds and ERMI values themselves were found to be heterogeneously distributed across the United States of America (USA)
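As a rough illustration of the "disjoint variable clusters plus First Principal Component scores" step mentioned in the abstract above, here is a minimal sketch. It assumes the variable clusters are already known; the function name `first_pc_scores`, the cluster names, and the synthetic data are all hypothetical and are not taken from the study.

```python
import numpy as np

def first_pc_scores(X, groups):
    """Return the first-principal-component score of each disjoint variable cluster.

    X      : (n_samples, n_variables) data matrix
    groups : dict mapping a cluster name to a list of column indices
             (assumed disjoint; the clustering itself is not done here)
    """
    # Standardize every variable to zero mean and unit variance.
    Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
    scores = {}
    for name, cols in groups.items():
        block = Z[:, cols]
        # The first right singular vector is the first principal axis of the block.
        _, _, Vt = np.linalg.svd(block, full_matrices=False)
        scores[name] = block @ Vt[0]
    return scores

# Synthetic example: two latent factors drive two disjoint sets of variables.
rng = np.random.default_rng(0)
n = 200
f1, f2 = rng.normal(size=n), rng.normal(size=n)
X = np.column_stack(
    [f1 + 0.1 * rng.normal(size=n) for _ in range(3)]
    + [f2 + 0.1 * rng.normal(size=n) for _ in range(2)]
)
groups = {"cluster_a": [0, 1, 2], "cluster_b": [3, 4]}  # hypothetical clusters
scores = first_pc_scores(X, groups)
```

Each score vector summarizes one variable cluster with a single value per sample, which is the kind of quantity that can then be mapped. The sign of a principal component is arbitrary, so only the magnitude of its correlation with the underlying factor is meaningful.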
Two-Step-SDP approach to clustering and dimensionality reduction
Inspired by the recently proposed statistical technique called clustering and disjoint principal component
analysis (CDPCA), this paper presents a new algorithm for clustering objects and dimensionality reduction, based on
Semidefinite Programming (SDP) models. The Two-Step-SDP algorithm is based on SDP relaxations of two clustering
problems and on a K-means step in a reduced space. The Two-Step-SDP algorithm was implemented and tested in R, a
widely used open source software. Besides returning clusters of both objects and attributes, the Two-Step-SDP algorithm
returns the variance explained by each component and the component loadings. The numerical experiments on different
data sets show that the algorithm is quite efficient and fast. Compared with other known iterative algorithms for clustering,
namely the K-means and ALS algorithms, the computational time of the Two-Step-SDP algorithm is comparable to that of the
K-means algorithm, and it is faster than the ALS algorithm.
Sparse Subspace Clustering: Algorithm, Theory, and Applications
In many real-world problems, we are dealing with collections of
high-dimensional data, such as images, videos, text and web documents, DNA
microarray data, and more. Often, high-dimensional data lie close to
low-dimensional structures corresponding to several classes or categories the
data belongs to. In this paper, we propose and study an algorithm, called
Sparse Subspace Clustering (SSC), to cluster data points that lie in a union of
low-dimensional subspaces. The key idea is that, among infinitely many possible
representations of a data point in terms of other points, a sparse
representation corresponds to selecting a few points from the same subspace.
This motivates solving a sparse optimization program whose solution is used in
a spectral clustering framework to infer the clustering of data into subspaces.
Since solving the sparse optimization program is in general NP-hard, we
consider a convex relaxation and show that, under appropriate conditions on the
arrangement of subspaces and the distribution of data, the proposed
minimization program succeeds in recovering the desired sparse representations.
The proposed algorithm can be solved efficiently and can handle data points
near the intersections of subspaces. Another key advantage of the proposed
algorithm with respect to the state of the art is that it can deal with data
nuisances, such as noise, sparse outlying entries, and missing entries,
directly by incorporating the model of the data into the sparse optimization
program. We demonstrate the effectiveness of the proposed algorithm through
experiments on synthetic data as well as the two real-world problems of motion
segmentation and face clustering
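The sparse self-representation idea in the abstract above can be sketched as follows. This is an illustrative sketch under stated assumptions, not the authors' implementation: it uses a Lasso-style penalized form solved by a plain proximal-gradient (ISTA) loop, the function name `sparse_self_representation` and the parameters `lam` and `n_iter` are hypothetical, and the test data are two exactly orthogonal 2-D subspaces, which is the easiest possible case. The spectral clustering stage is left out; the sketch stops at the affinity matrix that such a stage would take as input.

```python
import numpy as np

def sparse_self_representation(X, lam=0.05, n_iter=500):
    """Express each column of X as a sparse combination of the other columns.

    Approximately minimizes 0.5 * ||X - X C||_F^2 + lam * ||C||_1
    subject to diag(C) = 0, using proximal gradient descent (ISTA).
    """
    n = X.shape[1]
    # 1 / Lipschitz constant of the gradient of the smooth term.
    step = 1.0 / (np.linalg.norm(X, 2) ** 2)
    C = np.zeros((n, n))
    for _ in range(n_iter):
        grad = X.T @ (X @ C - X)
        C = C - step * grad
        # Soft-thresholding is the proximal operator of the l1 penalty.
        C = np.sign(C) * np.maximum(np.abs(C) - step * lam, 0.0)
        # A point may not be used to represent itself.
        np.fill_diagonal(C, 0.0)
    return C

rng = np.random.default_rng(0)

def sample_subspace(coords, n_points):
    """Unit-norm points in R^4 supported on the given coordinate axes."""
    P = np.zeros((4, n_points))
    P[coords, :] = rng.normal(size=(len(coords), n_points))
    return P / np.linalg.norm(P, axis=0)

# Ten points from each of two orthogonal 2-D subspaces of R^4.
X = np.hstack([sample_subspace([0, 1], 10), sample_subspace([2, 3], 10)])
C = sparse_self_representation(X)

# Symmetric affinity that a spectral clustering step would consume.
W = np.abs(C) + np.abs(C).T
```

In this toy setting every point is represented only by points from its own subspace, so `W` is block diagonal. With noisy data or non-orthogonal subspaces the blocks are only approximate, which is the case the paper's theory and its spectral clustering step address.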
Clustering and disjoint principal component analysis of emissions and driving volatility data collected from a hybrid electric vehicle in real drive conditions
Despite the fuel use and emission benefits of Hybrid Electric Vehicles (HEVs), few studies have characterized in detail emission patterns and driving volatility profiles from HEVs in different road types under Real Driving Emission (RDE) conditions. This paper characterized second-by-second tailpipe emissions, vehicle engine, and dynamics from a 2020 Toyota HEV sub-compact on a 44 km driving route over rural, urban, and highway roads in the Aveiro region (Portugal). Driving volatility was represented by six driving styles based on combinations of acceleration/deceleration and vehicular jerk (the rate at which an object’s acceleration changes with respect to time). Clustering and Disjoint Principal Component Analysis (CDPCA) was applied to examine the relationships between emissions, engine, internal combustion engine (ICE) status, roadway characteristics, and vehicular jerk types. Although the urban route yielded lower carbon dioxide and nitrogen oxides emissions than rural and highway routes did, it resulted in highly volatile driving behaviors at low speeds (< 45 km/h). Both route type and HEV ICE operating behavior were shown to affect the distribution of vehicular jerk types. CDPCA constrained to road sector exhibited different shapes in the clusters of the jerk types between ICE operation status. This paper can provide insights into RDE analysis of the new generation of HEVs about the characterization of volatile driving behaviors. Such information can be integrated into vehicle electronic car units and navigation systems to provide feedback for drivers about their driving behavior in terms of high emission rates and jerkings to the vehicle.
Unsupervised clustering of Type II supernova light curves
As new facilities come online, the astronomical community will be provided
with extremely large datasets of well-sampled light curves (LCs) of transient
objects. This motivates systematic studies of the light curves of supernovae
(SNe) of all types, including the early rising phase. We performed unsupervised
k-means clustering on a sample of 59 R-band Type II SN light curves and find
that our sample can be divided into three classes: slowly-rising (II-S),
fast-rise/slow-decline (II-FS), and fast-rise/fast-decline (II-FF). We also
identify three outliers based on the algorithm. We find that performing
clustering on the first two components of a principal component analysis gives
equivalent results to the analysis using the full LC morphologies. This may
indicate that Type II LCs could possibly be reduced to two parameters. We
present several important caveats to the technique, and find that the division
into these classes is not fully robust and is sensitive to the uncertainty on
the time of first light. Moreover these classes have some overlap, and are
defined in the R-band only. It is currently unclear if they represent distinct
physical classes, and more data is needed to study these issues. However, our
analysis shows that the outliers are actually composed of slowly-evolving SN
IIb, demonstrating the potential use of such methods. The slowly-evolving SNe
IIb may arise from single massive progenitors.
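The comparison described in the abstract above, k-means on full curves versus k-means on the first two principal components, can be sketched as follows. This is only an illustration under loud assumptions: the curves are synthetic stand-ins generated from a made-up rise/decline formula (not supernova data), the helper names `pca_scores`, `kmeans`, and `curve` are hypothetical, and the k-means initialisation is a simple deterministic farthest-point rule rather than whatever the authors used.

```python
import numpy as np

def pca_scores(X, n_components=2):
    """Project the mean-centred rows of X onto the leading principal axes."""
    Xc = X - X.mean(axis=0)
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:n_components].T

def kmeans(Z, k, n_iter=100):
    """Lloyd's k-means with a deterministic farthest-point initialisation."""
    centers = [Z[0]]
    for _ in range(1, k):
        # Squared distance of every point to its nearest chosen center.
        d2 = np.min([((Z - c) ** 2).sum(axis=1) for c in centers], axis=0)
        centers.append(Z[np.argmax(d2)])
    centers = np.array(centers)
    for _ in range(n_iter):
        d2 = ((Z[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d2.argmin(axis=1)
        new_centers = np.array([
            Z[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels

# Synthetic stand-in curves sampled on a common time grid (assumption).
rng = np.random.default_rng(0)
t = np.linspace(0.0, 50.0, 40)

def curve(rise, decline):
    return (1.0 - np.exp(-t / rise)) * np.exp(-t / decline)

# (rise, decline) time scales: slow rise, fast rise/slow decline, fast rise/fast decline.
shapes = [(10.0, 100.0), (2.0, 100.0), (2.0, 20.0)]
X = np.vstack([
    curve(r, d) + rng.normal(0.0, 0.01, size=t.size)
    for r, d in shapes
    for _ in range(20)
])

labels_full = kmeans(X, 3)                # clustering on the full curves
labels_pca = kmeans(pca_scores(X, 2), 3)  # clustering on two PCA components
```

On this well-separated toy sample both runs recover the same three groups (up to a relabeling of the clusters). Real light curves, with uncertain times of first light and overlapping classes, would not separate this cleanly, as the abstract itself cautions.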
Pattern recognition for Space Applications Center director's discretionary fund
Results and conclusions are presented on the application of recent developments in pattern recognition to spacecraft star mapping systems. Sensor data for two representative starfields are processed by an adaptive shape-seeking version of the Fc-V algorithm with good results. Cluster validity measures are evaluated, but not found especially useful to this application. Recommendations are given for two system configurations worthy of additional study.