    The alternating least-squares algorithm for CDPCA

    Clustering and Disjoint Principal Component Analysis (CDPCA) is a recently proposed constrained principal component analysis that simultaneously clusters objects and partitions variables, and we have implemented it in the R language. In this paper, we examine in detail the alternating least-squares (ALS) algorithm for CDPCA and highlight its algebraic features for constructing both interpretable principal components and clusters of objects. Two applications illustrate the capabilities of this new methodology.
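The disjointness constraint is what makes CDPCA components easy to interpret: every variable loads on exactly one component. The Python below is a minimal sketch of that single step, assuming the partition of variables into components is already known. The full ALS algorithm described in the paper also updates the object clusters and the variable partition, and this sketch omits both. As a further simplification, each block's loadings are taken to be the leading eigenvector (found by power iteration) of that block's own sample covariance matrix. All function names are hypothetical.

```python
import math

# Hypothetical sketch of disjoint loadings only; not the authors' R implementation.
# Data matrices are lists of rows (objects x variables).


def center(X):
    """Subtract each variable's (column's) mean."""
    n, J = len(X), len(X[0])
    means = [sum(row[j] for row in X) / n for j in range(J)]
    return [[row[j] - means[j] for j in range(J)] for row in X]


def leading_eigenvector(S, iters=500):
    """Power iteration for the dominant eigenvector of a symmetric PSD matrix S."""
    k = len(S)
    # Start from the column of S with the largest diagonal entry. This lies in the
    # range of S, so the start is not orthogonal to the target when S is rank one.
    i = max(range(k), key=lambda r: S[r][r])
    v = [S[r][i] for r in range(k)]
    norm = math.sqrt(sum(x * x for x in v))
    if norm == 0.0:
        return [1.0 / math.sqrt(k)] * k  # zero-variance block: any unit vector
    v = [x / norm for x in v]
    for _ in range(iters):
        w = [sum(S[r][c] * v[c] for c in range(k)) for r in range(k)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            break
        v = [x / norm for x in w]
    return v


def disjoint_loadings(X, var_to_comp, Q):
    """J x Q loadings matrix in which variable j loads only on component var_to_comp[j].

    Simplification: each block's loadings are the leading eigenvector of that
    block's own sample covariance matrix.
    """
    Xc = center(X)
    n, J = len(Xc), len(Xc[0])
    A = [[0.0] * Q for _ in range(J)]
    for q in range(Q):
        block = [j for j in range(J) if var_to_comp[j] == q]
        if not block:
            continue
        S = [[sum(row[a] * row[b] for row in Xc) / n for b in block] for a in block]
        for j, loading in zip(block, leading_eigenvector(S)):
            A[j][q] = loading
    return A


def component_scores(X, A):
    """Project the centered objects onto the disjoint components (Y = Xc A)."""
    Xc = center(X)
    J, Q = len(A), len(A[0])
    return [[sum(row[j] * A[j][q] for j in range(J)) for q in range(Q)] for row in Xc]


if __name__ == "__main__":
    X = [[1, 2, 10, 0], [2, 4, 8, 1], [3, 6, 6, 2], [4, 8, 4, 3]]
    A = disjoint_loadings(X, var_to_comp=[0, 0, 1, 1], Q=2)
    # Variables 0-1 load only on component 0, variables 2-3 only on component 1.
    print(A)
    print(component_scores(X, A))
```

Because the columns of `A` have non-overlapping supports, the components are orthogonal by construction, and each one can be read as a summary of its own group of variables. The sign of each column is arbitrary, as with ordinary principal components.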

    Statistical Methods and Optimization in Data Mining

    The main objective of this work is to test the ability of the new technique CDPCA (Clustering and Disjoint Principal Component Analysis), applied to biological data sets, to provide visual representations of the characteristics relevant for data interpretation. For this purpose, we implemented CDPCA in the R language and conducted several experiments. Numerical results show its efficiency.

    Geographic Distribution of Environmental Relative Moldiness Index Molds in USA Homes

    Objective. The objective of this study was to quantify and describe the distribution of the 36 molds that make up the Environmental Relative Moldiness Index (ERMI). Materials and Methods. As part of the 2006 American Healthy Homes Survey, settled dust samples were analyzed by mold-specific quantitative PCR (MSQPCR) for the 36 ERMI molds. Each species' geographical distribution pattern was examined individually, followed by a partitioning analysis to identify spatially meaningful patterns. For mapping, the 36 mold populations were divided into disjoint clusters on the basis of their standardized concentrations, and First Principal Component (FPC) scores were computed. Results and Conclusions. The partitioning analyses failed to uncover a valid partitioning that yielded compact, well-separated partitions with systematic spatial distributions on either global or local criteria. Disjoint variable clustering resulted in seven mold clusters. The 36 molds and the ERMI values themselves were found to be heterogeneously distributed across the United States of America (USA).

    Two-Step-SDP approach to clustering and dimensionality reduction

    Inspired by the recently proposed statistical technique called clustering and disjoint principal component analysis (CDPCA), this paper presents a new algorithm for clustering objects and reducing dimensionality, based on Semidefinite Programming (SDP) models. The Two-Step-SDP algorithm is based on SDP relaxations of two clustering problems and on a K-means step in a reduced space. The Two-Step-SDP algorithm was implemented and tested in R, widely used open-source software. Besides returning clusters of both objects and attributes, the Two-Step-SDP algorithm returns the variance explained by each component and the component loadings. Numerical experiments on different data sets show that the algorithm is quite efficient and fast. Compared with other known iterative clustering algorithms, namely K-means and ALS, the Two-Step-SDP algorithm has a computational time comparable to that of K-means and is faster than ALS.
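For orientation, one widely used SDP relaxation of the K-means problem (due to Peng and Wei) is written out below. It is an assumption that Two-Step-SDP relaxes its clustering problems in exactly this form. Let $X$ be the data matrix with the $n$ objects as rows, $W = XX^\top$ its Gram matrix, and $C_1, \dots, C_K$ a partition of the objects with centroids $\bar{x}_k$. The K-means objective can then be written in terms of the normalized co-membership matrix $Z$, and dropping the idempotence constraint $Z^2 = Z$ that characterizes hard clusterings leaves a semidefinite program:

```latex
\begin{aligned}
\sum_{k=1}^{K}\sum_{i \in C_k}\lVert x_i - \bar{x}_k\rVert^2
  &= \operatorname{tr}(W) - \operatorname{tr}(WZ),
  \qquad Z = \sum_{k=1}^{K}\frac{1}{|C_k|}\,\mathbf{1}_{C_k}\mathbf{1}_{C_k}^{\top},\\[4pt]
\min_{Z}\ \operatorname{tr}\bigl(W(I - Z)\bigr)
  \quad &\text{s.t.}\quad Z\mathbf{1} = \mathbf{1},\quad \operatorname{tr}(Z) = K,\quad Z \ge 0,\quad Z \succeq 0.
\end{aligned}
```

Every hard clustering gives a feasible $Z$, so the optimal value of the relaxation is a lower bound on the K-means objective. A rounding or K-means step is still needed to turn a fractional $Z$ into actual clusters.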

    Sparse Subspace Clustering: Algorithm, Theory, and Applications

    In many real-world problems, we deal with collections of high-dimensional data, such as images, videos, text and web documents, DNA microarray data, and more. Often, high-dimensional data lie close to low-dimensional structures corresponding to the several classes or categories to which the data belong. In this paper, we propose and study an algorithm, called Sparse Subspace Clustering (SSC), to cluster data points that lie in a union of low-dimensional subspaces. The key idea is that, among the infinitely many possible representations of a data point in terms of other points, a sparse representation corresponds to selecting a few points from the same subspace. This motivates solving a sparse optimization program whose solution is used in a spectral clustering framework to infer the clustering of the data into subspaces. Since solving the sparse optimization program is in general NP-hard, we consider a convex relaxation and show that, under appropriate conditions on the arrangement of the subspaces and the distribution of the data, the proposed minimization program succeeds in recovering the desired sparse representations. The resulting program can be solved efficiently, and the algorithm can handle data points near the intersections of subspaces. Another key advantage of the proposed algorithm with respect to the state of the art is that it can deal with data nuisances, such as noise, sparse outlying entries, and missing entries, directly by incorporating the model of the data into the sparse optimization program. We demonstrate the effectiveness of the proposed algorithm through experiments on synthetic data as well as on two real-world problems: motion segmentation and face clustering.
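The core idea of SSC can be sketched in pure Python: express each point as a sparse combination of the other points, then cluster the resulting affinity graph. Every specific below is a hypothetical simplification. The sparse program is solved in a Lasso form, min ½‖x_i − Σ_{j≠i} c_j x_j‖² + λ‖c‖₁ with c_i = 0, by coordinate descent with an arbitrary λ = 0.1. The affinity is |C| + |C|ᵀ. In place of the spectral clustering step that SSC uses, clusters are read off as connected components of the affinity graph, which is adequate only in the idealized noise-free case where no coefficient links two different subspaces.

```python
def soft_threshold(z, t):
    """Shrink z toward zero by t (the proximal operator of t*|z|)."""
    if z > t:
        return z - t
    if z < -t:
        return z + t
    return 0.0


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def self_expression(points, i, lam, sweeps=200):
    """Coordinate descent for min_c 0.5*||x_i - sum_{j!=i} c_j x_j||^2 + lam*||c||_1, c_i = 0."""
    c = [0.0] * len(points)
    residual = list(points[i])  # x_i minus its current reconstruction
    for _ in range(sweeps):
        for j, xj in enumerate(points):
            sq = dot(xj, xj)
            if j == i or sq == 0.0:
                continue
            # Re-fit c_j alone: add x_j's own contribution back into the residual.
            new = soft_threshold(dot(xj, residual) + c[j] * sq, lam) / sq
            delta = new - c[j]
            if delta != 0.0:
                residual = [r - delta * x for r, x in zip(residual, xj)]
                c[j] = new
    return c


def ssc_clusters(points, lam=0.1, tol=1e-8):
    n = len(points)
    C = [self_expression(points, i, lam) for i in range(n)]
    # Symmetric affinity: W[i][j] = |C[i][j]| + |C[j][i]|.
    W = [[abs(C[i][j]) + abs(C[j][i]) for j in range(n)] for i in range(n)]
    # Simplification: connected components replace SSC's spectral clustering step.
    labels, current = [-1] * n, 0
    for start in range(n):
        if labels[start] != -1:
            continue
        labels[start] = current
        stack = [start]
        while stack:
            u = stack.pop()
            for v in range(n):
                if labels[v] == -1 and W[u][v] > tol:
                    labels[v] = current
                    stack.append(v)
        current += 1
    return labels


if __name__ == "__main__":
    # Three points on the line spanned by (1, 0), three on the line spanned by (1, 3).
    pts = [(1.0, 0.0), (2.0, 0.0), (-1.5, 0.0), (1.0, 3.0), (0.5, 1.5), (-0.5, -1.5)]
    print(ssc_clusters(pts))  # [0, 0, 0, 1, 1, 1]
```

For these six points, the optimal sparse representation of each point uses only points on its own line, so the affinity graph splits exactly into the two subspaces. With noisy data, small cross-subspace coefficients appear. That is why SSC applies spectral clustering to the affinity matrix rather than relying on exact graph connectivity.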

    Clustering and disjoint principal component analysis of emissions and driving volatility data collected from a hybrid electric vehicle in real drive conditions

    Despite the fuel-use and emission benefits of Hybrid Electric Vehicles (HEVs), few studies have characterized in detail the emission patterns and driving volatility profiles of HEVs on different road types under Real Driving Emission (RDE) conditions. This paper characterized second-by-second tailpipe emissions, engine parameters, and vehicle dynamics of a 2020 Toyota sub-compact HEV on a 44 km driving route covering rural, urban, and highway roads in the Aveiro region (Portugal). Driving volatility was represented by six driving styles based on combinations of acceleration/deceleration and vehicular jerk (the rate at which an object's acceleration changes with respect to time). Clustering and Disjoint Principal Component Analysis (CDPCA) was applied to examine the relationships between emissions, engine, internal combustion engine (ICE) status, roadway characteristics, and vehicular jerk types. Although the urban route yielded lower carbon dioxide and nitrogen oxides emissions than the rural and highway routes did, it resulted in highly volatile driving behaviors at low speeds (< 45 km/h). Both route type and HEV ICE operating behavior were shown to affect the distribution of vehicular jerk types. CDPCA constrained to road sector produced jerk-type clusters whose shapes differed between ICE operating states. This paper can provide insights for the RDE analysis of new-generation HEVs regarding the characterization of volatile driving behaviors. Such information can be integrated into vehicle electronic control units and navigation systems to give drivers feedback on driving behaviors associated with high emission rates and vehicle jerk.
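Since jerk is the time derivative of acceleration, it can be estimated from second-by-second speed records by differencing twice. The sketch below assumes speeds in m/s sampled at a fixed 1 s interval and uses plain first differences. The paper's own preprocessing (smoothing, units, and how the six driving styles are defined from acceleration and jerk) is not given in the abstract and is not reproduced here. The function names are hypothetical.

```python
def finite_difference(values, dt=1.0):
    """First difference of consecutive samples divided by the sampling interval dt."""
    return [(b - a) / dt for a, b in zip(values, values[1:])]


def acceleration_and_jerk(speed_mps, dt=1.0):
    """Acceleration (m/s^2) and jerk (m/s^3) from evenly sampled speeds (m/s).

    Each differencing step shortens the series by one sample.
    """
    acceleration = finite_difference(speed_mps, dt)
    jerk = finite_difference(acceleration, dt)
    return acceleration, jerk


if __name__ == "__main__":
    acc, jerk = acceleration_and_jerk([0.0, 1.0, 3.0, 6.0, 10.0])
    print(acc)   # [1.0, 2.0, 3.0, 4.0]
    print(jerk)  # [1.0, 1.0, 1.0]
```

Differencing amplifies measurement noise, and jerk is a second difference, so real 1 Hz records are usually smoothed before this step.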

    Unsupervised clustering of Type II supernova light curves

    As new facilities come online, the astronomical community will be provided with extremely large datasets of well-sampled light curves (LCs) of transient objects. This motivates systematic studies of the light curves of supernovae (SNe) of all types, including the early rising phase. We perform unsupervised k-means clustering on a sample of 59 R-band Type II SN light curves and find that our sample can be divided into three classes: slowly rising (II-S), fast-rise/slow-decline (II-FS), and fast-rise/fast-decline (II-FF). We also identify three outliers based on the algorithm. We find that clustering on the first two components of a principal component analysis gives results equivalent to those of the analysis using the full LC morphologies, which may indicate that Type II LCs could be reduced to two parameters. We present several important caveats to the technique and find that the division into these classes is not fully robust and is sensitive to the uncertainty in the time of first light. Moreover, these classes overlap somewhat and are defined in the R band only. It is currently unclear whether they represent distinct physical classes, and more data are needed to study these issues. However, our analysis shows that the outliers are actually slowly evolving SNe IIb, demonstrating the potential use of such methods. The slowly evolving SNe IIb may arise from single massive progenitors.
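The two-stage analysis described here can be sketched as follows: project the light curves onto the first two principal components, then run k-means on the scores. This is a hypothetical pure-Python illustration, not the authors' pipeline. It assumes that each light curve has already been resampled onto a common grid so that it forms a fixed-length row. Principal components are found by power iteration with deflation, and k-means uses a deterministic farthest-point initialization instead of random restarts.

```python
import math

# Hypothetical sketch: PCA by power iteration + deflation, then k-means
# (Lloyd's algorithm) on the first q principal-component scores.


def center(X):
    """Subtract the column means from a list-of-rows data matrix."""
    n, p = len(X), len(X[0])
    means = [sum(row[j] for row in X) / n for j in range(p)]
    return [[row[j] - means[j] for j in range(p)] for row in X]


def matvec(S, v):
    return [sum(s * x for s, x in zip(row, v)) for row in S]


def unit(v):
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else v


def power_iteration(S, iters=1000):
    """Dominant eigenvalue and unit eigenvector of a symmetric PSD matrix."""
    i = max(range(len(S)), key=lambda r: S[r][r])
    v = unit([row[i] for row in S])  # start inside the range of S
    for _ in range(iters):
        v = unit(matvec(S, v))
    return sum(a * b for a, b in zip(v, matvec(S, v))), v


def pc_scores(X, q=2):
    """Scores of the centered rows on the first q principal components."""
    Xc = center(X)
    p = len(Xc[0])
    S = [[sum(r[a] * r[b] for r in Xc) / len(Xc) for b in range(p)] for a in range(p)]
    comps = []
    for _ in range(q):
        lam, v = power_iteration(S)
        comps.append(v)
        # Deflation: remove the found component so the next pass finds the next one.
        S = [[S[a][b] - lam * v[a] * v[b] for b in range(p)] for a in range(p)]
    return [[sum(x * c for x, c in zip(row, comp)) for comp in comps] for row in Xc]


def dist2(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, iters=100):
    """Lloyd's algorithm with deterministic farthest-point initialization."""
    centers = [list(points[0])]
    while len(centers) < k:
        far = max(points, key=lambda p: min(dist2(p, c) for c in centers))
        centers.append(list(far))
    labels = None
    for _ in range(iters):
        new = [min(range(k), key=lambda j: dist2(p, centers[j])) for p in points]
        if new == labels:
            break  # assignments are stable: converged
        labels = new
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # an empty cluster keeps its previous center
                centers[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def pca_kmeans(X, k, q=2):
    return kmeans(pc_scores(X, q), k)


if __name__ == "__main__":
    X = [[0, 0, 0, 0], [0.1, 0, 0, 0], [0, 0.1, 0, 0],
         [5, 5, 0, 0], [5.1, 5, 0, 0], [5, 5.1, 0, 0],
         [0, 0, 5, 5], [0, 0, 5.1, 5], [0, 0, 5, 5.1]]
    # Rows 0-2, 3-5 and 6-8 fall into three different clusters.
    print(pca_kmeans(X, k=3))
```

Cluster numbering from k-means is arbitrary, so only the grouping of rows is meaningful. Running the same `kmeans` on the full rows instead of on `pc_scores` is the kind of comparison the abstract describes between the two-component and full-morphology analyses.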

    Pattern recognition for Space Applications Center director's discretionary fund

    Results and conclusions are presented on the application of recent developments in pattern recognition to spacecraft star-mapping systems. Sensor data for two representative starfields are processed by an adaptive shape-seeking version of the Fc-V algorithm with good results. Cluster validity measures are evaluated but are not found especially useful for this application. Recommendations are given for two system configurations worthy of additional study.