
    PCA in Autocorrelation Space

    The use of higher-order autocorrelations as features for pattern classification has usually been restricted to second or third order because of high computational costs. Since the autocorrelation space is high-dimensional, we are interested in reducing the dimensionality of the feature vectors to benefit the pattern classification task. An established technique is Principal Component Analysis (PCA), which, however, cannot be applied directly in the autocorrelation space. In this paper we develop a new method for performing PCA in autocorrelation space without explicitly computing the autocorrelations. Connections with nonlinear PCA and possible extensions are also discussed.

    Revisiting Guerry's data: Introducing spatial constraints in multivariate analysis

    Standard multivariate analysis methods aim to identify and summarize the main structures in large data sets containing the description of a number of observations by several variables. In many cases, spatial information is also available for each observation, so that a map can be associated with the multivariate data set. Two main objectives are relevant in the analysis of spatial multivariate data: summarizing covariation structures and identifying spatial patterns. In practice, achieving both goals simultaneously is a statistical challenge, and a range of methods have been developed that offer trade-offs between these two objectives. In an applied context, this methodological question has been and remains a major issue in community ecology, where species assemblages (i.e., covariation between species abundances) are often driven by spatial processes (and thus exhibit spatial patterns). In this paper we review a variety of methods developed in community ecology to investigate multivariate spatial patterns. We present different ways of incorporating spatial constraints in multivariate analysis and illustrate these different approaches using the famous data set on moral statistics in France published by André-Michel Guerry in 1833. We discuss and compare the properties of these different approaches from both a practical and a theoretical viewpoint. Comment: Published at http://dx.doi.org/10.1214/10-AOAS356 in the Annals of Applied Statistics (http://www.imstat.org/aoas/) by the Institute of Mathematical Statistics (http://www.imstat.org)

    Analyzing big time series data in solar engineering using features and PCA

    In solar engineering, we encounter big time series data such as satellite-derived irradiance data and string-level measurements from a utility-scale photovoltaic (PV) system. While storing and hosting big data are certainly possible using today’s data storage technology, it is challenging to visualize and analyze the data effectively and efficiently. In this work we consider a data analytics algorithm to mitigate some of these challenges. The algorithm computes a set of generic and/or application-specific features to characterize the time series, and subsequently uses principal component analysis to project these features onto a two-dimensional space. As each time series can be represented by features, it can be treated as a single data point in the feature space, which makes many operations more tractable. Three applications are discussed within the overall framework, namely (1) PV system type identification, (2) monitoring network design, and (3) anomalous string detection. The proposed framework can be easily translated to many other solar engineering applications.
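The feature-then-PCA pipeline described in this abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the three features (mean, standard deviation, lag-1 autocorrelation), the helper names `extract_features` and `project_to_2d`, and the synthetic input series are all assumptions made for the example.

```python
import numpy as np

def extract_features(series):
    """Summarize one time series with three generic features (illustrative choice)."""
    x = np.asarray(series, dtype=float)
    centered = x - x.mean()
    denom = np.dot(centered, centered)
    # Lag-1 autocorrelation; defined here as 0 for a constant series.
    lag1 = np.dot(centered[:-1], centered[1:]) / denom if denom > 0 else 0.0
    return np.array([x.mean(), x.std(), lag1])

def project_to_2d(feature_matrix):
    """Standardize the features and project onto the first two principal components."""
    F = np.asarray(feature_matrix, dtype=float)
    mu = F.mean(axis=0)
    sigma = F.std(axis=0)
    sigma[sigma == 0] = 1.0  # avoid division by zero for constant features
    Z = (F - mu) / sigma
    # SVD of the standardized matrix: the rows of Vt are the principal axes,
    # ordered by decreasing singular value.
    _, _, Vt = np.linalg.svd(Z, full_matrices=False)
    return Z @ Vt[:2].T

# Synthetic stand-in data (hypothetical): ten noisy sinusoids and ten pure-noise series.
rng = np.random.default_rng(0)
t = np.arange(200)
series_set = [np.sin(2 * np.pi * t / 50) + 0.1 * rng.standard_normal(t.size)
              for _ in range(10)]
series_set += [rng.standard_normal(t.size) for _ in range(10)]

features = np.vstack([extract_features(s) for s in series_set])
points = project_to_2d(features)  # one 2-D point per time series
```

Each row of `points` stands for one whole time series, so tasks such as grouping similar series or flagging outliers can be carried out on 2-D points rather than on the raw data.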

    Geographical, socioeconomic, and ecological determinants of exotic plant naturalization in the United States : insights and updates from improved data

    Previous studies on alien species establishment in the United States and around the world have drastically improved our understanding of the patterns of species naturalization, biological invasions, and underlying mechanisms. Meanwhile, relevant new data have been added, and data quality has increased significantly along with the consistency of the related concepts and terminology being developed. Here, using new and/or improved data on native and exotic plant richness and many socioeconomic and physical variables at the state level in the United States, we test whether previously discovered patterns still hold, particularly how native and exotic species are related and which factors dominate plant naturalization. We found that, while the number of native species is largely controlled by natural factors such as area and temperature, exotic species and exotic fraction are predominantly influenced by social factors such as human population. When domestically introduced species were included, several aspects of earlier findings were somewhat altered and additional insights into the mechanisms of naturalization could be gained. With increased data availability, however, a greater challenge ahead appears to be how many and which variables to include in analyses.