
    Finding the principal points of a random variable

    The p-principal points of a random variable X with finite second moment are the p points in R that minimize the expected squared distance from X to the closest point. Although determining principal points in general requires solving a multiextremal optimization problem, existing procedures in the literature provide only a local optimum. In this paper we show that standard Global Optimization techniques can be applied. Ministerio de Ciencia y Tecnologí
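The definition above can be made concrete with a small sketch. It estimates the 2-principal points of a standard normal variable from a simulated sample using Lloyd's iteration with several random restarts. This is only a heuristic for illustration and is not the Global Optimization method the abstract refers to: each run of Lloyd's iteration finds a local optimum, and restarts merely reduce the chance of keeping a poor one. The function names, sample size, and seeds are all assumptions made for this example.

```python
import random


def expected_sq_dist(sample, points):
    """Empirical mean squared distance from each draw to its closest point."""
    return sum(min((x - c) ** 2 for c in points) for x in sample) / len(sample)


def lloyd_1d(sample, init, iters=50):
    """Lloyd's iteration in one dimension (converges to a local optimum only)."""
    points = sorted(init)
    for _ in range(iters):
        cells = [[] for _ in points]
        for x in sample:
            # Assign the draw to its nearest candidate point.
            j = min(range(len(points)), key=lambda i: (x - points[i]) ** 2)
            cells[j].append(x)
        # Move each point to the mean of its cell; an empty cell keeps its point.
        new_points = sorted(
            sum(c) / len(c) if c else p for c, p in zip(cells, points)
        )
        if new_points == points:
            break
        points = new_points
    return points


def principal_points_multistart(sample, p, restarts=5, seed=0):
    """Run Lloyd's iteration from several random starts and keep the best result."""
    rng = random.Random(seed)
    best_loss, best_points = None, None
    for _ in range(restarts):
        init = rng.sample(sample, p)
        pts = lloyd_1d(sample, init)
        loss = expected_sq_dist(sample, pts)
        if best_loss is None or loss < best_loss:
            best_loss, best_points = loss, pts
    return best_points, best_loss


# Hypothetical test case: X ~ N(0, 1), approximated by 2000 simulated draws.
rng = random.Random(42)
sample = [rng.gauss(0.0, 1.0) for _ in range(2000)]
points, loss = principal_points_multistart(sample, p=2)
print(points, loss)
```

For the standard normal, the two principal points are known to be ±sqrt(2/π), roughly ±0.80, so the printed estimates should fall close to those values up to sampling error.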

    Another look at principal curves and surfaces

    © . This manuscript version is made available under the CC-BY-NC-ND 4.0 license http://creativecommons.org/licenses/by-nc-nd/4.0/
    Principal curves have been defined as smooth curves passing through the “middle” of a multidimensional data set. They are nonlinear generalizations of the first principal component, a characterization of which is the basis of the definition of principal curves. We establish a new characterization of the first principal component and base our new definition of a principal curve on this property. We introduce the notion of principal oriented points and we prove the existence of principal curves passing through these points. We extend the definition of principal curves to multivariate data sets and propose an algorithm to find them. The new notions lead us to generalize the definition of total variance. Successive principal curves are recursively defined from this generalization. The new methods are illustrated on simulated and real data sets. Peer Reviewed. Postprint (author's final draft)

    Search for high-amplitude Delta Scuti and RR Lyrae stars in Sloan Digital Sky Survey Stripe 82 using principal component analysis

    We propose a robust principal component analysis (PCA) framework for the exploitation of multi-band photometric measurements in large surveys. Period search results are improved using the time series of the first principal component due to its optimized signal-to-noise ratio. The presence of correlated excess variations in the multivariate time series enables the detection of weaker variability. Furthermore, the direction of the largest variance differs for certain types of variable stars. This can be used as an efficient attribute for classification. The application of the method to a subsample of Sloan Digital Sky Survey Stripe 82 data yielded 132 high-amplitude Delta Scuti variables. We also found 129 new RR Lyrae variables, complementary to the catalogue of Sesar et al. (2010), extending the halo area mapped by Stripe 82 RR Lyrae stars towards the Galactic bulge. The sample also comprises 25 multiperiodic or Blazhko RR Lyrae stars. Comment: 23 pages, 17 figures
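The signal-to-noise argument in this abstract can be illustrated with a toy simulation. The sketch below builds five synthetic bands that share one sinusoidal signal at different amplitudes, adds independent noise to each, and extracts the first principal component by power iteration. It uses plain PCA rather than the robust variant the abstract describes, and it assumes equal noise in every band. The period, amplitudes, noise level, and function names are hypothetical values chosen for the example, not taken from the paper.

```python
import math
import random


def first_principal_component(rows, iters=200):
    """Return (direction, scores) of the first principal component.

    Plain (non-robust) PCA: power iteration on the sample covariance matrix.
    """
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    centred = [[r[j] - means[j] for j in range(d)] for r in rows]
    cov = [
        [sum(c[a] * c[b] for c in centred) / (n - 1) for b in range(d)]
        for a in range(d)
    ]
    v = [1.0 / math.sqrt(d)] * d
    for _ in range(iters):
        w = [sum(cov[a][b] * v[b] for b in range(d)) for a in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    scores = [sum(c[j] * v[j] for j in range(d)) for c in centred]
    return v, scores


def correlation(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical light curve: 300 irregularly spaced epochs, five bands.
rng = random.Random(1)
period = 0.07                              # assumed period, arbitrary units
amplitudes = [0.30, 0.50, 0.40, 0.35, 0.30]  # assumed per-band amplitudes
times = sorted(rng.uniform(0.0, 10.0) for _ in range(300))
signal = [math.sin(2.0 * math.pi * t / period) for t in times]
rows = [[a * s + rng.gauss(0.0, 0.3) for a in amplitudes] for s in signal]

direction, scores = first_principal_component(rows)
pc1_corr = abs(correlation(scores, signal))
band_corrs = [
    abs(correlation([r[j] for r in rows], signal)) for j in range(len(amplitudes))
]
print(pc1_corr, max(band_corrs))
```

Under these assumptions, the first principal component tracks the shared signal more closely than any single band does, which is the property the abstract exploits for period searches. Real survey data have unequal errors and outliers, which is presumably why a robust PCA is used there.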

    Product-Driven Data Mining

    Manifold Data Mining has developed innovative demographic and household spending pattern databases for six-digit postal codes in Canada. Their collection of information consists of both demographic and expenditure variables which are expressed through thousands of individually tracked factors. This large collection of information about consumer behaviour is typically referred to as a mine. Although very large in practice, for the purposes of this report, the data mine consisted of $m$ individuals and $n$ factors, where $m \sim 2000$ and $n \sim 50$. Ideally, the first algorithm would identify a few factors in the data mine which would differentiate customers in terms of a particular product preference. Then the second algorithm would build on this information by looking for patterns in the data mine which would identify related areas of consumer spending. To test the algorithms, two case studies were undertaken. The first study involved differentiating BMW and Honda car owners. The algorithms developed were reasonably successful at both finding questions that differentiate these two populations and identifying common characteristics amongst the groups of respondents. For the second case study it was hoped that the same algorithms could differentiate between consumers of two brands of beer. In this case the first algorithm was not as successful at differentiating between all groups; it showed some distinctions between beer drinkers and non-beer drinkers, but not as clearly defined as in the first case study. The second algorithm was then used successfully to further identify spending patterns once this distinction was made. In this second case study a deeper factor analysis could be used to identify a combination of factors which could be used in the first algorithm.