1,535 research outputs found

    Multidimensional scaling

    Multidimensional scaling is a statistical technique to visualize dissimilarity data. In multidimensional scaling, objects are represented as points in a usually two dimensional space, such that the distances between the points match the observed dissimilarities as closely as possible. Here, we discuss what kind of data can be used for multidimensional scaling, what the essence of the technique is, how to choose the dimensionality, transformations of the dissimilarities, and some pitfalls to watch out for when using multidimensional scaling.
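As an illustration of matching distances to dissimilarities, the sketch below embeds a small dissimilarity matrix in two dimensions and measures the misfit. The abstract does not name an algorithm, so the use of classical (Torgerson) scaling is an assumption. The function names `classical_mds` and `raw_stress` and the toy data are also hypothetical, chosen only for this example.

```python
import numpy as np

def classical_mds(delta, n_dims=2):
    """Classical (Torgerson) scaling of a symmetric dissimilarity matrix.

    Assumption: this is one common MDS variant, not necessarily the one
    the abstract has in mind.
    """
    delta = np.asarray(delta, dtype=float)
    n = delta.shape[0]
    # Double-centre the squared dissimilarities to get an inner-product matrix.
    J = np.eye(n) - np.ones((n, n)) / n
    B = -0.5 * J @ (delta ** 2) @ J
    # Keep the n_dims largest eigenvalues; clip negative ones to zero.
    eigvals, eigvecs = np.linalg.eigh(B)
    order = np.argsort(eigvals)[::-1][:n_dims]
    return eigvecs[:, order] * np.sqrt(np.clip(eigvals[order], 0.0, None))

def raw_stress(delta, X):
    """Sum of squared differences between dissimilarities and fitted distances."""
    d = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    iu = np.triu_indices(len(X), k=1)  # count each pair of objects once
    return float(np.sum((np.asarray(delta, dtype=float)[iu] - d[iu]) ** 2))

# Toy dissimilarities (hypothetical): distances among the corners of a unit square.
corners = np.array([[0, 0], [1, 0], [1, 1], [0, 1]], dtype=float)
delta = np.linalg.norm(corners[:, None] - corners[None, :], axis=-1)

X = classical_mds(delta, n_dims=2)
stress = raw_stress(delta, X)
```

Because the toy dissimilarities are exact Euclidean distances in the plane, the two-dimensional configuration reproduces them up to rotation and reflection and the stress is numerically zero. Real dissimilarity data will generally leave a positive residual.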

    Choosing Attribute Weights for Item Dissimilarity using Clickstream Data with an Application to a Product Catalog Map

    In content- and knowledge-based recommender systems, a measure of (dis)similarity between items is often used. Frequently, this measure is based on the attributes of the items. However, which attributes are important for the users of the system remains an important question to answer. In this paper, we present an approach to determine attribute weights in a dissimilarity measure using clickstream data of an e-commerce website. We count how many times each product is sold and, based on these counts, estimate a Poisson regression model. Estimates of this model are then used to determine the attribute weights in the dissimilarity measure. We show an application of this approach on a product catalog of MP3 players provided by Compare Group, owner of the Dutch price comparison site http://www.vergelijk.nl, and show how the dissimilarity measure can be used to improve 2D product catalog visualizations.
    Keywords: dissimilarity measure; attribute weights; clickstream data; comparison

    Seriation by constrained correspondence analysis: a simulation study

    One of the many areas in which Correspondence Analysis (CA) is an effective method concerns ordination problems. For example, CA is a well-known technique for the seriation of archaeological assemblages. A problem with the CA seriation solution, however, is that only a relative ordering of the assemblages is obtained. To improve the usual CA solution, a constrained CA approach that incorporates additional information in the form of equality and inequality constraints concerning the time points of the assemblages may be considered. Using such constraints, explicit dates can be assigned to the seriation solution. In this paper, we extend the set of constraints that can be used in CA by introducing interval constraints, that is, constraints that put the CA solution within a specific time-frame. Moreover, we study the quality of the constrained CA solution in a simulation study. In particular, by means of the simulation study we are able to assess how well ordinary and constrained CA can recover the true time order. Furthermore, for the constrained approach, we see how well the true dates are retrieved. The simulation study is set up in such a way that it mimics the data of a series of ceramic assemblages consisting of the locally produced tableware from Sagalassos (SW Turkey). We find that the dating of the assemblages on the basis of constraints appears to work quite well.

    Area Biplots

    Classical multivariate analysis techniques such as principal components analysis and correspondence analysis use inner products to estimate data values. The results of these techniques may be visualized by representing the row and column points jointly in a biplot where the projection of a row point onto a column point vector followed by a multiplication by the length of the column point vector gives the inner-product that estimates the corresponding data element. In this paper, we propose a new visualization: after a 90 degrees rotation of the row points, the areas spanned by a triangle of a row point, a column point and the origin estimate the data values. In contrast to the projection biplot, the areas spanned by different row and column points can be compared directly. Areas can only be produced for two dimensions at a time, but higher dimensional solutions can be represented by summing areas over subsequent pairs of dimensions. Here, the area biplot is developed for principal components analysis, correspondence analysis, and for interaction biplots but has general applicability.
    Keywords: interaction; correspondence analysis; visualization; principal component analysis; biplot
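The link between triangle areas and inner products can be checked numerically. The sketch below rests on assumptions the abstract does not state: the rotation is taken to be clockwise, the area is taken as signed, and the coordinates are made up. Under these choices, twice the signed area of the triangle (origin, rotated row point, column point) equals the inner product of the original row and column points.

```python
import numpy as np

def rotate_90_clockwise(p):
    """Rotate a 2D point by 90 degrees clockwise (direction is an assumption)."""
    x, y = p
    return np.array([y, -x])

def signed_triangle_area(a, b):
    """Signed area of the triangle with vertices origin, a, b."""
    return 0.5 * (a[0] * b[1] - a[1] * b[0])

# Hypothetical row and column coordinates in a two-dimensional biplot.
row = np.array([2.0, 1.0])
col = np.array([1.5, -0.5])

inner = float(row @ col)  # the value a projection biplot would read off
area = signed_triangle_area(rotate_90_clockwise(row), col)
twice_area = 2.0 * area   # matches `inner` for any 2D row and col
```

The identity follows from the determinant formula: for row point (r1, r2) rotated to (r2, -r1) and column point (c1, c2), the determinant is r2*c2 + r1*c1, which is the inner product. For solutions with more than two dimensions, the abstract's suggestion of summing areas over pairs of dimensions corresponds to summing these per-pair terms.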

    Inverse correspondence analysis

    In correspondence analysis (CA), rows and columns of a data matrix are depicted as points in low-dimensional space. The row and column profiles are approximated by minimizing the so-called weighted chi-squared distance between the original profiles and their approximations, see for example, [Theory and applications of correspondence analysis, Academic Press, New York, 1984]. In this paper, we will study the inverse CA problem, that is, the possibilities for retrieving one or more data matrices from a low-dimensional CA solution. We will show that there exists a nonempty closed and bounded polyhedron of such matrices. We also present two algorithms to find the vertices of the polyhedron: an exact algorithm that finds all vertices and a heuristic approach for larger sized problems that will find some of the vertices. A proof that the maximum of the Pearson chi-squared statistic is attained at one of the vertices is given. In addition, it is discussed how extra equality constraints on some elements of the data matrix can be imposed on the inverse CA problem. As a special case, we present a method for imposing integer restrictions on the data matrix as well. The approach to inverse CA followed here is similar to the one employed by De Leeuw and Groenen [J. Classification 14 (1997) 3] in their inverse multidimensional scaling problem.
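For orientation, the forward problem that inverse CA starts from can be sketched as follows. This is a minimal illustration of ordinary CA via the singular value decomposition of standardized residuals, not the paper's inverse algorithms, which the abstract does not spell out. The function name `correspondence_analysis` and the contingency table are hypothetical.

```python
import numpy as np

def correspondence_analysis(N):
    """Ordinary CA of a contingency table with strictly positive margins.

    Returns principal row and column coordinates, the singular values,
    and the Pearson chi-squared statistic of the table.
    """
    N = np.asarray(N, dtype=float)
    n = N.sum()
    P = N / n                      # correspondence matrix
    r = P.sum(axis=1)              # row masses
    c = P.sum(axis=0)              # column masses
    # Standardized residuals from the independence model r c^T.
    S = (P - np.outer(r, c)) / np.sqrt(np.outer(r, c))
    U, sv, Vt = np.linalg.svd(S, full_matrices=False)
    rows = (U * sv) / np.sqrt(r)[:, None]    # principal row coordinates
    cols = (Vt.T * sv) / np.sqrt(c)[:, None] # principal column coordinates
    chi2 = n * np.sum(S ** 2)                # Pearson chi-squared statistic
    return rows, cols, sv, chi2

# Hypothetical 3x3 table of counts.
N = np.array([[10, 5, 2],
              [3, 8, 6],
              [1, 4, 12]], dtype=float)
rows, cols, sv, chi2 = correspondence_analysis(N)
rows_2d = rows[:, :2]  # the low-dimensional solution one would plot
```

The squared singular values sum to the total inertia, which is the Pearson chi-squared statistic divided by the table total. The inverse problem asks which tables `N` are compatible with a given low-dimensional solution such as `rows_2d`.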

    Participatory plant breeding: a way to arrive at better-adapted onion varieties

    The search for varieties that are better adapted to organic farming is a current topic in the organic sector. Breeding programmes specific for organic agriculture should solve this problem. Collaborating with organic farmers in such programmes, particularly in the selection process, can potentially result in varieties better adapted to their needs. Here, we assume that organic farmers' perception of plant health is broader than that of conventional breeders. Two organic onion farmers and one conventional onion breeder were monitored in their selection activities in 2004 and 2005 in order to verify whether and in which way this broader view on plant health contributes to improvement of organic varieties. They made selections by positive mass selection in three segregating populations under organic conditions. The monitoring showed that the organic farmers selected in the field for earliness and downy mildew and after storage for bulb characteristics. The conventional breeder selected only after storage. Farmers and breeder applied identical selection directions for bulb traits such as a round shape, better hardness and skin firmness. This resulted in smaller bulbs in the breeder's populations, while the bulbs in the farmer populations were bigger than in the original population. In 2006 and 2007 the new onion populations will be compared with each other and the original populations to determine the selection response.

    Inverse correspondence analysis

    In correspondence analysis, rows and columns of a data matrix are depicted as points in low-dimensional space. The row and column profiles are approximated by minimizing the so-called weighted chi-squared distance between the original profiles and their approximations, see, for example, Greenacre (1984). In this paper, we will study the inverse correspondence analysis solution. We will show that there exists a nonempty closed and bounded polyhedron of such matrices. We also present an algorithm to find the vertices of the polyhedron. A proof that the maximum of the Pearson chi-squared statistic is attained at one of the vertices is given. In addition, it is discussed how extra equality constraints on some elements of the data matrix can be imposed on the inverse correspondence analysis problem. As a special case, we present a method for imposing integer restrictions on the data matrix as well. The approach to inverse correspondence analysis followed here is similar to the one employed by De Leeuw and Groenen (1997) in their inverse multidimensional scaling problem.

    Genome-Wide Footprints of Pig Domestication and Selection Revealed through Massive Parallel Sequencing of Pooled DNA

    Background: Artificial selection has caused rapid evolution in domesticated species. The identification of selection footprints across domesticated genomes can contribute to uncovering the genetic basis of phenotypic diversity. Methodology/Main Findings: Genome-wide footprints of pig domestication and selection were identified using massive parallel sequencing of pooled reduced representation libraries (RRL) representing ~2% of the genome from wild boar and four domestic pig breeds (Large White, Landrace, Duroc and Pietrain) which have been under strong selection for muscle development, growth, behavior and coat color. Using specifically developed statistical methods that account for DNA pooling, low mean sequencing depth, and sequencing errors, we provide genome-wide estimates of nucleotide diversity and genetic differentiation in pig. Widespread signals suggestive of positive and balancing selection were found and the strongest signals were observed in Pietrain, one of the breeds most intensively selected for muscle development. Most signals were population-specific but affected genomic regions which harbored genes for common biological categories including coat color, brain development, muscle development, growth, metabolism, olfaction and immunity. Genetic differentiation in regions harboring genes related to muscle development and growth was higher between breeds than between a given breed and the wild boar. Conclusions/Significance: These results suggest that although domesticated breeds have experienced similar selective pressures, selection has acted upon different genes. This might reflect the multiple domestication events of European breeds or could be the result of subsequent introgression of Asian alleles. Overall, it was estimated that approximately 7% of the porcine genome has been affected by selection events. This study illustrates that the massive parallel sequencing of genomic pools is a cost-effective approach to identify footprints of selection.