98 research outputs found

    A New Flexible Dendrogram Seriation Algorithm for Data Visualisation

    Seriation is a data analytic tool for obtaining a permutation of a set of objects with the goal of revealing structural information within the set of objects. Seriating variables, cases or categories generally improves visualisations of statistical data, for example, by revealing hidden patterns in data or by making large datasets easier to understand. In this paper we present a new algorithm for seriation based on dendrograms. Dendrogram seriation algorithms rearrange the nodes in a dendrogram in order to obtain a permutation of the leaves (i.e. objects) that optimises a given criterion. Our algorithm is more flexible than currently available seriation algorithms because it allows the user either to choose from a variety of seriation criteria or to input their own criteria. This choice of seriation criteria is an important feature because different criteria are suitable for different visualisation settings. Common seriation criteria include measurements of the path length through a set of objects and measurements of anti-Robinson form in a symmetric matrix. We propose new seriation criteria called lazy path length and banded anti-Robinson form, and demonstrate their effectiveness in a variety of visualisation settings.
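
    The two common criteria named above can be made concrete with a short sketch. Assuming a symmetric dissimilarity matrix `d` (a list of lists) and a permutation `order` of its indices, `path_length` sums the dissimilarities between consecutive objects, and `ar_events` counts violations of anti-Robinson form, i.e. places where values fail to grow (weakly) moving away from the diagonal. The function names and toy data are illustrative assumptions; the sketch does not implement the paper's lazy path length or banded anti-Robinson criteria, whose exact definitions are not given here.

```python
# Illustrative seriation criteria (names and data are assumptions, not the paper's code).

def path_length(d, order):
    """Sum of dissimilarities between consecutive objects in `order`."""
    return sum(d[order[k]][order[k + 1]] for k in range(len(order) - 1))


def ar_events(d, order):
    """Count anti-Robinson violations in `d` permuted by `order`.

    In anti-Robinson form, for every i < j < k (positions in the permuted matrix),
    m[i][j] <= m[i][k] and m[j][k] <= m[i][k]: values never shrink moving away
    from the diagonal. Each strict violation adds one event.
    """
    n = len(order)
    m = [[d[a][b] for b in order] for a in order]
    events = 0
    for i in range(n):
        for j in range(i + 1, n):
            for k in range(j + 1, n):
                events += m[i][j] > m[i][k]  # row i decreases moving right
                events += m[j][k] > m[i][k]  # column k decreases moving up
    return events


# Toy data (hypothetical): four points on a line, absolute differences as dissimilarities.
pos = [0, 1, 3, 6]
d = [[abs(a - b) for b in pos] for a in pos]

print(path_length(d, [0, 1, 2, 3]), ar_events(d, [0, 1, 2, 3]))  # 6 0
print(path_length(d, [2, 0, 3, 1]), ar_events(d, [2, 0, 3, 1]))  # 14 6
```

    The ordering that follows the points along the line has the shortest path and no anti-Robinson events, while the shuffled ordering scores worse on both criteria.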

    Dendrogram seriation in data visualisation: algorithms and applications

    Seriation is a data analytic tool for obtaining a permutation of a set of objects with the goal of revealing structural information within the set of objects. The purpose of this thesis is to investigate and develop tools for seriation with the goal of using these tools to enhance data visualisation. The particular focus of this thesis is on dendrogram seriation algorithms. A dendrogram is a tree-like structure used for visualising the results of a hierarchical clustering, and the order of the leaves in a dendrogram provides a permutation of a set of objects. Dendrogram seriation algorithms rearrange the leaves of a dendrogram in order to find a permutation that optimises a given criterion. Dendrogram seriation algorithms are widely used; however, the research in this area is often confusing because of inconsistent or inadequate terminology. This thesis proposes new notation and terminology with the goal of better understanding and comparing dendrogram seriation algorithms. Seriation criteria measure the goodness of a permutation of a set of objects. Popular seriation criteria include the path length of a permutation and measures of anti-Robinson form in a symmetric matrix. This thesis proposes two new seriation criteria, lazy path length and banded anti-Robinson form, and demonstrates their effectiveness in improving a variety of visualisations. The main contribution of this thesis is a new dendrogram seriation algorithm. This algorithm improves on other dendrogram seriation algorithms and is also flexible because it allows the user either to choose from a variety of seriation criteria, including the new criteria mentioned above, or to input their own criteria. Finally, this thesis performs a comparison of several seriation algorithms, the results of which show that the proposed algorithm performs competitively against other algorithms. This leads to a set of general guidelines for choosing the most appropriate seriation algorithm for different seriation interests and visualisation settings.
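
    To illustrate what rearranging the leaves of a dendrogram means, here is a minimal brute-force sketch; it is not the thesis's algorithm. It assumes a binary dendrogram encoded as nested tuples (a leaf is an `int`, an internal node a `(left, right)` pair; both the encoding and the function names are hypothetical). Flipping each internal node independently yields the 2^(n-1) leaf orders consistent with the tree, and the order minimising a user-supplied criterion is returned, echoing the idea of letting the user plug in their own criterion.

```python
# Brute-force dendrogram seriation sketch (hypothetical encoding, not the thesis's algorithm).
# A leaf is an int; an internal node is a (left, right) tuple.

def leaf_orders(node):
    """Every leaf order obtainable by flipping (or not) each internal node."""
    if isinstance(node, int):
        return [[node]]
    left, right = node
    orders = []
    for lo in leaf_orders(left):
        for ro in leaf_orders(right):
            orders.append(lo + ro)  # node kept as drawn
            orders.append(ro + lo)  # node flipped
    return orders


def seriate_dendrogram(tree, criterion):
    """Return the first leaf order that minimises the user-supplied `criterion`."""
    return min(leaf_orders(tree), key=criterion)


# Toy example: leaves 0 and 2 sit close together, as do leaves 1 and 3.
pos = [0, 10, 1, 11]
tree = ((0, 2), (1, 3))


def path_len(order):
    """Path length criterion on the toy 1-D positions."""
    return sum(abs(pos[a] - pos[b]) for a, b in zip(order, order[1:]))


print(len(leaf_orders(tree)))              # 8
print(seriate_dendrogram(tree, path_len))  # [0, 2, 1, 3]
```

    Exhaustive enumeration grows exponentially with the number of leaves, so this sketch is only practical for very small trees.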

    A new physical mapping approach refines the sex-determining gene positions on the Silene latifolia Y-chromosome

    Sex chromosomes are particularly interesting regions of the genome for both molecular genetics and evolutionary studies; yet, for most species, we lack basic information, such as the gene order along the chromosome. Because they lack recombination, Y-linked genes cannot be mapped genetically, leaving physical mapping as the only option for establishing the extent of synteny and homology with the X chromosome. Here, we developed a novel and general method for deletion mapping of non-recombining regions by solving "the travelling salesman problem", and evaluated its accuracy using simulated datasets. Unlike the existing radiation hybrid approach, this method allows us to combine deletion mutants from different experiments and sources. We applied our method to a set of newly generated deletion mutants in the dioecious plant Silene latifolia and refined the locations of the sex-determining loci on its Y chromosome map.
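
    The travelling-salesman formulation can be illustrated with a toy sketch. Under assumptions that are ours rather than the paper's, each marker is described by a presence/absence profile across deletion mutants (1 = retained, 0 = deleted), the distance between two markers is the Hamming distance between their profiles, and the marker order is the shortest Hamiltonian path (the open-path variant of the TSP) through those distances. The data and function names are hypothetical, and exhaustive search over permutations stands in for a real TSP solver.

```python
from itertools import permutations

# Toy TSP-style marker ordering (assumed formulation, hypothetical data).


def hamming(a, b):
    """Number of mutants in which two markers differ in presence/absence."""
    return sum(x != y for x, y in zip(a, b))


def order_markers(names, profiles):
    """Shortest Hamiltonian path through markers, found by exhaustive search."""
    n = len(names)
    d = [[hamming(profiles[a], profiles[b]) for b in names] for a in names]

    def length(order):
        return sum(d[order[k]][order[k + 1]] for k in range(n - 1))

    best = min(permutations(range(n)), key=length)
    return [names[i] for i in best], length(best)


# Columns are six hypothetical mutants; 0 = marker deleted, 1 = retained.
# Assumed true order A-B-C-D-E; each mutant deletes a contiguous block of it.
profiles = {
    "A": [0, 1, 1, 1, 0, 0],
    "B": [0, 0, 1, 1, 0, 1],
    "C": [1, 0, 0, 1, 0, 1],
    "D": [1, 1, 0, 0, 1, 1],
    "E": [1, 1, 1, 0, 1, 1],
}
names = ["D", "A", "E", "C", "B"]  # markers supplied in scrambled order

print(order_markers(names, profiles))  # (['A', 'B', 'C', 'D', 'E'], 8)
```

    Factorial-time enumeration is only feasible for a handful of markers, so a real analysis would rely on a dedicated TSP heuristic or solver. A path and its reverse have the same length, so the orientation of the recovered order is arbitrary.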