749 research outputs found

    Combinatorial Structure of the Deterministic Seriation Method with Multiple Subset Solutions

    Seriation methods order a set of descriptions given some criterion (e.g., unimodality or minimum distance between similarity scores). Seriation is thus inherently a problem of finding the optimal solution among a set of permutations of objects. In this short technical note, we review the combinatorial structure of the classical seriation problem, which seeks a single solution out of a set of objects. We then extend those results to the iterative frequency seriation approach introduced by Lipo (1997), which finds optimal subsets of objects, each of which satisfies the unimodality criterion. The number of possible solutions across multiple solution subsets is larger than $n!$, which underscores the need to find new algorithms and heuristics to assist in the deterministic frequency seriation problem. Comment: 8 pages, 2 figures
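To make the unimodality criterion and the size of the search space concrete, here is a minimal brute-force sketch in Python. It assumes each assemblage is a row of class frequencies and that an ordering is acceptable when every class column rises to at most one peak and then falls. The function names `is_unimodal` and `valid_orderings` are illustrative and do not come from the paper; the sketch covers only the classical single-solution case, not the iterative multiple-subset method.

```python
from itertools import permutations


def is_unimodal(values):
    """Return True if the sequence never rises again after it has started to fall."""
    falling = False
    for prev, cur in zip(values, values[1:]):
        if cur < prev:
            falling = True
        elif cur > prev and falling:
            return False
    return True


def valid_orderings(freqs):
    """Enumerate all n! row orderings and keep those where every column is unimodal.

    freqs is a list of rows (assemblages); each row lists class frequencies.
    This is exhaustive search, so it is only practical for very small n.
    """
    n_rows = len(freqs)
    n_cols = len(freqs[0])
    result = []
    for order in permutations(range(n_rows)):
        if all(is_unimodal([freqs[i][c] for i in order]) for c in range(n_cols)):
            result.append(order)
    return result


if __name__ == "__main__":
    # Three hypothetical assemblages described by two classes.
    freqs = [[0.8, 0.2], [0.5, 0.5], [0.2, 0.8]]
    print(valid_orderings(freqs))
```

Because a valid ordering read backwards is also valid, solutions come in mirror-image pairs. The exhaustive loop visits all n! permutations, which is why this approach stops being feasible after a handful of assemblages.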

    Measuring Cultural Relatedness Using Multiple Seriation Ordering Algorithms

    Seriation is a long-standing archaeological method for relative dating that has proven effective in probing regional-scale patterns of inheritance, social networks, and cultural contact in their full spatiotemporal context. The orderings produced by seriation result from the continuity of class distributions and the unimodality of class frequencies, properties that are related to social learning and transmission models studied by evolutionary archaeologists. Linking seriation to social learning and transmission enables one to consider ordering principles beyond the classic unimodal curve. Unimodality is a highly visible property that can be used to probe and measure the relationships between assemblages, and it was especially useful when seriation was accomplished with simple algorithms and manual effort. With modern algorithms and computing power, multiple ordering principles can be employed to better understand the spatiotemporal relations between assemblages. Ultimately, the expansion of seriation to additional ordering algorithms allows us to explore more thoroughly the underlying models of cultural contact, social networks, and modes of social learning. In this paper, we review our progress to date in extending seriation to multiple ordering algorithms, with examples from Eastern North America and Oceania.

    Overview of the Relational Analysis approach in Data-Mining and Multi-criteria Decision Making

    In this chapter we introduce a general framework called the Relational Analysis approach and its related contributions and applications in the fields of data analysis, data mining and multi-criteria decision making. This approach was initiated by J.F. Marcotorchino and P. Michaud at the end of the 1970s and has generated many research activities. However, the aspects of this framework that we would like to focus on are of a theoretical kind. Indeed, we aim to recall the background and the basics of this framework, the unifying results and the modeling contributions that it has made possible. Besides, the main tasks that we are interested in are the ranking aggregation problem, the clustering problem and the block seriation problem. Those problems are combinatorial ones, and the computational considerations of such tasks in the context of the RA methodology will not be covered. However, among the references that we give throughout this chapter, there are numerous articles that the interested reader could consult to this end.

    Dendrogram seriation in data visualisation: algorithms and applications

    Seriation is a data analytic tool for obtaining a permutation of a set of objects with the goal of revealing structural information within the set of objects. The purpose of this thesis is to investigate and develop tools for seriation with the goal of using these tools to enhance data visualisation. The particular focus of this thesis is on dendrogram seriation algorithms. A dendrogram is a tree-like structure used for visualising the results of a hierarchical clustering, and the order of the leaves in a dendrogram provides a permutation of a set of objects. Dendrogram seriation algorithms rearrange the leaves of a dendrogram in order to find a permutation that optimises a given criterion. Dendrogram seriation algorithms are widely used; however, the research in this area is often confusing because of inconsistent or inadequate terminology. This thesis proposes new notation and terminology with the goal of better understanding and comparing dendrogram seriation algorithms. Seriation criteria measure the goodness of a permutation of a set of objects. Popular seriation criteria include the path length of a permutation and measuring anti-Robinson form in a symmetric matrix. This thesis proposes two new seriation criteria, lazy path length and banded anti-Robinson form, and demonstrates their effectiveness in improving a variety of visualisations. The main contribution of this thesis is a new dendrogram seriation algorithm. This algorithm improves on other dendrogram seriation algorithms and is also flexible because it allows the user to either choose from a variety of seriation criteria, including the new criteria mentioned above, or to input their own criteria. Finally, this thesis performs a comparison of several seriation algorithms, the results of which show that the proposed algorithm performs competitively against other algorithms. This leads to a set of general guidelines for choosing the most appropriate seriation algorithm for different seriation interests and visualisation settings.
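As an illustration of what a seriation criterion computes, the sketch below evaluates the ordinary path length of a permutation, taken here in its usual sense: the sum of dissimilarities between objects that are adjacent in the ordering. The function name `path_length` and the toy matrix are assumptions made for this example. The thesis's own lazy path length and banded anti-Robinson criteria are not defined in the abstract and are not reproduced here.

```python
def path_length(dist, order):
    """Sum the dissimilarities between consecutive objects in the ordering.

    dist is a symmetric matrix (list of lists); order is a permutation of indices.
    Smaller values mean that neighbouring objects are more similar.
    """
    return sum(dist[a][b] for a, b in zip(order, order[1:]))


if __name__ == "__main__":
    # Hypothetical dissimilarities between three objects.
    dist = [
        [0, 1, 4],
        [1, 0, 2],
        [4, 2, 0],
    ]
    print(path_length(dist, [0, 1, 2]))  # 1 + 2
    print(path_length(dist, [0, 2, 1]))  # 4 + 2
```

A dendrogram seriation algorithm would compare values like these only across the leaf orders that the tree permits, rather than across all permutations.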

    Modules in Robinson Spaces

    A Robinson space is a dissimilarity space $(X,d)$ (i.e., a set $X$ of size $n$ and a dissimilarity $d$ on $X$) for which there exists a total order $<$ on $X$ such that $x<y<z$ implies that $d(x,z)\ge \max\{d(x,y), d(y,z)\}$. Recognizing if a dissimilarity space is Robinson has numerous applications in seriation and classification. An mmodule of $(X,d)$ (generalizing the notion of a module in graph theory) is a subset $M$ of $X$ which is not distinguishable from the outside of $M$, i.e., the distance from any point of $X\setminus M$ to all points of $M$ is the same. If $p$ is any point of $X$, then $\{p\}$ and the maximal by inclusion mmodules of $(X,d)$ not containing $p$ define a partition of $X$, called the copoint partition. In this paper, we investigate the structure of mmodules in Robinson spaces and use it and the copoint partition to design a simple and practical divide-and-conquer algorithm for recognition of Robinson spaces in optimal $O(n^2)$ time.
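The two definitions in this abstract can be checked directly on small examples. The Python sketch below is a plain restatement of those definitions and is not the paper's divide-and-conquer algorithm: `is_robinson_order` only verifies a candidate total order that is already given, in cubic time, and `is_mmodule` tests one candidate subset. Both function names and the toy matrices are assumptions made for illustration.

```python
def is_robinson_order(d, order):
    """Check that d(x, z) >= max(d(x, y), d(y, z)) for all x < y < z in the given order.

    d is a symmetric dissimilarity matrix; order lists the indices of X from
    smallest to largest. Runs in O(n^3) time and verifies a given order only.
    """
    n = len(order)
    for i in range(n):
        for j in range(i + 1, n):
            for k in range(j + 1, n):
                x, y, z = order[i], order[j], order[k]
                if d[x][z] < max(d[x][y], d[y][z]):
                    return False
    return True


def is_mmodule(d, module):
    """Check that every point outside the subset is equidistant from all its points."""
    inside = set(module)
    for p in range(len(d)):
        if p in inside:
            continue
        if len({d[p][m] for m in inside}) > 1:
            return False
    return True


if __name__ == "__main__":
    # Points at positions 0, 1 and 3 on a line, with d = absolute difference.
    line = [[0, 1, 3], [1, 0, 2], [3, 2, 0]]
    print(is_robinson_order(line, [0, 1, 2]))
    print(is_robinson_order(line, [0, 2, 1]))

    # Points 0 and 1 are both at distance 5 from point 2.
    d = [[0, 1, 5], [1, 0, 5], [5, 5, 0]]
    print(is_mmodule(d, [0, 1]))
```

Recognition is the harder task, because no order is supplied and one must be found or shown not to exist. That is the problem the abstract reports solving in $O(n^2)$ time.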

    A New Measure for Analyzing and Fusing Sequences of Objects

    This work is related to the combinatorial data analysis problem of seriation used for data visualization and exploratory analysis. Seriation re-sequences the data, so that more similar samples or objects appear closer together, whereas dissimilar ones are further apart. Despite the large number of current algorithms to realize such re-sequencing, there has not been a systematic way of analyzing the resulting sequences, comparing them, or fusing them to obtain a single unifying one. We propose a new positional proximity measure that evaluates the similarity of two arbitrary sequences based on their agreement on pairwise positional information of the sequenced objects. Furthermore, we present various statistical properties of this measure as well as its normalized version modeled as an instance of the generalized correlation coefficient. Based on this measure, we define a new procedure for consensus seriation that fuses multiple arbitrary sequences based on a quadratic assignment problem formulation and an efficient way of approximating its solution. We also derive theoretical links with other permutation distance functions and present their associated combinatorial optimization forms for consensus tasks. The utility of the proposed contributions is demonstrated through the comparison and fusion of multiple seriation algorithms we have implemented, using many real-world datasets from different application domains.