Combinatorial Structure of the Deterministic Seriation Method with Multiple Subset Solutions
Seriation methods order a set of descriptions given some criterion (e.g.,
unimodality or minimum distance between similarity scores). Seriation is thus
inherently a problem of finding the optimal solution among a set of
permutations of objects. In this short technical note, we review the
combinatorial structure of the classical seriation problem, which seeks a
single solution out of a set of objects. We then extend those results to the
iterative frequency seriation approach introduced by Lipo (1997), which finds
optimal subsets of objects, each of which satisfies the unimodality criterion.
The number of possible solutions across multiple solution subsets
is larger than , which underscores the need to find new algorithms and
heuristics to assist in the deterministic frequency seriation problem. Comment: 8 pages, 2 figures
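To make the combinatorics concrete, here is a minimal Python sketch of the classical single-solution problem. It assumes that assemblages are rows of a class-frequency matrix and that an ordering and its reversal count as the same solution, so n objects admit n!/2 candidate orderings. It checks the unimodality criterion column by column and brute-forces every candidate; it is not Lipo's iterative multiple-subset method, and all function names are illustrative.

```python
from itertools import permutations
from math import factorial


def is_unimodal(values):
    """True if the sequence weakly rises and then weakly falls (no interior dip)."""
    i, n = 0, len(values)
    while i + 1 < n and values[i] <= values[i + 1]:  # climb to the peak
        i += 1
    while i + 1 < n and values[i] >= values[i + 1]:  # descend from it
        i += 1
    return i == n - 1  # reached the end without rising again


def satisfies_unimodality(freqs, order):
    """Every class column must be unimodal when rows are taken in `order`."""
    n_classes = len(freqs[0])
    return all(is_unimodal([freqs[r][c] for r in order]) for c in range(n_classes))


def count_orderings(n):
    """Distinct orderings of n >= 2 objects when a sequence equals its reversal."""
    return factorial(n) // 2


def valid_orderings(freqs):
    """Exhaustively test one representative (first < last) of each reversal pair."""
    n = len(freqs)
    return [p for p in permutations(range(n))
            if p[0] < p[-1] and satisfies_unimodality(freqs, p)]


# Rows are assemblages A, B, C; columns are relative class frequencies.
freqs = [
    [0.6, 0.3, 0.1],  # A
    [0.1, 0.3, 0.6],  # B
    [0.3, 0.4, 0.3],  # C
]
print(valid_orderings(freqs))  # [(0, 2, 1)], i.e. A-C-B
print(count_orderings(12))     # 239500800
```

Even at 12 assemblages the exhaustive loop would visit 239,500,800 orderings, which illustrates why exhaustive search does not scale and why heuristics are needed.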
Measuring Cultural Relatedness Using Multiple Seriation Ordering Algorithms
Seriation is a long-standing archaeological method for relative dating that has proven effective in probing regional-scale patterns of inheritance, social networks, and cultural contact in their full spatiotemporal context. Seriation orderings are driven by the continuity of class distributions and the unimodality of class frequencies, properties that are related to the social learning and transmission models studied by evolutionary archaeologists. Linking seriation to social learning and transmission enables one to consider ordering principles beyond the classic unimodal curve. Unimodality is a highly visible property that can be used to probe and measure the relationships between assemblages, and it was especially useful when seriation was accomplished with simple algorithms and manual effort. With modern algorithms and computing power, multiple ordering principles can be employed to better understand the spatiotemporal relations between assemblages. Ultimately, extending seriation to additional ordering algorithms lets us explore underlying models of cultural contact, social networks, and modes of social learning more thoroughly. In this paper, we review our progress to date in extending seriation to multiple ordering algorithms, with examples from Eastern North America and Oceania.
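Continuity, the other ordering principle named above, can be checked in the same style. This sketch assumes presence/absence data (rows are assemblages, columns are classes) and treats an ordering as continuous when every class occupies a single unbroken run of assemblages. The helper names are hypothetical, and this is an illustration rather than the authors' software.

```python
def is_contiguous(presence):
    """True if the 1s in a 0/1 sequence form a single unbroken run (or none occur)."""
    idx = [i for i, p in enumerate(presence) if p]
    return not idx or idx[-1] - idx[0] + 1 == len(idx)


def satisfies_continuity(incidence, order):
    """Each class (column) must be present in consecutive assemblages only."""
    n_classes = len(incidence[0])
    return all(is_contiguous([incidence[r][c] for r in order])
               for c in range(n_classes))


# Rows: assemblages A, B, C; columns: presence (1) or absence (0) of three classes.
incidence = [
    [1, 1, 0],  # A
    [0, 1, 1],  # B
    [1, 0, 0],  # C
]
print(satisfies_continuity(incidence, [2, 0, 1]))  # True:  C-A-B
print(satisfies_continuity(incidence, [0, 1, 2]))  # False: class 0 is split by B
```

Because the two checks share the same interface (a data matrix plus a candidate order), they can be swapped or combined when exploring several ordering principles on the same assemblages.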
Overview of the Relational Analysis approach in Data-Mining and Multi-criteria Decision Making
In this chapter we introduce a general framework called the Relational Analysis (RA) approach, together with its related contributions and applications in the fields of data analysis, data mining and multi-criteria decision making. This approach was initiated by J.F. Marcotorchino and P. Michaud at the end of the 1970s and has generated many research activities. The aspects of this framework that we focus on, however, are theoretical: we recall the background and basics of the framework, the unifying results, and the modeling contributions it has made possible. The main tasks we are interested in are the ranking aggregation problem, the clustering problem and the block seriation problem. These are combinatorial problems, but the computational aspects of solving them within the RA methodology are not covered here. Among the references given throughout this chapter, however, are numerous articles that the interested reader can consult on this topic.
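The relational coding behind this kind of framework can be illustrated on the clustering problem. In this minimal sketch, each input partition is coded as a 0/1 relation matrix, and the consensus partition maximizes the number of pairwise agreements with the inputs, a Condorcet-style majority criterion. This is our own brute-force illustration under the assumption of a handful of objects; it does not reproduce the linear-programming formulations used in the RA literature, and the function names are hypothetical. The number of partitions grows as the Bell numbers, so enumeration is only feasible for very small inputs.

```python
from itertools import combinations


def set_partitions(items):
    """Yield every partition of the list `items` as a list of blocks."""
    if not items:
        yield []
        return
    first, rest = items[0], items[1:]
    for part in set_partitions(rest):
        for i in range(len(part)):          # add `first` to an existing block
            yield part[:i] + [[first] + part[i]] + part[i + 1:]
        yield [[first]] + part              # or open a new block for it


def relation_matrix(labels):
    """Relational coding: entry (i, j) is 1 iff objects i and j share a cluster."""
    n = len(labels)
    return [[int(labels[i] == labels[j]) for j in range(n)] for i in range(n)]


def condorcet_consensus(label_vectors):
    """Partition maximizing pairwise agreement with the inputs (brute force)."""
    n, m = len(label_vectors[0]), len(label_vectors)
    mats = [relation_matrix(labels) for labels in label_vectors]
    # together[i][j]: number of input partitions that put i and j in one cluster
    together = [[sum(M[i][j] for M in mats) for j in range(n)] for i in range(n)]
    best, best_score = None, -1
    for part in set_partitions(list(range(n))):
        labels = [0] * n
        for k, block in enumerate(part):
            for obj in block:
                labels[obj] = k
        # a joined pair earns the "together" votes, a separated pair the "apart" votes
        score = sum(together[i][j] if labels[i] == labels[j] else m - together[i][j]
                    for i, j in combinations(range(n), 2))
        if score > best_score:
            best, best_score = labels, score
    return best, best_score


inputs = [[0, 0, 1, 1], [0, 0, 1, 1], [0, 1, 1, 1]]
labels, score = condorcet_consensus(inputs)
print(score)  # 15: groups {0, 1} and {2, 3}
```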
Dendrogram seriation in data visualisation: algorithms and applications
Seriation is a data analytic tool for obtaining a permutation of a set of objects
with the goal of revealing structural information within the set of objects. The
purpose of this thesis is to investigate and develop tools for seriation with the
goal of using these tools to enhance data visualisation.
The particular focus of this thesis is on dendrogram seriation algorithms.
A dendrogram is a tree-like structure used for visualising the results of a hierarchical
clustering and the order of the leaves in a dendrogram provides a
permutation of a set of objects. Dendrogram seriation algorithms rearrange
the leaves of a dendrogram in order to find a permutation that optimises a
given criterion.
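As an illustration of what rearranging the leaves means, the sketch below assumes a binary dendrogram written as nested tuples of leaf indices and tries every flip of every internal node (2^(n-1) leaf orders for n leaves), keeping the order with the smallest path length. It is exhaustive search chosen for clarity, not the algorithm proposed in this thesis, and the function names are hypothetical.

```python
def leaf_orders(tree):
    """Yield every leaf order obtainable by flipping children of internal nodes.

    A leaf is an int; an internal node is a (left, right) tuple.
    """
    if isinstance(tree, int):
        yield [tree]
        return
    left, right = tree
    for left_order in leaf_orders(left):
        for right_order in leaf_orders(right):
            yield left_order + right_order  # node kept as is
            yield right_order + left_order  # node flipped


def path_length(order, d):
    """Sum of dissimilarities between neighbouring objects in the order."""
    return sum(d[a][b] for a, b in zip(order, order[1:]))


def best_leaf_order(tree, d):
    """Leaf order with minimal path length among all flips of `tree`."""
    return min(leaf_orders(tree), key=lambda order: path_length(order, d))


# Four objects on a line at positions 0, 1, 5, 6; dendrogram ((0, 1), (2, 3)).
pos = [0, 1, 5, 6]
d = [[abs(a - b) for b in pos] for a in pos]
tree = ((0, 1), (2, 3))
best = best_leaf_order(tree, d)
print(best, path_length(best, d))  # [0, 1, 2, 3] 6
```

For real data, SciPy's `scipy.cluster.hierarchy.optimal_leaf_ordering` reorders a linkage so that the sum of distances between successive leaves is minimal, using dynamic programming rather than enumeration.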
Dendrogram seriation algorithms are widely used; however, research in
this area is often confusing because of inconsistent or inadequate terminology.
This thesis proposes new notation and terminology with the goal of better
understanding and comparing dendrogram seriation algorithms.
Seriation criteria measure the goodness of a permutation of a set of objects.
Popular seriation criteria include the path length of a permutation and measures of
anti-Robinson form in a symmetric matrix. This thesis proposes two
new seriation criteria, lazy path length and banded anti-Robinson form,
and demonstrates their effectiveness in improving a variety of visualisations.
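The two popular criteria named above can be computed directly from a symmetric dissimilarity matrix. This sketch assumes one common way of counting anti-Robinson violations: for every triple of objects at positions i < j < k, it penalizes d(i,j) > d(i,k) and d(j,k) > d(i,k). Other weightings exist, and the thesis's lazy path length and banded anti-Robinson variants are not reproduced here.

```python
from itertools import combinations


def path_length(order, d):
    """Sum of dissimilarities between neighbouring objects in the order."""
    return sum(d[a][b] for a, b in zip(order, order[1:]))


def ar_violations(order, d):
    """Count anti-Robinson violations: for objects at positions i < j < k,
    d(i, k) should be at least d(i, j) and at least d(j, k)."""
    count = 0
    for i, j, k in combinations(order, 3):  # combinations keep positional order
        count += int(d[i][j] > d[i][k])
        count += int(d[j][k] > d[i][k])
    return count


pos = [0, 1, 5, 6]
d = [[abs(a - b) for b in pos] for a in pos]
for order in ([0, 1, 2, 3], [0, 2, 1, 3]):
    print(order, path_length(order, d), ar_violations(order, d))
# [0, 1, 2, 3] 6 0
# [0, 2, 1, 3] 14 4
```

Both criteria take the same (order, matrix) arguments, so either can be plugged in as the objective of a search over permutations or leaf orders.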
The main contribution of this thesis is a new dendrogram seriation algorithm.
This algorithm improves on other dendrogram seriation algorithms and
is also flexible because it allows the user either to choose from a variety of seriation
criteria, including the new criteria mentioned above, or to input their
own criteria.
Finally, this thesis performs a comparison of several seriation algorithms,
the results of which show that the proposed algorithm performs competitively
against other algorithms. This leads to a set of general guidelines for choosing
the most appropriate seriation algorithm for different seriation interests and
visualisation settings.
Modules in Robinson Spaces
A Robinson space is a dissimilarity space $(X,d)$ (i.e., a set $X$ of size $n$
and a dissimilarity $d$ on $X$) for which there exists a total order $<$ on $X$
such that $x<y<z$ implies that $d(x,z)\geq \max\{d(x,y),d(y,z)\}$.
Recognizing if a dissimilarity space is Robinson has numerous applications in
seriation and classification. An mmodule of $(X,d)$ (generalizing the notion of
a module in graph theory) is a subset $M$ of $X$ which is not distinguishable
from the outside of $M$, i.e., the distance from any point of $X\setminus M$ to
all points of $M$ is the same. If $p$ is any point of $X$, then $\{p\}$ and
the maximal by inclusion mmodules of $(X,d)$ not containing $p$ define a
partition of $X$, called the copoint partition. In this paper, we investigate
the structure of mmodules in Robinson spaces and use it and the copoint
partition to design a simple and practical divide-and-conquer algorithm for
recognition of Robinson spaces in optimal $O(n^2)$ time.
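The definitions above translate directly into checks on a dissimilarity matrix. The sketch below, with hypothetical helper names, tests the Robinson condition for a given order, finds an order by exhaustive search over all permutations (exponential time, unlike the optimal divide-and-conquer algorithm described above), and tests whether a subset M is an mmodule.

```python
from itertools import permutations


def is_robinson_order(d, order):
    """x < y < z in `order` must imply d(x, z) >= max(d(x, y), d(y, z))."""
    n = len(order)
    for a in range(n):
        for b in range(a + 1, n):
            for c in range(b + 1, n):
                x, y, z = order[a], order[b], order[c]
                if d[x][z] < max(d[x][y], d[y][z]):
                    return False
    return True


def find_robinson_order(d):
    """Brute force over all n! orders; only usable for tiny n."""
    for order in permutations(range(len(d))):
        if is_robinson_order(d, order):
            return list(order)
    return None  # (X, d) is not a Robinson space


def is_mmodule(d, M):
    """M is an mmodule if every point outside M is equidistant from all of M."""
    M = set(M)
    return all(len({d[p][q] for q in M}) == 1
               for p in range(len(d)) if p not in M)


pos = [5, 0, 6, 1]  # a line metric with scrambled labels
d = [[abs(a - b) for b in pos] for a in pos]
print(is_robinson_order(d, [0, 1, 2, 3]))  # False
print(find_robinson_order(d))              # [1, 3, 0, 2]

d2 = [[0, 1, 3, 3],
      [1, 0, 3, 3],
      [3, 3, 0, 2],
      [3, 3, 2, 0]]
print(is_mmodule(d2, {0, 1}), is_mmodule(d2, {0, 2}))  # True False
```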
A New Measure for Analyzing and Fusing Sequences of Objects
This work is related to the combinatorial data analysis problem of seriation used for data visualization and exploratory analysis. Seriation re-sequences the data so that more similar samples or objects appear closer together, whereas dissimilar ones are further apart. Despite the large number of current algorithms for such re-sequencing, there has been no systematic way to analyze the resulting sequences, compare them, or fuse them into a single unifying sequence. We propose a new positional proximity measure that evaluates the similarity of two arbitrary sequences based on their agreement on pairwise positional information of the sequenced objects. Furthermore, we present various statistical properties of this measure as well as its normalized version modeled as an instance of the generalized correlation coefficient. Based on this measure, we define a new procedure for consensus seriation that fuses multiple arbitrary sequences based on a quadratic assignment problem formulation and an efficient way of approximating its solution. We also derive theoretical links with other permutation distance functions and present their associated combinatorial optimization forms for consensus tasks. The utility of the proposed contributions is demonstrated through the comparison and fusion of multiple seriation algorithms we have implemented, using many real-world datasets from different application domains.
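The paper's positional proximity measure is not spelled out in this abstract, so the sketch below substitutes a simpler, well-known relative: the fraction of object pairs that two sequences place in the same relative order (one minus the normalized Kendall tau distance). Because a seriation and its reversal usually describe the same structure, the second function scores the better of the two orientations. Both sequences are assumed to contain the same objects, and the function names are hypothetical.

```python
from itertools import combinations


def pairwise_agreement(seq_a, seq_b):
    """Fraction of object pairs that both sequences place in the same relative order."""
    pos_a = {obj: i for i, obj in enumerate(seq_a)}
    pos_b = {obj: i for i, obj in enumerate(seq_b)}
    pairs = list(combinations(seq_a, 2))
    agree = sum((pos_a[x] < pos_a[y]) == (pos_b[x] < pos_b[y]) for x, y in pairs)
    return agree / len(pairs)


def orientation_free_agreement(seq_a, seq_b):
    """Seriations are usually defined up to reversal, so score the better orientation."""
    return max(pairwise_agreement(seq_a, seq_b),
               pairwise_agreement(seq_a, seq_b[::-1]))


a = ['A', 'B', 'C', 'D']
b = ['A', 'C', 'B', 'D']
print(pairwise_agreement(a, b))               # 5/6: only the pair (B, C) disagrees
print(orientation_free_agreement(a, a[::-1])) # 1.0
```

A consensus sequence could then be sought by maximizing the total agreement with all input sequences, although the abstract describes a quadratic assignment formulation for that step rather than this simple pairwise score.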
- …