7 research outputs found

    Contribution à l'étude de la distributivité d'un treillis de concepts

    We are interested in distributive lattices in the setting of formal concept analysis (FCA). The original motivation comes from phylogeny, where median graphs are used to represent biological derivations and parsimony trees. FCA provides efficient algorithms for building concept lattices. However, a concept lattice does not correspond to a median graph unless it is distributive, hence the idea of studying the transformation of a concept lattice into a distributive lattice. To do so, we rely on Birkhoff's representation theorem, which allows us to systematize the transformation of an arbitrary context into the context of a distributive concept lattice. We can thus benefit from FCA algorithms to build, and also visualize, distributive concept lattices, and finally study the associated median graphs.
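
    As a minimal illustration (my own sketch, not code from the paper), Birkhoff's representation theorem states that every finite distributive lattice is isomorphic to the lattice of down-sets (order ideals) of its poset of join-irreducible elements; the snippet below enumerates those down-sets for a small hypothetical poset, which is the kind of reconstruction the systematized context transformation relies on.

        from itertools import combinations

        def down_sets(elements, leq):
            """Enumerate the down-sets (order ideals) of a finite poset.

            elements : list of poset elements
            leq(a, b): True iff a <= b in the poset
            """
            ideals = []
            for r in range(len(elements) + 1):
                for subset in combinations(elements, r):
                    s = set(subset)
                    # an order ideal must contain everything below each of its elements
                    if all(a in s for b in s for a in elements if leq(a, b)):
                        ideals.append(frozenset(s))
            return ideals

        # Hypothetical example: the join-irreducible divisors of 12 (namely 2, 3 and 4),
        # ordered by divisibility; their down-set lattice has 6 elements, matching the
        # 6-element distributive lattice of all divisors of 12.
        join_irreducibles = [2, 3, 4]
        divides = lambda a, b: b % a == 0
        print(len(down_sets(join_irreducibles, divides)))  # 6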

    Affine and Projective Tree Metric Theorems

    The tree metric theorem provides a combinatorial four-point condition that characterizes dissimilarity maps derived from pairwise compatible split systems. A related, weaker four-point condition characterizes dissimilarity maps derived from circular split systems, known as Kalmanson metrics. The tree metric theorem was first discovered in the context of phylogenetics and forms the basis of many tree reconstruction algorithms, whereas Kalmanson metrics were first considered by computer scientists, and are notable in that they are a non-trivial class of metrics for which the traveling salesman problem is tractable. We present a unifying framework for these theorems based on combinatorial structures that are used for graph planarity testing. These are (projective) PC-trees and their affine analogs, PQ-trees. In the projective case, we generalize a number of concepts from clustering theory, including hierarchies, pyramids, ultrametrics, and Robinsonian matrices, and the theorems that relate them. As with tree metrics and ultrametrics, the link between PC-trees and PQ-trees is established via the Gromov product.
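
    For concreteness, here is a small sketch (mine, not from the paper) of the classical four-point condition: for every quadruple of points, the two largest of the three pairwise sums d(x,y)+d(z,w), d(x,z)+d(y,w), d(x,w)+d(y,z) must coincide.

        from itertools import combinations

        def is_tree_metric(d, points, tol=1e-9):
            """Four-point condition: for each quadruple, the two largest of the
            three sums d(x,y)+d(z,w), d(x,z)+d(y,w), d(x,w)+d(y,z) are equal."""
            for x, y, z, w in combinations(points, 4):
                sums = sorted([d(x, y) + d(z, w),
                               d(x, z) + d(y, w),
                               d(x, w) + d(y, z)])
                if sums[2] - sums[1] > tol:
                    return False
            return True

        # Hypothetical example: path distances on the unit-edge tree
        #   c - u - v - e   with   a attached to u   and   b attached to v
        dist = {("a", "b"): 3, ("a", "c"): 2, ("a", "e"): 3,
                ("b", "c"): 3, ("b", "e"): 2, ("c", "e"): 3}
        d = lambda p, q: 0 if p == q else dist.get((p, q), dist.get((q, p)))
        print(is_tree_metric(d, ["a", "b", "c", "e"]))  # True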

    Combinatorial algorithms for the seriation problem

    In this thesis we study the seriation problem, a combinatorial problem arising in data analysis, which asks to sequence a set of objects so that similar objects are ordered close to each other. We focus on the combinatorial structure and properties of Robinsonian matrices, a special class of structured matrices which best achieve the seriation goal. Our contribution is both theoretical and practical, with a particular emphasis on algorithms. In Chapter 2 we introduce basic concepts about graphs, permutations and proximity matrices used throughout the thesis. In Chapter 3 we present Robinsonian matrices, discussing their characterizations and the recognition algorithms existing in the literature. In Chapter 4 we discuss Lexicographic Breadth-First Search (Lex-BFS), a special graph traversal algorithm used in multisweep algorithms for the recognition of several classes of graphs. In Chapter 5 we introduce a new Lex-BFS-based algorithm to recognize Robinsonian matrices, derived from a new characterization of Robinsonian matrices in terms of straight enumerations of unit interval graphs. In Chapter 6 we introduce the novel Similarity-First Search (SFS) algorithm, a weighted version of Lex-BFS which we use in a multisweep algorithm for the recognition of Robinsonian matrices. In Chapter 7 we model the seriation problem as an instance of the Quadratic Assignment Problem (QAP) and show that if the data has a Robinsonian structure, then one can find an optimal solution for QAP using a Robinsonian recognition algorithm. In Chapter 8 we discuss how to solve the seriation problem when the data does not have a Robinsonian structure, by finding a Robinsonian approximation of the original data. Finally, in Chapter 9 we discuss experiments carried out to compare the performance of the algorithms introduced in the thesis.
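
    To make the central definition concrete, the following brute-force sketch (mine, not from the thesis, and exponential by design) checks the Robinson property, i.e. that entries of a symmetric similarity matrix never increase when moving away from the diagonal within a row or column, and whether some simultaneous reordering of rows and columns achieves it; the thesis itself develops far more efficient Lex-BFS and SFS based recognition algorithms.

        from itertools import permutations

        def is_robinson(A):
            """Robinson property for a symmetric similarity matrix:
            A[i][k] <= min(A[i][j], A[j][k]) for all i < j < k."""
            n = len(A)
            return all(A[i][k] <= min(A[i][j], A[j][k])
                       for i in range(n)
                       for j in range(i + 1, n)
                       for k in range(j + 1, n))

        def is_robinsonian(A):
            """Brute-force check (exponential, for illustration only): some
            simultaneous permutation of rows and columns is Robinson."""
            n = len(A)
            return any(
                is_robinson([[A[p[i]][p[j]] for j in range(n)] for i in range(n)])
                for p in permutations(range(n)))

        # Hypothetical similarities between three objects listed in a scrambled order:
        A = [[10,  9,  8],
             [ 9, 10,  7],
             [ 8,  7, 10]]
        print(is_robinson(A))      # False: the entry 8 exceeds min(9, 7)
        print(is_robinsonian(A))   # True: reordering the objects as (1, 0, 2) works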

    New approaches for clustering high dimensional data

    Clustering is one of the most effective methods for analyzing datasets that contain a large number of objects with numerous attributes. Clustering seeks to identify groups, or clusters, of similar objects. In low-dimensional space, the similarity between objects is often evaluated by summing the differences across all of their attributes. High-dimensional data, however, may contain irrelevant attributes which mask the existence of clusters. The discovery of groups of objects that are highly similar within some subsets of relevant attributes therefore becomes an important but challenging task. My thesis focuses on various models and algorithms for this task. We first present a flexible clustering model, namely OP-Cluster (Order Preserving Cluster). Under this model, two objects are similar on a subset of attributes if the values of these two objects induce the same relative ordering of those attributes. The OP-Clustering algorithm has proven useful for identifying co-regulated genes in gene expression data. We also propose a semi-supervised approach to discover biologically meaningful OP-Clusters by incorporating existing gene function classifications into the clustering process. This semi-supervised algorithm yields only OP-Clusters that are significantly enriched by genes from specific functional categories. Real datasets are often noisy. We propose a noise-tolerant algorithm for mining frequently occurring itemsets, called approximate frequent itemsets (AFI). Both theoretical and experimental results demonstrate that the AFI mining algorithm has higher recoverability of real clusters than existing itemset mining approaches. Pair-wise dissimilarities are often derived from the original data to reduce the complexity of high-dimensional data. Traditional clustering algorithms that take pair-wise dissimilarities as input typically generate disjoint clusters. It is well known that the classification model represented by disjoint clusters is inconsistent with many real classifications, such as gene function classifications. We develop the PoClustering algorithm, which generates overlapping clusters from pair-wise dissimilarities. We prove that, by allowing overlapping clusters, PoClustering fully preserves the information of any dissimilarity matrix, while traditional partitioning algorithms may cause significant information loss.
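
    As a hedged illustration of the OP-Cluster model described above (my own sketch based on the abstract, not code from the thesis, with hypothetical gene and condition names), two objects agree on a subset of attributes when sorting those attributes by their values yields the same ranking for both objects.

        def same_attribute_order(obj_a, obj_b, attributes):
            """OP-Cluster-style agreement: sorting the chosen attributes by value
            yields the same ranking for both objects."""
            rank_a = sorted(attributes, key=lambda attr: obj_a[attr])
            rank_b = sorted(attributes, key=lambda attr: obj_b[attr])
            return rank_a == rank_b

        # Hypothetical expression values for three genes over four conditions:
        gene1 = {"c1": 0.2, "c2": 1.5, "c3": 0.9, "c4": 2.0}
        gene2 = {"c1": 10,  "c2": 80,  "c3": 35,  "c4": 90}
        gene3 = {"c1": 5,   "c2": 1,   "c3": 7,   "c4": 9}
        print(same_attribute_order(gene1, gene2, ["c1", "c2", "c3", "c4"]))  # True
        print(same_attribute_order(gene1, gene3, ["c1", "c2", "c3", "c4"]))  # False
        print(same_attribute_order(gene1, gene3, ["c1", "c3", "c4"]))        # True on a relevant subset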