5 research outputs found

    Distributed Formal Concept Analysis Algorithms Based on an Iterative MapReduce Framework

    Get PDF
    While many existing formal concept analysis algorithms are efficient, they are typically unsuitable for distributed implementation. Taking the MapReduce (MR) framework as our inspiration we introduce a distributed approach for performing formal concept mining. Our method has its novelty in that we use a light-weight MapReduce runtime called Twister which is better suited to iterative algorithms than recent distributed approaches. First, we describe the theoretical foundations underpinning our distributed formal concept analysis approach. Second, we provide a representative exemplar of how a classic centralized algorithm can be implemented in a distributed fashion using our methodology: we modify Ganter's classic algorithm by introducing a family of MR* algorithms, namely MRGanter and MRGanter+ where the prefix denotes the algorithm's lineage. To evaluate the factors that impact distributed algorithm performance, we compare our MR* algorithms with the state-of-the-art. Experiments conducted on real datasets demonstrate that MRGanter+ is efficient, scalable and an appealing algorithm for distributed problems.Comment: 17 pages, ICFCA 201, Formal Concept Analysis 201

    Mining Biclusters of Similar Values with Triadic Concept Analysis

    Get PDF
    Biclustering numerical data became a popular data-mining task in the beginning of 2000's, especially for analysing gene expression data. A bicluster reflects a strong association between a subset of objects and a subset of attributes in a numerical object/attribute data-table. So called biclusters of similar values can be thought as maximal sub-tables with close values. Only few methods address a complete, correct and non redundant enumeration of such patterns, which is a well-known intractable problem, while no formal framework exists. In this paper, we introduce important links between biclustering and formal concept analysis. More specifically, we originally show that Triadic Concept Analysis (TCA), provides a nice mathematical framework for biclustering. Interestingly, existing algorithms of TCA, that usually apply on binary data, can be used (directly or with slight modifications) after a preprocessing step for extracting maximal biclusters of similar values.Comment: Concept Lattices and their Applications (CLA) (2011

    Parallelization of formal concept analysis algorithms

    Get PDF
    Formal Concept Analysis provides the mathematical notations for representing concepts and concept hierarchies making use of order and lattice theory. This has now been used in numerous applications which include software engineering, linguistics, sociology, information sciences, information technology, genetics, biology and in engineering. The algorithms derived from Kustenskov's CbO were found to provide the most efficient means of computing formal concepts in several research papers. In this thesis key enhancements to the original CbO algorithms are discussed in detail. The effects of these key features are presented in both isolation and combination. Eight different variations of the CbO algorithms highlighting the key features were compared in a level playing field by presenting them using the same notation and implementing them from the notation in the same way. The three main enhancements considered are the partial closure with incremental closure of intents, inherited canonicity test failures and using a combined depth first and breadth first search. The algorithms were implemented in an un-optimized way to focus on the comparison on the algorithms themselves and not on any efficiencies provided by optimizing code. One of the findings were that there is a significant performance improvement when partial closure with incremental closure of intents is used in isolation. However there is no significant performance improvement when the combined depth and breadth first search or the inherited canonicity test failure feature is used in isolation. The inherited canonicity test failure needs to be combined with the combined depth and breadth first feature to obtain a performance increase. Combining all the three enhancements brought the best performance. The main contribution of the thesis are the four new parallel In-Close3 algorithms. The shared memory algorithms Direct Parallel In-Close3, the Queue Parallel In-Close3 algorithm and the Distributed Memory In-Close3 algorithm showed significant potential. The shared memory algorithms were implemented using OpenMP and the distributed memory algorithm was implemented using MPI. All implementations were validated and showed scalability. Experiments were carried to test the features of the parallel algorithms and their implementations using the UK National Super Computer Archer and Colfax Clusters. The thesis presents the key parallelization strategies used and presents experimental results of the parallelization
    corecore