13 research outputs found

    In-Close, a fast algorithm for computing formal concepts

    This paper presents an algorithm, called In-Close, that uses incremental closure and matrix searching to quickly compute all formal concepts in a formal context. In-Close is based, conceptually, on a well-known algorithm called Close-By-One. The serial version of a recently published algorithm (Krajca, 2008) was shown to be on the order of 100 times faster than several well-known algorithms, and timings of other algorithms in reviews suggest that none of them is faster than Krajca. This paper compares In-Close to Krajca, discussing computational methods, data requirements and memory considerations. From experiments using several public data sets and random data, this paper shows that In-Close is on the order of 20 times faster than Krajca. In-Close is small, straightforward, requires no matrix pre-processing and is simple to implement.
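
    To make "computing all formal concepts in a formal context" concrete, here is a minimal Python sketch. It is not In-Close, Close-By-One or Krajca's algorithm: it uses the naive observation that every concept intent is an intersection of object rows, and makes no attempt at speed. The dictionary representation of the context and the name `formal_concepts` are assumptions made for illustration only.

```python
from itertools import chain


def formal_concepts(context):
    """Enumerate all formal concepts of a small formal context.

    `context` maps each object to the set of attributes it has
    (a hypothetical representation chosen for this sketch).
    Returns a set of (extent, intent) pairs of frozensets.
    """
    # The attribute universe is taken to be every attribute that occurs.
    all_attrs = frozenset(chain.from_iterable(context.values()))

    # Every intent is an intersection of object rows; the empty
    # intersection is the full attribute set.
    intents = {all_attrs}
    for attrs in context.values():
        row = frozenset(attrs)
        intents |= {row & intent for intent in intents}

    # The extent of an intent is the set of objects having all its attributes.
    concepts = set()
    for intent in intents:
        extent = frozenset(o for o, attrs in context.items() if intent <= attrs)
        concepts.add((extent, intent))
    return concepts


if __name__ == "__main__":
    toy = {"a": {1, 2}, "b": {2, 3}, "c": {2}}
    for extent, intent in sorted(formal_concepts(toy), key=lambda c: len(c[1])):
        print(sorted(extent), sorted(intent))
```

    The intermediate set of intents can grow exponentially with the size of the context, which is the cost that closure-based algorithms such as those named in the abstract are designed to manage.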

    Redescription Mining and Applications in Bioinformatics

    Our ability to interrogate the cell and computationally assimilate its answers is improving at a dramatic pace. For instance, the study of even a focused aspect of cellular activity, such as gene action, now benefits from multiple high-throughput data acquisition technologies such as microarrays, genome-wide deletion screens, and RNAi assays. A critical need is the development of algorithms that can bridge, relate, and unify diverse categories of data descriptors. Redescription mining is such an approach. Given a set of biological objects (e.g., genes, proteins) and a collection of descriptors defined over this set, the goal of redescription mining is to use the given descriptors as a vocabulary and find subsets of data that afford multiple definitions. The premise of redescription mining is that subsets that afford multiple definitions are likely to exhibit concerted behavior and are, hence, interesting. We present algorithms for redescription mining based on formal concept analysis and applications of redescription mining to multiple biological datasets. We demonstrate how redescriptions identify conceptual clusters of data using mutually reinforcing features, without explicit training information.
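
    The idea of a subset that "affords multiple definitions" can be sketched as follows: two descriptors from different vocabularies form a candidate redescription when they select nearly the same objects. The sketch below scores overlap with Jaccard similarity and a fixed threshold; both choices, and all names and example data, are assumptions for illustration. It is not the formal-concept-based algorithm the abstract refers to.

```python
def jaccard(a, b):
    """Jaccard similarity of two sets; defined as 0.0 when both are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def redescriptions(left, right, min_jaccard=0.8):
    """Pair descriptors from two vocabularies that select similar object sets.

    `left` and `right` map a descriptor name to the set of objects it covers
    (hypothetical inputs). Returns (left_name, right_name, score) tuples,
    best score first.
    """
    pairs = []
    for left_name, left_objs in left.items():
        for right_name, right_objs in right.items():
            score = jaccard(left_objs, right_objs)
            if score >= min_jaccard:
                pairs.append((left_name, right_name, score))
    return sorted(pairs, key=lambda p: -p[2])


if __name__ == "__main__":
    # Made-up toy data: gene sets described by two different vocabularies.
    expression = {"expressed_under_heat": {"g1", "g2", "g3"}}
    annotation = {
        "annotated_stress": {"g1", "g2", "g3"},
        "annotated_transport": {"g3", "g4"},
    }
    print(redescriptions(expression, annotation))
```

    This brute-force version compares single descriptors pairwise only; it does not search over set-theoretic combinations of descriptors.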

    Redescription Mining: An Overview.

    In many real-world data analysis tasks, we have different types of data over the same objects or entities, perhaps because the data originate from distinct sources or are based on different terminologies. In order to understand such data, an intuitive approach is to identify the correspondences that exist between these different aspects. This is the motivating principle behind redescription mining, a data analysis task that aims at finding distinct common characterizations of the same objects. This paper provides a short overview of redescription mining; what it is and how it is connected to other data analysis methods; the basic principles behind current algorithms for redescription mining; and examples and applications of redescription mining for real-world data analysis problems.

    Compositional Mining of Multi-Relational Biological Datasets

    High-throughput biological screens are yielding ever-growing streams of information about multiple aspects of cellular activity. As more and more categories of datasets come online, there is a corresponding multitude of ways in which inferences can be chained across them, motivating the need for compositional data mining algorithms. In this paper, we argue that such compositional data mining can be effectively realized by functionally cascading redescription mining and biclustering algorithms as primitives. Both of these primitives mirror shifts of vocabulary that can be composed in arbitrary ways to create rich chains of inferences. Given a relational database and its schema, we show how the schema can be automatically compiled into a compositional data mining program, and how different domains in the schema can be related through logical sequences of biclustering and redescription invocations. This feature allows us to rapidly prototype new data mining applications, yielding greater understanding of scientific datasets. We describe two applications of compositional data mining: (i) matching terms across categories of the Gene Ontology and (ii) understanding the molecular mechanisms underlying stress response in human cells.