Redescription Mining and Applications in Bioinformatics
Our ability to interrogate the cell and computationally assimilate its answers is improving at a dramatic pace. For instance, the study of even a focused aspect of cellular activity, such as gene action, now benefits from multiple high-throughput data acquisition technologies such as microarrays, genome-wide deletion screens, and RNAi assays. A critical need is the development of algorithms that can bridge, relate, and unify diverse categories of data descriptors. Redescription mining is such an approach. Given a set of biological objects (e.g., genes, proteins) and a collection of descriptors defined over this set, the goal of redescription mining is to use the given descriptors as a vocabulary and find subsets of data that afford multiple definitions. The premise of redescription mining is that subsets that afford multiple definitions are likely to exhibit concerted behavior and are, hence, interesting. We present algorithms for redescription mining based on formal concept analysis and applications of redescription mining to multiple biological datasets. We demonstrate how redescriptions identify conceptual clusters of data using mutually reinforcing features, without explicit training information.
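To make the idea of "subsets that afford multiple definitions" concrete, the following is a minimal sketch, not the formal-concept-analysis algorithms the abstract refers to. It simply compares every pair of descriptors by the Jaccard similarity of the object sets they cover and reports pairs that describe nearly the same objects. The descriptor names, gene identifiers, function names, and the `min_jaccard` threshold are all hypothetical and chosen only for illustration.

```python
from itertools import combinations


def jaccard(a, b):
    """Jaccard similarity of two sets; defined as 0.0 when both are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def find_redescriptions(descriptors, min_jaccard=0.8):
    """Brute-force search over descriptor pairs (illustrative assumption).

    descriptors maps a descriptor name to the set of objects it covers.
    Returns (name_a, name_b, score) tuples with score >= min_jaccard,
    sorted by decreasing score.
    """
    results = []
    for (name_a, set_a), (name_b, set_b) in combinations(descriptors.items(), 2):
        score = jaccard(set_a, set_b)
        if score >= min_jaccard:
            results.append((name_a, name_b, score))
    return sorted(results, key=lambda r: -r[2])


# Hypothetical toy data: three descriptors over seven genes.
descriptors = {
    "upregulated_in_heat_shock": {"g1", "g2", "g3", "g4"},
    "annotated_protein_folding": {"g1", "g2", "g3", "g4", "g5"},
    "essential_in_deletion_screen": {"g6", "g7"},
}

pairs = find_redescriptions(descriptors, min_jaccard=0.75)
for name_a, name_b, score in pairs:
    print(f"{name_a} <=> {name_b} (Jaccard = {score:.2f})")
```

A real redescription miner would also combine descriptors with set operations (intersections, unions, complements) on each side rather than comparing single descriptors only; the pairwise scan here is just the simplest instance of the matching criterion.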
Compositional Mining of Multi-Relational Biological Datasets
High-throughput biological screens are yielding ever-growing streams of
information about multiple aspects of cellular activity. As more and more
categories of datasets come online, there is a corresponding multitude of ways
in which inferences can be chained across them, motivating the need for
compositional data mining algorithms. In this paper, we argue that such
compositional data mining can be effectively realized by functionally cascading
redescription mining and biclustering algorithms as primitives. Both these
primitives mirror shifts of vocabulary that can be composed in arbitrary ways
to create rich chains of inferences. Given a relational database and its
schema, we show how the schema can be automatically compiled into a
compositional data mining program, and how different domains in the schema can
be related through logical sequences of biclustering and redescription
invocations. This feature allows us to rapidly prototype new data mining
applications, yielding greater understanding of scientific datasets. We
describe two applications of compositional data mining: (i) matching terms
across categories of the Gene Ontology and (ii) understanding the molecular
mechanisms underlying stress response in human cells.
Redescription Mining: An Overview.
In many real-world data analysis tasks, we have different types of data over the same objects or entities, perhaps because the data originate from distinct sources or are based on different terminologies. In order to understand such data, an intuitive approach is to identify the correspondences that exist between these different aspects. This is the motivating principle behind redescription mining, a data analysis task that aims at finding distinct common characterizations of the same objects. This paper provides a short overview of redescription mining: what it is and how it is connected to other data analysis methods; the basic principles behind current algorithms for redescription mining; and examples and applications of redescription mining for real-world data analysis problems.
Finding relational redescriptions
We introduce relational redescription mining, that is, the task of finding two structurally different patterns that describe nearly the same set of object pairs in a relational dataset. By extending redescription mining beyond propositional and real-valued attributes, it provides a powerful tool to match different relational descriptions of the same concept.
We propose an alternating scheme for solving this problem. Its core consists of a novel relational query miner that efficiently identifies discriminative connection patterns between pairs of objects. Compared to a baseline Inductive Logic Programming (ILP) approach, our query miner is able to mine more complex queries, and to do so much faster. We performed extensive experiments on three real-world relational datasets, and present examples of redescriptions found, exhibiting the power of the method to expressively capture relations present in these networks.
Representational redescription and cognitive architectures
We focus on Karmiloff-Smith's Representational redescription model, arguing that it poses some problems concerning the architecture of a redescribing system. To discuss the topic, we consider the implicit/explicit dichotomy and the relations between natural language and the language of thought. We argue that the model regards how knowledge is employed rather than how it is represented in the system.
Subjectively Interesting Subgroup Discovery on Real-valued Targets
Deriving insights from high-dimensional data is one of the core problems in
data mining. The difficulty mainly stems from the fact that there are
exponentially many variable combinations to potentially consider, and there are
infinitely many if we consider weighted combinations, even for linear
combinations. Hence, an obvious question is whether we can automate the search
for interesting patterns and visualizations. In this paper, we consider the
setting where a user wants to learn as efficiently as possible about
real-valued attributes. For example, to understand the distribution of crime
rates in different geographic areas in terms of other (numerical, ordinal
and/or categorical) variables that describe the areas. We introduce a method to
find subgroups in the data that are maximally informative (in the formal
information-theoretic sense) with respect to one or more real-valued
target attributes. The subgroup descriptions are in terms of a succinct set of
arbitrarily-typed other attributes. The approach is based on the Subjective
Interestingness framework FORSIED to enable the use of prior knowledge when
finding most informative non-redundant patterns, and hence the method also
supports iterative data mining.
Comment: 12 pages, 10 figures, 2 tables, conference submissio