3,375 research outputs found
Mining co-regulated gene profiles for the detection of functional associations in gene expression data
Motivation: Association pattern discovery (APD) methods have been successfully applied to gene expression data. They find groups of co-regulated genes in which the genes are either up- or down-regulated throughout the identified conditions. These methods, however, fail to identify similarly expressed genes whose expressions change between up- and down-regulation from one condition to another. In order to discover these hidden patterns, we propose the concept of mining co-regulated gene profiles. Co-regulated gene profiles contain two gene sets such that genes within the same set behave identically (up or down) while genes from different sets display contrary behavior. To reduce and group the large number of similar resulting patterns, we propose a new similarity measure that can be applied together with hierarchical clustering methods. Results: We tested our proposed method on two well-known yeast microarray data sets. Our implementation mined the data effectively and discovered patterns of co-regulated genes that are hidden to traditional APD methods. The high content of biologically relevant information in these patterns is demonstrated by the significant enrichment of co-regulated genes with similar functions. Our experimental results show that the Mining Attribute Profile (MAP) method is an efficient tool for the analysis of gene expression data and competitive with bi-clustering techniques. Contact: [email protected] Supplementary information: Supplementary data and an executable demo program of the MAP implementation are freely available at http://www.fgcz.ch/publications/ma
Learning Heterogeneous Similarity Measures for Hybrid-Recommendations in Meta-Mining
The notion of meta-mining has appeared recently and extends the traditional
meta-learning in two ways. First it does not learn meta-models that provide
support only for the learning algorithm selection task but ones that support
the whole data-mining process. In addition it abandons the so called black-box
approach to algorithm description followed in meta-learning. Now in addition to
the datasets, algorithms also have descriptors, workflows as well. For the
latter two these descriptions are semantic, describing properties of the
algorithms. With the availability of descriptors both for datasets and data
mining workflows the traditional modelling techniques followed in
meta-learning, typically based on classification and regression algorithms, are
no longer appropriate. Instead we are faced with a problem the nature of which
is much more similar to the problems that appear in recommendation systems. The
most important meta-mining requirements are that suggestions should use only
datasets and workflows descriptors and the cold-start problem, e.g. providing
workflow suggestions for new datasets.
In this paper we take a different view on the meta-mining modelling problem
and treat it as a recommender problem. In order to account for the meta-mining
specificities we derive a novel metric-based-learning recommender approach. Our
method learns two homogeneous metrics, one in the dataset and one in the
workflow space, and a heterogeneous one in the dataset-workflow space. All
learned metrics reflect similarities established from the dataset-workflow
preference matrix. We demonstrate our method on meta-mining over biological
(microarray datasets) problems. The application of our method is not limited to
the meta-mining problem, its formulations is general enough so that it can be
applied on problems with similar requirements
Mining Order-Preserving Submatrices from Data with Repeated Measurements
published_or_final_versio
An Efficient Rigorous Approach for Identifying Statistically Significant Frequent Itemsets
As advances in technology allow for the collection, storage, and analysis of
vast amounts of data, the task of screening and assessing the significance of
discovered patterns is becoming a major challenge in data mining applications.
In this work, we address significance in the context of frequent itemset
mining. Specifically, we develop a novel methodology to identify a meaningful
support threshold s* for a dataset, such that the number of itemsets with
support at least s* represents a substantial deviation from what would be
expected in a random dataset with the same number of transactions and the same
individual item frequencies. These itemsets can then be flagged as
statistically significant with a small false discovery rate. We present
extensive experimental results to substantiate the effectiveness of our
methodology.Comment: A preliminary version of this work was presented in ACM PODS 2009. 20
pages, 0 figure
- …