Clones in Graphs
Finding structural similarities in graph data, like social networks, is a
far-ranging task in data mining and knowledge discovery. A (conceptually)
simple reduction would be to compute the automorphism group of a graph.
However, this approach is ineffective in data mining, since real-world data does
not exhibit enough structural regularity. Here we step in with a novel approach
based on mappings that preserve the maximal cliques. For this we exploit the
well-known correspondence between bipartite graphs and the formal context, the
central data structure of Formal Concept Analysis. From there we utilize
the notion of clone items. The investigation of these is still an open problem,
to which we add new insights with this work. Furthermore, we present a
substantial experimental investigation of real-world data. We conclude by
demonstrating the generalization of clone items to permutations.
Comment: 11 pages, 2 figures, 1 table
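As a hedged illustration, the Python sketch below checks for clone items by brute force on a tiny formal context, stored as a bipartite object-to-attribute incidence. It assumes the definition used in the clone-items literature (two attributes are clones if swapping them maps every concept intent, i.e. every closed attribute set, to another concept intent). The names `intents`, `swap`, and `clone_pairs` and the toy context `ctx` are hypothetical, and the enumeration is exponential in the number of attributes, so it is only meant for toy examples.

```python
from itertools import combinations


def intents(context, attributes):
    """All concept intents (closed attribute sets) of a small formal context."""
    def closure(attrs):
        # A': objects that have every attribute in attrs
        objs = [g for g, row in context.items() if attrs <= row]
        # A'': attributes shared by all of those objects (all attributes if none)
        result = set(attributes)
        for g in objs:
            result &= context[g]
        return frozenset(result)

    closed = set()
    for r in range(len(attributes) + 1):
        for combo in combinations(sorted(attributes), r):
            closed.add(closure(set(combo)))
    return closed


def swap(intent, a, b):
    """Apply the transposition (a b) to a set of attributes."""
    return frozenset(b if m == a else a if m == b else m for m in intent)


def clone_pairs(context):
    """Attribute pairs whose swap maps the set of intents onto itself."""
    attributes = set().union(*context.values())
    closed = intents(context, attributes)
    return [(a, b) for a, b in combinations(sorted(attributes), 2)
            if all(swap(A, a, b) in closed for A in closed)]


# Hypothetical toy context: objects g1..g3, attributes a..d.
ctx = {"g1": {"a", "b", "c"}, "g2": {"a", "d"}, "g3": {"b", "d"}}
print(clone_pairs(ctx))  # → [('a', 'b')]
```

Here `a` and `b` are interchangeable with respect to the closed sets, while, for example, swapping `a` and `d` turns the intent `{a, b, c}` into the non-closed set `{b, c, d}`.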
Knowledge-based gene expression classification via matrix factorization
Motivation: Modern machine learning methods based on matrix decomposition techniques, like independent component analysis (ICA) or non-negative matrix factorization (NMF), provide new and efficient analysis tools which are currently being explored to analyze gene expression profiles. These exploratory feature extraction techniques yield expression modes (ICA) or metagenes (NMF). These extracted features are considered indicative of underlying regulatory processes. They can also be applied to the classification of gene expression datasets, by grouping samples into different categories for diagnostic purposes or by grouping genes into functional categories for further investigation of related metabolic pathways and regulatory networks.
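To make the notion of metagenes concrete, here is a minimal sketch of plain NMF with Lee-Seung multiplicative updates for the Frobenius loss, assuming a non-negative genes-by-samples matrix V ≈ W H whose columns of W play the role of metagenes. This is not the sparse NMF variant used in the study, and the function name and parameters are hypothetical.

```python
import numpy as np


def nmf(V, k, n_iter=500, seed=0, eps=1e-10):
    """Factor a non-negative matrix V (genes x samples) as W @ H.

    W (genes x k): columns are the metagenes.
    H (k x samples): metagene expression in each sample.
    Plain multiplicative updates (Lee-Seung) for ||V - W H||_F^2;
    no sparsity constraint is imposed here.
    """
    rng = np.random.default_rng(seed)
    n, m = V.shape
    W = rng.random((n, k)) + eps
    H = rng.random((k, m)) + eps
    for _ in range(n_iter):
        # Each update keeps entries non-negative and does not increase the loss.
        H *= (W.T @ V) / (W.T @ W @ H + eps)
        W *= (V @ H.T) / (W @ H @ H.T + eps)
    return W, H
```

For example, `W, H = nmf(V, k=2)` on a 20 x 8 expression matrix returns two metagenes; samples could then be grouped by their dominant row of `H`.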
Results: In this study we focus on unsupervised matrix factorization techniques and apply ICA and sparse NMF to microarray datasets. These datasets monitor the gene expression levels of human peripheral blood cells during differentiation from monocytes to macrophages. We show that these tools are able to identify relevant signatures in the deduced component matrices and extract informative sets of marker genes from these gene expression profiles. The methods rely on the joint discriminative power of a set of marker genes rather than on single marker genes. With these sets of marker genes, corroborated by leave-one-out or random forest cross-validation, the datasets could easily be classified into related diagnostic categories. These categories correspond to either monocytes versus macrophages or healthy individuals versus Niemann-Pick type C disease patients.
Funding: Siemens AG, Munich; DFG (Graduate College 638); DAAD (PPP Luso-Alemã and PPP Hispano-Alemanas).
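The leave-one-out validation of a marker-gene set can be sketched as follows. This is a hedged illustration only: it swaps in a simple nearest-centroid classifier (not the classifier used in the study), and the function name and inputs are hypothetical. Each sample is held out in turn, class centroids are computed from the remaining samples, and the held-out sample is assigned to the nearest centroid.

```python
import numpy as np


def loo_nearest_centroid(X, y):
    """Leave-one-out accuracy of a nearest-centroid classifier.

    X: samples x marker genes (expression of the selected marker set).
    y: class label per sample (e.g. monocyte vs. macrophage).
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    correct = 0
    for i in range(len(y)):
        mask = np.arange(len(y)) != i          # hold out sample i
        Xtr, ytr = X[mask], y[mask]
        classes = np.unique(ytr)
        centroids = np.array([Xtr[ytr == c].mean(axis=0) for c in classes])
        dists = np.linalg.norm(centroids - X[i], axis=1)
        correct += int(classes[np.argmin(dists)] == y[i])
    return correct / len(y)
```

Because the classifier uses all marker genes jointly, it reflects the joint discriminative power of the set rather than that of any single gene.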
Templates for Convex Cone Problems with Applications to Sparse Signal Recovery
This paper develops a general framework for solving a variety of convex cone
problems that frequently arise in signal processing, machine learning,
statistics, and other fields. The approach works as follows: first, determine a
conic formulation of the problem; second, determine its dual; third, apply
smoothing; and fourth, solve using an optimal first-order method. A merit of
this approach is its flexibility: for example, all compressed sensing problems
can be solved via this approach. These include models with objective
functionals such as the total-variation norm, ||Wx||_1 where W is arbitrary, or
a combination thereof. In addition, the paper introduces a number of
technical contributions such as a novel continuation scheme, a novel approach
for controlling the step size, and some new results showing that the smoothed and
unsmoothed problems are sometimes formally equivalent. Combined with our
framework, these lead to novel, stable and computationally efficient
algorithms. For instance, our general implementation is competitive with
state-of-the-art methods for solving intensively studied problems such as the
LASSO. Further, numerical experiments show that one can solve the Dantzig
selector problem, for which no efficient large-scale solvers exist, in a few
hundred iterations. Finally, the paper is accompanied by a software release.
This software is not a single, monolithic solver; rather, it is a suite of
programs and routines designed to serve as building blocks for constructing
complete algorithms.
Comment: The TFOCS software is available at http://tfocs.stanford.edu. This
version has updated references.
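The framework's final step, an optimal first-order method, can be illustrated in isolation. The sketch below is not the TFOCS API and skips the conic dual and smoothing steps; it applies FISTA (Beck and Teboulle's accelerated proximal gradient method, swapped in here for illustration) directly to the LASSO in composite form 0.5·||Ax − b||² + λ||x||_1. Function names and parameters are hypothetical.

```python
import numpy as np


def soft_threshold(z, t):
    """Proximal operator of t * ||.||_1 (componentwise shrinkage)."""
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)


def fista_lasso(A, b, lam, n_iter=500):
    """Minimize 0.5*||Ax - b||^2 + lam*||x||_1 with FISTA."""
    L = np.linalg.norm(A, 2) ** 2       # Lipschitz constant of the gradient
    x = np.zeros(A.shape[1])
    y = x.copy()
    t = 1.0
    for _ in range(n_iter):
        grad = A.T @ (A @ y - b)        # gradient of the smooth part at y
        x_new = soft_threshold(y - grad / L, lam / L)
        t_new = (1 + np.sqrt(1 + 4 * t * t)) / 2
        y = x_new + ((t - 1) / t_new) * (x_new - x)   # momentum step
        x, t = x_new, t_new
    return x
```

With `A = np.eye(3)`, the LASSO solution is just soft thresholding of `b`, so `fista_lasso(np.eye(3), np.array([3.0, -0.5, 1.0]), 1.0)` returns `[2.0, 0.0, 0.0]`.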
From error bounds to the complexity of first-order descent methods for convex functions
This paper shows that error bounds can be used as effective tools for
deriving complexity results for first-order descent methods in convex
minimization. In a first stage, this objective led us to revisit the interplay
between error bounds and the Kurdyka-Łojasiewicz (KL) inequality. One can
show the equivalence between the two concepts for convex functions having a
moderately flat profile near the set of minimizers (such as functions with
Hölderian growth). A counterexample shows that the equivalence is no longer
true for extremely flat functions. This fact reveals the relevance of an
approach based on the KL inequality. In a second stage, we show how KL inequalities
can in turn be employed to compute new complexity bounds for a wealth of
descent methods for convex problems. Our approach is completely original and
makes use of a one-dimensional worst-case proximal sequence in the spirit of
the famous majorant method of Kantorovich. Our result applies to a very simple
abstract scheme that covers a wide class of descent methods. As a byproduct of
our study, we also provide new results for the globalization of KL inequalities
in the convex framework.
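The first stage can be made concrete with a standard short argument (a hedged sketch, not the paper's own proof) showing how a Hölderian growth condition, which is an error bound, yields a KL inequality with an explicit desingularizing function. The constants c and p are assumptions introduced for illustration.

```latex
% Assumptions (hypothetical constants): f convex, S = argmin f nonempty,
% f^* = min f, and Hölderian growth with c > 0, p >= 1:
%   f(x) - f^* \ge c\,\operatorname{dist}(x,S)^p.
% Take x \notin S, g \in \partial f(x), and \bar{x} = \operatorname{proj}_S(x).
\begin{align*}
f^* = f(\bar{x}) &\ge f(x) + \langle g, \bar{x} - x \rangle
  &&\text{(subgradient inequality)}\\
f(x) - f^* &\le \|g\|\,\operatorname{dist}(x,S)
  \le \|g\| \left( \frac{f(x) - f^*}{c} \right)^{1/p}
  &&\text{(Cauchy--Schwarz, then growth)}\\
\|g\| &\ge c^{1/p} \bigl(f(x) - f^*\bigr)^{1 - 1/p}.
\end{align*}
% With \varphi(s) = p\,c^{-1/p} s^{1/p}, so that \varphi'(s) = c^{-1/p} s^{1/p - 1}:
\[
\varphi'\bigl(f(x) - f^*\bigr)\,\|g\| \;\ge\; 1,
\]
% which is a KL inequality with desingularizing function \varphi.
```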
Our main results inaugurate a simple methodology: derive an error bound,
compute the desingularizing function whenever possible, identify essential
constants in the descent method and finally compute the complexity using the
one-dimensional worst-case proximal sequence. Our method is illustrated through
projection methods for feasibility problems, and through the famous iterative
shrinkage thresholding algorithm (ISTA), for which we show that the complexity
bound is of the form $O(q^{k})$, where the constituents of the bound only depend
on error bound constants obtained for an arbitrary least squares objective with
$\ell^1$ regularization.
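To observe the behaviour of ISTA on an ℓ1-regularized least squares objective, the following sketch runs the algorithm with step size 1/L and records the objective after each step. It is a hedged illustration, not the paper's analysis: it does not compute the error bound constants or q. The function name and parameters are hypothetical.

```python
import numpy as np


def ista(A, b, lam, n_iter=200):
    """ISTA for F(x) = 0.5*||Ax - b||^2 + lam*||x||_1.

    Returns the last iterate and the list of objective values F(x_k).
    """
    L = np.linalg.norm(A, 2) ** 2                 # step size is 1/L
    x = np.zeros(A.shape[1])
    values = []
    for _ in range(n_iter):
        z = x - A.T @ (A @ x - b) / L             # gradient step on least squares
        x = np.sign(z) * np.maximum(np.abs(z) - lam / L, 0.0)  # shrinkage
        values.append(0.5 * np.sum((A @ x - b) ** 2) + lam * np.sum(np.abs(x)))
    return x, values
```

Since the step size is 1/L, the recorded values are non-increasing. Plotting `values[k] - values[-1]` on a log scale is one informal way to look for the geometric decay that a bound of the form O(q^k) predicts; the observed rate depends on the data.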
Ants Constructing Rule-Based Classifiers
Book series: Studies in Computational Intelligence. Status: published
Clustering problems in optimization models
We discuss a variety of clustering problems arising in combinatorial applications and in classifying objects into homogeneous groups. For each problem we discuss solution strategies that work well in practice. We also discuss the importance of careful modelling in clustering problems.
Peer Reviewed
http://deepblue.lib.umich.edu/bitstream/2027.42/44350/1/10614_2004_Article_BF00121636.pd
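As one classic example of a clustering model (not necessarily one treated in this paper), the minimum within-cluster sum-of-squares problem groups points into k homogeneous clusters. The sketch below uses Lloyd's heuristic, a practical local-search strategy. The function name and parameters are hypothetical, and the result depends on the initial centers.

```python
import numpy as np


def kmeans(X, k, n_iter=100, seed=0):
    """Lloyd's heuristic for the minimum sum-of-squares clustering model."""
    X = np.asarray(X, dtype=float)
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        # Assignment step: each point goes to its nearest center.
        d = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d.argmin(axis=1)
        # Update step: each center moves to the mean of its cluster
        # (an empty cluster keeps its previous center).
        new_centers = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels, centers
```

Lloyd's method only finds a local optimum, which is one reason modelling choices (objective, number of clusters, initialization) matter in practice.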