Linear Time Feature Selection for Regularized Least-Squares
We propose a novel algorithm for greedy forward feature selection for
regularized least-squares (RLS) regression and classification, also known as
the least-squares support vector machine or ridge regression. The algorithm,
which we call greedy RLS, starts from the empty feature set, and on each
iteration adds the feature whose addition provides the best leave-one-out
cross-validation performance. Our method is considerably faster than the
previously proposed ones, since its time complexity is linear in the number of
training examples, the number of features in the original data set, and the
desired size of the set of selected features. As a side effect, we therefore
obtain a new training algorithm for learning sparse linear RLS predictors that
can be used for large-scale learning. This speed is made possible by
matrix-calculus-based shortcuts for leave-one-out computation and feature addition. We
experimentally demonstrate the scalability of our algorithm and its ability to
find good-quality feature sets.
Comment: 17 pages, 15 figures
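To make the selection criterion concrete, here is a minimal Python sketch of greedy forward selection scored by leave-one-out (LOO) error. It is deliberately naive: it rebuilds and inverts an n x n matrix for every candidate feature, so it does not reproduce the linear-time shortcuts described above. It relies on the standard closed-form LOO identity for RLS without a bias term: with G = X_S X_S^T + lam*I and dual coefficients a = G^-1 y, the LOO residual of example i is a_i / (G^-1)_ii. Using mean squared LOO error as the score, and the names `greedy_rls`, `loo_mse`, `invert` and `lam`, are illustrative assumptions.

```python
def invert(m):
    """Gauss-Jordan inversion with partial pivoting (small dense matrices only)."""
    n = len(m)
    # augment each row with the matching identity row
    a = [list(map(float, row)) + [1.0 if i == j else 0.0 for j in range(n)]
         for i, row in enumerate(m)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(a[r][col]))
        a[col], a[piv] = a[piv], a[col]
        p = a[col][col]
        a[col] = [v / p for v in a[col]]
        for r in range(n):
            if r != col:
                f = a[r][col]
                a[r] = [vr - f * vc for vr, vc in zip(a[r], a[col])]
    return [row[n:] for row in a]


def loo_mse(X, y, feats, lam):
    """Mean squared LOO error of RLS (no bias) restricted to the columns in feats."""
    n = len(y)
    # G = X_S X_S^T + lam * I, the regularized linear kernel on the selected features
    G = [[sum(X[i][f] * X[j][f] for f in feats) + (lam if i == j else 0.0)
          for j in range(n)] for i in range(n)]
    Ginv = invert(G)
    a = [sum(Ginv[i][j] * y[j] for j in range(n)) for i in range(n)]  # dual coefficients
    # closed-form LOO residual: y_i - f_{-i}(x_i) = a_i / (G^-1)_ii
    return sum((a[i] / Ginv[i][i]) ** 2 for i in range(n)) / n


def greedy_rls(X, y, k, lam=1.0):
    """Add, one at a time, the feature whose addition gives the lowest LOO error."""
    selected = []
    remaining = list(range(len(X[0])))
    while len(selected) < k and remaining:
        best = min(remaining, key=lambda f: loo_mse(X, y, selected + [f], lam))
        selected.append(best)
        remaining.remove(best)
    return selected
```

Each candidate evaluation here is roughly cubic in the number of examples, which is precisely the cost that the matrix-calculus shortcuts in the paper are designed to remove.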
Partitioning networks into cliques: a randomized heuristic approach
In the context of community detection in social networks, the term community can be defined in the strict sense that everybody within the community should know everybody else. We consider the corresponding community detection problem: we search for a partitioning of a network into the minimum number of non-overlapping cliques such that the cliques cover all vertices. This problem is called the clique covering problem (CCP) and is one of the classical NP-hard problems. For CCP, we propose a randomized heuristic approach. To construct a high-quality solution to CCP, we present an iterated greedy (IG) algorithm. IG can also be combined with a heuristic that determines how far the algorithm is from the optimum in the worst case: randomized local search (RLS) for the maximum independent set problem was proposed to find such a bound (no two vertices of an independent set can share a clique, so its size is a lower bound on the number of cliques). The experimental results of IG and the bounds obtained by RLS indicate that IG is a very suitable technique for solving CCP in real-world graphs. In addition, we summarize our basic rigorous results, which were developed for the analysis of IG and for understanding its behavior on several relevant graph classes.
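The abstract does not spell out the IG procedure, so the following Python sketch is an assumption: it borrows the reordering idea of Culberson's iterated greedy algorithm for graph colouring (a clique cover of a graph is a colouring of its complement). A first-fit pass places each vertex into the first existing clique whose members are all its neighbours; each iteration then lists the vertices clique by clique, in a random clique order, and reruns first-fit. Because the vertices of the j-th listed clique always fit into one of the first j cliques, an iteration never increases the number of cliques. The function names and the adjacency-dict input are illustrative.

```python
import random


def first_fit_cliques(order, adj):
    """Place each vertex into the first clique whose members are all its neighbours."""
    cliques = []
    for v in order:
        for clique in cliques:
            if all(u in adj[v] for u in clique):
                clique.append(v)
                break
        else:
            cliques.append([v])  # no existing clique fits: open a new one
    return cliques


def iterated_greedy_ccp(adj, iterations=100, seed=0):
    """Culberson-style iterated greedy for clique covering (illustrative sketch)."""
    rng = random.Random(seed)
    order = list(adj)
    rng.shuffle(order)
    cliques = first_fit_cliques(order, adj)
    for _ in range(iterations):
        blocks = [list(c) for c in cliques]
        rng.shuffle(blocks)  # random clique order; members of a clique stay together
        # re-running first-fit on the grouped order never uses more cliques
        cliques = first_fit_cliques([v for b in blocks for v in b], adj)
    return cliques
```

On two triangles joined by a single edge, a bad initial order can produce three cliques (the bridging edge plus two leftover pairs); reordering by cliques lets later iterations recover the optimal cover of two triangles.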
On combinatorial optimisation in analysis of protein-protein interaction and protein folding networks
Protein-protein interaction networks and protein folding networks represent prominent research topics at the intersection of bioinformatics and network science. In this paper, we present a study of these networks from a combinatorial optimisation point of view. Using a combination of classical heuristics and stochastic optimisation techniques, we were able to identify several interesting combinatorial properties of the biological networks of the COSIN project. We obtained optimal or near-optimal solutions to the maximum clique and chromatic number problems for these networks. We also explore patterns of both non-overlapping and overlapping cliques in these networks. Optimal or near-optimal solutions to the partitioning of these networks into non-overlapping cliques and to the maximum independent set problem were discovered. Maximal cliques are explored using enumerative techniques. Domination in these networks is also briefly studied. Applications and extensions of our findings are discussed.
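The abstract only says that maximal cliques were explored by enumerative techniques. A standard choice, assumed here rather than taken from the paper, is the Bron-Kerbosch algorithm with pivoting. The size of the largest maximal clique it reports is the clique number, which is also a lower bound on the chromatic number mentioned above. A minimal Python sketch, with an illustrative adjacency-dict input:

```python
def maximal_cliques(adj):
    """Enumerate all maximal cliques (Bron-Kerbosch with Tomita-style pivoting)."""
    found = []

    def expand(r, p, x):
        # r: current clique, p: candidates that extend it, x: already-explored vertices
        if not p and not x:
            found.append(sorted(r))  # nothing can extend r, so it is maximal
            return
        # pivot with the most neighbours in p; its neighbours need not be branched on
        pivot = max(p | x, key=lambda u: len(adj[u] & p))
        for v in list(p - adj[pivot]):
            expand(r | {v}, p & adj[v], x & adj[v])
            p.remove(v)
            x.add(v)

    expand(set(), set(adj), set())
    return found
```

For two triangles joined by one edge, the sketch reports the two triangles and the bridging edge as the maximal cliques, so the clique number (and hence a lower bound on the chromatic number) is 3.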
Where Graph Topology Matters: The Robust Subgraph Problem
Robustness is a critical measure of the resilience of large networked
systems, such as transportation and communication networks. Most prior works
focus on the global robustness of a given graph at large, e.g., by measuring
its overall vulnerability to external attacks or random failures. In this
paper, we turn our attention to local robustness and pose a novel problem along the
lines of subgraph mining: given a large graph, how can we find its most robust
local subgraph (RLS)?
We define a robust subgraph as a subset of nodes with high communicability
among them, and formulate the RLS-PROBLEM of finding a subgraph of given size
with maximum robustness in the host graph. Our formulation is related to the
recently proposed general framework for the densest subgraph problem, but
differs from it substantially: besides the number of edges in the
subgraph, robustness also depends on the placement of edges, i.e., the
subgraph topology. We show that the RLS-PROBLEM is NP-hard and propose two
heuristic algorithms based on top-down and bottom-up search strategies.
Further, we present modifications of our algorithms to handle three practical
variants of the RLS-PROBLEM. Experiments on synthetic and real-world graphs
demonstrate that we find subgraphs with larger robustness than the densest
subgraphs even at lower densities, suggesting that the existing approaches are
not suitable for the new problem setting.
Comment: 13 pages, 10 figures, 3 tables, to appear at SDM 2015 (9 pages only)
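The abstract defines robustness only as high communicability among nodes. One common measure of this kind, assumed here, is natural connectivity, ln((1/n) * sum_i exp(lambda_i)) = ln(trace(exp(A)) / n), where lambda_i are the eigenvalues of the adjacency matrix A of the subgraph and trace(exp(A)) weights closed walks of every length. The Python sketch below evaluates it with a truncated power series and pairs it with a hypothetical bottom-up greedy that grows a seed node one vertex at a time. It illustrates a bottom-up strategy in general and is not claimed to be either of the paper's two heuristics; `greedy_grow` and its parameters are illustrative names.

```python
import math


def natural_connectivity(nodes, adj, terms=60):
    """ln(trace(exp(A)) / n) for the subgraph induced by nodes (power-series evaluation)."""
    nodes = list(nodes)
    n = len(nodes)
    A = [[1.0 if nodes[j] in adj[nodes[i]] else 0.0 for j in range(n)] for i in range(n)]
    term = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]  # A^0 / 0!
    trace = float(n)
    for k in range(1, terms):
        # term becomes A^k / k!; its trace counts closed walks of length k, damped by k!
        term = [[sum(term[i][m] * A[m][j] for m in range(n)) / k for j in range(n)]
                for i in range(n)]
        trace += sum(term[i][i] for i in range(n))
    return math.log(trace / n)


def greedy_grow(adj, seed, size):
    """Hypothetical bottom-up search: repeatedly add the most robustness-increasing vertex."""
    chosen = [seed]
    while len(chosen) < size:
        candidates = sorted(v for v in adj if v not in chosen)
        best = max(candidates, key=lambda v: natural_connectivity(chosen + [v], adj))
        chosen.append(best)
    return chosen
```

The series is only practical for small subgraphs; an eigendecomposition would be the usual route at scale. As a sanity check, a triangle has eigenvalues 2, -1, -1, so its natural connectivity is ln((e^2 + 2/e) / 3).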
Scalable Greedy Algorithms for Transfer Learning
In this paper we consider the binary transfer learning problem, focusing on
how to select and combine sources from a large pool to yield good performance
on a target task. Restricting ourselves to a realistic scenario, we do not assume
direct access to the source data; instead, we employ the source hypotheses
trained on them. We propose an efficient algorithm that selects relevant
source hypotheses and feature dimensions simultaneously, building on the
literature on the best subset selection problem. Our algorithm achieves
state-of-the-art results on three computer vision datasets, substantially
outperforming both transfer learning and popular feature selection baselines in
a small-sample setting. We also present a randomized variant that achieves the
same results at a computational cost independent of the number of source
hypotheses and feature dimensions. We also prove theoretically that, under
reasonable assumptions on the source hypotheses, our algorithm can learn
effectively from few examples.
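The abstract does not state the selection criterion, so the following Python sketch substitutes a simpler, named one: matching pursuit, which scores each candidate column v by how much a one-coefficient least-squares fit reduces the squared residual r, namely (r·v)^2 / (v·v). Source-hypothesis outputs on the target examples and raw feature dimensions are pooled as candidate columns, mirroring the idea of selecting both jointly. The randomized variant scores only `sample_size` randomly drawn candidates per step, so the scoring cost does not grow with the pool size (the list that builds the pool is still linear in it here, for simplicity). All names are hypothetical.

```python
import random


def randomized_greedy_select(columns, y, k, sample_size, seed=0):
    """Greedy forward selection by matching pursuit over a random candidate sample.

    columns maps a (hypothetical) name to its values on the target examples:
    raw feature dimensions and source-hypothesis outputs are treated alike.
    """
    rng = random.Random(seed)
    residual = [float(v) for v in y]
    selected = []

    def fit(name):
        # one-coefficient least-squares fit of the current residual on this column
        col = columns[name]
        norm = sum(v * v for v in col)
        if norm == 0:
            return 0.0, 0.0
        dot = sum(r * v for r, v in zip(residual, col))
        return dot / norm, dot * dot / norm  # (coefficient, drop in squared residual)

    for _ in range(k):
        pool = [name for name in sorted(columns) if name not in selected]
        if not pool:
            break
        candidates = rng.sample(pool, min(sample_size, len(pool)))
        best = max(candidates, key=lambda name: fit(name)[1])
        coef, _ = fit(best)
        residual = [r - coef * v for r, v in zip(residual, columns[best])]
        selected.append(best)
    return selected, residual
```

Setting `sample_size` to the pool size recovers plain (deterministic) greedy selection; smaller values trade some accuracy per step for a fixed per-step scoring cost.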