Low rank methods for optimizing clustering
Complex optimization models and problems in machine learning often concentrate most of their information in a low-rank subspace. By carefully exploiting these low-rank structures in clustering problems, we find new optimization approaches that reduce the memory and computational cost.
We discuss two cases where this arises. First, we consider the NEO-K-Means (Non-Exhaustive, Overlapping K-Means) objective as a way to address overlap and outliers in an integrated fashion. Optimizing this discrete objective is NP-hard, and even though there is a convex relaxation of the objective, straightforward convex optimization approaches are too expensive for large datasets. We exploit low-rank structures in the solution matrix of the convex formulation and use a low-rank factorization of the solution matrix directly as a practical alternative. The resulting optimization problem is non-convex, but it has far fewer variables and can be locally optimized using an augmented Lagrangian method. In addition, we consider two fast multiplier methods to accelerate the convergence of the augmented Lagrangian scheme: a proximal method of multipliers and an alternating direction method of multipliers. For the proximal augmented Lagrangian, we show a convergence result for the non-convex case with bound-constrained subproblems. When clustering performance is evaluated on real-world datasets, this technique effectively recovers ground-truth clusters and finds cohesive overlapping communities in real-world networks.
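The low-rank factorization idea above can be illustrated on a toy problem. The sketch below is not the paper's NEO-K-Means formulation: it replaces the full solution matrix X by a rank-r factor Y with X = Y Yᵀ (Burer-Monteiro style), and enforces a single trace constraint with an augmented Lagrangian whose multiplier is updated in an outer loop. All sizes, step sizes, and penalty weights are illustrative.

```python
import numpy as np

# Toy instance: maximize trace(Y^T A Y) subject to ||Y||_F^2 = k,
# optimizing over a rank-r factor Y instead of the full matrix X = Y Y^T.
# The actual NEO-K-Means convex formulation has additional bound constraints.
rng = np.random.default_rng(0)
A = np.diag([3.0, 2.0, 1.0])   # small symmetric test matrix
k, r = 1.0, 2                  # trace budget and factor rank (hypothetical)

Y = 0.1 * rng.standard_normal((3, r))
lam, beta = 0.0, 10.0          # multiplier and penalty weight

for outer in range(30):
    for inner in range(200):   # inner gradient descent on the aug. Lagrangian
        c = np.sum(Y * Y) - k  # constraint violation trace(Y Y^T) - k
        grad = -2.0 * A @ Y + 2.0 * (lam + beta * c) * Y
        Y -= 0.005 * grad
    lam += beta * (np.sum(Y * Y) - k)   # multiplier update

obj = float(np.trace(Y.T @ A @ Y))
print(obj, np.sum(Y * Y))
```

At a stationary point A Y = (lam + beta·c) Y with c ≈ 0, so the multiplier converges toward the leading eigenvalue and the objective toward k times that eigenvalue.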
The second case is where the low-rank structure appears in the objective function. Inspired by low rank matrix completion techniques, we propose a low rank symmetric matrix completion scheme to approximate a kernel matrix. For the kernel k-means problem, we show empirically that the clustering performance with the approximation is comparable to that of full kernel k-means.
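A minimal sketch of symmetric low-rank completion, assuming a synthetically low-rank matrix in place of a real kernel: fit a symmetric factorization G Gᵀ to a random subset of observed entries by gradient descent on the masked squared error. Sizes, rank, and learning rate are illustrative, not the scheme's actual parameters.

```python
import numpy as np

# Approximate a symmetric low-rank matrix K from a subset of its entries
# by fitting G G^T to the observed entries with gradient descent.
rng = np.random.default_rng(1)
n, r = 30, 3
B = rng.standard_normal((n, r))
K = B @ B.T                        # ground-truth rank-r symmetric matrix

mask = rng.random((n, n)) < 0.8
mask = mask & mask.T               # symmetric observation pattern (~64%)

G = 0.1 * rng.standard_normal((n, r))
for it in range(3000):
    E = mask * (G @ G.T - K)       # residual on observed entries only
    G -= 0.002 * (4.0 * E @ G)     # gradient of ||mask * (G G^T - K)||_F^2

err_obs = np.linalg.norm(mask * (G @ G.T - K)) / np.linalg.norm(mask * K)
print(err_obs)
```

Because the factor G has only n·r entries, both the memory footprint and the per-step cost are far below those of forming the full n×n matrix.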
Large-Scale Sensor Network Localization via Rigid Subnetwork Registration
In this paper, we describe an algorithm for sensor network localization (SNL)
that proceeds by dividing the whole network into smaller subnetworks,
localizing them in parallel using a fast and accurate algorithm, and finally
registering the localized subnetworks in a global coordinate system. We
demonstrate that this divide-and-conquer algorithm can be used to scale
existing high-precision SNL algorithms, which could otherwise only be applied
to small-to-medium sized networks, up to large-scale networks. The main
contribution of this paper concerns the final registration phase. In
particular, we consider a least-squares formulation of the registration problem
(both with and without anchor constraints) and demonstrate how this otherwise
non-convex problem can be relaxed into a tractable convex program. We provide
some preliminary simulation results for large-scale SNL demonstrating that the
proposed registration algorithm (together with an accurate localization scheme)
offers a good tradeoff between run time and accuracy.
Comment: 5 pages, 8 figures, 1 table. To appear in Proc. IEEE International
Conference on Acoustics, Speech, and Signal Processing, April 19-24, 201
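The pairwise building block of rigid registration, aligning one subnetwork's local coordinates to global coordinates through shared (overlapping) nodes, has a closed-form orthogonal Procrustes / Kabsch solution, sketched below. The paper's registration phase goes further, jointly registering all subnetworks via a convex relaxation; this sketch only shows the two-frame case with synthetic data.

```python
import numpy as np

# Recover the rigid motion (rotation R, translation t) mapping local
# coordinates X of the shared nodes to their global coordinates Y.
rng = np.random.default_rng(2)
X = rng.standard_normal((6, 2))          # local coordinates of shared nodes

theta = 0.7                              # hypothetical ground-truth motion
R_true = np.array([[np.cos(theta), -np.sin(theta)],
                   [np.sin(theta),  np.cos(theta)]])
t_true = np.array([1.5, -2.0])
Y = X @ R_true.T + t_true                # the same nodes in global coordinates

Xc, Yc = X - X.mean(0), Y - Y.mean(0)    # center both point sets
U, _, Vt = np.linalg.svd(Xc.T @ Yc)      # SVD of the cross-covariance
R = Vt.T @ U.T
if np.linalg.det(R) < 0:                 # guard against a reflection
    Vt[-1] *= -1
    R = Vt.T @ U.T
t = Y.mean(0) - X.mean(0) @ R.T

print(np.allclose(X @ R.T + t, Y))
```

With noisy distance-based localizations the same formula gives the least-squares rigid alignment over the shared nodes rather than an exact match.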
Overlapping community detection in massive social networks
Massive social networks have become increasingly popular in recent years. Community detection is one of the most important techniques for the analysis of such complex networks. A community is a set of cohesive vertices that has more connections inside the set than outside. In many social and information networks, these communities naturally overlap. For instance, in a social network, each vertex in a graph corresponds to an individual who usually participates in multiple communities. In this thesis, we propose scalable overlapping community detection algorithms that effectively identify high quality overlapping communities in various real-world networks.
We first develop an efficient overlapping community detection algorithm using a seed set expansion approach. The key idea of this algorithm is to find good seeds and then greedily expand these seeds using a personalized PageRank clustering scheme. Experimental results show that our algorithm significantly outperforms other state-of-the-art overlapping community detection methods in terms of run time, cohesiveness of communities, and ground-truth accuracy.
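The seed-expansion pipeline can be sketched in miniature: compute an approximate personalized PageRank vector with the push procedure of Andersen, Chung, and Lang, then take the prefix of the degree-normalized sweep order with the lowest conductance. The graph, seed, and tolerances below are illustrative, not the paper's benchmarks.

```python
from collections import defaultdict

edges = [(0, 1), (0, 2), (0, 3), (1, 2), (1, 3), (2, 3),   # community A
         (4, 5), (4, 6), (4, 7), (5, 6), (5, 7), (6, 7),   # community B
         (3, 4)]                                            # bridge edge
adj = defaultdict(set)
for u, v in edges:
    adj[u].add(v)
    adj[v].add(u)
deg = {u: len(adj[u]) for u in adj}

def ppr_push(seed, alpha=0.15, eps=1e-5):
    """Approximate personalized PageRank via residual pushing."""
    p, r = defaultdict(float), defaultdict(float)
    r[seed] = 1.0
    queue = [seed]
    while queue:
        u = queue.pop()
        if r[u] <= eps * deg[u]:
            continue
        ru = r[u]
        p[u] += alpha * ru                   # settle mass at u
        r[u] = (1 - alpha) * ru / 2          # keep half the rest
        share = (1 - alpha) * ru / (2 * deg[u])
        for v in adj[u]:                     # spread the other half
            r[v] += share
            if r[v] > eps * deg[v]:
                queue.append(v)
        if r[u] > eps * deg[u]:
            queue.append(u)
    return p

def sweep_cut(p):
    """Return the best-conductance prefix of the degree-normalized order."""
    order = sorted(p, key=lambda u: p[u] / deg[u], reverse=True)
    total_vol = sum(deg.values())
    best, best_phi, S = None, float("inf"), set()
    for u in order:
        S.add(u)
        cut = sum(1 for a in S for b in adj[a] if b not in S)
        vol = sum(deg[a] for a in S)
        denom = min(vol, total_vol - vol)
        if denom == 0:
            break
        phi = cut / denom
        if phi < best_phi:
            best, best_phi = set(S), phi
    return best

community = sweep_cut(ppr_push(0))
print(community)
```

Seeding at vertex 0 expands across its clique but stops at the single bridge edge, recovering the seed's community.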
To develop more principled methods, we formulate the overlapping community detection problem as a non-exhaustive, overlapping graph clustering problem where clusters are allowed to overlap with each other, and some nodes are allowed to be outside of any cluster. To tackle this non-exhaustive, overlapping clustering problem, we propose a simple and intuitive objective function that captures the issues of overlap and non-exhaustiveness in a unified manner. To optimize the objective, we develop not only fast iterative algorithms but also more sophisticated algorithms using a low-rank semidefinite programming technique. Our experimental results show that the new objective and the algorithms are effective in finding ground-truth clusterings that have varied overlap and non-exhaustiveness.
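The non-exhaustive, overlapping assignment step can be sketched as follows, in the spirit of the iterative algorithm: given cluster means, make (1+α)n assignments in total while leaving up to βn points unassigned. The helper name, toy data, and parameters are illustrative, not the thesis's exact procedure.

```python
import numpy as np

def neo_assign(X, means, alpha, beta):
    """One non-exhaustive, overlapping assignment pass (illustrative)."""
    n, k = len(X), len(means)
    D = ((X[:, None, :] - means[None, :, :]) ** 2).sum(-1)  # squared distances
    total = int(round((1 + alpha) * n))     # total number of assignments
    keep = n - int(round(beta * n))         # points guaranteed an assignment
    U = np.zeros((n, k), dtype=int)

    # 1) assign the `keep` closest points to their nearest cluster;
    #    the farthest beta*n points may remain unassigned (outliers)
    closest = D.argmin(1)
    for i in np.argsort(D.min(1))[:keep]:
        U[i, closest[i]] = 1

    # 2) spend the remaining assignments on the cheapest unused
    #    (point, cluster) pairs, which creates overlap
    pairs = [(D[i, c], i, c) for i in range(n) for c in range(k) if U[i, c] == 0]
    for _, i, c in sorted(pairs)[: total - keep]:
        U[i, c] = 1
    return U

X = np.array([[0.0, 0.0], [0.1, 0.0], [0.0, 0.1], [0.1, 0.1],     # blob A
              [5.0, 0.0], [5.1, 0.0], [5.0, 0.1], [5.1, 0.1],     # blob B
              [2.5, 0.05],                                         # overlap
              [50.0, 50.0]])                                       # outlier
means = np.array([[0.05, 0.05], [5.05, 0.05]])
U = neo_assign(X, means, alpha=0.1, beta=0.1)
print(U.sum(1))
```

On this toy data the distant point ends up in no cluster and the midpoint ends up in both, which is exactly the non-exhaustive, overlapping behavior the objective is designed to capture.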
We extend our non-exhaustive, overlapping clustering techniques to co-clustering, where the goal is to simultaneously identify a clustering of the rows as well as the columns of a data matrix. As an example application, consider recommender systems where users have ratings on items. This can be represented by a bipartite graph where users and items are denoted by two different types of nodes, and the ratings are denoted by weighted edges between the users and the items. In this case, co-clustering is a simultaneous clustering of users and items. We propose a new co-clustering objective function and an efficient co-clustering algorithm that is able to identify overlapping clusters as well as outliers on both types of nodes in the bipartite graph. We show that our co-clustering algorithm effectively captures the underlying co-clustering structure of the data, which boosts the performance of standard one-dimensional clustering.
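As background for the bipartite setting, a standard co-clustering baseline (not the thesis's objective) is spectral co-clustering in the style of Dhillon's bipartite spectral graph partitioning: normalize the user-item matrix by row and column degrees, take its second singular vectors, and split rows and columns jointly by sign. The toy matrix is illustrative; a weak cross edge keeps the spectrum non-degenerate.

```python
import numpy as np

A = np.array([[1.0, 1.0, 0.0, 0.0],
              [1.0, 1.0, 0.2, 0.0],   # weak cross edge breaks degeneracy
              [0.0, 0.0, 1.0, 1.0],
              [0.0, 0.0, 1.0, 1.0],
              [0.0, 0.0, 1.0, 1.0]])  # 5 users x 4 items

d1 = A.sum(1)                          # row (user) degrees
d2 = A.sum(0)                          # column (item) degrees
An = A / np.sqrt(d1)[:, None] / np.sqrt(d2)[None, :]
U, S, Vt = np.linalg.svd(An)

# For k = 2 co-clusters, the second left/right singular vectors embed rows
# and columns in a shared 1-D space; the sign splits the co-clusters.
z_rows = U[:, 1] / np.sqrt(d1)
z_cols = Vt[1] / np.sqrt(d2)
row_labels = (z_rows > 0).astype(int)
col_labels = (z_cols > 0).astype(int)
print(row_labels, col_labels)
```

Users and their preferred items land in the same co-cluster, which is the "simultaneous clustering of users and items" described above; the thesis's method additionally allows overlap and outliers, which this baseline does not.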
Finally, we study the design of parallel data-driven algorithms, which enables us to further increase the scalability of our overlapping community detection algorithms. Using PageRank as a model problem, we look at three algorithm design axes: work activation, data access pattern, and scheduling, and we investigate the impact of different algorithm design choices. Using these design axes, we design and test a variety of PageRank implementations, finding that data-driven, push-based algorithms achieve significantly better scalability than standard PageRank implementations. The design choices affect both single-threaded performance and parallel scalability. The lessons learned from this study not only guide efficient implementations of many graph mining algorithms but also provide a framework for designing new scalable algorithms, especially for large-scale community detection.
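The data-driven, push-based idea can be sketched for sequential PageRank: rather than sweeping every vertex each iteration, only vertices holding enough residual are activated, and each activation settles its residual locally and pushes discounted mass to out-neighbors. The graph and tolerances below are illustrative; the result is checked against plain power iteration.

```python
# Push-based (data-driven) PageRank versus standard power iteration.
out = {0: [1, 2], 1: [2], 2: [0]}   # small directed graph, no dangling nodes
n, d = len(out), 0.85

def pagerank_push(tol=1e-10):
    p = {u: 0.0 for u in out}
    r = {u: (1 - d) / n for u in out}   # initial residual = teleport mass
    active = list(out)
    while active:
        u = active.pop()
        if r[u] <= tol:
            continue
        ru, r[u] = r[u], 0.0
        p[u] += ru                       # settle the residual at u
        for v in out[u]:                 # push discounted mass downstream
            r[v] += d * ru / len(out[u])
            if r[v] > tol:
                active.append(v)
    return p

def pagerank_power(iters=200):
    p = {u: 1.0 / n for u in out}
    for _ in range(iters):
        nxt = {u: (1 - d) / n for u in out}
        for u in out:
            for v in out[u]:
                nxt[v] += d * p[u] / len(out[u])
        p = nxt
    return p

pp, pw = pagerank_push(), pagerank_power()
print(pp, pw)
```

Work activation here is the residual threshold, and the `active` worklist is the scheduling hook; the parallel variants studied in the thesis vary exactly these axes.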
Linear Precoding in Cooperative MIMO Cellular Networks with Limited Coordination Clusters
In a cooperative multiple-antenna downlink cellular network, maximization of
a concave function of user rates is considered. A new linear precoding
technique called soft interference nulling (SIN) is proposed, which performs at
least as well as zero-forcing (ZF) beamforming. All base stations share channel
state information, but each user's message is only routed to those that
participate in the user's coordination cluster. SIN precoding is particularly
useful when clusters of limited sizes overlap in the network, in which case
traditional techniques such as dirty paper coding or ZF do not directly apply.
The SIN precoder is computed by solving a sequence of convex optimization
problems. SIN under partial network coordination can outperform ZF under full
network coordination at moderate SNRs. Under overlapping coordination clusters,
SIN precoding achieves considerably higher throughput compared to myopic ZF,
especially when the clusters are large.
Comment: 13 pages, 5 figures
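The zero-forcing baseline that SIN is compared against can be sketched directly: with full coordination, the channel-inverting precoder removes all inter-user interference. SIN itself is computed by solving a sequence of convex programs and is not reproduced here; the dimensions below are illustrative.

```python
import numpy as np

# Zero-forcing (ZF) precoding for a toy multi-antenna downlink.
rng = np.random.default_rng(3)
K, M = 3, 4          # 3 single-antenna users, 4 total transmit antennas
H = rng.standard_normal((K, M)) + 1j * rng.standard_normal((K, M))

# ZF precoder: right pseudo-inverse of the channel, so that H @ W = I
W = H.conj().T @ np.linalg.inv(H @ H.conj().T)
W /= np.linalg.norm(W)            # normalize total transmit power to 1

Heff = H @ W                      # effective channel seen by the users
print(np.round(np.abs(Heff), 6))
```

The effective channel is diagonal, i.e. each user sees only its own stream; ZF needs the full channel matrix, which is why it does not directly apply under overlapping, limited-size coordination clusters.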