Search CORE

153,199 research outputs found

Distributed Clustering and Learning Over Networks

Author: Ali H. Sayed
Student Member
Xiaochuan Zhao
Publication venue
Publication date: 22/09/2014
Field of study

Distributed processing over networks relies on in-network processing and cooperation among neighboring agents. Cooperation is beneficial when agents share a common objective. However, in many applications agents may belong to different clusters that pursue different objectives. Then, indiscriminate cooperation will lead to undesired results. In this work, we propose an adaptive clustering and learning scheme that allows agents to learn which neighbors they should cooperate with and which other neighbors they should ignore. In doing so, the resulting algorithm enables the agents to identify their clusters and to attain improved learning and estimation accuracy over networks. We carry out a detailed mean-square analysis and assess the error probabilities of Types I and II, i.e., false alarm and mis-detection, for the clustering mechanism. Among other results, we establish that these probabilities decay exponentially with the step-sizes so that the probability of correct clustering can be made arbitrarily close to one.Comment: 47 pages, 6 figure

arXiv.org e-Print Archive

CiteSeerX

Recommended from our members

Distributed graph clustering and sparsification

Author: Sun H
Zanetti L
Publication venue: ACM Transactions on Parallel Computing
Publication date: 30/09/2019
Field of study

Graph clustering is a fundamental computational problem with a number of applications in algorithm design, machine learning, data mining, and analysis of social networks. Over the past decades, researchers have proposed a number of algorithmic design methods for graph clustering. Most of these methods, however, are based on complicated spectral techniques or convex optimisation and cannot be directly applied for clustering many networks that occur in practice, whose information is often collected on different sites. Designing a simple and distributed clustering algorithm is of great interest and has comprehensive applications for processing big datasets. In this article, we present a simple and distributed algorithm for graph clustering: For a wide class of graphs that are characterised by a strong cluster-structure, our algorithm finishes in a poly-logarithmic number of rounds and recovers a partition of the graph close to optimal. One of the main procedures behind our algorithm is a sampling scheme that, given a dense graph as input, produces a sparse subgraph that provably preserves the cluster-structure of the input. Compared with previous sparsification algorithms that require Laplacian solvers or involve combinatorial constructions, this procedure is easy to implement in a distributed setting and runs fast in practice.</jats:p

Apollo (Cambridge)

Redundancy-Free Self-Supervised Relational Learning for Graph Clustering

Author: Ju Wei
Liu Luchen
Luo Xiao
Qin Yifang
Yi Si-Yu
Zhang Ming
Zhou Yong-Dao
Publication venue
Publication date: 09/09/2023
Field of study

Graph clustering, which learns the node representations for effective cluster assignments, is a fundamental yet challenging task in data analysis and has received considerable attention accompanied by graph neural networks in recent years. However, most existing methods overlook the inherent relational information among the non-independent and non-identically distributed nodes in a graph. Due to the lack of exploration of relational attributes, the semantic information of the graph-structured data fails to be fully exploited which leads to poor clustering performance. In this paper, we propose a novel self-supervised deep graph clustering method named Relational Redundancy-Free Graph Clustering (R

^2

FGC) to tackle the problem. It extracts the attribute- and structure-level relational information from both global and local views based on an autoencoder and a graph autoencoder. To obtain effective representations of the semantic information, we preserve the consistent relation among augmented nodes, whereas the redundant relation is further reduced for learning discriminative embeddings. In addition, a simple yet valid strategy is utilized to alleviate the over-smoothing issue. Extensive experiments are performed on widely used benchmark datasets to validate the superiority of our R

^2

FGC over state-of-the-art baselines. Our codes are available at https://github.com/yisiyu95/R2FGC.Comment: Accepted by IEEE Transactions on Neural Networks and Learning Systems (TNNLS 2024

arXiv.org e-Print Archive

FLCC: Efficient Distributed Federated Learning on IoMT over CSMA/CA

Author: McLernon Des
Qazzaz Mohammed M. H.
Salama Abdelaziz
Zaidi Syed Ali
Publication venue
Publication date: 29/03/2023
Field of study

Federated Learning (FL) has emerged as a promising approach for privacy preservation, allowing sharing of the model parameters between users and the cloud server rather than the raw local data. FL approaches have been adopted as a cornerstone of distributed machine learning (ML) to solve several complex use cases. FL presents an interesting interplay between communication and ML performance when implemented over distributed wireless nodes. Both the dynamics of networking and learning play an important role. In this article, we investigate the performance of FL on an application that might be used to improve a remote healthcare system over ad hoc networks which employ CSMA/CA to schedule its transmissions. Our FL over CSMA/CA (FLCC) model is designed to eliminate untrusted devices and harness frequency reuse and spatial clustering techniques to improve the throughput required for coordinating a distributed implementation of FL in the wireless network. In our proposed model, frequency allocation is performed on the basis of spatial clustering performed using virtual cells. Each cell assigns a FL server and dedicated carrier frequencies to exchange the updated model's parameters within the cell. We present two metrics to evaluate the network performance: 1) probability of successful transmission while minimizing the interference, and 2) performance of distributed FL model in terms of accuracy and loss while considering the networking dynamics. We benchmark the proposed approach using a well-known MNIST dataset for performance evaluation. We demonstrate that the proposed approach outperforms the baseline FL algorithms in terms of explicitly defining the chosen users' criteria and achieving high accuracy in a robust network

arXiv.org e-Print Archive

Distributed and Federated Learning Optimization with Federated Clustering of IID-users

Author: Braun Torsten
Pacheco Lucas
Samikwa Eric
Publication venue
Publication date: 23/04/2021
Field of study

Federated Learning (FL) is one of the leading learning paradigms for enabling a more significant presence of intelligent applications in networked and Internet of Things (IoT) systems. It consists of individual user devices performing machine learning (ML) models training locally, so that only trained models due to privacy concerns, but not raw data, is transferred through the network for aggregation at the edge or cloud data centers [Li et al. 2019]. Due to the pervasive presence of connected devices such as smart phones and IoT devices in peoples lives, there is a growing concern about how we can preserve and secure users’ information. FL reduces the risk of exposing user information to attackers during transmission over networks or information leakages at the central data centers. Another advantage of FL is scalability and maintainability of intelligent applications in networked and IoT systems. Considering highly distributed environments in which such systems are deployed, collecting and transmitting raw user data for training of ML models at central data centers is a challenging task as it imposes huge workload on the networks and consumes high bandwidth. Training of ML models is distributed over locations and transmitting the trained models for aggregation alleviates these challenges. Among others, distributed and federated learning have applications in smart healthcare systems, where very sensitive user data is involved, and industrial IoT applications, where the amount of data for training may be too large and cumbersome to transport to central data centers. However, FL has the significant shortcoming of requiring user data to be Independent Identically Distributed (IID) (i.e., users which have similar data statistical distributions and are not mutually dependent) and make reliable predictions for a given group of users aggregated into a single model. IID users have similar statistical features, and thus can be aggregated into the same ML models. Since raw data is not available at the model aggregator, it is necessary to find IID users based solely on their trained machine learning models. We present a Neural Network-based Federated Clustering mechanism capable of clustering IID with no access to their raw data called Neural-network SIMilarity estimator, NSIM. Such mechanism performs significantly better than competing techniques for neural-network clustering [Pacheco et al. 2021]. We also present an alternative to the FedAvg aggregation algorithm used in traditional FL, which significantly increases the aggregated models’ reliability in terms of Mean Square Error by creating several training models over IID users in a real-world mobility prediction dataset. We observe improvements of up to 97.52% in terms of Pearson correlation between the similarity estimation by NSIM and ground truth based on the LCSS (Longest Common Sub-Sequence) similarity metric, in comparison with other state-of-the-art approaches. Federated Clustering of IID data in different geographical locations can improve performance of early warning applications such as flood prediction [Samikwa et al. 2020], where the data for some locations may have more statistical similarities. We further present a technique for accelerating ML inference in resource-constrained devices through distributed computation of ML models over IoT networks, while preserving privacy. This has the potential to improve the performance of time sensitive ML applications

Bern Open Repository and Information System (BORIS)

Sparse Allreduce: Efficient Scalable Communication for Power-Law Data

Author: Canny John
Zhao Huasha
Publication venue
Publication date: 10/12/2013
Field of study

Many large datasets exhibit power-law statistics: The web graph, social networks, text data, click through data etc. Their adjacency graphs are termed natural graphs, and are known to be difficult to partition. As a consequence most distributed algorithms on these graphs are communication intensive. Many algorithms on natural graphs involve an Allreduce: a sum or average of partitioned data which is then shared back to the cluster nodes. Examples include PageRank, spectral partitioning, and many machine learning algorithms including regression, factor (topic) models, and clustering. In this paper we describe an efficient and scalable Allreduce primitive for power-law data. We point out scaling problems with existing butterfly and round-robin networks for Sparse Allreduce, and show that a hybrid approach improves on both. Furthermore, we show that Sparse Allreduce stages should be nested instead of cascaded (as in the dense case). And that the optimum throughput Allreduce network should be a butterfly of heterogeneous degree where degree decreases with depth into the network. Finally, a simple replication scheme is introduced to deal with node failures. We present experiments showing significant improvements over existing systems such as PowerGraph and Hadoop

arXiv.org e-Print Archive

CiteSeerX

Deep Semantic Clustering by Partition Confidence Maximisation

Author: Gong S
Huang J
IEEE Conference on Computer Vision and Pattern Recognition
Zhu X
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/01/2020
Field of study

By simultaneously learning visual features and data grouping, deep clustering has shown impressive ability to deal with unsupervised learning for structure analysis of high-dimensional visual data. Existing deep clustering methods typically rely on local learning constraints based on inter-sample relations and/or self-estimated pseudo labels. This is susceptible to the inevitable errors distributed in the neighbourhoods and suffers from error-propagation during training. In this work, we propose to solve this problem by learning the most confident clustering solution from all the possible separations, based on the observation that assigning samples from the same semantic categories into different clusters will reduce both the intra-cluster compactness and inter-cluster diversity, i.e. lower partition confidence. Specifically, we introduce a novel deep clustering method named PartItion Confidence mAximisation (PICA). It is established on the idea of learning the most semantically plausible data separation, in which all clusters can be mapped to the ground-truth classes one-to-one, by maximising the 'global' partition confidence of clustering solution. This is realised by introducing a differentiable partition uncertainty index and its stochastic approximation as well as a principled objective loss function that minimises such index, all of which together enables a direct adoption of the conventional deep networks and mini-batch based model training. Extensive experiments on six widely-adopted clustering benchmarks demonstrate our model's performance superiority over a wide range of the state-of-the-art approaches. The code is available online

Crossref

Queen Mary Research Online