11 research outputs found

    Gravitational Clustering: A Simple, Robust and Adaptive Approach for Distributed Networks

    Distributed signal processing for wireless sensor networks enables different devices to cooperate in solving different signal processing tasks. A crucial first step is to answer the question: who observes what? Several distributed algorithms have recently been proposed that frame the signal/object labelling problem as a cluster analysis of source-specific features; however, they assume the number of clusters to be known. We propose a new method called Gravitational Clustering (GC) to adaptively estimate the time-varying number of clusters from a set of feature vectors. The key idea is to exploit the physical principle of gravitational force between mass units: streaming-in feature vectors are treated as mass units at fixed positions in the feature space, around which mobile mass units are injected at each time instant. Cluster enumeration exploits the fact that the strongest attraction on the mobile mass units is exerted by regions with a high density of feature vectors, i.e., gravitational clusters. By sharing estimates among neighboring nodes via a diffusion-adaptation scheme, cooperative and distributed cluster enumeration is achieved. Numerical experiments concerning robustness against outliers, convergence and computational complexity are conducted. An application to a distributed cooperative multi-view camera network illustrates the applicability to real-world problems. Comment: 12 pages, 9 figures.
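
    As a rough illustration of the gravitational idea (not the authors' adaptive, distributed GC algorithm), the sketch below lets mobile mass units drift toward dense regions of a static batch of feature vectors and counts the attraction points they settle on; the softening term, step schedule, merge radius, and function names are illustrative choices.

```python
import numpy as np

def pull_direction(mobile, fixed, softening=0.3):
    """Unit direction of the net (softened) gravitational pull that the fixed
    feature vectors exert on each mobile mass unit."""
    diff = fixed[None, :, :] - mobile[:, None, :]                 # (M, N, d)
    d2 = (diff ** 2).sum(axis=-1, keepdims=True) + softening ** 2
    force = (diff / d2 ** 1.5).sum(axis=1)                        # (M, d)
    norm = np.linalg.norm(force, axis=-1, keepdims=True) + 1e-12
    return force / norm

def estimate_num_clusters(features, n_mobile=50, n_steps=300, merge_radius=0.5, seed=0):
    """Inject mobile units over the feature range, let them drift into dense
    regions, and count the distinct attraction points they settle on."""
    rng = np.random.default_rng(seed)
    lo, hi = features.min(axis=0), features.max(axis=0)
    mobile = rng.uniform(lo, hi, size=(n_mobile, features.shape[1]))
    for t in range(n_steps):
        step = 0.1 * (1.0 - t / n_steps) + 0.01                   # slowly decaying step size
        mobile += step * pull_direction(mobile, features)
    centres = []                                                   # greedily merge settled units
    for m in mobile:
        if not any(np.linalg.norm(m - c) < merge_radius for c in centres):
            centres.append(m)
    return len(centres)

# Toy check: two well-separated blobs should typically give an estimate of 2.
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0.0, 0.2, (100, 2)), rng.normal(3.0, 0.2, (100, 2))])
print("estimated number of clusters:", estimate_num_clusters(X))
```

    In the paper, the enumeration additionally runs on streaming data and the estimates are fused across neighboring nodes via diffusion adaptation.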

    One-class classifiers based on entropic spanning graphs

    One-class classifiers offer valuable tools to assess the presence of outliers in data. In this paper, we propose a design methodology for one-class classifiers based on entropic spanning graphs. Our approach can also handle non-numeric data by means of an embedding procedure. The spanning graph is learned on the embedded input data, and the resulting partition of vertices defines the classifier. The final partition is derived by exploiting a criterion based on mutual information minimization. Here, we compute the mutual information by using a convenient formulation provided in terms of the α-Jensen difference. Once training is completed, a graph-based fuzzy model is constructed to associate a confidence level with the classifier decision. The fuzzification process is based only on topological information about the vertices of the entropic spanning graph. As such, the proposed one-class classifier is also suitable for data characterized by complex geometric structures. We provide experiments on well-known benchmarks containing both feature vectors and labeled graphs. In addition, we apply the method to the protein solubility recognition problem by considering several representations of the input samples. Experimental results demonstrate the effectiveness and versatility of the proposed method with respect to other state-of-the-art approaches. Comment: Extended and revised version of the paper "One-Class Classification Through Mutual Information Minimization" presented at the 2016 IEEE IJCNN, Vancouver, Canada.
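
    A heavily simplified sketch of how an entropic spanning graph (here, a Euclidean minimum spanning tree) can drive a one-class decision: a test sample is accepted when attaching it to the training MST would need an edge no longer than a high quantile of the MST's own edge lengths. This stands in for, but is not, the paper's mutual-information-minimization partition and fuzzy confidence model; the quantile threshold and function names are illustrative.

```python
import numpy as np
from scipy.spatial import distance_matrix
from scipy.sparse.csgraph import minimum_spanning_tree

def fit_mst_threshold(train, quantile=0.95):
    """Build a Euclidean minimum spanning tree on the training data and keep a
    high quantile of its edge lengths as the acceptance threshold."""
    mst = minimum_spanning_tree(distance_matrix(train, train))
    edge_lengths = mst.data                      # non-zero entries are the MST edge weights
    return np.quantile(edge_lengths, quantile)

def predict_one_class(train, test, threshold):
    """Accept a test sample if attaching it to the training set needs an edge
    no longer than the threshold; otherwise flag it as an outlier."""
    d = distance_matrix(test, train).min(axis=1)
    return d <= threshold                        # True = in-class, False = outlier

# Toy usage: a point near the training cloud is accepted, a far point is rejected.
rng = np.random.default_rng(0)
train = rng.normal(0.0, 1.0, (200, 2))
test = np.array([[0.1, -0.2], [8.0, 8.0]])
thr = fit_mst_threshold(train)
print(predict_one_class(train, test, thr))       # expected: [ True False]
```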

    Clustering the Distribution of Alumni of the Information Systems Study Program at Politeknik Negeri Nusa Utara

    Today's database technology makes it possible to store and accumulate very large amounts of data, but this growth is also where problems begin, as is the case in the Information Systems Study Program of Politeknik Negeri Nusa Utara. It is therefore important to know the distribution of graduates through an alumni tracer study, so that the available data can be used to group alumni by the similarity of their attributes using the k-means clustering method. This study aims to make the analysis of the alumni distribution easier. The tracer data were obtained from the alumni records of the Information Systems Study Program; hidden information in these records can be uncovered by processing them, which is useful to the study program. The study analyses the alumni tracer of the Information Systems Study Program for the 2006 to 2015 cohorts using the k-means clustering algorithm implemented in Microsoft Excel. The attributes used are domicile, year of entry, year of graduation, and waiting time before employment. Two clusters were formed. The results serve as one basis for the Information Systems Study Program of Politeknik Negeri Nusa Utara to decide on promotion strategies based on the resulting clusters.
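
    The study runs k-means in Microsoft Excel; a minimal equivalent in Python is sketched below on the same four attributes, where the records, the integer coding of domicile, and the scaling step are hypothetical placeholders rather than the study's actual data or preprocessing.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

# Hypothetical tracer records: [domicile code, entry year, graduation year, job waiting time (months)]
records = np.array([
    [1, 2006, 2009, 3],
    [1, 2007, 2010, 2],
    [2, 2012, 2015, 10],
    [2, 2013, 2016, 12],
    [1, 2014, 2017, 1],
    [2, 2015, 2018, 14],
], dtype=float)

# Scale the attributes so that years, months, and the domicile code contribute comparably,
# then form the two clusters used in the study.
X = StandardScaler().fit_transform(records)
labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)
print(labels)   # cluster assignment of each alumnus record
```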

    Protection of big data privacy

    In recent years, big data have become a hot research topic. The increasing amount of big data also increases the chance of breaching the privacy of individuals. Since big data require high computational power and large storage, distributed systems are used. As multiple parties are involved in these systems, the risk of privacy violation increases. A number of privacy-preserving mechanisms have been developed to protect privacy at the different stages of the big data life cycle (e.g., data generation, data storage, and data processing). The goal of this paper is to provide a comprehensive overview of privacy preservation mechanisms in big data and to present the challenges faced by existing mechanisms. In particular, we illustrate the infrastructure of big data and the state-of-the-art privacy-preserving mechanisms in each stage of the big data life cycle. Furthermore, we discuss the challenges and future research directions related to privacy preservation in big data.
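
    The survey does not commit to a particular mechanism, but as one widely cited example of protection at the data-processing stage, the sketch below adds Laplace noise to a counting query, the standard ε-differential-privacy construction for a sensitivity-1 query; the function name, ε value, and data are illustrative.

```python
import numpy as np

def dp_count(values, predicate, epsilon=0.5, rng=None):
    """Release a counting-query answer with Laplace noise calibrated to
    sensitivity 1, giving epsilon-differential privacy for the count."""
    rng = rng or np.random.default_rng()
    true_count = sum(1 for v in values if predicate(v))
    return true_count + rng.laplace(loc=0.0, scale=1.0 / epsilon)

# Toy usage: how many records exceed a threshold, released with noise.
ages = [23, 35, 41, 29, 52, 61, 38]
print(dp_count(ages, lambda a: a >= 40, epsilon=0.5))
```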

    GDCluster: a general decentralized clustering algorithm

    In many popular applications, such as peer-to-peer systems, large amounts of data are distributed among multiple sources. Analyzing these data and identifying clusters is challenging due to processing, storage, and transmission costs. In this paper, we propose GDCluster, a general fully decentralized clustering method that is capable of clustering dynamic and distributed data sets. Nodes continuously cooperate through decentralized gossip-based communication to maintain summarized views of the data set. We customize GDCluster for executing partition-based and density-based clustering methods on the summarized views, and also offer enhancements to the basic algorithm. Coping with dynamic data is made possible by gradually adapting the clustering model. Our experimental evaluations show that GDCluster can discover the clusters efficiently with scalable transmission cost, and demonstrate its superiority over the popular LSP2P method.
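
    A toy sketch of the general pattern described above, namely gossiping bounded data summaries between nodes and clustering locally on the merged summary; it is not GDCluster's actual summary maintenance, weighting, or enhancements, and the class name, summary capacity, and merge rule are invented for illustration.

```python
import random
import numpy as np
from sklearn.cluster import KMeans

class GossipNode:
    """Node that keeps a bounded summary of the data it has seen and merges in
    summaries received from random peers (a much-simplified summarized view)."""
    def __init__(self, local_data, capacity=50, seed=0):
        self.rng = random.Random(seed)
        self.capacity = capacity
        self.summary = list(map(tuple, local_data))[:capacity]

    def merge(self, other_summary):
        pool = list(dict.fromkeys(self.summary + other_summary))   # de-duplicate points
        self.rng.shuffle(pool)
        self.summary = pool[:self.capacity]                        # keep a bounded sample

    def local_clusters(self, k=2):
        return KMeans(n_clusters=k, n_init=10, random_state=0).fit(np.array(self.summary)).cluster_centers_

# Toy usage: three nodes, each holding one slice of the data, gossip for a few rounds.
random.seed(0)
rng = np.random.default_rng(0)
data = np.vstack([rng.normal(0.0, 0.3, (60, 2)), rng.normal(4.0, 0.3, (60, 2))])
nodes = [GossipNode(data[40 * i:40 * (i + 1)], seed=i) for i in range(3)]
for _ in range(10):                                  # each round, every node gossips with a random peer
    for node in nodes:
        peer = random.choice([n for n in nodes if n is not node])
        node.merge(peer.summary)
print(nodes[0].local_clusters(k=2))                  # centres recovered from node 0's summary alone
```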

    Energy-Efficient and Fresh Data Collection in IoT Networks by Machine Learning

    The Internet-of-Things (IoT) is rapidly changing our lives in almost every field, such as smart agriculture, environmental monitoring, and intelligent manufacturing systems. How to improve the efficiency of data collection in IoT networks has therefore attracted increasing attention. Clustering-based algorithms are the most common methods used to improve the efficiency of data collection. They group devices into distinct clusters, where each device belongs to one cluster only. All member devices sense their surrounding environment and transmit the results to the cluster heads (CHs). The CHs then send the received data to a control center via single-hop or multi-hop transmission. Using unmanned aerial vehicles (UAVs) to collect data in IoT networks is another effective method for improving the efficiency of data collection, because UAVs can be flexibly deployed to communicate with ground devices via reliable air-to-ground communication links. Given that energy-efficient data collection and freshness of the collected data are two important factors in IoT networks, this thesis is concerned with designing algorithms that improve the energy efficiency of data collection and guarantee the freshness of the collected data.

    Our first contribution is an improved soft-k-means (IS-k-means) clustering algorithm that balances the energy consumption of nodes in wireless sensor networks (WSNs). The techniques of "clustering by fast search and find of density peaks" (CFSFDP) and kernel density estimation (KDE) are used to improve the selection of the initial cluster centers of the soft-k-means clustering algorithm. We then exploit the flexibility of the soft-k-means and reassign member nodes at the boundaries of clusters, based on their membership probabilities, to balance the number of nodes per cluster. Furthermore, we use multiple CHs to balance the energy consumption within clusters. Extensive simulation results show that, on average, the proposed algorithm can postpone the first node death, the half-of-nodes death, and the last node death when compared to various clustering algorithms from the literature.

    The second contribution tackles the problem of minimizing the total energy consumption of the UAV-IoT network. Specifically, we formulate and solve the optimization problem that jointly designs the UAV's trajectory and selects CHs in the IoT network. The formulated problem is a constrained combinatorial optimization problem, and we develop a novel deep reinforcement learning (DRL) method with a sequential model strategy to solve it. The proposed method can effectively learn the policy, represented by a sequence-to-sequence neural network, for designing the UAV's trajectory in an unsupervised manner. Extensive simulation results show that the proposed DRL method finds UAV trajectories with much lower energy consumption than other baseline algorithms and achieves close-to-optimal performance. In addition, simulation results show that the model trained by our proposed DRL algorithm has an excellent generalization ability, i.e., it can be used for larger problem sizes without the need to retrain the model.

    The third contribution is also concerned with minimizing the total energy consumption of UAV-aided IoT networks. A novel DRL technique, namely the pointer network-A* (Ptr-A*), is proposed, which can efficiently learn the UAV trajectory policy for minimizing the energy consumption. The UAV's start point and the ground network with a set of pre-determined clusters are fed to the Ptr-A*, and the Ptr-A* outputs a group of CHs and the visiting order of the CHs, i.e., the UAV's trajectory. The parameters of the Ptr-A* are trained on problem instances with small-scale clusters by using the actor-critic algorithm in an unsupervised manner. Simulation results show that the models trained on 20-cluster and 40-cluster instances have a good generalization ability for solving the UAV trajectory planning problem with different numbers of clusters, without the need to retrain the models. Furthermore, the results show that our proposed DRL algorithm outperforms two baseline techniques.

    In the last contribution, the concept of age-of-information (AoI) is used to quantify the freshness of the collected data in IoT networks. An optimization problem is formulated to minimize the total AoI of the data collected by the UAV from the ground IoT network. Since the total AoI of the IoT network depends on the flight time of the UAV and the data collection time at hovering points, we jointly optimize the selection of the hovering points and the visiting order of these points. We exploit the state-of-the-art transformer and the weighted A* algorithm to design a machine learning algorithm that solves the formulated problem. The whole UAV-IoT system, including all ground clusters and potential hovering points of the UAV, is fed to the encoder network of the proposed algorithm, and the algorithm's decoder network outputs the visiting order of the ground clusters. The weighted A* is then used to find the hovering point for each cluster in the ground IoT network. Simulation results show that the model trained by the proposed algorithm has a good generalization ability, generating solutions for IoT networks with different numbers of ground clusters without the need to retrain the model. Furthermore, the results show that our proposed algorithm finds better UAV trajectories, with lower total AoI, than other algorithms.
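
    As a small illustration of the last contribution's objective (not of the transformer/weighted-A* solver itself), the sketch below scores a UAV visiting order by a total-AoI objective and brute-forces the best order on a four-point toy instance; it assumes the common convention that a cluster's AoI is the time between finishing its upload and the UAV's return to the base station, and the speed, collection times, and coordinates are made up.

```python
import itertools
import numpy as np

def total_aoi(order, points, base, speed=10.0, collect_time=5.0):
    """Total age of information for one visiting order, assuming each cluster's
    AoI is the time between finishing its upload and the UAV's return to base."""
    pos, t, finish_times = base, 0.0, []
    for i in order:
        t += np.linalg.norm(points[i] - pos) / speed + collect_time   # fly there, then collect
        finish_times.append(t)
        pos = points[i]
    t += np.linalg.norm(base - pos) / speed                           # fly back to the base station
    return sum(t - f for f in finish_times)

# Toy usage: brute-force the best visiting order for four hovering points.
base = np.array([0.0, 0.0])
points = np.array([[100.0, 0.0], [120.0, 30.0], [0.0, 80.0], [40.0, 90.0]])
best = min(itertools.permutations(range(len(points))), key=lambda o: total_aoi(o, points, base))
print("best order:", best, "total AoI:", round(total_aoi(best, points, base), 1))
```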

    Distributed information-theoretic clustering

    We study a novel multi-terminal source coding setup motivated by the biclustering problem. Two separate encoders observe two i.i.d. sequences X^n and Y^n, respectively. The goal is to find rate-limited encodings f(x^n) and g(y^n) that maximize the normalized mutual information I(f(X^n); g(Y^n))/n. We discuss connections of this problem with hypothesis testing against independence, pattern recognition, and the information bottleneck method. Improving previous cardinality bounds for the inner and outer bounds allows us to thoroughly study the special case of a binary symmetric source and to quantify the gap between the inner and the outer bound in this special case. Furthermore, we investigate a multiple description (MD) extension of the CEO problem with a mutual information constraint. Surprisingly, this MD-CEO problem permits a tight single-letter characterization of the achievable region.
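
    To make the objective concrete, the sketch below brute-forces I(f(X^n); g(Y^n))/n for a binary symmetric source with n = 2 and one-bit (rate-1/2) encodings f and g; it merely evaluates the objective for a toy block length and is unrelated to the paper's single-letter bounds, and the crossover probability is an arbitrary choice.

```python
import itertools
import numpy as np

def mutual_information(p_uv):
    """Mutual information (in bits) of a 2x2 joint pmf."""
    pu, pv = p_uv.sum(axis=1), p_uv.sum(axis=0)
    mask = p_uv > 0
    return float((p_uv[mask] * np.log2(p_uv[mask] / np.outer(pu, pv)[mask])).sum())

def best_one_bit_encodings(p=0.1, n=2):
    """Brute-force binary encodings f, g: {0,1}^n -> {0,1} maximizing
    I(f(X^n); g(Y^n))/n for a binary symmetric source with crossover p."""
    blocks = list(itertools.product([0, 1], repeat=n))
    best = 0.0
    for f in itertools.product([0, 1], repeat=2 ** n):       # truth table of f
        for g in itertools.product([0, 1], repeat=2 ** n):   # truth table of g
            p_uv = np.zeros((2, 2))
            for ix, x in enumerate(blocks):
                for iy, y in enumerate(blocks):
                    d = sum(a != b for a, b in zip(x, y))     # Hamming distance between x^n and y^n
                    p_uv[f[ix], g[iy]] += 0.5 ** n * p ** d * (1 - p) ** (n - d)
            best = max(best, mutual_information(p_uv) / n)
    return best

print("best I(f(X^n); g(Y^n))/n for n=2, one-bit encodings:", round(best_one_bit_encodings(0.1), 4))
print("unconstrained per-symbol value I(X;Y) = 1 - h(0.1):",
      round(1 + 0.1 * np.log2(0.1) + 0.9 * np.log2(0.9), 4))
```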
