11 research outputs found

    Gravitational Clustering: A Simple, Robust and Adaptive Approach for Distributed Networks

    Distributed signal processing for wireless sensor networks enables different devices to cooperate in solving different signal processing tasks. A crucial first step is to answer the question: who observes what? Several distributed algorithms have recently been proposed that frame the signal/object labelling problem as a cluster analysis of source-specific features; however, they assume the number of clusters to be known. We propose a new method called Gravitational Clustering (GC) to adaptively estimate the time-varying number of clusters from a set of feature vectors. The key idea is to exploit the physical principle of gravitational force between mass units: streaming-in feature vectors are treated as mass units at fixed positions in the feature space, around which mobile mass units are injected at each time instant. Cluster enumeration exploits the fact that the strongest attraction on the mobile mass units is exerted by regions with a high density of feature vectors, i.e., gravitational clusters. By sharing estimates among neighboring nodes via a diffusion-adaptation scheme, cooperative and distributed cluster enumeration is achieved. Numerical experiments concerning robustness against outliers, convergence and computational complexity are conducted. An application to a distributed cooperative multi-view camera network illustrates the applicability to real-world problems. Comment: 12 pages, 9 figures.
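
    As a rough illustration of the gravitational idea (not the authors' adaptive, distributed GC algorithm), the sketch below lets mobile mass units drift toward dense regions of a static batch of feature vectors and counts the attraction points they settle on; the softening term, step schedule, merge radius, and function names are illustrative choices.

```python
import numpy as np

def pull_direction(mobile, fixed, softening=0.3):
    """Unit direction of the net (softened) gravitational pull that the fixed
    feature vectors exert on each mobile mass unit."""
    diff = fixed[None, :, :] - mobile[:, None, :]                 # (M, N, d)
    d2 = (diff ** 2).sum(axis=-1, keepdims=True) + softening ** 2
    force = (diff / d2 ** 1.5).sum(axis=1)                        # (M, d)
    norm = np.linalg.norm(force, axis=-1, keepdims=True) + 1e-12
    return force / norm

def estimate_num_clusters(features, n_mobile=50, n_steps=300, merge_radius=0.5, seed=0):
    """Inject mobile units over the feature range, let them drift into dense
    regions, and count the distinct attraction points they settle on."""
    rng = np.random.default_rng(seed)
    lo, hi = features.min(axis=0), features.max(axis=0)
    mobile = rng.uniform(lo, hi, size=(n_mobile, features.shape[1]))
    for t in range(n_steps):
        step = 0.1 * (1.0 - t / n_steps) + 0.01                   # slowly decaying step size
        mobile += step * pull_direction(mobile, features)
    centres = []                                                   # greedily merge settled units
    for m in mobile:
        if not any(np.linalg.norm(m - c) < merge_radius for c in centres):
            centres.append(m)
    return len(centres)

# Toy check: two well-separated blobs should typically give an estimate of 2.
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0.0, 0.2, (100, 2)), rng.normal(3.0, 0.2, (100, 2))])
print("estimated number of clusters:", estimate_num_clusters(X))
```

    In the paper, the enumeration additionally runs on streaming data and the estimates are fused across neighboring nodes via diffusion adaptation.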

    One-class classifiers based on entropic spanning graphs

    One-class classifiers offer valuable tools to assess the presence of outliers in data. In this paper, we propose a design methodology for one-class classifiers based on entropic spanning graphs. Our approach can also handle non-numeric data by means of an embedding procedure. The spanning graph is learned on the embedded input data, and the resulting partition of vertices defines the classifier. The final partition is derived by exploiting a criterion based on mutual information minimization. Here, we compute the mutual information by using a convenient formulation provided in terms of the α-Jensen difference. Once training is completed, a graph-based fuzzy model is constructed to associate a confidence level with the classifier decision. The fuzzification process is based only on topological information about the vertices of the entropic spanning graph. As such, the proposed one-class classifier is also suitable for data characterized by complex geometric structures. We provide experiments on well-known benchmarks containing both feature vectors and labeled graphs. In addition, we apply the method to the protein solubility recognition problem by considering several representations of the input samples. Experimental results demonstrate the effectiveness and versatility of the proposed method with respect to other state-of-the-art approaches. Comment: Extended and revised version of the paper "One-Class Classification Through Mutual Information Minimization" presented at the 2016 IEEE IJCNN, Vancouver, Canada.
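
    A heavily simplified sketch of how an entropic spanning graph (here, a Euclidean minimum spanning tree) can drive a one-class decision: a test sample is accepted when attaching it to the training MST would need an edge no longer than a high quantile of the MST's own edge lengths. This stands in for, but is not, the paper's mutual-information-minimization partition and fuzzy confidence model; the quantile threshold and function names are illustrative.

```python
import numpy as np
from scipy.spatial import distance_matrix
from scipy.sparse.csgraph import minimum_spanning_tree

def fit_mst_threshold(train, quantile=0.95):
    """Build a Euclidean minimum spanning tree on the training data and keep a
    high quantile of its edge lengths as the acceptance threshold."""
    mst = minimum_spanning_tree(distance_matrix(train, train))
    edge_lengths = mst.data                      # non-zero entries are the MST edge weights
    return np.quantile(edge_lengths, quantile)

def predict_one_class(train, test, threshold):
    """Accept a test sample if attaching it to the training set needs an edge
    no longer than the threshold; otherwise flag it as an outlier."""
    d = distance_matrix(test, train).min(axis=1)
    return d <= threshold                        # True = in-class, False = outlier

# Toy usage: a point near the training cloud is accepted, a far point is rejected.
rng = np.random.default_rng(0)
train = rng.normal(0.0, 1.0, (200, 2))
test = np.array([[0.1, -0.2], [8.0, 8.0]])
thr = fit_mst_threshold(train)
print(predict_one_class(train, test, thr))       # expected: [ True False]
```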

    Clustering the Distribution of Alumni of the Information Systems Study Program at Politeknik Negeri Nusa Utara

    Today's database technology makes it possible to store and accumulate very large amounts of data, but this growth is also where problems begin, as is the case in the Information Systems Study Program of Politeknik Negeri Nusa Utara. It is therefore important to know the distribution of graduates through an alumni tracer study, so that the available data can be used to group alumni by the similarity of their attributes using the k-means clustering method. This study aims to make the analysis of the alumni distribution easier. The tracer data were obtained from the alumni records of the Information Systems Study Program; hidden information in these records can be uncovered by processing them, which is useful to the study program. The study analyses the alumni tracer of the Information Systems Study Program for the 2006 to 2015 cohorts using the k-means clustering algorithm implemented in Microsoft Excel. The attributes used are domicile, year of entry, year of graduation, and waiting time before employment. Two clusters were formed. The results serve as one basis for the Information Systems Study Program of Politeknik Negeri Nusa Utara to decide on promotion strategies based on the resulting clusters.
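
    The study runs k-means in Microsoft Excel; a minimal equivalent in Python is sketched below on the same four attributes, where the records, the integer coding of domicile, and the scaling step are hypothetical placeholders rather than the study's actual data or preprocessing.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

# Hypothetical tracer records: [domicile code, entry year, graduation year, job waiting time (months)]
records = np.array([
    [1, 2006, 2009, 3],
    [1, 2007, 2010, 2],
    [2, 2012, 2015, 10],
    [2, 2013, 2016, 12],
    [1, 2014, 2017, 1],
    [2, 2015, 2018, 14],
], dtype=float)

# Scale the attributes so that years, months, and the domicile code contribute comparably,
# then form the two clusters used in the study.
X = StandardScaler().fit_transform(records)
labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)
print(labels)   # cluster assignment of each alumnus record
```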

    Protection of big data privacy

    In recent years, big data have become a hot research topic. The increasing amount of big data also increases the chance of breaching the privacy of individuals. Since big data require high computational power and large storage, distributed systems are used. As multiple parties are involved in these systems, the risk of privacy violation increases. A number of privacy-preserving mechanisms have been developed to protect privacy at the different stages of the big data life cycle (e.g., data generation, data storage, and data processing). The goal of this paper is to provide a comprehensive overview of privacy preservation mechanisms in big data and to present the challenges faced by existing mechanisms. In particular, we illustrate the infrastructure of big data and the state-of-the-art privacy-preserving mechanisms in each stage of the big data life cycle. Furthermore, we discuss the challenges and future research directions related to privacy preservation in big data.
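
    The survey does not commit to a particular mechanism, but as one widely cited example of protection at the data-processing stage, the sketch below adds Laplace noise to a counting query, the standard ε-differential-privacy construction for a sensitivity-1 query; the function name, ε value, and data are illustrative.

```python
import numpy as np

def dp_count(values, predicate, epsilon=0.5, rng=None):
    """Release a counting-query answer with Laplace noise calibrated to
    sensitivity 1, giving epsilon-differential privacy for the count."""
    rng = rng or np.random.default_rng()
    true_count = sum(1 for v in values if predicate(v))
    return true_count + rng.laplace(loc=0.0, scale=1.0 / epsilon)

# Toy usage: how many records exceed a threshold, released with noise.
ages = [23, 35, 41, 29, 52, 61, 38]
print(dp_count(ages, lambda a: a >= 40, epsilon=0.5))
```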

    GDCluster: a general decentralized clustering algorithm

    In many popular applications, such as peer-to-peer systems, large amounts of data are distributed among multiple sources. Analyzing these data and identifying clusters is challenging due to processing, storage, and transmission costs. In this paper, we propose GDCluster, a general fully decentralized clustering method that is capable of clustering dynamic and distributed data sets. Nodes continuously cooperate through decentralized gossip-based communication to maintain summarized views of the data set. We customize GDCluster for executing partition-based and density-based clustering methods on the summarized views, and also offer enhancements to the basic algorithm. Coping with dynamic data is made possible by gradually adapting the clustering model. Our experimental evaluations show that GDCluster can discover the clusters efficiently with scalable transmission cost, and demonstrate its superiority over the popular LSP2P method.
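
    A toy sketch of the general pattern described above, namely gossiping bounded data summaries between nodes and clustering locally on the merged summary; it is not GDCluster's actual summary maintenance, weighting, or enhancements, and the class name, summary capacity, and merge rule are invented for illustration.

```python
import random
import numpy as np
from sklearn.cluster import KMeans

class GossipNode:
    """Node that keeps a bounded summary of the data it has seen and merges in
    summaries received from random peers (a much-simplified summarized view)."""
    def __init__(self, local_data, capacity=50, seed=0):
        self.rng = random.Random(seed)
        self.capacity = capacity
        self.summary = list(map(tuple, local_data))[:capacity]

    def merge(self, other_summary):
        pool = list(dict.fromkeys(self.summary + other_summary))   # de-duplicate points
        self.rng.shuffle(pool)
        self.summary = pool[:self.capacity]                        # keep a bounded sample

    def local_clusters(self, k=2):
        return KMeans(n_clusters=k, n_init=10, random_state=0).fit(np.array(self.summary)).cluster_centers_

# Toy usage: three nodes, each holding one slice of the data, gossip for a few rounds.
random.seed(0)
rng = np.random.default_rng(0)
data = np.vstack([rng.normal(0.0, 0.3, (60, 2)), rng.normal(4.0, 0.3, (60, 2))])
nodes = [GossipNode(data[40 * i:40 * (i + 1)], seed=i) for i in range(3)]
for _ in range(10):                                  # each round, every node gossips with a random peer
    for node in nodes:
        peer = random.choice([n for n in nodes if n is not node])
        node.merge(peer.summary)
print(nodes[0].local_clusters(k=2))                  # centres recovered from node 0's summary alone
```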

    Energy-Efficient and Fresh Data Collection in IoT Networks by Machine Learning

    The Internet-of-Things (IoT) is rapidly changing our lives in almost every field, such as smart agriculture, environmental monitoring, and intelligent manufacturing systems. How to improve the efficiency of data collection in IoT networks has therefore attracted increasing attention. Clustering-based algorithms are the most common methods used to improve the efficiency of data collection. They group devices into distinct clusters, where each device belongs to one cluster only. All member devices sense their surrounding environment and transmit the results to the cluster heads (CHs). The CHs then send the received data to a control center via single-hop or multi-hop transmission. Using unmanned aerial vehicles (UAVs) to collect data in IoT networks is another effective method for improving the efficiency of data collection, because UAVs can be flexibly deployed to communicate with ground devices via reliable air-to-ground communication links. Given that energy-efficient data collection and freshness of the collected data are two important factors in IoT networks, this thesis is concerned with designing algorithms that improve the energy efficiency of data collection and guarantee the freshness of the collected data.

    Our first contribution is an improved soft-k-means (IS-k-means) clustering algorithm that balances the energy consumption of nodes in wireless sensor networks (WSNs). The techniques of "clustering by fast search and find of density peaks" (CFSFDP) and kernel density estimation (KDE) are used to improve the selection of the initial cluster centers of the soft-k-means clustering algorithm. We then exploit the flexibility of the soft-k-means and reassign member nodes at the boundaries of clusters, based on their membership probabilities, to balance the number of nodes per cluster. Furthermore, we use multiple CHs to balance the energy consumption within clusters. Extensive simulation results show that, on average, the proposed algorithm can postpone the first node death, the half-of-nodes death, and the last node death when compared to various clustering algorithms from the literature.

    The second contribution tackles the problem of minimizing the total energy consumption of the UAV-IoT network. Specifically, we formulate and solve the optimization problem that jointly designs the UAV's trajectory and selects CHs in the IoT network. The formulated problem is a constrained combinatorial optimization problem, and we develop a novel deep reinforcement learning (DRL) method with a sequential model strategy to solve it. The proposed method can effectively learn the policy, represented by a sequence-to-sequence neural network, for designing the UAV's trajectory in an unsupervised manner. Extensive simulation results show that the proposed DRL method finds UAV trajectories with much lower energy consumption than other baseline algorithms and achieves close-to-optimal performance. In addition, simulation results show that the model trained by our proposed DRL algorithm has an excellent generalization ability, i.e., it can be used for larger problem sizes without the need to retrain the model.

    The third contribution is also concerned with minimizing the total energy consumption of UAV-aided IoT networks. A novel DRL technique, namely the pointer network-A* (Ptr-A*), is proposed, which can efficiently learn the UAV trajectory policy for minimizing the energy consumption. The UAV's start point and the ground network with a set of pre-determined clusters are fed to the Ptr-A*, and the Ptr-A* outputs a group of CHs and the visiting order of the CHs, i.e., the UAV's trajectory. The parameters of the Ptr-A* are trained on problem instances with small-scale clusters by using the actor-critic algorithm in an unsupervised manner. Simulation results show that the models trained on 20-cluster and 40-cluster instances have a good generalization ability for solving the UAV trajectory planning problem with different numbers of clusters, without the need to retrain the models. Furthermore, the results show that our proposed DRL algorithm outperforms two baseline techniques.

    In the last contribution, the concept of age-of-information (AoI) is used to quantify the freshness of the collected data in IoT networks. An optimization problem is formulated to minimize the total AoI of the data collected by the UAV from the ground IoT network. Since the total AoI of the IoT network depends on the flight time of the UAV and the data collection time at hovering points, we jointly optimize the selection of the hovering points and the visiting order of these points. We exploit the state-of-the-art transformer and the weighted A* algorithm to design a machine learning algorithm that solves the formulated problem. The whole UAV-IoT system, including all ground clusters and potential hovering points of the UAV, is fed to the encoder network of the proposed algorithm, and the algorithm's decoder network outputs the visiting order of the ground clusters. The weighted A* is then used to find the hovering point for each cluster in the ground IoT network. Simulation results show that the model trained by the proposed algorithm has a good generalization ability, generating solutions for IoT networks with different numbers of ground clusters without the need to retrain the model. Furthermore, the results show that our proposed algorithm finds better UAV trajectories, with lower total AoI, than other algorithms.
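
    As a small illustration of the last contribution's objective (not of the transformer/weighted-A* solver itself), the sketch below scores a UAV visiting order by a total-AoI objective and brute-forces the best order on a four-point toy instance; it assumes the common convention that a cluster's AoI is the time between finishing its upload and the UAV's return to the base station, and the speed, collection times, and coordinates are made up.

```python
import itertools
import numpy as np

def total_aoi(order, points, base, speed=10.0, collect_time=5.0):
    """Total age of information for one visiting order, assuming each cluster's
    AoI is the time between finishing its upload and the UAV's return to base."""
    pos, t, finish_times = base, 0.0, []
    for i in order:
        t += np.linalg.norm(points[i] - pos) / speed + collect_time   # fly there, then collect
        finish_times.append(t)
        pos = points[i]
    t += np.linalg.norm(base - pos) / speed                           # fly back to the base station
    return sum(t - f for f in finish_times)

# Toy usage: brute-force the best visiting order for four hovering points.
base = np.array([0.0, 0.0])
points = np.array([[100.0, 0.0], [120.0, 30.0], [0.0, 80.0], [40.0, 90.0]])
best = min(itertools.permutations(range(len(points))), key=lambda o: total_aoi(o, points, base))
print("best order:", best, "total AoI:", round(total_aoi(best, points, base), 1))
```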

    Distributed information-theoretic clustering

    We study a novel multi-terminal source coding setup motivated by the biclustering problem. Two separate encoders observe two i.i.d. sequences X^n and Y^n, respectively. The goal is to find rate-limited encodings f(x^n) and g(y^n) that maximize the normalized mutual information I(f(X^n); g(Y^n))/n. We discuss connections of this problem with hypothesis testing against independence, pattern recognition, and the information bottleneck method. Improving previous cardinality bounds for the inner and outer bounds allows us to thoroughly study the special case of a binary symmetric source and to quantify the gap between the inner and the outer bound in this special case. Furthermore, we investigate a multiple description (MD) extension of the CEO problem with a mutual information constraint. Surprisingly, this MD-CEO problem permits a tight single-letter characterization of the achievable region.
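
    To make the objective concrete, the sketch below brute-forces I(f(X^n); g(Y^n))/n for a binary symmetric source with n = 2 and one-bit (rate-1/2) encodings f and g; it merely evaluates the objective for a toy block length and is unrelated to the paper's single-letter bounds, and the crossover probability is an arbitrary choice.

```python
import itertools
import numpy as np

def mutual_information(p_uv):
    """Mutual information (in bits) of a 2x2 joint pmf."""
    pu, pv = p_uv.sum(axis=1), p_uv.sum(axis=0)
    mask = p_uv > 0
    return float((p_uv[mask] * np.log2(p_uv[mask] / np.outer(pu, pv)[mask])).sum())

def best_one_bit_encodings(p=0.1, n=2):
    """Brute-force binary encodings f, g: {0,1}^n -> {0,1} maximizing
    I(f(X^n); g(Y^n))/n for a binary symmetric source with crossover p."""
    blocks = list(itertools.product([0, 1], repeat=n))
    best = 0.0
    for f in itertools.product([0, 1], repeat=2 ** n):       # truth table of f
        for g in itertools.product([0, 1], repeat=2 ** n):   # truth table of g
            p_uv = np.zeros((2, 2))
            for ix, x in enumerate(blocks):
                for iy, y in enumerate(blocks):
                    d = sum(a != b for a, b in zip(x, y))     # Hamming distance between x^n and y^n
                    p_uv[f[ix], g[iy]] += 0.5 ** n * p ** d * (1 - p) ** (n - d)
            best = max(best, mutual_information(p_uv) / n)
    return best

print("best I(f(X^n); g(Y^n))/n for n=2, one-bit encodings:", round(best_one_bit_encodings(0.1), 4))
print("unconstrained per-symbol value I(X;Y) = 1 - h(0.1):",
      round(1 + 0.1 * np.log2(0.1) + 0.9 * np.log2(0.9), 4))
```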
