13,889 research outputs found
Hybrid Cloud-Based Privacy Preserving Clustering as Service for Enterprise Big Data
Clustering as service is being offered by many cloud service providers. It helps enterprises to learn hidden patterns and learn knowledge from large, big data generated by enterprises. Though it brings lot of value to enterprises, it also exposes the data to various security and privacy threats. Privacy preserving clustering is being proposed a solution to address this problem. But the privacy preserving clustering as outsourced service model involves too much overhead on querying user, lacks adaptivity to incremental data and involves frequent interaction between service provider and the querying user. There is also a lack of personalization to clustering by the querying user. This work “Locality Sensitive Hashing for Transformed Dataset (LSHTD)” proposes a hybrid cloud-based clustering as service model for streaming data that address the problems in the existing model such as privacy preserving k-means clustering outsourcing under multiple keys (PPCOM) and secure nearest neighbor clustering (SNNC) models, The solution combines hybrid cloud, LSHTD clustering algorithm as outsourced service model. Through experiments, the proposed solution is able is found to reduce the computation cost by 23% and communication cost by 6% and able to provide better clustering accuracy with ARI greater than 4.59% compared to existing works
Massively-Parallel Heat Map Sorting and Applications To Explainable Clustering
Given a set of points labeled with labels, we introduce the heat map
sorting problem as reordering and merging the points and dimensions while
preserving the clusters (labels). A cluster is preserved if it remains
connected, i.e., if it is not split into several clusters and no two clusters
are merged.
We prove the problem is NP-hard and we give a fixed-parameter algorithm with
a constant number of rounds in the massively parallel computation model, where
each machine has a sublinear memory and the total memory of the machines is
linear. We give an approximation algorithm for a NP-hard special case of the
problem. We empirically compare our algorithm with k-means and density-based
clustering (DBSCAN) using a dimensionality reduction via locality-sensitive
hashing on several directed and undirected graphs of email and computer
networks
Recommended from our members
Effectiveness of landmark analysis for establishing locality in p2p networks
Locality to other nodes on a peer-to-peer overlay network can be established by means of a set of landmarks shared among the participating nodes. Each node independently collects a set of latency measures to landmark nodes, which are used as a multi-dimensional feature vector. Each peer node uses the feature vector to generate a unique scalar index which is correlated to its topological locality. A popular dimensionality reduction technique is the space filling Hilbert’s curve, as it possesses good locality
preserving properties. However, there exists little comparison between Hilbert’s curve and other techniques for dimensionality reduction. This work carries out a quantitative analysis of their properties. Linear and non-linear techniques for scaling the landmark vectors to a single dimension are investigated. Hilbert’s curve, Sammon’s mapping and Principal Component Analysis
have been used to generate a 1d space with locality preserving properties. This work provides empirical evidence to support the use of Hilbert’s curve in the context of locality preservation when generating peer identifiers by means of landmark vector analysis. A comparative analysis is carried out with an artificial 2d network model and with a realistic network topology model
with a typical power-law distribution of node connectivity in the Internet. Nearest neighbour analysis confirms Hilbert’s curve to be very effective in both artificial and realistic network topologies. Nevertheless, the results in the realistic network model show that there is scope for improvements and better techniques to preserve locality information are required
Locality Preserving Projections for Grassmann manifold
Learning on Grassmann manifold has become popular in many computer vision
tasks, with the strong capability to extract discriminative information for
imagesets and videos. However, such learning algorithms particularly on
high-dimensional Grassmann manifold always involve with significantly high
computational cost, which seriously limits the applicability of learning on
Grassmann manifold in more wide areas. In this research, we propose an
unsupervised dimensionality reduction algorithm on Grassmann manifold based on
the Locality Preserving Projections (LPP) criterion. LPP is a commonly used
dimensionality reduction algorithm for vector-valued data, aiming to preserve
local structure of data in the dimension-reduced space. The strategy is to
construct a mapping from higher dimensional Grassmann manifold into the one in
a relative low-dimensional with more discriminative capability. The proposed
method can be optimized as a basic eigenvalue problem. The performance of our
proposed method is assessed on several classification and clustering tasks and
the experimental results show its clear advantages over other Grassmann based
algorithms.Comment: Accepted by IJCAI 201
Statistical Traffic State Analysis in Large-scale Transportation Networks Using Locality-Preserving Non-negative Matrix Factorization
Statistical traffic data analysis is a hot topic in traffic management and
control. In this field, current research progresses focus on analyzing traffic
flows of individual links or local regions in a transportation network. Less
attention are paid to the global view of traffic states over the entire
network, which is important for modeling large-scale traffic scenes. Our aim is
precisely to propose a new methodology for extracting spatio-temporal traffic
patterns, ultimately for modeling large-scale traffic dynamics, and long-term
traffic forecasting. We attack this issue by utilizing Locality-Preserving
Non-negative Matrix Factorization (LPNMF) to derive low-dimensional
representation of network-level traffic states. Clustering is performed on the
compact LPNMF projections to unveil typical spatial patterns and temporal
dynamics of network-level traffic states. We have tested the proposed method on
simulated traffic data generated for a large-scale road network, and reported
experimental results validate the ability of our approach for extracting
meaningful large-scale space-time traffic patterns. Furthermore, the derived
clustering results provide an intuitive understanding of spatial-temporal
characteristics of traffic flows in the large-scale network, and a basis for
potential long-term forecasting.Comment: IET Intelligent Transport Systems (2013
- …