Building Confidential and Efficient Query Services in the Cloud with RASP Data Perturbation
With the wide deployment of public cloud computing infrastructures, using
clouds to host data query services has become an appealing solution because of
its scalability and cost savings. However, some data may be so sensitive that
the data owner does not want to move it to the cloud unless data
confidentiality and query privacy are guaranteed. On the other hand, a
secured query service should still provide efficient query processing and
significantly reduce the in-house workload to fully realize the benefits of
cloud computing. We propose the RASP data perturbation method to provide secure
and efficient range query and kNN query services for protected data in the
cloud. The RASP data perturbation method combines order-preserving encryption,
dimensionality expansion, random noise injection, and random projection to
provide strong resilience to attacks on the perturbed data and queries. It also
preserves multidimensional ranges, which allows existing indexing techniques to
be applied to speed up range query processing. The kNN-R algorithm is designed
to work with the RASP range query algorithm to process the kNN queries. We have
carefully analyzed the attacks on data and queries under a precisely defined
threat model and realistic security assumptions. Extensive experiments have
been conducted to show the advantages of this approach in efficiency and
security.
Comment: 18 pages, to appear in IEEE TKDE, accepted in December 201
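The abstract names the ingredients of RASP but gives no formulas. The Python sketch below illustrates, under simplified assumptions, why a perturbation built from an order-preserving encoding, an appended constant and noise coordinate, and a random invertible linear map can still answer range queries: each bound such as x_i >= lo becomes a half-space w . y >= 0 over the perturbed vectors. It is not the authors' construction. The helper names (`ope`, `perturb`, `transform_range`, `matches`), the affine stand-in for order-preserving encryption (which offers no security), and the small integer key matrix are all hypothetical choices for illustration.

```python
from fractions import Fraction
import random


def invert(matrix):
    """Gauss-Jordan inversion with exact fractions; raises ValueError if singular."""
    n = len(matrix)
    aug = [[Fraction(v) for v in row] + [Fraction(int(i == j)) for j in range(n)]
           for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = next((r for r in range(col, n) if aug[r][col] != 0), None)
        if pivot is None:
            raise ValueError("matrix is singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col and aug[r][col] != 0:
                f = aug[r][col]
                aug[r] = [a - f * b for a, b in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def random_invertible(n, rng):
    """Draw small random integer matrices until one is invertible (hypothetical key generation)."""
    while True:
        m = [[Fraction(rng.randint(-5, 5)) for _ in range(n)] for _ in range(n)]
        try:
            return m, invert(m)
        except ValueError:
            continue


def ope(x):
    # Hypothetical stand-in for order-preserving encryption: strictly increasing, NOT secure.
    return 3 * x + 7


def perturb(x, a, rng):
    # Extend E(x) with a constant 1 and a positive random noise value, then project: y = A . ext
    ext = [Fraction(ope(v)) for v in x] + [Fraction(1), Fraction(rng.randint(1, 100))]
    return [sum(a[r][c] * ext[c] for c in range(len(ext))) for r in range(len(a))]


def half_space(u, a_inv):
    # w = u^T A^{-1}, so that w . y == u . ext whenever y == A . ext
    n = len(a_inv)
    return [sum(u[k] * a_inv[k][j] for k in range(n)) for j in range(n)]


def transform_range(lows, highs, a_inv):
    """Turn lo_i <= x_i <= hi_i into half-spaces w . y >= 0 over perturbed vectors."""
    n = len(lows) + 2
    one = len(lows)  # index of the appended constant-1 coordinate
    planes = []
    for i, (lo, hi) in enumerate(zip(lows, highs)):
        u_lo = [Fraction(0)] * n
        u_lo[i], u_lo[one] = Fraction(1), Fraction(-ope(lo))  # E(x_i) - E(lo) >= 0
        u_hi = [Fraction(0)] * n
        u_hi[i], u_hi[one] = Fraction(-1), Fraction(ope(hi))  # E(hi) - E(x_i) >= 0
        planes += [half_space(u_lo, a_inv), half_space(u_hi, a_inv)]
    return planes


def matches(y, planes):
    # A perturbed record satisfies the query iff it lies in every half-space.
    return all(sum(w_j * y_j for w_j, y_j in zip(w, y)) >= 0 for w in planes)


rng = random.Random(42)
points = [(1, 5), (4, 2), (6, 8), (3, 3)]
A, A_inv = random_invertible(len(points[0]) + 2, rng)
cipher = [perturb(p, A, rng) for p in points]
planes = transform_range((2, 1), (5, 4), A_inv)  # 2 <= x0 <= 5 and 1 <= x1 <= 4
hits = [i for i, y in enumerate(cipher) if matches(y, planes)]
print(hits)  # [1, 3]
```

In this simplified picture the server only sees perturbed vectors and half-space normals, and a conjunction of half-spaces is a convex region, which is one plausible reading of the abstract's statement that multidimensional ranges are preserved. Exact `Fraction` arithmetic is used so that boundary cases are decided without rounding error.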
Mining Biclusters of Similar Values with Triadic Concept Analysis
Biclustering numerical data became a popular data-mining task in the early
2000s, especially for analysing gene expression data. A bicluster
reflects a strong association between a subset of objects and a subset of
attributes in a numerical object/attribute data table. So-called biclusters of
similar values can be thought of as maximal sub-tables with close values. Only a
few methods address a complete, correct and non-redundant enumeration of such
patterns, which is a well-known intractable problem, and no formal framework
exists. In this paper, we introduce important links between biclustering and
formal concept analysis. More specifically, we show, as an original result, that Triadic
Concept Analysis (TCA) provides a well-suited mathematical framework for
biclustering. Interestingly, existing TCA algorithms, which usually apply to
binary data, can be used (directly or with slight modifications) after a
preprocessing step to extract maximal biclusters of similar values.
Comment: Concept Lattices and their Applications (CLA) (2011
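To make the pattern definition concrete, the Python sketch below enumerates maximal biclusters of similar values by brute force. It assumes that "close values" means every value in the sub-table lies within a threshold `theta` of every other (so max minus min is at most `theta`); `theta` and the helper names are illustrative, not taken from the paper. This is not the TCA-based method described above, only a reference definition, and its cost grows exponentially with the table size, which mirrors the intractability the abstract mentions.

```python
from itertools import combinations


def is_similar(table, rows, cols, theta):
    # Assumed similarity: all values in the sub-table lie within theta of each other.
    values = [table[r][c] for r in rows for c in cols]
    return max(values) - min(values) <= theta


def maximal_biclusters(table, theta):
    """Brute-force enumeration of maximal biclusters of similar values (exponential)."""
    n_rows, n_cols = len(table), len(table[0])
    found = []
    for rs in range(1, n_rows + 1):
        for rows in combinations(range(n_rows), rs):
            for cs in range(1, n_cols + 1):
                for cols in combinations(range(n_cols), cs):
                    if not is_similar(table, rows, cols, theta):
                        continue
                    # The property is anti-monotone, so maximality only needs
                    # checking against one-row and one-column extensions.
                    grows_by_row = any(is_similar(table, rows + (r,), cols, theta)
                                       for r in range(n_rows) if r not in rows)
                    grows_by_col = any(is_similar(table, rows, cols + (c,), theta)
                                       for c in range(n_cols) if c not in cols)
                    if not (grows_by_row or grows_by_col):
                        found.append((set(rows), set(cols)))
    return found


table = [
    [1, 1, 6],
    [1, 2, 7],
    [5, 6, 6],
]
for rows, cols in maximal_biclusters(table, theta=1):
    print("objects", sorted(rows), "attributes", sorted(cols))
```

On this 3×3 table with `theta=1` the sketch reports three maximal biclusters, for example objects {0, 1} with attributes {0, 1}; a subset of any of them would also be similar but is not reported because it is not maximal.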
Clustering for Different Scales of Measurement - the Gap-Ratio Weighted K-means Algorithm
This paper describes a method for clustering data that are spread out over
large regions and whose dimensions are on different scales of measurement. Such
an algorithm was developed to implement a robotics application consisting of
sorting and storing objects in an unsupervised way. The toy dataset used to
validate such application consists of Lego bricks of different shapes and
colors. The uncontrolled lighting conditions and the use of RGB color features
respectively lead to data with a large spread and to different levels of
measurement across data dimensions. To overcome the combination of these two
characteristics in the data, we have developed a new weighted K-means
algorithm, called gap-ratio K-means, which consists of weighting each dimension
of the feature space before running the K-means algorithm. The weight
associated with a feature is proportional to the ratio of the largest gap
between two consecutive data points to the average of all the other gaps.
This method is compared with two other variants of K-means on the Lego bricks
clustering problem as well as on two other common classification datasets.
Comment: 13 pages, 6 figures, 2 tables. This paper is under review for AIAP 201
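The weighting rule stated in the abstract can be sketched in Python as follows. Several details are assumptions not given in the abstract: the proportionality constant is taken as 1, each dimension is min-max rescaled before its weight is applied, K-means is plain Lloyd iteration seeded with the first k points, and a weight of 1.0 is used as a fallback when the remaining gaps average zero. The function names are hypothetical.

```python
def gap_ratio_weight(values):
    """Largest gap between consecutive sorted values divided by the mean of the other gaps."""
    xs = sorted(values)
    gaps = [b - a for a, b in zip(xs, xs[1:])]
    if len(gaps) < 2:
        return 1.0  # assumption: no weighting when there are too few gaps
    others = list(gaps)
    biggest = max(others)
    others.remove(biggest)
    mean_other = sum(others) / len(others)
    return biggest / mean_other if mean_other > 0 else 1.0  # assumed fallback


def kmeans(points, k, iters=20):
    """Plain Lloyd's k-means; the first k points are the initial centers (assumption)."""
    centers = [list(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iters):
        labels = [min(range(k), key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centers[c])))
                  for p in points]
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def gap_ratio_kmeans(data, k):
    dims = list(zip(*data))
    weights = [gap_ratio_weight(col) for col in dims]
    lows = [min(col) for col in dims]
    spans = [(max(col) - min(col)) or 1.0 for col in dims]
    # Assumed preprocessing: min-max rescale each dimension, then multiply by its weight.
    scaled = [[w * (x - lo) / s for x, w, lo, s in zip(row, weights, lows, spans)]
              for row in data]
    return weights, kmeans(scaled, k)


# Dimension 0 has two well-separated groups; dimension 1 is spread out on a larger scale.
data = [(1.0, 100), (1.2, 400), (1.1, 700), (5.0, 200), (5.1, 500), (5.2, 900)]
weights, labels = gap_ratio_kmeans(data, k=2)
print(labels)  # [0, 0, 0, 1, 1, 1]
```

The gap ratio does not change when a dimension is multiplied by a positive constant, so it reflects how clearly a dimension separates into groups rather than the size of its values. Here dimension 0 receives a weight of roughly 38 and dimension 1 about 1.33, so the grouping along dimension 0 drives the clustering.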