Indexability, concentration, and VC theory
Degrading performance of indexing schemes for exact similarity search in high
dimensions has long since been linked to histograms of distributions of
distances and other 1-Lipschitz functions getting concentrated. We discuss this
observation in the framework of the phenomenon of concentration of measure on
the structures of high dimension and the Vapnik-Chervonenkis theory of
statistical learning.
Comment: 17 pages, final submission to J. Discrete Algorithms (an expanded,
improved and corrected version of the SISAP'2010 invited paper, this e-print,
v3
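The concentration effect mentioned in this abstract can be illustrated numerically. The following is only a minimal sketch, not taken from the paper: it assumes points drawn uniformly from the unit cube and Euclidean distance, and the helper name `relative_spread` is hypothetical. It measures how the spread of distances to a fixed query, relative to their mean, shrinks as the dimension grows.

```python
import math
import random
import statistics

def relative_spread(dim, n_points=200, seed=0):
    """Std/mean of Euclidean distances from one random query to random points.

    Assumption for illustration: all points are uniform in the unit cube [0, 1]^dim.
    """
    rng = random.Random(seed)
    query = [rng.random() for _ in range(dim)]
    dists = []
    for _ in range(n_points):
        point = [rng.random() for _ in range(dim)]
        dists.append(math.dist(query, point))
    return statistics.pstdev(dists) / statistics.mean(dists)

# The ratio shrinks with dimension: the distance histogram concentrates,
# so distance-based pruning has less and less to work with.
for dim in (2, 20, 200, 2000):
    print(dim, round(relative_spread(dim), 3))
```

A small ratio means nearly all points lie at almost the same distance from the query, which is the situation in which exact indexing schemes lose their advantage over a linear scan.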
A secure data outsourcing scheme based on Asmuth–Bloom secret sharing
Data outsourcing is an emerging paradigm for data management in which a database is provided as a service by third-party service providers. One of its major benefits is that organisations unable to purchase the expensive hardware and software needed to host their own databases gain efficient online data storage at a low cost. However, several issues have to be addressed in the data outsourcing paradigm: data confidentiality, integrity, availability, and efficient indexing of users’ queries at the server side. Service providers have to guarantee that their clients’ data are secured against internal (insider) and external attacks. This paper briefly analyses the existing indexing schemes in data outsourcing and highlights their advantages and disadvantages. It then proposes a secure data outsourcing scheme based on Asmuth–Bloom secret sharing, which aims to address issues such as data confidentiality, availability and order preservation for efficient indexing.
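For readers unfamiliar with the primitive, textbook Asmuth–Bloom threshold sharing can be sketched as below. This is not the paper's outsourcing scheme (it has no order preservation or indexing); it is a toy illustration in which the moduli, the threshold, and the function names `make_shares`, `crt` and `recover` are all assumptions chosen for the example, and the parameters are far too small to be secure.

```python
import random
from itertools import combinations
from math import prod

def make_shares(secret, m0, moduli, k, rng):
    """Split `secret` (< m0) into one share per modulus; any k shares recover it."""
    if not 0 <= secret < m0:
        raise ValueError("secret must lie in [0, m0)")
    bound = prod(sorted(moduli)[:k])          # product of the k smallest moduli
    max_alpha = (bound - 1 - secret) // m0    # keeps y strictly below `bound`
    y = secret + rng.randint(0, max_alpha) * m0
    return [(m, y % m) for m in moduli]

def crt(shares):
    """Chinese remainder theorem for pairwise coprime moduli."""
    x, mod = 0, 1
    for m, r in shares:
        t = ((r - x) * pow(mod, -1, m)) % m   # solve x + mod*t = r (mod m)
        x += mod * t
        mod *= m
    return x

def recover(shares, m0):
    """Rebuild y from k shares, then reduce modulo m0 to obtain the secret."""
    return crt(shares) % m0

# Toy parameters (assumed): pairwise coprime, and 3 * 17 * 19 < 11 * 13 * 17.
M0, MODULI, K = 3, [11, 13, 17, 19], 3
shares = make_shares(2, M0, MODULI, K, random.Random(0))
recovered = {recover(list(s), M0) for s in combinations(shares, K)}
print(recovered)
```

The condition on the moduli is what makes the threshold work: the blinded value y is smaller than the product of any k moduli, so any k shares determine it uniquely, while fewer shares leave many candidates.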
Intrinsic Dimensionality
This entry for the SIGSPATIAL Special July 2010 issue on Similarity Searching
in Metric Spaces discusses the notion of intrinsic dimensionality of data in
the context of similarity search.
Comment: 4 pages, 4 figures, latex; diagram (c) has been corrected
Noise-tolerant approximate blocking for dynamic real-time entity resolution
Entity resolution is the process of identifying records in one or multiple data sources that represent the same real-world entity. This process needs to deal with noisy data that contain, for example, wrong pronunciations or spelling errors. Many real-world applications require rapid responses for entity queries on dynamic datasets. This poses challenges to existing approaches, which are mainly aimed at the batch matching of records in static data. Locality sensitive hashing (LSH) is an approximate blocking approach that hashes objects within a certain distance into the same block with high probability. How to make approximate blocking approaches scalable to large datasets and effective for entity resolution in real time remains an open question. Targeting this problem, we propose a noise-tolerant approximate blocking approach that indexes records based on their distance ranges, using LSH and sorting trees within large-sized hash blocks. Experiments conducted on both synthetic and real-world datasets show the effectiveness of the proposed approach.
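The idea of LSH-based blocking can be sketched with a generic MinHash banding scheme. This is an assumption-laden illustration rather than the authors' method (it omits their distance ranges and sorting trees): the character-bigram tokenisation, the 12 hash functions, the 3-row bands, and the function names are all hypothetical choices.

```python
import hashlib
from collections import defaultdict

def bigrams(text):
    """Lower-cased character bigrams; very short strings fall back to the string itself."""
    t = text.lower()
    return {t[i:i + 2] for i in range(len(t) - 1)} or {t}

def minhash_signature(tokens, num_hashes):
    """One minimum per seeded hash function; similar token sets share many minima."""
    sig = []
    for seed in range(num_hashes):
        sig.append(min(
            int.from_bytes(
                hashlib.blake2b(f"{seed}:{tok}".encode(), digest_size=8).digest(),
                "big")
            for tok in tokens))
    return sig

def block_keys(text, num_hashes=12, rows_per_band=3):
    """Cut the signature into bands; each band is a key into a hash block."""
    sig = minhash_signature(bigrams(text), num_hashes)
    return [(band, tuple(sig[i:i + rows_per_band]))
            for band, i in enumerate(range(0, num_hashes, rows_per_band))]

def build_blocks(records):
    """Map every band key to the set of record ids that produced it."""
    blocks = defaultdict(set)
    for rid, text in records.items():
        for key in block_keys(text):
            blocks[key].add(rid)
    return blocks

records = {1: "jonathan smith", 2: "Jonathan Smith", 3: "maria garcia"}
blocks = build_blocks(records)
candidates = {frozenset(ids) for ids in blocks.values() if len(ids) > 1}
print(candidates)
```

Records that share at least one band key become candidate matches. More rows per band make blocks stricter, while more bands raise the chance that noisy near-duplicates still collide somewhere.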
Multidimensional Index Modulation in Wireless Communications
In index modulation schemes, information bits are conveyed through indexing
of transmission entities such as antennas, subcarriers, time slots, precoders,
subarrays, and radio frequency (RF) mirrors. Index modulation schemes are
attractive for their advantages such as good performance, high rates, and
hardware simplicity. This paper focuses on index modulation schemes in which
multiple transmission entities, namely antennas, time slots, and
RF mirrors, are indexed simultaneously. Recognizing that such
multidimensional index modulation schemes encourage sparsity in their transmit
signal vectors, we propose efficient signal detection schemes that use
compressive sensing based reconstruction algorithms. Results show that, for a
given rate, improved performance is achieved when the number of indexed
transmission entities is increased. We also explore indexing opportunities in
load modulation, which is a modulation scheme that offers power
efficiency and reduced RF hardware complexity advantages in multiantenna
systems. Results show that indexing space and time in load modulated
multiantenna systems can achieve improved performance.
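How bits can be conveyed through an index is easiest to see in the single-dimension case. The sketch below is not the paper's multidimensional scheme or its compressive-sensing detector: it assumes plain spatial modulation with four antennas and BPSK over a noiseless channel, and the names `sm_map` and `sm_demap` are hypothetical.

```python
def sm_map(bits, num_antennas=4):
    """Map bits to a sparse transmit vector: index bits pick the antenna, the last bit picks the BPSK symbol."""
    index_bits = num_antennas.bit_length() - 1
    if num_antennas < 2 or 2 ** index_bits != num_antennas:
        raise ValueError("num_antennas must be a power of two, at least 2")
    if len(bits) != index_bits + 1:
        raise ValueError(f"expected {index_bits + 1} bits")
    antenna = int("".join(str(b) for b in bits[:index_bits]), 2)
    symbol = 1.0 if bits[-1] == 0 else -1.0
    vec = [0.0] * num_antennas
    vec[antenna] = symbol          # only one antenna is active per channel use
    return vec

def sm_demap(vec):
    """Noiseless detection: the active position gives the index bits, the sign gives the symbol bit."""
    antenna = max(range(len(vec)), key=lambda i: abs(vec[i]))
    index_bits = len(vec).bit_length() - 1
    bits = [int(b) for b in format(antenna, f"0{index_bits}b")]
    bits.append(0 if vec[antenna] > 0 else 1)
    return bits

tx = sm_map([1, 0, 1])
print(tx, sm_demap(tx))
```

The transmit vector has a single nonzero entry, which is the sparsity the abstract refers to. Indexing further entities, such as time slots or RF mirrors, would enlarge the vector while keeping it sparse.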
Using bag-of-concepts to improve the performance of support vector machines in text categorization
This paper investigates the use of concept-based representations for text categorization. We introduce a new approach to create concept-based text representations, and apply it to a standard text categorization collection. The representations are used as input to a Support Vector Machine classifier, and the results show that there are certain categories for which concept-based representations constitute a viable supplement to word-based ones. We also demonstrate how the performance of the Support Vector Machine can be improved by combining representations.