Limited-Magnitude Error-Correcting Gray Codes for Rank Modulation
We construct Gray codes over permutations for the rank-modulation scheme,
which are also capable of correcting errors under the infinity-metric. These
errors model limited-magnitude or spike errors, for which only
single-error-detecting Gray codes are currently known. Surprisingly, the
error-correcting codes we construct achieve a better asymptotic rate than that
of presently known constructions not having the Gray property, and exceed the
Gilbert-Varshamov bound. Additionally, we present efficient ranking and
unranking procedures, as well as a decoding procedure that runs in linear time.
Finally, we also apply our methods to solve an outstanding issue with
error-detecting rank-modulation Gray codes (snake-in-the-box codes) under a
different metric, the Kendall tau-metric, in the group of permutations over
an even number of elements, where we provide asymptotically optimal
codes. Comment: Revised version for journal submission. Additional results include
tighter auxiliary constructions, a decoding scheme, ranking/unranking
procedures, and application to snake-in-the-box codes under the Kendall
tau-metric.
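For readers unfamiliar with the two metrics named in this abstract, the following is a minimal sketch of how each distance between permutations is usually computed. It is an illustration only, not the authors' construction or decoder; the function names are hypothetical and permutations are assumed to be given as tuples of distinct values.

```python
from itertools import combinations


def infinity_distance(p, q):
    """Infinity-metric: the largest change at any single position."""
    return max(abs(a - b) for a, b in zip(p, q))


def kendall_tau_distance(p, q):
    """Kendall tau-metric: number of element pairs that p and q order differently.

    This equals the minimum number of adjacent transpositions turning p into q.
    """
    pos_p = {value: index for index, value in enumerate(p)}
    pos_q = {value: index for index, value in enumerate(q)}
    return sum(
        1
        for a, b in combinations(p, 2)
        if (pos_p[a] - pos_p[b]) * (pos_q[a] - pos_q[b]) < 0
    )


# Two swaps of neighbouring values: every position moves by at most 1
# (a limited-magnitude error), while two pairs change their relative order.
stored = (1, 2, 3, 4)
read_back = (2, 1, 4, 3)
print(infinity_distance(stored, read_back))     # 1
print(kendall_tau_distance(stored, read_back))  # 2
```

The example shows why the two metrics model different error types: a small perturbation at every position keeps the infinity distance low even as the Kendall tau distance grows.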
TopSig: Topology Preserving Document Signatures
Performance comparisons between File Signatures and Inverted Files for text
retrieval have previously shown several significant shortcomings of file
signatures relative to inverted files. The inverted file approach underpins
most state-of-the-art search engine algorithms, such as Language and
Probabilistic models. It has been widely accepted that traditional file
signatures are inferior alternatives to inverted files. This paper describes
TopSig, a new approach to the construction of file signatures. Many advances in
semantic hashing and dimensionality reduction have been made in recent times,
but these have not so far been linked to general purpose, signature file based,
search engines. This paper introduces a different signature file approach that
builds upon and extends these recent advances. We are able to demonstrate
significant improvements in the performance of signature file based indexing
and retrieval, performance that is comparable to that of state of the art
inverted file based systems, including Language models and BM25. These findings
suggest that file signatures offer a viable alternative to inverted files in
suitable settings. From a theoretical perspective, they position the file
signatures model in the class of Vector Space retrieval models. Comment: 12 pages, 8 figures, CIKM 201
Efficient Nearest Neighbor Classification Using a Cascade of Approximate Similarity Measures
Nearest neighbor classification using shape context can yield highly accurate results in a number of recognition problems. Unfortunately, the approach can be too slow for practical applications, and thus approximation strategies are needed to make shape context practical. This paper proposes a method for efficient and accurate nearest neighbor classification in non-Euclidean spaces, such as the space induced by the shape context measure. First, a method is introduced for constructing a Euclidean embedding that is optimized for nearest neighbor classification accuracy. Using that embedding, multiple approximations of the underlying non-Euclidean similarity measure are obtained, at different levels of accuracy and efficiency. The approximations are automatically combined to form a cascade classifier, which applies the slower approximations only to the hardest cases. Unlike typical cascade-of-classifiers approaches, which are applied to binary classification problems, our method constructs a cascade for a multiclass problem. Experiments with a standard shape data set indicate that a speed-up of two to three orders of magnitude is gained over the standard shape context classifier, with minimal losses in classification accuracy. National Science Foundation (IIS-0308213, IIS-0329009, EIA-0202067); Office of Naval Research (N00014-03-1-0108
A practical and secure multi-keyword search method over encrypted cloud data
Cloud computing technologies become more popular every year, as many organizations outsource their data to robust and fast cloud services while lowering the cost of hardware ownership. Although these benefits are welcome, privacy remains a concern that needs to be addressed. We propose an efficient privacy-preserving search method over encrypted cloud data that utilizes minhash functions. Most of the work in the literature supports only a single-feature search per query, which reduces effectiveness. One of the main advantages of our proposed method is the capability of multi-keyword search in a single query. The proposed method is proved to satisfy the adaptive semantic security definition. We also incorporate an effective ranking capability that is based on term frequency-inverse document frequency (tf-idf) values of keyword-document pairs. Our analysis demonstrates that the proposed scheme is privacy-preserving, efficient and effective.
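The abstract names minhash functions as its building block. The sketch below shows the generic minhash idea only, on plaintext keyword sets, and makes no attempt to reproduce the authors' encrypted index or security proof. The seeded SHA-256 hash family, the function names, and the default of 64 hash functions are all assumptions made for illustration.

```python
import hashlib


def _seeded_hash(seed, token):
    """One member of a hash family: SHA-256 over the seed and the token."""
    digest = hashlib.sha256(f"{seed}:{token}".encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


def minhash_signature(keywords, num_hashes=64):
    """For each hash function, keep the smallest hash value over the keyword set."""
    return [
        min(_seeded_hash(seed, keyword) for keyword in keywords)
        for seed in range(num_hashes)
    ]


def estimated_jaccard(sig_a, sig_b):
    """The fraction of matching signature entries estimates Jaccard similarity."""
    matches = sum(a == b for a, b in zip(sig_a, sig_b))
    return matches / len(sig_a)


document = {"cloud", "privacy", "search", "encryption"}
query = {"privacy", "search"}
similarity = estimated_jaccard(minhash_signature(document), minhash_signature(query))
print(f"estimated similarity: {similarity:.2f}")
```

Because a signature summarizes a whole keyword set, a multi-keyword query can be compared against a document in one step rather than one keyword at a time.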
How to measure metallicity from five-band photometry with supervised machine learning algorithms
We demonstrate that it is possible to measure metallicity from the SDSS
five-band photometry to better than 0.1 dex using supervised machine learning
algorithms. Using spectroscopic estimates of metallicity as ground truth, we
build, optimize and train several estimators to predict metallicity. We use the
observed photometry, as well as derived quantities such as stellar mass and
photometric redshift, as features, and we build two sample data sets at median
redshifts of 0.103 and 0.218 and median r-band magnitude of 17.5 and 18.3
respectively. We find that ensemble methods, such as Random Forests of Trees
and Extremely Randomized Trees, and Support Vector Machines all perform
comparably well and can measure metallicity with a Root Mean Square Error
(RMSE) of 0.081 and 0.090 for the two data sets when all objects are included.
The fraction of outliers (objects for which |Z_true - Z_pred| > 0.2 dex) is 2.2%
and 3.9%, respectively, and the RMSE decreases to 0.068 and 0.069 if those
objects are excluded. Because of the ability of these algorithms to capture
complex relationships between data and target, our technique performs better
than previously proposed methods that sought to fit metallicity using an
analytic fitting formula, and has 3x more constraining power than SED
fitting-based methods. Additionally, this method is extremely forgiving of
contamination in the training set, and can be used with very satisfactory
results for training sample sizes of just a few hundred objects. We distribute
all the routines to reproduce our results and apply them to other data sets. Comment: Minor revisions, matching version published in MNRA
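The two evaluation figures quoted in this abstract, the RMSE and the outlier fraction with a 0.2 dex threshold, can be computed as in the following minimal sketch. It is not the authors' distributed code; the function names and the toy values are hypothetical.

```python
import math


def rmse(z_true, z_pred):
    """Root Mean Square Error between true and predicted metallicities."""
    squared = [(t - p) ** 2 for t, p in zip(z_true, z_pred)]
    return math.sqrt(sum(squared) / len(squared))


def outlier_fraction(z_true, z_pred, threshold=0.2):
    """Fraction of objects with |Z_true - Z_pred| above the threshold (in dex)."""
    outliers = sum(abs(t - p) > threshold for t, p in zip(z_true, z_pred))
    return outliers / len(z_true)


def rmse_without_outliers(z_true, z_pred, threshold=0.2):
    """RMSE recomputed after dropping objects that exceed the threshold."""
    kept = [(t, p) for t, p in zip(z_true, z_pred) if abs(t - p) <= threshold]
    return rmse([t for t, _ in kept], [p for _, p in kept])


# Toy values, not data from the paper.
z_true = [0.0, 0.0, 0.0, 0.0]
z_pred = [0.1, -0.1, 0.3, 0.0]
print(outlier_fraction(z_true, z_pred))  # 0.25
print(round(rmse(z_true, z_pred), 4))
print(round(rmse_without_outliers(z_true, z_pred), 4))
```

Reporting both the full RMSE and the outlier-free RMSE, as the abstract does, separates typical scatter from the effect of a few badly predicted objects.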
Visual Comfort Assessment for Stereoscopic Image Retargeting
In recent years, visual comfort assessment (VCA) for 3D/stereoscopic content
has aroused extensive attention. However, much less work has been done on the
perceptual evaluation of stereoscopic image retargeting. In this paper, we
first build a Stereoscopic Image Retargeting Database (SIRD), which contains
source images and retargeted images produced by four typical stereoscopic
retargeting methods. Then, a subjective experiment is conducted to assess
four aspects of visual distortion, i.e., visual comfort, image quality, depth
quality and the overall quality. Furthermore, we propose a Visual Comfort
Assessment metric for Stereoscopic Image Retargeting (VCA-SIR). Based on the
characteristics of stereoscopic retargeted images, the proposed model
introduces novel features like disparity range, boundary disparity as well as
disparity intensity distribution into the assessment model. Experimental
results demonstrate that VCA-SIR can achieve high consistency with subjective
perception.