165 research outputs found
Hybrid image representation methods for automatic image annotation: a survey
In most automatic image annotation systems, images are represented with low-level features using either global
methods or local methods. Global methods treat the entire image as a single unit. Local methods divide an image into sub-units, either fixed-size blocks or segmented regions. In contrast to typical automatic image annotation methods that use either global or local features exclusively, several recent methods incorporate both kinds of information, on the premise that combining the two levels of features is
beneficial in annotating images. In this paper, we provide a
survey of automatic image annotation techniques from
one perspective, feature extraction, and, in order to complement
existing surveys in the literature, we focus on an emerging class of image annotation methods: hybrid methods that combine both global and local features for image representation.
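As a rough illustration of the hybrid idea described in this abstract, the sketch below concatenates one global intensity histogram with one histogram per fixed-size block. It is a minimal example under stated assumptions, not a method taken from the survey: the function names, the choice of intensity histograms as the low-level feature, and the `grid` and `bins` parameters are all hypothetical.

```python
def intensity_histogram(pixels, bins=8, max_value=256):
    """Normalized histogram of integer intensities in [0, max_value)."""
    counts = [0] * bins
    for p in pixels:
        # Map the intensity to a bin index, clamping to the last bin.
        counts[min(int(p) * bins // max_value, bins - 1)] += 1
    total = sum(counts)
    return [c / total for c in counts] if total else [0.0] * bins


def hybrid_features(image, grid=2, bins=8):
    """Concatenate a global histogram with one histogram per block.

    `image` is a 2-D grayscale image given as a list of rows.
    `grid` and `bins` are illustrative parameters (assumptions).
    """
    height, width = len(image), len(image[0])
    block_h, block_w = height // grid, width // grid

    # Global part: the entire image is used as one unit.
    features = intensity_histogram([p for row in image for p in row], bins)

    # Local part: fixed-size blocks are used as sub-units.
    for r in range(grid):
        for c in range(grid):
            block = [image[y][x]
                     for y in range(r * block_h, (r + 1) * block_h)
                     for x in range(c * block_w, (c + 1) * block_w)]
            features.extend(intensity_histogram(block, bins))
    return features
```

With `grid=2` and `bins=8` the resulting vector has 8 global entries followed by 4 x 8 local entries. A region-based variant would replace the block loop with segments produced by a segmentation step, which is not shown here.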
Image Labeling on a Network: Using Social-Network Metadata for Image Classification
Large-scale image retrieval benchmarks invariably consist of images from the
Web. Many of these benchmarks are derived from online photo sharing networks,
like Flickr, which in addition to hosting images also provide a highly
interactive social community. Such communities generate rich metadata that can
naturally be harnessed for image classification and retrieval. Here we study
four popular benchmark datasets, extending them with social-network metadata,
such as the groups to which each image belongs, the comment thread associated
with the image, who uploaded it, their location, and their network of friends.
Since these types of data are inherently relational, we propose a model that
explicitly accounts for the interdependencies between images sharing common
properties. We model the task as a binary labeling problem on a network, and
use structured learning techniques to learn model parameters. We find that
social-network metadata are useful in a variety of classification tasks, in
many cases outperforming methods based on image content.
Comment: ECCV 2012; 14 pages, 4 figures
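To make the phrase "binary labeling problem on a network" concrete, here is a small sketch of one common way to score such a labeling: a per-image cost plus a penalty for every pair of linked images that receive different labels. This is only an illustration under assumptions. The energy form, the hand-set costs, and the function names are hypothetical; the abstract states that the paper learns its parameters with structured learning, which is not reproduced here.

```python
from itertools import product


def labeling_energy(labels, unary_cost, edges):
    """Energy of a binary labeling on a graph of images.

    unary_cost[i][y] is the cost of giving image i label y (0 or 1).
    edges is a list of (i, j, weight); the weight is paid whenever
    the two linked images receive different labels.
    """
    energy = sum(unary_cost[i][y] for i, y in enumerate(labels))
    energy += sum(w for i, j, w in edges if labels[i] != labels[j])
    return energy


def best_labeling(unary_cost, edges):
    """Exhaustive search over all 2^n labelings (tiny graphs only)."""
    n = len(unary_cost)
    return min(product((0, 1), repeat=n),
               key=lambda labels: labeling_energy(labels, unary_cost, edges))
```

For example, with three images where the outer two strongly prefer label 1 and the middle one weakly prefers label 0, linking the middle image to both neighbours (say, because they share a group or an uploader) can flip its label to 1. Exhaustive search is used only to keep the sketch short; it does not scale beyond a handful of nodes.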
Analysis of Censored Sample Population with GA-SVM
This paper proposes a class of shrunken estimators for the kth power of the scale parameter in censored samples from a one-parameter exponential population, for the case where an a priori or guessed value of the parameter is available in addition to the sample information, and analyses their properties. The proposed class of shrunken estimators is compared with the usual unbiased estimator and the minimum mean square error (MMSE) estimator. An empirical study is then carried out to show the performance of some shrunken estimators of the proposed class relative to the MMSE estimator. It is found that certain of these estimators substantially improve on the classical estimators even when the guessed value of the kth power of the scale parameter is far from the true value, especially for censored samples of small size.
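The three kinds of estimator named in this abstract can be sketched for one simple setting. The example below assumes Type-II censoring (only the r smallest of n lifetimes are observed), takes k = 1 (the scale parameter itself), and uses a plain convex combination with a fixed weight `alpha` as the shrinkage rule. All three choices are assumptions made for illustration; the abstract does not specify the censoring scheme or the form of the proposed class.

```python
def total_time_on_test(observed, n):
    """Total time on test for a Type-II censored sample.

    `observed` holds the r smallest lifetimes out of n units; the
    remaining n - r units are censored at the largest observed time.
    """
    r = len(observed)
    return sum(observed) + (n - r) * max(observed)


def scale_estimators(observed, n, guess, alpha=0.5):
    """Return (unbiased, mmse, shrunken) estimates of the scale.

    unbiased : T / r
    mmse     : T / (r + 1), the minimum-MSE estimator of the form c * T
    shrunken : alpha * unbiased + (1 - alpha) * guess, where `alpha`
               is a hypothetical fixed shrinkage weight in [0, 1]
    """
    r = len(observed)
    t = total_time_on_test(observed, n)
    unbiased = t / r
    mmse = t / (r + 1)
    shrunken = alpha * unbiased + (1 - alpha) * guess
    return unbiased, mmse, shrunken
```

For instance, `scale_estimators([1.0, 2.0, 3.0], n=5, guess=5.0)` gives a total time on test of 12 and hence the estimates 4.0, 3.0 and 4.5. How much the shrunken value helps depends on how close the guess is to the true scale, which is the question the abstract's empirical study addresses.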
In Defense of MinHash Over SimHash
MinHash and SimHash are the two widely adopted Locality Sensitive Hashing
(LSH) algorithms for large-scale data processing applications. Deciding which
LSH to use for a particular problem at hand is an important question, which has
no clear answer in the existing literature. In this study, we provide a
theoretical answer (validated by experiments) that MinHash virtually always
outperforms SimHash when the data are binary, as is common in practical applications such as
search.
The collision probability of MinHash is a function of resemblance similarity
($\mathcal{R}$), while the collision probability of SimHash is a function of
cosine similarity ($\mathcal{S}$). To provide a common basis for comparison, we
evaluate retrieval results in terms of $\mathcal{S}$ for both MinHash and
SimHash. This evaluation is valid as we can prove that MinHash is a valid LSH
with respect to $\mathcal{S}$, by using a general inequality $\mathcal{S}^2 \leq \mathcal{R} \leq \frac{\mathcal{S}}{2-\mathcal{S}}$. Our worst-case analysis
shows that MinHash significantly outperforms SimHash in the high-similarity region.
Interestingly, our intensive experiments reveal that MinHash is also
substantially better than SimHash even in datasets where most of the data
points are not too similar to each other. This is partly because, in practical
data, $\mathcal{R} \geq \frac{\mathcal{S}}{z-\mathcal{S}}$ often holds, where $z$
is only slightly larger than 2 (e.g., $z \leq 2.1$). Our restricted worst-case
analysis, assuming $\frac{\mathcal{S}}{z-\mathcal{S}} \leq \mathcal{R} \leq \frac{\mathcal{S}}{2-\mathcal{S}}$, shows that MinHash indeed significantly
outperforms SimHash even in the low-similarity region.
We believe the results in this paper will provide valuable guidelines for
search in practice, especially when the data are sparse.
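The two similarities compared in this abstract are easy to compute for binary data represented as sets of nonzero indices. The sketch below does so and also returns the standard per-hash collision probabilities: resemblance itself for MinHash, and 1 - arccos(S)/pi for sign-random-projection SimHash. The function names are hypothetical. For binary data, the resemblance R and cosine similarity S satisfy S^2 <= R <= S/(2 - S), which can be checked numerically on any pair of sets.

```python
import math


def resemblance(a, b):
    """Resemblance (Jaccard) similarity: intersection size over union size."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def cosine(a, b):
    """Cosine similarity of two binary vectors given as index sets."""
    if not a or not b:
        return 0.0
    return len(a & b) / math.sqrt(len(a) * len(b))


def minhash_collision(a, b):
    """Probability that one MinHash value agrees for a and b."""
    return resemblance(a, b)


def simhash_collision(a, b):
    """Probability that one SimHash (sign random projection) bit agrees."""
    # min() guards against floating-point values slightly above 1.
    return 1.0 - math.acos(min(1.0, cosine(a, b))) / math.pi
```

For `a = {1, 2, 3, 4}` and `b = {3, 4, 5, 6}`, R = 1/3 and S = 1/2, so S^2 = 0.25 <= R, and the upper bound S/(2 - S) = 1/3 is met with equality because the two sets have the same size.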