Asymmetric bagging and random subspace for support vector machines-based relevance feedback in image retrieval
Relevance feedback schemes based on support vector machines (SVM) have been widely used in content-based image retrieval (CBIR). However, the performance of SVM-based relevance feedback is often poor when the number of labeled positive feedback samples is small. This is mainly due to three reasons: 1) an SVM classifier is unstable on a small training set, 2) the SVM's optimal hyperplane may be biased when there are far fewer positive feedback samples than negative feedback samples, and 3) overfitting occurs because the number of feature dimensions is much higher than the size of the training set. In this paper, we develop a mechanism to overcome these problems. To address the first two problems, we propose an asymmetric bagging-based SVM (AB-SVM). For the third problem, we combine the random subspace method and SVM for relevance feedback, which we name random subspace SVM (RS-SVM). Finally, by integrating AB-SVM and RS-SVM, an asymmetric bagging and random subspace SVM (ABRS-SVM) is built to solve these three problems and further improve the relevance feedback performance.
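The two sampling ideas named in this abstract can be sketched in a few lines. The following is a minimal illustration, not the authors' implementation: it assumes that asymmetric bagging keeps every positive sample and bootstraps an equally sized set of negatives, that the random subspace step draws a random subset of feature indices per model, and that the ensemble is combined by majority vote. A nearest-centroid rule stands in for the SVM base learner so the sketch stays dependency-free; all function names and parameters are hypothetical.

```python
import random
from collections import Counter


def asymmetric_bag(positives, negatives, rng):
    """Keep all positives; bootstrap as many negatives as there are positives."""
    sampled_neg = [rng.choice(negatives) for _ in positives]
    return list(positives), sampled_neg


def random_subspace(n_features, k, rng):
    """Pick k distinct feature indices at random."""
    return sorted(rng.sample(range(n_features), k))


def project(x, idx):
    """Restrict a feature vector to the chosen indices."""
    return [x[i] for i in idx]


def centroid(rows):
    """Component-wise mean of a list of equal-length vectors."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]


def sq_dist(a, b):
    """Squared Euclidean distance."""
    return sum((u - v) ** 2 for u, v in zip(a, b))


def train_abrs(positives, negatives, n_models=11, k=2, seed=0):
    """Train an ensemble; each member sees a balanced bag and a feature subset."""
    rng = random.Random(seed)
    n_features = len(positives[0])
    models = []
    for _ in range(n_models):
        pos, neg = asymmetric_bag(positives, negatives, rng)
        idx = random_subspace(n_features, k, rng)
        # Stand-in base learner (assumption): nearest centroid instead of an SVM.
        c_pos = centroid([project(x, idx) for x in pos])
        c_neg = centroid([project(x, idx) for x in neg])
        models.append((idx, c_pos, c_neg))
    return models


def predict(models, x):
    """Majority vote over ensemble members: 1 = relevant, 0 = not relevant."""
    votes = Counter()
    for idx, c_pos, c_neg in models:
        z = project(x, idx)
        votes[1 if sq_dist(z, c_pos) < sq_dist(z, c_neg) else 0] += 1
    return votes.most_common(1)[0][0]
```

Balancing each bag targets the instability and class-imbalance issues, while the per-model feature subset reduces the dimensionality each member has to fit. Replacing the centroid rule with an actual SVM would bring the sketch closer to the setting the abstract describes.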
One-Class Classification: Taxonomy of Study and Review of Techniques
One-class classification (OCC) algorithms aim to build classification models
when the negative class is either absent, poorly sampled or not well defined.
This unique situation constrains the learning of efficient classifiers, because
the class boundary must be defined using only knowledge of the positive class. The OCC
problem has been considered and applied under many research themes, such as
outlier/novelty detection and concept learning. In this paper we present a
unified view of the general problem of OCC by presenting a taxonomy of study
for OCC problems, which is based on the availability of training data, the
algorithms used, and the application domains. We further delve into each
of the categories of the proposed taxonomy and present a comprehensive
literature review of the OCC algorithms, techniques and methodologies with a
focus on their significance, limitations and applications. We conclude our
paper by discussing some open research problems in the field of OCC and present
our vision for future research.
Comment: 24 pages + 11 pages of references, 8 figures
Query Chains: Learning to Rank from Implicit Feedback
This paper presents a novel approach for using clickthrough data to learn
ranked retrieval functions for web search results. We observe that users
searching the web often perform a sequence, or chain, of queries with a similar
information need. Using query chains, we generate new types of preference
judgments from search engine logs, thus taking advantage of user intelligence
in reformulating queries. To validate our method we perform a controlled user
study comparing generated preference judgments to explicit relevance judgments.
We also implemented a real-world search engine to test our approach, using a
modified ranking SVM to learn an improved ranking function from preference
data. Our results demonstrate significant improvements in the ranking given by
the search engine. The learned rankings outperform both a static ranking
function and one trained without considering query chains.
Comment: 10 pages
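The abstract says that preference judgments are fed to a modified ranking SVM. As a rough illustration of how a ranker can be learned from preference pairs in general, the sketch below trains a linear scoring function with a pairwise hinge loss and plain stochastic gradient descent. It is a generic pairwise approach under stated assumptions, not the authors' modified ranking SVM; the function names, feature vectors, and hyperparameters are hypothetical.

```python
def train_pairwise_ranker(preferences, n_features, epochs=50, lr=0.1, reg=0.01):
    """Learn weights w so that score(better) exceeds score(worse) by a margin.

    preferences: list of (better, worse) feature-vector pairs.
    """
    w = [0.0] * n_features
    for _ in range(epochs):
        for better, worse in preferences:
            # A preference "better > worse" becomes a constraint on the difference vector.
            diff = [b - c for b, c in zip(better, worse)]
            margin = sum(wi * di for wi, di in zip(w, diff))
            # L2 regularisation step (shrinks the weights slightly).
            w = [wi * (1.0 - lr * reg) for wi in w]
            # Hinge-loss step: update only when the margin is violated.
            if margin < 1.0:
                w = [wi + lr * di for wi, di in zip(w, diff)]
    return w


def score(w, x):
    """Linear ranking score; higher means ranked earlier."""
    return sum(wi * xi for wi, xi in zip(w, x))
```

Given preference pairs extracted from logs, documents can then be ordered by `score`. How the pairs are derived from query chains is specific to the paper and is not reproduced here.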
Interactive retrieval of video using pre-computed shot-shot similarities
A probabilistic framework for content-based interactive video retrieval is described. The developed indexing of video fragments originates from the probability of the user's positive judgment about key-frames of video shots. Initial estimates of the probabilities are obtained from a low-level feature representation. Only statistically significant estimates are kept; the rest are replaced by an appropriate constant, which allows efficient access at search time without loss of search quality and leads to improvement in most experiments. Over time, these probability estimates are updated from the relevance judgments of users performing searches, resulting in further substantial increases in mean average precision.
End-to-end Learning for Short Text Expansion
Effectively making sense of short texts is a critical task for many real
world applications such as search engines, social media services, and
recommender systems. The task is particularly challenging as a short text
contains very sparse information, often too sparse for a machine learning
algorithm to pick up useful signals. A common practice for analyzing short text
is to first expand it with external information, which is usually harvested
from a large collection of longer texts. In the literature, short text expansion
has been done with all kinds of heuristics. We propose an end-to-end solution
that automatically learns how to expand short text to optimize a given learning
task. A novel deep memory network is proposed to automatically find relevant
information from a collection of longer documents and reformulate the short
text through a gating mechanism. Using short text classification as a
demonstrating task, we show that the deep memory network significantly
outperforms classical text expansion methods with comprehensive experiments on
real-world data sets.
Comment: KDD'201
Machine Learning in Automated Text Categorization
The automated categorization (or classification) of texts into predefined
categories has witnessed a booming interest in the last ten years, due to the
increased availability of documents in digital form and the ensuing need to
organize them. In the research community the dominant approach to this problem
is based on machine learning techniques: a general inductive process
automatically builds a classifier by learning, from a set of preclassified
documents, the characteristics of the categories. The advantages of this
approach over the knowledge engineering approach (consisting in the manual
definition of a classifier by domain experts) are very good effectiveness,
considerable savings in terms of expert manpower, and straightforward
portability to different domains. This survey discusses the main approaches to
text categorization that fall within the machine learning paradigm. We will
discuss in detail issues pertaining to three different problems, namely
document representation, classifier construction, and classifier evaluation.
Comment: Accepted for publication in ACM Computing Surveys