Query recovery of short user queries: on query expansion with stopwords
User queries to search engines are observed to predominantly contain inflected content words but lack stopwords and capitalization. Thus, they often resemble natural language queries after case folding and stopword removal. Query recovery aims to generate a linguistically well-formed query from a given user query, as input for natural language processing tasks and cross-language information retrieval (CLIR). The evaluation of query translation shows that translation scores (NIST and BLEU) decrease after case folding, stopword removal, and stemming. A baseline method for query recovery reconstructs capitalization and stopwords, which considerably increases translation scores and significantly increases mean average precision for a standard CLIR task.
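The preprocessing described in this abstract (case folding followed by stopword removal) can be sketched in a few lines. This is a minimal illustration, not the authors' pipeline: the function name `simulate_user_query`, the tokenization rule, and the tiny stopword list are all assumptions made for the example, and stemming is left out.

```python
import re

# Illustrative stopword list (an assumption for this sketch; real systems
# use larger, language-specific lists).
STOPWORDS = {
    "a", "an", "the", "of", "in", "on", "to", "is", "are", "was",
    "what", "who", "which", "did", "do", "does", "not",
}


def simulate_user_query(question: str) -> str:
    """Turn a well-formed question into a keyword-style query.

    Step 1: case folding (lowercase everything).
    Step 2: tokenization on runs of ASCII letters and digits (hypothetical rule).
    Step 3: stopword removal.
    """
    tokens = re.findall(r"[a-z0-9]+", question.lower())
    return " ".join(token for token in tokens if token not in STOPWORDS)


if __name__ == "__main__":
    print(simulate_user_query("What is the capital of France?"))
```

Query recovery, as the abstract describes it, is the inverse direction: starting from the keyword-style output and reconstructing capitalization and stopwords.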
Strategic polymorphism requires just two combinators!
In previous work, we introduced the notion of functional strategies:
first-class generic functions that can traverse terms of any type while mixing
uniform and type-specific behaviour. Functional strategies transpose the notion
of term rewriting strategies (with coverage of traversal) to the functional
programming paradigm. Meanwhile, a number of Haskell-based models and
combinator suites were proposed to support generic programming with functional
strategies.
In the present paper, we provide a compact and matured reconstruction of
functional strategies. We capture strategic polymorphism by just two primitive
combinators. This is done without commitment to a specific functional language.
We analyse the design space for implementational models of functional
strategies. For completeness, we also provide an operational reference model
for implementing functional strategies (in Haskell). We demonstrate the
generality of our approach by reconstructing representative fragments of the
Strafunski library for functional strategies.
Comment: A preliminary version of this paper was presented at IFL 2002, and
included in the informal preproceedings of the workshop
Affective Music Information Retrieval
Much of the appeal of music lies in its power to convey emotions/moods and to
evoke them in listeners. In consequence, the past decade witnessed a growing
interest in modeling emotions from musical signals in the music information
retrieval (MIR) community. In this article, we present a novel generative
approach to music emotion modeling, with a specific focus on the
valence-arousal (VA) dimension model of emotion. The presented generative
model, called acoustic emotion Gaussians (AEG), better accounts for the
subjectivity of emotion perception by the use of probability distributions.
Specifically, it learns from the emotion annotations of multiple subjects a
Gaussian mixture model in the VA space with prior constraints on the
corresponding acoustic features of the training music pieces. Such a
computational framework is technically sound, capable of learning in an online
fashion, and thus applicable to a variety of applications, including
user-independent (general) and user-dependent (personalized) emotion
recognition and emotion-based music retrieval. We report evaluations of the
aforementioned applications of AEG on a larger-scale emotion-annotated corpus,
AMG1608, to demonstrate the effectiveness of AEG and to showcase how
evaluations are conducted for research on emotion-based MIR. Directions of
future work are also discussed.
Comment: 40 pages, 18 figures, 5 tables, author version
A Comparative analysis: QA evaluation questions versus real-world queries
This paper presents a comparative analysis of user queries to a web search engine, questions to a Q&A service (answers.com), and questions employed in question answering (QA) evaluations at TREC and CLEF. The analysis shows that user queries to search engines contain mostly content words (i.e. keywords) but lack structure words (i.e. stopwords) and capitalization. Thus, they resemble natural language input after case folding and stopword removal. In contrast, topics for QA evaluation and questions to answers.com mainly
consist of fully capitalized and syntactically well-formed questions. Classification experiments using a naïve Bayes classifier show that stopwords play an important role in determining the expected answer type. A classification based on stopwords is considerably more accurate (47.5% accuracy) than a classification based on all query words (40.1% accuracy) or on content words (33.9% accuracy). To
simulate user input, questions are preprocessed by case folding and stopword removal. Additional classification experiments aim at reconstructing the syntactic wh-word frame of a question, i.e. the embedding of the interrogative word. Results indicate that this part of
questions can be reconstructed with moderate accuracy (25.7%), but for a classification problem with a much larger number of classes compared to classifying queries by expected answer type (2096 classes vs. 130 classes). Furthermore, eliminating stopwords can lead to multiple reconstructed questions with a different or with the opposite meaning (e.g. if negations or temporal restrictions are included). In conclusion, question reconstruction from short user queries can be seen as a new realistic evaluation challenge for QA systems
Abduction in Well-Founded Semantics and Generalized Stable Models
Abductive logic programming offers a formalism to declaratively express and
solve problems in areas such as diagnosis, planning, belief revision and
hypothetical reasoning. Tabled logic programming offers a computational
mechanism that provides a level of declarativity superior to that of Prolog,
and which has supported successful applications in fields such as parsing,
program analysis, and model checking. In this paper we show how to use tabled
logic programming to evaluate queries to abductive frameworks with integrity
constraints when these frameworks contain both default and explicit negation.
The result is the ability to compute abduction over well-founded semantics with
explicit negation and answer sets. Our approach consists of a transformation
and an evaluation method. The transformation adjoins to each objective literal
in a program a dual objective literal, along with rules that ensure that the
dual literal will be true if and only if the original literal is false. We call
the resulting program a dual program. The evaluation method, \wfsmeth, then operates on
the dual program. \wfsmeth{} is sound and complete for evaluating queries to
abductive frameworks whose entailment method is based on either the
well-founded semantics with explicit negation, or on answer sets. Further,
\wfsmeth{} is asymptotically as efficient as any known method for either class
of problems. In addition, when abduction is not desired, \wfsmeth{} operating
on a dual program provides a novel tabling method for evaluating queries to
ground extended programs whose complexity and termination properties are
similar to those of the best tabling methods for the well-founded semantics. A
publicly available meta-interpreter has been developed for \wfsmeth{} using the
XSB system.
Comment: 48 pages; To appear in Theory and Practice in Logic Programming
Medical Image Classification via SVM using LBP Features from Saliency-Based Folded Data
Good results on image classification and retrieval using support vector
machines (SVM) with local binary patterns (LBPs) as features have been
extensively reported in the literature where an entire image is retrieved or
classified. In contrast, in medical imaging, not all parts of the image may be
equally significant or relevant to the image retrieval application at hand. For
instance, in a lung x-ray image, the lung region may contain a tumour and hence
be highly significant, whereas the surrounding area does not contain
significant information from a medical diagnosis perspective. In this paper, we
propose to detect salient regions of images during training and fold the data
to reduce the effect of irrelevant regions. As a result, smaller image areas
are used for LBP feature calculation and, consequently, classification by
SVM. We use the IRMA 2009 dataset with 14,410 x-ray images to verify the
performance of the proposed approach. The results demonstrate the benefits of
the saliency-based folding approach, which delivers classification accuracies
comparable with the state of the art but exhibits lower computational cost and
storage requirements, factors highly important for big data analytics.
Comment: To appear in proceedings of The 14th International Conference on
Machine Learning and Applications (IEEE ICMLA 2015), Miami, Florida, USA,
2015
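As background for the LBP features mentioned in this abstract, the basic 3x3 local binary pattern operator can be sketched as follows. This is a generic textbook form, not the paper's implementation: the neighbour ordering (clockwise from the top-left), the `>=` comparison, and the function names `lbp_code` and `lbp_histogram` are assumptions chosen for the example, and the saliency detection and folding steps are not shown because the abstract does not specify them.

```python
def lbp_code(patch):
    """Return the 8-bit LBP code of the centre pixel of a 3x3 patch.

    Each neighbour contributes one bit: 1 if it is >= the centre value.
    Neighbours are visited clockwise from the top-left (an assumed convention).
    """
    center = patch[1][1]
    offsets = [(0, 0), (0, 1), (0, 2), (1, 2), (2, 2), (2, 1), (2, 0), (1, 0)]
    code = 0
    for bit, (row, col) in enumerate(offsets):
        if patch[row][col] >= center:
            code |= 1 << bit
    return code


def lbp_histogram(image):
    """Build a 256-bin histogram of LBP codes over all interior pixels.

    `image` is a list of equal-length rows of grey values; border pixels
    are skipped because they lack a full 3x3 neighbourhood.
    """
    histogram = [0] * 256
    for r in range(1, len(image) - 1):
        for c in range(1, len(image[0]) - 1):
            patch = [row[c - 1:c + 2] for row in image[r - 1:r + 2]]
            histogram[lbp_code(patch)] += 1
    return histogram


if __name__ == "__main__":
    sample = [[1, 2, 3], [8, 5, 4], [7, 6, 9]]
    print(lbp_code(sample))
```

A histogram of this kind is the sort of feature vector that would be passed to an SVM; restricting it to a smaller image area reduces the number of pixels visited.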