109,778 research outputs found
A 2D based Partition Strategy for Solving Ranking under Team Context (RTP)
In this paper, we propose a 2D based partition method for solving the problem
of Ranking under Team Context(RTC) on datasets without a priori. We first map
the data into 2D space using its minimum and maximum value among all
dimensions. Then we construct window queries with consideration of current team
context. Besides, during the query mapping procedure, we can pre-prune some
tuples which are not top ranked ones. This pre-classified step will defer
processing those tuples and can save cost while providing solutions for the
problem. Experiments show that our algorithm performs well especially on large
datasets with correctness
Entity Linking for Queries by Searching Wikipedia Sentences
We present a simple yet effective approach for linking entities in queries.
The key idea is to search sentences similar to a query from Wikipedia articles
and directly use the human-annotated entities in the similar sentences as
candidate entities for the query. Then, we employ a rich set of features, such
as link-probability, context-matching, word embeddings, and relatedness among
candidate entities as well as their related entities, to rank the candidates
under a regression based framework. The advantages of our approach lie in two
aspects, which contribute to the ranking process and final linking result.
First, it can greatly reduce the number of candidate entities by filtering out
irrelevant entities with the words in the query. Second, we can obtain the
query sensitive prior probability in addition to the static link-probability
derived from all Wikipedia articles. We conduct experiments on two benchmark
datasets on entity linking for queries, namely the ERD14 dataset and the GERDAQ
dataset. Experimental results show that our method outperforms state-of-the-art
systems and yields 75.0% in F1 on the ERD14 dataset and 56.9% on the GERDAQ
dataset
Incremental Discovery of Prominent Situational Facts
We study the novel problem of finding new, prominent situational facts, which
are emerging statements about objects that stand out within certain contexts.
Many such facts are newsworthy---e.g., an athlete's outstanding performance in
a game, or a viral video's impressive popularity. Effective and efficient
identification of these facts assists journalists in reporting, one of the main
goals of computational journalism. Technically, we consider an ever-growing
table of objects with dimension and measure attributes. A situational fact is a
"contextual" skyline tuple that stands out against historical tuples in a
context, specified by a conjunctive constraint involving dimension attributes,
when a set of measure attributes are compared. New tuples are constantly added
to the table, reflecting events happening in the real world. Our goal is to
discover constraint-measure pairs that qualify a new tuple as a contextual
skyline tuple, and discover them quickly before the event becomes yesterday's
news. A brute-force approach requires exhaustive comparison with every tuple,
under every constraint, and in every measure subspace. We design algorithms in
response to these challenges using three corresponding ideas---tuple reduction,
constraint pruning, and sharing computation across measure subspaces. We also
adopt a simple prominence measure to rank the discovered facts when they are
numerous. Experiments over two real datasets validate the effectiveness and
efficiency of our techniques
Models for Paired Comparison Data: A Review with Emphasis on Dependent Data
Thurstonian and Bradley-Terry models are the most commonly applied models in
the analysis of paired comparison data. Since their introduction, numerous
developments have been proposed in different areas. This paper provides an
updated overview of these extensions, including how to account for object- and
subject-specific covariates and how to deal with ordinal paired comparison
data. Special emphasis is given to models for dependent comparisons. Although
these models are more realistic, their use is complicated by numerical
difficulties. We therefore concentrate on implementation issues. In particular,
a pairwise likelihood approach is explored for models for dependent paired
comparison data, and a simulation study is carried out to compare the
performance of maximum pairwise likelihood with other limited information
estimation methods. The methodology is illustrated throughout using a real data
set about university paired comparisons performed by students.Comment: Published in at http://dx.doi.org/10.1214/12-STS396 the Statistical
Science (http://www.imstat.org/sts/) by the Institute of Mathematical
Statistics (http://www.imstat.org
Off-line vs. On-line Evaluation of Recommender Systems in Small E-commerce
In this paper, we present our work towards comparing on-line and off-line
evaluation metrics in the context of small e-commerce recommender systems.
Recommending on small e-commerce enterprises is rather challenging due to the
lower volume of interactions and low user loyalty, rarely extending beyond a
single session. On the other hand, we usually have to deal with lower volumes
of objects, which are easier to discover by users through various
browsing/searching GUIs.
The main goal of this paper is to determine applicability of off-line
evaluation metrics in learning true usability of recommender systems (evaluated
on-line in A/B testing). In total 800 variants of recommending algorithms were
evaluated off-line w.r.t. 18 metrics covering rating-based, ranking-based,
novelty and diversity evaluation. The off-line results were afterwards compared
with on-line evaluation of 12 selected recommender variants and based on the
results, we tried to learn and utilize an off-line to on-line results
prediction model.
Off-line results shown a great variance in performance w.r.t. different
metrics with the Pareto front covering 68\% of the approaches. Furthermore, we
observed that on-line results are considerably affected by the novelty of
users. On-line metrics correlates positively with ranking-based metrics (AUC,
MRR, nDCG) for novice users, while too high values of diversity and novelty had
a negative impact on the on-line results for them. For users with more visited
items, however, the diversity became more important, while ranking-based
metrics relevance gradually decrease.Comment: Submitted to ACM Hypertext 2020 Conferenc
Utilising semantic technologies for intelligent indexing and retrieval of digital images
The proliferation of digital media has led to a huge interest in classifying and indexing media objects for generic search and usage. In particular, we are witnessing colossal growth in digital image repositories that are difficult to navigate using free-text search mechanisms, which often return inaccurate matches as they in principle rely on statistical analysis of query keyword recurrence in the image annotation or surrounding text. In this paper we present a semantically-enabled image annotation and retrieval engine that is designed to satisfy the requirements of the commercial image collections market in terms of both accuracy and efficiency of the retrieval process. Our search engine relies on methodically structured ontologies for image annotation, thus allowing for more intelligent reasoning about the image content and subsequently obtaining a more accurate set of results and a richer set of alternatives matchmaking the original query. We also show how our well-analysed and designed domain ontology contributes to the implicit expansion of user queries as well as the exploitation of lexical databases for explicit semantic-based query expansion
On Minimum Violations Ranking in Paired Comparisons
Ranking a set of objects from the most dominant one to the least, based on
the results of paired comparisons, proves to be useful in many contexts. Using
the rankings of teams or individuals players in sports to seed tournaments is
an example. The quality of a ranking is often evaluated by the number of
violations, cases in which an object is ranked lower than another that it has
dominated in a comparison, that it contains. A minimum violations ranking (MVR)
method, as its name suggests, searches specifically for rankings that have the
minimum possible number of violations which may or may not be zero. In this
paper, we present a method based on statistical physics that overcomes
conceptual and practical difficulties faced by earlier studies of the problem.Comment: 10 pages, 10 figures; typos corrected (v2
- …