Enhancing Knowledge Bases with Quantity Facts

Author: Ho V.
Milchevski D.
Stepanova D.
Strötgen J.
Weikum G.
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 01/01/2022

The Sweet Spot between Inverted Indices and Metric-Space Indexing for Top-K–List Similarity Search

Author: Anand Avishek
Michel Sebastian
Milchevski Evica
Publication venue: Konstanz : OpenProceedings
Publication date: 01/01/2015

We consider the problem of processing similarity queries over a set of top-k rankings where the query ranking and the similarity threshold are provided at query time. Spearman’s Footrule distance is used to compute the similarity between rankings, considering how well rankings agree on the positions (ranks) of ranked items (i.e., the L1 distance). This setup allows the application of metric index structures such as M- or BK-trees and, alternatively, enables the use of traditional inverted indices for retrieving rankings that overlap (in items) with the query. Although both techniques are reasonable, each has drawbacks for our specific problem. In this paper, we propose a hybrid indexing strategy that blends inverted indices and metric-space indexing, resulting in a structure that resembles both indexing methods with tunable emphasis on one or the other. To find the sweet spot, we derive, through theoretical analysis, a cost model that makes few assumptions yet is highly accurate (empirically validated). We further present optimizations to the inverted index component for early termination and for minimizing bookkeeping. The performance of the proposed algorithms, hybrid variants, and competitors is studied in a comprehensive evaluation using real-world benchmark data consisting of Web-search–result rankings and entity rankings based on Wikipedia.
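
The filter-and-verify idea in the abstract above can be illustrated with a minimal sketch. It is not the paper's hybrid index: it only shows Spearman's Footrule distance between two top-k lists and a plain inverted index used to collect candidates that share at least one item with the query. The function names, the 0-based ranks, and the convention of assigning rank k to items missing from a list are assumptions made for this illustration.

```python
from collections import defaultdict


def footrule(a, b, missing_rank=None):
    """Spearman's Footrule (L1) distance between two top-k lists.

    Assumption: ranks are 0-based and an item absent from a list is
    treated as if it sat at rank k (the list length).
    """
    if missing_rank is None:
        missing_rank = len(a)
    pos_a = {item: rank for rank, item in enumerate(a)}
    pos_b = {item: rank for rank, item in enumerate(b)}
    return sum(
        abs(pos_a.get(item, missing_rank) - pos_b.get(item, missing_rank))
        for item in set(a) | set(b)
    )


def build_inverted_index(rankings):
    """Map each item to the set of ranking ids that contain it."""
    index = defaultdict(set)
    for rid, ranking in rankings.items():
        for item in ranking:
            index[item].add(rid)
    return index


def range_query(query, threshold, rankings, index):
    """Return ids of rankings within `threshold` of `query`.

    Candidates are rankings sharing at least one item with the query;
    each candidate is then verified with the exact distance.
    """
    candidates = set()
    for item in query:
        candidates |= index.get(item, set())
    return sorted(
        rid for rid in candidates
        if footrule(query, rankings[rid]) <= threshold
    )


# Hypothetical toy data for illustration only.
rankings = {
    "r1": ["a", "b", "c"],
    "r2": ["b", "a", "c"],
    "r3": ["x", "y", "z"],
}
index = build_inverted_index(rankings)
result = range_query(["a", "b", "c"], 2, rankings, index)
```

Under the missing-rank convention assumed here, two lists of length k with no common item are at distance k(k+1), so the overlap filter misses nothing only while the threshold stays below that value.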

Institutionelles Repositorium der Leibniz Universität Hannover

Towards the Bosch materials science knowledge base

Author: Adel Heike
Friedrich Annemarie
Hildebrand Felix
Kharlamov Evgeny
Marusczyk Anika
Milchevski Dragan
Stepanova Daria
Strötgen Jannik
Tomazic Federico
Tran Trung-Kien
Publication date: 07/07/2023

OPUS Augsburg

Entity Recommendation Based on Wikipedia

Author: Milchevski D.
Publication venue: Universität des Saarlandes
Publication date: 01/01/2013

MPG.PuRe

Similarity Search Algorithms over Top-k Rankings and Class-Constrained Objects

Author: Milchevski Evica
Publication date: 01/01/2019

In this thesis, we consider the problem of processing similarity queries over a dataset of top-k rankings and class-constrained objects. Top-k rankings are the most natural and widely used technique to compress a large amount of information into a concise form. Spearman’s Footrule distance is used to compute the similarity between rankings, considering how well rankings agree on the positions (ranks) of ranked items. This setup allows the application of metric distance-based pruning strategies and, alternatively, enables the use of traditional inverted indices for retrieving rankings that overlap in items. Although both techniques can be applied individually, we hypothesize that blending the two leads to better performance. First, we formulate theoretical bounds over the rankings, based on Spearman's Footrule distance, which are essential for adapting existing inverted-index-based techniques to the setting of top-k rankings. Further, we propose a hybrid indexing strategy, designed for efficiently processing similarity range queries, which incorporates inverted indices and metric space indices, such as M- or BK-trees, resulting in a structure that resembles both indexing methods with tunable emphasis on one or the other. Moreover, optimizations to the inverted index component are presented for early termination and for minimizing bookkeeping. As vast amounts of data are generated on a daily basis, we further present a distributed, highly tunable approach, implemented in Apache Spark, for efficiently processing similarity join queries over top-k rankings. To combine distance-based filtering with inverted indices, the algorithm works in several phases, and the partial results are joined to compute the final result set. As the last contribution of the thesis, we consider processing k-nearest-neighbor (k-NN) queries over class-constrained objects, with the additional requirement that the result objects are of a specific type. We introduce the MISP index, which first indexes the objects by their class membership (or combination of classes), followed by a similarity search sub-index for each subset of objects. Because the number of such subsets can explode combinatorially, we provide a cost model that analyzes the performance of the MISP index structure under different configurations, with the aim of finding the most efficient one for the dataset being searched.
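
The two-level layout described for class-constrained k-NN search (partition by class combination first, then search within the matching partitions) can be sketched as follows. This is not the MISP index itself: the partitioning by exact class combination, the linear scan standing in for a per-partition similarity sub-index, the L1 distance, and all names are assumptions made purely for illustration.

```python
import heapq
from collections import defaultdict


def build_class_partitions(objects):
    """Group objects by their exact combination of classes.

    `objects` maps an id to (set of class labels, feature vector).
    """
    partitions = defaultdict(list)
    for oid, (classes, vector) in objects.items():
        partitions[frozenset(classes)].append((oid, vector))
    return partitions


def l1_distance(u, v):
    """L1 distance between two equal-length vectors."""
    return sum(abs(x - y) for x, y in zip(u, v))


def knn(query_vector, required_classes, k, partitions):
    """Return the ids of the k nearest objects that carry every class
    in `required_classes`.

    Only partitions whose class combination covers the required classes
    are scanned; a real sub-index would replace this linear scan.
    """
    required = frozenset(required_classes)
    candidates = (
        (l1_distance(query_vector, vector), oid)
        for combo, members in partitions.items()
        if required <= combo
        for oid, vector in members
    )
    return [oid for _, oid in heapq.nsmallest(k, candidates)]


# Hypothetical toy data for illustration only.
objects = {
    "o1": ({"person", "artist"}, (0.0, 0.0)),
    "o2": ({"person"}, (1.0, 0.0)),
    "o3": ({"place"}, (0.1, 0.0)),
    "o4": ({"person", "artist"}, (3.0, 0.0)),
}
partitions = build_class_partitions(objects)
nearest_persons = knn((0.0, 0.0), {"person"}, 2, partitions)
```

The sketch also hints at why a cost model matters: with many class labels, the number of distinct class combinations, and hence partitions, can grow very quickly.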

KLUEDO Kaiserslauterer uniweiter elektronischer Dokumentenserver

X-REC: Cross-category Entity Recommendation

Author: Berberich K.
Milchevski D.
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 01/01/2014

Crossref

MPG.PuRe

Linking Wikipedia Events to Past News

Author: Berberich K.
Milchevski D.
Mishra A.
Publication date: 01/01/2014

We consider the task of linking Wikipedia events to relevant news articles from the past. Descriptions of events are abundant in Wikipedia and systematically curated in year, decade, and century articles. To address this task, we develop a two-stage cascade approach that builds a query model from temporal expressions in a set of initially retrieved documents. As baselines we consider several methods that integrate publication dates and/or temporal expressions into a language modeling approach. Our experimental evaluation on 50 randomly sampled Wikipedia events with crowd-sourced relevance assessments shows that the two-stage cascade approach outperforms the baselines. Our experimental testbed of queries and relevance assessments is made publicly available.

CiteSeerX

MPG.PuRe

STICS: searching with strings, things, and cats

Author: Hoffart Johannes
Milchevski Dragan
Weikum Gerhard
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 01/01/2014

CISPA – Helmholtz-Zentrum für Informationssicherheit

Crossref

MPG.PuRe