4 research outputs found

Proximity Full-Text Search with a Response Time Guarantee by Means of Additional Indexes

Author: AB Veretennikov
G Zipf
HE Williams
Justin Zobel
Matthew Chang
S Gugnani
Sergey Brin
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2018

Full-text search engines are important tools for information retrieval. Term proximity is an important factor in relevance score measurement. In a proximity full-text search, we assume that a relevant document contains query terms near each other, especially if the query terms are frequently occurring words. A methodology for high-performance full-text query execution is discussed. We build additional indexes to achieve better efficiency. For a word that occurs in the text, we include in the indexes some information about nearby words. What types of additional indexes do we use? How do we use them? These questions are discussed in this work. We present the results of experiments showing that the average time of search query execution is 44-45 times less than that required when using ordinary inverted indexes.

This is a pre-print of a contribution "Veretennikov A.B. Proximity Full-Text Search with a Response Time Guarantee by Means of Additional Indexes" published in "Arai K., Kapoor S., Bhatia R. (eds) Intelligent Systems and Applications. IntelliSys 2018. Advances in Intelligent Systems and Computing, vol 868" published by Springer, Cham. The final authenticated version is available online at: https://doi.org/10.1007/978-3-030-01054-6_66. The work was supported by Act 211 Government of the Russian Federation, contract no 02.A03.21.0006.

Comment: Alexander B. Veretennikov. Chair of Calculation Mathematics and Computer Science, INSM. Ural Federal Universit
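
For orientation, the baseline that the abstract compares against (a proximity query answered from an ordinary inverted index) can be sketched as follows. This is a minimal illustration, not the author's implementation: the function names, the whitespace tokenizer, and the two-word query shape are all assumptions made for the example.

```python
from collections import defaultdict


def build_positional_index(docs):
    """Ordinary inverted index: word -> {doc_id: [positions of the word]}."""
    index = defaultdict(lambda: defaultdict(list))
    for doc_id, text in enumerate(docs):
        # Assumption: naive lowercase whitespace tokenization.
        for pos, word in enumerate(text.lower().split()):
            index[word][doc_id].append(pos)
    return index


def proximity_search(index, w1, w2, max_distance):
    """Return sorted ids of documents where w1 and w2 occur
    at most max_distance positions apart."""
    postings1 = index.get(w1, {})
    postings2 = index.get(w2, {})
    hits = []
    # Only documents containing both words can match.
    for doc_id in sorted(postings1.keys() & postings2.keys()):
        # Every pair of positions is compared; for frequently occurring
        # words these lists are long, which is the cost the additional
        # indexes are meant to avoid.
        if any(abs(p1 - p2) <= max_distance
               for p1 in postings1[doc_id]
               for p2 in postings2[doc_id]):
            hits.append(doc_id)
    return hits
```

With this layout, a query made of frequently occurring words has to read two long posting lists in full, which is why its running time grows with word frequency.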

arXiv.org e-Print Archive

Crossref

Institutional repository of Ural Federal University named after the first President of Russia B.N.Yeltsin

Proximity Full-Text Search by Means of Additional Indexes with Multi-component Keys: In Pursuit of Optimal Performance

Author: AB Veretennikov
BJ Jansen
G Zipf
HE Williams
Justin Zobel
JWJ Williams
R Schenkel
RWP Luk
Yves Rasolofo
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2019

Full-text search engines are important tools for information retrieval. In a proximity full-text search, a document is relevant if it contains query terms near each other, especially if the query terms are frequently occurring words. For each word in a text, we use additional indexes to store information about nearby words that are at distances from the given word of less than or equal to the MaxDistance parameter. We showed that additional indexes with three-component keys can be used to improve the average query execution time by up to 94.7 times if the queries consist of high-frequency occurring words. In this paper, we present a new search algorithm with even more performance gains. We consider several strategies for selecting multi-component key indexes for a specific query and compare these strategies with the optimal strategy. We also present the results of search experiments, which show that three-component key indexes enable much faster searches in comparison with two-component key indexes.

This is a pre-print of a contribution "Veretennikov A.B. (2019) Proximity Full-Text Search by Means of Additional Indexes with Multi-component Keys: In Pursuit of Optimal Performance." published in "Manolopoulos Y., Stupnikov S. (eds) Data Analytics and Management in Data Intensive Domains. DAMDID/RCDL 2018. Communications in Computer and Information Science, vol 1003" published by Springer, Cham. This book constitutes the refereed proceedings of the 20th International Conference on Data Analytics and Management in Data Intensive Domains, DAMDID/RCDL 2018, held in Moscow, Russia, in October 2018. The 9 revised full papers presented together with three invited papers were carefully reviewed and selected from 54 submissions. The final authenticated version is available online at https://doi.org/10.1007/978-3-030-23584-0_7.

Comment: Revised paper of "Veretennikov A.B. Proximity full-text search with a response time guarantee by means of additional indexes with multi-component keys", Selected Papers of the XX International Conference on Data Analytics and Management in Data Intensive Domains (DAMDID/RCDL 2018), Moscow, Russia, October 9-12, 2018, http://ceur-ws.org/Vol-2277, http://ceur-ws.org/Vol-2277/paper23.pd
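
The idea of an additional index keyed by several words can be illustrated with a small sketch. It uses two-component keys for brevity, whereas the abstract reports results for three-component keys, and it should not be read as the paper's data structure: the function names, the posting layout, the tokenizer, and the `MAX_DISTANCE` value are assumptions chosen for the example.

```python
from collections import defaultdict

MAX_DISTANCE = 5  # hypothetical value; plays the role of the MaxDistance parameter


def build_pair_index(docs, max_distance=MAX_DISTANCE):
    """Additional index with two-component keys.

    Maps a lexicographically ordered word pair (a, b) to a list of
    (doc_id, pos_a, pos_b) postings, recorded only when the two
    occurrences are at most max_distance positions apart.
    """
    index = defaultdict(list)
    for doc_id, text in enumerate(docs):
        words = text.lower().split()  # assumption: naive tokenization
        for i, w in enumerate(words):
            # Look only at the words that follow within the window.
            for j in range(i + 1, min(len(words), i + max_distance + 1)):
                v = words[j]
                if w <= v:
                    index[(w, v)].append((doc_id, i, j))
                else:
                    index[(v, w)].append((doc_id, j, i))
    return index


def lookup(index, w1, w2):
    """Return sorted ids of documents in which w1 and w2 were indexed together."""
    a, b = sorted((w1, w2))
    return sorted({doc_id for doc_id, _, _ in index.get((a, b), [])})
```

A two-word proximity query then becomes a single key lookup whose posting list already contains only nearby occurrences, at the price of a larger index. In this sketch the distance bound is fixed when the index is built, so queries cannot ask for a wider window than `max_distance`.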

arXiv.org e-Print Archive

Crossref

Institutional repository of Ural Federal University named after the first President of Russia B.N.Yeltsin

Proximity Full-Text Searches of Frequently Occurring Words with a Response Time Guarantee

Author: AB Veretennikov
BJ Jansen
G Zipf
HE Williams
Justin Zobel
JWJ Williams
RB Miller
Yves Rasolofo
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2020

arXiv.org e-Print Archive

Crossref

Institutional repository of Ural Federal University named after the first President of Russia B.N.Yeltsin