Proximity Full-Text Search with a Response Time Guarantee by Means of Additional Indexes
Full-text search engines are important tools for information retrieval. Term
proximity is an important factor in relevance score measurement. In a proximity
full-text search, we assume that a relevant document contains query terms near
each other, especially if the query terms are frequently occurring words. A
methodology for high-performance full-text query execution is discussed. We
build additional indexes to achieve better efficiency. For a word that occurs
in the text, we include in the indexes some information about nearby words.
What types of additional indexes do we use? How do we use them? These questions
are discussed in this work. We present the results of experiments showing that
the average time of search query execution is 44-45 times less than that
required when using ordinary inverted indexes.
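
The contrast between an ordinary inverted index and an additional index that records nearby words can be illustrated with a minimal sketch. It is not the author's implementation: the function names, the in-memory dictionaries, and the (word, nearby word) key layout are assumptions chosen for illustration only.

```python
from collections import defaultdict


def build_inverted_index(tokens):
    """Ordinary inverted index: word -> list of positions."""
    index = defaultdict(list)
    for pos, word in enumerate(tokens):
        index[word].append(pos)
    return index


def build_pair_index(tokens, max_distance):
    """Hypothetical additional index: (word, nearby word) -> list of
    (position of word, position of nearby word), limited to pairs
    that lie within max_distance of each other."""
    index = defaultdict(list)
    n = len(tokens)
    for i, word in enumerate(tokens):
        lo = max(0, i - max_distance)
        hi = min(n, i + max_distance + 1)
        for j in range(lo, hi):
            if j != i:
                index[(word, tokens[j])].append((i, j))
    return index


def search_with_inverted_index(inverted, a, b, max_distance):
    """Proximity search that scans both posting lists in full."""
    return sorted(
        (i, j)
        for i in inverted.get(a, [])
        for j in inverted.get(b, [])
        if i != j and abs(i - j) <= max_distance
    )


def search_with_pair_index(pair_index, a, b):
    """Proximity search that reads one precomputed posting list."""
    return pair_index.get((a, b), [])


tokens = "to be or not to be that is the question".split()
inverted = build_inverted_index(tokens)
pairs = build_pair_index(tokens, max_distance=2)
print(search_with_pair_index(pairs, "to", "be"))
```

In this sketch the pair index answers the query with a single lookup, whereas the inverted index has to combine two posting lists that grow long for frequently occurring words. The price is a larger index, since every word is stored once per neighbour.
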
This is a pre-print of a contribution "Veretennikov A.B. Proximity Full-Text
Search with a Response Time Guarantee by Means of Additional Indexes" published
in "Arai K., Kapoor S., Bhatia R. (eds) Intelligent Systems and Applications.
IntelliSys 2018. Advances in Intelligent Systems and Computing, vol 868"
published by Springer, Cham. The final authenticated version is available
online at: https://doi.org/10.1007/978-3-030-01054-6_66. The work was supported
by Act 211 Government of the Russian Federation, contract no 02.A03.21.0006.
Comment: Alexander B. Veretennikov. Chair of Calculation Mathematics and
Computer Science, INSM. Ural Federal Universit
Proximity Full-Text Search by Means of Additional Indexes with Multi-component Keys: In Pursuit of Optimal Performance
Full-text search engines are important tools for information retrieval. In a
proximity full-text search, a document is relevant if it contains query terms
near each other, especially if the query terms are frequently occurring words.
For each word in a text, we use additional indexes to store information about
nearby words that are at distances from the given word of less than or equal to
the MaxDistance parameter. We showed that additional indexes with
three-component keys can be used to improve the average query execution time by
up to 94.7 times if the queries consist of high-frequency words. In
this paper, we present a new search algorithm with even more performance gains.
We consider several strategies for selecting multi-component key indexes for a
specific query and compare these strategies with the optimal strategy. We also
present the results of search experiments, which show that three-component key
indexes enable much faster searches in comparison with two-component key
indexes.
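
A three-component key index of the kind described above can be sketched as follows. This is an illustration under stated assumptions, not the algorithm from the paper: the key layout (f, s, t), the alphabetical ordering of s and t, the posting format, and the function names are all hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def build_triple_index(tokens, max_distance):
    """Hypothetical three-component key index.

    Key: (f, s, t), where s and t both occur within max_distance of an
    occurrence of f. s and t are stored in alphabetical order so that
    each unordered pair of neighbours is recorded once.
    Posting: (position of f, offset of s from f, offset of t from f).
    """
    index = defaultdict(list)
    n = len(tokens)
    for i, f in enumerate(tokens):
        lo = max(0, i - max_distance)
        hi = min(n, i + max_distance + 1)
        window = [j for j in range(lo, hi) if j != i]
        for j, k in combinations(window, 2):
            s, t = tokens[j], tokens[k]
            if s > t:
                # Normalise the key and keep positions aligned with it.
                s, t, j, k = t, s, k, j
            index[(f, s, t)].append((i, j - i, k - i))
    return index


def search_three_words(index, f, s, t):
    """Look up occurrences of f that have both s and t nearby."""
    if s > t:
        s, t = t, s
    return index.get((f, s, t), [])


tokens = "to be or not to be".split()
triples = build_triple_index(tokens, max_distance=2)
print(search_three_words(triples, "be", "to", "or"))
```

With such a key, a three-word query is answered by a single lookup instead of by intersecting two-component posting lists. The number of keys per word grows quadratically with the window size, which is one reason the value of MaxDistance matters.
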
This is a pre-print of a contribution "Veretennikov A.B. (2019) Proximity
Full-Text Search by Means of Additional Indexes with Multi-component Keys: In
Pursuit of Optimal Performance." published in "Manolopoulos Y., Stupnikov S.
(eds) Data Analytics and Management in Data Intensive Domains. DAMDID/RCDL
2018. Communications in Computer and Information Science, vol 1003" published
by Springer, Cham. This book constitutes the refereed proceedings of the 20th
International Conference on Data Analytics and Management in Data Intensive
Domains, DAMDID/RCDL 2018, held in Moscow, Russia, in October 2018. The 9
revised full papers presented together with three invited papers were carefully
reviewed and selected from 54 submissions. The final authenticated version is
available online at https://doi.org/10.1007/978-3-030-23584-0_7.
Comment: Revised paper of "Veretennikov A.B. Proximity full-text search with a
response time guarantee by means of additional indexes with multi-component
keys", Selected Papers of the XX International Conference on Data Analytics
and Management in Data Intensive Domains (DAMDID/RCDL 2018), Moscow, Russia,
October 9-12, 2018, http://ceur-ws.org/Vol-2277,
http://ceur-ws.org/Vol-2277/paper23.pd
Proximity Full-Text Searches of Frequently Occurring Words with a Response Time Guarantee
Full-text search engines are important tools for information retrieval. In a proximity full-text search, a document is relevant if it contains query terms near each other, especially if the query terms are frequently occurring words. For each word in the text, we use additional indexes to store information about nearby words at distances from the given word of less than or equal to the MaxDistance parameter. A search algorithm for queries that consist of high-frequency words is discussed. In addition, we present the results of experiments with different values of MaxDistance to evaluate how the search speed depends on this parameter. These results show that, for queries that contain high-frequency words, the average query execution time with our indexes is 94.7 to 45.9 times less (depending on the value of MaxDistance) than that with standard inverted files.
© Springer Nature Switzerland AG 2020