Proximity Full-Text Search with a Response Time Guarantee by Means of Additional Indexes
Full-text search engines are important tools for information retrieval. Term
proximity is an important factor in relevance score measurement. In a proximity
full-text search, we assume that a relevant document contains query terms near
each other, especially if the query terms are frequently occurring words. A
methodology for high-performance full-text query execution is discussed. We
build additional indexes to achieve better efficiency. For a word that occurs
in the text, we include in the indexes some information about nearby words.
What types of additional indexes do we use? How do we use them? These questions
are discussed in this work. We present the results of experiments showing that
the average time of search query execution is 44-45 times less than that
required when using ordinary inverted indexes.
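The idea of storing information about nearby words can be illustrated with a minimal sketch. It is not the author's implementation: the function names, the whitespace tokenizer, and the `max_distance` parameter are assumptions made for illustration. It builds an ordinary positional inverted index alongside an additional index keyed by pairs of words that occur close together, so a two-word proximity query becomes a direct lookup rather than a merge of two posting lists.

```python
from collections import defaultdict


def tokenize(text):
    # Assumption: lowercase whitespace tokenization is enough for the sketch.
    return text.lower().split()


def build_inverted_index(docs):
    """Ordinary index: word -> list of (doc_id, position)."""
    index = defaultdict(list)
    for doc_id, text in enumerate(docs):
        for pos, word in enumerate(tokenize(text)):
            index[word].append((doc_id, pos))
    return index


def build_pair_index(docs, max_distance):
    """Additional index: (word_a, word_b) -> list of (doc_id, pos_a, pos_b).

    Only pairs where word_b follows word_a by at most max_distance
    positions are stored.
    """
    index = defaultdict(list)
    for doc_id, text in enumerate(docs):
        words = tokenize(text)
        for i, a in enumerate(words):
            last = min(i + max_distance, len(words) - 1)
            for j in range(i + 1, last + 1):
                index[(a, words[j])].append((doc_id, i, j))
    return index


def proximity_search(pair_index, a, b):
    """Return sorted doc ids where a and b occur near each other, in either order."""
    hits = pair_index.get((a, b), []) + pair_index.get((b, a), [])
    return sorted({doc_id for doc_id, _, _ in hits})


docs = [
    "to be or not to be",
    "the time to act is now",
    "be quick and to the point",
]
pair_index = build_pair_index(docs, max_distance=2)
print(proximity_search(pair_index, "to", "be"))
```

The pair index trades space for time: frequently occurring words produce long posting lists in the ordinary index, whereas the pair lists contain only the occurrences that already satisfy the distance constraint.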
This is a pre-print of a contribution "Veretennikov A.B. Proximity Full-Text
Search with a Response Time Guarantee by Means of Additional Indexes" published
in "Arai K., Kapoor S., Bhatia R. (eds) Intelligent Systems and Applications.
IntelliSys 2018. Advances in Intelligent Systems and Computing, vol 868"
published by Springer, Cham. The final authenticated version is available
online at: https://doi.org/10.1007/978-3-030-01054-6_66. The work was supported
by Act 211 Government of the Russian Federation, contract no 02.A03.21.0006.
Comment: Alexander B. Veretennikov. Chair of Calculation Mathematics and
Computer Science, INSM. Ural Federal Universit
A user evaluation of hierarchical phrase browsing
Phrase browsing interfaces based on hierarchies of phrases extracted automatically from document collections offer a useful compromise between automatic full-text searching and manually-created subject indexes. The literature contains descriptions of such systems that many find compelling and persuasive. However, evaluation studies have either been anecdotal, or focused on objective measures of the quality of automatically-extracted index terms, or restricted to questions of computational efficiency and feasibility. This paper reports on an empirical, controlled user study that compares hierarchical phrase browsing with full-text searching over a range of information seeking tasks. Users found the results located via phrase browsing to be relevant and useful but preferred keyword searching for certain types of queries. Users' experiences were marred by interface details, including inconsistencies between the phrase browser and the surrounding digital library interface.
Path Queries on Compressed XML
Central to any XML query language is a path language such as XPath which operates on the tree structure of the XML document. We demonstrate in this paper that the tree structure can be effectively compressed and manipulated using techniques derived from symbolic model checking. Specifically, we show first that succinct representations of document tree structures based on sharing subtrees are highly effective. Second, we show that compressed structures can be queried directly and efficiently through a process of manipulating selections of nodes and partial decompression.
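Subtree sharing can be sketched in a few lines. This is an illustrative hash-consing scheme rather than the paper's actual data structure: the `(tag, children)` tuple representation and the function names are assumptions. Identical subtrees receive the same identifier, so the document tree collapses into a directed acyclic graph with one entry per distinct subtree.

```python
def compress(tree, table=None):
    """Return (node_id, table) for a tree given as (tag, [children]).

    table maps (tag, child_ids) -> node_id, so structurally identical
    subtrees are stored once and shared.
    """
    if table is None:
        table = {}
    tag, children = tree
    child_ids = tuple(compress(child, table)[0] for child in children)
    key = (tag, child_ids)
    if key not in table:
        table[key] = len(table)
    return table[key], table


def count_nodes(tree):
    """Number of nodes in the uncompressed tree."""
    tag, children = tree
    return 1 + sum(count_nodes(child) for child in children)


# Hypothetical document: three structurally identical <book> elements.
book = ("book", [("title", []), ("author", [])])
library = ("library", [book, book, book])

root_id, table = compress(library)
print(count_nodes(library), "tree nodes ->", len(table), "shared nodes")
```

Only element structure is modelled here; text content and attributes, which would reduce the amount of sharing, are left out of the sketch.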
Towards a query language for annotation graphs
The multidimensional, heterogeneous, and temporal nature of speech databases
raises interesting challenges for representation and query. Recently,
annotation graphs have been proposed as a general-purpose representational
framework for speech databases. Typical queries on annotation graphs require
path expressions similar to those used in semistructured query languages.
However, the underlying model is rather different from the customary graph
models for semistructured data: the graph is acyclic and unrooted, and both
temporal and inclusion relationships are important. We develop a query language
and describe optimization techniques for an underlying relational
representation.
Comment: 8 pages, 10 figures
MT techniques in a retrieval system of semantically enriched patents
This paper focuses on how automatic translation techniques integrated in a patent retrieval system increase its capabilities and make possible extended features and functionalities. We describe 1) a novel methodology for natural language to SPARQL translation based on a grammar–ontology interoperability automation and a query grammar for the patents domain; 2) a devised strategy for statistical-based translation of patents that allows semantic annotations to be transferred to the target language; 3) a built-in knowledge representation infrastructure that uses multilingual semantic annotations; and 4) an online application that offers a multilingual search interface over structural knowledge databases (domain ontologies) and multilingual documents (biomedical patents) that have been automatically translated.
Peer Reviewed. Postprint (published version)