175 research outputs found
Context Sensitive Search String Composition Algorithm using User Intention to Handle Ambiguous Keywords
Finding the required URL among the first few result pages of a search engine is still a challenging task. It may require a number of reformulations of the search string, which adversely affects the user's search time. Query ambiguity and polysemy are major reasons for not obtaining relevant results in the top few result pages. Efficient query composition and data organization are necessary for getting effective results. The context of the information need and the user intent may improve the autocomplete feature of existing search engines. This research proposes the Funnel Mesh-5 (FM5) algorithm, which constructs a search string by taking into account the context of the information need and the user intention in three main steps: 1) predict user intention from user profiles and past searches via a weighted mesh structure; 2) resolve ambiguity and polysemy of search strings using context and user intention; 3) generate a personalized, disambiguated search string by query expansion encompassing the user intention and the predicted query. Experimental results for the proposed approach and a comparison with direct use of a search engine are presented. A comparison of the FM5 algorithm with the K Nearest Neighbor algorithm for user intention identification is also presented. The proposed system provides better precision of search results for ambiguous search strings, with improved identification of the user intention. Results are presented for an English-language dataset as well as a Marathi (an Indian language) dataset of ambiguous search strings.
The design space of a configurable autocompletion component
Autocompletion is a commonly used interface feature in diverse applications. Semantic Web data has, on the one hand, the potential to provide new functionality by exploiting the semantics in the data used for generating autocompletion suggestions. Semantic Web applications, on the other hand, typically pose extra requirements on the semantic properties of the suggestions given. When the number of syntactic matches becomes too large, some means of selecting a semantically meaningful subset of suggestions to be presented to the user is needed. In this paper we identify a number of key design dimensions of autocompletion interface components. Our hypothesis is that a one-size-fits-all solution to autocompletion interface components does not exist, because different tasks and different data sets require interfaces corresponding to different points in our design space. We present a fully configurable architecture, which can be used to configure autocompletion components to the desired point in this design space. The architecture has been implemented as an open-source software component that can be plugged into a variety of applications. We report on the results of a user evaluation that confirms this hypothesis, and describe the need to evaluate semantic autocompletion in a task- and application-specific context.
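The idea of selecting a semantically meaningful subset when syntactic matches become too numerous can be illustrated with a minimal sketch. This is not the architecture described in the abstract: the function name `suggest`, the `(label, semantic_type)` entry format, and the `max_total` and `per_type` parameters are all assumptions made for illustration. The sketch does plain prefix matching and, only when there are too many matches, keeps a few suggestions per semantic type so that every type stays visible.

```python
from collections import defaultdict


def suggest(prefix, entries, max_total=6, per_type=2):
    """Return (label, semantic_type) suggestions for a typed prefix.

    entries is a list of (label, semantic_type) pairs (hypothetical format).
    If the syntactic (prefix) matches fit within max_total, return them all.
    Otherwise keep at most per_type labels from each semantic type.
    """
    p = prefix.lower()
    matches = sorted((label, t) for label, t in entries
                     if label.lower().startswith(p))
    if len(matches) <= max_total:
        return matches

    # Too many syntactic matches: group them by semantic type.
    grouped = defaultdict(list)
    for label, t in matches:
        grouped[t].append(label)

    # Take the first few labels of each type, visiting types alphabetically.
    selected = []
    for t in sorted(grouped):
        for label in grouped[t][:per_type]:
            selected.append((label, t))
    return selected[:max_total]


entries = [
    ("Paris", "city"), ("Paris Hilton", "person"), ("Parma", "city"),
    ("Paraguay", "country"), ("Parker", "person"),
    ("Paracelsus", "person"), ("Pardubice", "city"), ("Berlin", "city"),
]
few = suggest("ber", entries)
many = suggest("par", entries, max_total=4, per_type=1)
```

Grouping by type is only one point in the design space the abstract refers to; ranking by popularity or by graph distance to the current context would be other choices.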
AirIndex: Versatile Index Tuning Through Data and Storage
The end-to-end lookup latency of a hierarchical index -- such as a B-tree or
a learned index -- is determined by its structure such as the number of layers,
the kinds of branching functions appearing in each layer, the amount of data we
must fetch from layers, etc. Our primary observation is that by optimizing
those structural parameters (or designs) specifically to a target system's I/O
characteristics (e.g., latency, bandwidth), we can offer a faster lookup
compared to the ones that are not optimized. Can we develop a systematic method
for finding those optimal design parameters? Ideally, the method must have the
potential to generate almost any existing index or a novel combination of them
for the fastest possible lookup.
In this work, we present a new data- and I/O-aware index builder (called
AirIndex) that can find high-speed hierarchical index designs in a principled
way. Specifically, AirIndex minimizes an objective function expressing the
end-to-end latency in terms of various designs -- the number of layers, types
of layers, and more -- for given data and a storage profile, using a
graph-based optimization method purpose-built to address the computational
challenges arising from the inter-dependencies among index layers and the
exponentially many candidate parameters in a large search space. Our empirical
studies confirm that AirIndex can find optimal index designs, build optimal
indexes within times comparable to existing methods, and deliver up to 4.1x
faster lookup than a lightweight B-tree library (LMDB), 3.3x--46.3x faster than
state-of-the-art learned indexes (RMI/CDFShop, PGM-Index, ALEX/APEX, PLEX), and
2.0x faster than Data Calculator's suggestion on various dataset and storage
settings.
Comment: 13 pages, 3 appendices, 19 figures, to appear at SIGMOD 202
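The observation that structural parameters should be tuned to a storage profile can be illustrated with a toy cost model. This is a sketch under strong assumptions and not AirIndex's objective function or its graph-based optimizer: it assumes a uniform-fanout tree, one node read per layer, a read cost of fixed latency plus size divided by bandwidth, and a brute-force search over a handful of candidate fanouts. All names (`StorageProfile`, `lookup_latency`, `best_fanout`) and the example numbers are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StorageProfile:
    latency_s: float      # fixed cost per read request, in seconds
    bandwidth_bps: float  # sustained throughput, in bytes per second

    def read_time(self, nbytes):
        return self.latency_s + nbytes / self.bandwidth_bps


def num_layers(num_keys, fanout):
    """Smallest number of layers such that fanout**layers >= num_keys."""
    layers, capacity = 1, fanout
    while capacity < num_keys:
        capacity *= fanout
        layers += 1
    return layers


def lookup_latency(num_keys, fanout, entry_bytes, profile):
    """Estimated end-to-end latency: one node fetched per layer."""
    node_bytes = fanout * entry_bytes
    return num_layers(num_keys, fanout) * profile.read_time(node_bytes)


def best_fanout(num_keys, entry_bytes, profile, candidates):
    """Brute-force choice of the candidate fanout with the lowest estimate."""
    return min(candidates,
               key=lambda f: lookup_latency(num_keys, f, entry_bytes, profile))


candidates = [16, 256, 4096, 65536]
slow = StorageProfile(latency_s=10e-3, bandwidth_bps=100e6)  # high-latency storage
fast = StorageProfile(latency_s=1e-6, bandwidth_bps=1e9)     # low-latency storage
slow_choice = best_fanout(10**8, 16, slow, candidates)
fast_choice = best_fanout(10**8, 16, fast, candidates)
```

Under this model, high-latency storage favors wide nodes and few layers, while low-latency storage favors narrow nodes and more layers, which is the kind of trade-off the abstract describes optimizing over a far larger design space.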