Caching Historical Embeddings in Conversational Search
Rapid response, i.e., low latency, is fundamental in search applications, and
particularly so in interactive search sessions such as those encountered in
conversational settings. One observation with the potential to reduce latency
is that conversational queries exhibit temporal locality in the lists of
documents retrieved. Motivated by this observation, we propose and evaluate a
client-side document embedding cache, improving the responsiveness of
conversational search systems. By leveraging state-of-the-art dense retrieval
models to abstract document and query semantics, we cache the embeddings of
documents retrieved for a topic introduced in the conversation, as they are
likely relevant to successive queries. Our document embedding cache implements
an efficient metric index that answers nearest-neighbor similarity queries by
estimating approximate result sets. We demonstrate the efficiency
achieved using our cache via reproducible experiments based on TREC CAsT
datasets, achieving a hit rate of up to 75% without degrading answer quality.
The high cache hit rates we achieve significantly improve the responsiveness of
conversational systems while also reducing the number of queries handled by
the search back-end.
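The abstract's core idea can be sketched as a small client-side cache that keeps unit-normalized document embeddings from earlier conversation turns and answers a follow-up query locally when a cached document is similar enough. This is a minimal NumPy sketch, not the paper's actual system: the class name, the cosine-similarity threshold, and the hit/miss bookkeeping are illustrative assumptions.

```python
import numpy as np

class EmbeddingCache:
    """Hypothetical client-side cache of document embeddings for one
    conversation. Embeddings retrieved for earlier turns are reused to
    answer nearest-neighbor queries for follow-up questions; a query
    whose best match falls below the threshold is a miss and would be
    sent to the search back-end instead."""

    def __init__(self, threshold=0.8):
        self.doc_ids = []
        self.embeddings = []        # unit-norm vectors
        self.threshold = threshold  # min cosine similarity for a hit
        self.hits = 0
        self.misses = 0

    def add(self, doc_id, embedding):
        v = np.asarray(embedding, dtype=float)
        self.doc_ids.append(doc_id)
        self.embeddings.append(v / np.linalg.norm(v))

    def query(self, query_embedding, k=3):
        """Return the top-k cached (doc_id, similarity) pairs on a hit,
        or None on a miss (caller falls back to the back-end)."""
        if not self.embeddings:
            self.misses += 1
            return None
        q = np.asarray(query_embedding, dtype=float)
        q = q / np.linalg.norm(q)
        sims = np.stack(self.embeddings) @ q   # cosine similarities
        if sims.max() < self.threshold:
            self.misses += 1
            return None
        self.hits += 1
        order = np.argsort(-sims)[:k]
        return [(self.doc_ids[i], float(sims[i])) for i in order]
```

In this sketch the hit rate the abstract reports (up to 75%) would simply be `hits / (hits + misses)` over a conversation; the real system additionally organizes the cache as a metric index rather than scanning all cached vectors.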
Characterizing and Subsetting Big Data Workloads
Big data benchmark suites must include a diversity of data and workloads to
be useful in fairly evaluating big data systems and architectures. However,
using truly comprehensive benchmarks poses great challenges for the
architecture community. First, we need to thoroughly understand the behaviors
of a variety of workloads. Second, our usual simulation-based research methods
become prohibitively expensive for big data. As big data is an emerging field,
more and more software stacks are being proposed to facilitate the development
of big data applications, which aggravates these challenges. In this paper, we
first use Principal Component Analysis (PCA) to identify the most important
characteristics from 45 metrics to characterize big data workloads from
BigDataBench, a comprehensive big data benchmark suite. Second, we apply a
clustering technique to the principal components obtained from the PCA to
investigate the similarity among big data workloads, and we verify the
importance of including different software stacks for big data benchmarking.
Third, we select seven representative big data workloads by removing redundant
ones and release the BigDataBench simulation version, which is publicly
available from http://prof.ict.ac.cn/BigDataBench/simulatorversion/.
Comment: 11 pages, 6 figures, 2014 IEEE International Symposium on Workload
Characterization
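The subsetting pipeline this abstract describes (PCA on workload metrics, clustering in the reduced space, then keeping one representative per cluster) can be sketched in a few lines of NumPy. This is a simplified illustration under stated assumptions, not the paper's method: the deterministic k-means initialization, the helper names, and the choice of "member closest to the centroid" as the representative are all assumptions; the paper works with 45 metrics over BigDataBench workloads.

```python
import numpy as np

def pca(X, n_components):
    """Project rows of X onto the top principal components (via SVD)."""
    Xc = X - X.mean(axis=0)
    U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:n_components].T

def kmeans(X, k, n_iter=100):
    """Plain k-means with deterministic init (first k rows) for the sketch."""
    centroids = X[:k].copy()
    for _ in range(n_iter):
        d = np.linalg.norm(X[:, None] - centroids[None], axis=2)
        labels = d.argmin(axis=1)
        new = np.array([X[labels == j].mean(axis=0) if np.any(labels == j)
                        else centroids[j] for j in range(k)])
        if np.allclose(new, centroids):
            break
        centroids = new
    return labels, centroids

def representatives(X, k, n_components=2):
    """One workload index per cluster: the member closest to its
    cluster centroid in PCA space (redundant workloads are dropped)."""
    Z = pca(X, n_components)
    labels, centroids = kmeans(Z, k)
    reps = []
    for j in range(k):
        members = np.flatnonzero(labels == j)
        if members.size == 0:
            continue
        dists = np.linalg.norm(Z[members] - centroids[j], axis=1)
        reps.append(int(members[dists.argmin()]))
    return sorted(reps)
```

Applied to a metrics matrix with one row per workload, `representatives(X, k=7)` would mirror the paper's selection of seven representative workloads, though the paper's exact distance measure and cluster count derivation may differ.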
Measuring and Managing Answer Quality for Online Data-Intensive Services
Online data-intensive services parallelize query execution across distributed
software components. Interactive response time is a priority, so online query
executions return answers without waiting for slow-running components to
finish. However, data from these slow components could lead to better answers.
We propose Ubora, an approach to measure the effect of slow-running components
on the quality of answers. Ubora randomly samples online queries and executes
them twice. The first execution elides data from slow components and provides
fast online answers; the second execution waits for all components to complete.
Ubora uses memoization to speed up mature executions by replaying network
messages exchanged between components. Our systems-level implementation works
for a wide range of platforms, including Hadoop/Yarn, Apache Lucene, the
EasyRec Recommendation Engine, and the OpenEphyra question answering system.
Ubora computes answer quality much faster than competing approaches that do not
use memoization. With Ubora, we show that answer quality can and should be used
to guide online admission control. Our adaptive controller processed 37% more
queries than a competing controller guided by the rate of timeouts.
Comment: Technical Report
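The sample-and-execute-twice idea with memoized replay can be sketched as follows. This is a toy model, not Ubora's implementation: component latencies are given as data rather than measured, the function and parameter names are hypothetical, and the `quality` metric is supplied by the caller (e.g., recall of the online answer against the mature one).

```python
def measure_answer_quality(components, timeout, quality):
    """Hypothetical sketch of the Ubora scheme: run a sampled query twice.
    Pass 1 elides components slower than `timeout` and yields the fast
    online answer; pass 2 replays the memoized replies of the fast
    components (no re-execution) and only waits for the slow ones,
    yielding the mature answer. Returns both answers and their quality.

    `components` maps component name -> (latency_seconds, result)."""
    memo = {}
    online_parts = []
    for name, (latency, result) in components.items():
        if latency <= timeout:
            memo[name] = result          # record the fast reply
            online_parts.append(result)
        # slow components are elided from the online answer
    online_answer = sorted(online_parts)

    mature_parts = []
    for name, (latency, result) in components.items():
        if name in memo:
            mature_parts.append(memo[name])  # replayed, not re-executed
        else:
            mature_parts.append(result)      # wait for the slow component
    mature_answer = sorted(mature_parts)

    return online_answer, mature_answer, quality(online_answer, mature_answer)
```

In this toy model, the memoized replay is what makes the second (mature) execution cheap: only the components that timed out in pass 1 do real work, which is the speedup the abstract attributes to memoization. An admission controller could then throttle load when the measured quality drops, as the abstract proposes.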