Search CORE

144 research outputs found

Scalable query result caching for web applications

Author: Alonso G.
Altinel M.
Amiri K.
Levy A. Y.
Li W.
Olston C.
Publication venue: 'VLDB Endowment'
Publication date
Field of study

Static query result caching revisited

Author: Altingovde I.S.
Ozcan R.
Ulusoy O.
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 01/01/2008
Field of study

Query result caching is an important mechanism for search engine efficiency. In this study, we first review several query features that are used to determine the contents of a static result cache. Next, we introduce a new feature that more accurately represents the popularity of a query by measuring the stability of query frequency over a set of time intervals. Experimental results show that this new feature achieves hit ratios better than those of the previously proposed features

Bilkent University Institutional Repository

Recommended from our members

DotSlash: Providing Dynamic Scalability to Web Applications with On-demand Distributed Query Result Caching

Author: Schulzrinne Henning G.
Zhao Weibin
Publication venue: 'Columbia University Libraries/Information Services'
Publication date: 01/01/2005
Field of study

Scalability poses a significant challenge for today's web applications, mainly due to the large population of potential users. To effectively address the problem of short-term dramatic load spikes caused by web hotspots, we developed a self-configuring and scalable rescue system called DotSlash. The primary goal of our system is to provide dynamic scalability to web applications by enabling a web site to obtain resources dynamically, and use them autonomically without any administrative intervention. To address the database server bottleneck, DotSlash allows a web site to set up on-demand distributed query result caching, which greatly reduces the database workload for read mostly databases, and thus increases the request rate supported at a DotSlash-enabled web site. The novelty of our work is that our query result caching is on demand, and operated based on load conditions. The caching remains inactive as long as the load is normal, but is activated once the load is heavy. This approach offers good data consistency during normal load situations, and good scalability with relaxed data consistency for heavy load periods. We have built a prototype system for the widely used LAMP configuration, and evaluated our system using the RUBBoS bulletin board benchmark. Experiments show that a DotSlash-enhanced web site can improve the maximum request rate supported by a factor of 5 using 8 rescue servers for the RUBBoS submission mix, and by a factor of 10 using 15 rescue servers for the RUBBoS read-only mix

Columbia University Academic Commons

Living Knowledge

Author: Baldry Anthony
Dutta Biswanath
Giunchiglia Fausto
Maltese Vincenzo
Publication venue: Università degli Studi di Trento
Publication date: 01/03/2012
Field of study

Diversity, especially manifested in language and knowledge, is a function of local goals, needs, competences, beliefs, culture, opinions and personal experience. The Living Knowledge project considers diversity as an asset rather than a problem. With the project, foundational ideas emerged from the synergic contribution of different disciplines, methodologies (with which many partners were previously unfamiliar) and technologies flowed in concrete diversity-aware applications such as the Future Predictor and the Media Content Analyser providing users with better structured information while coping with Web scale complexities. The key notions of diversity, fact, opinion and bias have been defined in relation to three methodologies: Media Content Analysis (MCA) which operates from a social sciences perspective; Multimodal Genre Analysis (MGA) which operates from a semiotic perspective and Facet Analysis (FA) which operates from a knowledge representation and organization perspective. A conceptual architecture that pulls all of them together has become the core of the tools for automatic extraction and the way they interact. In particular, the conceptual architecture has been implemented with the Media Content Analyser application. The scientific and technological results obtained are described in the following

Unitn-eprints Research

Stochastic Query Covering for Fast Approximate Document Retrieval

Author: Anagnostopoulos Aristidis
Becchetti Luca
Ida Mele
Ilaria Bordino
Leonardi Stefano
Piotr Sankowski
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 01/01/2015
Field of study

We design algorithms that, given a collection of documents and a distribution over user queries, return a small subset of the document collection in such a way that we can efficiently provide high-quality answers to user queries using only the selected subset. This approach has applications when space is a constraint or when the query-processing time increases significantly with the size of the collection. We study our algorithms through the lens of stochastic analysis and prove that even though they use only a small fraction of the entire collection, they can provide answers to most user queries, achieving a performance close to the optimal. To complement our theoretical findings, we experimentally show the versatility of our approach by considering two important cases in the context of Web search. In the first case, we favor the retrieval of documents that are relevant to the query, whereas in the second case we aim for document diversification. Both the theoretical and the experimental analysis provide strong evidence of the potential value of query covering in diverse application scenarios

Archivio della ricerca- Università di Roma La Sapienza

MPG.PuRe

Adaptive time-to-live strategies for query result caching in web search engines

Author: Alici S.
Altingovde I.S.
Barla Cambazoglu B.
Ozcan R.
Ulusoy O.
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2012
Field of study

An important research problem that has recently started to receive attention is the freshness issue in search engine result caches. In the current techniques in literature, the cached search result pages are associated with a fixed time-to-live (TTL) value in order to bound the staleness of search results presented to the users, potentially as part of a more complex cache refresh or invalidation mechanism. In this paper, we propose techniques where the TTL values are set in an adaptive manner, on a per-query basis. Our results show that the proposed techniques reduce the fraction of stale results served by the cache and also decrease the fraction of redundant query evaluations on the search engine backend compared to a strategy using a fixed TTL value for all queries. © 2012 Springer-Verlag Berlin Heidelberg

Bilkent University Institutional Repository

Efficiently correlating complex events over live and archived data streams

Author: Merve Soner
Nesime Tatbul
Nihal Dindar
Peter M. Fischer
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 01/01/2011
Field of study

Correlating complex events over live and archived data streams, which we call Pattern Correlation Queries (PCQs), provides many benefits for domains which need real-time forecasting of events or identification of causal dependencies, while handling data at high rates and in massive amounts, like in financial or medical settings. Existing work has focused either on complex event processing over a single type of stream source (i.e., either live or archived), or on simple stream correlation queries (e.g., live events trigerring a database lookup). In this paper, we specifically focus on recency-based PCQs and provide clear, useful, and optimizable semantics for them. PCQs raise a number of challenges in optimizing data management and query processing, which we address in the setting of the DejaVu complex event processing system. More specifically, we propose three complementary optimizations including recent in-put buffering, query result caching, and join source ordering. Fur-thermore, we capture the relevant query processing tradeoffs in a cost model. An extensive performance study on synthetic and real-life data sets not only validates this cost model, but also shows that our optimizations are very effective, achieving more than two orders magnitude throughput improvement and much better scala-bility compared to a conventional approach

CiteSeerX

OPUS Augsburg

Crossref

Vectorwise: Beyond Column Stores

Author: Boncz P.A.
Zukowski M.
Publication venue
Publication date: 01/01/2012
Field of study

textabstractThis paper tells the story of Vectorwise, a high-performance analytical database system, from multiple perspectives: its history from academic project to commercial product, the evolution of its technical architecture, customer reactions to the product and its future research and development roadmap. One take-away from this story is that the novelty in Vectorwise is much more than just column-storage: it boasts many query processing innovations in its vectorized execution model, and an adaptive mixed row/column data storage model with indexing support tailored to analytical workloads. Another one is that there is a long road from research prototype to commercial product, though database research continues to achieve a strong innovative inﬂuence on product development

VU Research Portal

CWI's Institutional Repository