Search CORE

2,176 research outputs found

The NASA Astrophysics Data System: Architecture

Author: Accomazzi A.
Eichhorn G.
Grant C. S.
Kurtz M. J.
Murray S. S.
Publication venue: 'EDP Sciences'
Publication date: 04/02/2000
Field of study

The powerful discovery capabilities available in the ADS bibliographic services are possible thanks to the design of a flexible search and retrieval system based on a relational database model. Bibliographic records are stored as a corpus of structured documents containing fielded data and metadata, while discipline-specific knowledge is segregated in a set of files independent of the bibliographic data itself. The creation and management of links to both internal and external resources associated with each bibliography in the database is made possible by representing them as a set of document properties and their attributes. To improve global access to the ADS data holdings, a number of mirror sites have been created by cloning the database contents and software on a variety of hardware and software platforms. The procedures used to create and manage the database and its mirrors have been written as a set of scripts that can be run in either an interactive or unsupervised fashion. The ADS can be accessed at http://adswww.harvard.eduComment: 25 pages, 8 figures, 3 table

arXiv.org e-Print Archive

Crossref

EDP Sciences OAI-PMH repository (1.2.0)

Efficient Update of Indexes for Dynamically Changing Web Documents

Author: Agarwal Ramesh
Lim Lipyeow
Padmanabhan Sriram
Vitter Jeffrey Scott
Wang Min
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 21/03/2011
Field of study

The original publication is available at www.springerlink.comRecent work on incremental crawling has enabled the indexed document collection of a search engine to be more synchronized with the changing World Wide Web. However, this synchronized collection is not immediately searchable, because the keyword index is rebuilt from scratch less frequently than the collection can be refreshed. An inverted index is usually used to index documents crawled from the web. Complete index rebuild at high frequency is expensive. Previous work on incremental inverted index updates have been restricted to adding and removing documents. Updating the inverted index for previously indexed documents that have changed has not been addressed. In this paper, we propose an efficient method to update the inverted index for previously indexed documents whose contents have changed. Our method uses the idea of landmarks together with the diff algorithm to significantly reduce the number of postings in the inverted index that need to be updated. Our experiments verify that our landmark-diff method results in significant savings in the number of update operations on the inverted index

KU ScholarWorks

Efficient Spatial Keyword Search in Trajectory Databases

Author: Cong Gao
Lu Hua
Ooi Beng Chin
Zhang Dongxiang
Zhang Meihui
Publication venue
Publication date: 01/01/2012
Field of study

An increasing amount of trajectory data is being annotated with text descriptions to better capture the semantics associated with locations. The fusion of spatial locations and text descriptions in trajectories engenders a new type of top-

k

queries that take into account both aspects. Each trajectory in consideration consists of a sequence of geo-spatial locations associated with text descriptions. Given a user location

\lambda

and a keyword set

\psi

, a top-

k

query returns

k

trajectories whose text descriptions cover the keywords

\psi

and that have the shortest match distance. To the best of our knowledge, previous research on querying trajectory databases has focused on trajectory data without any text description, and no existing work has studied such kind of top-

k

queries on trajectories. This paper proposes one novel method for efficiently computing top-

k

trajectories. The method is developed based on a new hybrid index, cell-keyword conscious B

^+

-tree, denoted by \cellbtree, which enables us to exploit both text relevance and location proximity to facilitate efficient and effective query processing. The results of our extensive empirical studies with an implementation of the proposed algorithms on BerkeleyDB demonstrate that our proposed methods are capable of achieving excellent performance and good scalability.Comment: 12 page

arXiv.org e-Print Archive

Roskilde Universitet

VBN

Static Score Bucketing in Inverted Indexes

Author: Botev Chavdar
Eiron Nadav
Fontoura Marcus
Li Ning
Publication venue: 'SAGE Publications'
Publication date: 23/11/2005
Field of study

Maintaining strict static score order of inverted lists is a heuristic used by search engines to improve the quality of query results when the entire inverted lists cannot be processed. This heuristic, however, increases the cost of index generation and requires time-consuming index build algorithms. In this paper, we study a new index organization based on static score bucketing. We show that this new technique significantly improves in index build performance while having minimal impact on the quality of search results. We also provide upper bounds on the quality degradation and verify experimentally the benefits of the proposed approach

eCommons@Cornell

Hybrid index maintenance for growing text collections

Author
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 01/01/2006
Field of study

Crossref

Scalable Semantic Matching of Queries to Ads in Sponsored Search Advertising

Author: Baeza-Yates Ricardo
Djuric Nemanja
Feng Andrew
Grbovic Mihajlo
Ordentlich Erik
Owens Gavin
Radosavljevic Vladan
Silvestri Fabrizio
Yang Lee
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 01/01/2016
Field of study

Sponsored search represents a major source of revenue for web search engines. This popular advertising model brings a unique possibility for advertisers to target users' immediate intent communicated through a search query, usually by displaying their ads alongside organic search results for queries deemed relevant to their products or services. However, due to a large number of unique queries it is challenging for advertisers to identify all such relevant queries. For this reason search engines often provide a service of advanced matching, which automatically finds additional relevant queries for advertisers to bid on. We present a novel advanced matching approach based on the idea of semantic embeddings of queries and ads. The embeddings were learned using a large data set of user search sessions, consisting of search queries, clicked ads and search links, while utilizing contextual information such as dwell time and skipped ads. To address the large-scale nature of our problem, both in terms of data and vocabulary size, we propose a novel distributed algorithm for training of the embeddings. Finally, we present an approach for overcoming a cold-start problem associated with new ads and queries. We report results of editorial evaluation and online tests on actual search traffic. The results show that our approach significantly outperforms baselines in terms of relevance, coverage, and incremental revenue. Lastly, we open-source learned query embeddings to be used by researchers in computational advertising and related fields.Comment: 10 pages, 4 figures, 39th International ACM SIGIR Conference on Research and Development in Information Retrieval, SIGIR 2016, Pisa, Ital

arXiv.org e-Print Archive

Crossref

Archivio della ricerca- Università di Roma La Sapienza

Performance of query processing implementations in ranking-based text retrieval systems using inverted indices

Author: Aykanat C.
Cambazoglu B. B.
Publication venue: 'Elsevier BV'
Publication date: 01/01/2006
Field of study

Cataloged from PDF version of article.Similarity calculations and document ranking form the computationally expensive parts of query processing in ranking-based text retrieval. In this work, for these calculations, 11 alternative implementation techniques are presented under four different categories, and their asymptotic time and space complexities are investigated. To our knowledge, six of these techniques are not discussed in any other publication before. Furthermore, analytical experiments are carried out on a 30 GB document collection to evaluate the practical performance of different implementations in terms of query processing time and space consumption. Advantages and disadvantages of each technique are illustrated under different querying scenarios, and several experiments that investigate the scalability of the implementations are presented. (C) 2005 Elsevier Ltd. All rights reserved

Bilkent University Institutional Repository