196 research outputs found

    Deriving query suggestions for site search

    Get PDF
    Modern search engines have been moving away from simplistic interfaces that aimed at satisfying a user's need with a single-shot query. Interactive features are now integral parts of web search engines. However, generating good query modification suggestions remains a challenging issue. Query log analysis is one of the major strands of work in this direction. Although much research has been performed on query logs collected on the web as a whole, query log analysis to enhance search on smaller and more focused collections has attracted less attention, despite its increasing practical importance. In this article, we report on a systematic study of different query modification methods applied to a substantial query log collected on a local website that already uses an interactive search engine. We conducted experiments in which we asked users to assess the relevance of potential query modification suggestions that have been constructed using a range of log analysis methods and different baseline approaches. The experimental results demonstrate the usefulness of log analysis to extract query modification suggestions. Furthermore, our experiments demonstrate that a more fine-grained approach than grouping search requests into sessions allows for extraction of better refinement terms from query log files. © 2013 ASIS&T

    Enhanced Web Search Engines with Query-Concept Bipartite Graphs

    Get PDF
    With rapid growth of information on the Web, Web search engines have gained great momentum for exploiting valuable Web resources. Although keywords-based Web search engines provide relevant search results in response to users’ queries, future enhancement is still needed. Three important issues include (1) search results can be diverse because ambiguous keywords in queries can be interpreted to different meanings; (2) indentifying keywords in long queries is difficult for search engines; and (3) generating query-specific Web page summaries is desirable for Web search results’ previews. Based on clickthrough data, this thesis proposes a query-concept bipartite graph for representing queries’ relations, and applies the queries’ relations to applications such as (1) personalized query suggestions, (2) long queries Web searches and (3) query-specific Web page summarization. Experimental results show that query-concept bipartite graphs are useful for performance improvement for the three applications

    Discriminative Reranking for Spelling Correction

    Get PDF
    PACLIC 20 / Wuhan, China / 1-3 November, 200

    Discriminative reranking for spelling correction

    Get PDF
    Abstract. This paper proposes a novel approach to spelling correction. It reranks the output of an existing spelling corrector, Aspell. A discriminative model (Ranking SVM) is employed to improve upon the initial ranking, using additional features as evidence. These features are derived from stateof-the-art techniques in spelling correction, including edit distance, letter-based n-gram, phonetic similarity and noisy channel model. This paper also presents a new method to automatically extract training samples from the query log chain. The system outperforms the baseline Aspell greatly, as well as previous models and several off-the-shelf spelling correction systems (e.g. Microsoft Word 2003). The results on query chain pairs are comparable to that based on manually-annotated pairs, with 32.2%/32.6 % reduction in error rate, respectively. 1

    End-to-End Neural Ad-hoc Ranking with Kernel Pooling

    Full text link
    This paper proposes K-NRM, a kernel based neural model for document ranking. Given a query and a set of documents, K-NRM uses a translation matrix that models word-level similarities via word embeddings, a new kernel-pooling technique that uses kernels to extract multi-level soft match features, and a learning-to-rank layer that combines those features into the final ranking score. The whole model is trained end-to-end. The ranking layer learns desired feature patterns from the pairwise ranking loss. The kernels transfer the feature patterns into soft-match targets at each similarity level and enforce them on the translation matrix. The word embeddings are tuned accordingly so that they can produce the desired soft matches. Experiments on a commercial search engine's query log demonstrate the improvements of K-NRM over prior feature-based and neural-based states-of-the-art, and explain the source of K-NRM's advantage: Its kernel-guided embedding encodes a similarity metric tailored for matching query words to document words, and provides effective multi-level soft matches

    Deep Character-Level Click-Through Rate Prediction for Sponsored Search

    Full text link
    Predicting the click-through rate of an advertisement is a critical component of online advertising platforms. In sponsored search, the click-through rate estimates the probability that a displayed advertisement is clicked by a user after she submits a query to the search engine. Commercial search engines typically rely on machine learning models trained with a large number of features to make such predictions. This is inevitably requires a lot of engineering efforts to define, compute, and select the appropriate features. In this paper, we propose two novel approaches (one working at character level and the other working at word level) that use deep convolutional neural networks to predict the click-through rate of a query-advertisement pair. Specially, the proposed architectures only consider the textual content appearing in a query-advertisement pair as input, and produce as output a click-through rate prediction. By comparing the character-level model with the word-level model, we show that language representation can be learnt from scratch at character level when trained on enough data. Through extensive experiments using billions of query-advertisement pairs of a popular commercial search engine, we demonstrate that both approaches significantly outperform a baseline model built on well-selected text features and a state-of-the-art word2vec-based approach. Finally, by combining the predictions of the deep models introduced in this study with the prediction of the model in production of the same commercial search engine, we significantly improve the accuracy and the calibration of the click-through rate prediction of the production system.Comment: SIGIR2017, 10 page
    • …
    corecore