12 research outputs found

    Ensemble clustering for result diversification

    This paper describes the participation of the University of Twente in the Web track of TREC 2012. Our baseline approach uses the Mirex toolkit, an open source tool that sequentially scans all the documents. For result diversification, we experimented with improving the quality of clusters through ensemble clustering. We combined clusters obtained by different clustering methods (such as LDA and K-means) and clusters obtained by using different types of data (such as document text and anchor text). Our two-layer ensemble run performed better than the LDA-based diversification and also better than a non-diversification run.
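
To illustrate the general idea of combining several clusterings, here is a minimal sketch of one common consensus technique, a co-association matrix followed by threshold merging. The abstract does not say how the two-layer ensemble was built, so this is not the authors' method; the function names, the 0.5 threshold, and the toy labelings are all assumptions made for illustration.

```python
from typing import Dict, List, Sequence


def co_association(labelings: Sequence[Sequence[int]]) -> List[List[float]]:
    """Fraction of base clusterings that place documents i and j together."""
    n = len(labelings[0])
    counts = [[0.0] * n for _ in range(n)]
    for labels in labelings:
        if len(labels) != n:
            raise ValueError("all labelings must cover the same documents")
        for i in range(n):
            for j in range(n):
                if labels[i] == labels[j]:
                    counts[i][j] += 1.0
    m = len(labelings)
    return [[c / m for c in row] for row in counts]


def consensus_clusters(
    labelings: Sequence[Sequence[int]], threshold: float = 0.5
) -> List[int]:
    """Merge documents whose co-association exceeds the (assumed) threshold."""
    co = co_association(labelings)
    n = len(co)
    parent = list(range(n))  # union-find forest over documents

    def find(x: int) -> int:
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for i in range(n):
        for j in range(i + 1, n):
            if co[i][j] > threshold:
                ri, rj = find(i), find(j)
                if ri != rj:
                    parent[rj] = ri

    # Relabel union-find roots as 0..k-1 in order of first appearance.
    ids: Dict[int, int] = {}
    result: List[int] = []
    for i in range(n):
        root = find(i)
        if root not in ids:
            ids[root] = len(ids)
        result.append(ids[root])
    return result


# Hypothetical base clusterings of five documents, e.g. from different
# methods or different data types (text vs. anchor text).
base = [[0, 0, 1, 1, 2], [1, 1, 0, 0, 0], [0, 0, 0, 1, 1]]
print(consensus_clusters(base))
```

Because merging is transitive, two documents can end up in the same consensus cluster even if they rarely co-occur directly; a stricter threshold reduces that effect.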

    Efficient & Effective Selective Query Rewriting with Efficiency Predictions

    To enhance effectiveness, a user's query can be rewritten internally by the search engine in many ways, for example by applying proximity, or by expanding the query with related terms. However, approaches that benefit effectiveness often have a negative impact on efficiency, which in turn harms user satisfaction if the query is excessively slow. In this paper, we propose a novel framework for using the predicted execution time of various query rewritings to select between alternatives on a per-query basis, in a manner that ensures both effectiveness and efficiency. In particular, we propose the prediction of the execution time of ephemeral (e.g., proximity) posting lists generated from uni-gram inverted index posting lists, which are used in establishing the permissible query rewriting alternatives that may execute in the allowed time. Experiments examining both the effectiveness and efficiency of the proposed approach demonstrate that a 49% decrease in mean response time (and 62% decrease in 95th-percentile response time) can be attained without significantly hindering the effectiveness of the search engine.
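
The per-query selection step described above can be sketched as follows. This is only an illustration under stated assumptions: the `Rewriting` type, the `expected_gain` score, the example timings, and the fallback to the fastest alternative are hypothetical, and the paper's actual execution-time predictor is not reproduced here.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Rewriting:
    name: str
    predicted_ms: float   # predicted execution time (assumed predictor output)
    expected_gain: float  # assumed estimate of the effectiveness benefit


def select_rewriting(candidates: List[Rewriting], budget_ms: float) -> Rewriting:
    """Pick the most beneficial rewriting predicted to run within the budget."""
    if not candidates:
        raise ValueError("at least one candidate rewriting is required")
    # Permissible alternatives: those predicted to execute in the allowed time.
    permissible = [c for c in candidates if c.predicted_ms <= budget_ms]
    if permissible:
        return max(permissible, key=lambda c: c.expected_gain)
    # Assumed fallback: nothing fits, so take the fastest alternative.
    return min(candidates, key=lambda c: c.predicted_ms)


# Hypothetical alternatives for a single query.
candidates = [
    Rewriting("original", predicted_ms=40.0, expected_gain=0.0),
    Rewriting("proximity", predicted_ms=120.0, expected_gain=0.3),
    Rewriting("expansion", predicted_ms=300.0, expected_gain=0.5),
]
print(select_rewriting(candidates, budget_ms=150.0).name)
```

Separating the time prediction from the selection rule keeps the budget a tunable parameter, so the same candidates can be re-evaluated under a different response-time target.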

    Precise Image Exploration With Cluster Analysis

    With the rise of digital multimedia, the number of images a user must sort through to find one that closely matches their needs and preferences has become increasingly unmanageable. Even when searching for a narrow topic, it can be nearly impossible to find an image that meets a specific preference by going through all the possible images. To combat this growing problem, we describe an exploration system built on deep neural networks that lets users quickly narrow the possible images down to their preferred ones. By design, our exploration system sidesteps the need to match the user’s query directly to a small group of images, so it can serve users images that would traditionally be too difficult to group together and match to a query. We propose to use deep metric learning and clustering to group the images, an approach that handles two problems that hold back traditional neural networks in this setting: unseen image groups and shifting definitions.

    Transferring Learning To Rank Models for Web Search

    Learning to rank techniques provide mechanisms for combining document feature values into learned models that produce effective rankings. However, issues concerning the transferability of learned models between different corpora or subsets of the same corpus are not yet well understood. For instance, is the importance of different feature sets consistent between subsets of a corpus, and does a learned model obtained on a small subset of the corpus transfer effectively to the larger corpus? By formulating our experiments around two null hypotheses, in this work, we apply a full-factorial experiment design to empirically investigate these questions using the ClueWeb09 and ClueWeb12 corpora, combined with queries from the TREC Web track. Among other observations, our experiments reveal that ClueWeb09 remains an effective choice of training corpus for learning effective models for ClueWeb12, and also that the importance of query independent features varies between the ClueWeb09 and ClueWeb12 corpora. In doing so, this work contributes an important study into the transferability of learning to rank models, as well as empirically derived best practices for effective retrieval on the ClueWeb12 corpus.

    A Study on a Navigation Support System That Considers Search Intent

    University of Tsukuba, Master's thesis (Informatics); degree conferred March 24, 2017 (No. 37779)

    Entity-Oriented Search

    This open access book covers all facets of entity-oriented search (where “search” can be interpreted in the broadest sense of information access) from a unified point of view, and provides a coherent and comprehensive overview of the state of the art. It represents the first synthesis of research in this broad and rapidly developing area. Selected topics are discussed in depth, the goal being to establish fundamental techniques and methods as a basis for future research and development. Additional topics are treated at a survey level only, containing numerous pointers to the relevant literature. A roadmap for future research, based on open issues and challenges identified along the way, rounds out the book. The book is divided into three main parts, sandwiched between introductory and concluding chapters. The first two chapters introduce readers to the basic concepts, provide an overview of entity-oriented search tasks, and present the various types and sources of data that will be used throughout the book. Part I deals with the core task of entity ranking: given a textual query, possibly enriched with additional elements or structural hints, return a ranked list of entities. This core task is examined in a number of different variants, using both structured and unstructured data collections, and numerous query formulations. In turn, Part II is devoted to the role of entities in bridging unstructured and structured data. Part III explores how entities can enable search engines to understand the concepts, meaning, and intent behind the query that the user enters into the search box, and how they can provide rich and focused responses (as opposed to merely a list of documents), a process known as semantic search. The final chapter concludes the book by discussing the limitations of current approaches, and suggesting directions for future research. Researchers and graduate students are the primary target audience of this book. A general background in information retrieval is sufficient to follow the material, including an understanding of basic probability and statistics concepts as well as a basic knowledge of machine learning concepts and supervised learning algorithms.