Search CORE

256 research outputs found

Efficient Diversification of Web Search Results

Author: Capannini Gabriele
Nardini Franco Maria
Perego Raffaele
Silvestri Fabrizio
Publication venue
Publication date: 01/01/2011
Field of study

In this paper we analyze the efficiency of various search results diversification methods. While efficacy of diversification approaches has been deeply investigated in the past, response time and scalability issues have been rarely addressed. A unified framework for studying performance and feasibility of result diversification solutions is thus proposed. First we define a new methodology for detecting when, and how, query results need to be diversified. To this purpose, we rely on the concept of "query refinement" to estimate the probability of a query to be ambiguous. Then, relying on this novel ambiguity detection method, we deploy and compare on a standard test set, three different diversification methods: IASelect, xQuAD, and OptSelect. While the first two are recent state-of-the-art proposals, the latter is an original algorithm introduced in this paper. We evaluate both the efficiency and the effectiveness of our approach against its competitors by using the standard TREC Web diversification track testbed. Results shown that OptSelect is able to run two orders of magnitude faster than the two other state-of-the-art approaches and to obtain comparable figures in diversification effectiveness.Comment: VLDB201

arXiv.org e-Print Archive

CiteSeerX

Combining implicit and explicit topic representations for result diversification

Author: He J. (Jiyin)
Hollink V. (Vera)
Vries A.P. (Arjen) de
Publication venue: 'American College of Medical Physics (ACMP)'
Publication date: 01/01/2012
Field of study

Result diversification deals with ambiguous or multi-faceted queries by providing documents that cover as many subtopics of a query as possible. Various approaches to subtopic modeling have been proposed. Subtopics have been extracted internally, e.g., from retrieved documents, and externally, e.g., from Web resources such as query logs. Internally modeled subtopics are often implicitly represented, e.g., as latent topics, while externally modeled subtopics are often explicitly represented, e.g., as reformulated queries. We propose a framework that: i) combines both implicitly and explicitly represented subtopics; and ii) allows flexible combination of multiple external resources in a transparent and unified manner. Specifically, we use a random walk based approach to estimate the similarities of the explicit subtopics mined from a number of heterogeneous resources: click logs, anchor text, and web n-grams. We then use these similarities to regularize the latent topics extracted from the top-ranked documents, i.e., the internal (implicit) subtopics. Empirical results show that regularization with explicit subtopics extracted from the right resource leads to improved diversification results, indicating that the proposed regularization with (explicit) external resources forms better (implicit) topic models. Click logs and anchor text are shown to be more effective resources than web n-grams under current experimental settings. Combining resources does not always lead to better results, but achieves a robust performance. This robustness is important for two reasons: it cannot be predicted which resources will be most effective for a given query, and it is not yet known how to reliably determine the optimal model parameters for building implicit topic models

CiteSeerX

CWI's Institutional Repository

Interactive information retrieval

Author: Allan
Barry
Bates
Beaulieu
Beaulieu
Belkin
Belkin
Bhavnani
Blair
Borgman
Borgman
Brajnik
Broder
Buyukkokten
Byström
Campbell
Case
Chen
Cove
Crestani
Crouch
Downie
Dumais
Eastman
Efthimiadis
Ellis
Ellis
Fidel
Ford
Ford
Foster
Fox
Hansen
Harper
Hearst
Hearst
Hearst
Heinström
Hill
Ingwersen
Ingwersen
Jansen
Jansen
Jones
Jones
Kang
Kelly
Kelly
Kim
Konstan
Kruschwitz
Kuhlthau
Legg
Lin
Lin
Lorigo
Lynch
López-Ostenero
Maña-López
Niemi
Norman
Over
Pirkola
Pu
Radev
Reid
Reid
Riedl
Rieh
Robertson
Rosenfeld
Roussinov
Ruthven
Ruthven
Savolainen
Shipman
Shneiderman
Sihvonen
Slone
Smeaton
Spink
Spink
Spink
Spink
Spink
Spink
Spärck Jones
Spärck Jones
Sweeney
Tombros
Tombros
Toms
Topi
Topi
Vakkari
Vakkari
Vakkari
Vakkari
van der Eijk
Vechtomova
Voorhees
White
White
White
White
Wiesman
Wu
Xie
Publication venue: 'Wiley'
Publication date: 01/11/2008
Field of study

Crossref

University of Strathclyde Institutional Repository

Acquiring symbolic design optimization problem reformulation knowledge: On computable relationships between design syntax and semantics

Author: Sarkar Somwrita
Publication venue: Faculty of Architecture, Design and Planning
Publication date: 01/01/2009
Field of study

This thesis presents a computational method for the inductive inference of explicit and implicit semantic design knowledge from the symbolic-mathematical syntax of design formulations using an unsupervised pattern recognition and extraction approach. Existing research shows that AI / machine learning based design computation approaches either require high levels of knowledge engineering or large training databases to acquire problem reformulation knowledge. The method presented in this thesis addresses these methodological limitations. The thesis develops, tests, and evaluates ways in which the method may be employed for design problem reformulation. The method is based on the linear algebra based factorization method Singular Value Decomposition (SVD), dimensionality reduction and similarity measurement through unsupervised clustering. The method calculates linear approximations of the associative patterns of symbol cooccurrences in a design problem representation to infer induced coupling strengths between variables, constraints and system components. Unsupervised clustering of these approximations is used to identify useful reformulations. These two components of the method automate a range of reformulation tasks that have traditionally required different solution algorithms. Example reformulation tasks that it performs include selection of linked design variables, parameters and constraints, design decomposition, modularity and integrative systems analysis, heuristically aiding design “case” identification, topology modeling and layout planning. The relationship between the syntax of design representation and the encoded semantic meaning is an open design theory research question. Based on the results of the method, the thesis presents a set of theoretical postulates on computable relationships between design syntax and semantics. The postulates relate the performance of the method with empirical findings and theoretical insights provided by cognitive neuroscience and cognitive science on how the human mind engages in symbol processing and the resulting capacities inherent in symbolic representational systems to encode “meaning”. The performance of the method suggests that semantic “meaning” is a higher order, global phenomenon that lies distributed in the design representation in explicit and implicit ways. A one-to-one local mapping between a design symbol and its meaning, a largely prevalent approach adopted by many AI and learning algorithms, may not be sufficient to capture and represent this meaning. By changing the theoretical standpoint on how a “symbol” is defined in design representations, it was possible to use a simple set of mathematical ideas to perform unsupervised inductive inference of knowledge in a knowledge-lean and training-lean manner, for a knowledge domain that traditionally relies on “giving” the system complex design domain and task knowledge for performing the same set of tasks

Sydney eScholarship

Supporting Source Code Search with Context-Aware and Semantics-Driven Query Reformulation

Author: Rahman Mohammad Masudur 1985-
Publication venue: 'University of Saskatchewan Library'
Publication date: 04/10/2019
Field of study

Software bugs and failures cost trillions of dollars every year, and could even lead to deadly accidents (e.g., Therac-25 accident). During maintenance, software developers fix numerous bugs and implement hundreds of new features by making necessary changes to the existing software code. Once an issue report (e.g., bug report, change request) is assigned to a developer, she chooses a few important keywords from the report as a search query, and then attempts to find out the exact locations in the software code that need to be either repaired or enhanced. As a part of this maintenance, developers also often select ad hoc queries on the fly, and attempt to locate the reusable code from the Internet that could assist them either in bug fixing or in feature implementation. Unfortunately, even the experienced developers often fail to construct the right search queries. Even if the developers come up with a few ad hoc queries, most of them require frequent modifications which cost significant development time and efforts. Thus, construction of an appropriate query for localizing the software bugs, programming concepts or even the reusable code is a major challenge. In this thesis, we overcome this query construction challenge with six studies, and develop a novel, effective code search solution (BugDoctor) that assists the developers in localizing the software code of interest (e.g., bugs, concepts and reusable code) during software maintenance. In particular, we reformulate a given search query (1) by designing novel keyword selection algorithms (e.g., CodeRank) that outperform the traditional alternatives (e.g., TF-IDF), (2) by leveraging the bug report quality paradigm and source document structures which were previously overlooked and (3) by exploiting the crowd knowledge and word semantics derived from Stack Overflow Q&A site, which were previously untapped. Our experiment using 5000+ search queries (bug reports, change requests, and ad hoc queries) suggests that our proposed approach can improve the given queries significantly through automated query reformulations. Comparison with 10+ existing studies on bug localization, concept location and Internet-scale code search suggests that our approach can outperform the state-of-the-art approaches with a significant margin

University of Saskatchewan Research Archive

Recommended from our members

A Proportionality-based Approach to Search Result Diversification

Author: Dang Van Bac
Publication venue: ScholarWorks@UMass Amherst
Publication date: 29/08/2014
Field of study

Search result diversification addresses the problem of queries with unclear information needs. The aim of using diversification techniques is to find a ranking of documents that covers multiple possible interpretations, aspects, or topics for a given query. By explicitly providing diversity in search results, this approach can increase the likelihood that users will find documents relevant to their specific intent, thereby improving effectiveness. This dissertation introduces a new perspective on diversity: diversity by proportionality. We consider a result list more diverse, with respect to some set of topics related to the query, when the ratio between the number of relevant documents it provides for each of these topics matches more closely with the topic popularity distribution. Consequently, we derive an effectiveness measure based on proportionality and propose a new framework for optimizing proportionality in search results, which we show to be more effective than existing techniques. Diversification would be impractical without the ability to automatically infer the set of topics associated with the user queries. Therefore, we study cluster-based techniques for generating these topics from publicly available data sources. Based on the challenges that we observe with topic generation, we present a simplified term-based representation for query topics. Specifically, we propose to identify for each query a single set of terms that describes its topics. This set is provided to a diversification technique which in effect treats each of the terms as a topic to determine coverage in the search results. We call this approach term level diversification and we show that it can promote diversity with respect to the topics underlying the input terms. This simplifies the task of finding a set of query topics, which has proven difficult, to finding only a set of terms. We also present a technique as well as several data sources for generating these terms effectively

ScholarWorks@UMass Amherst

Mining information interaction behavior:Academic papers and enterprise emails

Author: Li X.
Publication venue
Publication date: 01/01/2018
Field of study

International Migration, Integration and Social Cohesion online publications