1,271 research outputs found

    Utilizing sub-topical structure of documents for information retrieval.

    Get PDF
    Text segmentation in natural language processing typically refers to the process of decomposing a document into constituent subtopics. Our work centers on the application of text segmentation techniques within information retrieval (IR) tasks. For example, for scoring a document by combining the retrieval scores of its constituent segments, exploiting the proximity of query terms in documents for ad-hoc search, and for question answering (QA), where retrieved passages from multiple documents are aggregated and presented as a single document to a searcher. Feedback in ad hoc IR task is shown to benefit from the use of extracted sentences instead of terms from the pseudo relevant documents for query expansion. Retrieval effectiveness for patent prior art search task is enhanced by applying text segmentation to the patent queries. Another aspect of our work involves augmenting text segmentation techniques to produce segments which are more readable with less unresolved anaphora. This is particularly useful for QA and snippet generation tasks where the objective is to aggregate relevant and novel information from multiple documents satisfying user information need on one hand, and ensuring that the automatically generated content presented to the user is easily readable without reference to the original source document

    Task-Oriented Query Reformulation with Reinforcement Learning

    Full text link
    Search engines play an important role in our everyday lives by assisting us in finding the information we need. When we input a complex query, however, results are often far from satisfactory. In this work, we introduce a query reformulation system based on a neural network that rewrites a query to maximize the number of relevant documents returned. We train this neural network with reinforcement learning. The actions correspond to selecting terms to build a reformulated query, and the reward is the document recall. We evaluate our approach on three datasets against strong baselines and show a relative improvement of 5-20% in terms of recall. Furthermore, we present a simple method to estimate a conservative upper-bound performance of a model in a particular environment and verify that there is still large room for improvements.Comment: EMNLP 201

    Personalized Item Ranking from Implicit User Feedback: A Heterogeneous Information Network Approach

    Get PDF
    In today’s era of the digital world with information overload, generating personalized recommendations for the e-commerce users is a challenging and interesting problem. Recommendation of top-N items of interest to a user of e-commerce is highly challenging using binary implicit feedback. The training data is usually very sparse and have binary values capturing a user’s action or inaction. Due to the sparseness of data and lack of explicit user preferences, the recommendations generated by model-based and neighborhood-based approaches are not effective. Of late, network-based item recommendation methods, which utilize item related meta-information, are beginning to attract increasing attention for binary implicit feedback data. In this work, we propose a heterogeneous information network based recommendation model for personalized top-N recommendations using binary implicit feedback data. To utilize the potential of meta-information related to items, we utilize the concept of meta-path. To improve the effectiveness of the recommendations, the popularity of items and interest of users are leveraged simultaneously. Personalized weight learning of various meta-paths in the network is performed to determine the intrinsic interests of users from the binary implicit feedback data. To show the effectiveness, the proposed model is experimentally evaluated using the real-world dataset. Available at: https://aisel.aisnet.org/pajais/vol9/iss2/3

    Mobile content enrichment

    Full text link
    Delivering an effective mobile search service is challenging for many reasons. Certainly small-screen mobile handsets with limited text input capabilities do not make ideal search devices. In addition, the brevity of Mobile Internet content hampers effective indexing and limits retrieval opportunities. In this paper we focus on this indexing issue and describe an approach that leverages Web search engines as a source of content enrichment. We present an evaluation using a mobile news service that demonstrated significant improvements in search performance compared to a standard benchmark sys-tem

    Temporally Biased Search Result Snippets

    Get PDF
    The search engine result snippets are an important source of information for the user to obtain quick insights into the corresponding result documents. When the search terms are too general, like a person\u27s name or a company\u27s name, creating an appropriate snippet that effectively summarizes the document\u27s content can be challenging owing to multiple occurrences of the search term in the top ranked documents, without a simple means to select a subset of sentences containing them to form result snippet. In web pages classified as narratives and news articles, multiple references to explicit, implicit and relative temporal expressions can be found. Based on these expressions, the sentences can be ordered on a timeline. In this thesis, we propose the idea of generation of an alternate search results snippet, by exploiting these temporal expressions embedded within the pages, using a timeline map. Our method of snippets generation is mainly targeted at general search terms. At present, when the search terms are too general, the existing systems generate static snippets for resultant pages like displaying the first line. In our approach, we introduce an alternate method of extracting and selecting temporal data from these pages to adapt a snippet to be a more effective summary. Specifically, it selects and blends temporally interesting sentences. Using weighted kappa measure, we evaluate our approach by comparing snippets generated for multiple search terms based on existing systems and snippets generated by using our approach

    Promoting user engagement and learning in search tasks by effective document representation

    Get PDF
    Much research in information retrieval (IR) focuses on optimisation of the rank of relevant retrieval results for single shot ad hoc IR tasks. Relatively little research has been carried out on supporting and promoting user engagement within search tasks. We seek to improve user experience by use of enhanced document snippets to be presented during the search process to promote user engagement with retrieved information. The primary role of document snippets within search has traditionally been to indicate the potential relevance of retrieved items to the user’s information need. Beyond the relevance of an item, it is generally not possible to infer the contents of individual ranked results just by reading the current snippets. We hypothesise that the creation of richer document snippets and summaries, and effective presentation of this information to users will promote effective search and greater user engagement, and support emerging areas such as learning through search. We generate document summaries for a given query by extracting top relevant sentences from retrieved documents. Creation of these summaries goes beyond exist- ing snippet creation methods by comparing content between documents to take into account novelty when selecting content for inclusion in individual document sum- maries. Further, we investigate the readability of the generated summaries with the overall goal of generating snippets which not only help a user to identify document relevance, but are also designed to increase the user’s understanding and knowledge of a topic gained while inspecting the snippets. We perform a task-based user study to record the user’s interactions, search be- haviour and feedback to evaluate the effectiveness of our snippets using qualitative and quantitative measures. In our user study, we found that richer snippets generated in this work improved the user experience and topical knowledge, and helped users to learn about the topic effectively

    Supporting Source Code Search with Context-Aware and Semantics-Driven Query Reformulation

    Get PDF
    Software bugs and failures cost trillions of dollars every year, and could even lead to deadly accidents (e.g., Therac-25 accident). During maintenance, software developers fix numerous bugs and implement hundreds of new features by making necessary changes to the existing software code. Once an issue report (e.g., bug report, change request) is assigned to a developer, she chooses a few important keywords from the report as a search query, and then attempts to find out the exact locations in the software code that need to be either repaired or enhanced. As a part of this maintenance, developers also often select ad hoc queries on the fly, and attempt to locate the reusable code from the Internet that could assist them either in bug fixing or in feature implementation. Unfortunately, even the experienced developers often fail to construct the right search queries. Even if the developers come up with a few ad hoc queries, most of them require frequent modifications which cost significant development time and efforts. Thus, construction of an appropriate query for localizing the software bugs, programming concepts or even the reusable code is a major challenge. In this thesis, we overcome this query construction challenge with six studies, and develop a novel, effective code search solution (BugDoctor) that assists the developers in localizing the software code of interest (e.g., bugs, concepts and reusable code) during software maintenance. In particular, we reformulate a given search query (1) by designing novel keyword selection algorithms (e.g., CodeRank) that outperform the traditional alternatives (e.g., TF-IDF), (2) by leveraging the bug report quality paradigm and source document structures which were previously overlooked and (3) by exploiting the crowd knowledge and word semantics derived from Stack Overflow Q&A site, which were previously untapped. Our experiment using 5000+ search queries (bug reports, change requests, and ad hoc queries) suggests that our proposed approach can improve the given queries significantly through automated query reformulations. Comparison with 10+ existing studies on bug localization, concept location and Internet-scale code search suggests that our approach can outperform the state-of-the-art approaches with a significant margin
    corecore