237 research outputs found
Effective summarisation for search engines
Users of information retrieval (IR) systems issue queries to find information in large collections of documents. Nearly all IR systems return answers in the form of a list of results, where each entry typically consists of the title of the underlying document, a link, and a short query-biased summary of a document's content called a snippet. As retrieval systems typically return a mixture of relevant and non-relevant answers, the role of the snippet is to guide users to identify those documents that are likely to be good answers and to ignore those that are less useful. This thesis focuses on techniques to improve the generation and evaluation of query-biased summaries for informational requests, where users typically need to inspect several documents to fulfil their information needs. We investigate the following issues: how users construct query-biased summaries, and how this compares with current automatic summarisation methods; how query expansion can be applied to sentence-level ranking to improve the quality of query-biased summaries; and, how to evaluate these summarisation approaches using sentence-level relevance data. First, through an eye tracking study, we investigate the way in which users select information from documents when they are asked to construct a query-biased summary in response to a given search request. Our analysis indicates that user behaviour differs from the assumptions of current state-of-the-art query-biased summarisation approaches. A major cause of difference resulted from vocabulary mismatch, a common IR problem. This thesis then examines query expansion techniques to improve the selection of candidate relevant sentences, and to reduce the vocabulary mismatch observed in the previous study. We employ a Cranfield-based methodology to quantitatively assess sentence ranking methods based on sentence-level relevance assessments available in the TREC Novelty track, in line with previous work. We study two aspects of sentence-level evaluation of this track. First, whether sentences that have been judged based on relevance, as in the TREC Novelty track, can also be considered to be indicative; that is, useful in terms of being part of a query-biased summary and guiding users to make correct document selections. By conducting a crowdsourcing experiment, we find that relevance and indicativeness agree around 73% of the time. Second, during our evaluations we discovered a bias that longer sentences were more likely to be judged as relevant. We then propose a novel evaluation of sentence ranking methods, which aims to isolate the sentence length bias. Using our enhanced evaluation method, we find that query expansion can effectively assist in the selection of short sentences. We conclude our investigation with a second study to examine the effectiveness of query expansion in query-biased summarisation methods to end users. Our results indicate that participants significantly tend to prefer query-biased summaries aided through expansion techniques approximately 60% of the time, for query-biased summaries comprised of short and middle length sentences. We suggest that our findings can inform the generation and display of query-biased summaries of IR systems such as search engines
Video browsing interfaces and applications: a review
We present a comprehensive review of the state of the art in video browsing and retrieval systems, with special emphasis on interfaces and applications. There has been a significant increase in activity (e.g., storage, retrieval, and sharing) employing video data in the past decade, both for personal and professional use. The ever-growing amount of video content available for human consumption and the inherent characteristics of video data—which, if presented in its raw format, is rather unwieldy and costly—have become driving forces for the development of more effective solutions to present video contents and allow rich user interaction. As a result, there are many contemporary research efforts toward developing better video browsing solutions, which we summarize. We review more than 40 different video browsing and retrieval interfaces and classify them into three groups: applications that use video-player-like interaction, video retrieval applications, and browsing solutions based on video surrogates. For each category, we present a summary of existing work, highlight the technical aspects of each solution, and compare them against each other
CHORUS Deliverable 2.2: Second report - identification of multi-disciplinary key issues for gap analysis toward EU multimedia search engines roadmap
After addressing the state-of-the-art during the first year of Chorus and establishing the existing landscape in
multimedia search engines, we have identified and analyzed gaps within European research effort during our second year.
In this period we focused on three directions, notably technological issues, user-centred issues and use-cases and socio-
economic and legal aspects. These were assessed by two central studies: firstly, a concerted vision of functional breakdown
of generic multimedia search engine, and secondly, a representative use-cases descriptions with the related discussion on
requirement for technological challenges. Both studies have been carried out in cooperation and consultation with the
community at large through EC concertation meetings (multimedia search engines cluster), several meetings with our
Think-Tank, presentations in international conferences, and surveys addressed to EU projects coordinators as well as
National initiatives coordinators. Based on the obtained feedback we identified two types of gaps, namely core
technological gaps that involve research challenges, and “enablers”, which are not necessarily technical research
challenges, but have impact on innovation progress. New socio-economic trends are presented as well as emerging legal
challenges
Using social semantic knowledge to improve annotations in personal photo collections
Instituto Politécnico de Lisboa (IPL) e Instituto Superior de Engenharia de Lisboa (ISEL)apoio concedido pela bolsa SPRH/PROTEC/67580/2010, que apoiou parcialmente este trabalh
Implicit feedback for interactive information retrieval
Searchers can find the construction of query statements for submission to Information Retrieval (IR) systems a problematic activity. These problems are confounded by uncertainty about the information they are searching for, or an unfamiliarity with the retrieval system being used or collection being searched. On the World Wide Web these problems are potentially more acute as searchers receive little or no training in how to search effectively. Relevance feedback (RF) techniques allow searchers to directly communicate what information is relevant and help them construct improved query statements. However, the techniques require explicit relevance assessments that intrude on searchers’ primary lines of activity and as such, searchers may be unwilling to provide this feedback. Implicit feedback systems are unobtrusive and make inferences of what is relevant based on searcher interaction. They gather information to better represent searcher needs whilst minimising the burden of explicitly reformulating queries or directly providing relevance information.
In this thesis I investigate implicit feedback techniques for interactive information retrieval. The techniques proposed aim to increase the quality and quantity of searcher interaction and use this interaction to infer searcher interests. I develop search interfaces that use representations of the top-ranked retrieved documents such as sentences and summaries to encourage a deeper examination of search results and drive the information seeking process.
Implicit feedback frameworks based on heuristic and probabilistic approaches are described. These frameworks use interaction to identify needs and estimate changes in these needs during a search. The evidence gathered is used to modify search queries and make new search decisions such as re-searching the document collection or restructuring already retrieved information. The term selection models from the frameworks and elsewhere are evaluated using a simulation-based evaluation methodology that allows different search scenarios to be modelled. Findings show that the probabilistic term selection model generated the most effective search queries and learned what was relevant in the shortest time.
Different versions of an interface that implements the probabilistic framework are evaluated to test it with human subjects and investigate how much control they want over its decisions. The experiment involved 48 subjects with different skill levels and search experience. The results show that searchers are happy to delegate responsibility to RF systems for relevance assessment (through implicit feedback), but not more severe search decisions such as formulating queries or selecting retrieval strategies. Systems that help searchers make these decisions are preferred to those that act directly on their behalf or await searcher action
Question-driven text summarization with extractive-abstractive frameworks
Automatic Text Summarisation (ATS) is becoming increasingly important due to the exponential growth of textual content on the Internet. The primary goal of an ATS system is to generate a condensed version of the key aspects in the input document while minimizing redundancy. ATS approaches are extractive, abstractive, or hybrid. The extractive approach selects the most important sentences in the input document(s) and then concatenates them to form the summary. The abstractive approach represents the input document(s) in an intermediate form and then constructs the summary using different sentences than the originals. The hybrid approach combines both the extractive and abstractive approaches. The query-based ATS selects the information that is most relevant to the initial search query. Question-driven ATS is a technique to produce concise and informative answers to specific questions using a document collection.
In this thesis, a novel hybrid framework is proposed for question-driven ATS taking advantage of extractive and abstractive summarisation mechanisms. The framework consists of complementary modules that work together to generate an effective summary: (1) discovering appropriate non-redundant sentences as plausible answers using a multi-hop question answering system based on a Convolutional Neural Network (CNN), multi-head attention mechanism and reasoning process; and (2) a novel paraphrasing Generative Adversarial Network (GAN) model based on transformers rewrites the extracted sentences in an abstractive setup. In addition, a fusing mechanism is proposed for compressing the sentence pairs selected by a next sentence prediction model in the paraphrased summary. Extensive experiments on various datasets are performed, and the results show the model can outperform many question-driven and query-based baseline methods. The proposed model is adaptable to generate summaries for the questions in the closed domain and open domain. An online summariser demo is designed based on the proposed model for the industry use to process the technical text
Enhanced web-based summary generation for search.
After a user types in a search query on a major search engine, they are presented with a number of search results. Each search result is made up of a title, brief text summary and a URL. It is then the user\u27s job to select documents for further review. Our research aims to improve the accuracy of users selecting relevant documents by improving the way these web pages are summarized. Improvements in accuracy will lead to time improvements and user experience improvements. We propose ReClose, a system for generating web document summaries. ReClose generates summary content through combining summarization techniques from query-biased and query-independent summary generation. Query-biased summaries generally provide query terms in context. Query-independent summaries focus on summarizing documents as a whole. Combining these summary techniques led to a 10% improvement in user decision making over Google generated summaries. Color-coded ReClose summaries provide keyword usage depth at a glance and also alert users to topic departures. Color-coding further enhanced ReClose results and led to a 20% improvement in user decision making over Google generated summaries. Many online documents include structure and multimedia of various forms such as tables, lists, forms and images. We propose to include this structure in web page summaries. We found that the expert user was insignificantly slowed in decision making while the majority of average users made decisions more quickly using summaries including structure without any decrease in decision accuracy. We additionally extended ReClose for use in summarizing large numbers of tweets in tracking flu outbreaks in social media. The resulting summaries have variable length and are effective at summarizing flu related trends. Users of the system obtained an accuracy of 0.86 labeling multi-tweet summaries. This showed that the basis of ReClose is effective outside of web documents and that variable length summaries can be more effective than fixed length. Overall the ReClose system provides unique summaries that contain more informative content than current search engines produce, highlight the results in a more meaningful way, and add structure when meaningful. The applications of ReClose extend far beyond search and have been demonstrated in summarizing pools of tweets
Recommended from our members
Towards Nootropia : a non-linear approach to adaptive document filtering
In recent years, it has become increasingly difficult for users to find relevant information within the accessible glut. Research in Information Filtering (IF) tackles this problem through a tailored representation of the user interests, a user profile. Traditionally, IF inherits techniques from the related and more well established domains of Information Retrieval and Text Categorisation. These include, linear profile representations that exclude term dependencies and may only effectively represent a single topic of interest, and linear learning algorithms that achieve a steady profile adaptation pace. We argue that these practices are not attuned to the dynamic nature of user interests. A user may be interested in more than one topic in parallel, and both frequent variations and occasional radical changes of interests are inevitable over time. With our experimental system "Nootropia", we achieve adaptive document filtering with a single, multi-topic user profile. A hierarchical term network that takes into account topical and lexical correlations between terms and identifies topic-subtopic relations between them, is used to represent a user's multiple topics of interest and distinguish between them. A series of non-linear document evaluation functions is then established on the hierarchical network. Experiments using a variation of TREC's routing subtask to test the ability of a single profile to represent two and three topics of interest, reveal the approach's superiority over a linear profile representation. Adaptation of this single, multi-topic profile to a variety of changes in the user interests, is achieved through a process of self-organisation that constantly readjusts the profile stucturally, in response to user feedback. We used virtual users and another variation of TREC's routing subtask to test the profile on two learning and two forgetting tasks. The results clearly indicate the profile's ability to adapt to both frequent variations and radical changes in user interests
- …