
    Tracking Events in Social Media

    Tracking topical events in social media streams, such as Twitter, provides a means for users to keep up-to-date on topics of interest to them. This tracking may last a period of days, or even weeks. These events and topics might be provided by users explicitly, or generated for users from selected news articles. Push notification from social media provides a method to deliver updates directly to users on their mobile devices or desktops. In this thesis, we start with a lexical comparison between carefully edited prose and social media posts, providing an improved understanding of word usage within social media. Compared with carefully edited prose, such as news articles and Wikipedia articles, the language of social media is informal in the extreme. By using word embeddings, we identify words whose usage differs greatly between a Wikipedia corpus and a Twitter corpus. Following from this work, we explore a general method for developing succinct queries, reflecting the topic of a given news article, for the purpose of tracking the associated news event within a social media stream. A series of probe queries are generated from an initial set of candidate keywords extracted from the article. By analyzing the results of these probes, we rank and trim the candidate set to create a succinct query. The method can also be used for linking and searching among different collections. Given a query for topical events, push notification directly from social media streams provides a method for users to keep up-to-date on topics of personal interest. We determine that the key to effective notification lies in controlling update volume, by establishing and maintaining appropriate thresholds for pushing updates. We explore and evaluate multiple threshold setting strategies. Push notifications should be relevant to the user's personal interests, timely, with pushes occurring as soon after the actual event as possible, and novel, providing non-duplicate information. An analysis of existing evaluation metrics for push notification reflects different assumptions regarding user requirements. This analysis leads to a framework that places different weights and penalties on different behaviours and can guide the future development of a family of evaluation metrics that more accurately models user needs. Throughout the thesis, rank similarity measures are applied to compare rankings generated by various experiments. As a final component, we develop a family of rank similarity metrics based on maximized effectiveness difference, each derived from a traditional information retrieval evaluation measure. Computing this maximized effectiveness difference (MED) requires the solution of an optimization problem that varies in difficulty, depending on the associated measure. We present solutions for several standard effectiveness measures, including nDCG, MAP, and ERR. Through experimental validation, we show that MED reveals meaningful differences between retrieval runs. Mathematically, MED is a metric, regardless of the associated measure. Prior work has established a number of other desiderata for rank similarity in the context of search, and we demonstrate that MED satisfies these requirements. Unlike previous proposals, MED allows us to directly translate assumptions about user behavior from any established effectiveness measure to create a corresponding rank similarity measure. In addition, MED cleanly accommodates partial relevance judgments, and if complete relevance information is available, it reduces to a simple difference between effectiveness values.
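    To make the MED idea concrete, the sketch below works through the simplest case, precision at depth k, where the optimization over unjudged documents has a closed form. This is an illustrative toy under assumed definitions, not the thesis's implementation; all names and data are invented.

```python
# Illustrative sketch: MED for precision@k between two rankings under
# partial judgments. For P@k the inner optimization has a closed form:
# to maximize P@k(r1) - P@k(r2), set unjudged documents appearing only
# in r1's top k to relevant and those only in r2's top k to
# non-relevant (documents in both top-k sets cancel out).

def precision_at_k(ranking, rels, k):
    """P@k given a dict mapping doc id -> 0/1 relevance."""
    return sum(rels.get(d, 0) for d in ranking[:k]) / k

def med_p_at_k(r1, r2, judged, k):
    """Maximized effectiveness difference for P@k.

    judged: dict of known doc id -> 0/1 relevance; all other
    documents are unjudged and assigned adversarially.
    """
    def one_sided(a, b):
        top_a, top_b = set(a[:k]), set(b[:k])
        rels = dict(judged)
        for d in (top_a | top_b) - set(judged):
            # Favor ranking `a`: relevant iff only `a` retrieved it.
            rels[d] = 1 if (d in top_a and d not in top_b) else 0
        return precision_at_k(a, rels, k) - precision_at_k(b, rels, k)

    return max(one_sided(r1, r2), one_sided(r2, r1))

# Example: two judged documents; the unjudged ones drive the bound.
r1 = ["d1", "d2", "d3", "d4"]
r2 = ["d1", "d5", "d3", "d6"]
print(med_p_at_k(r1, r2, judged={"d1": 1, "d3": 0}, k=4))  # 0.5
```

    With complete judgments the adversarial loop has nothing left to assign, and the value collapses to the plain difference in effectiveness, matching the reduction noted above.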

    News vertical search using user-generated content

    The thesis investigates how content produced by end-users on the World Wide Web — referred to as user-generated content — can enhance the news vertical aspect of a universal Web search engine, such that news-related queries can be satisfied more accurately, comprehensively and in a more timely manner. We propose a news search framework to describe the news vertical aspect of a universal web search engine. This framework is comprised of four components, each providing a different piece of functionality. The Top Events Identification component identifies the most important events that are happening at any given moment using discussion in user-generated content streams. The News Query Classification component classifies incoming queries as news-related or not in real-time. The Ranking News-Related Content component finds and ranks relevant content for news-related user queries from multiple streams of news and user-generated content. Finally, the News-Related Content Integration component merges the previously ranked content for the user query into the Web search ranking. In this thesis, we argue that user-generated content can be leveraged in one or more of these components to better satisfy news-related user queries. Potential enhancements include the faster identification of news queries relating to breaking news events, more accurate classification of news-related queries, increased coverage of the events searched for by the user or increased freshness in the results returned. Approaches to tackle each of the four components of the news search framework are proposed, which aim to leverage user-generated content. Together, these approaches form the news vertical component of a universal Web search engine. Each approach proposed for a component is thoroughly evaluated using one or more datasets developed for that component. Conclusions are derived concerning whether the use of user-generated content enhances the component in question using an appropriate measure, namely: effectiveness when ranking events by their current importance/newsworthiness for the Top Events Identification component; classification accuracy over different types of query for the News Query Classification component; relevance of the documents returned for the Ranking News-Related Content component; and end-user preference for rankings integrating user-generated content in comparison to the unaltered Web search ranking for the News-Related Content Integration component. Analysis of the proposed approaches themselves, the effective settings for the deployment of those approaches and insights into their behaviour are also discussed. In particular, the evaluation of the Top Events Identification component examines how effectively events — represented by newswire articles — can be ranked by their importance using two different streams of user-generated content, namely blog posts and Twitter tweets. Evaluation of the proposed approaches for this component indicates that blog posts are an effective source of evidence to use when ranking events and that these approaches achieve state-of-the-art effectiveness. Using the same approaches driven instead by a stream of tweets provides story ranking performance that is significantly more effective than random, but not consistent across all of the datasets and approaches tested. Insights are provided into the reasons for this with regard to the transient nature of discussion in Twitter.
    Through the evaluation of the News Query Classification component, we show that the use of timely features extracted from different news and user-generated content sources can increase the accuracy of news query classification over relying upon newswire provider streams alone. Evidence also suggests that the usefulness of the user-generated content sources varies as news events mature, with some sources becoming more influential over time as new content is published, leading to an upward trend in classification accuracy. The Ranking News-Related Content component evaluation investigates how to effectively rank content from the blogosphere and Twitter for news-related user queries. Of the approaches tested, we show that learning to rank approaches using features specific to blog posts/tweets lead to state-of-the-art ranking effectiveness under real-time constraints. Finally, this thesis demonstrates that the majority of end-users prefer rankings integrated with user-generated content for news-related queries to rankings containing only Web search results or integrated with only newswire articles. Of the user-generated content sources tested, the most popular source is shown to be Twitter, particularly for queries relating to breaking events. The central contributions of this thesis are the introduction of a news search framework, the approaches to tackle each of the four components of the framework that integrate user-generated content and their subsequent evaluation in a simulated real-time setting. This thesis draws insights from a broad range of experiments spanning the entire search process for news-related queries. The experiments reported in this thesis demonstrate the potential and scope for enhancements that can be brought about by the leverage of user-generated content for real-time news search and related applications.
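    As a flavour of how timely evidence from multiple streams might feed news query classification, the following is a minimal, hypothetical sketch: it derives one term-burst feature per stream and trains a standard classifier. The feature definitions, toy data, and model choice are illustrative assumptions, not those evaluated in the thesis.

```python
# Hypothetical sketch of real-time news query classification using
# term-burst features from multiple content streams (e.g. tweets,
# newswire). All statistics and data below are invented toys.
from collections import Counter
from sklearn.linear_model import LogisticRegression

def burst_ratio(recent, background, term):
    # How much more often a term occurs in the recent window than
    # historically (add-one smoothing avoids division by zero).
    return (recent[term] + 1) / (background[term] + 1)

def query_features(query, streams):
    # One feature per stream: the strongest burst among the query terms.
    terms = query.lower().split()
    return [max(burst_ratio(s["recent"], s["background"], t) for t in terms)
            for s in streams]

# Toy stream statistics: counts in a recent window vs. a long background.
streams = [
    {"recent": Counter(earthquake=50, recipe=2),
     "background": Counter(earthquake=5, recipe=40)},   # e.g. tweets
    {"recent": Counter(earthquake=20, recipe=1),
     "background": Counter(earthquake=4, recipe=10)},   # e.g. newswire
]

labeled = [("earthquake chile", 1), ("chicken recipe", 0),
           ("earthquake damage", 1), ("easy recipe ideas", 0)]
X = [query_features(q, streams) for q, _ in labeled]
y = [label for _, label in labeled]
clf = LogisticRegression().fit(X, y)

print(clf.predict([query_features("earthquake today", streams)]))  # [1]
```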

    Filtering News from Document Streams: Evaluation Aspects and Modeled Stream Utility

    Events like hurricanes, earthquakes, or accidents can impact a large number of people. Not only are people in the immediate vicinity of the event affected, but concerns about their well-being are shared by the local government and well-wishers across the world. The latest information about news events could be of use to government and aid agencies in order to make informed decisions on providing necessary support, security and relief. The general public receives news updates via dedicated news feeds or broadcasts, and lately, via social media services like Facebook or Twitter. Retrieving the latest information about newsworthy events from the world-wide web is thus of importance to a large section of society. As new content on a multitude of topics is continuously being published on the web, specific event-related information needs to be filtered from the resulting stream of documents. In this thesis, we present a user-centric evaluation measure for evaluating systems that filter news-related information from document streams. Our proposed evaluation measure, Modeled Stream Utility (MSU), models users accessing information from a stream of sentences produced by a news update filtering system. The user model allows for simulating a large number of users with different characteristic stream browsing behavior. Through simulation, MSU estimates the utility of a system for an average user browsing a stream of sentences. Our results show that system performance is sensitive to a user population's stream browsing behavior and that existing evaluation metrics correspond to very specific types of user behavior. To evaluate systems that filter sentences from a document stream, we need a set of judged sentences. This judged set is a subset of all the sentences returned by all systems, and is typically constructed by pooling together the highest quality sentences, as determined by the scores each system assigns to its sentences. Sentences in the pool are manually assessed and the resulting set of judged sentences is then used to compute system performance metrics. In this thesis, we investigate the effect that including duplicates of judged sentences in the judged set has on system performance evaluation. We also develop an alternative pooling methodology that, given the MSU user model, selects sentences for pooling based on the probability of a sentence being read by modeled users. Our research lays the foundation for interesting future work on utilizing user models in different aspects of the evaluation of stream filtering systems. The MSU measure enables the incorporation of different user models. Furthermore, the applicability of MSU could be extended through calibration based on user behavior.
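    The following toy simulation conveys the spirit of MSU-style evaluation: simulated users arrive in sessions, read a bounded number of sentences per session, and accumulate utility only for novel relevant information. The session and reading parameters are invented for illustration and are not MSU's actual distributions.

```python
# Toy illustration of MSU-style evaluation: simulate users browsing a
# timestamped sentence stream in sessions. Parameters are invented.
import random

def simulate_user(stream, away_mean=6.0, reads_per_session=5, rng=None):
    """Utility one simulated user gets from a sentence stream.

    stream: list of (timestamp, set_of_nugget_ids), oldest first.
    Sessions are separated by exponential "time away"; in each session
    the user reads up to `reads_per_session` newly arrived sentences
    (any backlog beyond that is skipped -- one simple browsing
    assumption) and gains 1 unit of utility per unseen nugget.
    """
    rng = rng or random.Random(0)
    t, read_upto, seen, utility = 0.0, 0, set(), 0.0
    end = stream[-1][0]
    while t <= end:
        t += rng.expovariate(1.0 / away_mean)
        arrived = [s for s in stream[read_upto:] if s[0] <= t]
        for _, nuggets in arrived[:reads_per_session]:
            utility += len(nuggets - seen)   # reward only novel info
            seen |= nuggets
        read_upto += len(arrived)
    return utility

# Average utility over a simulated population = a system-level score.
stream = [(1.0, {"n1"}), (2.0, {"n1"}), (4.0, {"n2"}), (9.0, {"n3"})]
scores = [simulate_user(stream, rng=random.Random(i)) for i in range(1000)]
print(sum(scores) / len(scores))
```

    Varying `away_mean` and `reads_per_session` across the simulated population is how a model of this kind captures different browsing behaviors, which is the sensitivity the abstract describes.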

    Explicit web search result diversification

    Queries submitted to a web search engine are typically short and often ambiguous. With the enormous size of the Web, a misunderstanding of the information need underlying an ambiguous query can misguide the search engine, ultimately leading the user to abandon the originally submitted query. In order to overcome this problem, a sensible approach is to diversify the documents retrieved for the user's query. As a result, the likelihood that at least one of these documents will satisfy the user's actual information need is increased. In this thesis, we argue that an ambiguous query should be seen as representing not one, but multiple information needs. Based upon this premise, we propose xQuAD---Explicit Query Aspect Diversification, a novel probabilistic framework for search result diversification. In particular, the xQuAD framework naturally models several dimensions of the search result diversification problem in a principled yet practical manner. To this end, the framework represents the possible information needs underlying a query as a set of keyword-based sub-queries. Moreover, xQuAD accounts for the overall coverage of each retrieved document with respect to the identified sub-queries, so as to rank highly diverse documents first. In addition, it accounts for how well each sub-query is covered by the other retrieved documents, so as to promote novelty---and hence penalise redundancy---in the ranking. The framework also models the importance of each of the identified sub-queries, so as to appropriately cater for the interests of the user population when diversifying the retrieved documents. Finally, since not all queries are equally ambiguous, the xQuAD framework caters for the ambiguity level of different queries, so as to appropriately trade off relevance for diversity on a per-query basis. The xQuAD framework is general and can be used to instantiate several diversification models, including the most prominent models described in the literature. In particular, within xQuAD, each of the aforementioned dimensions of the search result diversification problem can be tackled in a variety of ways. In this thesis, as additional contributions besides the xQuAD framework, we introduce novel machine learning approaches for addressing each of these dimensions. These include a learning to rank approach for identifying effective sub-queries as query suggestions mined from a query log, an intent-aware approach for choosing the ranking models most likely to be effective for estimating the coverage and novelty of multiple documents with respect to a sub-query, and a selective approach for automatically predicting how much to diversify the documents retrieved for each individual query. In addition, we perform the first empirical analysis of the role of novelty as a diversification strategy for web search. As demonstrated throughout this thesis, the principles underlying the xQuAD framework are general, sound, and effective. In particular, to validate the contributions of this thesis, we thoroughly assess the effectiveness of xQuAD under the standard experimentation paradigm provided by the diversity task of the TREC 2009, 2010, and 2011 Web tracks. The results of this investigation demonstrate the effectiveness of our proposed framework.
    Indeed, xQuAD attains consistent and significant improvements in comparison to the most effective diversification approaches in the literature, and across a range of experimental conditions, comprising multiple input rankings, multiple sub-query generation and coverage estimation mechanisms, as well as queries with multiple levels of ambiguity. Altogether, these results corroborate the state-of-the-art diversification performance of xQuAD.
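    For readers unfamiliar with xQuAD, the sketch below follows the greedy objective as it is commonly presented in the literature: a document's score mixes its relevance to the original query with its coverage of sub-queries not yet satisfied by previously selected documents. The probability estimators, mixing parameter, and example data are assumed inputs, not values from the thesis.

```python
# Simplified sketch of xQuAD's greedy re-ranking objective, following
# its common presentation in the literature. Probability estimators
# are assumed to be supplied by the caller; lambda is illustrative.

def xquad_rerank(candidates, p_rel, subqueries, p_sub, p_cov, lam=0.5, k=10):
    """Greedily build a diversified top-k ranking.

    p_rel[d]: estimated relevance of d to the original query, P(d|q)
    p_sub[qi]: importance of sub-query qi, P(qi|q)
    p_cov[(d, qi)]: coverage of sub-query qi by document d, P(d|qi)
    """
    selected = []
    # not_covered[qi] = product over selected d_j of (1 - P(d_j|qi)),
    # i.e. the probability that qi is still unsatisfied so far.
    not_covered = {qi: 1.0 for qi in subqueries}
    remaining = set(candidates)
    while remaining and len(selected) < k:
        def score(d):
            diversity = sum(p_sub[qi] * p_cov[(d, qi)] * not_covered[qi]
                            for qi in subqueries)
            return (1 - lam) * p_rel[d] + lam * diversity
        best = max(remaining, key=score)
        remaining.remove(best)
        selected.append(best)
        for qi in subqueries:
            not_covered[qi] *= 1 - p_cov[(best, qi)]
    return selected

# Toy example: two interpretations of an ambiguous query. d3 is less
# relevant than d2 but covers the otherwise-unserved "car" intent.
docs = ["d1", "d2", "d3"]
p_rel = {"d1": 0.9, "d2": 0.85, "d3": 0.6}
subs = ["q_animal", "q_car"]
p_sub = {"q_animal": 0.6, "q_car": 0.4}
p_cov = {("d1", "q_animal"): 0.9, ("d1", "q_car"): 0.0,
         ("d2", "q_animal"): 0.9, ("d2", "q_car"): 0.0,
         ("d3", "q_animal"): 0.0, ("d3", "q_car"): 0.9}
print(xquad_rerank(docs, p_rel, subs, p_sub, p_cov, k=3))
# ['d1', 'd3', 'd2'] -- d3 is promoted above the more relevant d2
```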

    A semi-automated FAQ retrieval system for HIV/AIDS

    This thesis describes a semi-automated FAQ retrieval system that can be queried by users through short text messages on low-end mobile phones to provide answers on HIV/AIDS related queries. First, we address the issue of result presentation on low-end mobile phones by proposing an iterative interaction retrieval strategy where the user engages with the FAQ retrieval system in the question answering process. At each iteration, the system returns only one question-answer pair to the user and the iterative process terminates after the user's information need has been satisfied. Since the proposed system is iterative, this thesis attempts to reduce the number of iterations (search length) between the users and the system so that users do not abandon the search process before their information need has been satisfied. Moreover, we conducted a user study to determine the number of iterations that users are willing to tolerate before abandoning the iterative search process. We subsequently used the bad abandonment statistics from this study to develop an evaluation measure for estimating the probability that any random user will be satisfied when using our FAQ retrieval system. In addition, we used a query log and its click-through data to address three main FAQ document collection deficiency problems in order to improve the retrieval performance and the probability that any random user will be satisfied when using our FAQ retrieval system. Conclusions are derived concerning whether we can reduce the rate at which users abandon their search before their information need has been satisfied by using information from previous searches to: address the term mismatch problem between the users' SMS queries and the relevant FAQ documents in the collection; selectively rank the FAQ documents according to how often they have been previously identified as relevant by users for a particular query term; and identify those queries that do not have a relevant FAQ document in the collection. In particular, we proposed a novel template-based approach that uses queries from a query log for which the true relevant FAQ documents are known to enrich the FAQ documents with additional terms in order to alleviate the term mismatch problem. These terms are added as a separate field in a field-based model using two different proposed enrichment strategies, namely the Term Frequency and the Term Occurrence strategies. This thesis thoroughly investigates the effectiveness of the aforementioned FAQ document enrichment strategies using three different field-based models. Our findings suggest that we can improve the overall recall and the probability that any random user will be satisfied by enriching the FAQ documents with additional terms from queries in our query log. Moreover, our investigation suggests that it is important to use an FAQ document enrichment strategy that takes into consideration the number of times a term occurs in the query when enriching the FAQ documents. We subsequently show that our proposed enrichment approach for alleviating the term mismatch problem generalises well on other datasets. Through the evaluation of our proposed approach for selectively ranking the FAQ documents, we show that we can improve the retrieval performance and the probability that any random user will be satisfied when using our FAQ retrieval system by incorporating the click popularity score of a query term t on an FAQ document d into the scoring and ranking process. Our results generalised well on a new dataset.
    However, when we deployed the click popularity score of a query term t on an FAQ document d on an enriched FAQ document collection, we saw a decrease in the retrieval performance and the probability that any random user will be satisfied when using our FAQ retrieval system. Furthermore, we used our query log to build a binary classifier for detecting those queries that do not have a relevant FAQ document in the collection (Missing Content Queries, or MCQs). Before building such a classifier, we empirically evaluated several feature sets in order to determine the best combination of features for building a model that yields the best classification accuracy in identifying the MCQs and the non-MCQs. Using a different dataset, we show that we can improve the overall retrieval performance and the probability that any random user will be satisfied when using our FAQ retrieval system by deploying an MCQ detection subsystem in our FAQ retrieval system to filter out the MCQs. Finally, this thesis demonstrates that correcting spelling errors can help improve the retrieval performance and the probability that any random user will be satisfied when using our FAQ retrieval system. We tested our FAQ retrieval system with two different testing sets, one containing the original SMS queries and the other containing the SMS queries which were manually corrected for spelling errors. Our results show a significant improvement in the retrieval performance and the probability that any random user will be satisfied when using our FAQ retrieval system.
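    As one plausible reading of the two enrichment strategies, the sketch below builds an additional document field from query-log terms, counting either every term occurrence (Term Frequency) or at most one occurrence per query (Term Occurrence). The query-log format and weights are invented for illustration, not taken from the thesis.

```python
# Illustrative sketch of the two FAQ-enrichment strategies, under an
# invented query-log format. Each FAQ document gains an extra field
# built from past queries known to be relevant to it:
#   - "tf": a term is counted once per occurrence across queries.
#   - "to": a term is counted at most once per query it occurs in.
from collections import Counter

def enrich(query_log, strategy="tf"):
    """query_log: list of (query_text, relevant_faq_id) pairs.
    Returns {faq_id: Counter of enrichment-field term weights}."""
    fields = {}
    for query, faq_id in query_log:
        terms = query.lower().split()
        update = Counter(terms) if strategy == "tf" else Counter(set(terms))
        fields.setdefault(faq_id, Counter()).update(update)
    return fields

log = [("hiv test how how often", "faq7"), ("how often hiv test", "faq7")]
print(enrich(log, "tf")["faq7"])   # 'how' weighted 3 (every occurrence)
print(enrich(log, "to")["faq7"])   # 'how' weighted 2 (once per query)
```

    In a field-based retrieval model, this extra field would then be scored alongside the question and answer fields, which is how enrichment can bridge the vocabulary gap between SMS queries and FAQ text.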

    Pretrained Transformers for Text Ranking: BERT and Beyond

    The goal of text ranking is to generate an ordered list of texts retrieved from a corpus in response to a query. Although the most common formulation of text ranking is search, instances of the task can also be found in many natural language processing applications. This survey provides an overview of text ranking with neural network architectures known as transformers, of which BERT is the best-known example. The combination of transformers and self-supervised pretraining has been responsible for a paradigm shift in natural language processing (NLP), information retrieval (IR), and beyond. In this survey, we provide a synthesis of existing work as a single point of entry for practitioners who wish to gain a better understanding of how to apply transformers to text ranking problems and researchers who wish to pursue work in this area. We cover a wide range of modern techniques, grouped into two high-level categories: transformer models that perform reranking in multi-stage architectures and dense retrieval techniques that perform ranking directly. There are two themes that pervade our survey: techniques for handling long documents, beyond typical sentence-by-sentence processing in NLP, and techniques for addressing the tradeoff between effectiveness (i.e., result quality) and efficiency (e.g., query latency, model and index size). Although transformer architectures and pretraining techniques are recent innovations, many aspects of how they are applied to text ranking are relatively well understood and represent mature techniques. However, there remain many open research questions, and thus in addition to laying out the foundations of pretrained transformers for text ranking, this survey also attempts to prognosticate where the field is heading.
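    A minimal sketch of the first category, reranking in a multi-stage architecture, is given below using the Hugging Face transformers library: a cross-encoder scores each query-passage pair jointly and the candidates are re-sorted by score. The checkpoint named here is one publicly available example, not one the survey prescribes.

```python
# Minimal cross-encoder reranking sketch: score query-passage pairs
# jointly with a pretrained transformer, then sort by score. In a
# multi-stage architecture, the passages would come from a cheap
# first-stage retriever such as BM25.
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

name = "cross-encoder/ms-marco-MiniLM-L-6-v2"  # one public example
tokenizer = AutoTokenizer.from_pretrained(name)
model = AutoModelForSequenceClassification.from_pretrained(name)
model.eval()

query = "what causes tides"
passages = [
    "Tides are caused by the gravitational pull of the moon and sun.",
    "The stock market rose sharply on Tuesday.",
]

# Tokenize query and passage together so the model attends across both.
batch = tokenizer([query] * len(passages), passages,
                  padding=True, truncation=True, return_tensors="pt")
with torch.no_grad():
    scores = model(**batch).logits.squeeze(-1)   # one score per pair

for score, passage in sorted(zip(scores.tolist(), passages), reverse=True):
    print(f"{score:.2f}  {passage}")
```

    Dense retrieval, the survey's second category, differs in that queries and texts are encoded independently into vectors so that ranking reduces to a nearest-neighbour search rather than a per-pair forward pass.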

    Supervised extractive summarisation of news events

    This thesis investigates whether the summarisation of news-worthy events can be improved by using evidence about entities (i.e., people, places, and organisations) involved in the events. More effective event summaries, that better assist people with their news-based information access requirements, can help to reduce information overload in today's 24-hour news culture. Summaries are based on sentences extracted verbatim from news articles about the events. Within a supervised machine learning framework, we propose a series of entity-focused event summarisation features. Computed over multiple news articles discussing a given event, such entity-focused evidence estimates: the importance of entities within events; the significance of interactions between entities within events; and the topical relevance of entities to events. The statement of this research is that by augmenting supervised summarisation models, which are trained on discriminative multi-document newswire summarisation features, with evidence about the named entities involved in the events, through the integration of entity-focused event summarisation features, we will obtain more effective summaries of news-worthy events. The proposed entity-focused event summarisation features are thoroughly evaluated over two multi-document newswire summarisation scenarios. The first scenario is used to evaluate the retrospective event summarisation task, where the goal is to summarise an event to-date, based on a static set of news articles discussing the event. The second scenario is used to evaluate the temporal event summarisation task, where the goal is to summarise the changes in an ongoing event, based on a time-stamped stream of news articles discussing the event. The contributions of this thesis are two-fold. First, this thesis investigates the utility of entity-focused event evidence for identifying important and salient event summary sentences, and as a means to perform anti-redundancy filtering to control the volume of content emitted as a summary of an evolving event. Second, this thesis also investigates the validity of automatic summarisation evaluation metrics, the effectiveness of standard summarisation baselines, and the effective training of supervised machine learned summarisation models.
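    To illustrate what an entity-focused feature might look like, the toy sketch below scores a candidate sentence by the corpus-wide mention frequency of the entities it contains. The entity extraction is a deliberately naive capitalisation heuristic and the feature definition is invented; the thesis's features and entity recognition are more sophisticated.

```python
# Toy sketch of one entity-focused summarisation feature: score each
# candidate sentence by how frequently its entities are mentioned
# across all articles about the event. A supervised model would use
# this as one feature among many.
from collections import Counter
import re

def entity_mentions(text):
    """Naive stand-in for NER: capitalised tokens as entity mentions."""
    return re.findall(r"\b[A-Z][a-z]+\b", text)

def entity_importance(articles):
    """Normalised mention frequency of each entity across the event."""
    counts = Counter()
    for article in articles:
        counts.update(entity_mentions(article))
    total = sum(counts.values()) or 1
    return {e: c / total for e, c in counts.items()}

def sentence_feature(sentence, importance):
    """Sum of the importance of entities mentioned in the sentence."""
    return sum(importance.get(e, 0.0) for e in entity_mentions(sentence))

articles = ["Nadal beat Federer in Paris.",
            "Federer praised Nadal after the final."]
imp = entity_importance(articles)
for s in ["Nadal beat Federer in Paris.", "The weather was warm."]:
    print(f"{sentence_feature(s, imp):.2f}  {s}")
```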

    Design problems in crowdsourcing: improving the quality of crowd-based data collection

    Text, images, and other types of information objects can be described in many ways. Having detailed metadata and various people's interpretations of the object helps in providing better access and use. While collecting novel descriptions is challenging, crowdsourcing presents new opportunities to do so. Large-scale human contributions open the door to latent information, subjective judgments, and other encodings of data that are otherwise difficult to infer algorithmically. However, such contributions are also subject to variance from the inconsistencies of human interpretation. This dissertation studies the problem of variance in crowdsourcing and investigates how it can be controlled both through post-collection modeling and better collection-time design decisions. Crowd-contributed data is affected by many inconsistencies that differ from automated processes: differences in attention, interpretation, skill, and engagement. The types of tasks that we require of humans are also more inherently abstract and more difficult to agree on. In particular, qualitative or judgment-based tasks may be subjective, affected by contributor opinions and tastes. Approaches to understanding contribution variance and improving data quality are studied in three spaces. First, post-collection modeling is pursued as a way of improving crowdsourced data quality, looking at whether factors including time, experience, and agreement with others provide indicators of contribution quality. Second, collection-time design problems are studied, comparing design manipulations for a controlled set of tasks. Since crowdsourcing is borne out of an interaction, not all crowdsourcing data corrections are posterior: it also matters how you collect that data. Finally, designing for subjective contexts is studied. Crowds are well-positioned to teach us about how information can be adapted to different person-specific needs, but treating subjective tasks similarly to other tasks results in unnecessary error. The primary contribution of this work is an understanding of crowd data quality improvements from non-adversarial perspectives: that is, focusing on sources of variance or error beyond poor contributors. This includes findings that:

    1. Collection interface design has a vital influence on the quality of collected data, and better guiding contributors can improve crowdsourced contribution quality without greatly raising the cost of collection or impeding other quality control strategies.

    2. Different interpretations of instructions threaten reliability and accuracy in crowdsourcing. This source of problems affects even trustworthy, attentive contributors. However, contributor quality can be inferred very early in an interaction for possible interventions.

    3. Certain design choices improve the quality of contributions in tasks that call for them. Anchoring reduces contributor-specific error, training affirms or corrects contributors' understanding of the task, and performance feedback can motivate middling contributors to exercise more care. Particularly notable due to its simplicity, an intervention that foregrounds instructions in an explicitly dismissable window greatly improves contribution quality.

    4. Paid crowdsourcing, often used for tasks with an assumed ground truth, can also be applied in subjective contexts. It is promising for on-demand personalization contexts, such as recommendation without prior data for training.

    5. Two approaches are found to improve the quality of tasks for subjective crowdsourcing. Matching contributors to a target person based on similarity is good for long-term interactions or for bootstrapping multi-target systems. Alternatively, explicitly asking contributors to make sense of a target person and customize work for them is especially good for tasks with broad decision spaces and is more enjoyable to perform.

    The findings in this dissertation contribute to the crowdsourcing research space as well as providing practical improvements to crowd collection best practices.
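    As a small illustration of the post-collection modeling direction, the sketch below computes one commonly used quality signal, a contributor's rate of agreement with per-item majority labels. The data format is invented, and a real pipeline would combine this signal with the other indicators studied (time, experience, task design).

```python
# Illustrative sketch of one post-collection quality signal: each
# contributor's rate of agreement with the item-level majority label.
from collections import Counter, defaultdict

def agreement_scores(labels):
    """labels: list of (worker_id, item_id, label) tuples.
    Returns {worker_id: fraction of answers matching the majority}."""
    by_item = defaultdict(list)
    for worker, item, label in labels:
        by_item[item].append(label)

    # Majority label per item (ties broken arbitrarily).
    majority = {item: Counter(votes).most_common(1)[0][0]
                for item, votes in by_item.items()}

    hits, totals = Counter(), Counter()
    for worker, item, label in labels:
        totals[worker] += 1
        hits[worker] += (label == majority[item])
    return {w: hits[w] / totals[w] for w in totals}

labels = [("w1", "i1", "cat"), ("w2", "i1", "cat"), ("w3", "i1", "dog"),
          ("w1", "i2", "dog"), ("w2", "i2", "dog"), ("w3", "i2", "dog")]
print(agreement_scores(labels))  # w3 agrees less often: 0.5
```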