44 research outputs found

    Tracking Events in Social Media

    Get PDF
    Tracking topical events in social media streams, such as Twitter, provides a means for users to keep up-to-date on topics of interest to them. This tracking may last a period of days, or even weeks. These events and topics might be provided by users explicitly, or generated for users from selected news articles. Push notification from social media provides a method to push the updates directly to the users on their mobile devices or desktops. In this thesis, we start with a lexical comparison between carefully edited prose and social media posts, providing an improved understanding of word usage within social media. Compared with carefully edited prose, such as news articles and Wikipedia articles, the language of social media is informal in the extreme. By using word embeddings, we identify words whose usage differs greatly between a Wikipedia corpus and a Twitter corpus. Following from this work, we explore a general method for developing succinct queries, reflecting the topic of a given news article, for the purpose of tracking the associated news event within a social media stream. A series of probe queries are generated from an initial set of candidate keywords extracted from the article. By analyzing the results of these probes, we rank and trim the candidate set to create a succinct query. The method can also be used for linking and searching among different collections. Given a query for topical events, push notification to users directly from social media streams provides a method for them to keep up-to-date on topics of personal interest. We determine that the key to effective notification lies in controlling of update volume, by establishing and maintaining appropriate thresholds for pushing updates. We explore and evaluate multiple threshold setting strategies. Push notifications should be relevant to the personal interest, and timely, with pushes occurring as soon as after the actual event occurrence as possible and novel for providing non-duplicate information. An analysis of existing evaluation metrics for push notification reflects different assumptions regarding user requirements. This analysis leads to a framework that places different weights and penalties on different behaviours and can guide the future development of a family of evaluation metrics that more accurately models user needs. Throughout the thesis, rank similarity measures are applied to compare rankings generated by various experiments. As a final component, we develop a family of rank similarity metrics based on maximized effectiveness difference, each derived from a traditional information retrieval evaluation measure. Computing this maximized effectiveness difference (MED) requires the solution of an optimization problem that varies in difficulty, depending on the associated measure. We present solutions for several standard effectiveness measures, including nDCG, MAP, and ERR. Through experimental validation, we show that MED reveals meaningful differences between retrieval runs. Mathematically, MED is a metric, regardless of the associated measure. Prior work has established a number of other desiderata for rank similarity in the context of search, and we demonstrate that MED satisfies these requirements. Unlike previous proposals, MED allows us to directly translate assumptions about user behavior from any established effectiveness measure to create a corresponding rank similarity measure. In addition, MED cleanly accommodates partial relevance judgments, and if complete relevance information is available, it reduces to a simple difference between effectiveness values

    DIR 2011: Dutch_Belgian Information Retrieval Workshop Amsterdam

    Get PDF

    Leveraging social relevance : using social networks to enhance literature access and microblog search

    Get PDF
    L'objectif principal d'un système de recherche d'information est de sélectionner les documents pertinents qui répondent au besoin en information exprimé par l'utilisateur à travers une requête. Depuis les années 1970-1980, divers modèles théoriques ont été proposés dans ce sens pour représenter les documents et les requêtes d'une part et les apparier d'autre part, indépendamment de tout utilisateur. Plus récemment, l'arrivée du Web 2.0 ou le Web social a remis en cause l'efficacité de ces modèles du fait qu'ils ignorent l'environnement dans lequel l'information se situe. En effet, l'utilisateur n'est plus un simple consommateur de l'information mais il participe également à sa production. Pour accélérer la production de l'information et améliorer la qualité de son travail, l'utilisateur échange de l'information avec son voisinage social dont il partage les mêmes centres d'intérêt. Il préfère généralement obtenir l'information d'un contact direct plutôt qu'à partir d'une source anonyme. Ainsi, l'utilisateur, influencé par son environnement socio-cultuel, donne autant d'importance à la proximité sociale de la ressource d'information autant qu'à la similarité des documents à sa requête. Dans le but de répondre à ces nouvelles attentes, la recherche d'information s'oriente vers l'implication de l'utilisateur et de sa composante sociale dans le processus de la recherche. Ainsi, le nouvel enjeu de la recherche d'information est de modéliser la pertinence compte tenu de la position sociale et de l'influence de sa communauté. Le second enjeu est d'apprendre à produire un ordre de pertinence qui traduise le mieux possible l'importance et l'autorité sociale. C'est dans ce cadre précis, que s'inscrit notre travail. Notre objectif est d'estimer une pertinence sociale en intégrant d'une part les caractéristiques sociales des ressources et d'autre part les mesures de pertinence basées sur les principes de la recherche d'information classique. Nous proposons dans cette thèse d'intégrer le réseau social d'information dans le processus de recherche d'information afin d'utiliser les relations sociales entre les acteurs sociaux comme une source d'évidence pour mesurer la pertinence d'un document en réponse à une requête. Deux modèles de recherche d'information sociale ont été proposés à des cadres applicatifs différents : la recherche d'information bibliographique et la recherche d'information dans les microblogs. Les importantes contributions de chaque modèle sont détaillées dans la suite. Un modèle social pour la recherche d'information bibliographique. Nous avons proposé un modèle générique de la recherche d'information sociale, déployé particulièrement pour l'accès aux ressources bibliographiques. Ce modèle représente les publications scientifiques au sein d'réseau social et évalue leur importance selon la position des auteurs dans le réseau. Comparativement aux approches précédentes, ce modèle intègre des nouvelles entités sociales représentées par les annotateurs et les annotations sociales. En plus des liens de coauteur, ce modèle exploite deux autres types de relations sociales : la citation et l'annotation sociale. Enfin, nous proposons de pondérer ces relations en tenant compte de la position des auteurs dans le réseau social et de leurs mutuelles collaborations. Un modèle social pour la recherche d'information dans les microblogs.} Nous avons proposé un modèle pour la recherche de tweets qui évalue la qualité des tweets selon deux contextes: le contexte social et le contexte temporel. Considérant cela, la qualité d'un tweet est estimé par l'importance sociale du blogueur correspondant. L'importance du blogueur est calculée par l'application de l'algorithme PageRank sur le réseau d'influence sociale. Dans ce même objectif, la qualité d'un tweet est évaluée selon sa date de publication. Les tweets soumis dans les périodes d'activité d'un terme de la requête sont alors caractérisés par une plus grande importance. Enfin, nous proposons d'intégrer l'importance sociale du blogueur et la magnitude temporelle avec les autres facteurs de pertinence en utilisant un modèle Bayésien.An information retrieval system aims at selecting relevant documents that meet user's information needs expressed with a textual query. For the years 1970-1980, various theoretical models have been proposed in this direction to represent, on the one hand, documents and queries and on the other hand to match information needs independently of the user. More recently, the arrival of Web 2.0, known also as the social Web, has questioned the effectiveness of these models since they ignore the environment in which the information is located. In fact, the user is no longer a simple consumer of information but also involved in its production. To accelerate the production of information and improve the quality of their work, users tend to exchange documents with their social neighborhood that shares the same interests. It is commonly preferred to obtain information from a direct contact rather than from an anonymous source. Thus, the user, under the influenced of his social environment, gives as much importance to the social prominence of the information as the textual similarity of documents at the query. In order to meet these new prospects, information retrieval is moving towards novel user centric approaches that take into account the social context within the retrieval process. Thus, the new challenge of an information retrieval system is to model the relevance with regards to the social position and the influence of individuals in their community. The second challenge is produce an accurate ranking of relevance that reflects as closely as possible the importance and the social authority of information producers. It is in this specific context that fits our work. Our goal is to estimate the social relevance of documents by integrating the social characteristics of resources as well as relevance metrics as defined in classical information retrieval field. We propose in this work to integrate the social information network in the retrieval process and exploit the social relations between social actors as a source of evidence to measure the relevance of a document in response to a query. Two social information retrieval models have been proposed in different application frameworks: literature access and microblog retrieval. The main contributions of each model are detailed in the following. A social information model for flexible literature access. We proposed a generic social information retrieval model for literature access. This model represents scientific papers within a social network and evaluates their importance according to the position of respective authors in the network. Compared to previous approaches, this model incorporates new social entities represented by annotators and social annotations (tags). In addition to co-authorships, this model includes two other types of social relationships: citation and social annotation. Finally, we propose to weight these relationships according to the position of authors in the social network and their mutual collaborations. A social model for information retrieval for microblog search. We proposed a microblog retrieval model that evaluates the quality of tweets in two contexts: the social context and temporal context. The quality of a tweet is estimated by the social importance of the corresponding blogger. In particular, blogger's importance is calculated by the applying PageRank algorithm on the network of social influence. With the same aim, the quality of a tweet is evaluated according to its date of publication. Tweets submitted in periods of activity of query terms are then characterized by a greater importance. Finally, we propose to integrate the social importance of blogger and the temporal magnitude tweets as well as other relevance factors using a Bayesian network model

    Re-ranking Real-time Web Tweets to Find Reliable and Influential Twitterers

    Get PDF
    Twitter is a powerful social media tool to share information on different topics around the world. Following different users/accounts is the most effective way to get information propagated in Twitter. Due to Twitter's limited searching and lack of navigation support, searching Twitter is not easy and requires effort to find reliable information. This thesis proposed a new methodology to rank tweets based on their authority with the goal of aiding users identifying influential Twitterers. This methodology, HIRKM rank, is influenced by PageRank, Alexa Rank, original tweet or a retweet and the use of hash tags to determine the authorisation of each tweet. This method is applied to rank TREC 2011 microblogging dataset which contains over 16 million tweets based on 50 predefined topics. The results are a list of tweets presented in a descending order based on their authorities which are relevant to the users search queries and will be evaluated using TREC’s official golden standard for the microblogging dataset

    Security and Privacy for Modern Wireless Communication Systems

    Get PDF
    The aim of this reprint focuses on the latest protocol research, software/hardware development and implementation, and system architecture design in addressing emerging security and privacy issues for modern wireless communication networks. Relevant topics include, but are not limited to, the following: deep-learning-based security and privacy design; covert communications; information-theoretical foundations for advanced security and privacy techniques; lightweight cryptography for power constrained networks; physical layer key generation; prototypes and testbeds for security and privacy solutions; encryption and decryption algorithm for low-latency constrained networks; security protocols for modern wireless communication networks; network intrusion detection; physical layer design with security consideration; anonymity in data transmission; vulnerabilities in security and privacy in modern wireless communication networks; challenges of security and privacy in node–edge–cloud computation; security and privacy design for low-power wide-area IoT networks; security and privacy design for vehicle networks; security and privacy design for underwater communications networks

    An Information Retrieval Test Collection for English SMS Conversations

    Get PDF
    Information retrieval research for informal conversational settings differs in important ways from the more traditional goal of document retrieval. The goal of this research is to build an information retrieval test collection from informal conversational messages and to demonstrate the use of that collection to compare the retrieval effectiveness of some information retrieval systems. The test collection is based on the Linguistic Data Consortium's collection of more than 8,000 English SMS (Short Message Service) conversations, which contain more than 120,000 individual messages. The collection is described, followed by a description of the processes for creating and collecting topics, performing relevance judgments, and establishing baseline results. The findings indicate that traditional approaches for building information retrieval test collections can reasonably be applied to preclustered SMS conversations, but that the process of creating relevance judgments is somewhat more challenging and thus the reliable detection of differences in system effectiveness is somewhat more complex

    Statistical models for the analysis of short user-generated documents: author identification for conversational documents

    Get PDF
    In recent years short user-generated documents have been gaining popularity on the Internet and attention in the research communities. This kind of documents are generated by users of the various online services: platforms for instant messaging communication, for real-time status posting, for discussing and for writing reviews. Each of these services allows users to generate written texts with particular properties and which might require specific algorithms for being analysed. In this dissertation we are presenting our work which aims at analysing this kind of documents. We conducted qualitative and quantitative studies to identify the properties that might allow for characterising them. We compared the properties of these documents with the properties of standard documents employed in the literature, such as newspaper articles, and defined a set of characteristics that are distinctive of the documents generated online. We also observed two classes within the online user-generated documents: the conversational documents and those involving group discussions. We later focused on the class of conversational documents, that are short and spontaneous. We created a novel collection of real conversational documents retrieved online (e.g. Internet Relay Chat) and distributed it as part of an international competition (PAN @ CLEF'12). The competition was about author characterisation, which is one of the possible studies of authorship attribution documented in the literature. Another field of study is authorship identification, that became our main topic of research. We approached the authorship identification problem in its closed-class variant. For each problem we employed documents from the collection we released and from a collection of Twitter messages, as representative of conversational or short user-generated documents. We proved the unsuitability of standard authorship identification techniques for conversational documents and proposed novel methods capable of reaching better accuracy rates. As opposed to standard methods that worked well only for few authors, the proposed technique allowed for reaching significant results even for hundreds of users

    News vertical search using user-generated content

    Get PDF
    The thesis investigates how content produced by end-users on the World Wide Web — referred to as user-generated content — can enhance the news vertical aspect of a universal Web search engine, such that news-related queries can be satisfied more accurately, comprehensively and in a more timely manner. We propose a news search framework to describe the news vertical aspect of a universal web search engine. This framework is comprised of four components, each providing a different piece of functionality. The Top Events Identification component identifies the most important events that are happening at any given moment using discussion in user-generated content streams. The News Query Classification component classifies incoming queries as news-related or not in real-time. The Ranking News-Related Content component finds and ranks relevant content for news-related user queries from multiple streams of news and user-generated content. Finally, the News-Related Content Integration component merges the previously ranked content for the user query into theWeb search ranking. In this thesis, we argue that user-generated content can be leveraged in one or more of these components to better satisfy news-related user queries. Potential enhancements include the faster identification of news queries relating to breaking news events, more accurate classification of news-related queries, increased coverage of the events searched for by the user or increased freshness in the results returned. Approaches to tackle each of the four components of the news search framework are proposed, which aim to leverage user-generated content. Together, these approaches form the news vertical component of a universal Web search engine. Each approach proposed for a component is thoroughly evaluated using one or more datasets developed for that component. Conclusions are derived concerning whether the use of user-generated content enhances the component in question using an appropriate measure, namely: effectiveness when ranking events by their current importance/newsworthiness for the Top Events Identification component; classification accuracy over different types of query for the News Query Classification component; relevance of the documents returned for the Ranking News-Related Content component; and end-user preference for rankings integrating user-generated content in comparison to the unalteredWeb search ranking for the News-Related Content Integration component. Analysis of the proposed approaches themselves, the effective settings for the deployment of those approaches and insights into their behaviour are also discussed. In particular, the evaluation of the Top Events Identification component examines how effectively events — represented by newswire articles — can be ranked by their importance using two different streams of user-generated content, namely blog posts and Twitter tweets. Evaluation of the proposed approaches for this component indicates that blog posts are an effective source of evidence to use when ranking events and that these approaches achieve state-of-the-art effectiveness. Using the same approaches instead driven by a stream of tweets, provide a story ranking performance that is significantly more effective than random, but is not consistent across all of the datasets and approaches tested. Insights are provided into the reasons for this with regard to the transient nature of discussion in Twitter. Through the evaluation of the News Query Classification component, we show that the use of timely features extracted from different news and user-generated content sources can increase the accuracy of news query classification over relying upon newswire provider streams alone. Evidence also suggests that the usefulness of the user-generated content sources varies as news events mature, with some sources becoming more influential over time as new content is published, leading to an upward trend in classification accuracy. The Ranking News-Related Content component evaluation investigates how to effectively rank content from the blogosphere and Twitter for news-related user queries. Of the approaches tested, we show that learning to rank approaches using features specific to blog posts/tweets lead to state-of-the-art ranking effectiveness under real-time constraints. Finally this thesis demonstrates that the majority of end-users prefer rankings integrated with usergenerated content for news-related queries to rankings containing only Web search results or integrated with only newswire articles. Of the user-generated content sources tested, the most popular source is shown to be Twitter, particularly for queries relating to breaking events. The central contributions of this thesis are the introduction of a news search framework, the approaches to tackle each of the four components of the framework that integrate user-generated content and their subsequent evaluation in a simulated real-time setting. This thesis draws insights from a broad range of experiments spanning the entire search process for news-related queries. The experiments reported in this thesis demonstrate the potential and scope for enhancements that can be brought about by the leverage of user-generated content for real-time news search and related applications

    Towards MANET-based Recommender Systems for Open Facilities

    Get PDF
    Nowadays, most recommender systems are based on a centralized architecture, which can cause crucial issues in terms of trust, privacy, dependability, and costs. In this paper, we propose a decentralized and distributed MANET-based (Mobile Ad-hoc NETwork) recommender system for open facilities. The system is based on mobile devices that collect sensor data about users locations to derive implicit ratings that are used for collaborative filtering recommendations. The mechanisms of deriving ratings and propagating them in a MANET network are discussed in detail. Finally, extensive experiments demonstrate the suitability of the approach in terms of different performance metrics. © 2021, The Author(s)

    Cross-Platform Question Answering in Social Networking Services

    Get PDF
    The last two decades have made the Internet a major source for knowledge seeking. Several platforms have been developed to find answers to one's questions such as search engines and online encyclopedias. The wide adoption of social networking services has pushed the possibilities even further by giving people the opportunity to stimulate the generation of answers that are not already present on the Internet. Some of these social media services are primarily community question answering (CQA) sites, while the others have a more general audience but can also be used to ask and answer questions. The choice of a particular platform (e.g., a CQA site, a microblogging service, or a search engine) by some user depends on several factors such as awareness of available resources and expectations from different platforms, and thus will sometimes be suboptimal. Hence, we introduce \emph{cross-platform question answering}, a framework that aims to improve our ability to satisfy complex information needs by returning answers from different platforms, including those where the question has not been originally asked. We propose to build this core capability by defining a general architecture for designing and implementing real-time services for answering naturally occurring questions. This architecture consists of four key components: (1) real-time detection of questions, (2) a set of platforms from which answers can be returned, (3) question processing by the selected answering systems, which optionally involves question transformation when questions are answered by services that enforce differing conventions from the original source, and (4) answer presentation, including ranking, merging, and deciding whether to return the answer. We demonstrate the feasibility of this general architecture by instantiating a restricted development version in which we collect the questions from one CQA website, one microblogging service or directly from the asker, and find answers from among some subset of those CQA and microblogging services. To enable the integration of new answering platforms in our architecture, we introduce a framework for automatic evaluation of their effectiveness
    corecore