
    Leveraging social relevance : using social networks to enhance literature access and microblog search

    The main objective of an information retrieval system is to select the relevant documents that meet the information need a user expresses through a query. Since the 1970s and 1980s, various theoretical models have been proposed to represent documents and queries on the one hand and to match them on the other, independently of any user. More recently, the arrival of Web 2.0, also known as the social Web, has called the effectiveness of these models into question, since they ignore the environment in which the information is situated. Indeed, the user is no longer a simple consumer of information but also participates in its production. To accelerate the production of information and improve the quality of their work, users exchange information with a social neighbourhood that shares the same interests, and they generally prefer to obtain information from a direct contact rather than from an anonymous source. Thus users, influenced by their socio-cultural environment, give as much importance to the social proximity of an information resource as to the textual similarity of documents to their query. To meet these new expectations, information retrieval is moving towards user-centric approaches that involve the user and their social context in the retrieval process. The first challenge for an information retrieval system is therefore to model relevance with regard to the social position of individuals and the influence of their community. The second challenge is to learn to produce a relevance ranking that reflects as closely as possible the importance and social authority of information producers. Our work falls within this specific context. Our goal is to estimate a social relevance that integrates, on the one hand, the social characteristics of resources and, on the other, relevance measures based on the principles of classical information retrieval. In this thesis we propose to integrate the social information network into the retrieval process, using the social relations between social actors as a source of evidence for measuring the relevance of a document in response to a query. Two social information retrieval models are proposed for different application settings: literature access and microblog search. The main contributions of each model are detailed below. A social model for literature access. We propose a generic social information retrieval model, deployed in particular for access to bibliographic resources. This model represents scientific publications within a social network and evaluates their importance according to the position of their authors in the network. Compared to previous approaches, it incorporates new social entities, namely annotators and social annotations (tags). In addition to co-authorship links, it exploits two other types of social relationship: citation and social annotation. Finally, we propose to weight these relationships according to the position of the authors in the social network and their mutual collaborations. A social model for microblog search. We propose a tweet retrieval model that evaluates the quality of tweets in two contexts: the social context and the temporal context. The quality of a tweet is estimated by the social importance of the corresponding blogger, which is computed by applying the PageRank algorithm to the social influence network. In the same vein, the quality of a tweet is also evaluated according to its publication date: tweets submitted during the activity periods of a query term are given greater importance. Finally, we propose to integrate the blogger's social importance and the temporal magnitude of tweets with the other relevance factors using a Bayesian network model.
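    As a rough illustration of the microblog model just described, the sketch below computes a blogger's social importance with a plain power-iteration PageRank over a toy influence graph and folds it into a tweet score; the graph, damping factor, temporal boost and the simple multiplicative combination are illustrative assumptions (the thesis combines the factors through a Bayesian network).

```python
# Minimal sketch: social importance via PageRank on an influence graph,
# combined with textual and temporal evidence. The graph, damping factor
# and the naive product combination are illustrative assumptions.

def pagerank(graph, damping=0.85, iterations=50):
    """Power-iteration PageRank over {node: [followed nodes]}."""
    nodes = list(graph)
    rank = {n: 1.0 / len(nodes) for n in nodes}
    for _ in range(iterations):
        new_rank = {n: (1.0 - damping) / len(nodes) for n in nodes}
        for node, out_links in graph.items():
            if not out_links:
                continue
            share = damping * rank[node] / len(out_links)
            for target in out_links:
                new_rank[target] += share
        rank = new_rank
    return rank

# Toy influence network: an edge u -> v means u's attention flows to v.
influence = {"alice": ["bob"], "bob": ["carol"], "carol": ["alice", "bob"]}
importance = pagerank(influence)

def tweet_score(text_sim, author, published_in_active_period):
    # Temporal magnitude: boost tweets posted while the query term is active.
    temporal = 1.5 if published_in_active_period else 1.0
    return text_sim * importance[author] * temporal

print(tweet_score(0.8, "bob", True))
```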

    Credibility analysis of textual claims with explainable evidence

    Despite being a vast resource of valuable information, the Web has been polluted by the spread of false claims. Increasing hoaxes, fake news, and misleading information on the Web have given rise to many fact-checking websites that manually assess these doubtful claims. However, the rapid speed and large scale of misinformation spread have become the bottleneck for manual verification. This calls for credibility assessment tools that can automate this verification process. Prior works in this domain make strong assumptions about the structure of the claims and the communities where they are made. Most importantly, black-box techniques proposed in prior works lack the ability to explain why a certain statement is deemed credible or not. To address these limitations, this dissertation proposes a general framework for automated credibility assessment that does not make any assumption about the structure or origin of the claims. Specifically, we propose a feature-based model, which automatically retrieves relevant articles about the given claim and assesses its credibility by capturing the mutual interaction between the language style of the relevant articles, their stance towards the claim, and the trustworthiness of the underlying web sources. We further enhance our credibility assessment approach and propose a neural-network-based model. Unlike the feature-based model, this model does not rely on feature engineering and external lexicons. Both our models make their assessments interpretable by extracting explainable evidence from judiciously selected web sources. We utilize our models and develop a Web interface, CredEye, which enables users to automatically assess the credibility of a textual claim and inspect the assessment by browsing through judiciously and automatically selected evidence snippets. In addition, we study the problem of stance classification and propose a neural-network-based model for predicting the stance of diverse user perspectives regarding controversial claims. Given a controversial claim and a user comment, our stance classification model predicts whether the user comment supports or opposes the claim.
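    As an illustration of the feature-based model described above, the following sketch pairs each claim with three hypothetical features (language-style subjectivity, stance towards the claim, source trustworthiness) and trains an off-the-shelf classifier; the feature values and the choice of logistic regression are assumptions, not the dissertation's actual model.

```python
# Sketch of a feature-based credibility classifier: each claim is paired
# with retrieved articles, and features capture language style, stance
# towards the claim, and source trustworthiness. Feature values and the
# logistic-regression choice are illustrative assumptions.
import numpy as np
from sklearn.linear_model import LogisticRegression

# Each row: [subjectivity of article language, stance score in [-1, 1],
#            prior trust score of the web source].
X_train = np.array([
    [0.2, 0.9, 0.8],   # objective, supporting, trusted  -> credible
    [0.8, -0.7, 0.3],  # subjective, refuting, dubious   -> not credible
    [0.3, 0.6, 0.7],
    [0.9, -0.4, 0.2],
])
y_train = np.array([1, 0, 1, 0])  # 1 = credible claim

model = LogisticRegression().fit(X_train, y_train)

claim_features = np.array([[0.25, 0.8, 0.75]])
print(model.predict_proba(claim_features)[0, 1])  # credibility estimate
```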

    Twitter Activity Of Urban And Rural Colleges: A Sentiment Analysis Using The Dialogic Loop

    The purpose of the present study is to ascertain whether colleges are achieving their ultimate communication goals of maintaining and attracting students through their microblogging activity, which, according to Dialogic Loop Theory, is directly correlated with the use of positive and negative sentiment. The study focused on a cross-section of urban and rural community colleges within the United States to identify the sentiment score of their microblogging activity. The study included a content analysis of the Twitter activity of these colleges. A data-mining process was employed to collect a census of the tweets associated with these colleges. Further processing was then applied using data linguistic software that removed all irrelevant text, word abbreviations, emoticons, and other Twitter-specific classifiers. The resulting data set was then processed through a Multinomial Naive Bayes Classifier, which models the probability of word counts in a text. The classifier was trained using a data source of 1.5 million tweets, called Sentiment140, in which the corpus has been qualitatively analyzed and labeled as positive or negative sentiment. The Multinomial Naive Bayes Classifier distinguished specific wording and phrases from the corpus, comparing the data to a specific database of sentiment word identifiers. The sentiment analysis process categorized the text as being positive or negative. Finally, statistical analysis was conducted on the outcome of the sentiment analysis. A significant contribution of the current work was extending Kent and Taylor's (1998) Dialogic Loop Theory, which was designed specifically for identifying the relationship-building capabilities of a Web site, to encompass the microblogging concept used in Twitter. Specifically, Dialogic Loop Theory is applied and enhanced to develop a model for social media communication to augment relationship-building capabilities, which the current study established as a new form for evaluating Twitter tweets, labeled in the current body of work as Microblog Dialogic Communication. The implication is that by using Microblog Dialogic Communication, a college can address and correct its microblogging sentiment. The results of the data collected found that rural colleges tweeted more positive-sentiment tweets and fewer negative-sentiment tweets than urban colleges.
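    For illustration, a minimal version of the classification step might look like the sketch below; the toy training tweets stand in for the 1.5-million-tweet Sentiment140 corpus, and the preprocessing described above (removing abbreviations, emoticons, and Twitter-specific tokens) is omitted.

```python
# Sketch of the sentiment pipeline: a Multinomial Naive Bayes classifier
# over word counts. The toy training tweets stand in for the 1.5M-tweet
# Sentiment140 corpus used in the study.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

train_tweets = [
    "so proud of our graduates today",
    "great event on campus, loved it",
    "parking on campus is a nightmare",
    "registration system is down again, awful",
]
labels = [1, 1, 0, 0]  # 1 = positive, 0 = negative

classifier = make_pipeline(CountVectorizer(), MultinomialNB())
classifier.fit(train_tweets, labels)

print(classifier.predict(["loved the open day at our college"]))
```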

    A Ranking Approach to Summarising Twitter Home Timelines

    The rise of social media services has changed the ways in which users can communicate and consume content online. Whilst online social networks allow for fast and convenient delivery of knowledge, users are prone to information overload when too much information is presented for them to read and process. Automatic text summarisation is a tool to help mitigate information overload. In automatic text summarisation, short summaries are generated algorithmically from extended text, such as news articles or scientific papers. This thesis addresses the challenges in applying text summarisation to the Twitter social network. It also goes beyond text, exploiting additional information that is unique to social networks to create summaries which are personal to an intended reader. Unlike previous work in tweet summarisation, the experiments here address the home timelines of readers, which contain the incoming posts from authors to whom they have explicitly subscribed. A novel contribution is made in this work in the form of a large gold standard (19,350 tweets), the majority of which will be shared with the research community. The gold standard is a collection of timelines that have been subjectively annotated by the readers to whom they belong, allowing fair evaluation of summaries which are not limited to tweets of general interest, but which are specific to the reader. Where the home timeline is used by professional users for social media analysis, automatic text summarisation can be applied to give results which beat all baselines. In the general case, where no limitation is placed on the types of readers, personalisation features, which exploit the relationship between author and reader and the reader's own previous posts, were shown to outperform both automatic text summarisation and all baselines.
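    A minimal sketch of the ranking idea follows, assuming a simple word-overlap similarity and a hypothetical boost for closely-followed authors; the thesis's actual personalisation features and weights differ.

```python
# Sketch of the ranking approach: score each incoming tweet with textual
# and personalisation features, then keep the top-k as the summary. The
# feature weights and the similarity measure are illustrative assumptions.

def jaccard(a, b):
    """Word-overlap similarity between two texts."""
    wa, wb = set(a.lower().split()), set(b.lower().split())
    return len(wa & wb) / len(wa | wb) if wa | wb else 0.0

def summarise(timeline, reader_posts, followed_closely, k=2):
    scored = []
    for author, text in timeline:
        # Personalisation: similarity to the reader's own posts plus a
        # boost for authors the reader interacts with often.
        personal = max((jaccard(text, p) for p in reader_posts), default=0.0)
        boost = 0.3 if author in followed_closely else 0.0
        scored.append((personal + boost, author, text))
    return [t for _, _, t in sorted(scored, reverse=True)[:k]]

timeline = [("ann", "new paper on tweet summarisation"),
            ("bob", "lunch was great"),
            ("cat", "summarisation of twitter timelines with ranking")]
print(summarise(timeline, ["working on twitter summarisation"], {"ann"}))
```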

    Spatial and Temporal Sentiment Analysis of Twitter data

    The public worldwide have used Twitter to express opinions. This study focuses on the spatio-temporal variation of georeferenced Tweets' sentiment polarity, with a view to understanding how opinions evolve on Twitter over space and time and across communities of users. More specifically, the question this study tested is whether sentiment polarity on Twitter exhibits specific time-location patterns. The aim of the study is to investigate the spatial and temporal distribution of georeferenced Twitter sentiment polarity within a 1 km buffer around the Curtin Bentley campus boundary in Perth, Western Australia. Tweets posted on campus were assigned to six spatial zones and four time periods. A sentiment analysis was then conducted for each zone using the sentiment analyser tool in the Starlight Visual Information System software. The Feature Manipulation Engine was employed to convert non-spatial files into spatial and temporal feature classes. The spatial and temporal distribution of Twitter sentiment polarity patterns over space and time was mapped using Geographic Information Systems (GIS). Some interesting results were identified. For example, the highest percentage of positive Tweets occurred in the social science area, while the science and engineering and dormitory areas had the highest percentage of negative postings. The number of negative Tweets increases in the library and science and engineering areas as the end of the semester approaches, reaching a peak around the exam period, while the percentage of negative Tweets drops at the end of the semester in the entertainment and sport and dormitory areas. This study provides some insights into understanding the sentiment variation of students and staff on Twitter, which could be useful for university teaching and learning management.
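    The zoning-and-aggregation step can be sketched as follows, with illustrative zone names and polarity labels standing in for the outputs of the Starlight sentiment analyser and the GIS processing.

```python
# Sketch of the zoning step: assign each georeferenced tweet to a spatial
# zone and a time period, then aggregate sentiment per zone. Zone names
# and polarity values are illustrative; the study used six campus zones
# and four time periods.
from collections import defaultdict

tweets = [
    {"zone": "library", "period": "exam", "polarity": -1},
    {"zone": "library", "period": "exam", "polarity": -1},
    {"zone": "social science", "period": "semester", "polarity": 1},
    {"zone": "dormitory", "period": "semester", "polarity": 1},
]

counts = defaultdict(lambda: {"pos": 0, "neg": 0})
for t in tweets:
    key = (t["zone"], t["period"])
    counts[key]["pos" if t["polarity"] > 0 else "neg"] += 1

for (zone, period), c in sorted(counts.items()):
    total = c["pos"] + c["neg"]
    print(f"{zone}/{period}: {100 * c['pos'] / total:.0f}% positive")
```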

    Collaborative Working Environments : Group Needs Approach to Designing Systems for Supporting Spatially Distributed Groups

    Collaboration in spatially distributed groups requires technological support for mediating collaborative activities and members' interactions over time and distance. Technology provides multiple tools for supporting the individual, social and task requirements of collaborative groups. Nevertheless, many aspects of computer-mediated interactions are not sufficiently explained, and creating an effective computer-supported environment for collaborative groups as a combination of these tools remains a challenge. Meeting this challenge requires taking into consideration different aspects of collaborative interactions from both social and technological perspectives. This thesis discusses the social and technical aspects of collaboration in spatially distributed groups and introduces a design approach for collaborative working environments. Firstly, it presents a comprehensive overview of research on collaborative groups, summarizing three interrelated elements under the umbrella of the group needs approach: individual, task and group maintenance needs. Secondly, it proposes a design approach for collaborative working environments on the basis of group needs and thus presents an alternative way of designing computer-supported environments for collaborative groups. This research considers two main types of systems for supporting collaborative groups – groupware and social software – and discusses functionalities originating from these systems. It introduces the Quality Function Deployment method and utilizes its House of Quality concept in order to develop and initially evaluate the First-Stage Prototype – the prototypical implementation of the collaborative working environment combining these two main types. The presented framework is used as a benchmarking tool on the basis of which selected existing platforms for supporting collaboration are evaluated. This research contributes to the area of Computer-Supported Cooperative Work and discusses current trends in the development of collaborative systems related to the application of new social tools for the purposes of computer-supported collaboration.
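    As an illustration of the House of Quality step, the sketch below derives priority scores for candidate functionalities from weighted group needs using the conventional 9/3/1 relationship strengths; the needs, weights and relationship values are invented for the example.

```python
# Sketch of the House of Quality step: group needs (rows) are weighted,
# relationships to candidate system functionalities (columns) use the
# conventional 9/3/1 strengths, and each functionality's priority is the
# weighted column sum. Needs, weights, and scores are illustrative.
import numpy as np

needs = ["individual needs", "task needs", "group maintenance needs"]
weights = np.array([3.0, 5.0, 4.0])           # importance of each need

functionalities = ["shared workspace", "awareness feed", "profile pages"]
relationship = np.array([                     # 9 strong, 3 medium, 1 weak
    [3, 1, 9],
    [9, 3, 1],
    [3, 9, 3],
])

priority = weights @ relationship
for name, score in zip(functionalities, priority):
    print(f"{name}: {score:.0f}")
```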

    A multi-modal, multi-platform, and multi-lingual approach to understanding online misinformation

    Due to online social media, access to information is becoming easier and easier. Meanwhile, the truthfulness of online information is often not guaranteed. Incorrect information, often called misinformation, can have several modalities, and it can spread to multiple social media platforms in different languages, which can be destructive to society. However, academia and industry do not have automated ways to assess the impact of misinformation on social media, preventing the adoption of productive strategies to curb the prevalence of misinformation. In this dissertation, I present my research to build computational pipelines that help measure and detect misinformation on social media. My work can be divided into three parts. The first part focuses on processing misinformation in text form. I first show how to group political news articles from both trustworthy and untrustworthy news outlets into stories. Then I present a measurement analysis on the spread of stories to characterize how mainstream and fringe Web communities influence each other. The second part is related to analyzing image-based misinformation. It can be further divided into two parts: fauxtography and generic image misinformation. Fauxtography is a special type of image misinformation, where images are manipulated or used out of context. In this research, I present how to identify fauxtography on social media by using a fact-checking website (Snopes.com), and I also develop a computational pipeline to facilitate the measurement of these images at scale. I next focus on generic misinformation images related to COVID-19. During the pandemic, text misinformation has been studied in many respects; however, very little research has covered image misinformation during the COVID-19 pandemic. In this research, I develop a technique to cluster visually similar images together, facilitating manual annotation and making subsequent analysis possible. The last part is about the detection of misinformation in text form from a multi-language perspective. This research aims to detect textual COVID-19-related misinformation and the stances Twitter users take towards such misinformation in both English and Chinese. To achieve this goal, I experiment with several natural language processing (NLP) models to investigate their performance on misinformation detection and stance detection in both monolingual and multi-lingual settings. The results show that two models, COVID-Tweet-BERT v2 and BERTweet, are generally effective for detecting misinformation and stance in both settings. These two models are promising candidates for misinformation moderation on social media platforms, which depends heavily on identifying misinformation and the stance of the author towards it. Overall, the results of this dissertation shed light on the understanding of online misinformation, and my proposed computational tools are applicable to the moderation of social media, potentially benefiting a healthier online ecosystem.
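    As a rough sketch of the image-grouping step, the following clusters images greedily by the cosine similarity of their feature embeddings; the embeddings, threshold and greedy strategy are illustrative assumptions rather than the dissertation's exact pipeline.

```python
# Sketch of the image-grouping step: greedily cluster images whose feature
# embeddings are close, so near-duplicates land in one cluster for manual
# annotation. The embeddings and threshold are illustrative; the actual
# pipeline computes visual features from the images themselves.
import numpy as np

def cosine(a, b):
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def cluster(embeddings, threshold=0.9):
    """Assign each image to the first cluster whose centroid is similar."""
    clusters = []  # list of (centroid, member indices)
    for idx, vec in enumerate(embeddings):
        for centroid, members in clusters:
            if cosine(centroid, vec) >= threshold:
                members.append(idx)
                break
        else:
            clusters.append((vec, [idx]))
    return [members for _, members in clusters]

rng = np.random.default_rng(0)
base = rng.normal(size=8)
images = [base, base + 0.01 * rng.normal(size=8), rng.normal(size=8)]
print(cluster(images))  # near-duplicates grouped, outlier alone
```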

    European Handbook of Crowdsourced Geographic Information

    "This book focuses on the study of the remarkable new source of geographic information that has become available in the form of user-generated content accessible over the Internet through mobile and Web applications. The exploitation, integration and application of these sources, termed volunteered geographic information (VGI) or crowdsourced geographic information (CGI), offer scientists an unprecedented opportunity to conduct research on a variety of topics at multiple scales and for diversified objectives. The Handbook is organized in five parts, addressing the fundamental questions: What motivates citizens to provide such information in the public domain, and what factors govern/predict its validity?What methods might be used to validate such information? Can VGI be framed within the larger domain of sensor networks, in which inert and static sensors are replaced or combined by intelligent and mobile humans equipped with sensing devices? What limitations are imposed on VGI by differential access to broadband Internet, mobile phones, and other communication technologies, and by concerns over privacy? How do VGI and crowdsourcing enable innovation applications to benefit human society? Chapters examine how crowdsourcing techniques and methods, and the VGI phenomenon, have motivated a multidisciplinary research community to identify both fields of applications and quality criteria depending on the use of VGI. Besides harvesting tools and storage of these data, research has paid remarkable attention to these information resources, in an age when information and participation is one of the most important drivers of development. The collection opens questions and points to new research directions in addition to the findings that each of the authors demonstrates. Despite rapid progress in VGI research, this Handbook also shows that there are technical, social, political and methodological challenges that require further studies and research.

    Joint top-k subscription query processing over microblog threads

    With an increasing amount of social media messages, users on social platforms have started to seek ideas and opinions by themselves. Publish/subscribe (pub/sub) systems are utilized by those who want to actively read and consume web data. Web platforms give people opportunities to communicate with others, and this social property is also important in pub/sub, yet existing work has not considered it. Moreover, platforms like Twitter or Facebook only allow users to post short messages, which causes the short-text problem: single posts lack contextual information. Therefore, we propose the microblog thread as the minimum information unit in subscription queries, to capture socially and textually relevant information. However, this raises two challenges: (1) how to retrieve microblog threads while the stream of microblogs keeps updating the threads and the results of subscription queries keep changing; and (2) how to represent the subscription results while the microblog threads are frequently updated. Hence, we propose group filtering and individual filtering to cope with the high update rate of subscription results. Extensive experiments on real datasets have been conducted to verify the efficiency and scalability of our proposed approach.
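    A minimal sketch of the subscription-maintenance idea follows, assuming a simple term-overlap relevance function and a bounded min-heap per subscription; the paper's group and individual filtering are not reproduced here.

```python
# Sketch of the subscription-maintenance idea: each subscription keeps a
# bounded top-k of microblog threads by relevance, re-scored as new posts
# extend a thread. The relevance function and thread model are
# illustrative assumptions.
import heapq

class Subscription:
    def __init__(self, query_terms, k=3):
        self.terms = set(query_terms)
        self.k = k
        self.heap = []  # min-heap of (score, thread_id)

    def relevance(self, thread_text):
        words = set(thread_text.lower().split())
        return len(words & self.terms) / len(self.terms)

    def update(self, thread_id, thread_text):
        """Re-score a thread when a new microblog extends it."""
        score = self.relevance(thread_text)
        # Drop any stale entry for this thread, then push the new score.
        self.heap = [(s, t) for s, t in self.heap if t != thread_id]
        heapq.heapify(self.heap)
        heapq.heappush(self.heap, (score, thread_id))
        if len(self.heap) > self.k:
            heapq.heappop(self.heap)  # evict current lowest score

    def results(self):
        return sorted(self.heap, reverse=True)

sub = Subscription(["flood", "warning"], k=2)
sub.update("t1", "flood warning issued downtown")
sub.update("t2", "concert tonight")
sub.update("t1", "flood warning issued downtown; stay safe everyone")
print(sub.results())
```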