Misinformation Detection in Social Media
abstract: The pervasive use of social media gives it a crucial role in helping the public access reliable information. Meanwhile, the openness and timeliness of social networking sites also allow for the rapid creation and dissemination of misinformation, making it increasingly difficult for online users to find accurate and trustworthy information. As witnessed in recent incidents, misinformation escalates quickly, can impact social media users with undesirable consequences, and wreaks havoc instantaneously. Unlike existing research on misinformation in psychology and the social sciences, social media platforms pose unprecedented challenges for misinformation detection. First, intentional spreaders of misinformation actively disguise themselves. Second, the content of misinformation may be manipulated to avoid detection, while abundant contextual information may play a vital role in detecting it. Third, not only accuracy but also earliness of a detection method is important in containing misinformation before it goes viral. Fourth, social media platforms have become a fundamental data source for various disciplines, and such research may have been conducted in the presence of misinformation. To tackle these challenges, we focus on developing machine learning algorithms that are robust to adversarial manipulation and data scarcity.
The main objective of this dissertation is to provide a systematic study of misinformation detection in social media. To tackle the challenges of adversarial attacks, I propose adaptive detection algorithms to deal with the active manipulations of misinformation spreaders via content and networks. To facilitate content-based approaches, I analyze the contextual data of misinformation and propose to incorporate the specific contextual patterns of misinformation into a principled detection framework. Considering the rapidly growing nature of misinformation, I study how it can be detected at an early stage. In particular, I focus on the challenge of data scarcity and propose a novel framework that enables historical data to be utilized for emerging incidents that are seemingly irrelevant. With misinformation going viral, applications that rely on social media data face the challenge of corrupted data. To this end, I present robust statistical relational learning and personalization algorithms to minimize the negative effect of misinformation.
Doctoral Dissertation, Computer Science
Fake News Detection on the Web: A Deep Learning Based Approach
The acceptance and popularity of social media platforms for the dispersion and proliferation of news articles have led to the spread of questionable and untrusted information, in part due to the ease with which misleading content can be created and shared among communities. Prior research has attempted to automatically classify news articles and tweets as credible or non-credible; this work complements such research by proposing an approach that combines Natural Language Processing (NLP) and deep learning techniques such as Long Short-Term Memory (LSTM) networks.
Moreover, in the Information Systems paradigm, design science research methodology (DSRM) has become a major stream that focuses on building and evaluating an artifact to solve emerging problems. Hence, DSRM can accommodate deep-learning-based models given the availability of adequate datasets. Two publicly available datasets containing labeled news articles and tweets were used to validate the proposed model's effectiveness. This work presents two distinct experiments, and the results demonstrate that the proposed model works well for both long-sequence news articles and short-sequence texts such as tweets. Finally, the findings suggest that sentiment, tagging, linguistic, and syntactic features, together with text embeddings, have the potential to foster fake news detection by training the proposed model on features of various dimensionalities to learn the contextual meaning of the news content.
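The core of such an LSTM-based classifier is the gated recurrence that accumulates a sequence of token embeddings into a fixed-size state. The following is a minimal NumPy sketch of a single LSTM cell's forward pass feeding a binary "credible vs. non-credible" output; the weights, dimensions, and gate layout here are illustrative assumptions, not the paper's actual model.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_forward(x_seq, W, U, b):
    """Run a single-layer LSTM over a sequence of input vectors.

    x_seq: (T, d_in) sequence of token embeddings
    W: (4*d_h, d_in) input weights, U: (4*d_h, d_h) recurrent weights,
    b: (4*d_h,) biases, stacked in [input, forget, cell, output] gate order.
    Returns the final hidden state (d_h,).
    """
    d_h = U.shape[1]
    h = np.zeros(d_h)
    c = np.zeros(d_h)
    for x_t in x_seq:
        z = W @ x_t + U @ h + b
        i = sigmoid(z[:d_h])            # input gate
        f = sigmoid(z[d_h:2 * d_h])     # forget gate
        g = np.tanh(z[2 * d_h:3 * d_h]) # candidate cell state
        o = sigmoid(z[3 * d_h:])        # output gate
        c = f * c + i * g               # update long-term memory
        h = o * np.tanh(c)              # expose gated short-term state
    return h

# Tiny demo: score a "tweet" of 5 random token embeddings with random weights
rng = np.random.default_rng(0)
d_in, d_h, T = 8, 4, 5
W = rng.normal(scale=0.1, size=(4 * d_h, d_in))
U = rng.normal(scale=0.1, size=(4 * d_h, d_h))
b = np.zeros(4 * d_h)
tokens = rng.normal(size=(T, d_in))

h_final = lstm_forward(tokens, W, U, b)
w_out = rng.normal(size=d_h)
p_noncredible = sigmoid(w_out @ h_final)  # probability in (0, 1)
```

In a trained system the weights would be learned (e.g. with a framework such as Keras or PyTorch) and the final hidden state fed through a dense sigmoid layer, as sketched by `p_noncredible` above.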
Multilingual Fake News Detection with Satire
The information spread through the Web influences politics, stock markets, public health, people's reputations, and brands. For these reasons, it is crucial to filter out false information. In this paper, we compare different automatic approaches for fake news detection based on statistical text analysis, using the vaccination fake news dataset provided by the Storyzy company. Our CNN works better for discriminating the larger classes (fake vs. trusted), while the gradient boosting decision tree with a feature-stacking approach obtained better results for satire detection. We contribute by showing that efficient satire detection can be achieved using merged embeddings and a specific model, at the cost of performance on the larger classes. We also contribute by deliberately merging redundant information in order to better distinguish satire from fake news and trusted news.
A Hotspot Discovery Method Based on Improved FIHC Clustering Algorithm
Detecting microblog hotspots is difficult because microblog posts are short, rapidly produced, and constantly changing. To address this problem, a microblog hotspot detection method based on MFIHC and TOPSIS is proposed. First, HowNet similarity is incorporated into the score function of FIHC so that semantic links between frequent words are considered and the initial frequent-word-based clusters are produced more accurately. Then, repetitive microblog texts in the initial clusters are reduced, and Single-Pass clustering is applied to the reduced topic clusters to obtain the hotspots. Finally, an improved TOPSIS model is used to rank the hot topics. Compared with other text clustering algorithms and hotspot detection methods, the proposed method performs well and gives a more comprehensive reflection of current hot topics.
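The final ranking step rests on TOPSIS, which scores each alternative by its closeness to an ideal-best and distance from an ideal-worst solution. Below is a sketch of the classical TOPSIS procedure (not the paper's improved variant) in NumPy; the hot-topic criteria and weights are hypothetical examples.

```python
import numpy as np

def topsis_rank(scores, weights, benefit):
    """Rank alternatives with classical TOPSIS.

    scores: (n_topics, n_criteria) raw criterion scores
    weights: (n_criteria,) criterion weights summing to 1
    benefit: (n_criteria,) bool, True if larger is better for that criterion
    Returns closeness coefficients in [0, 1]; higher means a hotter topic.
    """
    # Vector-normalize each criterion column, then apply the weights
    v = scores / np.linalg.norm(scores, axis=0) * weights
    # Ideal best/worst per criterion depend on benefit vs. cost direction
    best = np.where(benefit, v.max(axis=0), v.min(axis=0))
    worst = np.where(benefit, v.min(axis=0), v.max(axis=0))
    d_best = np.linalg.norm(v - best, axis=1)
    d_worst = np.linalg.norm(v - worst, axis=1)
    return d_worst / (d_best + d_worst)

# Hypothetical criteria: post count, growth rate, spam ratio (a cost)
scores = np.array([[120, 0.9, 0.10],
                   [300, 0.4, 0.30],
                   [200, 0.7, 0.05]], dtype=float)
weights = np.array([0.4, 0.4, 0.2])
benefit = np.array([True, True, False])

cc = topsis_rank(scores, weights, benefit)
ranking = np.argsort(-cc)  # topic indices, hottest first
```

The improved TOPSIS in the paper would modify pieces of this pipeline (e.g. the normalization or weighting); the closeness-coefficient structure stays the same.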
Online social networks: knowledge extraction and spatio-temporal analysis of information diffusion events
Advisor: Fernando José Von Zuben. Master's dissertation, Universidade Estadual de Campinas, Faculdade de Engenharia Elétrica e de Computação.
Abstract: With the advent and popularization of Online Social Networks and Social Networking Services, computer science researchers have found a fertile field for the development of studies using large volumes of data, multi-agent models, and spatio-temporal dynamics. However, even with a significant amount of published research on the subject, there are still aspects of social networks whose explanation is incipient. In order to deepen knowledge of the area, this work investigates phenomena of collective sharing on the network that characterize information diffusion events. From the observation of real data obtained from the online service Twitter, we collect, model, and characterize such events. Finally, using machine learning and computational data analysis, patterns are found in the network's spatio-temporal processes, making it possible to classify a message's topic from user behaviour and to characterize individual behaviour from social connections.
Master's program in Computer Engineering; degree of Master in Electrical Engineering (Mestre em Engenharia Elétrica)
Decision Modelling Driven by Twitter Data: A Case Study of the 2017 Presidential Election in Ecuador
When Infodemic Meets Epidemic: a Systematic Literature Review
Epidemics and outbreaks present arduous challenges requiring both individual
and communal efforts. Social media offer significant amounts of data that can
be leveraged for bio-surveillance. They also provide a platform to quickly and
efficiently reach a sizeable percentage of the population, hence their
potential impact on various aspects of epidemic mitigation. The general
objective of this systematic literature review is to provide a methodical
overview of the integration of social media in different epidemic-related
contexts. Three research questions were conceptualized for this review,
resulting in over 10,000 publications collected in the first PRISMA stage, 129
of which were selected for inclusion. A thematic method-oriented synthesis was
undertaken and identified 5 main themes related to social-media-enabled
epidemic surveillance, misinformation management, and mental health. Findings
uncover a need for more robust application of the lessons learned from
epidemic post-mortem documentation. A vast gap exists between retrospective
analysis of epidemic management and result integration in prospective studies.
Harnessing the full potential of social media in epidemic-related tasks
requires streamlining the results of epidemic forecasting, public opinion
understanding, and misinformation propagation, all while keeping abreast of
potential mental health implications. Proactive prevention has thus become
vital for epidemic curtailment and containment.
Mining Social Media and Structured Data in Urban Environmental Management to Develop Smart Cities
This research presented the deployment of data mining on social media and structured data in urban studies. As early work, we analyzed urban relocation, air quality, and traffic parameters on multicity data. We applied the data mining techniques of association rules, clustering, and classification to urban legislative history. Results showed that data mining could produce meaningful knowledge to support urban management. We treated ordinances (local laws) and the tweets about them as indicators to assess urban policy and public opinion. Hence, we conducted ordinance and tweet mining, including sentiment analysis of tweets. This part of the study focused on NYC, with the goal of assessing how well it is heading towards being a smart city. We built domain-specific knowledge bases according to widely accepted smart city characteristics, incorporating commonsense knowledge sources for ordinance-tweet mapping. We developed decision support tools on multiple platforms using the knowledge discovered to guide urban management. Our research is a concrete step in harnessing the power of data mining in urban studies to enhance smart city development.
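The sentiment analysis of ordinance-related tweets can be illustrated with a minimal lexicon-based scorer. This toy sketch uses invented positive/negative word lists; a real pipeline would draw on a curated resource such as VADER or SentiWordNet rather than the hypothetical lexicons below.

```python
# Hypothetical sentiment lexicons for urban-policy tweets (illustrative only)
POS = {"good", "great", "clean", "safe", "improved", "efficient"}
NEG = {"bad", "dirty", "unsafe", "congested", "polluted", "delayed"}

def tweet_sentiment(text):
    """Score a tweet in [-1, 1]: (pos - neg) / number of matched words."""
    tokens = [t.strip(".,!?:;#@").lower() for t in text.split()]
    pos = sum(t in POS for t in tokens)
    neg = sum(t in NEG for t in tokens)
    matched = pos + neg
    return 0.0 if matched == 0 else (pos - neg) / matched

tweets = [
    "Great news: the new bus lanes made my commute safe and efficient!",
    "Downtown is congested and the air is polluted again.",
    "City council meets on Tuesday.",
]
scores = [tweet_sentiment(t) for t in tweets]  # [1.0, -1.0, 0.0]
```

Aggregating such per-tweet scores over the tweets mapped to each ordinance gives a rough public-opinion signal per policy, which is the kind of indicator the study describes.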
Information Reliability on the Social Web - Models and Applications in Intelligent User Interfaces
The Social Web is undergoing continued evolution, changing the paradigm of information production, processing and sharing. Information sources have shifted from institutions to individual users, vastly increasing the amount of information available online. To overcome the information overload problem, modern filtering algorithms have enabled people to find relevant information in efficient ways. However, noisy, false and otherwise useless information remains a problem. We believe that the concept of information reliability needs to be considered along with information relevance to adapt filtering algorithms to today's Social Web. This approach helps to improve information search and discovery and can also improve user experience by communicating aspects of information reliability. This thesis first shows the results of a cross-disciplinary study into perceived reliability by reporting on a novel user experiment. This is followed by a discussion of modeling, validating, and communicating information reliability, including its various definitions across disciplines. A selection of important reliability attributes such as source credibility, competence, influence and timeliness are examined through different case studies. Results show that perceived reliability of information can vary greatly across contexts. Finally, recent studies on visual analytics, including algorithm explanations and interactive interfaces, are discussed with respect to their impact on the perception of information reliability in a range of application domains.
Geo-Information Harvesting from Social Media Data
As unconventional sources of geo-information, massive imagery and text
messages from open platforms and social media form a temporally quasi-seamless,
spatially multi-perspective stream, but with unknown and diverse quality. Due
to its complementarity to remote sensing data, geo-information from these
sources offers promising perspectives, but harvesting is not trivial due to its
data characteristics. In this article, we address key aspects in the field,
including data availability, analysis-ready data preparation and data
management, geo-information extraction from social media text messages and
images, and the fusion of social media and remote sensing data. We then
showcase some exemplary geographic applications. In addition, we present the
first extensive discussion of ethical considerations of social media data in
the context of geo-information harvesting and geographic applications. With
this effort, we wish to stimulate curiosity and lay the groundwork for
researchers who intend to explore social media data for geo-applications. We
encourage the community to join forces by sharing their code and data.
Comment: Accepted for publication in IEEE Geoscience and Remote Sensing Magazine.