324 research outputs found
A community role approach to assess social capitalists visibility in the Twitter network
In the context of Twitter, social capitalists are users who try to increase their number of followers and interactions by any means. These users are harmful to the service, because they are either spammers or real users distorting the notions of influence and visibility. Studying their behavior and understanding their position in Twitter is thus of great interest. It is also necessary to analyze how their methods actually affect user visibility. Building on a recently proposed method for identifying social capitalists, we tackle both points by studying how they are organized and how their links spread across the Twitter follower-followee network. To that end, we consider their position in the network with respect to its community structure. We use the concept of the community role of a node, which describes its position in a network according to its connectivity at the community level. However, the topological measures originally defined to characterize these roles consider only certain aspects of community-related connectivity and rely on a set of empirically fixed thresholds. We first show the limitations of these measures, then extend and generalize them. Moreover, we use an unsupervised approach to identify the roles, in order to provide more flexibility with respect to the studied system. We then apply our method to the case of social capitalists and show that they are highly visible on Twitter, due to the specific roles they hold.
Comment: arXiv admin note: substantial text overlap with arXiv:1406.661
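The classical community-role measures alluded to here are typically the within-module degree z-score and the participation coefficient of Guimerà and Amaral. A minimal sketch of those two baseline measures, for illustration only (this is not the extended or generalized measures the abstract proposes; the function name and graph representation are ours):

```python
from collections import defaultdict
from statistics import mean, pstdev

def role_measures(adj, community):
    """Within-module degree z-score (z) and participation coefficient (P)
    for each node, given an adjacency dict and a community partition."""
    # Degree of each node toward each community.
    comm_deg = {v: defaultdict(int) for v in adj}
    for v, nbrs in adj.items():
        for u in nbrs:
            comm_deg[v][community[u]] += 1
    # Collect within-community degrees per community, for the z-score.
    by_comm = defaultdict(list)
    for v in adj:
        by_comm[community[v]].append(comm_deg[v][community[v]])
    z, p = {}, {}
    for v, nbrs in adj.items():
        k = len(nbrs)
        within = comm_deg[v][community[v]]
        vals = by_comm[community[v]]
        sd = pstdev(vals)
        z[v] = (within - mean(vals)) / sd if sd > 0 else 0.0
        # P = 1 - sum over communities c of (k_c / k)^2; 0 means all
        # links stay inside one community, values near 1 mean links
        # spread evenly across communities.
        p[v] = (1.0 - sum((d / k) ** 2 for d in comm_deg[v].values())) if k else 0.0
    return z, p
```

In the classical approach, fixed thresholds on (z, P) assign each node to one of a handful of roles (e.g. hubs vs. non-hubs); the unsupervised alternative described above would instead cluster nodes directly in the space of such measures.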
Harnessing the power of the general public for crowdsourced business intelligence: a survey
Crowdsourced business intelligence (CrowdBI), which leverages crowdsourced user-generated data to extract useful knowledge about business and create marketing intelligence to excel in the business environment, has become a fast-growing research topic in recent years. Compared with traditional business intelligence, which is based on firm-owned data and survey data, CrowdBI faces numerous unique issues, such as customer behavior analysis, brand tracking and product improvement, demand forecasting and trend analysis, competitive intelligence, business popularity analysis and site recommendation, and urban commercial analysis. This paper first characterizes the concept model and unique features of CrowdBI and presents a generic framework for it. It also investigates novel application areas as well as the key challenges and techniques of CrowdBI. Furthermore, we discuss future research directions for CrowdBI.
Addressing the new generation of spam (Spam 2.0) through Web usage models
New Internet collaborative media introduce new ways of communicating that are not immune to abuse. A fake eye-catching profile in social networking websites, a promotional review, a response to a thread in online forums with unsolicited content, or a manipulated Wiki page are examples of the new generation of spam on the web, referred to as Web 2.0 Spam or Spam 2.0. Spam 2.0 is defined as the propagation of unsolicited, anonymous, mass content to infiltrate legitimate Web 2.0 applications. The current literature does not address Spam 2.0 in depth, and the outcomes of efforts to date are inadequate. The aim of this research is to formalise a definition of Spam 2.0 and provide Spam 2.0 filtering solutions. Early detection, extendibility, robustness, and adaptability are key factors in the design of the proposed method. This dissertation provides a comprehensive survey of state-of-the-art web spam and Spam 2.0 filtering methods to highlight the unresolved issues and open problems, while at the same time effectively capturing the knowledge in the domain of spam filtering. It proposes three solutions in the area of Spam 2.0 filtering: (1) characterising and profiling Spam 2.0, (2) an Early-Detection based Spam 2.0 Filtering (EDSF) approach, and (3) an On-the-Fly Spam 2.0 Filtering (OFSF) approach. All the proposed solutions are tested against real-world datasets, and their performance is compared with that of existing Spam 2.0 filtering methods. This work has coined the term 'Spam 2.0', provided insight into the nature of Spam 2.0, and proposed filtering mechanisms to address this new and rapidly evolving problem.
Multilevel User Credibility Assessment in Social Networks
Online social networks are among the largest platforms for disseminating both real and fake news. Many users on these networks, intentionally or unintentionally, spread harmful content, fake news, and rumors in fields such as politics and business. As a result, numerous studies have been conducted in recent years to assess the credibility of users. A shortcoming of most existing methods is that they assess users by placing them in one of two categories, real or fake. However, in real-world applications it is usually more desirable to consider several levels of user credibility. Another shortcoming is that existing approaches use only a portion of the important features, which degrades their performance. In this paper, due to the lack of an appropriate dataset for multilevel user credibility assessment, we first design a method to collect data suitable for assessing credibility at multiple levels. Then, we develop the MultiCred model, which places users at one of several levels of credibility based on a rich and diverse set of features extracted from users' profiles, tweets, and comments. MultiCred exploits deep language models to analyze textual data and deep neural models to process non-textual features. Our extensive experiments reveal that MultiCred considerably outperforms existing approaches in terms of several accuracy measures.
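The abstract does not detail MultiCred's architecture; the general idea of fusing a textual and a non-textual feature view into a multilevel (multiclass) prediction can be sketched as follows. Everything here is illustrative: the level names, dimensions, and the hashed bag-of-words (a crude stand-in for a deep language model embedding) are our own assumptions, not the paper's design.

```python
import math

# Hypothetical credibility levels, rather than a binary real/fake label.
LEVELS = ["very_low", "low", "medium", "high"]

def encode_text(text, dim=8):
    """Stand-in for a deep language model: a stable hashed bag-of-words."""
    vec = [0.0] * dim
    for tok in text.lower().split():
        vec[sum(ord(c) for c in tok) % dim] += 1.0
    return vec

def credibility_scores(text_vec, profile_vec, weights, bias):
    """Fuse the two feature views and map to a distribution over levels."""
    fused = text_vec + profile_vec  # simple concatenation of the views
    logits = [sum(w * x for w, x in zip(row, fused)) + b
              for row, b in zip(weights, bias)]
    # Numerically stable softmax over the credibility levels.
    m = max(logits)
    exps = [math.exp(l - m) for l in logits]
    total = sum(exps)
    return [e / total for e in exps]
```

In a real system the weights would be learned end-to-end and the text branch would be a trained language model; this sketch only shows the multilevel-output, feature-fusion interface.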
Vulnerabilities to Online Social Network Identity Deception Detection Research and Recommendations for Mitigation
Identity deception in online social networks is a pervasive problem. Ongoing research is developing methods for identity deception detection. However, the real-world efficacy of these methods is currently unknown, because they have been evaluated largely through laboratory experiments. We present a review of representative state-of-the-art results on identity deception detection. Based on this analysis, we identify common methodological weaknesses of these approaches and propose recommendations that can increase their effectiveness when they are applied in real-world environments.
Trustworthiness in Social Big Data Incorporating Semantic Analysis, Machine Learning and Distributed Data Processing
This thesis presents several state-of-the-art approaches constructed for the purpose of (i) studying the trustworthiness of users in Online Social Network platforms, (ii) deriving concealed knowledge from their textual content, and (iii) classifying and predicting the domain knowledge of users and their content. The developed approaches are refined through proof-of-concept experiments, several benchmark comparisons, and appropriate and rigorous evaluation metrics to verify and validate their effectiveness and efficiency and, hence, those of the applied frameworks.
Sampling Twitter users for social science research: Evidence from a systematic review of the literature
All social media platforms can be used to conduct social science research, but Twitter is the most popular, as it provides its data via several Application Programming Interfaces, which allow qualitative and quantitative research to be conducted with its members. As Twitter is a huge universe, both in number of users and in amount of data, sampling is generally required when using it for research purposes. Researchers have only recently begun to question whether tweet-level sampling, in which the tweet is the sampling unit, should be replaced by user-level sampling, in which the user is the sampling unit. The major rationale for this shift is that tweet-level sampling does not consider the fact that some core discussants on Twitter are much more active tweeters than other, less active users, thus producing a sample biased towards the more active users. Knowledge on how to select representative samples of users in the Twitterverse is still insufficient, despite its relevance for reliable and valid research outcomes. This paper contributes to this topic by presenting a systematic quantitative literature review of sampling plans designed and executed in the context of social science research on Twitter, including: (1) the definition of the target populations, (2) the sampling frames used to support sample selection, (3) the sampling methods used to obtain samples of Twitter users, (4) how data is collected from Twitter users, (5) the size of the samples, and (6) how research validity is addressed. This review can serve as a methodological guide for professionals and academics who want to conduct social science research involving Twitter users and the Twitterverse.
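The tweet-level vs. user-level distinction can be illustrated with a toy simulation (the data and function names are ours, not from the review): sampling tweets uniformly over-represents prolific authors, while sampling from the set of distinct authors weights every account equally.

```python
import random
from collections import Counter

def tweet_level_sample(tweets, n, rng):
    """Tweet as sampling unit: authors appear in proportion to their activity."""
    return [t["user"] for t in rng.sample(tweets, n)]

def user_level_sample(tweets, n, rng):
    """User as sampling unit: each distinct author has equal probability."""
    users = sorted({t["user"] for t in tweets})
    return rng.sample(users, n)

# One hyperactive account and ten occasional ones.
tweets = [{"user": "heavy", "text": f"post {i}"} for i in range(90)]
tweets += [{"user": f"casual{i}", "text": "hi"} for i in range(10)]

rng = random.Random(0)
tweet_sampled = tweet_level_sample(tweets, 50, rng)
user_sampled = user_level_sample(tweets, 5, rng)
```

Since only 10 of the 100 tweets come from casual users, any 50-tweet sample necessarily contains at least 40 posts by the heavy user, whereas the user-level sample includes each account at most once.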