9,860 research outputs found

    Political Homophily in Independence Movements: Analysing and Classifying Social Media Users by National Identity

    Get PDF
    Social media and data mining are increasingly being used to analyse political and societal issues. Here we undertake the classification of social media users as supporting or opposing ongoing independence movements in their territories. Independence movements occur in territories whose citizens have conflicting national identities; users with opposing national identities will then support or oppose the sense of being part of an independent nation that differs from the officially recognised country. We describe a methodology that relies on users' self-reported location to build large-scale datasets for three territories -- Catalonia, the Basque Country and Scotland. An analysis of these datasets shows that homophily plays an important role in determining who people connect with, as users predominantly choose to follow and interact with others from the same national identity. We show that a classifier relying on users' follow networks can achieve accurate, language-independent classification performances ranging from 85% to 97% for the three territories.Comment: Accepted for publication in IEEE Intelligent System

    Topic-centric Classification of Twitter User's Political Orientation

    Get PDF
    In the recent Scottish Independence Referendum (hereafter, IndyRef), Twitter offered a broad platform for people to express their opinions, with millions of IndyRef tweets posted over the campaign period. In this paper, we aim to classify people's voting intentions by the content of their tweets---their short messages communicated on Twitter. By observing tweets related to the IndyRef, we find that people not only discussed the vote, but raised topics related to an independent Scotland including oil reserves, currency, nuclear weapons, and national debt. We show that the views communicated on these topics can inform us of the individuals' voting intentions ("Yes"--in favour of Independence vs. "No"--Opposed). In particular, we argue that an accurate classifier can be designed by leveraging the differences in the features' usage across different topics related to voting intentions. We demonstrate improvements upon a Naive Bayesian classifier using the topics enrichment method. Our new classifier identifies the closest topic for each unseen tweet, based on those topics identified in the training data. Our experiments show that our Topics-Based Naive Bayesian classifier improves accuracy by 7.8% over the classical Naive Bayesian baseline

    Semi-supervised Text Regression with Conditional Generative Adversarial Networks

    Get PDF
    Enormous online textual information provides intriguing opportunities for understandings of social and economic semantics. In this paper, we propose a novel text regression model based on a conditional generative adversarial network (GAN), with an attempt to associate textual data and social outcomes in a semi-supervised manner. Besides promising potential of predicting capabilities, our superiorities are twofold: (i) the model works with unbalanced datasets of limited labelled data, which align with real-world scenarios; and (ii) predictions are obtained by an end-to-end framework, without explicitly selecting high-level representations. Finally we point out related datasets for experiments and future research directions

    Socializing the Semantic Gap: A Comparative Survey on Image Tag Assignment, Refinement and Retrieval

    Get PDF
    Where previous reviews on content-based image retrieval emphasize on what can be seen in an image to bridge the semantic gap, this survey considers what people tag about an image. A comprehensive treatise of three closely linked problems, i.e., image tag assignment, refinement, and tag-based image retrieval is presented. While existing works vary in terms of their targeted tasks and methodology, they rely on the key functionality of tag relevance, i.e. estimating the relevance of a specific tag with respect to the visual content of a given image and its social context. By analyzing what information a specific method exploits to construct its tag relevance function and how such information is exploited, this paper introduces a taxonomy to structure the growing literature, understand the ingredients of the main works, clarify their connections and difference, and recognize their merits and limitations. For a head-to-head comparison between the state-of-the-art, a new experimental protocol is presented, with training sets containing 10k, 100k and 1m images and an evaluation on three test sets, contributed by various research groups. Eleven representative works are implemented and evaluated. Putting all this together, the survey aims to provide an overview of the past and foster progress for the near future.Comment: to appear in ACM Computing Survey

    Data exploitation and privacy protection in the era of data sharing

    Get PDF
    As the amount, complexity, and value of data available in both private and public sectors has risen sharply, the competing goals of data privacy and data utility have challenged both organizations and individuals. This dissertation addresses both goals. First, we consider the task of {\it interorganizational data sharing}, in which data owners, data clients, and data subjects have different and sometimes competing privacy concerns. A key challenge in this type of scenario is that each organization uses its own set of proprietary, intraorganizational attributes to describe the shared data; such attributes cannot be shared with other organizations. Moreover, data-access policies are determined by multiple parties and may be specified using attributes that are not directly comparable with the ones used by the owner to specify the data. We propose a system architecture and a suite of protocols that facilitate dynamic and efficient interorganizational data sharing, while allowing each party to use its own set of proprietary attributes to describe the shared data and preserving confidentiality of both data records and attributes. We introduce the novel technique of \textit{attribute-based encryption with oblivious attribute translation (OTABE)}, which plays a crucial role in our solution and may prove useful in other applications. This extension of attribute-based encryption uses semi-trusted proxies to enable dynamic and oblivious translation between proprietary attributes that belong to different organizations. We prove that our OTABE-based framework is secure in the standard model and provide two real-world use cases. Next, we turn our attention to utility that can be derived from the vast and growing amount of data about individuals that is available on social media. As social networks (SNs) continue to grow in popularity, it is essential to understand what can be learned about personal attributes of SN users by mining SN data. The first SN-mining problem we consider is how best to predict the voting behavior of SN users. Prior work only considered users who generate politically oriented content or voluntarily disclose their political preferences online. We avoid this bias by using a novel type of Bayesian-network (BN) model that combines demographic, behavioral, and social features. We test our method in a predictive analysis of the 2016 U.S. Presidential election. Our work is the first to take a semi-supervised approach in this setting. Using the Expectation-Maximization (EM) algorithm, we combine labeled survey data with unlabeled Facebook data, thus obtaining larger datasets and addressing self-selection bias. The second SN-mining challenge we address is the extent to which Dynamic Bayesian Networks (DBNs) can infer dynamic behavioral intentions such as the intention to get a vaccine or to apply for a loan. Knowledge of such intentions has great potential to improve the design of recommendation systems, ad-targeting mechanisms, public-health campaigns, and other social and commercial endeavors. We focus on the question of how to infer an SN user\u27s \textit{offline} decisions and intentions using only the {\it public} portions of her \textit{online} SN accounts. Our contribution is twofold. First, we use BNs and several behavioral-psychology techniques to model decision making as a complex process that both influences and is influenced by static factors (such as personality traits and demographic categories) and dynamic factors (such as triggering events, interests, and emotions). Second, we explore the extent to which temporal models may assist in the inference task by representing SN users as sets of DBNs that are built using our modeling techniques. The use of DBNs, together with data gathered in multiple waves, has the potential to improve both inference accuracy and prediction accuracy in future time slots. It may also shed light on the extent to which different factors influence the decision-making process

    Citizen adoption of e-government services – Evidence from Hungary

    Get PDF
    In a citizen centric approach – which became increasingly popular in the last decade – e-government success begins with citizens starting to use e-government systems, solutions, services. In line with this our paper investigates the factors – presented by the technology acceptance literature – influencing e-government service usage, on a large representative Hungarian sample concerning a wide range of B2C public administration services. Our results imply that the Hungarian government can further increase the usage of e-government services by influencing effort expectancy, trust of internet, facilitating conditions, user experience or habits

    An analysis of the user occupational class through Twitter content

    Get PDF
    Social media content can be used as a complementary source to the traditional methods for extracting and studying collective social attributes. This study focuses on the prediction of the occupational class for a public user profile. Our analysis is conducted on a new annotated corpus of Twitter users, their respective job titles, posted textual content and platform-related attributes. We frame our task as classification using latent feature representations such as word clusters and embeddings. The employed linear and, especially, non-linear methods can predict a user’s occupational class with strong accuracy for the coarsest level of a standard occupation taxonomy which includes nine classes. Combined with a qualitative assessment, the derived results confirm the feasibility of our approach in inferring a new user attribute that can be embedded in a multitude of downstream applications
    corecore