18,878 research outputs found

    Automated Detection of Human Users in Twitter

    Get PDF
    AbstractThis paper compares Suppport Vector Machine (SVM) classification and a number of clustering approaches to separate human from not human users in Twitter in order to identify normal human activity. These approaches have similar F1 accuracy scores of 90% with both experienc- ing difficulties in classifying human users behaving abnormally. A second stage classification step was then used to further separate not human users into brands, celebrities and promoters / information achieving an average F1 accuracy of 74%. These accuracies were achieved by reducing the size of the feature space using stepwise feature selection and category balancing from manual inspection of classification results

    A Network Topology Approach to Bot Classification

    Full text link
    Automated social agents, or bots, are increasingly becoming a problem on social media platforms. There is a growing body of literature and multiple tools to aid in the detection of such agents on online social networking platforms. We propose that the social network topology of a user would be sufficient to determine whether the user is a automated agent or a human. To test this, we use a publicly available dataset containing users on Twitter labelled as either automated social agent or human. Using an unsupervised machine learning approach, we obtain a detection accuracy rate of 70%

    Who tweets? Deriving the demographic characteristics of age, occupation and social class from Twitter user meta-data

    Get PDF
    This paper specifies, designs and critically evaluates two tools for the automated identification of demographic data (age, occupation and social class) from the profile descriptions of Twitter users in the United Kingdom (UK). Meta-data data routinely collected through the Collaborative Social Media Observatory (COSMOS: http://www.cosmosproject.net/) relating to UK Twitter users is matched with the occupational lookup tables between job and social class provided by the Office for National Statistics (ONS) using SOC2010. Using expert human validation, the validity and reliability of the automated matching process is critically assessed and a prospective class distribution of UK Twitter users is offered with 2011 Census baseline comparisons. The pattern matching rules for identifying age are explained and enacted following a discussion on how to minimise false positives. The age distribution of Twitter users, as identified using the tool, is presented alongside the age distribution of the UK population from the 2011 Census. The automated occupation detection tool reliably identifies certain occupational groups, such as professionals, for which job titles cannot be confused with hobbies or are used in common parlance within alternative contexts. An alternative explanation on the prevalence of hobbies is that the creative sector is overrepresented on Twitter compared to 2011 Census data. The age detection tool illustrates the youthfulness of Twitter users compared to the general UK population as of the 2011 Census according to proportions, but projections demonstrate that there is still potentially a large number of older platform users. It is possible to detect “signatures” of both occupation and age from Twitter meta-data with varying degrees of accuracy (particularly dependent on occupational groups) but further confirmatory work is needed

    Social Bots: Human-Like by Means of Human Control?

    Get PDF
    Social bots are currently regarded an influential but also somewhat mysterious factor in public discourse and opinion making. They are considered to be capable of massively distributing propaganda in social and online media and their application is even suspected to be partly responsible for recent election results. Astonishingly, the term `Social Bot' is not well defined and different scientific disciplines use divergent definitions. This work starts with a balanced definition attempt, before providing an overview of how social bots actually work (taking the example of Twitter) and what their current technical limitations are. Despite recent research progress in Deep Learning and Big Data, there are many activities bots cannot handle well. We then discuss how bot capabilities can be extended and controlled by integrating humans into the process and reason that this is currently the most promising way to go in order to realize effective interactions with other humans.Comment: 36 pages, 13 figure
    corecore