413 research outputs found

    Detection of Sockpuppet Accounts on Reddit

    The purpose of this thesis is to study and analyze the detection of sockpuppet accounts on Reddit. Sockpuppet accounts are created exclusively to cause mischief or mayhem at a site without the original user being identified. With the rise of sockpuppet accounts, it is very important to identify these accounts and help maintain a healthy online community. The data used for the research was obtained from Reddit and is stored in the dirt cluster. The categorization of sockpuppet and non-sockpuppet accounts was implemented with TF-IDF (term frequency-inverse document frequency), the k-means clustering algorithm and a multilayer perceptron classifier. The goal is to identify sockpuppet accounts in a huge dataset of Reddit accounts and to raise awareness of the harm and misuse that sockpuppet accounts can bring to the online community.
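
The TF-IDF weighting named in this abstract can be illustrated with a minimal sketch. It assumes whitespace tokenization, tf = count / document length and idf = log(N / df); the thesis does not state which variant it uses, and the sample comments are invented for illustration only.

```python
import math
from collections import Counter


def tf_idf(documents):
    """Return one {term: weight} dict per document.

    Assumption: tf = count / document length, idf = log(N / df).
    The thesis may use a different TF-IDF variant.
    """
    tokenized = [doc.lower().split() for doc in documents]
    n_docs = len(tokenized)
    # Document frequency: number of documents that contain each term.
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))
    weights = []
    for tokens in tokenized:
        counts = Counter(tokens)
        total = len(tokens)
        weights.append({
            term: (count / total) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return weights


# Hypothetical comments standing in for text posted by Reddit accounts.
comments = [
    "great post thanks for sharing",
    "great post great thread",
    "this thread is terrible",
]
vectors = tf_idf(comments)
```

Terms that occur in few documents receive higher weights than terms shared across many, which is what makes the resulting vectors usable as input to a clustering or classification step such as the k-means and multilayer perceptron stages the abstract mentions.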

    Social media analytics: a survey of techniques, tools and platforms

    This paper is written for (social science) researchers seeking to analyze the wealth of social media now available. It presents a comprehensive review of software tools for social networking media, wikis, really simple syndication (RSS) feeds, blogs, newsgroups, chat and news feeds. For completeness, it also includes introductions to social media scraping, storage, data cleaning and sentiment analysis. Although principally a review, the paper also provides a methodology and a critique of social media tools. Analyzing social media, in particular Twitter feeds for sentiment analysis, has become a major research and business activity due to the availability of web-based application programming interfaces (APIs) provided by Twitter, Facebook and news services. This has led to an ‘explosion’ of data services, software tools for scraping and analysis, and social media analytics platforms. It is also a research area undergoing rapid change and evolution due to commercial pressures and the potential for using social media data for computational (social science) research. Using a simple taxonomy, this paper provides a review of leading software tools and how to use them to scrape, cleanse and analyze the spectrum of social media. In addition, it discusses the requirement for an experimental computational environment for social media research and presents, as an illustration, the system architecture of a social media (analytics) platform built by University College London. The principal contribution of this paper is to provide an overview (including code fragments) for scientists seeking to utilize social media scraping and analytics in either their research or their business. The data retrieval techniques presented in this paper are valid at the time of writing (June 2014), but they are subject to change since social media data scraping APIs are rapidly changing.

    AAPOR Report on Big Data

    In recent years we have seen an increase in the amount of statistics in society describing different phenomena based on so-called Big Data. The term Big Data is used for a variety of data, as explained in the report, many of them characterized not just by their large volume, but also by their variety and velocity, the organic way in which they are created, and the new types of processes needed to analyze them and make inferences from them. The changes in the nature of the new types of data, their availability, and the way in which they are collected and disseminated are fundamental. The change constitutes a paradigm shift for survey research. There is great potential in Big Data, but there are some fundamental challenges that have to be resolved before its full potential can be realized. In this report we give examples of different types of Big Data and their potential for survey research. We also describe the Big Data process and discuss its main challenges.

    Data-driven marketing for the e-commerce of brands

    Project Work presented as the partial requirement for obtaining a Master's degree in Data Science and Advanced Analytics, specialization in Business Analytics. The topic of this project is a data-driven marketing strategy for the e-commerce shoe brand Lovidovi Shoes. The company wanted to improve its digital marketing performance, and an improvement was defined as an increase in sales. While general best practices for successful Facebook Business advertising have been researched, each business is unique and optimal results are achieved through internal research. Data collected over the course of seven years was first centralized and then analysed, using tools such as Power BI and Python, in order to determine the best audience and ad settings. The findings made on base data were re-evaluated and fine-tuned through testing. The final result showed that the best performing ads target an audience of women of all ages. Bosnia and Herzegovina is the brand's most profitable market, with the biggest number of sales and the lowest cost per purchase. Feed placements on Facebook and Instagram get the best reaction, and the bestsellers are products in the white sneakers category. The ads created as part of this project showed significantly better performance by the company’s standards, and average performance by the industry’s standards. This project designed a simple guide on how to start shifting towards data-driven marketing approaches with a limited budget, and has given the company motivation to utilize its data more and better.

    Applying text mining techniques to forecast the stock market fluctuations of large IT companies with Twitter data: descriptive and predictive approaches to enhance the research of stock market predictions with textual and semantic data

    Project Work presented as the partial requirement for obtaining a Master's degree in Information Management, specialization in Information Systems and Technologies Management. This research project applies advanced text mining techniques as a method to predict stock market fluctuations by merging published tweets and daily stock market prices for a set of American Information Technology companies. Using mainly R code, the project takes a systematic approach to two main questions: (i) which descriptive criteria, patterns, and variables are correlated with stock fluctuations, and (ii) whether tweets used on their own carry enough signal to predict stock market fluctuations with high accuracy. The expected output of the research work is a set of findings on the significance and predictive power of Twitter text, indicating the importance of social media content for stock market fluctuations. The work uses descriptive and predictive data mining approaches such as natural language processing, topic modelling, sentiment analysis and binary classification with neural networks.

    Web Log Data Analysis: Converting Unstructured Web Log Data into Structured Data Using Apache Pig

    Data extraction and analysis have recently received significant attention due to the evolution of social media and the large volume of data available in an unstructured form. Hadoop and MapReduce are continuously used to process and analyze large amounts of data. In this paper, Apache Pig, a high-level platform for analyzing large volumes of data that runs on top of Hadoop, is used to analyze unstructured log files and extract information. Web server log files are analyzed, and meaningful information is converted from an unstructured form into a structured form in the Apache Pig framework. The main purpose of this paper is to extract, transform and load unstructured data in the Apache Pig framework, and to analyze the data and Pig's performance in local mode as well as MapReduce mode. This paper further explains in brief the different steps required to analyze unstructured web server log files in Apache Pig. It also compares the efficiency of MapReduce mode and local mode when a large volume of data is processed.
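
The unstructured-to-structured step described in this abstract can be illustrated outside Pig. The following is a minimal Python sketch, not the paper's Pig script: it assumes the logs follow the Apache Common Log Format (the abstract does not state the format), and the sample lines are invented.

```python
import re
from collections import Counter

# Assumed layout (Common Log Format):
# host ident user [time] "method path protocol" status size
LOG_PATTERN = re.compile(
    r'(?P<host>\S+) (?P<ident>\S+) (?P<user>\S+) \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) (?P<protocol>[^"]+)" '
    r'(?P<status>\d{3}) (?P<size>\d+|-)'
)


def parse_line(line):
    """Turn one raw log line into a dict, or None if it does not match."""
    match = LOG_PATTERN.match(line)
    if match is None:
        return None
    record = match.groupdict()
    record["status"] = int(record["status"])
    # A size of "-" means no body was returned; store it as 0 bytes.
    record["size"] = 0 if record["size"] == "-" else int(record["size"])
    return record


# Hypothetical log lines for illustration.
sample_lines = [
    '192.0.2.1 - - [10/Oct/2016:13:55:36 +0000] "GET /index.html HTTP/1.1" 200 2326',
    '192.0.2.2 - - [10/Oct/2016:13:56:01 +0000] "GET /missing HTTP/1.1" 404 -',
    'not a log line',
]
# Extract and transform: keep only the lines that parse into records.
records = [r for r in map(parse_line, sample_lines) if r is not None]
# A simple analysis over the structured records: requests per status code.
status_counts = Counter(r["status"] for r in records)
```

Once each line is a record with typed fields, grouping and counting become straightforward, which is the same idea a Pig load-and-group pipeline expresses at larger scale.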

    Data analytics

    This study guide is devoted to substantiating the nature, role and importance of data, information and analytical work; explaining the basic principles of analytical work within the modern information environment; and considering the main approaches and basic tools used by specialists in political analytics and social work when performing analytical tasks.

    ISCRAM-Med 2016. Third International Conference on Information Systems for Crisis Response and Management in Mediterranean Countries [Poster Papers]

    Poster Papers of ISCRAM-Med 2016, Third International Conference on Information Systems for Crisis Response and Management in Mediterranean Countries. October 26-28, 2016. Universidad Carlos III de Madrid (Spain). Universidad Carlos III de Madrid. Vicerrectorado de Investigación y Transferenci