435 research outputs found

    AAPOR Report on Big Data

    Get PDF
    In recent years we have seen an increase in the amount of statistics in society describing different phenomena based on so called Big Data. The term Big Data is used for a variety of data as explained in the report, many of them characterized not just by their large volume, but also by their variety and velocity, the organic way in which they are created, and the new types of processes needed to analyze them and make inference from them. The change in the nature of the new types of data, their availability, the way in which they are collected, and disseminated are fundamental. The change constitutes a paradigm shift for survey research.There is a great potential in Big Data but there are some fundamental challenges that have to be resolved before its full potential can be realized. In this report we give examples of different types of Big Data and their potential for survey research. We also describe the Big Data process and discuss its main challenges

    COSMOS: Towards an integrated and scalable service for analysing social media on demand

    Get PDF
    The growing number of people using social media to publish their opinions, share expertise, make social connections and promote their ideas to an international audience is creating data on an epic scale. This enables social scientists to conduct research into ethnography, discourse analysis and analysis of social interactions, providing insight into today's society, which is largely augmented by social computing. The tools available for such analysis are often proprietary and expensive, and often non-interoperable, meaning the rapid marshalling of large data-sets through a range of analyses is arduous and difficult to scale. The collaborative online social media observatory (COSMOS), an integrated social media analysis tool is presented, developed for open access within academia. COSMOS is underpinned by a scalable Hadoop infrastructure and can support the rapid analysis of large data-sets and the orchestration of workflows between tools with limited human effort. We describe an architecture and scalability results for the computational analysis of social media data, and comment on the storage, search and retrieval issues associated with massive social media data-sets. We also provide an insight into the impact of such an integrated on-demand service in the social science academic community

    Public Opinion Analysis Using Hadoop

    Get PDF
    Recent technological advances in devices, computing, and social networking have revolutionized the world but have also increased the amount of data produced by humans on a large scale. If you collect this data in the form of disks, it may fill an entire football field. According to studies, 2.5 billion gigabytes of new data is generated every day and 2.5 petabytes of data is collected every hour. This rate is still growing enormously. Though all this information produced is meaningful and can be useful when processed, it gets neglected. Social media has gained massive popularity nowadays. Twitter makes it easy to engage users in expressing, sharing and discussing hot latest topics but these public expressions and views are hard to analyze due to the bigger size of the data created by Twitter. In order to perform analysis and predictions over the hot topics in society, latest technologies are needed. The most popular solution for this is Hadoop. Hadoop acts as an open-source framework for developing and executing distributed applications that process very large amounts of data. It stores and process big data in a distributed fashion on large clusters of commodity hardware. The risk, of course, in running on commodity machines is how to handle failure. Hadoop is built with the assumption that hardware will fail and as such, it can easily handle most failures. Hadoop can be used for developing and executing distributed applications that process very large amounts of data. It provides a suitable environment needed for treating or processing huge data. Our job is to extract and store data into its file system and query the data according to the desired output. We propose to perform analysis on Public opinion expressed over Twitter regarding the trending topics of the society by using Apache Hadoop framework along with its services Apache Flume and Apache Hive

    Big Data Analytics: A Survey

    Get PDF
    Internet-based programs and communication techniques have become widely used and respected in the IT industry recently. A persistent source of "big data," or data that is enormous in volume, diverse in type, and has a complicated multidimensional structure, is internet applications and communications. Today, several measures are routinely performed with no assurance that any of them will be helpful in understanding the phenomenon of interest in an era of automatic, large-scale data collection. Online transactions that involve buying, selling, or even investing are all examples of e-commerce. As a result, they generate data that has a complex structure and a high dimension. The usual data storage techniques cannot handle those enormous volumes of data. There is a lot of work being done to find ways to minimize the dimensionality of big data in order to provide analytics reports that are even more accurate and data visualizations that are more interesting. As a result, the purpose of this survey study is to give an overview of big data analytics along with related problems and issues that go beyond technology
    • …
    corecore