
    Scaling archived social media data analysis using a hadoop cloud

    Over recent years, there has been an emerging interest in supporting social media analysis for marketing, opinion analysis and understanding community cohesion. Social media data conforms to many of the categorisations attributed to “big data”, i.e. volume, velocity and variety. Generally, analysis needs to be undertaken over large volumes of data in an efficient and timely manner, and a variety of computational infrastructures have been reported to achieve this. We present the COSMOS platform, supporting sentiment and tension analysis on Twitter data, and demonstrate how this platform can be scaled using the OpenNebula Cloud environment with Map/Reduce-based analysis using Hadoop. In particular, we describe the types of system configurations that would be most useful from a performance perspective, i.e. how virtual machines in the infrastructure should be distributed to reduce variability in the analysis performance. We demonstrate the approach using a data set consisting of several million Twitter messages, analysed over two types of Cloud infrastructure.
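    The Map/Reduce pattern this abstract relies on can be illustrated with a minimal in-process sketch. This is not the COSMOS implementation: the sentiment lexicon, bucket names and sample tweets below are hypothetical, and a real deployment would run the two phases as distributed Hadoop jobs over data in HDFS rather than in a single process.

```python
from collections import defaultdict

# Tiny hypothetical sentiment lexicon; the actual COSMOS lexicon is not shown here.
LEXICON = {"great": 1, "good": 1, "bad": -1, "awful": -1}

def map_tweet(tweet):
    """Map phase: emit a (sentiment_bucket, 1) pair for one tweet."""
    score = sum(LEXICON.get(word, 0) for word in tweet.lower().split())
    bucket = "positive" if score > 0 else "negative" if score < 0 else "neutral"
    yield (bucket, 1)

def reduce_counts(pairs):
    """Reduce phase: sum the counts per sentiment bucket."""
    totals = defaultdict(int)
    for key, count in pairs:
        totals[key] += count
    return dict(totals)

tweets = ["great day in Cardiff", "awful traffic", "just a tweet"]
pairs = [pair for t in tweets for pair in map_tweet(t)]
print(reduce_counts(pairs))  # {'positive': 1, 'negative': 1, 'neutral': 1}
```

    In Hadoop, the framework itself would shuffle the mapper output so that all pairs with the same key reach the same reducer; the list comprehension above stands in for that shuffle step.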

    A Survey on Big data Analytics in Cloud Environment

    The continuous and rapid growth in the volume of data captured by organizations, from sources such as social media, the Internet of Things (IoT), machines, multimedia and GPS, has produced an overwhelming flow of data. Data creation is occurring at a record rate; this is referred to as big data and has emerged as a widely recognized trend. To take advantage of big data, real-time analysis and reporting must be provided in tandem with the massive capacity needed to store and process the data. Big data is affecting organizations in banking, education, government, health care, manufacturing and retail, and eventually society as a whole. On the other hand, cloud computing eliminates the need to maintain expensive computing hardware, dedicated space and software. The cloud provides large volumes of storage and a range of services for all kinds of applications. Therefore, many companies are nowadays migrating their applications to cloud environments because of the large reduction in overall investment and the greater flexibility the cloud provides.

    From Social Data Mining to Forecasting Socio-Economic Crisis

    Socio-economic data mining has a great potential in terms of gaining a better understanding of problems that our economy and society are facing, such as financial instability, shortages of resources, or conflicts. Without large-scale data mining, progress in these areas seems hard or impossible. Therefore, a suitable, distributed data mining infrastructure and research centers should be built in Europe. It also appears appropriate to build a network of Crisis Observatories. They can be imagined as laboratories devoted to the gathering and processing of enormous volumes of data on both natural systems such as the Earth and its ecosystem, as well as on human techno-socio-economic systems, so as to gain early warnings of impending events. Reality mining provides the chance to adapt more quickly and more accurately to changing situations. Further opportunities arise from individually customized services, which however should be provided in a privacy-respecting way. This requires the development of novel ICT (such as a self-organizing Web), but most likely new legal regulations and suitable institutions as well. As long as such regulations are lacking on a world-wide scale, it is in the public interest that scientists explore what can be done with the huge data available. Big data do have the potential to change or even threaten democratic societies. The same applies to sudden and large-scale failures of ICT systems. Therefore, dealing with data must be done with a large degree of responsibility and care. Self-interests of individuals, companies or institutions have limits where the public interest is affected, and public interest is not a sufficient justification to violate the human rights of individuals. Privacy is a high good, as confidentiality is, and damaging it would have serious side effects for society.
    Comment: 65 pages, 1 figure, Visioneer White Paper, see http://www.visioneer.ethz.c

    QMachine: commodity supercomputing in web browsers


    Service Oriented Big Data Management for Transport

    The increasing power of computer hardware and the sophistication of computer software have brought many new possibilities to the information world. On one side, the ability to analyse massive data sets has brought new insight, knowledge and information. On the other, it has enabled massively distributed computing and has opened up a new programming paradigm called Service Oriented Computing, particularly well adapted to cloud computing. Applying these new technologies to the transport industry can bring new understanding to town transport infrastructures. The objective of our work is to manage and aggregate cloud services for managing big data and to assist decision making for transport systems. This paper therefore presents our approach: a service oriented architecture for big data analytics for transport systems based on the cloud. Proposing big data management strategies for data produced by transport infrastructures, whilst maintaining cost-effective systems deployed on the cloud, is a promising approach. We present the progress made in developing the Data acquisition service and the Information extraction and cleaning service, as well as the analysis behind choosing a sharding strategy.
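    The sharding decision mentioned at the end of the abstract can be sketched with a simple hash-based router. The shard count and record keys below are invented for illustration; production data stores apply the same idea, typically with rebalancing and replication layered on top.

```python
import hashlib

def shard_for(key: str, num_shards: int) -> int:
    """Deterministically route a record key to a shard via a stable hash.

    md5 is used here only for its stable, well-spread digest, not for security.
    """
    digest = hashlib.md5(key.encode("utf-8")).hexdigest()
    return int(digest, 16) % num_shards

# Hypothetical transport sensor identifiers routed across 4 shards.
sensor_ids = ["bus-42", "tram-7", "metro-line-1"]
placement = {key: shard_for(key, 4) for key in sensor_ids}
print(placement)
```

    Hash sharding spreads write load evenly but gives up efficient range queries over the key; range-based sharding makes the opposite trade. Weighing such trade-offs against the data's access patterns is the kind of analysis the paper's sharding study refers to.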

    Distributed OAIS-Based digital preservation system with HDFS technology

    The paper describes the architecture of a distributed OAIS-based digital preservation system which uses HDFS as its file storage system and supports wide distribution across a number of cluster nodes. It is based on the Apache Hadoop framework, a reliable open source solution with a horizontally scalable distributed architecture. The novelty of the proposed system lies in the fact that none of the existing OAIS digital preservation systems use HDFS storage for archiving both structured and unstructured data. An implementation of the system's prototype and the results of its testing are also presented.

    Metocean Big Data Processing Using Hadoop

    This report discusses MapReduce and how it handles big data. Metocean (Meteorology and Oceanography) data is used as it consists of large data sets. As the number and type of data acquisition devices grows annually, the sheer size and rate of data being collected is rapidly expanding. These big data sets can contain gigabytes or terabytes of data, and can grow on the order of megabytes or gigabytes per day. While the collection of this information presents opportunities for insight, it also presents many challenges: most algorithms are not designed to process big data sets in a reasonable amount of time or with a reasonable amount of memory. MapReduce allows us to meet many of these challenges and gain important insights from large data sets, and the objective of this project is to use it to handle big data. MapReduce is a programming technique for analysing data sets that do not fit in memory. The problem statement chapter discusses how MapReduce offers an advantage when dealing with large data. The literature review explains the definitions of NoSQL and RDBMS, Hadoop MapReduce and big data, considerations when selecting a database, NoSQL database deployments, scenarios for using Hadoop, and a Hadoop real-world example. The methodology chapter explains the waterfall method used in this project's development. The results and discussion chapter explains the results of my project in detail, and the final chapter presents the conclusion and recommendations.
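    The core claim above, that MapReduce can analyse data sets which do not fit in memory, can be sketched by streaming records in fixed-size chunks, mapping each chunk to a small partial result, and reducing the partials. The wave-height stream and the maximum statistic are invented for illustration; a Hadoop job would additionally distribute the chunks across cluster nodes.

```python
from itertools import islice

def chunked(iterable, size):
    """Yield fixed-size chunks so only one chunk is held in memory at a time."""
    it = iter(iterable)
    while chunk := list(islice(it, size)):
        yield chunk

def map_chunk(records):
    """Map phase: compute a per-chunk partial maximum (a stand-in metocean statistic)."""
    return max(records)

def reduce_partials(partials):
    """Reduce phase: combine partial maxima into the global maximum."""
    return max(partials)

# Simulated wave-height stream too large to materialise at once (here: a generator).
wave_heights = (h % 97 / 10 for h in range(1_000_000))
partials = [map_chunk(chunk) for chunk in chunked(wave_heights, 10_000)]
print(reduce_partials(partials))  # 9.6
```

    Any statistic that decomposes this way (max, sum, count, mean via sum/count pairs) fits the pattern; statistics that need all records at once, such as an exact median, require more elaborate multi-pass MapReduce designs.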