
    Scaling archived social media data analysis using a hadoop cloud

    Over recent years, there has been an emerging interest in supporting social media analysis for marketing, opinion analysis and understanding community cohesion. Social media data conforms to many of the categorisations attributed to “big data”, i.e. volume, velocity and variety. Generally, analysis needs to be undertaken over large volumes of data in an efficient and timely manner, and a variety of computational infrastructures have been reported to achieve this. We present the COSMOS platform, supporting sentiment and tension analysis on Twitter data, and demonstrate how this platform can be scaled using the OpenNebula Cloud environment with Map/Reduce-based analysis using Hadoop. In particular, we describe the types of system configurations that would be most useful from a performance perspective, i.e. how virtual machines in the infrastructure should be distributed to reduce variability in the analysis performance. We demonstrate the approach using a data set consisting of several million Twitter messages, analysed over two types of Cloud infrastructure.
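    The Map/Reduce pattern this abstract relies on can be illustrated with a minimal in-process sketch. This is not the COSMOS implementation: the sentiment lexicon, bucket names and sample tweets below are hypothetical, and a real deployment would run the two phases as distributed Hadoop jobs over data in HDFS rather than in a single process.

```python
from collections import defaultdict

# Tiny hypothetical sentiment lexicon; the actual COSMOS lexicon is not shown here.
LEXICON = {"great": 1, "good": 1, "bad": -1, "awful": -1}

def map_tweet(tweet):
    """Map phase: emit a (sentiment_bucket, 1) pair for one tweet."""
    score = sum(LEXICON.get(word, 0) for word in tweet.lower().split())
    bucket = "positive" if score > 0 else "negative" if score < 0 else "neutral"
    yield (bucket, 1)

def reduce_counts(pairs):
    """Reduce phase: sum the counts per sentiment bucket."""
    totals = defaultdict(int)
    for key, count in pairs:
        totals[key] += count
    return dict(totals)

tweets = ["great day in Cardiff", "awful traffic", "just a tweet"]
pairs = [pair for t in tweets for pair in map_tweet(t)]
print(reduce_counts(pairs))  # {'positive': 1, 'negative': 1, 'neutral': 1}
```

    In Hadoop, the framework itself would shuffle the mapper output so that all pairs with the same key reach the same reducer; the list comprehension above stands in for that shuffle step.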

    A Survey on Big data Analytics in Cloud Environment

    The continuous and rapid growth in the volume of data captured by organizations, from sources such as social media, the Internet of Things (IoT), machines, multimedia and GPS, has produced an overwhelming flow of data. Data creation is occurring at a record rate; this is referred to as big data and has emerged as a widely recognized trend. To take advantage of big data, real-time analysis and reporting must be provided in tandem with the massive capacity needed to store and process the data. Big data is affecting organizations in banking, education, government, health care, manufacturing and retail, and eventually society as a whole. On the other hand, cloud computing eliminates the need to maintain expensive computing hardware, dedicated space and software. The cloud provides large volumes of storage and a range of services for all kinds of applications. Therefore, many companies are nowadays migrating their applications to cloud environments because of the large reduction in overall investment and the greater flexibility the cloud provides.

    From Social Data Mining to Forecasting Socio-Economic Crisis

    Socio-economic data mining has a great potential in terms of gaining a better understanding of problems that our economy and society are facing, such as financial instability, shortages of resources, or conflicts. Without large-scale data mining, progress in these areas seems hard or impossible. Therefore, a suitable, distributed data mining infrastructure and research centers should be built in Europe. It also appears appropriate to build a network of Crisis Observatories. They can be imagined as laboratories devoted to the gathering and processing of enormous volumes of data on both natural systems such as the Earth and its ecosystem, as well as on human techno-socio-economic systems, so as to gain early warnings of impending events. Reality mining provides the chance to adapt more quickly and more accurately to changing situations. Further opportunities arise from individually customized services, which however should be provided in a privacy-respecting way. This requires the development of novel ICT (such as a self-organizing Web), but most likely new legal regulations and suitable institutions as well. As long as such regulations are lacking on a world-wide scale, it is in the public interest that scientists explore what can be done with the huge data available. Big data do have the potential to change or even threaten democratic societies. The same applies to sudden and large-scale failures of ICT systems. Therefore, dealing with data must be done with a large degree of responsibility and care. Self-interests of individuals, companies or institutions have limits where the public interest is affected, and public interest is not a sufficient justification to violate the human rights of individuals. Privacy is a high good, as confidentiality is, and damaging it would have serious side effects for society.
    Comment: 65 pages, 1 figure, Visioneer White Paper, see http://www.visioneer.ethz.c

    QMachine: commodity supercomputing in web browsers


    Service Oriented Big Data Management for Transport

    The increasing power of computer hardware and the sophistication of computer software have brought many new possibilities to the information world. On one side, the ability to analyse massive data sets has brought new insight, knowledge and information. On the other, it has enabled massively distributed computing and has opened up a new programming paradigm called Service Oriented Computing, particularly well adapted to cloud computing. Applying these new technologies to the transport industry can bring new understanding to town transport infrastructures. The objective of our work is to manage and aggregate cloud services for managing big data and to assist decision making for transport systems. This paper therefore presents our approach: a service oriented architecture for big data analytics for transport systems based on the cloud. Proposing big data management strategies for data produced by transport infrastructures, whilst maintaining cost-effective systems deployed on the cloud, is a promising approach. We present the progress made in developing the Data acquisition service and the Information extraction and cleaning service, as well as the analysis behind choosing a sharding strategy.
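    The sharding decision mentioned at the end of the abstract can be sketched with a simple hash-based router. The shard count and record keys below are invented for illustration; production data stores apply the same idea, typically with rebalancing and replication layered on top.

```python
import hashlib

def shard_for(key: str, num_shards: int) -> int:
    """Deterministically route a record key to a shard via a stable hash.

    md5 is used here only for its stable, well-spread digest, not for security.
    """
    digest = hashlib.md5(key.encode("utf-8")).hexdigest()
    return int(digest, 16) % num_shards

# Hypothetical transport sensor identifiers routed across 4 shards.
sensor_ids = ["bus-42", "tram-7", "metro-line-1"]
placement = {key: shard_for(key, 4) for key in sensor_ids}
print(placement)
```

    Hash sharding spreads write load evenly but gives up efficient range queries over the key; range-based sharding makes the opposite trade. Weighing such trade-offs against the data's access patterns is the kind of analysis the paper's sharding study refers to.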

    Distributed OAIS-Based digital preservation system with HDFS technology

    The paper describes the architecture of a distributed OAIS-based digital preservation system which uses HDFS as its file storage system and supports wide distribution across a number of cluster nodes. It is based on the Apache Hadoop framework, a reliable open source solution with a horizontally scalable distributed architecture. The novelty of the proposed system lies in the fact that none of the existing OAIS digital preservation systems use HDFS storage for archiving both structured and unstructured data. An implementation of the system's prototype and the results of its testing are also presented.

    Metocean Big Data Processing Using Hadoop

    This report discusses MapReduce and how it handles big data. Metocean (Meteorology and Oceanography) data is used as it consists of large data sets. As the number and type of data acquisition devices grows annually, the sheer size and rate of data being collected is rapidly expanding. These big data sets can contain gigabytes or terabytes of data, and can grow on the order of megabytes or gigabytes per day. While the collection of this information presents opportunities for insight, it also presents many challenges: most algorithms are not designed to process big data sets in a reasonable amount of time or with a reasonable amount of memory. MapReduce allows us to meet many of these challenges and gain important insights from large data sets, and the objective of this project is to use it to handle big data. MapReduce is a programming technique for analysing data sets that do not fit in memory. The problem statement chapter discusses how MapReduce offers an advantage when dealing with large data. The literature review explains the definitions of NoSQL and RDBMS, Hadoop MapReduce and big data, considerations when selecting a database, NoSQL database deployments, scenarios for using Hadoop, and a Hadoop real-world example. The methodology chapter explains the waterfall method used in this project's development. The results and discussion chapter explains the results of my project in detail, and the final chapter presents the conclusion and recommendations.
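    The core claim above, that MapReduce can analyse data sets which do not fit in memory, can be sketched by streaming records in fixed-size chunks, mapping each chunk to a small partial result, and reducing the partials. The wave-height stream and the maximum statistic are invented for illustration; a Hadoop job would additionally distribute the chunks across cluster nodes.

```python
from itertools import islice

def chunked(iterable, size):
    """Yield fixed-size chunks so only one chunk is held in memory at a time."""
    it = iter(iterable)
    while chunk := list(islice(it, size)):
        yield chunk

def map_chunk(records):
    """Map phase: compute a per-chunk partial maximum (a stand-in metocean statistic)."""
    return max(records)

def reduce_partials(partials):
    """Reduce phase: combine partial maxima into the global maximum."""
    return max(partials)

# Simulated wave-height stream too large to materialise at once (here: a generator).
wave_heights = (h % 97 / 10 for h in range(1_000_000))
partials = [map_chunk(chunk) for chunk in chunked(wave_heights, 10_000)]
print(reduce_partials(partials))  # 9.6
```

    Any statistic that decomposes this way (max, sum, count, mean via sum/count pairs) fits the pattern; statistics that need all records at once, such as an exact median, require more elaborate multi-pass MapReduce designs.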