43,865 research outputs found
Storage Solutions for Big Data Systems: A Qualitative Study and Comparison
Big data systems development is full of challenges in view of the variety of
application areas and domains that this technology promises to serve.
Typically, fundamental design decisions involved in big data systems design
include choosing appropriate storage and computing infrastructures. In this age
of heterogeneous systems that integrate different technologies for optimized
solution to a specific real world problem, big data system are not an exception
to any such rule. As far as the storage aspect of any big data system is
concerned, the primary facet in this regard is a storage infrastructure and
NoSQL seems to be the right technology that fulfills its requirements. However,
every big data application has variable data characteristics and thus, the
corresponding data fits into a different data model. This paper presents
feature and use case analysis and comparison of the four main data models
namely document oriented, key value, graph and wide column. Moreover, a feature
analysis of 80 NoSQL solutions has been provided, elaborating on the criteria
and points that a developer must consider while making a possible choice.
Typically, big data storage needs to communicate with the execution engine and
other processing and visualization technologies to create a comprehensive
solution. This brings forth second facet of big data storage, big data file
formats, into picture. The second half of the research paper compares the
advantages, shortcomings and possible use cases of available big data file
formats for Hadoop, which is the foundation for most big data computing
technologies. Decentralized storage and blockchain are seen as the next
generation of big data storage and its challenges and future prospects have
also been discussed
Semi-automatic semantic enrichment of raw sensor data
One of the more recent sources of large volumes of generated data is sensor devices, where dedicated sensing equipment is used to monitor events and happenings in a wide range of domains, including monitoring human biometrics. In recent trials to examine the effects that key moments in movies have on the human body, we fitted fitted with a number of biometric sensor devices and monitored them as they watched a range of dierent movies in groups. The purpose of these experiments was to examine the correlation between humans' highlights in movies as observed from biometric sensors, and highlights in the same movies as identified by our automatic movie analysis techniques. However,the problem with this type of experiment is that both the analysis of the video stream and the sensor data readings are not directly usable
in their raw form because of the sheer volume of low-level data values generated both from the sensors and from the movie analysis. This work describes the semi-automated enrichment of both video analysis and sensor data and the mechanism used to query the data in both centralised
environments, and in a peer-to-peer architecture when the number of sensor devices grows to large numbers. We present and validate a scalable means of semi-automating the semantic enrichment of sensor data, thereby providing a means of large-scale sensor management
Towards the cloudification of the social networks analytics
In the last years, with the increase of the available data from social networks and the rise of big data technologies, social data has emerged as one of the most profitable market for companies to increase their benefits. Besides, social computation scientists see such data as a vast ocean of information to study modern human societies. Nowadays, enterprises and researchers are developing their own mining tools in house, or they are outsourcing their social media mining needs to specialised companies with its consequent economical cost. In this paper, we present the first cloud computing service to facilitate the deployment of social media analytics applications to allow data practitioners to use social mining tools as a service. The main advantage of this service is the possibility to run different queries at the same time and combine their results in real time. Additionally, we also introduce twearch, a prototype to develop twitter mining algorithms as services in the cloud.Peer ReviewedPostprint (author’s final draft
BDGS: A Scalable Big Data Generator Suite in Big Data Benchmarking
Data generation is a key issue in big data benchmarking that aims to generate
application-specific data sets to meet the 4V requirements of big data.
Specifically, big data generators need to generate scalable data (Volume) of
different types (Variety) under controllable generation rates (Velocity) while
keeping the important characteristics of raw data (Veracity). This gives rise
to various new challenges about how we design generators efficiently and
successfully. To date, most existing techniques can only generate limited types
of data and support specific big data systems such as Hadoop. Hence we develop
a tool, called Big Data Generator Suite (BDGS), to efficiently generate
scalable big data while employing data models derived from real data to
preserve data veracity. The effectiveness of BDGS is demonstrated by developing
six data generators covering three representative data types (structured,
semi-structured and unstructured) and three data sources (text, graph, and
table data)
- …