2,346 research outputs found
A systematic review of SQL-on-Hadoop by using compact data formats
Article also submitted for publication in Baltic J. Modern Computing (BJMC) on October 5, 2016.There are huge volumes of raw data generated every day. The question is how to store these data in order to provide faster data access. The research direction in Big Data projects using Hadoop Technology, MapReduce kind of framework and compact data formats shows that two data formats (Avro and Parquet) support schema evolution and compression in order to utilize less storage space. In this paper, a systematic review of SQL-on-Hadoop by using Avro and Parquet has been performed over the past six years (2010–2015) using publications of conference proceedings and journals of IEEEXplore, ACM Digital Library, ScienceDirect. With the help of search strategy followed, 94 research papers have been identified out of which 17 have been analyzed as relevant papers. At the end, the conclusion has been made that direct comparison by compactness and fastness between Avro and Parquet do not exist in data science
Storage Solutions for Big Data Systems: A Qualitative Study and Comparison
Big data systems development is full of challenges in view of the variety of
application areas and domains that this technology promises to serve.
Typically, fundamental design decisions involved in big data systems design
include choosing appropriate storage and computing infrastructures. In this age
of heterogeneous systems that integrate different technologies for optimized
solution to a specific real world problem, big data system are not an exception
to any such rule. As far as the storage aspect of any big data system is
concerned, the primary facet in this regard is a storage infrastructure and
NoSQL seems to be the right technology that fulfills its requirements. However,
every big data application has variable data characteristics and thus, the
corresponding data fits into a different data model. This paper presents
feature and use case analysis and comparison of the four main data models
namely document oriented, key value, graph and wide column. Moreover, a feature
analysis of 80 NoSQL solutions has been provided, elaborating on the criteria
and points that a developer must consider while making a possible choice.
Typically, big data storage needs to communicate with the execution engine and
other processing and visualization technologies to create a comprehensive
solution. This brings forth second facet of big data storage, big data file
formats, into picture. The second half of the research paper compares the
advantages, shortcomings and possible use cases of available big data file
formats for Hadoop, which is the foundation for most big data computing
technologies. Decentralized storage and blockchain are seen as the next
generation of big data storage and its challenges and future prospects have
also been discussed
Big Data Privacy Context: Literature Effects On Secure Informational Assets
This article's objective is the identification of research opportunities in
the current big data privacy domain, evaluating literature effects on secure
informational assets. Until now, no study has analyzed such relation. Its
results can foster science, technologies and businesses. To achieve these
objectives, a big data privacy Systematic Literature Review (SLR) is performed
on the main scientific peer reviewed journals in Scopus database. Bibliometrics
and text mining analysis complement the SLR. This study provides support to big
data privacy researchers on: most and least researched themes, research
novelty, most cited works and authors, themes evolution through time and many
others. In addition, TOPSIS and VIKOR ranks were developed to evaluate
literature effects versus informational assets indicators. Secure Internet
Servers (SIS) was chosen as decision criteria. Results show that big data
privacy literature is strongly focused on computational aspects. However,
individuals, societies, organizations and governments face a technological
change that has just started to be investigated, with growing concerns on law
and regulation aspects. TOPSIS and VIKOR Ranks differed in several positions
and the only consistent country between literature and SIS adoption is the
United States. Countries in the lowest ranking positions represent future
research opportunities.Comment: 21 pages, 9 figure
AAPOR Report on Big Data
In recent years we have seen an increase in the amount of statistics in society describing different phenomena based on so called Big Data. The term Big Data is used for a variety of data as explained in the report, many of them characterized not just by their large volume, but also by their variety and velocity, the organic way in which they are created, and the new types of processes needed to analyze them and make inference from them. The change in the nature of the new types of data, their availability, the way in which they are collected, and disseminated are fundamental. The change constitutes a paradigm shift for survey research.There is a great potential in Big Data but there are some fundamental challenges that have to be resolved before its full potential can be realized. In this report we give examples of different types of Big Data and their potential for survey research. We also describe the Big Data process and discuss its main challenges
- …