Storage Solutions for Big Data Systems: A Qualitative Study and Comparison
Big data systems development is full of challenges in view of the variety of
application areas and domains that this technology promises to serve.
Typically, fundamental design decisions involved in big data systems design
include choosing appropriate storage and computing infrastructures. In this age
of heterogeneous systems that integrate different technologies into an optimized
solution for a specific real-world problem, big data systems are no exception.
As far as the storage aspect of any big data system is concerned, the primary
facet is the storage infrastructure, and NoSQL appears to be the technology
that fulfills its requirements. However,
every big data application has variable data characteristics and thus, the
corresponding data fits into a different data model. This paper presents a
feature and use-case analysis and comparison of the four main data models,
namely document-oriented, key-value, graph, and wide-column. Moreover, a feature
analysis of 80 NoSQL solutions is provided, elaborating on the criteria
and points that a developer must consider when making a choice.
Typically, big data storage needs to communicate with the execution engine and
other processing and visualization technologies to create a comprehensive
solution. This brings the second facet of big data storage, big data file
formats, into the picture. The second half of the paper compares the
advantages, shortcomings, and possible use cases of the available big data file
formats for Hadoop, which is the foundation for most big data computing
technologies. Decentralized storage and blockchain are seen as the next
generation of big data storage, and their challenges and future prospects are
also discussed.
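As a rough illustration of the four data models named in the abstract above, the sketch below shapes one record in each style using plain Python dictionaries. The record, the field names, and the `as_*` helper functions are hypothetical and are not taken from the paper; real NoSQL stores expose their own APIs.

```python
import json

# Hypothetical record, used only for illustration.
record = {"id": "u1", "name": "Ada", "city": "Oslo", "follows": ["u2"]}


def as_key_value(rec):
    """Key-value: the store sees one opaque value per key."""
    return {f"user:{rec['id']}": json.dumps(rec)}


def as_document(rec):
    """Document-oriented: nested fields stay visible to the store."""
    return {
        "_id": rec["id"],
        "name": rec["name"],
        "city": rec["city"],
        "follows": list(rec["follows"]),
    }


def as_wide_column(rec):
    """Wide-column: row key -> column family -> column -> value."""
    return {
        rec["id"]: {
            "profile": {"name": rec["name"], "city": rec["city"]},
            # One column per followed user; the column name carries the data.
            "follows": {target: "" for target in rec["follows"]},
        }
    }


def as_graph(rec):
    """Graph: entities become nodes, relationships become edges."""
    nodes = {rec["id"]: {"name": rec["name"], "city": rec["city"]}}
    for target in rec["follows"]:
        nodes.setdefault(target, {})
    edges = [(rec["id"], "FOLLOWS", target) for target in rec["follows"]]
    return {"nodes": nodes, "edges": edges}
```

The same information ends up in four different shapes, which is one way to read the abstract's remark that an application's data characteristics determine which data model fits.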
Dealing with small Files in HPC Environments: automatic Loop-Back Mounting of Disk Images
Processing of large numbers (hundreds of thousands) of small files (i.e., up to a few KB) is notoriously problematic for all modern parallel file systems. While modern storage solutions provide high and scalable bandwidth through parallel storage servers connected with a high-speed network, accessing small files is sequential and latency-bound. Paradoxically, performance of file access is worse than if the files were stored on a local hard drive. We present a generic solution for large-scale HPC facilities that improves the performance of workflows dealing with large numbers of small files. The files are saved inside a single large file containing a disk image, similarly to an archive. When needed, the image is mounted through the Unix loop-back device, and the contents of the image are available to the user in the form of a usual directory tree. Since mounting of disks under Unix often requires super-user privileges, security concerns and possible ways to address them are considered. A complete Python implementation of an image creation, mounting, and unmounting framework is presented. A seamless integration into HPC environments managed by SLURM is discussed using the examples of read-only software modules created by administrators and user-created disk images with read-only application input data. Finally, results of performance benchmarks carried out on the Abel supercomputer facility in Oslo, Norway, are shown.
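The create, mount, and unmount cycle described in the abstract above can be sketched as follows. This is a minimal sketch, not the authors' framework: the function names, the ext4 choice, and the paths are assumptions, and the functions only build command lines for the standard `truncate`, `mkfs.ext4`, `mount`, and `umount` tools without running them, since mounting usually requires super-user privileges.

```python
def create_image_commands(image, size_mb):
    """Commands that create an empty ext4 disk image of size_mb megabytes.

    Hypothetical helper; ext4 is an assumption, not the paper's choice.
    """
    return [
        # Allocate a sparse file of the requested size.
        ["truncate", "-s", f"{size_mb}M", str(image)],
        # -F lets mkfs.ext4 format a regular file instead of a block device.
        ["mkfs.ext4", "-F", str(image)],
    ]


def mount_command(image, mountpoint, read_only=True):
    """Command that mounts the image through the loop-back device."""
    options = "loop,ro" if read_only else "loop"
    return ["mount", "-o", options, str(image), str(mountpoint)]


def unmount_command(mountpoint):
    """Command that unmounts the image again."""
    return ["umount", str(mountpoint)]
```

A caller with sufficient privileges could pass each list to `subprocess.run(cmd, check=True)`. Once mounted, the many small files appear as an ordinary directory tree while the parallel file system only has to serve one large file.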
Scalable Reliable SD Erlang Design
This technical report presents the design of Scalable Distributed (SD) Erlang: a set of language-level changes that aims to enable Distributed Erlang to scale for server applications on commodity hardware with at most 100,000 cores. We cover a number of aspects, specifically anticipated architecture, anticipated failures, scalable data structures, and scalable computation. Two other components that guided the design of SD Erlang are design principles and typical Erlang applications. The design principles summarise the types of modifications we aim to make to allow Erlang to scale. Erlang exemplars help us to identify the main Erlang scalability issues and hypothetically validate the SD Erlang design.