
    Storage Solutions for Big Data Systems: A Qualitative Study and Comparison

    Big data systems development is full of challenges in view of the variety of application areas and domains that this technology promises to serve. Typically, fundamental design decisions involved in big data systems design include choosing appropriate storage and computing infrastructures. In this age of heterogeneous systems that integrate different technologies for an optimized solution to a specific real-world problem, big data systems are no exception. As far as the storage aspect of any big data system is concerned, the primary facet is the storage infrastructure, and NoSQL seems to be the right technology to fulfill its requirements. However, every big data application has its own data characteristics, and thus the corresponding data fits a different data model. This paper presents a feature and use-case analysis and comparison of the four main data models, namely document-oriented, key-value, graph, and wide-column. Moreover, a feature analysis of 80 NoSQL solutions is provided, elaborating on the criteria and points that a developer must consider while making a choice. Typically, big data storage needs to communicate with the execution engine and other processing and visualization technologies to create a comprehensive solution. This brings the second facet of big data storage, big data file formats, into the picture. The second half of the paper compares the advantages, shortcomings, and possible use cases of the available big data file formats for Hadoop, which is the foundation for most big data computing technologies. Decentralized storage and blockchain are seen as the next generation of big data storage, and their challenges and future prospects are also discussed.
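    A minimal illustration of the difference between two of the four data models discussed (key-value versus document-oriented), using plain Python dictionaries as stand-ins for the underlying stores; the record layout and field names are invented for this sketch and are not taken from the paper.

        # Key-value model: an opaque value addressed by a single key; the store does
        # not interpret the payload, so lookups by anything other than the key need
        # an external index or a full scan.
        kv_store = {
            "user:42": '{"name": "Ada", "city": "London", "orders": [1001, 1002]}',
        }

        # Document model: the store understands the nested structure, so fields and
        # sub-documents can be queried and indexed directly.
        doc_store = {
            "users": [
                {"_id": 42, "name": "Ada", "city": "London",
                 "orders": [{"id": 1001, "total": 25.0},
                            {"id": 1002, "total": 40.5}]},
            ],
        }

        # Querying by a non-key field is natural in the document model ...
        londoners = [u for u in doc_store["users"] if u["city"] == "London"]
        # ... whereas the key-value model would require scanning every value.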

    Waveform Signal Entropy and Compression Study of Whole-Building Energy Datasets

    Electrical energy consumption has been an ongoing research area since the advent of smart homes and Internet of Things devices. Consumption characteristics and usage profiles are directly influenced by building occupants and their interaction with electrical appliances. Information extracted from these data can be used to conserve energy and increase user comfort levels. Data analysis together with machine learning models can be utilized to extract valuable information for the benefit of the occupants themselves, power plants, and grid operators. Public energy datasets provide a scientific foundation to develop and benchmark these algorithms and techniques. With datasets exceeding tens of terabytes, we present a novel study of five whole-building energy datasets with high sampling rates, their signal entropy, and how a well-calibrated measurement can have a significant effect on overall storage requirements. We show that some datasets do not fully utilize the available measurement precision, leaving potential accuracy and space savings untapped. We benchmark a comprehensive list of 365 file formats, transparent data transformations, and lossless compression algorithms. The primary goal is to reduce the overall dataset size while maintaining an easy-to-use file format and access API. We show that with careful selection of file format and encoding scheme, the size of some datasets can be reduced by up to 73%.
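    The observation that some datasets do not fully utilize the available measurement precision can be made concrete with Shannon entropy: if the quantized samples carry fewer bits of entropy than the stored word size, a lossless codec can reclaim the difference. A minimal sketch assuming NumPy and a synthetic 16-bit waveform; the signal and numbers are illustrative, not taken from the datasets studied.

        import numpy as np

        def sample_entropy_bits(samples: np.ndarray) -> float:
            """Empirical Shannon entropy (bits per sample) of a quantized waveform."""
            _, counts = np.unique(samples, return_counts=True)
            p = counts / counts.sum()
            return float(-(p * np.log2(p)).sum())

        # Toy waveform: a 16-bit ADC whose signal only exercises ~10 bits of range.
        rng = np.random.default_rng(0)
        waveform = (512 * np.sin(np.linspace(0, 200, 50_000))
                    + rng.normal(0, 4, 50_000)).astype(np.int16)

        h = sample_entropy_bits(waveform)
        print(f"entropy ~ {h:.2f} bits/sample vs. 16 bits stored")
        # A large gap between the two numbers is the headroom that a well-chosen
        # transform plus lossless compression (as benchmarked in the paper) can reclaim.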

    A Parallel Data Compression Framework for Large Scale 3D Scientific Data

    Large scale simulations of complex systems, ranging from climate and astrophysics to crowd dynamics, routinely produce petabytes of data and are projected to reach the zettabyte level in the coming decade. These simulations enable unprecedented insights, but at the same time their effectiveness is hindered by the enormous data sizes associated with the computational elements and respective output quantities of interest, which impose severe constraints on storage and I/O time. In this work, we address these challenges through a novel software framework for scientific data compression. The software (CubismZ) incorporates efficient wavelet based techniques and the state-of-the-art ZFP, SZ and FPZIP floating point compressors. The framework relies on a block-structured data layout, benefits from OpenMP and MPI, and targets supercomputers based on multicores. CubismZ can be used as a tool for ex situ (offline) compression of scientific datasets and supports conventional Computational Fluid Dynamics (CFD) file formats. Moreover, it provides a testbed for comparison, in terms of compression factor and peak signal-to-noise ratio, of a number of available data compression methods. The software yields in situ compression ratios of 100x or higher for fluid dynamics data produced by petascale simulations of cloud cavitation collapse using $\mathcal{O}(10^{11})$ grid cells, with negligible impact on the total simulation time. Comment: 26 pages, 12 figures, open-source software
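    The evaluation criteria named in the abstract, compression factor and peak signal-to-noise ratio, are easy to state precisely; below is a minimal sketch of both for a 3-D block of floats, assuming NumPy. The actual ZFP/SZ/FPZIP calls are not reproduced here, so `compressed_nbytes` is a placeholder for whatever size a chosen compressor reports.

        import numpy as np

        def compression_factor(original: np.ndarray, compressed_nbytes: int) -> float:
            """Raw size divided by compressed size (e.g. the 100x reported for CFD data)."""
            return original.nbytes / compressed_nbytes

        def psnr(original: np.ndarray, reconstructed: np.ndarray) -> float:
            """Peak signal-to-noise ratio in dB over the field's dynamic range."""
            mse = np.mean((original - reconstructed) ** 2)
            if mse == 0:
                return float("inf")
            peak = float(original.max() - original.min())
            return float(20 * np.log10(peak) - 10 * np.log10(mse))

        # Toy 3-D block standing in for one block of a block-structured CFD field.
        rng = np.random.default_rng(1)
        field = rng.random((32, 32, 32)).astype(np.float32)
        lossy = field + np.float32(1e-3) * rng.standard_normal(field.shape).astype(np.float32)

        print(f"PSNR ~ {psnr(field, lossy):.1f} dB")
        # compression_factor(field, compressed_nbytes) would be evaluated once a
        # compressor (ZFP, SZ, FPZIP, ...) has produced its output buffer.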

    Improving Quality of Service and Reducing Power Consumption with WAN accelerator in Cloud Computing Environments

    The widespread use of cloud computing services is expected to degrade Quality of Service and to increase the power consumption of ICT devices, since the distance to a server becomes longer than before. Migration of virtual machines over a wide area can solve many problems, such as load balancing and power saving, in cloud computing environments. This paper proposes to dynamically apply a WAN accelerator within the network when a virtual machine is moved to a distant center, in order to prevent performance degradation after live migration of virtual machines over a wide area. mSCTP-based data transfer using different TCP connections before and after migration is proposed in order to use a currently available WAN accelerator. This paper does not consider the performance degradation of live migration itself. The paper then proposes to reduce the power consumption of ICT devices by actively installing WAN accelerators as part of cloud resources and by temporarily increasing the packet transfer rate of the communication link. It is demonstrated that the power consumption with a WAN accelerator could be reduced to one-tenth of that without one. Comment: 12 pages, International Journal of Computer Networks & Communications (IJCNC) Vol.5, No.1, January 201
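    The power argument rests on simple reasoning: a WAN accelerator (deduplication and compression in the path) moves fewer bytes, so network devices spend less time at full transmit power. Below is a back-of-the-envelope sketch of that reasoning; every number is purely illustrative and none of them come from the paper's measurements.

        def transfer_energy_joules(data_bytes: float, link_bps: float, device_watts: float) -> float:
            """Energy spent by network devices while a transfer is in flight (toy model)."""
            transfer_seconds = data_bytes * 8 / link_bps
            return device_watts * transfer_seconds

        # Illustrative scenario: a 10 GB virtual machine image over a 1 Gbit/s WAN link.
        baseline = transfer_energy_joules(10e9, 1e9, device_watts=300)

        # With a WAN accelerator, suppose deduplication cuts bytes on the wire by 80%,
        # at the cost of two appliances (one per site) drawing 50 W during the transfer.
        accelerated_seconds = 0.2 * 10e9 * 8 / 1e9
        accelerated = (transfer_energy_joules(0.2 * 10e9, 1e9, device_watts=300)
                       + 2 * 50 * accelerated_seconds)

        print(f"baseline ~ {baseline / 1e3:.1f} kJ, accelerated ~ {accelerated / 1e3:.1f} kJ")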

    Towards Media Intercloud Standardization: Evaluating Impact of Cloud Storage Heterogeneity

    Digital media has been growing very rapidly, contributing to the rise in popularity of cloud computing. Cloud computing provides ease of management of large amounts of data and resources. With a large number of devices communicating over the Internet and with rapidly increasing user demands, solitary clouds have to communicate with other clouds to fulfill these demands and discover services elsewhere. This scenario is called intercloud computing or cloud federation. Intercloud computing still lacks a standard architecture. Prior works discuss some architectural blueprints, but none of them highlight the key issues involved and their impact, so that a valid and reliable architecture could be envisioned. In this paper, we discuss the importance of intercloud computing and present its architectural components in detail. Intercloud computing also involves a number of issues; we discuss the key ones and present the impact of storage heterogeneity. We have evaluated some of the most noteworthy cloud storage services, namely Dropbox, Amazon CloudDrive, GoogleDrive, Microsoft OneDrive (formerly SkyDrive), Box, and SugarSync, in terms of Quality of Experience (QoE), Quality of Service (QoS), and storage space efficiency. Discussion of the results shows the acceptability level of these storage services and the shortcomings in their design. Comment: 13 pages, 14 figures, Springer Journal of Grid Computing, 201
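    The QoS part of such an evaluation boils down to repeated throughput measurements against each service; a minimal sketch of that harness follows. The per-service upload/download calls (Dropbox, Google Drive, OneDrive, ...) are not reproduced here, so `transfer` is a placeholder callable supplied by the experimenter.

        import time
        from statistics import mean, stdev
        from typing import Callable, Tuple

        def measure_throughput(transfer: Callable[[], int], repeats: int = 5) -> Tuple[float, float]:
            """Run a transfer several times and return (mean, stdev) throughput in MB/s.

            `transfer` performs one upload or download and returns the bytes moved.
            """
            rates = []
            for _ in range(repeats):
                start = time.perf_counter()
                nbytes = transfer()
                elapsed = time.perf_counter() - start
                rates.append(nbytes / elapsed / 1e6)
            return mean(rates), stdev(rates)

        # Stand-in transfer that just writes a local file, to show the harness running.
        def fake_upload() -> int:
            payload = b"x" * (5 * 1024 * 1024)
            with open("qos_probe.bin", "wb") as f:
                f.write(payload)
            return len(payload)

        print(measure_throughput(fake_upload))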

    Analysis of Cloud Storage Information Security and Its Various Methods

    Cloud computing is the latest promising paradigm in the IT field. It provides resources such as easy accessibility of data, minimal cost, and several other benefits. But the major issue for the cloud is the security of the information stored in it. In this paper, various methods and specialized techniques are combined to provide information security for data stored in the cloud. The aim of this paper is to analyze various cryptographic techniques and to discuss the security techniques and user authentication mechanisms over the cloud that are most helpful and useful for information security in the cloud. DOI: 10.17762/ijritcc2321-8169.15028
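    One building block such surveys cover is client-side encryption, so that data is protected before it ever reaches the provider; a minimal sketch using symmetric authenticated encryption from the widely used Python `cryptography` package. This is a generic illustration, not a scheme taken from the paper.

        from cryptography.fernet import Fernet

        # The key is generated and kept by the data owner, never by the cloud provider.
        key = Fernet.generate_key()
        cipher = Fernet(key)

        plaintext = b"sensitive record destined for cloud storage"
        ciphertext = cipher.encrypt(plaintext)   # only this ciphertext is uploaded

        # After downloading the blob back from the cloud, the owner decrypts locally.
        assert cipher.decrypt(ciphertext) == plaintext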

    TrIMS: Transparent and Isolated Model Sharing for Low Latency Deep Learning Inference in Function as a Service Environments

    Deep neural networks (DNNs) have become core computation components within low latency Function as a Service (FaaS) prediction pipelines, including image recognition, object detection, natural language processing, speech synthesis, and personalized recommendation pipelines. Cloud computing, as the de-facto backbone of modern computing infrastructure for both enterprise and consumer applications, has to be able to handle user-defined pipelines of diverse DNN inference workloads while maintaining isolation and latency guarantees and minimizing resource waste. The current solution for guaranteeing isolation within FaaS is suboptimal, suffering from "cold start" latency. A major cause of such inefficiency is the need to move large amounts of model data within and across servers. We propose TrIMS as a novel solution to address these issues. Our proposed solution consists of a persistent model store across the GPU, CPU, local storage, and cloud storage hierarchy, an efficient resource management layer that provides isolation, and a succinct set of application APIs and container technologies for easy and transparent integration with FaaS, Deep Learning (DL) frameworks, and user code. We evaluate our solution by interfacing TrIMS with the Apache MXNet framework and demonstrate up to 24x speedup in latency for image classification models and up to 210x speedup for large models. We achieve up to 8x system throughput improvement. Comment: In Proceedings CLOUD 201
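    The core idea, keeping already-loaded models resident so later invocations skip the cold-start load, can be sketched with a shared in-process cache; TrIMS itself spans a GPU/CPU/local/cloud storage hierarchy and container boundaries, which this single-tier toy does not attempt. `load_model` is a placeholder for a framework-specific loader such as MXNet's.

        import threading
        from typing import Any, Callable, Dict

        class ModelStore:
            """Toy stand-in for a persistent model store shared across invocations."""

            def __init__(self, loader: Callable[[str], Any]):
                self._loader = loader
                self._models: Dict[str, Any] = {}
                self._lock = threading.Lock()

            def get(self, name: str) -> Any:
                with self._lock:
                    if name not in self._models:       # cold path: pay the load cost once
                        self._models[name] = self._loader(name)
                    return self._models[name]          # warm path: reuse the resident copy

        # Usage sketch (names are placeholders):
        #   store = ModelStore(load_model)
        #   net = store.get("resnet50")   # first call loads, later calls are near-instant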

    Vignette: Perceptual Compression for Video Storage and Processing Systems

    Compressed videos constitute 70% of Internet traffic, and video upload growth rates far outpace compute and storage improvement trends. Past work in leveraging perceptual cues like saliency, i.e., regions where viewers focus their perceptual attention, reduces compressed video size while maintaining perceptual quality, but requires significant changes to video codecs and ignores the data management of this perceptual information. In this paper, we propose Vignette, a compression technique and storage manager for perception-based video compression. Vignette complements off-the-shelf compression software and hardware codec implementations. Vignette's compression technique uses a neural network to predict saliency information used during transcoding, and its storage manager integrates perceptual information into the video storage system to support a perceptual compression feedback loop. Vignette's saliency-based optimizations reduce storage by up to 95% with minimal quality loss, and Vignette videos lead to power savings of 50% on mobile phones during video playback. Our results demonstrate the benefit of embedding information about the human visual system into the architecture of video storage systems.
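    The saliency-driven part of the pipeline can be summarized as a mapping from per-tile saliency to per-tile encoder quality; a minimal sketch of that mapping, expressed as x264/x265-style CRF values. The CRF range and the example saliency map are illustrative and are not Vignette's actual parameters.

        import numpy as np

        def saliency_to_crf(saliency: np.ndarray, crf_best: int = 18, crf_worst: int = 40) -> np.ndarray:
            """Map per-tile saliency in [0, 1] to per-tile CRF (lower CRF = higher quality)."""
            s = np.clip(saliency, 0.0, 1.0)
            crf = crf_worst - s * (crf_worst - crf_best)
            return np.rint(crf).astype(int)

        # Toy 4x4 saliency map for one frame, e.g. as predicted by a saliency network.
        saliency_map = np.array([[0.05, 0.10, 0.10, 0.05],
                                 [0.10, 0.90, 0.80, 0.10],
                                 [0.10, 0.70, 0.90, 0.10],
                                 [0.05, 0.10, 0.10, 0.05]])
        print(saliency_to_crf(saliency_map))
        # Salient tiles receive near-lossless settings; the background absorbs
        # most of the bitrate (and hence storage) savings.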

    Recent Developments in Cloud Based Systems: State of Art

    Cloud computing is the new buzzword among technologists these days. The importance and the wide range of applications of cloud computing make it a topic of great significance. It provides several notable features such as multitenancy, on-demand service, and pay-per-use. This manuscript presents an exhaustive survey of cloud computing technology and of the potential research issues in cloud computing that need to be addressed.

    Power quality and electromagnetic compatibility: special report, session 2

    The scope of Session 2 (S2) has been defined as follows by the Session Advisory Group and the Technical Committee: Power Quality (PQ), with the more general concept of electromagnetic compatibility (EMC) and with some related safety problems in electricity distribution systems. Special focus is put on voltage continuity (supply reliability, problem of outages) and voltage quality (voltage level, flicker, unbalance, harmonics). This session will also look at electromagnetic compatibility (mains frequency to 150 kHz), electromagnetic interference, and electric and magnetic field issues. Also addressed in this session are electrical safety and immunity concerns (lightning issues; step, touch and transferred voltages). The aim of this special report is to present a synthesis of the present concerns in PQ&EMC, based on all selected papers of Session 2 and related papers from other sessions (152 papers in total). The report is divided into the following four blocks: Block 1: Electric and Magnetic Fields, EMC, Earthing Systems; Block 2: Harmonics; Block 3: Voltage Variation; Block 4: Power Quality Monitoring. Two round tables will be organised: Power Quality and EMC in the Future Grid (CIGRE/CIRED WG C4.24, RT 13), and Reliability Benchmarking: why should we do it, and what should be done in the future? (RT 15)