22 research outputs found

    Archive for time series values measured from power monitoring

    This thesis deals with various techniques for compressing binary files that contain measurements of electrical quantities. After an introduction to common compression algorithms and data encodings, the reader is acquainted with the quantities stored in electricity measurement archives. The chosen compression algorithms are then tested on real data. All tests use the lossless algorithms Lzma, Bzip2 and Deflate. Before compression, the data is pre-processed in various ways so that the compression efficiency of different data representations in the files can be compared. The second part of the thesis introduces a previously untested lossy compression method, whose pre-processing step exploits the accuracy class of the measuring instruments to transform the data favourably. Detailed results are included in this report.
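The comparison described above can be sketched with Python's standard-library bindings for all three codecs (an illustrative toy, not the thesis's code or data; the delta step is one plausible example of the pre-processing it tests):

```python
# Toy comparison of the three lossless codecs the thesis tests
# (zlib implements Deflate). Synthetic data stands in for the real
# electricity-measurement archives.
import bz2
import lzma
import struct
import zlib

# Slowly varying 32-bit readings, as a measurement archive might hold.
readings = [50000 + (i % 7) * 3 for i in range(10000)]
raw = struct.pack(f"<{len(readings)}i", *readings)

# One possible pre-processing step: delta-encode consecutive samples.
deltas = [readings[0]] + [b - a for a, b in zip(readings, readings[1:])]
delta_raw = struct.pack(f"<{len(deltas)}i", *deltas)

for name, compress in (("lzma", lzma.compress),
                       ("bzip2", bz2.compress),
                       ("deflate", zlib.compress)):
    print(f"{name}: raw={len(compress(raw))} delta={len(compress(delta_raw))}")
```

On highly regular data like this, all three codecs shrink both representations substantially; the interesting question the thesis asks is how the ranking changes across representations and real archives.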

    Towards green scientific data compression through high-level I/O interfaces

    Every HPC system today has to cope with a deluge of data generated by scientific applications, simulations or large-scale experiments. The upscaling of supercomputer systems and infrastructures generally results in a dramatic increase in their energy consumption. In this paper, we argue that techniques like data compression can lead to significant gains in power efficiency by reducing both network and storage requirements. To that end, we propose a novel methodology for on-the-fly, intelligent determination of an energy-efficient data reduction for a given data set by leveraging state-of-the-art compression algorithms and metadata at application-level I/O. We motivate our work by analyzing the energy and storage saving needs of real-life scientific HPC applications, and review the various compression techniques that can be applied. We find that the resulting data reduction can decrease the data volume transferred and stored by as much as 80% in some cases, leading to significant savings in storage and networking costs.
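The on-the-fly determination argued for above might look, in a much-simplified sketch (the scoring heuristic and all names are mine, not the paper's), like trial-compressing a small sample and weighing ratio against CPU time as a crude energy proxy:

```python
# Hypothetical sketch: pick a compressor for a data set by trial-
# compressing a sample, penalizing slow (energy-hungry) codecs.
import bz2
import lzma
import time
import zlib

def choose_compressor(sample: bytes):
    """Return (name, fn) of the candidate with the best size/time score."""
    candidates = {"deflate": zlib.compress,
                  "bzip2": bz2.compress,
                  "lzma": lzma.compress}
    best = None
    for name, fn in candidates.items():
        t0 = time.perf_counter()
        size = len(fn(sample))
        cost = time.perf_counter() - t0
        score = size * (1.0 + cost)  # crude proxy for energy-per-byte-saved
        if best is None or score < best[0]:
            best = (score, name, fn)
    return best[1], best[2]

name, fn = choose_compressor(b"climate model output " * 500)
print("chosen:", name)
```

A real system, as the paper describes, would also consult application-level metadata (variable types, expected access patterns) rather than timing alone.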

    Data Compression For Energy-Efficiency Web Access On Mobile Devices

    Nowadays, wireless data connections (2G, 3G and WiFi) have become the mainstream technologies for accessing the Internet on modern mobile devices. However, users are aware that heavy use of data transmission for web access over wireless interfaces drains battery life badly. In order to extend battery life and improve the user experience, we present a solution for energy-efficient web access on mobile devices. A new compression strategy named selective compression is introduced in this thesis as an improvement over traditional HTTP compression. The selective-compression strategy can properly handle binary web content, and its mechanism relies on a client/remote-proxy pair structure. From analysis of the experimental results, we conclude that the selective-compression strategy brings clear benefits in energy saving and delay reduction on mobile devices when accessing web pages that include massive binaries. Furthermore, we give suggestions to web developers and web service providers on how to create energy-efficient web pages.
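The core of the selective idea can be sketched as follows (the function and type list are my own illustration, not the thesis's implementation): compress text-like content, but leave already-compressed binaries such as JPEG or ZIP untouched, since recompressing them wastes CPU, and hence battery, for no gain:

```python
# Hypothetical proxy-side decision: compress only content types that
# are known to shrink, and keep the result only if it actually did.
import zlib

COMPRESSIBLE = {"text/html", "text/css",
                "application/javascript", "application/json"}

def prepare_response(content_type, body):
    """Return (payload, was_compressed) for delivery to the client."""
    if content_type in COMPRESSIBLE:
        compressed = zlib.compress(body, 6)
        if len(compressed) < len(body):
            return compressed, True
    return body, False  # binaries (images, archives) pass through as-is

html = b"<html>" + b"hello " * 200 + b"</html>"
payload, was_compressed = prepare_response("text/html", html)
print(was_compressed, len(html), len(payload))
```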

    Using semantic knowledge to improve compression on log files

    With the move towards global and multi-national companies, information technology infrastructure requirements are increasing. As the size of these computer networks increases, it becomes more and more difficult to monitor, control, and secure them. Networks consist of a number of diverse devices, sensors, and gateways which are often spread over large geographical areas. Each of these devices produces log files which need to be analysed and monitored to provide network security and satisfy regulations. Data compression programs such as gzip and bzip2 are commonly used to reduce the quantity of data for archival purposes after the log files have been rotated. However, many other compression programs exist, each with its own advantages and disadvantages. These programs each use a different amount of memory and take different compression and decompression times to achieve different compression ratios. System log files also contain redundancy which is not necessarily exploited by standard compression programs. Log messages usually use a similar format with a defined syntax. In the log files, not all ASCII characters are used, and the messages contain certain "phrases" which are often repeated. This thesis investigates the use of compression as a means of data reduction and how the use of semantic knowledge can improve data compression (also applying the results to different scenarios that can occur in a distributed computing environment). It presents the results of a series of tests performed on different log files. It also examines the semantic knowledge which exists in maillog files and how it can be exploited to improve the compression results. The results from a series of text preprocessors which exploit this knowledge are presented and evaluated. These preprocessors include: one which replaces the timestamps and IP addresses with their binary equivalents, and one which replaces words from a dictionary with unused ASCII characters.
In this thesis, data compression is shown to be an effective method of data reduction, producing up to 98 percent reduction in file size on a corpus of log files. The use of preprocessors which exploit semantic knowledge results in up to 56 percent improvement in overall compression time and up to 32 percent reduction in compressed size.
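One of the preprocessors described above, replacing dotted-quad IP addresses with their 4-byte binary equivalents, can be sketched in a few lines (a minimal one-way toy; a real preprocessor must be reversible, e.g. by escaping byte sequences that already look like packed addresses):

```python
# Replace each IPv4 address in a log with its 4 raw bytes before
# handing the result to a standard compressor.
import re
import socket
import zlib

IP_RE = re.compile(rb"\b(\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3})\b")

def pack_ips(log: bytes) -> bytes:
    """Substitute each dotted-quad IPv4 address with 4 raw bytes."""
    return IP_RE.sub(lambda m: socket.inet_aton(m.group(1).decode()), log)

log = b"Oct 11 dhcpd: DHCPACK on 192.168.0.12 to 192.168.0.1 via eth0\n" * 1000
print(len(log), len(pack_ips(log)),
      len(zlib.compress(log)), len(zlib.compress(pack_ips(log))))
```

Each 7-to-15-character address shrinks to 4 bytes before compression even begins, which is exactly the kind of semantic redundancy a generic compressor cannot exploit on its own.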

    Doctor of Philosophy

    In the past few years, we have seen a tremendous increase in the digital data being generated. By 2011, storage vendors had shipped 905 PB of purpose-built backup appliances. By 2013, the number of objects stored in Amazon S3 had reached 2 trillion. Facebook had stored 20 PB of photos by 2010. All of these require an efficient storage solution. To improve space efficiency, compression and deduplication are widely used. Compression works by identifying repeated strings and replacing them with more compact encodings, while deduplication partitions data into fixed-size or variable-size chunks and removes duplicate blocks. While we have seen great improvements in space efficiency from these two approaches, some limitations remain. First, traditional compressors are limited in their ability to detect redundancy across a large range, since they search for redundant data at a fine-grained level (the string level). For deduplication, metadata embedded in an input file changes more frequently than the data, and this introduces unnecessary unique chunks, leading to poor deduplication. Cloud storage systems suffer from unpredictable and inefficient performance because of interference among different types of workloads. This dissertation proposes techniques to improve the effectiveness of traditional compressors and deduplication in improving space efficiency, and a new IO scheduling algorithm to improve performance predictability and efficiency for cloud storage systems. The common idea is to utilize similarity. To improve the effectiveness of compression and deduplication, similarity in content is used to transform an input file into a compression- or deduplication-friendly format. We propose Migratory Compression, a generic data transformation that identifies similar data at a coarse-grained level (the block level) and then groups similar blocks together. It can be used as a preprocessing stage for any traditional compressor.
We find that metadata has a huge impact in reducing the benefit of deduplication. To isolate this impact, we propose to separate metadata from data. Three approaches are presented for use cases with different constraints. For the commonly used tar format, we propose Migratory Tar: a data transformation and also a new tar format that deduplicates better. We also present a case study where we use deduplication to reduce storage consumption for storing disk images, while at the same time achieving high performance in image deployment. Finally, we apply the same principle of utilizing similarity in IO scheduling to prevent interference between random and sequential workloads, leading to efficient, consistent, and predictable performance for sequential workloads and high disk utilization.
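The block-grouping idea behind Migratory Compression can be illustrated with a rough sketch (the details here are mine, not the dissertation's): split the input into fixed-size blocks, move similar blocks next to each other, and run an ordinary compressor, whose limited search window can now see the long-range redundancy. A real implementation records the block permutation so the file can be restored, and finds near-duplicates with cheap similarity sketches rather than a full sort:

```python
# Toy demonstration: duplicate blocks placed 256 KB apart defeat
# Deflate's 32 KB window until they are migrated next to each other.
import random
import zlib

BLOCK = 4096

def migrate(data: bytes) -> bytes:
    blocks = [data[i:i + BLOCK] for i in range(0, len(data), BLOCK)]
    return b"".join(sorted(blocks))  # identical blocks become adjacent

random.seed(0)
chunks = [random.randbytes(16384) for _ in range(16)]
data = b"".join(chunks * 2)  # each chunk repeats 256 KB later, far
                             # beyond Deflate's 32 KB search window
plain = len(zlib.compress(data))
migrated = len(zlib.compress(migrate(data)))
print(plain, migrated)
```

The plain stream barely compresses (the data is random and its repeats are out of reach), while the migrated stream roughly halves, which is the coarse-grained redundancy the technique targets.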

    Analysing and comparing problem landscapes for black-box optimization via length scale


    Robust data protection and high efficiency for IoTs streams in the cloud

    Remotely generated streaming of Internet of Things (IoT) data has become a vital category upon which many applications rely. Smart meters collect readings for household activities such as power and gas consumption every second; the readings are transmitted wirelessly through various channels and public hops to the operation centres. Due to the unusually large stream sizes, the operation centres use cloud servers, where various entities process the data on a real-time basis for billing and power management. Similarly, in smart-pipe projects (where oil pipes are continuously monitored using sensors), the collected streams are sent to the public cloud for real-time flaw detection. Many other similar applications can make the world a more convenient place, contributing to climate change mitigation and transportation improvement, to name a few. Despite the obvious advantages of these applications, unique challenges arise, posing questions about a suitable balance between guaranteeing stream security (privacy, authenticity and integrity) without hindering direct operations on those streams, while also handling data management issues such as the volume of protected streams during transmission and storage. These challenges become more complicated when the streams reside on third-party cloud servers. In this thesis, several novel techniques are introduced to address these problems. We begin by protecting the privacy and authenticity of transmitted readings without disrupting direct operations. We propose two steganography techniques that rely on different mathematical security models. The results look promising: only the approved party who holds the required security tokens can retrieve the hidden secret, and the distortion is negligible, with the difference between the original and protected readings almost at zero.
This means the streams can be used in their protected form at intermediate hops or on third-party servers. We then improve the integrity of the transmitted protected streams, which are prone to intentional or unintentional noise, by proposing a secure error detection and correction based steganographic technique. This allows legitimate recipients to (1) detect and recover any noise loss from the hidden sensitive information without privacy disclosure, and (2) remedy the received protected readings by using the corrected version of the secret hidden data. It is evident from the experiments that our technique has robust recovery capabilities (i.e. Root Mean Square (RMS) < 0.01%, Bit Error Rate (BER) = 0 and PRD < 1%). To solve the issue of huge transmitted protected streams, two lossless compression algorithms for IoT readings are introduced to ensure the volume of protected readings at intermediate hops is reduced without revealing the hidden secrets. The first uses a Gaussian approximation function to represent IoT streams with a few parameters regardless of the roughness of the signal. The second reduces the randomness of the IoT streams into a smaller finite field by splitting, to enhance repetition and avoid floating-point rounding errors. Under the same conditions, both our techniques were superior to existing models mathematically (i.e. the entropy was halved) and empirically (i.e. the achieved ratio was 3.8:1 to 4.5:1). We were driven by the question 'Can the size of multiple incoming compressed protected streams be re-reduced on the cloud without decompression?' to overcome the issue of vast quantities of compressed and protected IoT streams on the cloud. A novel lossless size-reduction algorithm was introduced to prove the possibility of reducing the size of already compressed IoT protected readings.
This is successfully achieved by employing similarity measurements to classify the compressed streams into subsets in order to reduce the effect of uncorrelated compressed streams. The values of every subset were treated independently for further reduction. Both mathematical and empirical experiments proved the possibility of enhancing the entropy (almost reduced by 50%) and the resultant size reduction (up to 2:1).
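The stream-splitting idea can be illustrated with a toy byte-plane split (my own simplification, not the thesis's algorithm): 16-bit readings are separated into a nearly constant high-byte stream and a noisy low-byte stream, concentrating the repetition where a byte-oriented compressor can use it:

```python
# Split 16-bit readings into high/low byte streams and compare the
# compressed sizes against the interleaved original.
import random
import struct
import zlib

random.seed(1)
# A stable base value plus small random noise, as a sensor might emit.
readings = [40000 + random.randrange(64) for _ in range(20000)]
packed = struct.pack(f"<{len(readings)}H", *readings)

hi = bytes(r >> 8 for r in readings)   # constant for this value range
lo = bytes(r & 0xFF for r in readings)  # carries all of the noise

interleaved = len(zlib.compress(packed))
split = len(zlib.compress(hi)) + len(zlib.compress(lo))
print(interleaved, split)
```

The split streams compress better in total because the constant high bytes collapse to almost nothing, instead of being interleaved with noise on every other byte.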

    Analysis for Scalable Coding of Quality-Adjustable Sensor Data

    Ph.D. dissertation, Department of Electrical and Computer Engineering, Graduate School, Seoul National University, February 2014. 신현식. Machine-generated data such as sensor data now comprises a major portion of available information. This thesis addresses two important problems: storing massive sensor data collections and efficient sensing. We first propose quality-adjustable sensor data archiving, which compresses an entire collection of sensor data efficiently without compromising key features. Considering the data-aging aspect of sensor data, we make our archiving scheme capable of controlling data fidelity to exploit users' less frequent access to older data. This flexibility in quality adjustability leads to more efficient usage of storage space. In order to store data from various sensor types in a cost-effective way, we study the optimal storage configuration strategy using analytical models that capture the characteristics of our scheme. This strategy helps store sensor data blocks with the optimal configurations that maximize data fidelity of various sensor data under a given storage space. Next, we consider efficient sensing schemes and propose a quality-adjustable sensing scheme. We adopt compressive sensing (CS), which is well suited for resource-limited sensors because of its low computational complexity. We enhance the quality adjustability intrinsic to CS with quantization and especially temporal downsampling. Our sensing architecture provides more rate-distortion operating points than previous schemes, which enables sensors to adapt data quality more efficiently considering overall performance. Moreover, the proposed temporal downsampling improves coding efficiency, which is a drawback of CS. At the same time, the downsampling further reduces the computational complexity of sensing devices, along with a sparse random matrix.
As a result, our quality-adjustable sensing can deliver gains to a wide variety of resource-constrained sensing techniques.
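The sensing side of such a scheme can be sketched in a few lines (a toy with parameters of my own choosing; the l1-based signal recovery is omitted): a sparse {-1, 0, +1} random matrix keeps each measurement down to a handful of additions, and temporal downsampling simply drops frames before measuring:

```python
# Compressive sensing, acquisition side only: y = Phi * x with a
# sparse random Phi, plus temporal downsampling of incoming frames.
import random

random.seed(42)
N, M = 256, 64       # frame length, measurement count (M << N)
DENSITY = 1 / 16     # fraction of nonzero entries in Phi

# Mostly zeros with a few +/-1 entries, so each measurement costs
# roughly N * DENSITY additions/subtractions on the sensor.
phi = [[random.choice((-1, 1)) if random.random() < DENSITY else 0
        for _ in range(N)] for _ in range(M)]

def sense(frame):
    """Compute y = Phi * x using additions and subtractions only."""
    return [sum(p * x for p, x in zip(row, frame)) for row in phi]

# Sparse test frames (single impulses); keep every other frame.
frames = [[float(i == k) for i in range(N)] for k in range(4)]
kept = frames[::2]   # temporal downsampling: drop alternate frames
measurements = [sense(f) for f in kept]
print(len(measurements), "frames x", M, "measurements, from",
      len(frames), "frames x", N, "samples")
```

Both knobs trade fidelity for rate: fewer frames and fewer measurements per frame both shrink the output, which is the kind of rate-distortion operating point the thesis analyzes.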

    A framework for mobile SOA using compression

    The widely accepted standards of Service-Oriented Architecture (SOA) have changed the way many organisations conduct their everyday business. The significant popularity of mobile devices has driven a rapid increase in the rate of mobile technology enhancement, and these devices have become widely used for communication as well as for conducting everyday tasks. An increasing requirement in many businesses is for staff not to be tied down to the office. Consequently, mobile devices play an important role in achieving the mobility and information access that people desire. Due to the popularity and increasing use of SOA and mobile devices, Mobile Service-Oriented Architecture (Mobile SOA) has become a new industry catch-phrase. Many challenges, however, exist within the Mobile SOA environment. These include limitations of mobile devices, such as a reduced screen size, lack of processing power, insufficient memory, limited battery life, poor storage capacity, unreliable network connections, limited available bandwidth and high transfer costs. This research aimed to provide an elegant solution to the issues of mobile devices that hinder the performance of Mobile SOA. The main objective was to improve the effectiveness and efficiency of Mobile SOA. In order to achieve this goal, a framework was proposed which supports intelligent compression of files used within a Web Service. The proposed framework provides a set of guidelines that facilitate the quick development of a system. A proof-of-concept prototype was developed based on these guidelines and the framework design principles. The prototype provided practical evidence of the effectiveness of implementing a system based on the proposed framework. An analytical evaluation was conducted to determine the effectiveness of the prototype within the Mobile SOA environment, and a performance evaluation was conducted to determine the efficiency it provides.
Additionally, the performance evaluation highlighted the decrease in file transfer time, as well as the significant reduction in transfer costs. The analytical and performance evaluations demonstrated that the prototype optimises the effectiveness and efficiency of Mobile SOA. The framework could thus be used to facilitate efficient file transfer between a Server and a (Mobile) Client.