
    Operating System Support for High-Performance Solid State Drives


    Improving Reliability and Performance of NAND Flash Based Storage System

    The high seek and rotational overhead of the magnetic hard disk drive (HDD) motivates the development of storage devices that offer good random-access performance. As an alternative technology, NAND flash memory offers low power consumption, microsecond-order access latency and good scalability. Thanks to these advantages, NAND flash based solid state disks (SSDs) have found many promising applications in enterprise servers. With the multi-level cell (MLC) technique, the per-bit fabrication cost is reduced, and the low production cost extends the application of NAND flash memory to consumer electronics. Despite these advantages, limited memory endurance, long data protection latency and write amplification remain the major challenges in the design of NAND flash storage systems.
    The limited memory endurance and long data protection latency both derive from memory bit errors. A high bit error rate (BER) severely impairs data integrity and reduces memory endurance, and the limited endurance is a major obstacle to applying NAND flash memory in applications with high reliability requirements. To protect data integrity, hard-decision error correction codes (ECC) such as Bose-Chaudhuri-Hocquenghem (BCH) codes are employed. However, when BCH ECC is used to extend system lifetime, its hardware cost becomes prohibitive as the BER increases. To extend system lifespan without high hardware cost, we propose a data pattern aware (DPA) error prevention system design. DPA reduces the BER by minimizing the occurrence of data patterns vulnerable to high BER, using simple linear feedback shift register circuits. Experimental results show that DPA can increase the system lifetime by up to 4× with marginal hardware cost.
    With the technology node scaling down to 2X nm, the BER increases to as much as 0.01. Hard-decision ECCs and DPA can then no longer guarantee data integrity without either prohibitively high hardware cost or high storage overhead. Soft-decision ECCs, such as low-density parity check (LDPC) codes, have been introduced to provide more powerful error correction capability. However, LDPC codes demand extra memory sensing operations, directly leading to long read latency. To reduce the LDPC-induced read latency without an adverse impact on system reliability, we propose the FlexLevel NAND flash storage system design. FlexLevel reduces the BER by broadening the noise margin via threshold voltage (Vth) level reduction. Under relatively low BER, no extra sensing level is required and read performance can therefore be improved. To balance the capacity loss induced by Vth level reduction against the read speedup, FlexLevel identifies the data with high LDPC overhead and performs Vth reduction only on these data. Experimental results show that, compared with the best existing works, the proposed design achieves up to 11% read speedup with negligible capacity loss.
    Write amplification is a major cause of performance and endurance degradation in NAND flash based storage systems. In the object-based NAND flash device (ONFD), write amplification partially results from onode partial updates and cascading updates. An onode partial update overwrites only part of a NAND flash page and incurs unnecessary data migration of the un-updated data. A cascading update is an update to object metadata triggered, in a cascading manner, by object data updates or migration. Even though only a few bytes of the object metadata are updated, one or more pages have to be re-written, significantly degrading write performance.
    To minimize the write operations incurred by onode partial updates and cascading updates, we propose a Data Migration Minimizing (DMM) device design. The DMM device incorporates 1) a multi-level garbage collection technique to minimize the unnecessary data migration caused by onode partial updates and 2) a virtual B+ tree and a diff cache to reduce the write operations incurred by cascading updates. The experimental results demonstrate that the DMM device offers up to 20% write reduction compared with the best state-of-the-art works.
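    The DPA idea of suppressing data patterns vulnerable to high BER is closely related to LFSR-based data scrambling, where page data is XORed with a pseudorandom keystream before programming. The Python sketch below illustrates that general idea only; the LFSR polynomial, seed handling and page layout are illustrative assumptions, not the parameters of the DPA design described above.

        # Illustrative sketch of LFSR-based page scrambling (not the actual DPA circuit).
        def lfsr_stream(seed: int, nbytes: int, taps: int = 0xB400) -> bytes:
            """Generate nbytes of keystream from a 16-bit Galois LFSR (assumed polynomial)."""
            state = (seed & 0xFFFF) or 0xACE1      # avoid the all-zero lock-up state
            out = bytearray()
            for _ in range(nbytes):
                byte = 0
                for _ in range(8):
                    lsb = state & 1
                    byte = (byte << 1) | lsb
                    state >>= 1
                    if lsb:
                        state ^= taps              # apply the feedback taps
                out.append(byte)
            return bytes(out)

        def scramble_page(page: bytes, page_seed: int) -> bytes:
            """XOR a page with the keystream; applying the function twice restores the data."""
            key = lfsr_stream(page_seed, len(page))
            return bytes(d ^ k for d, k in zip(page, key))

        # Round trip on a worst-case repetitive pattern: scrambling is its own inverse.
        raw = bytes([0x00] * 32 + [0xFF] * 32)
        stored = scramble_page(raw, page_seed=0x1D2F)
        assert scramble_page(stored, page_seed=0x1D2F) == raw

    In hardware, such a keystream needs only a few flip-flops and XOR gates, which is consistent with the marginal hardware cost reported above.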

    Modelação de consumo de energia em Linux [Energy consumption modelling in Linux]

    Master's dissertation (Mestrado Integrado) in Telecommunications and Informatics Engineering. In the past few years, the importance of energy efficiency in computer systems has grown rapidly, from mobile devices to high-performance computing. The growth of the gadget market and its increasing dependence on Cloud computing services are some of the reasons for this fast progress. Reducing the power consumption of a system while increasing its productivity is, for multiple reasons, a concern for both customers and manufacturers: in-depth research on this topic can contribute, for example, to increasing the autonomy of mobile terminals and to reducing the energy costs of data centers and of individual customers. Monitoring the power consumption of a system therefore plays an important role in improving its energy efficiency. Some of the most challenging aspects of this research field are accurate power consumption measurement, estimation and categorization at different levels of the system, from hardware subsystems to virtual machines and processes. This level of granularity in the energy analysis allows more incisive and conclusive reports on how power consumption is distributed across the system. Motivated by these factors, this dissertation develops a simple model that estimates the power consumption of the system as a whole, categorized by subsystem and, in the case of secondary storage, also categorized by process. The document presents a study of different measurement methodologies, as well as of the possible modelling approaches, on test systems with different types of secondary storage and a recent-generation Intel processor. Initially, several aspects of system energy were investigated, including power consumption estimation models, consumption measurement methods and the accuracy of those methods. To make large-scale adoption easier, the model relies only on resources already available in the system under study, namely energy management and monitoring interfaces such as RAPL and ACPI. The most important energy aspects of a system were analysed, such as the distribution of static and dynamic power consumption across subsystems, and the efficiency and performance of those subsystems were evaluated under a wide range of activities, making the model more versatile and, in general, more accurate. The model was validated using tools provided by the system and through physical measurements. The results appear satisfactory, with maximum error rates of 5% for total system consumption and 10% for secondary storage consumption.
    The model can be adapted to other systems, but doing so requires running a calibration tool, also developed in this project, that repeats the steps performed for the configuration used here. This is essential for the secondary storage subsystem, for which no existing tool or API can be used to estimate its energy. The calibration first exercises the secondary storage subsystem under different usage patterns with benchmarking tools and collects statistics on the cost of its operations; from these data, an energy weight is assigned to each operation executed on the subsystem. The model is then built and calibrated using power interfaces such as RAPL and ACPI. In the end, the model presents the energy bill of each process that uses the secondary storage subsystem, and it also estimates the total power consumption of the system and its breakdown by the major subsystems.
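    As a concrete illustration of the two building blocks described above, the sketch below reads the cumulative energy counters exposed by the Linux RAPL powercap interface and attributes secondary-storage energy to a process from its I/O counters in /proc. The per-byte energy weights stand in for the values that the calibration step would produce and are purely hypothetical, as is the simple linear cost model.

        # Minimal sketch: RAPL energy sampling plus per-process storage-energy attribution.
        # Reading the RAPL files may require elevated privileges; counter wrap-around is ignored.
        import glob, os, time

        RAPL_GLOB = "/sys/class/powercap/intel-rapl:*/energy_uj"
        READ_NJ_PER_BYTE = 1.2      # hypothetical calibration output (nJ per byte read)
        WRITE_NJ_PER_BYTE = 3.5     # hypothetical calibration output (nJ per byte written)

        def rapl_energy_uj() -> int:
            """Sum the cumulative energy counters (microjoules) of all RAPL domains."""
            return sum(int(open(path).read()) for path in glob.glob(RAPL_GLOB))

        def proc_io_bytes(pid: int) -> tuple:
            """Return cumulative (read_bytes, write_bytes) of a process from /proc/<pid>/io."""
            stats = {}
            with open(f"/proc/{pid}/io") as f:
                for line in f:
                    key, value = line.split(":")
                    stats[key] = int(value)
            return stats["read_bytes"], stats["write_bytes"]

        def sample(pid: int, interval: float = 1.0) -> None:
            """Print package energy and the estimated storage energy charged to one process."""
            e0, (r0, w0) = rapl_energy_uj(), proc_io_bytes(pid)
            time.sleep(interval)
            e1, (r1, w1) = rapl_energy_uj(), proc_io_bytes(pid)
            storage_uj = ((r1 - r0) * READ_NJ_PER_BYTE + (w1 - w0) * WRITE_NJ_PER_BYTE) / 1000
            print(f"RAPL energy: {e1 - e0} uJ, storage energy for pid {pid}: {storage_uj:.1f} uJ")

        if __name__ == "__main__":
            sample(os.getpid())

    A complete model along the lines sketched in the abstract would combine such RAPL readings with ACPI data and the calibrated storage weights to cover the remaining subsystems.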

    Data-intensive Systems on Modern Hardware : Leveraging Near-Data Processing to Counter the Growth of Data

    Over the last decades, our society has shifted toward using information technology in almost every daily routine, entailing an enormous growth of the data collected day by day by Web, IoT, and AI applications. At the same time, magneto-mechanical HDDs are being replaced by semiconductor storage such as SSDs, equipped with modern Non-Volatile Memories, like Flash, which yield significantly lower access latencies and higher levels of parallelism. Likewise, the execution speed of processing units has increased considerably, as today's server architectures comprise up to several hundred independently working CPU cores along with a variety of specialized computing co-processors such as GPUs or FPGAs. However, the burden of moving the continuously growing data to the best-fitting processing unit is inherent to today's computer architecture, which is based on the data-to-code paradigm. In the light of Amdahl's Law, this leads to the conclusion that even with today's powerful processing units, the speedup of systems is limited, since the fraction of parallel work is largely I/O-bound. Throughout this cumulative dissertation, we therefore investigate the paradigm shift toward code-to-data, formally known as Near-Data Processing (NDP), which relieves the contention on the I/O bus by offloading processing to intelligent computational storage devices, where the data is originally located.
    Firstly, we identify Native Storage Management as the essential foundation for NDP, due to its direct control of physical storage management within the database. Building on this, the interface is extended to propagate address mapping information and to invoke NDP functionality on the storage device. As the former can become very large, we introduce Physical Page Pointers as one novel NDP abstraction for self-contained, immutable database objects. Secondly, the on-device navigation and interpretation of data are elaborated. To this end, we introduce cross-layer Parsers and Accessors as another NDP abstraction that can be executed on the heterogeneous processing capabilities of modern computational storage devices. The compute placement and resource configuration per NDP request are thereby identified as major performance criteria. Our experimental evaluation shows improvements in execution time of 1.4x to 2.7x compared to traditional systems. Moreover, we propose a framework for the automatic generation of Parsers and Accessors on FPGAs to ease their application in NDP. Thirdly, we investigate the interplay of NDP and modern workload characteristics such as HTAP. We present different offloading models and focus on intervention-free execution. By propagating the Shared State, containing the latest modifications of the database, to the computational storage device, the device is able to process data with transactional guarantees. We thus extend the design space of HTAP with NDP by providing a solution that optimizes for performance isolation, data freshness, and the reduction of data transfers. In contrast to traditional systems, we observe no significant drop in performance when an OLAP query is invoked, but instead a steady throughput that is 30% higher. Lastly, in-situ result-set management and consumption, as well as NDP pipelines, are proposed to achieve flexibility in processing data on heterogeneous hardware. As these produce final and intermediate results, we investigate their management and find that on-device materialization comes at a low cost while enabling novel consumption modes and reuse semantics. By reusing once-materialized results multiple times, we achieve significant performance improvements of up to 400x.
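    The code-to-data idea above can be illustrated with a toy host/device split: instead of shipping every page across the I/O bus and filtering on the host, the host ships a small accessor (parse plus predicate) to the storage device, which returns only the qualifying records. The class and function names below are illustrative assumptions, not the interfaces defined in the dissertation.

        # Toy contrast between the data-to-code and code-to-data (NDP) paths.
        from typing import Callable, Iterable, List

        Record = dict
        Accessor = Callable[[bytes], Iterable[Record]]   # parses one raw page into records

        class ComputationalStorageDevice:
            """Hypothetical device that holds raw pages and can run an accessor next to the data."""

            def __init__(self, pages: List[bytes]):
                self.pages = pages

            def read_all(self) -> List[bytes]:
                # Traditional path: every page crosses the I/O bus to the host.
                return list(self.pages)

            def run_ndp(self, accessor: Accessor, predicate: Callable[[Record], bool]) -> List[Record]:
                # Near-data path: parse and filter on the device, return only the results.
                return [rec for page in self.pages for rec in accessor(page) if predicate(rec)]

        def csv_accessor(page: bytes) -> Iterable[Record]:
            """Illustrative accessor: interpret a page as 'id,value' CSV lines."""
            for line in page.decode().splitlines():
                ident, value = line.split(",")
                yield {"id": int(ident), "value": int(value)}

        device = ComputationalStorageDevice([b"1,10\n2,99", b"3,7\n4,42"])
        hot = device.run_ndp(csv_accessor, lambda r: r["value"] > 20)
        print(hot)   # only two records travel back: [{'id': 2, 'value': 99}, {'id': 4, 'value': 42}]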

    Evaluation and Identification of Authentic Smartphone Data

    Mobile technology continues to evolve in the 21st century, providing end-users with mobile devices that support improved capabilities and advanced functionality. This ever-improving technology has allowed smartphone platforms, such as Google Android and Apple iOS, to become prominent and popular among end-users. The reliance on and ubiquitous use of smartphones render these devices rich sources of digital data. This data becomes increasingly important when smartphones form part of regulatory matters, security incidents, or criminal or civil cases. Digital data is, however, susceptible to change and can be altered intentionally or accidentally by end-users or installed applications. It is therefore essential to evaluate the authenticity of data residing on smartphones before submitting the data as potential digital evidence. This thesis focuses on digital data found on smartphones that has been created by smartphone applications and on the techniques that can be used to evaluate and identify authentic data. Identification of authentic smartphone data necessitates a better understanding of the smartphone, the related smartphone applications and the environment in which the smartphone operates. Derived from the conducted research and the gathered knowledge are the requirements for authentic smartphone data. These requirements are captured in the smartphone data evaluation model to assist digital forensic professionals with the assessment of smartphone data. The smartphone data evaluation model, however, only stipulates how to evaluate the smartphone data and not what the outcome of the evaluation is. Therefore, a classification model is constructed using the identified requirements and the smartphone data evaluation model. The classification model presents a formal classification of the evaluated smartphone data as an ordered pair of values: the first value represents the grade of the authenticity of the data and the second value describes the completeness of the evaluation. Collectively, these models form the basis for the developed SADAC tool, a proof-of-concept digital forensic tool that assists with the evaluation and classification of smartphone data. To conclude, the evaluation and classification models are assessed to determine their effectiveness and efficiency in evaluating and identifying authentic smartphone data. The assessment involved two attack scenarios that manipulate smartphone data and the subsequent evaluation of the effects of these scenarios using the SADAC tool. The results produced by evaluating the smartphone data associated with each attack scenario confirmed that classifying the authenticity of smartphone data is feasible. Digital forensic professionals can use the provided models and the developed SADAC tool to evaluate and identify authentic smartphone data. The outcome of this thesis provides a scientific and strategic approach for evaluating and identifying authentic smartphone data, offering needed assistance to digital forensic professionals. This research also adds to the field of digital forensics by providing insights into smartphone forensics, the architectural components of smartphone applications and the nature of authentic smartphone data. Thesis (PhD)--University of Pretoria, 2019.
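    The ordered-pair output of the classification model can be sketched as follows: each requirement for authentic smartphone data is evaluated as passed, failed or not evaluable, and the result is a pair of an authenticity grade and the completeness of the evaluation. The requirement names and grading thresholds below are hypothetical placeholders, not the requirements or scales defined in the thesis.

        # Hypothetical sketch of the (authenticity grade, completeness) classification pair.
        from enum import Enum

        class Outcome(Enum):
            PASS = "pass"
            FAIL = "fail"
            NOT_EVALUATED = "not_evaluated"

        def classify(evaluations: dict) -> tuple:
            """Return (authenticity grade, completeness of the evaluation) for a set of requirements."""
            evaluated = {k: v for k, v in evaluations.items() if v is not Outcome.NOT_EVALUATED}
            completeness = len(evaluated) / len(evaluations) if evaluations else 0.0
            if not evaluated:
                return ("unknown", completeness)
            passed = sum(1 for v in evaluated.values() if v is Outcome.PASS)
            ratio = passed / len(evaluated)
            grade = ("authentic" if ratio == 1.0
                     else "partially authentic" if ratio >= 0.5
                     else "not authentic")
            return (grade, completeness)

        # Example: one requirement could not be checked and one failed.
        print(classify({
            "consistent timestamps": Outcome.PASS,
            "intact application database": Outcome.FAIL,
            "valid file-system metadata": Outcome.NOT_EVALUATED,
        }))   # ('partially authentic', 0.666...)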

    Resurrection: Rethinking Magnetic Tapes For Cost Efficient Data Preservation

    With the advent of Big Data technologies, that is, the capacity to store and efficiently process large sets of data, doors have opened to business intelligence opportunities that were previously unknown. Each phase in the processing of this data requires specialized infrastructure. One such phase, the preservation and archiving of data, has proven its usefulness time and again. Data archives are processed using novel data mining methods to elicit vital information gathered over long periods of time and to efficiently audit the growth of a business or an organization. Data preservation is also an important aspect of business processes, helping to avoid the loss of important information due to system failures, human errors and natural calamities. This thesis investigates the need for, discusses the possibilities of, and presents a novel, highly cost-effective, unified, long-term storage solution for data. Some of the common processes followed in large-scale data warehousing systems are analyzed for overlooked shortcomings, and an economically feasible solution is conceived for them. The gap between the general needs of efficient long-term storage and common, current functionality is analyzed. An attempt to bridge this gap is made through the use of a hybrid, hierarchical-media-based, performance-enhancing middleware and a monolithic-namespace filesystem in a new storage architecture, Tape Cloud. Our studies interpret the effects of using heterogeneous storage media in terms of operational behavior, average latency of data transactions and power consumption. The results show the advantages of the new storage system by demonstrating the differences in operating costs, personnel costs and total cost of ownership from varied perspectives in a business model.
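    A rough feel for why tape remains attractive in such a hierarchical archive can be given by a simple total-cost-of-ownership comparison of acquisition and energy costs per terabyte over a retention period. All prices, power figures and the retention period below are hypothetical placeholders, not figures from the thesis.

        # Hypothetical per-TB cost comparison across a hierarchical media mix.
        RETENTION_YEARS = 10
        ENERGY_PRICE_PER_KWH = 0.15          # assumed electricity price in $/kWh

        MEDIA = {
            #          purchase $/TB        average W per TB kept available
            "SSD":  {"capex_per_tb": 80.0, "watts_per_tb": 2.00},
            "HDD":  {"capex_per_tb": 20.0, "watts_per_tb": 1.00},
            "tape": {"capex_per_tb": 6.0,  "watts_per_tb": 0.05},   # cartridges mostly sit offline
        }

        def tco_per_tb(medium: str) -> float:
            """Acquisition plus energy cost of keeping one TB for the retention period."""
            m = MEDIA[medium]
            energy_kwh = m["watts_per_tb"] * 24 * 365 * RETENTION_YEARS / 1000
            return m["capex_per_tb"] + energy_kwh * ENERGY_PRICE_PER_KWH

        for medium in MEDIA:
            print(f"{medium}: ${tco_per_tb(medium):.2f} per TB over {RETENTION_YEARS} years")

    Operating and personnel costs, which the thesis evaluates separately, would be added on top of such a per-medium estimate.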