14 research outputs found
Study On Endurance Of Flash Memory Ssds
Flash memory promises to revolutionize storage systems because of its massive performance gains, ruggedness, large decrease in power usage and physical space requirements, but it is not a direct replacement for magnetic hard disks. Flash memory possesses fundamentally different characteristics and in order to fully utilize the positive aspects of flash memory, we must engineer around its unique limitations. The primary limitations are lack of in-place updates, the asymmetry between the sizes of the write and erase operations, and the limited endurance of flash memory cells. This leads to the need for efficient methods for block cleaning, combating write amplification and performing wear leveling. These are fundamental attributes of flash memory and will always need to be understood and efficiently managed to produce an efficient and high performance storage system.
Our goal in this work is to provide analysis and algorithms for efficiently managing data storage for endurance in flash memory. We present update codes, a class of floating codes, which encodes data updates as flash memory cell increments that results in reduced block erases and longer lifespan of flash memory, and provides a new algorithm for constructing optimal floating codes. We also analyze the theoretically possible limits of write amplification reduction and minimization by using offline workloads. We give an estimation of the minimal write amplification by a workload decomposition algorithm and find that write amplification can be pushed to zero with relatively low over-provisioning. Additionally, we give simple, efficient and practical algorithms that are effective in reducing write amplification and performing wear leveling. Finally, we present a quantitative model of wear levels in flash memory by constructing a difference equation that gives erase counts of a block with workload, wear leveling strategy and SSD configuration as parameters
Letter from the Special Issue Editor
Editorial work for DEBULL on a special issue on data management on Storage Class Memory (SCM) technologies
A framework for multidimensional indexes on distributed and highly-available data stores
Spatial Big Data is considered an essential trend in future scientific and business applications. Indeed, research instruments, medical devices, and social networks generate hundreds of peta bytes of spatial data per year. However, as many authors have pointed out, the lack of specialized frameworks dealing with such kind of data is limiting possible applications and probably precluding many scientific breakthroughs.
In this thesis, we describe three HPC scientific applications, ranging from molecular dynamics, neuroscience analysis, and physics simulations, where we experience first hand the limits of the existing technologies. Thanks to our experience, we define the desirable missing functionalities, and we focus on two features that when combined significantly improve the way scientific data is analyzed.
On one side, scientific simulations generate complex datasets where multiple correlated characteristics describe each item. For instance, a particle might have a space position (x,y,z) at a given time (t). If we want to find all elements within the same area and period, we either have to scan the whole dataset, or we must organize the data so that all items in the same space and time are stored together. The second approach is called Multidimensional Indexing (MI), and it uses different techniques to cluster and to organize similar data together.
On the other side, approximate analytics has been often indicated as a smart and flexible way to explore large datasets in a short period. Approximate analytics includes a broad family of algorithms which aims to speed up analytical workloads by relaxing the precision of the results within a specific interval of confidence. For instance, if we want to know the average age in a group with 1-year precision, we can consider just a random fraction of all the people, thus reducing the amount of calculation. But if we also want less I/O operations, we need efficient data sampling, which means organizing data in a way that we do not need to scan the whole data set to generate a random sample of it.
According to our analysis, combining Multidimensional Indexing with efficient data Sampling (MIS) is a vital missing feature not available in the current distributed data management solutions.
This thesis aims to solve such a shortcoming and it provides novel scalable solutions. At first, we describe the existing data management alternatives; then we motivate our preference for NoSQL key-value databases. Secondly, we propose an analytical model to study the influence of data models on the scalability and performance of this kind of distributed database. Thirdly, we use the analytical model to design two novel multidimensional indexes with efficient data sampling: the D8tree and the AOTree. Our first solution, the D8tree, improves state of the art for approximate spatial queries on static and mostly read dataset. Later, we enhanced the data ingestion capability or our approach by introducing the AOTree, an algorithm that enables the query performance of the D8tree even for HPC write-intensive applications. We compared our solution with PostgreSQL and plain storage, and we demonstrate that our proposal has better performance and scalability.
Finally, we describe Qbeast, the novel distributed system that implements the D8tree and the AOTree using NoSQL technologies, and we illustrate how Qbeast simplifies the workflow of scientists in various HPC applications providing a scalable and integrated solution for data analysis and management.La gestión de BigData con información espacial está considerada como una tendencia esencial en el futuro de las aplicaciones cientÃficas y de negocio. De hecho, se generan cientos de petabytes de datos espaciales por año mediante instrumentos de investigación, dispositivos médicos y redes sociales. Sin embargo, tal y como muchos autores han señalado, la falta de entornos especializados en manejar este tipo de datos está limitando sus posibles aplicaciones y está impidiendo muchos avances cientÃficos. En esta tesis, describimos 3 aplicaciones cientÃficas HPC, que cubren los ámbitos de dinámica molecular, análisis neurocientÃfico y simulaciones fÃsicas, donde hemos experimentado en primera mano las limitaciones de las tecnologÃas existentes. Gracias a nuestras experiencias, hemos podido definir qué funcionalidades serÃan deseables y no existen, y nos hemos centrado en dos caracterÃsticas que, al combinarlas, mejoran significativamente la manera en la que se analizan los datos cientÃficos. Por un lado, las simulaciones cientÃficas generan conjuntos de datos complejos, en los que cada elemento es descrito por múltiples caracterÃsticas correlacionadas. Por ejemplo, una partÃcula puede tener una posición espacial (x, y, z) en un momento dado (t). Si queremos encontrar todos los elementos dentro de la misma área y periodo, o bien recorremos y analizamos todo el conjunto de datos, o bien organizamos los datos de manera que se almacenen juntos todos los elementos que comparten área en un momento dado. Esta segunda opción se conoce como Indexación Multidimensional (IM) y usa diferentes técnicas para agrupar y organizar datos similares. Por otro lado, se suele señalar que las analÃticas aproximadas son una manera inteligente y flexible de explorar grandes conjuntos de datos en poco tiempo. Este tipo de analÃticas incluyen una amplia familia de algoritmos que acelera el tiempo de procesado, relajando la precisión de los resultados dentro de un determinado intervalo de confianza. Por ejemplo, si queremos saber la edad media de un grupo con precisión de un año, podemos considerar sólo un subconjunto aleatorio de todas las personas, reduciendo asà la cantidad de cálculo. Pero si además queremos menos operaciones de entrada/salida, necesitamos un muestreo eficiente de datos, que implica organizar los datos de manera que no necesitemos recorrerlos todos para generar una muestra aleatoria. De acuerdo con nuestros análisis, la combinación de Indexación Multidimensional con Muestreo eficiente de datos (IMM) es una caracterÃstica vital que no está disponible en las soluciones actuales de gestión distribuida de datos. Esta tesis pretende resolver esta limitación y proporciona unas soluciones novedosas que son escalables. En primer lugar, describimos las alternativas de gestión de datos que existen y motivamos nuestra preferencia por las bases de datos NoSQL basadas en clave-valor. En segundo lugar, proponemos un modelo analÃtico para estudiar la influencia que tienen los modelos de datos sobre la escalabilidad y el rendimiento de este tipo de bases de datos distribuidas. En tercer lugar, usamos el modelo analÃtico para diseñar dos novedosos algoritmos IMM: el D8tree y el AOTree. Nuestra primera solución, el D8tree, mejora el estado del arte actual para consultas espaciales aproximadas, cuando el conjunto de datos es estático y mayoritariamente de lectura. Después, mejoramos la capacidad de ingestión introduciendo el AOTree, un algoritmo que conserva el rendimiento del D8tree incluso para aplicaciones HPC intensivas en escritura. Hemos comparado nuestra solución con PostgreSQL y almacenamiento plano demostrando que nuestra propuesta mejora tanto el rendimiento como la escalabilidad. Finalmente, describimos Qbeast, el sistema que implementa los algoritmos D8tree y AOTree, e ilustramos cómo Qbeast simplifica el flujo de trabajo de los cientÃficos ofreciendo una solución escalable e integraPostprint (published version
Exploiting intrinsic flash properties to enhance modern storage systems
The longstanding goals of storage system design have been to provide simple abstractions for applications to efficiently access data while ensuring the data durability and security on a hardware device. The traditional storage system, which was designed for slow hard disk drive with little parallelism, does not fit for the new storage technologies such as the faster flash memory with high internal parallelism. The gap between the storage system software and flash device causes both resource inefficiency and sub-optimal performance. This dissertation focuses on the rethinking of the storage system design for flash memory with a holistic approach from the system level to the device level and revisits several critical aspects of the storage system design including the storage performance, performance isolation, energy-efficiency, and data security. The traditional storage system lacks full performance isolation between applications sharing the device because it does not make the software aware of the underlying flash properties and constraints. This dissertation proposes FlashBlox, a storage virtualization system that utilizes flash parallelism to provide hardware isolation between applications by assigning them on dedicated chips. FlashBlox reduces the tail latency of storage operations dramatically compared with the existing software-based isolation techniques while achieving uniform lifetime for the flash device. As the underlying flash device latency is reduced significantly compared to the conventional hard disk drive, the storage software overhead has become the major bottleneck. This dissertation presents FlashMap, a holistic flash-based storage stack that combines memory, storage and device-level indirections into a unified layer. By combining these layers, FlashMap reduces critical-path latency for accessing data in the flash device and improves DRAM caching efficiency significantly for flash management. The traditional storage software incurs energy-intensive storage operations due to the need for maintaining data durability and security for personal data, which has become a significant challenge for resource-constrained devices such as mobiles and wearables. This dissertation proposes WearDrive, a fast and energy-efficient storage system for wearables. WearDrive treats the battery-backed DRAM as non-volatile memory to store personal data and trades the connected phone’s battery for the wearable’s by performing large and energy-intensive tasks on the phone while performing small and energy-efficient tasks locally using battery-backed DRAM. WearDrive improves wearable’s battery life significantly with negligible impact to the phone’s battery life. The storage software which has been developed for decades is still vulnerable to malware attacks. For example, the encryption ransomware which is a malicious software that stealthily encrypts user files and demands a ransom to provide access to these files. Prior solutions such as ransomware detection and data backups have been proposed to defend against encryption ransomware. Unfortunately, by the time the ransomware is detected, some files already undergo encryption and the user is still required to pay a ransom to access those files. Furthermore, ransomware variants can obtain kernel privilege to terminate
or destroy these software-based defense systems. This dissertation presents FlashGuard, a ransomware-tolerant SSD which has a firmware-level recovery system that allows effective data recovery from encryption ransomware. FlashGuard leverages the intrinsic flash properties to defend against the encryption ransomware and adds minimal overhead to regular storage operations.Ph.D
Recommended from our members
Accurate modeling of core and memory locality for proxy generation targeting emerging applications and architectures
Designing optimal computer systems for improved performance and energy efficiency requires architects and designers to have a deep understanding of the end-user workloads. However, many end-users (e.g., large corporations, banks, defense organizations, etc.) are apprehensive to share their applications with designers due to the confidential nature of software code and data. In addition, emerging applications pose significant challenges to early design space exploration due to their long-running nature and the highly complex nature of their software stack that cannot be supported on many early performance models.
The above challenges can be overcome by using a proxy benchmark. A miniaturized proxy benchmark can be used as a substitute of the original workload to perform early computer performance evaluation. The process of generating a proxy benchmark consists of extracting a set of key statistics to summarize the behavior of end-user applications through profiling and using the collected statistics to synthesize a representative proxy benchmark. Using such proxy benchmarks can help designers to understand the behavior of end-user’s workloads in a reasonable time without the users having to disclose sensitive information about their workloads.
Prior proxy benchmarking schemes leverage micro-architecture independent metrics, derived from detailed simulation tools, to generate proxy benchmarks. However, many emerging workloads do not work reliably with many profiling or simulation tools, in which case it becomes impossible to apply prior proxy generation techniques to generate proxy benchmarks for such complex applications. Furthermore, these techniques model instruction pipeline-level locality in great detail, but abstract out memory locality modeling using simple stride-based models. This results in poor cloning accuracy especially for emerging applications, which have larger memory footprints and complex access patterns. A few detailed cache and memory locality modeling techniques have also been proposed in literature. However, these techniques either model limited locality metrics and suffer from poor cloning accuracy or are fairly accurate, but at the expense of significant metadata overhead. Finally, none of the prior proxy benchmarking techniques model both core and memory locality with high accuracy. As a result, they are not useful for studying system-level performance behavior. Keeping the above key limitations and shortcomings of prior work in mind, this dissertation presents several techniques that expand the frontiers of workload proxy benchmarking, thereby enabling computer designers to gain a better and faster understanding of end-user application behavior without compromising the privileged nature of software or data.
This dissertation first presents a core-level proxy benchmark generation methodology that leverages performance metrics derived from hardware performance counter measurements to create miniature proxy benchmarks targeting emerging big-data applications. The presented performance counter based characterization and associated extrapolation into generic parameters for proxy generation enables faster analysis (runs almost at native hardware speeds, unlike prior workload cloning proposals) and proxy generation for emerging applications that do not work with simulators or profiling tools. The generated proxy benchmarks are representative of the performance of the real-world big-data applications, including operating system and run-time effects, and yet converge to results quickly without needing any complex software stack support.
Next, to improve upon the accuracy and efficiency of prior memory proxy benchmarking techniques, this dissertation presents a novel memory locality modeling technique that leverages localized pattern detection to create miniature memory proxy benchmarks. The presented technique models memory reference locality by decomposing an application’s memory accesses into a set of independent streams (localized by using address region based localization property), tracking fine-grained patterns within the localized streams and, finally, chaining or interleaving accesses from different localized memory streams to create an ordered proxy memory access sequence. This dissertation further extends the workload cloning approach to Graphics Processing Units (GPUs) and presents a novel proxy generation methodology to model the inherent memory access locality of GPU applications, while also accounting for the GPU’s parallel execution model. The generated memory proxy benchmarks help to enable fast and efficient design space exploration of futuristic memory hierarchies.
Finally, this dissertation presents a novel technique to integrate accurate core and memory locality models to create system-level proxy benchmarks targeting emerging applications. This is a new capability that can facilitate efficient overall system (core, cache and memory subsystem) design-space exploration. This dissertation further presents a novel methodology that exploits the synthetic benchmark generation framework to create hypothetical workloads with performance behavior that does not currently exist. Such proxies can be generated to cover anticipated code trends and can represent futuristic workloads before the workloads even exist.Electrical and Computer Engineerin
A framework for multidimensional indexes on distributed and highly-available data stores
Spatial Big Data is considered an essential trend in future scientific and business applications. Indeed, research instruments, medical devices, and social networks generate hundreds of peta bytes of spatial data per year. However, as many authors have pointed out, the lack of specialized frameworks dealing with such kind of data is limiting possible applications and probably precluding many scientific breakthroughs.
In this thesis, we describe three HPC scientific applications, ranging from molecular dynamics, neuroscience analysis, and physics simulations, where we experience first hand the limits of the existing technologies. Thanks to our experience, we define the desirable missing functionalities, and we focus on two features that when combined significantly improve the way scientific data is analyzed.
On one side, scientific simulations generate complex datasets where multiple correlated characteristics describe each item. For instance, a particle might have a space position (x,y,z) at a given time (t). If we want to find all elements within the same area and period, we either have to scan the whole dataset, or we must organize the data so that all items in the same space and time are stored together. The second approach is called Multidimensional Indexing (MI), and it uses different techniques to cluster and to organize similar data together.
On the other side, approximate analytics has been often indicated as a smart and flexible way to explore large datasets in a short period. Approximate analytics includes a broad family of algorithms which aims to speed up analytical workloads by relaxing the precision of the results within a specific interval of confidence. For instance, if we want to know the average age in a group with 1-year precision, we can consider just a random fraction of all the people, thus reducing the amount of calculation. But if we also want less I/O operations, we need efficient data sampling, which means organizing data in a way that we do not need to scan the whole data set to generate a random sample of it.
According to our analysis, combining Multidimensional Indexing with efficient data Sampling (MIS) is a vital missing feature not available in the current distributed data management solutions.
This thesis aims to solve such a shortcoming and it provides novel scalable solutions. At first, we describe the existing data management alternatives; then we motivate our preference for NoSQL key-value databases. Secondly, we propose an analytical model to study the influence of data models on the scalability and performance of this kind of distributed database. Thirdly, we use the analytical model to design two novel multidimensional indexes with efficient data sampling: the D8tree and the AOTree. Our first solution, the D8tree, improves state of the art for approximate spatial queries on static and mostly read dataset. Later, we enhanced the data ingestion capability or our approach by introducing the AOTree, an algorithm that enables the query performance of the D8tree even for HPC write-intensive applications. We compared our solution with PostgreSQL and plain storage, and we demonstrate that our proposal has better performance and scalability.
Finally, we describe Qbeast, the novel distributed system that implements the D8tree and the AOTree using NoSQL technologies, and we illustrate how Qbeast simplifies the workflow of scientists in various HPC applications providing a scalable and integrated solution for data analysis and management.La gestión de BigData con información espacial está considerada como una tendencia esencial en el futuro de las aplicaciones cientÃficas y de negocio. De hecho, se generan cientos de petabytes de datos espaciales por año mediante instrumentos de investigación, dispositivos médicos y redes sociales. Sin embargo, tal y como muchos autores han señalado, la falta de entornos especializados en manejar este tipo de datos está limitando sus posibles aplicaciones y está impidiendo muchos avances cientÃficos. En esta tesis, describimos 3 aplicaciones cientÃficas HPC, que cubren los ámbitos de dinámica molecular, análisis neurocientÃfico y simulaciones fÃsicas, donde hemos experimentado en primera mano las limitaciones de las tecnologÃas existentes. Gracias a nuestras experiencias, hemos podido definir qué funcionalidades serÃan deseables y no existen, y nos hemos centrado en dos caracterÃsticas que, al combinarlas, mejoran significativamente la manera en la que se analizan los datos cientÃficos. Por un lado, las simulaciones cientÃficas generan conjuntos de datos complejos, en los que cada elemento es descrito por múltiples caracterÃsticas correlacionadas. Por ejemplo, una partÃcula puede tener una posición espacial (x, y, z) en un momento dado (t). Si queremos encontrar todos los elementos dentro de la misma área y periodo, o bien recorremos y analizamos todo el conjunto de datos, o bien organizamos los datos de manera que se almacenen juntos todos los elementos que comparten área en un momento dado. Esta segunda opción se conoce como Indexación Multidimensional (IM) y usa diferentes técnicas para agrupar y organizar datos similares. Por otro lado, se suele señalar que las analÃticas aproximadas son una manera inteligente y flexible de explorar grandes conjuntos de datos en poco tiempo. Este tipo de analÃticas incluyen una amplia familia de algoritmos que acelera el tiempo de procesado, relajando la precisión de los resultados dentro de un determinado intervalo de confianza. Por ejemplo, si queremos saber la edad media de un grupo con precisión de un año, podemos considerar sólo un subconjunto aleatorio de todas las personas, reduciendo asà la cantidad de cálculo. Pero si además queremos menos operaciones de entrada/salida, necesitamos un muestreo eficiente de datos, que implica organizar los datos de manera que no necesitemos recorrerlos todos para generar una muestra aleatoria. De acuerdo con nuestros análisis, la combinación de Indexación Multidimensional con Muestreo eficiente de datos (IMM) es una caracterÃstica vital que no está disponible en las soluciones actuales de gestión distribuida de datos. Esta tesis pretende resolver esta limitación y proporciona unas soluciones novedosas que son escalables. En primer lugar, describimos las alternativas de gestión de datos que existen y motivamos nuestra preferencia por las bases de datos NoSQL basadas en clave-valor. En segundo lugar, proponemos un modelo analÃtico para estudiar la influencia que tienen los modelos de datos sobre la escalabilidad y el rendimiento de este tipo de bases de datos distribuidas. En tercer lugar, usamos el modelo analÃtico para diseñar dos novedosos algoritmos IMM: el D8tree y el AOTree. Nuestra primera solución, el D8tree, mejora el estado del arte actual para consultas espaciales aproximadas, cuando el conjunto de datos es estático y mayoritariamente de lectura. Después, mejoramos la capacidad de ingestión introduciendo el AOTree, un algoritmo que conserva el rendimiento del D8tree incluso para aplicaciones HPC intensivas en escritura. Hemos comparado nuestra solución con PostgreSQL y almacenamiento plano demostrando que nuestra propuesta mejora tanto el rendimiento como la escalabilidad. Finalmente, describimos Qbeast, el sistema que implementa los algoritmos D8tree y AOTree, e ilustramos cómo Qbeast simplifica el flujo de trabajo de los cientÃficos ofreciendo una solución escalable e integr
Research In High Performance And Low Power Computer Systems For Data-intensive Environment
According to the data affinity, DAFA re-organizes data to maximize the parallelism of the affinitive data, and also subjective to the overall load balance. This enables DAFA to realize the maximum number of map tasks with data-locality. Besides the system performance, power consumption is another important concern of current computer systems. In the U.S. alone, the energy used by servers which could be saved comes to 3.17 million tons of carbon dioxide, or 580,678 cars {Kar09}. However, the goals of high performance and low energy consumption are at odds with each other. An ideal power management strategy should be able to dynamically respond to the change (either linear or nonlinear, or non-model) of workloads and system configuration without violating the performance requirement. We propose a novel power management scheme called MAR (modeless, adaptive, rule-based) in multiprocessor systems to minimize the CPU power consumption under performance constraints. By using richer feedback factors, e.g. the I/O wait, MAR is able to accurately describe the relationships among core frequencies, performance and power consumption. We adopt a modeless control model to reduce the complexity of system modeling. MAR is designed for CMP (Chip Multi Processor) systems by employing multi-input/multi-output (MIMO) theory and per-core level DVFS (Dynamic Voltage and Frequency Scaling).; TRAID deduplicates this overlap by only logging one compact version (XOR results) of recovery references for the updating data. It minimizes the amount of log content as well as the log flushing overhead, thereby boosts the overall transaction processing performance. At the same time, TRAID guarantees comparable RAID reliability, the same recovery correctness and ACID semantics of traditional transactional processing systems. On the other hand, the emerging myriad data intensive applications place a demand for high-performance computing resources with massive storage. Academia and industry pioneers have been developing big data parallel computing frameworks and large-scale distributed file systems (DFS) widely used to facilitate the high-performance runs of data-intensive applications, such as bio-informatics {Sch09}, astronomy {RSG10}, and high-energy physics {LGC06}. Our recent work {SMW10} reported that data distribution in DFS can significantly affect the efficiency of data processing and hence the overall application performance. This is especially true for those with sophisticated access patterns. For example, Yahoo\u27s Hadoop {refg} clusters employs a random data placement strategy for load balance and simplicity {reff}. This allows the MapReduce {DG08} programs to access all the data (without or not distinguishing interest locality) at full parallelism. Our work focuses on Hadoop systems. We observed that the data distribution is one of the most important factors that affect the parallel programming performance. However, the default Hadoop adopts random data distribution strategy, which does not consider the data semantics, specifically, data affinity. We propose a Data-Affinity-Aware (DAFA) data placement scheme to address the above problem. DAFA builds a history data access graph to exploit the data affinity.; The evolution of computer science and engineering is always motivated by the requirements for better performance, power efficiency, security, user interface (UI), etc {CM02}. The first two factors are potential tradeoffs: better performance usually requires better hardware, e.g., the CPUs with larger number of transistors, the disks with higher rotation speed; however, the increasing number of transistors on the single die or chip reveals super-linear growth in CPU power consumption {FAA08a}, and the change in disk rotation speed has a quadratic effect on disk power consumption {GSK03}. We propose three new systematic approaches as shown in Figure 1.1, Transactional RAID, data-affinity-aware data placement DAFA and Modeless power management, to tackle the performance problem in Database systems, large scale clusters or cloud platforms, and the power management problem in Chip Multi Processors, respectively. The first design, Transactional RAID (TRAID), is motivated by the fact that in recent years, more storage system applications have employed transaction processing techniques Figure 1.1 Research Work Overview] to ensure data integrity and consistency. In transaction processing systems(TPS), log is a kind of redundancy to ensure transaction ACID (atomicity, consistency, isolation, durability) properties and data recoverability. Furthermore, high reliable storage systems, such as redundant array of inexpensive disks (RAID), are widely used as the underlying storage system for Databases to guarantee system reliability and availability with high I/O performance. However, the Databases and storage systems tend to implement their independent fault tolerant mechanisms {GR93, Tho05} from their own perspectives and thereby leading to potential high overhead. We observe the overlapped redundancies between the TPS and RAID systems, and propose a novel reliable storage architecture called Transactional RAID (TRAID)