
    Bridging the Gap between Application and Solid-State-Drives

    Data storage is one of the most important, and often critical, parts of a computing system in terms of performance, cost, reliability, and energy. Numerous new memory technologies, such as NAND flash, phase-change memory (PCM), spin-transfer torque magnetic RAM (STT-RAM), and the memristor, have emerged recently, and many of them have already entered production systems. Traditional storage optimization and caching algorithms are far from optimal because storage I/Os do not exhibit simple locality. Providing optimal storage requires accurate prediction of I/O behavior, yet workloads are increasingly dynamic and diverse, making both long- and short-term I/O prediction challenging. Because of the evolution of storage technologies and the increasing diversity of workloads, storage software is becoming more and more complex. For example, a Flash Translation Layer (FTL) is added to NAND-flash-based Solid State Disks (NAND-SSDs), but it introduces overheads such as address-translation delay and garbage-collection costs. Many recent studies aim to address these overheads; unfortunately, there is no one-size-fits-all solution due to the variety of workloads. Despite rapidly evolving storage technologies, the increasing heterogeneity and diversity of machines and workloads, coupled with the continued data explosion, exacerbate the gap between computing and storage speeds. In this dissertation, we improve data storage performance through both top-down and bottom-up approaches. First, we investigate exposing storage-level parallelism so that applications can avoid I/O contention and workload skew when scheduling jobs. Second, we study how architecture-aware task scheduling can improve application performance when the system is equipped with PCM-based NVRAM. Third, we develop an I/O-correlation-aware flash translation layer for NAND-flash-based Solid State Disks. Fourth, we build a DRAM-based correlation-aware FTL emulator and study its performance on various filesystems.
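    The FTL overheads this abstract refers to are easy to see in miniature. Below is a minimal, hypothetical Python sketch of a page-level FTL; the geometry constants and the class and method names are illustrative assumptions, not the dissertation's design, and a real FTL would reserve over-provisioned blocks so that garbage collection is always guaranteed to free space.

        PAGES_PER_BLOCK = 4      # illustrative geometry, far smaller than real devices
        NUM_BLOCKS = 8

        class PageLevelFTL:
            def __init__(self):
                self.mapping = {}                          # logical page -> (block, slot)
                self.blocks = [[None] * PAGES_PER_BLOCK for _ in range(NUM_BLOCKS)]
                self.valid = [set() for _ in range(NUM_BLOCKS)]
                self.free = list(range(1, NUM_BLOCKS))     # blocks not yet in use
                self.active, self.next_slot = 0, 0

            def write(self, lpn, data):
                if lpn in self.mapping:                    # flash forbids in-place update,
                    blk, slot = self.mapping[lpn]          # so the old copy turns stale
                    self.valid[blk].discard(slot)
                if self.next_slot == PAGES_PER_BLOCK:      # active block is full
                    if self.free:
                        self.active, self.next_slot = self.free.pop(), 0
                    else:
                        self.active = self._gc()           # _gc also resets next_slot
                blk, slot = self.active, self.next_slot
                self.blocks[blk][slot] = (lpn, data)
                self.valid[blk].add(slot)
                self.mapping[lpn] = (blk, slot)            # address-translation update
                self.next_slot += 1

            def read(self, lpn):
                blk, slot = self.mapping[lpn]              # translation: extra lookup per I/O
                return self.blocks[blk][slot][1]

            def _gc(self):
                # Reclaim the non-active block with the fewest valid pages; copying
                # its live pages forward is the garbage-collection cost noted above.
                victim = min((b for b in range(NUM_BLOCKS) if b != self.active),
                             key=lambda b: len(self.valid[b]))
                live = [self.blocks[victim][s] for s in sorted(self.valid[victim])]
                self.blocks[victim] = [None] * PAGES_PER_BLOCK   # erase the block
                self.valid[victim] = set()
                for slot, (lpn, data) in enumerate(live):        # write live data back
                    self.blocks[victim][slot] = (lpn, data)
                    self.valid[victim].add(slot)
                    self.mapping[lpn] = (victim, slot)
                self.next_slot = len(live)
                return victim

        ftl = PageLevelFTL()
        ftl.write(3, b"v1")
        ftl.write(3, b"v2")            # out-of-place rewrite leaves a stale page behind
        assert ftl.read(3) == b"v2"

    One way a correlation-aware FTL can improve on such a baseline is by co-locating pages that are written together, so blocks tend to become fully stale at the same time and garbage collection copies less live data.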

    Survey on Deduplication Techniques in Flash-Based Storage

    The importance of data deduplication grows with data volumes, and the field is under active development. It has recently been influenced by the appearance of Solid State Drives (SSDs), a type of device that differs significantly from both random-access memory and hard disk drives and is now in wide use. In this paper we propose a novel taxonomy that reflects the main issues related to deduplication on Solid State Drives. We present a survey of deduplication techniques focusing on flash-based storage, describe several open-source tools implementing data deduplication, and briefly outline open research problems related to data deduplication in flash-based storage systems.
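    As a concrete illustration of the fingerprint-based deduplication such techniques build on, the following Python sketch splits data into fixed-size chunks and stores each unique chunk once, keyed by its SHA-256 digest. The chunk size and the class and method names are illustrative assumptions, not taken from the paper; production systems often use variable-size (content-defined) chunking instead.

        import hashlib

        CHUNK_SIZE = 4096    # illustrative; real systems tune this carefully

        class DedupStore:
            """Keeps one physical copy of each unique chunk, indexed by fingerprint."""

            def __init__(self):
                self.chunks = {}    # fingerprint -> chunk bytes (stored once)
                self.files = {}     # filename -> list of fingerprints (the file recipe)

            def put(self, name, data):
                recipe = []
                for i in range(0, len(data), CHUNK_SIZE):
                    chunk = data[i:i + CHUNK_SIZE]
                    fp = hashlib.sha256(chunk).hexdigest()
                    # A fingerprint hit means the chunk is already stored; on an SSD
                    # this also avoids a write, reducing wear as well as space.
                    self.chunks.setdefault(fp, chunk)
                    recipe.append(fp)
                self.files[name] = recipe

            def get(self, name):
                return b"".join(self.chunks[fp] for fp in self.files[name])

        store = DedupStore()
        store.put("a.log", b"x" * 10000)
        store.put("b.log", b"x" * 10000)              # fully duplicate content
        assert store.get("b.log") == b"x" * 10000
        print(len(store.chunks), "unique chunks")     # 2, instead of 6 raw chunks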

    Pattern-Based Systems Engineering (PBSE) - Product Lifecycle Management (PLM) integration and validation

    Mass customization, small lot sizes, reduced cost, high variability of product types, and a changing product portfolio characterize modern manufacturing systems over their life cycle. A direct consequence of these characteristics is a more complex system and supply chain. Product lifecycle management (PLM) and model-based systems engineering (MBSE) are tools that have been proposed and implemented to address different aspects of this complexity and the resulting challenges. Our previous work successfully implemented an MBSE model in a PLM platform. More specifically, Pattern-Based Systems Engineering (S* pattern) models of systems are integrated with TEAMCENTER to link and interface the system level with the component level and to streamline the lifecycle across disciplines. The benefit of the implementation is twofold. On one hand, it helps system engineers using systems engineering models shift from learning how to model to implementing the model, which leads to more effective system definition, design, integration, and testing. On the other hand, the PLM platform provides a reliable database to store legacy data for future use and to track changes during the entire process, including one of the most important tools a systems engineer needs: automatic report generation. In the current work, we have configured a PLM platform (TEAMCENTER) to support automatic generation of reports and requirements tables using a generic oil filter system lifecycle. Three tables have been configured for automatic generation: the Feature Definitions table, the Detail Requirements table, and the Stakeholder Feature Attributes table. These tables were specifically chosen because they describe all the requirements of the system and cover all physical behaviours the oil filter system shall exhibit during its physical interactions with external systems. The requirement tables represent core content for a typical systems engineering report; with the automatic report generation tool, the entire report can be prepared within a single system, the PLM system, ensuring a single reliable data source for the organization. Automatic generation of these contents saves systems engineers time, avoids duplicated work and human error in report preparation, and trains future generations of the workforce in the lifecycle, all while encouraging standardized documents in an organization.
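    To make the table-driven report idea concrete, here is a small, hypothetical Python sketch that renders one such table (a Detail Requirements table) from structured records. The oil-filter requirements and field names are invented for illustration; Teamcenter's actual report generation is configured inside the PLM platform itself, not written as standalone code like this.

        import csv, io

        # Hypothetical requirement records; the fields echo the report tables
        # named above but are not Teamcenter's data model.
        requirements = [
            {"id": "DR-1", "feature": "Filtration",
             "statement": "The system shall remove particles larger than 20 um from the oil."},
            {"id": "DR-2", "feature": "Sealing",
             "statement": "The housing shall not leak at an operating pressure of 10 bar."},
        ]

        def render_table(records, columns):
            """Render one report table (e.g., Detail Requirements) as CSV text."""
            buf = io.StringIO()
            writer = csv.DictWriter(buf, fieldnames=columns)
            writer.writeheader()
            writer.writerows(records)
            return buf.getvalue()

        # Generating every table from the same records is what keeps the report
        # consistent with a single reliable data source.
        print(render_table(requirements, ["id", "feature", "statement"]))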

    Letter from the Special Issue Editor

    Editorial work for the IEEE Data Engineering Bulletin (DEBULL) on a special issue on data management on Storage Class Memory (SCM) technologies.

    Data Science and Knowledge Discovery

    Data Science (DS) is gaining significant importance in decision processes because it blends several areas, including computer science, machine learning, mathematics and statistics, domain/business knowledge, software development, and traditional research. In business, applying DS allows scientific methods, processes, algorithms, and systems to be used to extract knowledge and insights from structured and unstructured data in support of decision making. After the data are collected, the crucial step is discovering the knowledge: Knowledge Discovery (KD) tasks create knowledge from structured and unstructured sources (e.g., text, data, and images). The output needs to be in a readable and interpretable format, and it must represent knowledge in a manner that facilitates inference. KD is applied in several areas, such as education, health, accounting, energy, and public administration. This book includes fourteen excellent articles that discuss this trending topic and present innovative solutions, showing the importance of Data Science and Knowledge Discovery to researchers, managers, industry, society, and other communities. The chapters address several topics, including data mining, deep learning, data visualization and analytics, semantic data, geospatial and spatio-temporal data, data augmentation, and text mining.