63 research outputs found

    Efficient data reliability management of cloud storage systems for big data applications

    Cloud service providers are consistently striving to provide efficient and reliable service for their clients' Big Data storage needs. Replication is a simple and flexible method to ensure the reliability and availability of data. However, it is not an efficient solution for Big Data, which routinely scales to terabytes and petabytes, and hence erasure coding is gaining traction despite its shortcomings. Deploying erasure coding in cloud storage confronts several challenges, such as encoding/decoding complexity, load balancing, exponential resource consumption due to data repair, and read latency. This thesis addresses many of these challenges. Although data durability and availability should not be compromised for any reason, clients' requirements on read performance (access latency) may vary with the nature of the data and its access patterns. Access latency is an important metric, and the acceptable latency range can be recorded in the client's SLA. Several proactive recovery methods for erasure codes are proposed in this research to reduce the resource consumption caused by recovery. A novel cache-based solution is also proposed to mitigate the access latency issue of erasure coding.
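    As a rough illustration of why erasure coding is attractive at this scale, the sketch below (my own, not from the thesis) compares the storage overhead of n-way replication with a Reed-Solomon style RS(k, m) layout; the RS(6, 3) parameters are only a commonly cited example.

        # Minimal sketch: storage overhead of replication vs. an RS(k, m) erasure
        # code, where k data blocks are encoded into k + m blocks and any k of
        # them are enough to rebuild the object.

        def replication_overhead(copies: int) -> float:
            """Bytes stored per byte of user data under n-way replication."""
            return float(copies)

        def erasure_overhead(k: int, m: int) -> float:
            """Bytes stored per byte of user data under an RS(k, m) code."""
            return (k + m) / k

        if __name__ == "__main__":
            # 3-way replication survives any 2 lost copies at 3x storage;
            # RS(6, 3) survives any 3 lost blocks at only 1.5x storage.
            print(f"3-way replication: {replication_overhead(3):.2f}x")
            print(f"RS(6, 3) erasure code: {erasure_overhead(6, 3):.2f}x")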

    Exploration of Erasure-Coded Storage Systems for High Performance, Reliability, and Inter-operability

    The unprecedented growth of data and the use of low-cost commodity drives in local disk-based storage systems and remote cloud-based servers have increased the risk of data loss and the overall user-perceived system latency. To guarantee high reliability, replication has been the most popular choice for decades because of its simplicity in data management. With the high volume of data being generated every day, however, the storage cost of replication is very high, and it is no longer a viable approach. Erasure coding is another approach to adding redundancy in storage systems, one that provides high reliability at a fraction of the cost of replication. However, the choice of erasure code affects the storage efficiency, reliability, and overall system performance. At the same time, performance and interoperability are adversely affected by slower device components and by complex central management systems and operations. To address the problems encountered in the various layers of an erasure-coded storage system, this dissertation explores the different aspects of storage and designs several techniques to improve reliability, performance, and interoperability. These techniques range from a comprehensive evaluation of erasure codes and the application of erasure codes to a highly reliable, high-performance SSD system, to the design of new erasure coding and caching schemes for the Hadoop Distributed File System, one of the central management systems for distributed storage. Detailed evaluations and results are also provided in this dissertation.
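    For readers unfamiliar with how an erasure code trades computation for storage, the following sketch (an illustration of mine, not code from the dissertation) shows the simplest possible case: a single XOR parity block over k data blocks, from which any one lost block can be rebuilt. Production codes such as Reed-Solomon generalize this to tolerate m simultaneous losses.

        # Minimal sketch: single-parity erasure coding over equal-size blocks.
        # The parity block is the XOR of the k data blocks, so any one lost
        # block (data or parity) can be rebuilt by XOR-ing the survivors.

        from typing import List

        def xor_blocks(blocks: List[bytes]) -> bytes:
            out = bytearray(len(blocks[0]))
            for block in blocks:
                for i, byte in enumerate(block):
                    out[i] ^= byte
            return bytes(out)

        def encode(data_blocks: List[bytes]) -> List[bytes]:
            """Return the stripe: k data blocks followed by one parity block."""
            return data_blocks + [xor_blocks(data_blocks)]

        def repair(stripe: List[bytes], lost_index: int) -> bytes:
            """Rebuild the block at lost_index from the remaining blocks."""
            survivors = [b for i, b in enumerate(stripe) if i != lost_index]
            return xor_blocks(survivors)

        stripe = encode([b"AAAA", b"BBBB", b"CCCC"])
        assert repair(stripe, 1) == b"BBBB"   # a lost data block is recovered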

    Enabling Distributed Applications Optimization in Cloud Environment

    The past few years have seen dramatic growth in the popularity of public clouds, such as Infrastructure-as-a-Service (IaaS), Platform-as-a-Service (PaaS), and Container-as-a-Service (CaaS). In both commercial and scientific fields, quick environment setup and application deployment have become mandatory requirements. As a result, more and more organizations choose cloud environments instead of setting up environments themselves from scratch. Cloud computing resources such as server engines, orchestration, and the underlying server resources are delivered to users as a service by a cloud provider. Most of the applications that run in public clouds are distributed applications, also called multi-tier applications, which require a set of servers, a service ensemble, that cooperate and communicate to jointly provide a certain service or accomplish a task. However, only a few research efforts have been devoted to providing an overall solution for distributed application optimization in the public cloud. In this dissertation, we present three systems that enable distributed application optimization: (1) the first part introduces DocMan, a toolset for detecting containerized applications' dependencies in CaaS clouds; (2) the second part introduces a system to deal with hot/cold blocks in distributed applications; (3) the third part introduces FP4S, a novel fragment-based parallel state recovery mechanism that can handle many simultaneous failures for a large number of concurrently running stream applications.
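    The abstract does not say how hot and cold blocks are distinguished, so the sketch below is only an assumed, generic policy (hypothetical names, sliding window of accesses) of the kind such a system might start from: blocks accessed frequently within the recent window are treated as hot and kept in faster or replicated storage, the rest as cold.

        # Minimal sketch (assumed policy): classify blocks as hot or cold from a
        # sliding window of recent accesses; a tiering layer could then keep hot
        # blocks replicated in memory and demote cold blocks to cheaper storage.

        from collections import Counter, deque
        from typing import Deque, Set

        class HotColdTracker:
            def __init__(self, window_size: int = 10_000, hot_threshold: int = 5):
                self.window: Deque[str] = deque(maxlen=window_size)  # recent block ids
                self.counts: Counter = Counter()
                self.hot_threshold = hot_threshold

            def record(self, block_id: str) -> None:
                if len(self.window) == self.window.maxlen:
                    self.counts[self.window[0]] -= 1   # slide the oldest access out
                self.window.append(block_id)
                self.counts[block_id] += 1

            def hot_blocks(self) -> Set[str]:
                return {b for b, c in self.counts.items() if c >= self.hot_threshold}

        tracker = HotColdTracker(window_size=100, hot_threshold=3)
        for _ in range(3):
            tracker.record("blk_0001")
        tracker.record("blk_0002")
        assert tracker.hot_blocks() == {"blk_0001"}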

    Active Data Replica Recovery for Quality-Assurance Big Data Analysis in IC-IoT

    QoS-aware big data analysis is critical in Information-Centric Internet of Things (IC-IoT) systems to support applications such as smart cities, smart grids, smart health, intelligent transportation systems, and so on. The use of non-volatile memory (NVM) in cloud or edge systems provides a good opportunity to improve the quality of data analysis tasks. However, we have to face the data recovery problem caused by NVM failures due to limited write endurance. In this paper, we investigate the data recovery problem for QoS guarantees and system robustness, and we propose a rarity-aware data recovery algorithm. The core idea is to establish a rarity indicator that comprehensively evaluates the replica distribution and service requirements. With this indicator, we give lost replicas distinct priorities and eliminate unnecessary replicas. The data replicas are then recovered stage by stage to guarantee QoS and provide system robustness. Extensive experiments and simulations show that the proposed algorithm achieves significant improvements in QoS and robustness over the traditional direct data recovery method, while keeping the data recovery time acceptable.
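    The paper's exact rarity indicator is not given in this abstract, so the sketch below uses an assumed scoring rule purely to illustrate the stage-by-stage, rarest-first idea: blocks that have lost more of their replicas and are read more often are recovered first.

        # Minimal sketch (assumed scoring rule, not the paper's indicator): rank
        # blocks by a "rarity" score combining how few replicas survive with how
        # heavily the block is read, then recover them in fixed-size stages.

        from dataclasses import dataclass
        from typing import List

        @dataclass
        class Block:
            block_id: str
            surviving_replicas: int   # copies still readable after the failure
            target_replicas: int      # replication factor promised to the client
            read_rate: float          # recent read requests per second

        def rarity(block: Block) -> float:
            missing = max(block.target_replicas - block.surviving_replicas, 0)
            # Fewer surviving copies and higher demand -> rarer -> recover sooner.
            return missing * (1.0 + block.read_rate) / max(block.surviving_replicas, 1)

        def recovery_stages(blocks: List[Block], per_stage: int) -> List[List[Block]]:
            pending = sorted((b for b in blocks if rarity(b) > 0), key=rarity, reverse=True)
            return [pending[i:i + per_stage] for i in range(0, len(pending), per_stage)]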

    Research In High Performance And Low Power Computer Systems For Data-intensive Environment

    The evolution of computer science and engineering is always motivated by the requirements for better performance, power efficiency, security, user interface (UI), etc. {CM02}. The first two factors are potential tradeoffs: better performance usually requires better hardware, e.g., CPUs with a larger number of transistors or disks with a higher rotation speed; however, the increasing number of transistors on a single die or chip leads to super-linear growth in CPU power consumption {FAA08a}, and a change in disk rotation speed has a quadratic effect on disk power consumption {GSK03}. We propose three new systematic approaches, as shown in Figure 1.1 (Research Work Overview): Transactional RAID, Data-Affinity-Aware data placement (DAFA), and modeless power management, to tackle the performance problem in database systems and in large-scale clusters or cloud platforms, and the power management problem in chip multiprocessors, respectively.
    The first design, Transactional RAID (TRAID), is motivated by the fact that in recent years more storage system applications have employed transaction processing techniques to ensure data integrity and consistency. In transaction processing systems (TPS), the log is a form of redundancy that ensures the transaction ACID (atomicity, consistency, isolation, durability) properties and data recoverability. Furthermore, highly reliable storage systems, such as redundant arrays of inexpensive disks (RAID), are widely used as the underlying storage for databases to guarantee system reliability and availability with high I/O performance. However, databases and storage systems tend to implement their fault-tolerance mechanisms independently, from their own perspectives {GR93, Tho05}, thereby leading to potentially high overhead. We observe the overlapping redundancies between the TPS and RAID systems and propose a novel reliable storage architecture called Transactional RAID (TRAID). TRAID deduplicates this overlap by logging only one compact version (the XOR result) of the recovery references for the data being updated. It minimizes the amount of log content as well as the log flushing overhead, thereby boosting the overall transaction processing performance. At the same time, TRAID guarantees comparable RAID reliability and the same recovery correctness and ACID semantics as traditional transaction processing systems.
    On the other hand, the myriad emerging data-intensive applications place a demand for high-performance computing resources with massive storage. Academia and industry pioneers have been developing big data parallel computing frameworks and large-scale distributed file systems (DFS), widely used to facilitate high-performance runs of data-intensive applications such as bioinformatics {Sch09}, astronomy {RSG10}, and high-energy physics {LGC06}. Our recent work {SMW10} reported that data distribution in a DFS can significantly affect the efficiency of data processing and hence the overall application performance; this is especially true for applications with sophisticated access patterns. For example, Yahoo's Hadoop clusters {refg} employ a random data placement strategy for load balance and simplicity {reff}, which allows MapReduce {DG08} programs to access all the data (without distinguishing interest locality) at full parallelism. Our work focuses on Hadoop systems. We observed that data distribution is one of the most important factors affecting parallel programming performance; however, the default Hadoop adopts a random data distribution strategy that does not consider data semantics, specifically data affinity. We propose a Data-Affinity-Aware (DAFA) data placement scheme to address this problem. DAFA builds a history data access graph to exploit data affinity; according to the data affinity, DAFA re-organizes data to maximize the parallelism of the affinitive data, subject also to the overall load balance. This enables DAFA to realize the maximum number of map tasks with data locality.
    Besides system performance, power consumption is another important concern of current computer systems. In the U.S. alone, the energy used by servers that could be saved corresponds to 3.17 million tons of carbon dioxide, or the emissions of 580,678 cars {Kar09}. However, the goals of high performance and low energy consumption are at odds with each other. An ideal power management strategy should be able to dynamically respond to changes (linear, nonlinear, or without a model) in workloads and system configuration without violating the performance requirement. We propose a novel power management scheme called MAR (modeless, adaptive, rule-based) for multiprocessor systems to minimize CPU power consumption under performance constraints. By using richer feedback factors, e.g., the I/O wait, MAR is able to accurately describe the relationships among core frequencies, performance, and power consumption. We adopt a modeless control approach to reduce the complexity of system modeling. MAR is designed for CMP (Chip Multiprocessor) systems by employing multi-input/multi-output (MIMO) control theory and per-core DVFS (Dynamic Voltage and Frequency Scaling).
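    To make the "one compact version (XOR result)" idea concrete, here is a small worked example of my own (a simplified reading of the abstract, not TRAID's actual implementation): logging only old XOR new is enough both to undo an update and to bring the RAID parity up to date.

        # Minimal sketch: log only delta = old XOR new for an updated block.
        # With the delta plus either version of the block, the other version is
        # recoverable, and the same delta also updates the stripe's parity.

        def xor(a: bytes, b: bytes) -> bytes:
            return bytes(x ^ y for x, y in zip(a, b))

        old_block = b"\x10\x20\x30\x40"
        new_block = b"\x11\x22\x33\x44"
        old_parity = b"\xAA\xBB\xCC\xDD"       # parity covering this block's stripe

        delta = xor(old_block, new_block)       # the only value written to the log

        assert xor(new_block, delta) == old_block     # undo: old block recovered
        new_parity = xor(old_parity, delta)           # parity maintained with the same delta
        assert xor(new_parity, delta) == old_parity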

    Benchmarking Hadoop performance on different distributed storage systems

    Distributed storage systems have been in place for years and have undergone significant changes in architecture to ensure reliable storage of data in a cost-effective manner. With the demand for data increasing, there has been a shift from disk-centric to memory-centric computing: the focus is on keeping data in memory rather than on disk, primarily for the increased speed of data processing. This could, however, mean a change in the approach to providing the necessary fault tolerance: instead of data replication, other techniques may be considered. One example of an in-memory distributed storage system is Tachyon. Instead of replicating data files in memory, Tachyon provides fault tolerance by maintaining a record of the operations needed to generate the data files; these operations are replayed if the files are lost. This approach is termed lineage. Tachyon is already deployed by many well-known companies. This thesis compares the storage performance of Tachyon with that of the on-disk storage systems HDFS and Ceph. After a study of the architectures of well-known distributed storage systems, the major contribution of the work is to integrate Tachyon with Ceph as an underlying storage system, to understand how this affects its performance, and to determine how to tune Tachyon to extract maximum performance from it.
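    As a purely illustrative sketch of the lineage idea described above (a toy example of mine, not Tachyon's API), a lost output can be regenerated by recording which operation and inputs produced it and replaying that operation on loss.

        # Minimal sketch: lineage-based recovery. Instead of replicating an
        # output, remember the operation and inputs that produced it and replay
        # them if the output is lost.

        from typing import Callable, Dict, List

        store: Dict[str, bytes] = {"input.txt": b"raw sensor data"}
        lineage: Dict[str, tuple] = {}   # output name -> (operation, input names)

        def run(op: Callable[[List[bytes]], bytes], inputs: List[str], output: str) -> None:
            store[output] = op([store[name] for name in inputs])
            lineage[output] = (op, inputs)          # record how to regenerate it

        def recover(output: str) -> None:
            op, inputs = lineage[output]
            store[output] = op([store[name] for name in inputs])

        run(lambda ins: ins[0].upper(), ["input.txt"], "derived.txt")
        del store["derived.txt"]                    # simulate losing the in-memory copy
        recover("derived.txt")
        assert store["derived.txt"] == b"RAW SENSOR DATA"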

    Discovering New Vulnerabilities in Computer Systems

    Vulnerability research plays a key role in preventing and defending against malicious computer system exploitations. Driven by a multi-billion-dollar underground economy, cyber criminals today tirelessly launch malicious exploitations, threatening every aspect of daily computing. To effectively protect computer systems from devastation, it is imperative to discover and mitigate vulnerabilities before they fall into the offensive parties' hands. This dissertation is dedicated to the research and discovery of new design and deployment vulnerabilities in three very different types of computer systems.
    The first vulnerability is found in automatic malicious binary (malware) detection systems. Binary analysis, a central piece of technology for malware detection, is divided into two classes: static analysis and dynamic analysis. State-of-the-art detection systems employ both classes of analysis to complement each other's strengths and weaknesses for improved detection results. However, we found that the commonly seen design patterns may suffer from evasion attacks. We demonstrate attacks on these vulnerabilities by designing and implementing a novel binary obfuscation technique.
    The second vulnerability is located in the design of server system power management. Technological advancements have improved server system power efficiency and facilitated energy-proportional computing. However, the changed power profile makes power consumption subject to unaudited influence by remote parties, leaving server systems vulnerable to energy-targeted malicious exploits. We demonstrate an energy-abusing attack on a standalone open Web server, measure the extent of the damage, and present a preliminary defense strategy.
    The third vulnerability is discovered in the application of server virtualization technologies. Server virtualization greatly benefits today's data centers and brings pervasive cloud computing a step closer to the general public. However, the practice of physically co-hosting virtual machines with different security privileges risks introducing covert channels that seriously threaten information security in the cloud. We study the construction of high-bandwidth covert channels via the memory subsystem and show a practical exploit of cross-virtual-machine covert channels on virtualized x86 platforms.

    Design Space Exploration and Resource Management of Multi/Many-Core Systems

    The increasing demand for processing more applications and related data on computing platforms has resulted in a reliance on multi-/many-core chips, as they facilitate parallel processing. However, there is a desire for these platforms to be energy-efficient and reliable, and they need to perform secure computations in the interest of the whole community. This book provides perspectives on the aforementioned aspects from leading researchers, in terms of state-of-the-art contributions and upcoming trends.