42 research outputs found

    Managing contamination delay to improve Timing Speculation architectures

    Timing Speculation (TS) is a widely known method for realizing better-than-worst-case systems. Aggressive clocking, made possible by TS, enables systems to operate beyond their specified safe frequency limits and thereby exploit data-dependent circuit delay. However, the range of aggressive clocking available for performance enhancement under TS is restricted by short paths. In this paper, we show that increasing the lengths of a circuit's short paths increases the effectiveness of TS, leading to performance improvement. We also propose an algorithm that efficiently adds delay buffers to selected short paths while keeping the area penalty down. We present results for the ISCAS-85 benchmark suite and show that circuit contamination delay can be increased by up to 30% without affecting propagation delay. We also explore increasing short-path delays further by relaxing the constraint on propagation delay, and analyze the performance impact.
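
    The buffer-insertion idea lends itself to a simple greedy sketch. The code below pads edges of a gate-level DAG with extra delay wherever the shortest input-to-output path through that edge falls below a target contamination delay, without letting the longest path exceed the propagation-delay bound. This is a single-pass heuristic under assumed names (pad_short_paths, t_target, t_bound), not the paper's actual algorithm, which must also account for pads interacting across shared paths.

```python
# Single-pass greedy sketch of short-path padding for timing
# speculation. Not the paper's algorithm: pads on different edges
# interact along shared paths, so a real tool re-times and iterates.
import math

def pad_short_paths(succs, outputs, t_target, t_bound):
    """Suggest extra buffer delay per edge so short input->output paths
    approach t_target (contamination delay) while no path exceeds
    t_bound (propagation delay).
    `succs` maps node -> [(successor, edge_delay), ...]."""
    nodes = set(succs) | {v for es in succs.values() for v, _ in es}
    indeg = {n: 0 for n in nodes}
    for es in succs.values():
        for v, _ in es:
            indeg[v] += 1

    order = [n for n in nodes if indeg[n] == 0]   # Kahn's algorithm
    deg = dict(indeg)
    for u in order:                               # list grows in place
        for v, _ in succs.get(u, []):
            deg[v] -= 1
            if deg[v] == 0:
                order.append(v)

    # Forward pass: earliest/latest arrival time from any primary input.
    amin = {n: (0.0 if indeg[n] == 0 else math.inf) for n in nodes}
    amax = {n: (0.0 if indeg[n] == 0 else -math.inf) for n in nodes}
    for u in order:
        for v, d in succs.get(u, []):
            amin[v] = min(amin[v], amin[u] + d)
            amax[v] = max(amax[v], amax[u] + d)

    # Backward pass: shortest/longest remaining delay to any output.
    rmin = {n: (0.0 if n in outputs else math.inf) for n in nodes}
    rmax = {n: (0.0 if n in outputs else -math.inf) for n in nodes}
    for u in reversed(order):
        for v, d in succs.get(u, []):
            rmin[u] = min(rmin[u], rmin[v] + d)
            rmax[u] = max(rmax[u], rmax[v] + d)

    pads = {}
    for u in order:
        for v, d in succs.get(u, []):
            shortest = amin[u] + d + rmin[v]  # min path through (u, v)
            longest = amax[u] + d + rmax[v]   # max path through (u, v)
            room = t_bound - longest          # slack before hurting t_prop
            need = t_target - shortest        # deficit versus target t_cont
            if min(need, room) > 0:
                pads[(u, v)] = min(need, room)
    return pads
```

    A production flow would additionally count the inserted buffers, since the paper's stated goal is raising contamination delay at minimal area cost.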

    Improving Job Processing Speed through Shuffle Phase Optimization for SSD-based Hadoop MapReduce System

    ํ•™์œ„๋…ผ๋ฌธ (์„์‚ฌ)-- ์„œ์šธ๋Œ€ํ•™๊ต ์œตํ•ฉ๊ณผํ•™๊ธฐ์ˆ ๋Œ€ํ•™์› : ์œตํ•ฉ๊ณผํ•™๊ธฐ์ˆ ๋Œ€ํ•™์› ์œตํ•ฉ๊ณผํ•™๋ถ€(์ง€๋Šฅํ˜•์œตํ•ฉ์‹œ์Šคํ…œ์ „๊ณต), 2015. 8. ํ™์„ฑ์ˆ˜.๋งต๋ฆฌ๋“€์Šค๋Š” ํด๋ผ์šฐ๋“œ ๋ฐ์ดํ„ฐ์„ผํ„ฐ์—์„œ ๋Œ€์šฉ๋Ÿ‰ ๋ฐ์ดํ„ฐ ์ฒ˜๋ฆฌ๋ฅผ ์œ„ํ•ด ๋„๋ฆฌ ์‚ฌ์šฉ๋˜๋Š” ๋ถ„์‚ฐ ์ฒ˜๋ฆฌ ํ”„๋กœ๊ทธ๋ž˜๋ฐ ๋ชจ๋ธ์ด๋‹ค. ๋งต๋ฆฌ๋“€์Šค๋Š” ๋งต, ์…”ํ”Œ, ๋ฆฌ๋“€์Šค์˜ 3๋‹จ๊ณ„๋กœ ๊ตฌ์„ฑ๋œ๋‹ค. ํ•˜๋‘ก ๋งต๋ฆฌ๋“€์Šค๋Š” ๋งต๋ฆฌ๋“€์Šค ํ”„๋กœ๊ทธ๋ž˜๋ฐ ๋ชจ๋ธ์„ ๊ตฌํ˜„ํ•œ ํ”„๋ ˆ์ž„์›Œํฌ ์ค‘ ๊ฐ€์žฅ ๋งŽ์ด ์“ฐ์ด๋Š” ๊ฒƒ ์ค‘ ํ•˜๋‚˜์ด๋‹ค. ํ˜„์žฌ ํ•˜๋‘ก ๋งต๋ฆฌ๋“€์Šค์˜ ์…”ํ”Œ ๋‹จ๊ณ„๋Š” ๋™์ผ ๋ฐ์ดํ„ฐ์˜ ์ค‘๋ณต๋œ ์ฝ๊ธฐ/์“ฐ๊ธฐ๋กœ ๋Œ€๋Ÿ‰์˜ I/O๋ฅผ ๋ฐœ์ƒ์‹œํ‚ค๋ฉฐ, ๋„คํŠธ์›Œํฌ ์ „์†ก์— ์˜ํ•œ ๊ธด ์ง€์—ฐ์„ ๋ฐœ์ƒ์‹œํ‚จ๋‹ค. ์ด ๋ฌธ์ œ๋ฅผ ํ•ด๊ฒฐํ•˜๊ธฐ ์œ„ํ•˜์—ฌ ๋ณธ ๋…ผ๋ฌธ์—์„œ๋Š” SSD ๊ธฐ๋ฐ˜ ํ•˜๋‘ก ๋งต๋ฆฌ๋“€์Šค ์‹œ์Šคํ…œ์—์„œ ๋ฐ์ดํ„ฐ ์ฃผ์†Œ ๊ธฐ๋ฐ˜์˜ ์…”ํ”Œ ๋ฉ”์ปค๋‹ˆ์ฆ˜์„ ์ œ์•ˆํ•œ๋‹ค. ๋ฐ์ดํ„ฐ ์ฃผ์†Œ ๊ธฐ๋ฐ˜์˜ ์…”ํ”Œ ๋ฉ”์ปค๋‹ˆ์ฆ˜์€ (1) ๋ฐ์ดํ„ฐ ์ฃผ์†Œ ๊ธฐ๋ฐ˜ ์ •๋ ฌ ๋ฐฉ๋ฒ•, (2) ๋ฐ์ดํ„ฐ ์ฃผ์†Œ ๊ธฐ๋ฐ˜ ๋ณ‘ํ•ฉ ๋ฐฉ๋ฒ•๊ณผ (3) ๋งต ์ถœ๋ ฅ ๋ฐ์ดํ„ฐ ์„  ์ „์†ก ๋ฐฉ๋ฒ•์œผ๋กœ ๊ตฌ์„ฑ๋œ๋‹ค. ์ด๋Š” ์ž„์˜ ์ฝ๊ธฐ/์“ฐ๊ธฐ ์†๋„๊ฐ€ ๋น ๋ฅธ SSD์˜ ํŠน์ง•์„ ํ™œ์šฉํ•˜์—ฌ ๋Œ€๋Ÿ‰์˜ ์ค‘๊ฐ„ ๋ฐ์ดํ„ฐ ์ „์ฒด๋ฅผ ์ •๋ ฌํ•˜๋Š” ๋Œ€์‹  ์ž‘์€ ํฌ๊ธฐ์˜ ๋ฐ์ดํ„ฐ ์ฃผ์†Œ์ •๋ณด๋งŒ์„ ์ •๋ ฌํ•˜๊ณ , ๋งต ํƒœ์Šคํฌ์—์„œ ๋ฆฌ๋“€์Šค ํƒœ์Šคํฌ๋กœ์˜ ๋ฐ์ดํ„ฐ ์ „์†ก์„ ๋งต ์ถœ๋ ฅ ํŒŒ์ผ์ด ์•„๋‹Œ ์Šคํ•„ ํŒŒ์ผ๊ณผ ์ฃผ์†Œ์ •๋ณด ํŒŒ์ผ๋กœ ํ•จ์œผ๋กœ์จ ๋„คํŠธ์›Œํฌ ์ „์†ก ์‹œ์ž‘์„ ์•ž๋‹น๊ธธ ์ˆ˜ ์žˆ๋Š” ๋ฉ”์ปค๋‹ˆ์ฆ˜์ด๋‹ค. ์ด๋ฅผ ํ™œ์šฉํ•˜์—ฌ (1) ๋กœ์ปฌ ์ €์žฅ์žฅ์น˜์— ๋Œ€ํ•œ ์ฝ๊ธฐ/์“ฐ๊ธฐ ํšŸ์ˆ˜์™€ ๋ฐ์ดํ„ฐ ์–‘์„ ์ค„์ด๊ณ , (2) ๋„คํŠธ์›Œํฌ ์ „์†ก์„ ์œ„ํ•œ ์ง€์—ฐ ์‹œ๊ฐ„์„ ์ค„์—ฌ ํ•˜๋‘ก ๋งต๋ฆฌ๋“€์Šค ์…”ํ”Œ ๋‹จ๊ณ„์˜ ์ˆ˜ํ–‰์‹œ๊ฐ„์„ ๋‹จ์ถ•ํ•˜์˜€๋‹ค. ๋ฐ์ดํ„ฐ ์ฃผ์†Œ ๊ธฐ๋ฐ˜์˜ ์…”ํ”Œ ๋ฉ”์ปค๋‹ˆ์ฆ˜์„ ํ•˜๋‘ก 1.2.1์— ๊ตฌํ˜„ํ•˜๊ณ  ์‹คํ—˜ํ•˜์˜€๋‹ค. ์‹คํ—˜๊ฒฐ๊ณผ ๋ฐ์ดํ„ฐ ์ฃผ์†Œ ๊ธฐ๋ฐ˜์˜ ์…”ํ”Œ ๋ฉ”์ปค๋‹ˆ์ฆ˜์€ Terasort ๋ฒค์น˜๋งˆํฌ์™€ Wordcount ๋ฒค์น˜๋งˆํฌ์˜ ํ‰๊ท  ์‹คํ–‰์‹œ๊ฐ„์ด ๊ฐ๊ฐ 8%์™€ 1% ๊ฐ์†Œ์‹œํ‚ด์„ ๋ณด์˜€๋‹ค.์ดˆ ๋ก i ๋ชฉ ์ฐจ iii ํ‘œ ๋ชฉ์ฐจ iv ๊ทธ๋ฆผ ๋ชฉ์ฐจ v ์ œ 1 ์žฅ ์„œ ๋ก  1 ์ œ 2 ์žฅ ๊ด€๋ จ ์—ฐ๊ตฌ 5 2.1 ํ•˜๋‘ก ๋งต๋ฆฌ๋“€์Šค ์„ฑ๋Šฅ ๊ฐœ์„  ์—ฐ๊ตฌ 5 2.2 SSD ๊ธฐ๋ฐ˜ ํ•˜๋‘ก ์‹œ์Šคํ…œ ์—ฐ๊ตฌ 6 ์ œ 3 ์žฅ ๋ฐฐ ๊ฒฝ 9 3.1 ๋งต๋ฆฌ๋“€์Šค ํ”„๋กœ๊ทธ๋ž˜๋ฐ ๋ชจ๋ธ 9 3.2 ํ•˜๋‘ก ๋งต๋ฆฌ๋“€์Šค 11 3.3 SSD (Solid State Drive) ํŠน์„ฑ 13 ์ œ 4 ์žฅ ์‹œ์Šคํ…œ ๋ชจ๋ธ 15 4.1 SSD ๊ธฐ๋ฐ˜์˜ ํ•˜๋‘ก ์‹œ์Šคํ…œ 15 4.2 ํ•˜๋‘ก ๋งต๋ฆฌ๋“€์Šค์˜ ์…”ํ”Œ ๋‹จ๊ณ„ 16 ์ œ 5 ์žฅ ๋ฌธ์ œ ์ •์˜ 19 5.1 ๋™์ผ ๋ฐ์ดํ„ฐ์˜ ์ค‘๋ณต ์ฝ๊ธฐ/์“ฐ๊ธฐ ๋ฌธ์ œ 19 5.2 ๋„คํŠธ์›Œํฌ ์ „์†ก์˜ ์ง€์—ฐ ๋ฌธ์ œ 20 ์ œ 6 ์žฅ ๋ฐ์ดํ„ฐ ์ฃผ์†Œ ๊ธฐ๋ฐ˜ ์…”ํ”Œ ๋ฉ”์ปค๋‹ˆ์ฆ˜ 22 6.1 ๋ฐ์ดํ„ฐ ์ฃผ์†Œ ๊ธฐ๋ฐ˜ ์ •๋ ฌ 22 6.2 ๋ฐ์ดํ„ฐ ์ฃผ์†Œ ๊ธฐ๋ฐ˜ ๋ณ‘ํ•ฉ 23 6.3 ๋งต ์ถœ๋ ฅ ๋ฐ์ดํ„ฐ ์„  ์ „์†ก 26 ์ œ 7 ์žฅ ์‹คํ—˜ ๋ฐ ํ‰๊ฐ€ 28 7.1 ์‹คํ—˜ ํ™˜๊ฒฝ 28 7.2 ์‹คํ—˜ ๊ฒฐ๊ณผ ๋ฐ ํ‰๊ฐ€ 30 ์ œ 8 ์žฅ ๊ฒฐ ๋ก  35 ์ฐธ๊ณ  ๋ฌธํ—Œ 37 Abstract 40Maste

    A storage architecture for data-intensive computing

    The assimilation of computing into our daily lives is enabling the generation of data at unprecedented rates. In 2008, IDC estimated that the "digital universe" contained 486 exabytes of data [9]. The computing industry is being challenged to develop methods for the cost-effective processing of data at these large scales. The MapReduce programming model has emerged as a scalable way to perform data-intensive computations on commodity cluster computers. Hadoop is a popular open-source implementation of MapReduce. To manage storage resources across the cluster, Hadoop uses a distributed user-level filesystem. This filesystem, HDFS, is written in Java and designed for portability across heterogeneous hardware and software platforms. The efficiency of a Hadoop cluster depends heavily on the performance of this underlying storage system. This thesis is the first to analyze the interactions between Hadoop and storage. It describes how the user-level Hadoop filesystem, instead of efficiently capturing the full performance potential of the underlying cluster hardware, actually degrades application performance significantly. Architectural bottlenecks in the Hadoop implementation result in inefficient HDFS usage due to delays in scheduling new MapReduce tasks. Further, HDFS implicitly makes assumptions about how the underlying native platform manages storage resources, even though native filesystems and I/O schedulers vary widely in design and behavior. Methods to eliminate these bottlenecks in HDFS are proposed and evaluated both in terms of their application performance improvement and their impact on the portability of the Hadoop framework. In addition to improving the performance and efficiency of the Hadoop storage system, this thesis also focuses on improving its flexibility. The goal is to allow Hadoop to coexist in cluster computers shared with a variety of other applications through the use of virtualization technology. The introduction of virtualization breaks the traditional Hadoop storage architecture, where persistent HDFS data is stored on local disks installed directly in the computation nodes. To overcome this challenge, a new flexible network-based storage architecture is proposed, along with changes to the HDFS framework. Network-based storage enables Hadoop to operate efficiently in a dynamic virtualized environment and furthers the spread of the MapReduce parallel programming model to new applications.
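
    Since several of the works in this listing build directly on the MapReduce model, a minimal single-process sketch may help fix terms: map emits (key, value) pairs, the shuffle groups them by key, and reduce folds each group. Hadoop distributes exactly these steps across a cluster, with HDFS as the storage layer underneath. The names below are illustrative of the programming model, not Hadoop's API.

```python
# Minimal single-process sketch of the MapReduce programming model.
from collections import defaultdict

def map_reduce(inputs, mapper, reducer):
    groups = defaultdict(list)
    for record in inputs:               # map phase
        for key, value in mapper(record):
            groups[key].append(value)   # shuffle: group values by key
    return {key: reducer(key, values)   # reduce phase
            for key, values in groups.items()}

# Word count, the canonical example:
counts = map_reduce(
    ["data intensive computing", "data storage"],
    mapper=lambda line: [(w, 1) for w in line.split()],
    reducer=lambda key, values: sum(values),
)
print(counts)  # {'data': 2, 'intensive': 1, 'computing': 1, 'storage': 1}
```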

    Cooperative caching for object storage

    Data is increasingly stored in data lakes, vast immutable object stores that can be accessed from anywhere in the data center. By providing low-cost, scalable storage, immutable object-storage-based data lakes today serve a wide range of applications with diverse access patterns. Unfortunately, performance can suffer for applications that do not match the access patterns for which the data lake was designed. Moreover, in many of today's (non-hyperscale) data centers, limited bisection bandwidth constrains data lake performance. Many computer clusters therefore integrate caches, both to address the mismatch between application performance requirements and the capabilities of the shared data lake, and to reduce the demand on the data center network. However, per-cluster caching (i) means the expensive cache resources cannot be shifted between clusters based on demand, (ii) makes sharing expensive because data accessed by multiple clusters is independently cached by each of them, and (iii) makes it difficult for clusters to grow and shrink if their servers are being used to cache storage. In this dissertation, we present two novel datacenter-wide cooperative cache architectures, Datacenter-Data-Delivery Network (D3N) and Directory-Based Datacenter-Data-Delivery Network (D4N), that are designed to be part of the data lake itself rather than part of the compute clusters that use it. D3N and D4N distribute caches across the data center to enable data sharing and elasticity of cache resources; requests are transparently directed to nearby cache nodes. They dynamically adapt to changes in access patterns and accelerate workloads while providing the same consistency, trust, availability, and resilience guarantees as the underlying data lake. We find that exploiting the immutability of object stores significantly reduces complexity and opens up cache management strategies that were not feasible in previous cooperative cache systems for file- or block-based storage. D3N is a multi-layer cooperative cache that targets workloads with large read-only datasets, such as big data analytics. It is designed to be easily integrated into existing data lakes, with only limited support for write caching of intermediate data, and it avoids any global state by, for example, using consistent hashing for locating blocks and making all caching decisions purely on local information. Our prototype is performant enough to fully exploit the SSDs (5 GB/s read) and NICs (40 Gbit/s) in our system and improves the runtime of realistic workloads by up to 3x. The simplicity of D3N has enabled us, in collaboration with industry partners, to upstream the two-layer version of D3N into the existing code base of the Ceph object store as a new experimental feature, making it available to the many Ceph-based data lakes around the world. D4N is a directory-based cooperative cache that provides a reliable write tier and a distributed directory that maintains global state. It explores the use of global state to implement more sophisticated cache management policies, and it enables application-specific tuning of caching policies to support a wider range of applications than D3N. In contrast to previous cache systems that implement their own mechanisms for maintaining dirty data redundantly, D4N reuses the existing data lake (Ceph) software to implement its write tier and exploits the semantics of immutable objects to move aged objects to the shared data lake. This design greatly reduces the barrier to adoption and enables D4N to take advantage of sophisticated data lake features such as erasure coding. We demonstrate that D4N is performant enough to saturate the bandwidth of the SSDs, automatically adapts replication to the demands of the working set, and outperforms the state-of-the-art cluster cache Alluxio. While it will be substantially more complicated to integrate the D4N prototype into production-quality code that can be adopted by the community, these results are compelling enough that our partners are starting that effort. D3N and D4N demonstrate that cooperative caching techniques, originally designed for file systems, can be employed to integrate caching into today's immutable object-based data lakes. We find that the properties of immutable object storage greatly simplify the adoption of these techniques and enable integration of caching in a fashion that allows reuse of existing battle-tested software, greatly reducing the barrier to adoption. By integrating caching in the data lake rather than the compute cluster, this research opens the door to efficient datacenter-wide sharing of data and resources.
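
    A concrete way to see how D3N locates blocks using purely local information is consistent hashing: every client and cache node computes the same owner for a block from the block's name alone, with no directory. The sketch below is a generic hash ring with virtual nodes, with assumed names throughout; D3N's real lookup lives inside Ceph's RADOS Gateway.

```python
# Minimal consistent-hash ring of the kind D3N uses to locate cache
# blocks without a global directory. Illustrative only.
import bisect
import hashlib

class HashRing:
    def __init__(self, nodes, vnodes=64):
        # Place `vnodes` virtual points per node on the ring so load
        # spreads evenly across cache servers.
        self._ring = sorted(
            (self._hash(f"{n}#{i}"), n)
            for n in nodes for i in range(vnodes))
        self._keys = [h for h, _ in self._ring]

    @staticmethod
    def _hash(s):
        return int(hashlib.md5(s.encode()).hexdigest(), 16)

    def owner(self, block_id):
        """Cache node responsible for this block: first ring point at
        or after the block's hash, wrapping around."""
        i = bisect.bisect(self._keys, self._hash(block_id)) % len(self._keys)
        return self._ring[i][1]

ring = HashRing(["cache-a", "cache-b", "cache-c"])
print(ring.owner("bucket/object-0001:block-17"))
```

    When a cache node joins or leaves such a ring, only about 1/N of the blocks change owner, a standard property of consistent hashing that fits the cache-elasticity goals described above.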

    A Survey on Automatic Parameter Tuning for Big Data Processing Systems

    Big data processing systems (e.g., Hadoop, Spark, Storm) expose a vast number of configuration parameters controlling parallelism, I/O behavior, memory settings, and compression. Improper parameter settings can cause significant performance degradation and stability issues, yet regular users and even expert administrators struggle to understand and tune them for good performance. We investigate existing approaches to parameter tuning for both batch and stream data processing systems and classify them into six categories: rule-based, cost modeling, simulation-based, experiment-driven, machine learning, and adaptive tuning. We summarize the pros and cons of each approach and raise open research problems for automatic parameter tuning. Peer reviewed.
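
    Of the six categories, experiment-driven tuning is the easiest to make concrete: sample a configuration, run the job, keep the best. The sketch below does random search over a few real Spark parameter names; run_benchmark is a hypothetical stand-in with a synthetic cost, included only so the sketch runs end to end, and would be replaced by an actual timed job submission.

```python
# Sketch of the "experiment-driven" tuning category: sample configs,
# measure a run for each, keep the fastest.
import random

SEARCH_SPACE = {
    "spark.executor.memory":        ["2g", "4g", "8g"],
    "spark.executor.cores":         [1, 2, 4],
    "spark.sql.shuffle.partitions": [64, 128, 256, 512],
    "spark.io.compression.codec":   ["lz4", "snappy", "zstd"],
}

def run_benchmark(config):
    """Hypothetical stand-in: replace with a real timed job submission.
    The synthetic cost below exists only to make the sketch runnable."""
    mem = int(config["spark.executor.memory"].rstrip("g"))
    parts = config["spark.sql.shuffle.partitions"]
    return 600 / mem + abs(parts - 256) * 0.1   # fake seconds

def random_search(trials=20, seed=0):
    rng = random.Random(seed)
    best_cfg, best_time = None, float("inf")
    for _ in range(trials):
        cfg = {k: rng.choice(v) for k, v in SEARCH_SPACE.items()}
        t = run_benchmark(cfg)          # one measured run per sample
        if t < best_time:
            best_cfg, best_time = cfg, t
    return best_cfg, best_time

print(random_search())
```

    The cost-modeling and machine-learning categories in the survey replace this brute sampling with a predictor trained over the same configuration space.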

    Commodity single board computer clusters and their applications

    © 2018 Current commodity Single Board Computers (SBCs) are sufficiently powerful to run mainstream operating systems and workloads. Many such boards can be linked together to create small, low-cost clusters that replicate some features of large data center clusters. The Raspberry Pi Foundation produces a series of SBCs with a price/performance ratio that makes SBC clusters viable, perhaps even expendable. These clusters are an enabler for Edge/Fog Compute, where processing is pushed out towards data sources, reducing bandwidth requirements and decentralizing the architecture. In this paper we investigate the use cases driving the growth of SBC clusters, examine trends in future hardware developments, and discuss the potential of SBC clusters as a disruptive technology. Compared to traditional clusters, SBC clusters have a reduced footprint, low cost, and low power requirements, enabling different models of deployment, particularly outside traditional data center environments. We discuss the applicability of existing software and management infrastructure to support exotic deployment scenarios and anticipate the next generation of SBCs. We conclude that the SBC cluster is a new and distinct computational deployment paradigm, applicable to a wider range of scenarios than current clusters. It facilitates Internet of Things and Smart City systems and is potentially a game changer in pushing application logic out towards the network edge.

    Composable architecture for rack scale big data computing

    The rapid growth of cloud computing, both in the spectrum and the volume of cloud workloads, necessitates revisiting the traditional datacenter design based on rack-mountable servers. Next-generation datacenters need to offer enhanced support for: (i) fast-changing system configuration requirements driven by workload constraints, (ii) timely adoption of emerging hardware technologies, and (iii) maximal sharing of systems and subsystems in order to lower costs. Disaggregated datacenters, constructed as collections of individual resources such as CPU, memory, and disks, and composed into workload execution units on demand, are an interesting new trend that can address these challenges. In this paper, we demonstrate the feasibility of composable systems by building a rack-scale composable system prototype using a PCIe switch. Through empirical approaches, we assess the opportunities and challenges of leveraging the composable architecture for rack-scale cloud datacenters, with a focus on big data and NoSQL workloads. In particular, we compare and contrast the programming models that can be used to access the composable resources, and develop the implications for network and resource provisioning and management at rack scale.
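
    The composition step itself can be pictured as checking resources out of shared rack pools. The toy model below allocates CPU, memory, and SSD units into a "workload execution unit"; in a prototype like the paper's, each allocation would correspond to reprogramming the PCIe switch to attach devices to a host. All names and sizes here are illustrative, not the paper's system.

```python
# Toy model of composing workload execution units from disaggregated
# rack resource pools. Illustrative only.
from dataclasses import dataclass

@dataclass
class Pool:
    cpus: int
    memory_gb: int
    ssds: int

@dataclass
class ExecutionUnit:
    name: str
    cpus: int
    memory_gb: int
    ssds: int

def compose(pool, unit):
    """Reserve pool resources for a unit, or fail if the rack is full."""
    if (pool.cpus < unit.cpus or pool.memory_gb < unit.memory_gb
            or pool.ssds < unit.ssds):
        raise RuntimeError(f"rack pool cannot satisfy {unit.name}")
    pool.cpus -= unit.cpus            # in hardware: reprogram the PCIe
    pool.memory_gb -= unit.memory_gb  # switch to attach these devices
    pool.ssds -= unit.ssds            # to the unit's host
    return unit

rack = Pool(cpus=96, memory_gb=1024, ssds=16)
nosql = compose(rack, ExecutionUnit("cassandra-node", 16, 128, 4))
analytics = compose(rack, ExecutionUnit("spark-worker", 32, 256, 2))
print(rack)   # remaining capacity available for the next unit
```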