
    Dependable Computing on Inexact Hardware through Anomaly Detection

    Reliability of transistors is on the decline as transistors continue to shrink in size, and aggressive voltage scaling is making the problem even worse. Scaled-down transistors are more susceptible to transient faults as well as to permanent in-field hardware failures. In order to continue reaping the benefits of technology scaling, it has become imperative to tackle the challenges arising from the decreasing reliability of devices for the mainstream commodity market. Alongside worsening reliability, scaling is yielding increasingly diminishing marginal returns in energy efficiency and performance. More than at any other time in its history, the semiconductor industry faces the twin pressures of unreliability and the need to improve energy efficiency. These challenges can be tackled by dividing target applications into two categories: traditional applications, which have relatively strict correctness requirements on their outputs, and an emerging class of soft applications, from domains such as multimedia, machine learning, and computer vision, that are inherently tolerant of a certain degree of inaccuracy. Traditional applications can be protected against hardware failures by low-cost detection and protection methods, while soft applications can trade off output quality to achieve better performance or energy efficiency. For traditional applications, I propose an efficient, software-only application analysis and transformation solution to detect data-flow and control-flow transient faults. The intelligence of the data-flow solution lies in its use of dynamic application information such as control-flow, memory, and value profiling. The control-flow protection technique achieves its efficiency by simplifying signature calculations in each basic block and by performing checking at a coarse-grain level. For soft applications, I develop a quality control technique that employs continuous, lightweight checkers to ensure that the approximation is controlled and the application output is acceptable. Overall, I show that the use of low-cost checkers to produce dependable results on commodity systems constructed from inexact hardware components is efficient and practical.
    PhD, Computer Science and Engineering, University of Michigan, Horace H. Rackham School of Graduate Studies
    http://deepblue.lib.umich.edu/bitstream/2027.42/113341/1/dskhudia_1.pd
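    As a rough illustration of the coarse-grain control-flow protection idea, the C sketch below maintains a runtime signature that each basic block folds a constant into, with a single check at a region boundary. The signature constants, macro names, and checking granularity are illustrative assumptions, not the thesis's actual compiler transformation.

        /* Minimal sketch of signature-based control-flow checking: each basic
         * block folds a compile-time constant into a runtime signature, and a
         * single check at a region boundary detects illegal control flow.
         * Signature constants and granularity here are illustrative. */
        #include <stdio.h>
        #include <stdlib.h>

        static unsigned sig;                       /* runtime signature */

        #define ENTER_BLOCK(id)  (sig ^= (id))     /* updated in every block */

        /* Coarse-grain check: verify once per region (e.g., at a loop exit or
         * function return) instead of after every basic block. */
        static void check_region(unsigned expected)
        {
            if (sig != expected) {
                fprintf(stderr, "control-flow fault detected\n");
                abort();                           /* or trigger recovery */
            }
        }

        int main(void)
        {
            sig = 0;
            ENTER_BLOCK(0x1);                      /* basic block A */
            ENTER_BLOCK(0x2);                      /* basic block B */
            check_region(0x1 ^ 0x2);               /* one check for the region */
            return 0;
        }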

    Cache-conscious Splitting of MapReduce Tasks and its Application to Stencil Computations

    Modern cluster systems are typically composed of nodes with multiple processing units and memory hierarchies comprising multiple cache levels of various sizes. To leverage the full potential of these architectures it is necessary to explore concepts such as parallel programming and the layout of data in the memory hierarchy. However, the inherent complexity of these concepts and the heterogeneity of the target architectures raise several challenges at the application development and performance portability levels, respectively. As concerns parallel programming, several models and frameworks are available, of which MapReduce [16] is one of the most popular. It was developed at Google [16] for the parallel and distributed processing of large amounts of data on large clusters of commodity machines. Although they are very powerful tools, the reference MapReduce frameworks, such as Hadoop and Spark, do not leverage the characteristics of the underlying memory hierarchy. This shortcoming is particularly noticeable in computations that benefit from temporal locality, such as stencil computations. In this context, the goal of this thesis is to improve the performance of MapReduce computations that benefit from temporal locality. To that end we optimize the mapping of MapReduce computations onto a machine's cache memory hierarchy by applying cache-aware tiling techniques. We prototyped our solution on top of the Hadoop MapReduce framework, incorporating cache-awareness into the splitting stage. To validate our solution and assess its benefits, we developed an API for expressing stencil computations on top of the developed framework. The experimental results show that, for a typical stencil computation, our solution delivers an average speed-up of 1.77 while reaching a peak speed-up of 3.2. These findings allow us to conclude that cache-aware decomposition considerably boosts the execution of this class of MapReduce computations.
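    The core of the cache-aware tiling idea can be sketched independently of MapReduce: a stencil sweep is blocked into tiles sized to stay resident in cache, so that neighboring points are reused before eviction. The C sketch below uses a 5-point stencil and a fixed tile size purely as illustrative assumptions; the thesis derives split sizes from the actual cache hierarchy.

        /* Sketch of cache-aware tiling for a 2D 5-point stencil. The tile
         * size would normally be derived from cache parameters (e.g., so a
         * few tile rows fit in L1); the constant here is illustrative. */
        #include <stddef.h>

        #define N    4096
        #define TILE 64

        void stencil_tiled(const double (*in)[N], double (*out)[N])
        {
            for (size_t ii = 1; ii + 1 < N; ii += TILE)
                for (size_t jj = 1; jj + 1 < N; jj += TILE)
                    /* process one cache-resident tile at a time */
                    for (size_t i = ii; i < ii + TILE && i + 1 < N; i++)
                        for (size_t j = jj; j < jj + TILE && j + 1 < N; j++)
                            out[i][j] = 0.2 * (in[i][j] + in[i-1][j]
                                             + in[i+1][j] + in[i][j-1]
                                             + in[i][j+1]);
        }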

    Accelerators for Data Processing

    The explosive growth in digital data and its growing role in real-time analytics motivate the design of high-performance database management systems (DBMSs). Meanwhile, the slowdown in supply voltage scaling has stymied improvements in core performance and ushered in an era of power-limited chips. These developments motivate the design of software and hardware DBMS accelerators that (1) maximize utility by accelerating the dominant operations, and (2) provide flexibility in the choice of DBMS, data layout, and data types. In this thesis, we identify pointer-intensive data structure operations as a key performance and efficiency bottleneck in data analytics workloads. We observe that data analytics tasks include a large number of independent data structure lookups, each of which is characterized by dependent long-latency memory accesses due to pointer chasing. Unfortunately, exploiting such inter-lookup parallelism to overlap memory accesses from different lookups is not possible within the limited instruction window of modern out-of-order cores. Similarly, software prefetching techniques attempt to exploit inter-lookup parallelism by statically staging independent lookups, and hence break down in the face of irregularity across lookup stages. Based on these observations, we provide a dynamic software acceleration scheme for exploiting inter-lookup parallelism to hide memory access latency despite the irregularities across lookups. Furthermore, we propose a programmable hardware accelerator to maximize the efficiency of the data structure lookups. As a result, through flexible hardware and software techniques we eliminate a key efficiency and performance bottleneck in data analytics operations.
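    To make the inter-lookup parallelism concrete, the C sketch below keeps a small group of independent linked-list lookups in flight, advancing each chain one node per round and prefetching the next node so that cache misses from different lookups overlap. The data structure, group size, and round-robin staging are illustrative assumptions; the thesis's dynamic scheme differs in its details.

        /* Sketch of overlapping independent pointer-chasing lookups in
         * software: a group of chains advances round-robin, and
         * __builtin_prefetch (a GCC/Clang builtin) starts the next miss of
         * each chain early. Structures and the group size are illustrative. */
        #include <stddef.h>

        struct node { long key; long value; struct node *next; };

        #define GROUP 8                           /* lookups kept in flight */

        void batched_lookup(struct node **heads, const long *keys,
                            long *results, size_t n)
        {
            struct node *cur[GROUP];

            for (size_t base = 0; base < n; base += GROUP) {
                size_t g = (n - base < GROUP) ? n - base : GROUP;

                for (size_t i = 0; i < g; i++) {
                    cur[i] = heads[base + i];
                    __builtin_prefetch(cur[i]);   /* start all misses together */
                }
                /* advance each unfinished chain one node per round so the
                 * next-node misses of different lookups overlap */
                for (size_t done = 0; done < g; ) {
                    done = 0;
                    for (size_t i = 0; i < g; i++) {
                        if (!cur[i] || cur[i]->key == keys[base + i]) {
                            done++;
                            continue;
                        }
                        cur[i] = cur[i]->next;
                        __builtin_prefetch(cur[i]);
                    }
                }
                for (size_t i = 0; i < g; i++)
                    results[base + i] = cur[i] ? cur[i]->value : -1;
            }
        }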

    Resource-efficient processing of large data volumes

    The complex system environment of data processing applications makes it very challenging to achieve high resource efficiency. In this thesis, we develop solutions that improve resource efficiency at multiple system levels, focusing on three scenarios that are relevant to, but not limited to, database management systems. First, we address the challenge of understanding complex systems by analyzing memory access characteristics via efficient memory tracing. Second, we leverage information about memory access characteristics to optimize the cache usage of algorithms and to avoid cache pollution by applying hardware-based cache partitioning. Third, after optimizing resource usage within a multicore processor, we optimize resource usage across multiple computer systems by addressing the problem of resource contention during bulk loading, i.e., ingesting large volumes of data into the system. We develop a distributed bulk loading mechanism, which utilizes network bandwidth and compute power more efficiently and improves both bulk loading throughput and query processing performance.
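    The thesis applies cache partitioning in hardware; as a complementary, purely software-level illustration of the cache-pollution problem, the C sketch below uses SSE2 non-temporal stores so that a one-pass bulk write bypasses the cache instead of evicting hot data. This is an adjacent technique offered for intuition, not the mechanism used in the thesis, and alignment handling is simplified.

        /* Illustration of avoiding cache pollution in software: a one-pass
         * bulk copy uses non-temporal (streaming) stores, which bypass the
         * cache instead of evicting hot working-set data. dst is assumed
         * 16-byte aligned, as _mm_stream_pd requires. */
        #include <immintrin.h>
        #include <stddef.h>

        void stream_copy(double *dst, const double *src, size_t n)
        {
            size_t i = 0;
            for (; i + 2 <= n; i += 2) {
                __m128d v = _mm_loadu_pd(src + i);
                _mm_stream_pd(dst + i, v);   /* write-combining store */
            }
            for (; i < n; i++)               /* scalar tail */
                dst[i] = src[i];
            _mm_sfence();                    /* order the streamed stores */
        }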

    Analysis and acceleration of data mining algorithms on high performance reconfigurable computing platforms

    With the continued development of computation and communication technologies, we are overwhelmed with electronic data. Ubiquitous data in governments, commercial enterprises, universities, and various organizations records our decisions, transactions, and thoughts, and the rate of data collection is increasing tremendously, with no end in sight. On the one hand, as the volume of data explodes, the gap between human understanding of the data and the knowledge hidden in it widens; the algorithms and techniques collectively known as data mining have emerged to bridge this gap. Data mining algorithms are usually data- and compute-intensive. On the other hand, overall computing system performance is not increasing at an equal rate. Consequently, there is a strong need to design special computing systems that accelerate data mining applications. FPGA-based High Performance Reconfigurable Computing (HPRC) systems allow the design of hardware architectures optimized for a given problem. The increased gate count, arithmetic capability, and other features of modern FPGAs now allow researchers to implement highly complicated reconfigurable computational architectures. In contrast with ASICs, FPGAs have the advantages of low power, low non-recurring engineering costs, high design flexibility, and the ability to update functionality after shipping. In this thesis, we first design architectures for data-intensive and data-compute-intensive applications respectively, and then present a general HPRC framework for data mining applications.
    Frequent Pattern Mining (FPM) is a data-compute-intensive application that finds commonly occurring itemsets in databases. We use a systolic tree architecture in FPGA hardware to mimic the internal memory layout of the FP-growth algorithm while achieving higher throughput. The experimental results demonstrate that the proposed hardware architecture is faster than the software approach.
    Sparse Matrix-Vector Multiplication (SMVM) is a data-intensive application and an important computing core in many applications. We present a scalable and efficient FPGA-based SMVM architecture which can handle arbitrary matrix sizes without preprocessing or zero padding and can be dynamically expanded based on the available I/O bandwidth. The experimental results using a commercial FPGA-based acceleration system demonstrate that our reconfigurable SMVM engine is more efficient than the existing state of the art, with speedups over a highly optimized software implementation of 2.5X to 6.5X, depending on the sparsity of the input benchmark.
    Accelerating text classification using SMVM is performed on the Convey HC-1 HPRC platform. The SMVM engines are deployed onto multiple FPGA chips, and text documents are represented as large sparse matrices using the Vector Space Model (VSM). The k-nearest-neighbor algorithm uses SMVM to perform classification simultaneously on multiple FPGAs. Our experiments show that classification on the Convey HC-1 is several times faster than on a traditional computing architecture.
    The MapReduce Reconfigurable Framework for data mining applications is a pipelined, high-performance framework for FPGA design based on the MapReduce model. Our goal is to lessen the FPGA programmer's burden while minimizing performance degradation: the designer need only focus on the design of the mapper and reducer modules. We redesigned the SMVM architecture using the MapReduce framework; the manually written VHDL code is only 15 percent of that used in the customized architecture.
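    For reference, the computation these SMVM engines implement is the standard sparse matrix-vector product, sketched below in C over the common compressed sparse row (CSR) layout. The CSR field names are a widespread convention and not necessarily the exact format streamed by the hardware.

        /* Reference software form of sparse matrix-vector multiplication
         * (SMVM) over the CSR layout; the FPGA engines above realize this
         * same computation as a streaming pipeline. */
        #include <stddef.h>

        struct csr {
            size_t rows;
            const size_t *row_ptr;   /* rows + 1 offsets into col/val */
            const size_t *col;       /* column index of each nonzero */
            const double *val;       /* value of each nonzero */
        };

        void smvm(const struct csr *a, const double *x, double *y)
        {
            for (size_t i = 0; i < a->rows; i++) {
                double acc = 0.0;
                for (size_t k = a->row_ptr[i]; k < a->row_ptr[i + 1]; k++)
                    acc += a->val[k] * x[a->col[k]];   /* multiply-accumulate */
                y[i] = acc;
            }
        }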