The Librarian & the Big Data: Bridging the Gap
Arcot Rajasekar, PhD, is Professor in the School of Information and Library Science, Chief Scientist for the Renaissance Computing Institute, and Co-Director of the Data Intensive Cyber Environments Center, all at the University of North Carolina at Chapel Hill. He spoke on how the library and information science community can meet the challenges of the scientific data explosion.
Communication centric platforms for future high data intensive applications
The notion of platform-based design is considered a viable solution for boosting design productivity by favouring a design-reuse methodology. With the scaling down of device feature sizes and the scaling up of design complexity, throughput limitations, signal integrity and signal latency are becoming bottlenecks in future communication-centric System-on-Chip (SoC) design. This has given birth to communication-centric platform-based designs.
The development of heterogeneous multi-core architectures requires the on-chip communication medium, traditionally tailored to a specific application domain, to deal with multi-domain traffic patterns. This makes current application-specific communication-centric platforms unsuitable for future SoC architectures.
The work presented in this thesis explores current communication media to establish the expectations of future on-chip interconnects. A novel communication-centric platform-based design flow is proposed, which consists of four communication-centric platforms based on a shared global bus, a hierarchical bus, crossbars and a novel hybrid communication medium. Developed with a smart platform controller, the platforms support the Open Core Protocol (OCP) socket standard, allowing cores to be integrated in a plug-and-play fashion without the need to reprogram the pre-verified platforms. This drastically reduces the design time of SoC architectures. Each communication-centric platform has different throughput, area and power characteristics; thus, depending on the design constraints, processing cores can be integrated into the most appropriate communication platform to realise the desired SoC architecture.
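The constraint-driven platform choice described above can be viewed as a simple selection step. The sketch below illustrates the idea; the platform names follow the thesis, but the throughput, area and power figures are invented for illustration and are not results from the work.

```python
# Hypothetical platform characteristics; the numbers are illustrative only.
PLATFORMS = [
    {"name": "shared_bus",       "throughput_gbps": 2,  "area": 1.0, "power": 1.0},
    {"name": "hierarchical_bus", "throughput_gbps": 4,  "area": 1.8, "power": 1.5},
    {"name": "hybrid",           "throughput_gbps": 8,  "area": 2.5, "power": 2.0},
    {"name": "crossbar",         "throughput_gbps": 12, "area": 4.0, "power": 2.6},
]

def select_platform(required_gbps, max_area=None):
    """Pick the smallest-area platform that meets the throughput constraint."""
    feasible = [p for p in PLATFORMS
                if p["throughput_gbps"] >= required_gbps
                and (max_area is None or p["area"] <= max_area)]
    return min(feasible, key=lambda p: p["area"]) if feasible else None
```

For example, a core requiring 5 Gb/s would be mapped to the hybrid platform under these made-up figures, since it is the cheapest feasible option.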
A novel hybrid communication medium is also developed in this thesis, which combines the advantages of two different types of communication media in a single SoC architecture. The hybrid communication medium consists of a crossbar matrix and a shared-bus medium. Simulation results and the implementation of a WiMAX receiver as a real-life example show a 65% increase in data throughput over a shared-bus-based communication medium, and a 13% decrease in area and an 11% decrease in power compared with a crossbar-based communication medium.
In order to automate the generation of SoC architectures with optimised communication architectures, a tool called SOCCAD (SoC Communication Architecture Development) is developed. Components needed for the realisation of a given application can be selected from the tool's in-built library. Offering an optimised communication-centric placement, the tool generates complete SystemC code for the system with different interconnect architectures, along with its power and area characteristics. The generated SystemC code can be used for quick simulation and, coupled with efficient test benches, for quick verification.
Network-on-Chip (NoC) is considered a solution to the communication bottleneck in future SoC architectures with data throughput requirements of over 10 GB/s. It aims to provide low power, efficient link utilisation, reduced data contention and reduced silicon area. Current on-chip networks, developed with fixed architectural parameters, do not utilise the available resources efficiently. To increase this efficiency, a novel dynamically reconfigurable NoC (drNoC) is developed in this thesis. The proposed drNoC reconfigures itself in terms of switching, routing and packet size as the communication requirements of the system change at run time, thus utilising the maximum available channel bandwidth. To increase the applicability of drNoC, the network interface is designed to support the OCP socket standard. This makes drNoC a highly reusable communication framework, qualifying it as a communication-centric platform for data-intensive SoC architectures. Simulation results show a 32% increase in data throughput and a 22-35% decrease in network delay compared with a traditional NoC with fixed parameters.
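The run-time adaptation idea behind such a reconfigurable NoC can be illustrated with a toy policy that maps observed traffic characteristics to network parameters. The parameter names and thresholds below are invented for illustration; they are not the actual drNoC reconfiguration algorithm.

```python
def choose_noc_config(offered_load, burstiness):
    """Toy run-time policy: adapt switching mode and packet size to traffic.

    offered_load and burstiness are fractions in [0, 1]; the thresholds
    are purely illustrative.
    """
    if burstiness > 0.5:
        # Short, latency-sensitive bursts: small packets, wormhole switching.
        return {"switching": "wormhole", "packet_flits": 4}
    if offered_load > 0.7:
        # Sustained streaming traffic: large packets amortise header overhead.
        return {"switching": "wormhole", "packet_flits": 32}
    # Light, irregular traffic: moderate packets, store-and-forward switching.
    return {"switching": "store_and_forward", "packet_flits": 8}
```

A real reconfigurable NoC would evaluate such a policy in hardware and rewire its routers accordingly; the point here is only that the mapping from traffic to configuration is a small, cheap decision.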
Building Reliable Software for Persistent Memory
Persistent memory (PMEM) technologies preserve data across power cycles and provide performance comparable to DRAM. In emerging computer systems, PMEM will operate on the main memory bus, becoming byte-addressable and cache-coherent. One key feature enabled by persistent memory is allowing software to directly access durable data using the CPU's load/store instructions, even from user space.
However, building reliable software for persistent memory faces new challenges on two fronts: crash consistency and fault tolerance. Maintaining crash consistency requires the ability to recover data integrity in the event of system crashes. Using load/store instructions to access durable data introduces a new programming paradigm that is prone to new types of programming errors. Fault tolerance involves detecting and recovering from persistent memory errors, including memory media errors and scribbles from software bugs. With direct access, file systems and user-space applications have to explicitly manage these errors instead of relying on convenient functions from lower I/O stacks.
We identify unique challenges in improving reliability for PMEM-based software and propose solutions. The thesis first introduces NOVA-Fortis, a fault-tolerant PMEM file system incorporating replication, checksums, and parity to protect the file system's metadata and the user's file data. NOVA-Fortis is both fast and resilient in the face of corruption due to media errors and software bugs.
NOVA-Fortis only protects file data accessed via the read() and write() system calls. When an application memory-maps a PMEM file, NOVA-Fortis has to disable file data protection because mmap() leaves the file system unaware of updates made to the file. For protecting memory-mapped PMEM data, we present Pangolin, a fault-tolerant persistent object library that protects an application's objects from persistent memory errors.
Writing programs that ensure crash consistency in PMEM remains challenging.
Recovery bugs arise as a new type of programming error, preventing a post-crash PMEM file from recovering to a consistent state. Thus, we design two debugging tools for persistent memory programming: PmemConjurer and PmemSanitizer. PmemConjurer is a static analyzer that uses symbolic execution to find recovery bugs without running a compiled program. PmemSanitizer combines compiler instrumentation with run-time recovery bug analysis, complementing PmemConjurer with multi-threading support and store-reordering tests.
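The checksum-on-read idea behind a fault-tolerant object library such as Pangolin can be sketched in a few lines. This is a toy model in ordinary memory: the class and method names are invented, and the real library additionally maintains parity and operates on actual PMEM with atomic, crash-consistent updates.

```python
import zlib

class ProtectedObject:
    """Toy model of a checksummed object: scribbles surface on read."""

    def __init__(self, payload: bytes):
        self.payload = bytearray(payload)
        self.checksum = zlib.crc32(self.payload)

    def read(self) -> bytes:
        # Verify on every read so media errors and stray writes are caught
        # before corrupt data propagates to the application.
        if zlib.crc32(self.payload) != self.checksum:
            raise RuntimeError("corruption detected: checksum mismatch")
        return bytes(self.payload)

    def update(self, offset: int, data: bytes) -> None:
        # Update payload and checksum together; on real PMEM this pair
        # must be made crash-consistent (e.g. via logging).
        self.payload[offset:offset + len(data)] = data
        self.checksum = zlib.crc32(self.payload)
```

With parity added across objects, as in Pangolin, a detected corruption can also be repaired rather than merely reported.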
Development Life Cycle
Security in the Software Life Cycle. This article emphasizes how developers need to make additional, significant improvements to their processes, adding structure and repeatability to further the security and quality of their software. By Joe Jarzombek and Karen Mercedes Goertzel.
Dynamic partial reconfiguration management for high performance and reliability in FPGAs
Modern Field-Programmable Gate Arrays (FPGAs) are no longer used to implement only small "glue logic" circuits. The high density of reconfigurable logic resources in today's FPGAs enables the implementation of large systems in a single chip. FPGAs are highly flexible devices; their functionality can be altered by simply loading a new binary file into their configuration memory. While the flexibility of FPGAs is comparable to General-Purpose Processors (GPPs), in the sense that different functions can be performed using the same hardware, the performance gain that can be achieved using FPGAs can be orders of magnitude higher, as FPGAs offer the ability to customise parallel computational architectures.
Dynamic Partial Reconfiguration (DPR) allows the functionality of certain blocks on the chip to be changed while the rest of the FPGA remains operational. DPR has sparked the interest of researchers in exploring new computational platforms where computational tasks are off-loaded from a main CPU to be executed by dedicated reconfigurable hardware accelerators configured on demand at run-time. By having a battery of custom accelerators which can be swapped in and out of the FPGA at run-time, a higher computational density can be achieved compared to static systems where the accelerators are bound to fixed locations within the chip. Furthermore, the ability to relocate these accelerators across several locations on the chip allows for the implementation of adaptive systems which can mitigate emerging faults in the FPGA chip when operating in harsh environments. By porting the appropriate fault mitigation techniques into such computational platforms, the advantages of FPGAs can be harnessed in applications in space and military electronics, where FPGAs are usually seen as unreliable devices due to their sensitivity to radiation and extreme environmental conditions.
In light of the above, this thesis investigates the deployment of DPR as: 1) a method for enhancing performance through efficient exploitation of the FPGA resources, and 2) a method for enhancing the reliability of systems intended to operate in harsh environments. Achieving optimal performance in such systems requires an efficient internal configuration management system to manage the reconfiguration and execution of the reconfigurable modules in the FPGA. In addition, the system needs to support "fault-resilience" features by integrating parameterisable fault detection and recovery capabilities to meet the reliability standards of fault-tolerant applications. This thesis addresses all the design and implementation aspects of an Internal Configuration Manager (ICM) which supports a novel bitstream relocation model to enable the placement of relocatable accelerators across several locations on the FPGA chip. In addition to supporting all the configuration capabilities required to implement a Reconfigurable Operating System (ROS), the proposed ICM also supports a novel multiple-clone configuration technique which allows several instances of the same hardware accelerator to be cloned at the same time, resulting in much shorter configuration times compared to traditional configuration techniques. A fault-tolerant (FT) version of the proposed ICM, which supports a comprehensive fault-recovery scheme, is also introduced in this thesis. The proposed FT-ICM is designed with a much smaller area footprint than Triple Modular Redundancy (TMR) hardening techniques, while keeping a comparable level of fault-resilience.
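For contrast, the TMR hardening approach that the FT-ICM is compared against reduces, at its core, to a majority vote over three redundant replicas. A minimal, purely illustrative sketch:

```python
def tmr_vote(a, b, c):
    """Majority vote over three redundant module outputs.

    Masks a fault in any single replica; raises if all three disagree,
    since no majority can then be formed.
    """
    if a == b or a == c:
        return a
    if b == c:
        return b
    raise RuntimeError("no majority: all three replicas disagree")
```

The price of this masking is triplicated logic plus the voter itself, which is why an FT scheme with a smaller area footprint but comparable resilience is attractive.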
The capabilities of the proposed ICM system are demonstrated with two novel applications. The first application demonstrates a proof-of-concept reliable FPGA server solution used for executing encryption/decryption queries. The proposed server deploys bitstream relocation and modular redundancy to mitigate both permanent and transient faults in the device. It also deploys a novel Built-In Self-Test (BIST) diagnosis scheme, specifically designed to detect emerging permanent faults in the system at run-time. The second application is a data mining application where DPR is used to increase the computational density of a system implementing Frequent Itemset Mining (FIM).
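The FIM workload itself, independent of the hardware acceleration described above, can be stated in a few lines: find all sets of items that co-occur in at least a minimum number of transactions. This brute-force sketch enumerates candidate itemsets directly; practical miners prune the search space as in Apriori or FP-Growth.

```python
from collections import Counter
from itertools import combinations

def frequent_itemsets(transactions, min_support, max_size=3):
    """Return every itemset of up to max_size items that occurs in at
    least min_support transactions (brute-force enumeration)."""
    counts = Counter()
    for t in transactions:
        items = sorted(set(t))  # deduplicate and fix an order per transaction
        for k in range(1, max_size + 1):
            for combo in combinations(items, k):
                counts[combo] += 1
    return {itemset: c for itemset, c in counts.items() if c >= min_support}
```

The per-transaction candidate counting in the inner loops is exactly the kind of regular, data-parallel work that maps well onto reconfigurable hardware accelerators.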
Resiliency Mechanisms for In-Memory Column Stores
The key objective of database systems is to reliably manage data, while high query throughput and low query latency are core requirements. To date, database research activities have mostly concentrated on the performance side. However, due to the constant shrinking of transistor feature sizes, integrated circuits are becoming more and more unreliable, and transient hardware errors in the form of multi-bit flips are becoming more and more prominent. A recent study (2013) of a large high-performance cluster with around 8,500 nodes measured a failure rate of 40 FIT per DRAM device. For that system, this means a single- or multi-bit flip occurs every 10 hours, which is unacceptably high for enterprise and HPC scenarios. Causes include cosmic rays, heat, and electrical crosstalk, with the latter being actively exploited through the RowHammer attack. It has been shown that memory cells are more prone to bit flips than logic gates, and several surveys have found multi-bit flip events in the main memory modules of today's data centers. Due to the shift towards in-memory data management systems, where all business-related data and query intermediate results are kept solely in fast main memory, such systems are in great danger of delivering corrupt results to their users. Hardware techniques cannot be scaled to compensate for the exponentially increasing error rates. In other domains there is increasing interest in software-based solutions to this problem, but the proposed methods come with huge runtime and/or storage overheads, which are unacceptable for in-memory data management systems.
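One classic software-based detector, the AN code adopted later in this thesis, shows what such a mechanism looks like: every value n is stored as A*n for a fixed constant A, and any codeword not divisible by A signals corruption. A minimal sketch follows; the constant here is arbitrary for illustration, not one of the thesis's "golden As".

```python
A = 61  # fixed odd constant; any odd A > 1 detects every single-bit flip

def an_encode(n: int) -> int:
    """Store n in its AN-coded form A*n."""
    return n * A

def an_decode(code: int) -> int:
    """Recover n, raising if the codeword is corrupt.

    A single bit flip turns A*n into A*n +/- 2**k, which can never be
    divisible by an odd A, so the residue check exposes the corruption.
    """
    if code % A != 0:
        raise ValueError("bit flip detected: codeword not divisible by A")
    return code // A
```

The thesis accelerates decoding further by replacing the division with a multiplication by A's modular multiplicative inverse, and by vectorising the check across whole columns.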
In this thesis, we investigate how to integrate bit flip detection mechanisms into in-memory data management systems. To achieve this goal, we first build an understanding of bit flip detection techniques and select two error codes, AN codes and XOR checksums, suitable to the requirements of in-memory data management systems. The most important requirement is the effectiveness of the codes in detecting bit flips. We meet this goal through AN codes, which exhibit better and adaptable error detection capabilities compared to those found in today's hardware. The second most important goal is efficiency in terms of coding latency. We meet this by introducing fundamental performance improvements to AN codes, and by vectorizing both chosen codes' operations. We integrate bit flip detection mechanisms into the lowest storage layer and the query processing layer in such a way that the rest of the data management system and the user can stay oblivious of any error detection. This includes both base columns and pointer-heavy index structures such as the ubiquitous B-Tree. Additionally, our approach allows adaptable, on-the-fly bit flip detection during query processing, with only very little impact on query latency. AN coding allows intermediate results to be recoded with virtually no performance penalty. We support our claims by providing exhaustive runtime and throughput measurements throughout the whole thesis, and with an end-to-end evaluation using the Star Schema Benchmark. To the best of our knowledge, we are the first to present such holistic and fast bit flip detection in a large software infrastructure such as an in-memory data management system. Finally, most of the source code fragments used to obtain the results in this thesis are open source and freely available.
1 INTRODUCTION
1.1 Contributions of this Thesis
1.2 Outline
2 PROBLEM DESCRIPTION AND RELATED WORK
2.1 Reliable Data Management on Reliable Hardware
2.2 The Shift Towards Unreliable Hardware
2.3 Hardware-Based Mitigation of Bit Flips
2.4 Data Management System Requirements
2.5 Software-Based Techniques For Handling Bit Flips
2.5.1 Operating System-Level Techniques
2.5.2 Compiler-Level Techniques
2.5.3 Application-Level Techniques
2.6 Summary and Conclusions
3 ANALYSIS OF CODING TECHNIQUES
3.1 Selection of Error Codes
3.1.1 Hamming Coding
3.1.2 XOR Checksums
3.1.3 AN Coding
3.1.4 Summary and Conclusions
3.2 Probabilities of Silent Data Corruption
3.2.1 Probabilities of Hamming Codes
3.2.2 Probabilities of XOR Checksums
3.2.3 Probabilities of AN Codes
3.2.4 Concrete Error Models
3.2.5 Summary and Conclusions
3.3 Throughput Considerations
3.3.1 Test Systems Descriptions
3.3.2 Vectorizing Hamming Coding
3.3.3 Vectorizing XOR Checksums
3.3.4 Vectorizing AN Coding
3.3.5 Summary and Conclusions
3.4 Comparison of Error Codes
3.4.1 Effectiveness
3.4.2 Efficiency
3.4.3 Runtime Adaptability
3.5 Performance Optimizations for AN Coding
3.5.1 The Modular Multiplicative Inverse
3.5.2 Faster Softening
3.5.3 Faster Error Detection
3.5.4 Comparison to Original AN Coding
3.5.5 The Multiplicative Inverse Anomaly
3.6 Summary
4 BIT FLIP DETECTING STORAGE
4.1 Column Store Architecture
4.1.1 Logical Data Types
4.1.2 Storage Model
4.1.3 Data Representation
4.1.4 Data Layout
4.1.5 Tree Index Structures
4.1.6 Summary
4.2 Hardened Data Storage
4.2.1 Hardened Physical Data Types
4.2.2 Hardened Lightweight Compression
4.2.3 Hardened Data Layout
4.2.4 UDI Operations
4.2.5 Summary and Conclusions
4.3 Hardened Tree Index Structures
4.3.1 B-Tree Verification Techniques
4.3.2 Justification For Further Techniques
4.3.3 The Error Detecting B-Tree
4.4 Summary
5 BIT FLIP DETECTING QUERY PROCESSING
5.1 Column Store Query Processing
5.2 Bit Flip Detection Opportunities
5.2.1 Early Onetime Detection
5.2.2 Late Onetime Detection
5.2.3 Continuous Detection
5.2.4 Miscellaneous Processing Aspects
5.2.5 Summary and Conclusions
5.3 Hardened Intermediate Results
5.3.1 Materialization of Hardened Intermediates
5.3.2 Hardened Bitmaps
5.4 Summary
6 END-TO-END EVALUATION
6.1 Prototype Implementation
6.1.1 AHEAD Architecture
6.1.2 Diversity of Physical Operators
6.1.3 One Concrete Operator Realization
6.1.4 Summary and Conclusions
6.2 Performance of Individual Operators
6.2.1 Selection on One Predicate
6.2.2 Selection on Two Predicates
6.2.3 Join Operators
6.2.4 Grouping and Aggregation
6.2.5 Delta Operator
6.2.6 Summary and Conclusions
6.3 Star Schema Benchmark Queries
6.3.1 Query Runtimes
6.3.2 Improvements Through Vectorization
6.3.3 Storage Overhead
6.3.4 Summary and Conclusions
6.4 Error Detecting B-Tree
6.4.1 Single Key Lookup
6.4.2 Key Value-Pair Insertion
6.5 Summary
7 SUMMARY AND CONCLUSIONS
7.1 Future Work
A APPENDIX
A.1 List of Golden As
A.2 More on Hamming Coding
A.2.1 Code examples
A.2.2 Vectorization
BIBLIOGRAPHY
LIST OF FIGURES
LIST OF TABLES
LIST OF LISTINGS
LIST OF ACRONYMS
LIST OF SYMBOLS
LIST OF DEFINITIONS
Doctor of Philosophy in Computing
dissertation
Doctor of Philosophy
dissertation
The computing landscape is undergoing a major change, primarily enabled by ubiquitous wireless networks and the rapid increase in the use of mobile devices which access a web-based information infrastructure. It is expected that most intensive computing will happen either in servers housed in large datacenters (warehouse-scale computers), e.g., cloud computing and other web services, or in many-core high-performance computing (HPC) platforms in scientific labs. It is clear that the primary challenge in scaling such computing systems into the exascale realm is the efficient supply of large amounts of data to hundreds or thousands of compute cores, i.e., building an efficient memory system. Main memory systems are at an inflection point due to the convergence of several major application and technology trends. Examples include the increasing importance of energy consumption, reduced access stream locality, increasing failure rates, limited pin counts, increasing heterogeneity and complexity, and the diminished importance of cost-per-bit. In light of these trends, the memory system requires a major overhaul. The key to architecting the next generation of memory systems is a combination of the prudent incorporation of novel technologies and a fundamental rethinking of certain conventional design decisions. In this dissertation, we study every major element of the memory system (the memory chip, the processor-memory channel, the memory access mechanism, and memory reliability) and identify the key bottlenecks to efficiency. Based on this, we propose a novel main memory system with the following innovative features: (i) overfetch-aware re-organized chips, (ii) low-cost silicon photonic memory channels, (iii) largely autonomous memory modules with a packet-based interface to the processor, and (iv) a RAID-based reliability mechanism.
Such a system is energy-efficient, high-performance, low-complexity, reliable, and cost-effective, making it ideally suited to meet the requirements of future large-scale computing systems.
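The RAID-based mechanism in (iv) rests on XOR parity: one parity block per group of data blocks allows any single lost block to be rebuilt. A minimal sketch of that core idea (function names and block layout invented for illustration):

```python
from functools import reduce

def xor_blocks(blocks):
    """Byte-wise XOR of equal-length blocks."""
    return bytes(reduce(lambda x, y: x ^ y, column) for column in zip(*blocks))

def make_parity(data_blocks):
    # Parity is chosen so the XOR of all data blocks plus parity is zero.
    return xor_blocks(data_blocks)

def reconstruct(surviving_blocks, parity_block):
    # XOR of the survivors and the parity yields the single missing block.
    return xor_blocks(surviving_blocks + [parity_block])
```

The appeal for a memory system is the low overhead: one extra block per group, compared with full duplication, at the cost of a read-recompute cycle on failure.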