Search CORE

83 research outputs found

Operating System Support for High-Performance Solid State Drives

Author: Bjørling Matias
Publication venue: IT-Universitetet i København
Publication date: 01/01/2016
Field of study

The IT University of Copenhagen's Repository

Analog Spiking Neuromorphic Circuits and Systems for Brain- and Nanotechnology-Inspired Cognitive Computing

Author: Wu Xinyu
Publication venue: 'IUScholarWorks'
Publication date: 01/12/2016
Field of study

Human society is now facing grand challenges to satisfy the growing demand for computing power, at the same time, sustain energy consumption. By the end of CMOS technology scaling, innovations are required to tackle the challenges in a radically different way. Inspired by the emerging understanding of the computing occurring in a brain and nanotechnology-enabled biological plausible synaptic plasticity, neuromorphic computing architectures are being investigated. Such a neuromorphic chip that combines CMOS analog spiking neurons and nanoscale resistive random-access memory (RRAM) using as electronics synapses can provide massive neural network parallelism, high density and online learning capability, and hence, paves the path towards a promising solution to future energy-efficient real-time computing systems. However, existing silicon neuron approaches are designed to faithfully reproduce biological neuron dynamics, and hence they are incompatible with the RRAM synapses, or require extensive peripheral circuitry to modulate a synapse, and are thus deficient in learning capability. As a result, they eliminate most of the density advantages gained by the adoption of nanoscale devices, and fail to realize a functional computing system. This dissertation describes novel hardware architectures and neuron circuit designs that synergistically assemble the fundamental and significant elements for brain-inspired computing. Versatile CMOS spiking neurons that combine integrate-and-fire, passive dense RRAM synapses drive capability, dynamic biasing for adaptive power consumption, in situ spike-timing dependent plasticity (STDP) and competitive learning in compact integrated circuit modules are presented. Real-world pattern learning and recognition tasks using the proposed architecture were demonstrated with circuit-level simulations. A test chip was implemented and fabricated to verify the proposed CMOS neuron and hardware architecture, and the subsequent chip measurement results successfully proved the idea. The work described in this dissertation realizes a key building block for large-scale integration of spiking neural network hardware, and then, serves as a step-stone for the building of next-generation energy-efficient brain-inspired cognitive computing systems

Boise State University - ScholarWorks

Studies in Exascale Computer Architecture: Interconnect, Resiliency, and Checkpointing

Author: Abeyratne Sandunmalee
Publication venue
Publication date
Field of study

Today’s supercomputers are built from the state-of-the-art components to extract as much performance as possible to solve the most computationally intensive problems in the world. Building the next generation of exascale supercomputers, however, would require re-architecting many of these components to extract over 50x more performance than the current fastest supercomputer in the United States. To contribute towards this goal, two aspects of the compute node architecture were examined in this thesis: the on-chip interconnect topology and the memory and storage checkpointing platforms. As a first step, a skeleton exascale system was modeled to meet 1 exaflop of performance along with 100 petabytes of main memory. The model revealed that large kilo-core processors would be necessary to meet the exaflop performance goal; existing topologies, however, would not scale to those levels. To address this new challenge, we investigated and proposed asymmetric high-radix topologies that decoupled local and global communications and used different radix routers for switching network traffic at each level. The proposed topologies scaled more readily to higher numbers of cores with better latency and energy consumption than before. The vast number of components that the model revealed would be needed in these exascale systems cautioned towards better fault tolerance mechanisms. To address this challenge, we showed that local checkpoints within the compute node can be saved to a hybrid DRAM and SSD platform in order to write them faster without wearing out the SSD or consuming a lot of energy. A hybrid checkpointing platform allowed more frequent checkpoints to be made without sacrificing performance. Subsequently, we proposed switching to a DIMM-based SSD in order to perform fine-grained I/O operations that would be integral in interleaving checkpointing and computation while still providing persistence guarantees. Two more techniques that consolidate and overlap checkpointing were designed to better hide the checkpointing latency to the SSD.PHDComputer Science & EngineeringUniversity of Michigan, Horace H. Rackham School of Graduate Studieshttps://deepblue.lib.umich.edu/bitstream/2027.42/137096/1/sabeyrat_1.pd

Deep Blue Documents at the University of Michigan

Parallel programming systems for scalable scientific computing

Author: Heldens S.J.
Publication venue
Publication date: 01/01/2024
Field of study

High-performance computing (HPC) systems are more powerful than ever before. However, this rise in performance brings with it greater complexity, presenting significant challenges for researchers who wish to use these systems for their scientific work. This dissertation explores the development of scalable programming solutions for scientific computing. These solutions aim to be effective across a diverse range of computing platforms, from personal desktops to advanced supercomputers.To better understand HPC systems, this dissertation begins with a literature review on exascale supercomputers, massive systems capable of performing 10¹⁸ floating-point operations per second. This review combines both manual and data-driven analyses, revealing that while traditional challenges of exascale computing have largely been addressed, issues like software complexity and data volume remain. Additionally, the dissertation introduces the open-source software tool (called LitStudy) developed for this research.Next, this dissertation introduces two novel programming systems. The first system (called Rocket) is designed to scale all-versus-all algorithms to massive datasets. It features a multi-level software-based cache, a divide-and-conquer approach, hierarchical work-stealing, and asynchronous processing to maximize data reuse, exploit data locality, dynamically balance workloads, and optimize resource utilization. The second system (called Lightning) aims to scale existing single-GPU kernel functions across multiple GPUs, even on different nodes, with minimal code adjustments. Results across eight benchmarks on up to 32 GPUs show excellent scalability.The dissertation concludes by proposing a set of design principles for developing parallel programming systems for scalable scientific computing. These principles, based on lessons from this PhD research, represent significant steps forward in enabling researchers to efficiently utilize HPC systems

International Migration, Integration and Social Cohesion online publications

Abstractions for Portable Data Management in Heterogeneous Memory Systems

Author: Foyer Clement M
Publication venue
Publication date: 24/06/2021
Field of study

Explore Bristol Research

Small-Scale Intelligent Vehicle Design Platform

Author: Grant Christopher
Miley Jay
Phillips Evan
Publication venue: DigitalCommons@CalPoly
Publication date: 01/06/2017
Field of study

Intelligent Vehicle Design is a growing field with the potential to save many lives by actively minimizing the impacts of human error. Though there are many ways to research intelligent vehicle control, full-scale implementations are expensive and dangerous and computer simulations have extremely steep learning curves. Researchers and students need an accessible, adaptable, and robust development platform to rapidly create and test autonomous control algorithms. While small-scale platforms are often designed from the ground up for specific projects, this requires analysis, design, and manufacture. The goal of this project is to develop a small-scale intelligent vehicle that can be configured with physical sensors and programmed with control algorithms designed in Simulink. We will strive to make our design adaptable and reproducible through intentional design and documentation. We have completed the design to adapt a 1/7th scale remote control vehicle with a custom chassis, independently driven wheels, and a Raspberry Pi based control package. An inertial measurement unit, an ultrasonic rangefinder, and a camera will give the system realtime data about itself and its surroundings. This well-documented research platform will enable more students to get hands on experience in developing and testing intelligent vehicle systems. These students will become the next generation of vehicle safety engineers, developing the life-saving intelligent vehicle systems of the future

DigitalCommons@CalPoly

Holistic Performance Analysis and Optimization of Unified Virtual Memory

Author: Allen Tyler
Publication venue: Clemson University Libraries
Publication date: 01/08/2022
Field of study

The programming difficulty of creating GPU-accelerated high performance computing (HPC) codes has been greatly reduced by the advent of Unified Memory technologies that abstract the management of physical memory away from the developer. However, these systems incur substantial overhead that paradoxically grows for codes where these technologies are most useful. While these technologies are increasingly adopted for use in modern HPC frameworks and applications, the performance cost reduces the efficiency of these systems and turns away some developers from adoption entirely. These systems are naturally difficult to optimize due to the large number of interconnected hardware and software components that must be untangled to perform thorough analysis. In this thesis, we take the first deep dive into a functional implementation of a Unified Memory system, NVIDIA UVM, to evaluate the performance and characteristics of these systems. We show specific hardware and software interactions that cause serialization between host and devices. We further provide a quantitative evaluation of fault handling for various applications under different scenarios, including prefetching and oversubscription. Through lower-level analysis, we find that the driver workload is dependent on the interactions among application access patterns, GPU hardware constraints, and Host OS components. These findings indicate that the cost of host OS components is significant and present across UM implementations. We also provide a proof-of-concept asynchronous approach to memory management in UVM that allows for reduced system overhead and improved application performance. This study provides constructive insight into future implementations and systems, such as Heterogeneous Memory Management

Clemson University: TigerPrints

On the Secure and Resilient Design of Connected Vehicles: Methods and Guidelines

Author: Rosenstatter Thomas
Publication venue
Publication date: 01/01/2021
Field of study

Vehicles have come a long way from being purely mechanical systems to systems that consist of an internal network of more than 100 microcontrollers and systems that communicate with external entities, such as other vehicles, road infrastructure, the manufacturer’s cloud and external applications. This combination of resource constraints, safety-criticality, large attack surface and the fact that millions of people own and use them each day, makes securing vehicles particularly challenging as security practices and methods need to be tailored to meet these requirements.This thesis investigates how security demands should be structured to ease discussions and collaboration between the involved parties and how requirements engineering can be accelerated by introducing generic security requirements. Practitioners are also assisted in choosing appropriate techniques for securing vehicles by identifying and categorising security and resilience techniques suitable for automotive systems. Furthermore, three specific mechanisms for securing automotive systems and providing resilience are designed and evaluated. The first part focuses on cyber security requirements and the identification of suitable techniques based on three different approaches, namely (i) providing a mapping to security levels based on a review of existing security standards and recommendations; (ii) proposing a taxonomy for resilience techniques based on a literature review; and (iii) combining security and resilience techniques to protect automotive assets that have been subject to attacks. The second part presents the design and evaluation of three techniques. First, an extension for an existing freshness mechanism to protect the in-vehicle communication against replay attacks is presented and evaluated. Second, a trust model for Vehicle-to-Vehicle communication is developed with respect to cyber resilience to allow a vehicle to include trust in neighbouring vehicles in its decision-making processes. Third, a framework is presented that enables vehicle manufacturers to protect their fleet by detecting anomalies and security attacks using vehicle trust and the available data in the cloud

Chalmers Research