102 research outputs found

    Design Space Exploration and Resource Management of Multi/Many-Core Systems

    The increasing demand for processing a larger number of applications and associated data on computing platforms has led to reliance on multi-/many-core chips, as they facilitate parallel processing. However, these platforms must also be energy-efficient and reliable, and they need to perform secure computations in the interest of the whole community. This book provides perspectives on these aspects from leading researchers, covering state-of-the-art contributions and upcoming trends.

    Doctor of Philosophy

    The internet-based information infrastructure that has powered the growth of modern personal/mobile computing is composed of powerful, warehouse-scale computers, or datacenters. These heavily subscribed datacenters perform data-processing jobs under intense quality-of-service guarantees. Further, high-performance compute platforms are being used to model and analyze increasingly complex scientific problems and natural phenomena. To ensure that the high-performance needs of these machines are met, it is necessary to increase the efficiency of the memory system that supplies data to the processing cores. Many of the microarchitectural innovations designed to scale the memory wall (e.g., out-of-order instruction execution, on-chip caches) are being rendered less effective by several emerging trends (e.g., increased emphasis on energy consumption, limited access locality). This motivates optimizing the main memory system itself. The key to an efficient main memory system is the memory controller; in particular, the scheduling algorithm in the memory controller greatly influences its performance. This dissertation explores this hypothesis in several contexts. It develops tools to better understand memory scheduling and develops scheduling innovations for CPUs and GPUs. We propose novel memory scheduling techniques that are strongly aware of the access patterns of the clients as well as the microarchitecture of the memory device. Based on these, we present (i) a Dynamic Random Access Memory (DRAM) chip microarchitecture optimized to reduce write-induced slowdown, (ii) a memory scheduling algorithm that exploits these features, (iii) several memory scheduling algorithms that reduce the memory-related stall experienced by irregular General Purpose Graphics Processing Unit (GPGPU) applications, and (iv) the Utah Simulated Memory Module (USIMM), a detailed, validated simulator for DRAM main memory that we use for analyzing and proposing scheduling algorithms.
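    Memory-controller scheduling work of this kind is usually discussed relative to the classic first-ready, first-come-first-served (FR-FCFS) baseline, which prefers requests that hit the currently open DRAM row and falls back to the oldest request otherwise. The sketch below illustrates only that baseline selection logic; the Request and FRFCFSScheduler names are hypothetical and are not taken from USIMM or the dissertation.

```python
from collections import deque

class Request:
    def __init__(self, arrival, row):
        self.arrival = arrival  # arrival time, used for FCFS (age) ordering
        self.row = row          # DRAM row this request targets

class FRFCFSScheduler:
    """First-Ready, First-Come-First-Served: prefer row-buffer hits,
    break ties by age (oldest request first). Illustrative sketch only."""
    def __init__(self):
        self.queue = deque()
        self.open_row = None  # row currently latched in the row buffer

    def enqueue(self, req):
        self.queue.append(req)

    def pick_next(self):
        if not self.queue:
            return None
        # "First-ready": any request hitting the open row beats older misses
        hits = [r for r in self.queue if r.row == self.open_row]
        chosen = min(hits or self.queue, key=lambda r: r.arrival)
        self.queue.remove(chosen)
        self.open_row = chosen.row  # a miss closes and re-opens the row
        return chosen
```

    With requests to rows 5, 7, 5 arriving at times 0, 1, 2, the scheduler serves them in arrival order 0, 2, 1: after the first access opens row 5, the younger row-5 hit is preferred over the older row-7 miss.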

    Doctor of Philosophy in Computing


    Simulation and implementation of novel deep learning hardware architectures for resource constrained devices

    Corey Lammie designed mixed-signal memristive-complementary metal-oxide-semiconductor (CMOS) and field programmable gate array (FPGA) hardware architectures, which were used to reduce the power and resource requirements of Deep Learning (DL) systems during both inference and training. Disruptive design methodologies, such as those explored in this thesis, can be used to facilitate the design of next-generation DL systems.

    NASA Tech Briefs, May 1990

    Topics: New Product Ideas; NASA TU Services; Electronic Components and Circuits; Electronic Systems; Physical Sciences; Materials; Computer Programs; Mechanics; Machinery; Fabrication Technology; Mathematics and Information Sciences; Life Sciences

    RAID Organizations for Improved Reliability and Performance: A Not Entirely Unbiased Tutorial (1st revision)

    The original RAID proposal advocated replacing large disks with arrays of PC disks, but as the capacity of small disks increased 100-fold in the 1990s, the production of large disks was discontinued. Storage dependability is increased via replication or erasure coding. Cloud storage providers store multiple copies of data, obviating the need for further redundancy. Variations of RAID based on local recovery codes and partial-MDS codes reduce recovery cost. NAND flash solid-state disks (SSDs) have lower latency and higher bandwidth, are more reliable, consume less power, and have a lower total cost of ownership (TCO) than hard disk drives, making them more viable for hyperscalers.
    Comment: Submitted to ACM Computing Surveys. arXiv admin note: substantial text overlap with arXiv:2306.0876
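    As a minimal illustration of the erasure-coding idea the abstract mentions, RAID-5-style single parity stores the XOR of the data blocks and can rebuild any one lost block from the surviving blocks plus the parity. The helper below is a sketch of that principle only, not code from the tutorial.

```python
def xor_blocks(blocks):
    """XOR equal-length byte strings together, byte by byte."""
    out = bytearray(len(blocks[0]))
    for b in blocks:
        for i, byte in enumerate(b):
            out[i] ^= byte
    return bytes(out)

# Three data "disks" protected by one parity block (RAID-5-style)
data = [b"disk0dat", b"disk1dat", b"disk2dat"]
parity = xor_blocks(data)

# Simulate losing disk 1 and rebuilding it from the survivors + parity
rebuilt = xor_blocks([data[0], data[2], parity])
assert rebuilt == data[1]
```

    The same reconstruction works for any single failed block, because XOR-ing the parity with all surviving blocks cancels them out, leaving the missing one; tolerating multiple failures requires the richer codes (local recovery, partial-MDS) the survey discusses.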

    Enabling Edge-Intelligence in Resource-Constrained Autonomous Systems

    The objective of this research is to shift Machine Learning algorithms from resource-rich servers/cloud to compute-limited edge nodes by designing energy-efficient ML systems. Multiple sub-areas of research in this domain are explored for the application of drone autonomous navigation. Our principal goal is to enable the UAV to navigate autonomously using Reinforcement Learning (RL), without incurring any additional hardware or sensor cost. Most lightweight UAVs are limited in resources such as compute capability and onboard energy, and conventional state-of-the-art ML algorithms cannot be implemented on them directly. This research addresses this issue by devising energy-efficient ML algorithms, modifying existing ML algorithms, designing energy-efficient ML accelerators, and leveraging hardware-algorithm co-design. RL is notorious for being data-hungry and requires trial and error to converge; hence it cannot be implemented directly on real drones until the issues of safety, data limitations, and reward generation are addressed. Instead of learning a task from scratch, RL algorithms, just like humans, can benefit from prior knowledge, which can help them converge to their goals in less time and with less energy. Multiple drones can collectively help each other by sharing their locally learned knowledge. Such distributed systems can help agents learn their respective local tasks faster, but may become vulnerable to attacks in the presence of adversarial agents, which must also be addressed. Finally, the improvement in the energy efficiency of RL-based systems achieved through algorithmic approaches is limited by the underlying hardware and computing architectures. Hence, these need to be redesigned in an application-specific way, exploring and exploiting the nature of the most-used ML operators. This can be done by exploring new computing devices and by considering the data reuse and dataflow of ML operators within the architectural design. This research addresses these issues and presents better alternatives. It is concluded that energy consumption needs to be addressed at multiple levels of the hierarchy by exploring algorithmic, hardware-based, and algorithm-hardware co-design approaches.
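    One simple way to picture the knowledge sharing the abstract describes is element-wise averaging of the drones' locally learned Q-tables. The sketch below is purely illustrative; the merge_q_tables function and the table layout are assumptions for this example, not the dissertation's actual method.

```python
def merge_q_tables(tables):
    """Element-wise average of per-drone Q-tables (state -> list of
    action values): a minimal form of sharing learned knowledge."""
    merged = {}
    for state in tables[0]:
        cols = zip(*(t[state] for t in tables))
        merged[state] = [sum(c) / len(tables) for c in cols]
    return merged

# Two drones that explored the same two-state, two-action task locally
drone_a = {"s0": [1.0, 0.0], "s1": [0.0, 2.0]}
drone_b = {"s0": [3.0, 0.0], "s1": [0.0, 0.0]}
shared = merge_q_tables([drone_a, drone_b])
assert shared == {"s0": [2.0, 0.0], "s1": [0.0, 1.0]}
```

    A naive average like this is exactly where the adversarial-agent concern arises: a single drone reporting poisoned Q-values shifts the shared table, motivating robust aggregation.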