102 research outputs found
Design Space Exploration and Resource Management of Multi/Many-Core Systems
The increasing demand of processing a higher number of applications and related data on computing platforms has resulted in reliance on multi-/many-core chips as they facilitate parallel processing. However, there is a desire for these platforms to be energy-efficient and reliable, and they need to perform secure computations for the interest of the whole community. This book provides perspectives on the aforementioned aspects from leading researchers in terms of state-of-the-art contributions and upcoming trends
Recommended from our members
Very-Large-Scale-Integration Circuit Techniques in Internet-of-Things Applications
Heading towards the era of Internet-of-things (IoT) means both opportunity and challenge for the circuit-design community. In a system where billions of devices are equipped with the ability to sense, compute, communicate with each other and perform tasks in a coordinated manner, security and power management are among the most critical challenges.
Physically unclonable function (PUF) emerges as an important security primitive in hardware-security applications; it provides an object-specific physical identifier hidden within the intrinsic device variations, which is hard to expose and reproduce by adversaries. Yet, designing a compact PUF robust to noise, temperature and voltage remains a challenge.
This thesis presents a novel PUF design approach based on a pair of ultra-compact analog circuits whose output is proportional to absolute temperature. The proposed approach is demonstrated through two works: (1) an ultra-compact and robust PUF based on voltage-compensated proportional-to-absolute-temperature voltage generators that occupies 8.3Ă— less area than the previous work with the similar robustness and twice the robustness of the previously most compact PUF design and (2) a technique to transform a 6T-SRAM array into a robust analog PUF with minimal overhead. In this work, similar circuit topology is used to transform a preexisting on-chip SRAM into a PUF, which further reduces the area in (1) with no robustness penalty.
In this thesis, we also explore techniques for power management circuit design.
Energy harvesting is an essential functionality in an IoT sensor node, where battery replacement is cost-prohibitive or impractical. Yet, existing energy-harvesting power management units (EH PMU) suffer from efficiency loss in the two-step voltage conversion: harvester-to-battery and battery-to-load. We propose an EH PMU architecture with hybrid energy storage, where a capacitor is introduced in addition to the battery to serve as an intermediate energy buffer to minimize the battery involvement in the system energy flow. Test-case measurements show as much as a 2.2Ă— improvement in the end-to-end energy efficiency.
In contrast, with the drastically reduced power consumption of IoT nodes that operates in the sub-threshold regime, adaptive dynamic voltage scaling (DVS) for supply-voltage margin removal, fully on-chip integration and high power conversion efficiency (PCE) are required in PMU designs. We present a PMU–load co-design based on a fully integrated switched-capacitor DC-DC converter (SC-DC) and hybrid error/replica-based regulation for a fully digital PMU control. The PMU is integrated with a neural spike processor (NSP) that achieves a record-low power consumption of 0.61 µW for 96 channels. A tunable replica circuit is added to assist the error regulation and prevent loss of regulation. With automatic energy-robustness co-optimization, the PMU can set the SC-DC’s optimal conversion ratio and switching frequency. The PMU achieves a PCE of 77.7% (72.2%) at VIN = 0.6 V (1 V) and at the NSP’s margin-free operating point
Doctor of Philosophy
dissertationThe internet-based information infrastructure that has powered the growth of modern personal/mobile computing is composed of powerful, warehouse-scale computers or datacenters. These heavily subscribed datacenters perform data-processing jobs under intense quality of service guarantees. Further, high-performance compute platforms are being used to model and analyze increasingly complex scientific problems and natural phenomena. To ensure that the high-performance needs of these machines are met, it is necessary to increase the efficiency of the memory system that supplies data to the processing cores. Many of the microarchitectural innovations that were designed to scale the memory wall (e.g., out-of-order instruction execution, on-chip caches) are being rendered less effective due to several emerging trends (e.g., increased emphasis on energy consumption, limited access locality). This motivates the optimization of the main memory system itself. The key to an efficient main memory system is the memory controller. In particular, the scheduling algorithm in the memory controller greatly influences its performance. This dissertation explores this hypothesis in several contexts. It develops tools to better understand memory scheduling and develops scheduling innovations for CPUs and GPUs. We propose novel memory scheduling techniques that are strongly aware of the access patterns of the clients as well as the microarchitecture of the memory device. Based on these, we present (i) a Dynamic Random Access Memory (DRAM) chip microarchitecture optimized for reducing write-induced slowdown, (ii) a memory scheduling algorithm that exploits these features, (iii) several memory scheduling algorithms to reduce the memory-related stall experienced by irregular General Purpose Graphics Processing Unit (GPGPU) applications, and (iv) the Utah Simulated Memory Module (USIMM), a detailed, validated simulator for DRAM main memory that we use for analyzing and proposing scheduler algorithms
Doctor of Philosophy in Computing
dissertatio
Simulation and implementation of novel deep learning hardware architectures for resource constrained devices
Corey Lammie designed mixed signal memristive-complementary metal–oxide–semiconductor (CMOS) and field programmable gate arrays (FPGA) hardware architectures, which were used to reduce the power and resource requirements of Deep Learning (DL) systems; both during inference and training. Disruptive design methodologies, such as those explored in this thesis, can be used to facilitate the design of next-generation DL systems
NASA Tech Briefs, May 1990
Topics: New Product Ideas; NASA TU Services; Electronic Components and Circuits; Electronic Systems; Physical Sciences; Materials; Computer Programs; Mechanics; Machinery; Fabrication Technology; Mathematics and Information Sciences; Life Sciences
RAID Organizations for Improved Reliability and Performance: A Not Entirely Unbiased Tutorial (1st revision)
RAID proposal advocated replacing large disks with arrays of PC disks, but as
the capacity of small disks increased 100-fold in 1990s the production of large
disks was discontinued. Storage dependability is increased via replication or
erasure coding. Cloud storage providers store multiple copies of data obviating
for need for further redundancy. Varitaions of RAID based on local recovery
codes, partial MDS reduce recovery cost. NAND flash Solid State Disks - SSDs
have low latency and high bandwidth, are more reliable, consume less power and
have a lower TCO than Hard Disk Drives, which are more viable for hyperscalers.Comment: Submitted to ACM Computing Surveys. arXiv admin note: substantial
text overlap with arXiv:2306.0876
Enabling Edge-Intelligence in Resource-Constrained Autonomous Systems
The objective of this research is to shift Machine Learning algorithms from resource-extensive server/cloud to compute-limited edge nodes by designing energy-efficient ML systems. Multiple sub-areas of research in this domain are explored for the application of drone autonomous navigation. Our principal goal is to enable the UAV to autonomously navigate using Reinforcement Learning, without incurring any additional hardware or sensor cost. Most of the lightweight UAVs are limited in their resources such as compute capabilities and onboard energy source, and the conventional state-of-the-art ML algorithms cannot be directly implemented on them. This research addresses this issue by devising energy-efficient ML algorithms, modifying existing ML algorithms, designing energy-efficient ML accelerators, and leveraging the hardware-algorithm co-design. RL is notorious for being data-hungry and requires trials and error for it to converge. Hence it cannot be directly implemented on real drones until the issues of safety, data limitations, and reward generation is addressed. Instead of learning the task from scratch, just like humans, RL algorithms can benefit from prior knowledge which can help them converge to their goals in less time and consume less energy. Multiple drones can be collectively used to help each other by sharing their locally learned knowledge. Such distributive systems can help agents learn their respective local tasks faster but may become vulnerable to attacks in the presence of adversarial agents which needs to be addressed. Finally, the improvement in the energy efficiency of RL-based systems achieved from the algorithmic approaches is limited by the underlying hardware and computing architectures. Hence, these need to be redesigned in an application-specific way exploring and exploiting the nature of the most used ML operators This can be done by exploring new computing devices and considering the data reuse and dataflow of ML operators within the architectural design. This research discusses these issues by addressing them and presenting better alternatives. It is concluded that energy consumption at multiple levels of hierarchy needs to be addressed by exploring algorithmic, hardware-based, and algorithm-hardware co-design approaches.Ph.D
- …