
    Real-Time Beamforming Using High-Speed FPGAs at the Allen Telescope Array

    The Allen Telescope Array (ATA) at the Hat Creek Radio Observatory (HCRO) is a wide-field panchromatic radio telescope currently consisting of 42 offset-Gregorian antennas, each with a 6 m aperture, with plans to expand the array to 350 antennas. Through unique back-end hardware, the ATA performs real-time wideband beamforming with independent subarray capabilities and customizable beam shaping. The beamformers enable science observations requiring the full gain of the array, time-domain (nonintegrated) output, and interference excision or orthogonal beamsets. In this paper we report on the design of this beamformer, including its architecture and experimental results. Furthermore, we address some practical considerations in large-N wideband beamformers implemented on field-programmable gate array (FPGA) platforms, including device utilization, methods of calibration and control, and interchip synchronization.
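To make concrete what a beamformer computes, the following is a minimal narrowband sketch of phase-and-sum beamforming for one frequency channel: each antenna's sample is multiplied by a complex weight that cancels its geometric phase for the chosen direction, and the weighted samples are summed. The antenna layout, the 1.4 GHz frequency, and all function names are hypothetical; this is a textbook illustration, not the ATA's FPGA implementation, which is wideband and operates on channelized data streams.

```python
import cmath
import math

C = 299_792_458.0  # speed of light in m/s


def geometric_delay(position_m, direction):
    """Arrival-time advance (s) of a plane wave at `position_m` relative to the origin.

    `direction` is a unit vector pointing toward the source.
    """
    return sum(p * d for p, d in zip(position_m, direction)) / C


def steering_weights(positions_m, direction, freq_hz):
    """Complex weights that phase-align a plane wave arriving from `direction`."""
    n = len(positions_m)
    # Conjugate of the geometric phase at each antenna, normalised by 1/n
    # so that an on-axis source of unit amplitude gives unit beam output.
    return [cmath.exp(-2j * math.pi * freq_hz * geometric_delay(p, direction)) / n
            for p in positions_m]


def simulate_plane_wave(positions_m, direction, freq_hz, amplitude=1.0):
    """Noise-free complex sample seen by each antenna for one frequency channel."""
    return [amplitude * cmath.exp(2j * math.pi * freq_hz * geometric_delay(p, direction))
            for p in positions_m]


def beamform(samples, weights):
    """Form one beam sample: the weighted sum over antennas."""
    return sum(w * s for w, s in zip(weights, samples))


if __name__ == "__main__":
    # Hypothetical 4-element east-west line with 6 m spacing, observing at 1.4 GHz.
    antennas = [(6.0 * k, 0.0, 0.0) for k in range(4)]
    zenith = (0.0, 0.0, 1.0)
    off_axis = (math.sin(0.01), 0.0, math.cos(0.01))  # about 0.57 degrees from zenith
    w = steering_weights(antennas, zenith, 1.4e9)
    on = abs(beamform(simulate_plane_wave(antennas, zenith, 1.4e9), w))
    off = abs(beamform(simulate_plane_wave(antennas, off_axis, 1.4e9), w))
    print(f"on-axis gain {on:.3f}, off-axis gain {off:.3f}")
```

A source in the steered direction adds coherently and gives unit gain, while a source slightly off-axis adds with mismatched phases and is attenuated. Steering a second set of weights at another direction forms an independent beam from the same samples.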

    Towards Power- and Energy-Efficient Datacenters

    As the Internet evolves, cloud computing has become a dominant form of computation in modern life. Warehouse-scale computers (WSCs), or datacenters, which form the foundation of this cloud-centric web, have been able to deliver satisfactory performance to both Internet companies and their customers. With the increased focus on and popularity of the cloud, however, datacenter loads are growing rapidly, and Internet companies need more computing capacity to serve this demand. Unfortunately, power and energy are often the major factors limiting datacenter growth: frequently, no more servers can be added to a datacenter without exceeding the capacity of the existing power infrastructure. This dissertation investigates power and energy usage in modern datacenter environments. We identify sources of power and energy inefficiency at three levels of a modern datacenter and provide insights and solutions for each, aiming to prepare datacenters for critical future growth. We start at the datacenter level and find that peak provisioning and improper service placement in multi-level power delivery infrastructures fragment the power budget inside production datacenters, reducing the compute capacity that the existing infrastructure can support. We find that the heterogeneity among datacenter workloads is key to addressing this issue, and we design systematic methods to reduce the fragmentation and improve utilization of the power budget. This dissertation then narrows its focus to the energy usage of individual servers running cloud workloads. In particular, we examine the power management mechanisms employed in these servers and find that their coarse time granularity is one critical factor leading to excessive energy consumption.
We propose an intelligent, low-overhead solution built on emerging finer-grained voltage/frequency boosting circuits that pinpoints and boosts the queries most likely to lengthen the tail of the latency distribution and to benefit most from the boost, improving energy efficiency without sacrificing quality of service. The final part of this dissertation takes a further step and investigates how a fundamentally more efficient computing substrate, field-programmable gate arrays (FPGAs), benefits datacenter power and energy efficiency. Unlike other types of hardware accelerators, FPGAs can be reconfigured on the fly to provide fine-grained control over hardware resource allocation, which presents a unique set of challenges for optimal workload scheduling and resource allocation. We aim to design a set of coordinated algorithms that manage these two key factors simultaneously and fully realize the benefits of deploying FPGAs in the highly variable cloud environment.
PhD, Computer Science & Engineering, University of Michigan, Horace H. Rackham School of Graduate Studies. https://deepblue.lib.umich.edu/bitstream/2027.42/144043/1/hsuch_1.pd
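The fragmentation argument rests on a simple observation: a power node must be provisioned for the peak of the sum of its children's draws, so co-locating services whose peaks coincide wastes budget, while mixing services with complementary peaks recovers it. The sketch below compares two placements of four services under a single level of power nodes. The traces, service names, and the exhaustive pairing search are illustrative assumptions, not the dissertation's workloads or placement method, which target multi-level power delivery.

```python
# Hypothetical power traces (kW), one sample every 6 hours over a day.
TRACES = {
    "web_a":   [40, 90, 60, 30],   # daytime-peaking front end
    "web_b":   [45, 85, 55, 35],   # another daytime-peaking front end
    "batch_a": [80, 30, 40, 85],   # night-peaking batch job
    "batch_b": [75, 35, 45, 80],   # another night-peaking batch job
}


def peak_of_sum(names):
    """Peak draw a power node must be provisioned for when it hosts `names`."""
    return max(sum(samples) for samples in zip(*(TRACES[n] for n in names)))


def placement_peaks(groups):
    """Provisioned peak for each power node in a placement (a list of name groups)."""
    return [peak_of_sum(g) for g in groups]


def best_pairing(names):
    """Split exactly four services into two pairs, minimising the largest node peak.

    Exhaustive search: fix the first service and try each possible partner.
    """
    names = list(names)
    first = names[0]
    best = None
    for partner in names[1:]:
        pair1 = (first, partner)
        pair2 = tuple(n for n in names if n not in pair1)
        worst = max(placement_peaks([pair1, pair2]))
        if best is None or worst < best[0]:
            best = (worst, [pair1, pair2])
    return best


if __name__ == "__main__":
    homogeneous = [("web_a", "web_b"), ("batch_a", "batch_b")]
    print("sum of individual peaks:", sum(max(t) for t in TRACES.values()))  # 340
    print("homogeneous placement:", placement_peaks(homogeneous))           # [175, 165]
    worst, groups = best_pairing(TRACES)
    print("best placement:", groups, "node peaks:", placement_peaks(groups))  # [120, 120]
```

Grouping like with like forces each node to cover a coincident peak (340 kW in total), while pairing a daytime service with a night-time one needs only 240 kW for the same work. This is the sense in which workload heterogeneity is the lever for reducing fragmentation.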

    Domain Specific Computing in Tightly-Coupled Heterogeneous Systems

    Over the past several decades, researchers and programmers across many disciplines have relied on Moore's law and Dennard scaling for increases in the compute capability of modern processors. However, recent data suggest that the number of transistors per square inch on integrated circuits is falling behind the pace projected by Moore's law due to the breakdown of Dennard scaling at smaller semiconductor process nodes. This has signaled the beginning of a new “golden age in computer architecture,” in which the paradigm shifts from improving traditional processor performance on general tasks to architecting hardware that executes a class of applications with high performance. This shift will be paved, in part, by making compute systems more heterogeneous and investigating domain-specific architectures. However, the notion of domain-specific architectures raises many research questions. Specifically, what constitutes a domain? How does one architect hardware for a specific domain? In this dissertation, we present our work towards domain-specific computing. We start by constructing a guiding definition of our target domain and then create a benchmark suite of applications based on this definition. We then use quantitative metrics from the literature to characterize our domain and gain insight into what would be most beneficial in hardware targeted specifically at it. From this characterization, we learn that data movement is a particularly salient aspect of our domain. Motivated by this finding, we evaluate our target platform, the Intel HARPv2 CPU+FPGA system, for architecting domain-specific hardware through a portability and performance evaluation. To guide the creation of domain-specific hardware for this platform, we create a novel tool to quantify spatial and temporal locality. We apply this tool to our benchmark suite and use its outputs as features for an unsupervised clustering algorithm.
We posit that the resulting clusters represent sub-domains within our originally specified domain; specifically, these clusters indicate whether a computational kernel should be designed as a widely vectorized or a deeply pipelined compute unit. Using the lessons learned from the domain characterization and hardware platform evaluation, we outline our process for designing hardware for our domain and empirically verify that our predictions of wide versus deep kernel implementations are correct.
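The abstract does not say how the locality tool computes its metrics. As a hedged illustration of the kind of per-kernel features such a tool might emit, the sketch below computes two common proxies from a byte-address trace: a temporal score based on reuse distance (distinct addresses touched between two accesses to the same address) and a spatial score (fraction of consecutive accesses landing within 64 bytes of the previous one, assuming 64-byte cache lines). The metric definitions, thresholds, and example traces are assumptions, not the dissertation's tool.

```python
def reuse_distances(trace):
    """Reuse distance of each access: the number of distinct addresses touched
    since the previous access to the same address (None on first touch)."""
    last_seen = {}  # address -> index of its most recent access
    distances = []
    for i, addr in enumerate(trace):
        if addr in last_seen:
            distances.append(len(set(trace[last_seen[addr] + 1:i])))
        else:
            distances.append(None)
        last_seen[addr] = i
    return distances


def temporal_score(trace, max_distance=8):
    """Fraction of accesses that reuse an address within `max_distance` distinct addresses."""
    if not trace:
        return 0.0
    hits = sum(1 for d in reuse_distances(trace) if d is not None and d < max_distance)
    return hits / len(trace)


def spatial_score(trace, window_bytes=64):
    """Fraction of consecutive access pairs that move to a different address
    at most `window_bytes` away from the previous one."""
    if len(trace) < 2:
        return 0.0
    near = sum(1 for a, b in zip(trace, trace[1:]) if 0 < abs(b - a) <= window_bytes)
    return near / (len(trace) - 1)


def locality_features(trace):
    """(temporal, spatial) feature vector for one kernel's address trace."""
    return (temporal_score(trace), spatial_score(trace))


if __name__ == "__main__":
    # Hypothetical byte-address traces for two kernels.
    streaming = [8 * i for i in range(256)]       # one pass over an array of 8-byte elements
    hot_strided = [0, 4096, 8192, 12288] * 64     # small working set, large stride
    print("streaming:", locality_features(streaming))      # (0.0, 1.0)
    print("hot_strided:", locality_features(hot_strided))  # (0.984375, 0.0)
```

Each kernel becomes a point in a small feature space, and a standard unsupervised method such as k-means could then group kernels with similar locality behavior. This simple reuse-distance calculation is quadratic in the worst case; a real tool would likely use a tree-based or sampled algorithm for long traces.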

    Improving Performance Estimation for FPGA-based Accelerators for Convolutional Neural Networks

    Field-programmable gate array (FPGA) based accelerators are widely used to accelerate convolutional neural networks (CNNs) because of their potential to improve performance and their reconfigurability for specific application instances. Determining the optimal configuration of an FPGA-based accelerator requires exploring the design space, and accurate performance prediction plays an important role in that exploration. This work introduces a novel method for fast and accurate latency estimation based on a Gaussian process parametrised by an analytic approximation and coupled with runtime data. Experiments on three different CNNs running on an FPGA-based accelerator on an Intel Arria 10 GX 1150 demonstrated a 30.7% improvement in accuracy, measured by mean absolute error, over a standard analytic method in leave-one-out cross-validation.
    Comment: This article has been accepted for publication at ARC'202
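One common way to combine an analytic model with a Gaussian process is to use the analytic estimate as the GP's prior mean, so the GP only learns the residual between the model and measured latencies. The sketch below implements that idea with a squared-exponential kernel and evaluates it with leave-one-out mean absolute error. The analytic model (MACs divided by parallelism), the kernel hyperparameters, the configuration features, and the synthetic "measurements" are all hypothetical placeholders; the paper's exact parametrisation may differ. A tiny Gaussian-elimination solver keeps the example dependency-free.

```python
import math


def analytic_latency(cfg):
    """Hypothetical analytic model: MAC count divided by parallelism (arbitrary units)."""
    macs, parallelism = cfg
    return macs / parallelism


def rbf(a, b, length=1.0, var=1.0):
    """Squared-exponential kernel on log-scaled configuration features."""
    d2 = sum((math.log(x) - math.log(y)) ** 2 for x, y in zip(a, b))
    return var * math.exp(-0.5 * d2 / length ** 2)


def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting (small systems only)."""
    n = len(b)
    M = [list(row) + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x


def gp_predict(train_x, train_y, x_star, noise=1e-2):
    """GP posterior mean whose prior mean is the analytic model.

    The GP is fit to the residuals y - analytic(x), so it only has to learn
    what the analytic approximation misses.
    """
    resid = [y - analytic_latency(x) for x, y in zip(train_x, train_y)]
    K = [[rbf(a, b) + (noise if i == j else 0.0) for j, b in enumerate(train_x)]
         for i, a in enumerate(train_x)]
    alpha = solve(K, resid)
    k_star = [rbf(x_star, a) for a in train_x]
    return analytic_latency(x_star) + sum(k * w for k, w in zip(k_star, alpha))


def analytic_only(train_x, train_y, x_star):
    """Baseline predictor that ignores runtime data entirely."""
    return analytic_latency(x_star)


def loo_mae(xs, ys, predict):
    """Leave-one-out mean absolute error of predict(train_x, train_y, x_star)."""
    errors = []
    for i in range(len(xs)):
        tx, ty = xs[:i] + xs[i + 1:], ys[:i] + ys[i + 1:]
        errors.append(abs(predict(tx, ty, xs[i]) - ys[i]))
    return sum(errors) / len(errors)


if __name__ == "__main__":
    # Synthetic "measurements": analytic estimate plus an overhead the model misses.
    configs = [(m, p) for m in (100.0, 200.0, 400.0) for p in (4.0, 8.0, 16.0)]
    measured = [analytic_latency(c) + 1.0 + 0.5 * math.log(c[1]) for c in configs]
    print("analytic LOO MAE:", loo_mae(configs, measured, analytic_only))
    print("GP LOO MAE:", loo_mae(configs, measured, gp_predict))
```

If the measurements match the analytic model exactly, the residuals are zero and the GP reduces to the analytic prediction. Any systematic overhead the model misses is absorbed by the GP. Because the data here are synthetic, the printed errors illustrate the mechanism only.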