Design Space Exploration of Accelerators for Warehouse Scale Computing
With Moore’s law grinding to a halt, accelerators are one of the ways new silicon can still improve performance, and they are already a key component of modern datacenters. Accelerators are integrated circuits that implement parts of an application with the goal of higher energy efficiency than execution on a standard general-purpose CPU. For any particular workload, many candidate accelerator designs exist, spanning a wide range of performance and costs such as area and power. Exploring these design choices, a process called Design Space Exploration (DSE), is a crucial step in finding the most efficient accelerator design: the one that yields the largest reduction in total cost of ownership.
This work aims to improve the design space exploration phase for accelerators and to help designers avoid its pitfalls. This dissertation supports the thesis that early design choices, including the level of specialization, are critical for accelerator development and therefore require benchmarks reflective of production workloads. We present three studies that support this thesis. First, we show how to benchmark datacenter applications by creating a benchmark for large video-sharing infrastructures. Then, we present two studies focused on accelerators for analytical query processing: the first analyzes the impact of Network-on-Chip specialization, while the second analyzes the impact of the level of specialization.
The first part of this dissertation introduces vbench: a video transcoding benchmark tailored to the growing video-as-a-service market. Video transcoding is not accurately represented in current computer architecture benchmarks such as SPEC or PARSEC. Despite posing a large computational burden for cloud video providers such as YouTube and Facebook, it is not included in cloud benchmarks such as CloudSuite. Using vbench, we found that the microarchitectural profile of video transcoding is highly dependent on the input video, that SIMD extensions provide limited benefits, and that commercial hardware transcoders impose tradeoffs that are not ideal for cloud video providers. Our benchmark should spur architectural innovations for this critical workload. This work shows how to benchmark a real-world warehouse-scale application and the pitfalls that a mischaracterization can cause.
When considering accelerators for the different, but no less important, application of analytical query processing, design space exploration plays a critical role. We analyzed the Q100, a class of accelerators for this application domain, using TPC-H as the reference benchmark. We found that not only do the hardware computational blocks have to be tailored to the requirements of the application, but the Network on Chip (NoC) can be specialized as well. We developed an algorithm that produces more effective Q100 designs by tailoring the NoC to the communication requirements of the system; the resulting designs are Pareto optimal compared to standard NoC topologies. This shows that NoC specialization is highly effective for accelerators and should be an integral part of design space exploration for large accelerator designs.
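The notion of Pareto optimality invoked here can be made concrete with a small sketch. This is not the dissertation's algorithm; it only illustrates how a set of candidate designs, each described by cost metrics where lower is better, is filtered down to its Pareto front. The design points are hypothetical, not Q100 data.

```python
def dominates(a, b):
    """True if design a is at least as good as b on every cost metric
    (lower is better) and strictly better on at least one."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))

def pareto_front(designs):
    """Keep only the designs that no other design dominates.
    Each design is a tuple of costs, e.g. (area_mm2, cycles_per_query)."""
    return [d for d in designs
            if not any(dominates(other, d) for other in designs if other != d)]

# Hypothetical NoC design points: (area in mm^2, cycles per query)
candidates = [(4.0, 120), (5.5, 90), (6.0, 95), (3.0, 200), (5.5, 110)]
print(pareto_front(candidates))  # → [(4.0, 120), (5.5, 90), (3.0, 200)]
```

A DSE loop would generate the candidate set (here hard-coded) and keep only the front, since every dominated design is strictly worse on some cost without being better on any other.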
The third part of this dissertation analyzes the impact of the level of specialization, e.g., an ASIC versus a Coarse-Grained Reconfigurable Architecture (CGRA) implementation, on accelerator performance. We developed a CGRA architecture capable of executing SQL query plans and compared it against Q100, an ASIC that targets the same class of workloads. Despite being less specialized, this programmable architecture shows performance comparable to the Q100 under the same area and power budget. Resource usage explains this counterintuitive result: a well-programmed, homogeneous array of resources can harness silicon more effectively for the workload at hand. This suggests that a balanced accelerator research portfolio must include alternative programmable architectures, and their software stacks.
Three Highly Parallel Computer Architectures and Their Suitability for Three Representative Artificial Intelligence Problems
Virtually all current Artificial Intelligence (AI) applications are designed to run on sequential (von Neumann) computer architectures. As a result, current systems do not scale up: as knowledge is added to these systems, a point is reached where their performance quickly degrades. The performance of a von Neumann machine is limited by the bandwidth between memory and processor (the von Neumann bottleneck). The bottleneck is avoided by distributing the processing power across the memory of the computer. In this scheme the memory becomes the processor (a "smart memory").
This paper highlights the relationship between three representative AI application domains, namely knowledge representation, rule-based expert systems, and vision, and their parallel hardware realizations. Three machines, covering a wide range of fundamental properties of parallel processors, namely module granularity, concurrency control, and communication geometry, are reviewed: the Connection Machine (a fine-grained SIMD hypercube), DADO (a medium-grained MIMD/SIMD/MSIMD tree machine), and the Butterfly (a coarse-grained MIMD butterfly-switch machine).
Spacelab Science Results Study
Beginning with OSTA-1 in November 1981 and ending with Neurolab in March 1998, a total of 36 Shuttle missions carried various Spacelab components such as the Spacelab module, pallet, instrument pointing system, or mission-peculiar experiment support structure. The experiments carried out during these flights included astrophysics, solar physics, plasma physics, atmospheric science, Earth observations, and a wide range of microgravity experiments in life sciences, biotechnology, materials science, and fluid physics, which includes combustion and critical point phenomena. In all, some 764 experiments were conducted by investigators from the U.S., Europe, and Japan. The purpose of this Spacelab Science Results Study is to document the contributions made in each of the major research areas by giving a brief synopsis of the more significant experiments and an extensive list of the publications they produced. We have also endeavored to show how these results impacted the existing body of knowledge, where they have spawned new fields, and, where appropriate, where the knowledge they produced has been applied.
Statistical Performance Evaluation, System Modeling, Distributed Computation, and Signal Pattern Matching for a Compton Medical Imaging System.
Radionuclide cancer therapy requires imaging radiotracers that concentrate in tumors and emit high energy charged particles that kill tumor cells. These tracers, such as 131I, generally emit high energy photons that need to be imaged to estimate tumor dose and changes in size during treatment.
This research describes the performance of a dual-planar silicon-based Compton imaging system and compares it to a conventional parallel-hole collimated Anger camera with a high-energy general-purpose (HEGP) lead collimator for imaging photons emitted from 131I.
The collimated Anger camera imposes a tradeoff between resolution and sensitivity due to the mechanical collimation. As the photon energy exceeds 364 keV, increased septal penetration and scattering further degrade imaging performance. Simulations of the Anger camera and the Compton imaging system demonstrate a 20-fold advantage in detection efficiency and higher spatial resolution for the Compton camera when detecting high-energy photons, since it decouples that tradeoff.
The system performance and comparison are analyzed using the modified uniform Cramér-Rao bound algorithms we developed, along with Monte Carlo calculations and system modeling. The bounds show that Doppler broadening is the limiting factor for Compton camera performance when imaging 364 keV photons. Performance of the two systems was compared and analyzed by simulating a 2D disk with uniform activity. When the two imaging systems detect the same number of events, the proposed Compton imaging system has lower image variance than the Anger camera with the HEGP collimator when the FWHM of the desired point-source response is less than 1.2 cm. This advantage was also demonstrated by imaging and reconstructing a 2D hot-spot phantom.
In addition to the performance analysis, a distributed Maximum Likelihood Expectation Maximization (MLEM) algorithm with a chessboard data partition was evaluated for speeding up image reconstruction for the Compton imaging system. A 1 x 64 distributed computing system sped up computation by about a factor of 22 compared to a single processor. Finally, a real-time signal processing and pattern matching system employing state-of-the-art digital electronics is described for solving the event pile-up problems caused by the high photon count rate in the second detector.
Ph.D., Biomedical Engineering, University of Michigan, Horace H. Rackham School of Graduate Studies
http://deepblue.lib.umich.edu/bitstream/2027.42/60851/1/lhan_1.pd
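The reconstruction algorithm referenced here is the standard ML-EM update for emission tomography; the distributed chessboard partition is the dissertation's contribution and is not reproduced here. A minimal serial sketch, using a toy 2x2 system matrix and hypothetical numbers, shows the multiplicative update that such a distributed scheme parallelizes:

```python
def mlem_step(A, y, lam):
    """One ML-EM update for emission tomography:
    lam_j <- (lam_j / s_j) * sum_i A[i][j] * y[i] / (A lam)[i],
    where s_j = sum_i A[i][j] is the sensitivity of voxel j."""
    n_det, n_vox = len(A), len(A[0])
    # Forward projection: expected counts in each detector bin.
    proj = [sum(A[i][j] * lam[j] for j in range(n_vox)) for i in range(n_det)]
    new = []
    for j in range(n_vox):
        s = sum(A[i][j] for i in range(n_det))  # voxel sensitivity
        back = sum(A[i][j] * y[i] / proj[i] for i in range(n_det) if proj[i] > 0)
        new.append(lam[j] * back / s if s > 0 else 0.0)
    return new

# Toy system: 2 detector bins, 2 voxels, noiseless data.
A = [[1.0, 0.5],
     [0.5, 1.0]]
true_lam = [2.0, 4.0]
y = [sum(A[i][j] * true_lam[j] for j in range(2)) for i in range(2)]
lam = [1.0, 1.0]  # flat initial estimate
for _ in range(200):
    lam = mlem_step(A, y, lam)
```

With noiseless data the iterates converge toward the true activity; a distributed implementation splits the voxel and event data across processors and exchanges partial forward/back projections each iteration, which is where the reported factor-of-22 speedup on 64 processors comes from.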
Abstracts on Radio Direction Finding (1899 - 1995)
The files on this record represent the various databases that originally composed the CD-ROM issue of "Abstracts on Radio Direction Finding" database, which is now part of the Dudley Knox Library's Abstracts and Selected Full Text Documents on Radio Direction Finding (1899 - 1995) Collection. (See Calhoun record https://calhoun.nps.edu/handle/10945/57364 for further information on this collection and the bibliography).
Due to issues of technological obsolescence preventing current and future audiences from accessing the bibliography, DKL exported the various databases contained on the CD-ROM and converted them into the three files on this record.
The contents of these files are:
1) RDFA_CompleteBibliography_xls.zip [RDFA_CompleteBibliography.xls: Metadata for the complete bibliography, in Excel 97-2003 Workbook format; RDFA_Glossary.xls: Glossary of terms, in Excel 97-2003 Workbook format; RDFA_Biographies.xls: Biographies of leading figures, in Excel 97-2003 Workbook format];
2) RDFA_CompleteBibliography_csv.zip [RDFA_CompleteBibliography.TXT: Metadata for the complete bibliography, in CSV format; RDFA_Glossary.TXT: Glossary of terms, in CSV format; RDFA_Biographies.TXT: Biographies of leading figures, in CSV format];
3) RDFA_CompleteBibliography.pdf: A human-readable display of the bibliographic data, as a means of double-checking any possible deviations due to conversion.
First Annual Workshop on Space Operations Automation and Robotics (SOAR 87)
Several topics related to automation and robotics technology are discussed. Automation of checkout, ground support, and logistics; automated software development; man-machine interfaces; neural networks; systems engineering and distributed/parallel processing architectures; and artificial intelligence/expert systems are among the topics covered.
Space station systems: A bibliography with indexes (supplement 9)
This bibliography lists 1,313 reports, articles, and other documents introduced into the NASA scientific and technical information system between January 1, 1989 and June 30, 1989. Its purpose is to provide helpful information to researchers, designers, and managers engaged in Space Station technology development and mission design. Coverage includes documents that define major systems and subsystems related to structures and dynamic control, electronics and power supplies, propulsion, and payload integration. In addition, orbital construction methods, servicing and support requirements, procedures and operations, and missions for the current and future Space Station are included.