Processor-In-Memory (PIM) Based Architectures for PetaFlops Potential Massively Parallel Processing
The report summarizes the work performed at the University of Notre Dame under a NASA grant from July 15, 1995 through July 14, 1996. Researchers involved in the work included the PI, Dr. Peter M. Kogge, and three graduate students under his direction in the Computer Science and Engineering Department: Stephen Dartt, Costin Iancu, and Lakshmi Narayanaswany. The report is organized as follows. Section 2 summarizes the problem addressed by this work. Section 3 summarizes the project's objectives and approach. Section 4 briefly summarizes PIM technology. Section 5 overviews the main results of the work. Section 6 then discusses the importance of the results and future directions. Also attached to this report are copies of several technical reports and publications whose contents directly reflect results developed during this study.
Custom-Enabled System Architectures for High End Computing
The US Federal Government has convened a major committee to determine future directions for government-sponsored high-end computing system acquisitions and enabling research. The High End Computing Revitalization Task Force was inaugurated in 2003, involving all Federal agencies for which high-end computing is critical to meeting mission goals. As part of the HECRTF agenda, a multi-day, community-wide workshop was conducted involving experts from academia, industry, and the national laboratories and centers to provide the broadest perspective on important issues within the HECRTF purview. Among the most critical issues in establishing future directions are the relative merits of commodity-based systems, such as clusters and MPPs, versus custom system architecture strategies. This paper presents a perspective on the importance and value of the custom architecture approach in meeting future US requirements in supercomputing. The contents of this paper reflect the ideas of the participants of the working group chartered to explore custom-enabled system architectures for high-end computing. As in any such consensus presentation, while this paper captures the key ideas and tradeoffs, it does not exactly match the viewpoint of any single contributor, and there remains much room for constructive disagreement and refinement of the essential conclusions.
Yearly update: exascale projections for 2013.
The HPC architectures of today are significantly different from those of a decade ago, with high odds that further changes will occur on the road to Exascale. This paper discusses the "perfect storm" in technology that produced this change, the classes of architectures we are dealing with, and probable trends in how they will evolve. These properties and trends are then evaluated in terms of what they likely mean for future Exascale systems and applications.
Application Performance of Physical System Simulations
Various parallel computer benchmarking projects have existed since the early 1990s, but the approaches adopted so far for performance analysis require significant revision in view of recent developments in both the application domain and computer technologies. This paper presents a novel performance evaluation methodology based on assessing the processing rate of two orthogonal use cases, dense and sparse physical systems, as well as the energy efficiency of both. Evaluation results with two popular codes, HPL and HPCG, validate our approach and demonstrate its use for analysis and interpretation: to identify and confirm current technological challenges, and to track and roadmap the future application performance of physical system simulations.
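As a rough illustration of the methodology described in this abstract, the sketch below combines a dense-system rate (HPL-style), a sparse-system rate (HPCG-style), and measured power into per-use-case processing-rate and energy-efficiency figures. The class name, fields, and sample numbers are illustrative assumptions, not the paper's actual API or data.

```python
# Minimal sketch, assuming measured sustained GFLOP/s for a dense (HPL)
# and a sparse (HPCG) run plus average system power for each run.
from dataclasses import dataclass

@dataclass
class BenchmarkRun:
    name: str           # e.g. "HPL" (dense) or "HPCG" (sparse)
    gflops: float       # sustained processing rate in GFLOP/s
    avg_power_w: float  # average system power during the run, in watts

    @property
    def energy_efficiency(self) -> float:
        """GFLOP/s per watt (equivalently, GFLOP per joule)."""
        return self.gflops / self.avg_power_w

def summarize(dense: BenchmarkRun, sparse: BenchmarkRun) -> None:
    for run in (dense, sparse):
        print(f"{run.name}: {run.gflops:.1f} GFLOP/s, "
              f"{run.energy_efficiency:.3f} GFLOP/s/W")
    # The dense/sparse rate ratio exposes the memory-bandwidth gap that
    # orthogonal use cases are intended to reveal.
    print(f"dense/sparse ratio: {dense.gflops / sparse.gflops:.1f}x")

# Hypothetical single-node numbers, for illustration only.
summarize(BenchmarkRun("HPL", 3000.0, 400.0),
          BenchmarkRun("HPCG", 60.0, 380.0))
```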
Towards Advantages of Parameterized Quantum Pulses
The advantages of quantum pulses over quantum gates have attracted increasing attention from researchers. Quantum pulses offer benefits such as flexibility, high fidelity, scalability, and real-time tuning. However, while there are established workflows and processes for evaluating the performance of quantum gates, there has been limited research on profiling parameterized pulses and providing guidance for pulse circuit design. To address this gap, our study proposes a set of design spaces for parameterized pulses, evaluating these pulses on metrics such as expressivity, entanglement capability, and effective parameter dimension. Using these design spaces, we demonstrate the advantages of parameterized pulses over gate circuits in both duration and performance, thereby enabling high-performance quantum computing. Our proposed design space for parameterized pulse circuits has shown promising results in quantum chemistry benchmarks.
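To make one of the metrics above concrete, the sketch below estimates entanglement capability using the Meyer-Wallach measure Q averaged over sampled states, a common way to quantify this metric. The random-state generator stands in for a parameterized pulse circuit and is purely an assumption; the paper's actual pulse-level ansätze and metric definitions may differ.

```python
# Meyer-Wallach measure: Q = 2 * (1 - mean single-qubit purity).
# Q = 0 for product states, Q = 1 for e.g. a Bell state.
import numpy as np

def meyer_wallach_q(psi: np.ndarray, n_qubits: int) -> float:
    """Q for a pure state vector of n_qubits qubits."""
    psi_t = psi.reshape([2] * n_qubits)
    purities = []
    for k in range(n_qubits):
        m = np.moveaxis(psi_t, k, 0).reshape(2, -1)  # qubit k vs the rest
        rho_k = m @ m.conj().T                       # reduced density matrix
        purities.append(np.trace(rho_k @ rho_k).real)
    return 2.0 * (1.0 - float(np.mean(purities)))

def random_state(n_qubits: int, rng: np.random.Generator) -> np.ndarray:
    """Random pure state, a placeholder for a parameterized pulse circuit."""
    v = rng.normal(size=2**n_qubits) + 1j * rng.normal(size=2**n_qubits)
    return v / np.linalg.norm(v)

rng = np.random.default_rng(0)
samples = [meyer_wallach_q(random_state(2, rng), 2) for _ in range(200)]
print(f"estimated entanglement capability: {np.mean(samples):.3f}")
```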
Polygonal path simplification with angle constraints
We present efficient geometric algorithms for simplifying polygonal paths in R^2 and R^3 that have angle constraints, improving by nearly a linear factor over the graph-theoretic solutions based on known techniques. The algorithms we present match the time bounds of their unconstrained counterparts. As a key step in our solutions, we formulate and solve an off-line ball exclusion search problem, which may be of interest in its own right.
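For orientation, here is a naive sketch of the problem setting in R^2, under one plausible reading of the angle constraint: a shortcut from vertex i to vertex j is accepted only if its direction deviates from every original edge direction it replaces by at most alpha. This greedy scan is quadratic and is emphatically not the paper's near-linear-time algorithm; it only illustrates what a feasible angle-constrained simplification looks like.

```python
# Illustrative greedy angle-constrained simplification (assumed formulation).
import math

Point = tuple[float, float]

def direction(p: Point, q: Point) -> float:
    return math.atan2(q[1] - p[1], q[0] - p[0])

def angle_diff(a: float, b: float) -> float:
    """Absolute angular difference, wrapped to [0, pi]."""
    d = abs(a - b) % (2 * math.pi)
    return min(d, 2 * math.pi - d)

def simplify(path: list[Point], alpha: float) -> list[Point]:
    """Greedily take the longest shortcut whose direction stays within
    alpha of every original edge direction it replaces."""
    kept = [path[0]]
    i = 0
    while i < len(path) - 1:
        j = i + 1
        for k in range(len(path) - 1, i + 1, -1):  # longest shortcut first
            d = direction(path[i], path[k])
            if all(angle_diff(d, direction(path[t], path[t + 1])) <= alpha
                   for t in range(i, k)):
                j = k
                break
        kept.append(path[j])
        i = j
    return kept

zigzag = [(0, 0), (1, 0.1), (2, -0.1), (3, 0.05), (4, 0)]
print(simplify(zigzag, alpha=math.radians(10)))
```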
Exascale Research: Preparing for the Post-Moore Era
(i) Achieving exascale performance at the end of this decade or the beginning of the next is essential for progress in science, including progress on problems of major societal impact (such as weather and environmental impact); essential for the continued certification of the nuclear stockpile; and essential to our national security.
(ii) The rate of advance in the performance of CMOS technology is slowing down and is likely to plateau in the middle of the next decade. No alternative technology is ready for deployment.
(iii) Therefore, achieving exascale performance in 20 years may not be significantly cheaper than achieving it in 10 years – even if we could afford the wait.
(iv) It is essential (for continued progress on major societal problems, the nuclear stockpile, and national security) to have sustained growth in supercomputer performance and a sustained advantage over competitors and potential enemies.
(v) To achieve this continued growth, we need research on (a) using CMOS more efficiently and (b) accelerating the development and deployment of a CMOS replacement.
(vi) (a) is (or should be) the focus of exascale research: how to get significantly higher compute efficiencies from a fixed transistor or energy budget. (b) is essential to explore, even if not for exascale in 10 years, as it will be necessary to continue beyond exascale.
Migratory Memory-Side Processing Breakthrough Architecture for Graph Analytics
Presented on November 2, 2018 at 1:45 p.m. the Klaus Advanced Computing Building, Room 1116 East/West, Georgia Institute of Technology (Georgia Tech).Second Annual Center for Research into Novel Computing Hierarchies (CRNCH) Summit, November 2, 2018 at Georgia Tech.Keynote Speaker - Dr. Peter Kogge is a Chaired Professor in Notre Dame's Department of Computer Science and Engineering. Peter is an IBM Fellow and the 2012 Seymour Cray Award winner among other awards. Prior to academia, he spent 26 yrs. with IBM Federal. Peter's undergraduate degree is from Notre Dame and he has a Ph.D. from Stanford in Electrical Engineering.Runtime: 40:52 minutesToday's data intensive applications, such as sparse-matrix linear algebra and graph analytics, do not exhibit the same locality traits as compute-intensive applications, resulting in the latency of individual memory accesses overwhelming the advantages of deeply pipe-lined fast cores. The Emu Migratory Memory-Side Processing architecture provides a highly efficient, fine-grained memory system and migrating threads that move the thread state as new memory locations are accessed, without explicit program directives. The "put-only" communication model dramatically reduces thread latency and total network bandwidth load as return trips and cache coherency are eliminated. Working with the CRNCH team, Emu shares results that validate the viability of the architecture, presents a roadmap for scalability and discusses how the architecture delivers orders of magnitude reduction in data movement, inter-process communication and energy requirements. The talk touches on the familiar programming
model selected for the architecture which makes it accessible to programmers and data scientists, and reveals upcoming areas of joint research with Georgia Tech in the area of Migratory Threads
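To convey why migrating threads cut message traffic, here is a toy model (my own illustration, not Emu's implementation): a conventional core pays a request-plus-reply round trip for every off-node read, while a migratory thread hops one-way to whichever node owns the next address. All names and the block-distribution scheme are assumptions made for the sketch; real migration messages also carry thread state, so per-message costs differ in practice.

```python
# Toy message-count comparison for a pointer-chasing access pattern.
import random

NODES = 8
WORDS_PER_NODE = 1024

def owner(addr: int) -> int:
    """Node that owns a global address (simple block distribution)."""
    return addr // WORDS_PER_NODE

def remote_read_cost(addrs: list[int], home: int) -> int:
    """Conventional model: each off-node access is a request + reply."""
    return sum(2 for a in addrs if owner(a) != home)

def migration_cost(addrs: list[int], start: int) -> int:
    """Migratory model: the thread hops one-way whenever the next
    address lives on a different node; no replies are sent."""
    here, msgs = start, 0
    for a in addrs:
        if owner(a) != here:
            msgs += 1          # one-way migration carrying thread state
            here = owner(a)
    return msgs

rng = random.Random(42)
chain = [rng.randrange(NODES * WORDS_PER_NODE) for _ in range(10_000)]
print("remote reads:", remote_read_cost(chain, home=0), "messages")
print("migrations:  ", migration_cost(chain, start=0), "messages")
```

On a uniformly random chain the migratory model sends roughly half as many messages, and the gap widens when consecutive accesses cluster on the same node, which is the locality pattern graph analytics workloads tend to exhibit.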