85 research outputs found

    Processor-In-Memory (PIM) Based Architectures for PetaFlops Potential Massively Parallel Processing

    The report summarizes the work performed at the University of Notre Dame under a NASA grant from July 15, 1995 through July 14, 1996. Researchers involved in the work included the PI, Dr. Peter M. Kogge, and three graduate students under his direction in the Computer Science and Engineering Department: Stephen Dartt, Costin Iancu, and Lakshmi Narayanaswany. The organization of this report is as follows. Section 2 is a summary of the problem addressed by this work. Section 3 is a summary of the project's objectives and approach. Section 4 summarizes PIM technology briefly. Section 5 overviews the main results of the work. Section 6 then discusses the importance of the results and future directions. Also attached to this report are copies of several technical reports and publications whose contents directly reflect results developed during this study.

    Custom-Enabled System Architectures for High End Computing

    The US Federal Government has convened a major committee to determine future directions for government-sponsored high end computing system acquisitions and enabling research. The High End Computing Revitalization Task Force was inaugurated in 2003, involving all Federal agencies for which high end computing is critical to meeting mission goals. As part of the HECRTF agenda, a multi-day, community-wide workshop was conducted involving experts from academia, industry, and the national laboratories and centers to provide the broadest perspective on important issues related to the HECRTF purview. Among the most critical issues in establishing future directions are the relative merits of commodity-based systems, such as clusters and MPPs, versus custom system architecture strategies. This paper presents a perspective on the importance and value of the custom architecture approach in meeting future US requirements in supercomputing. The contents of this paper reflect the ideas of the participants of the working group chartered to explore custom-enabled system architectures for high end computing. As in any such consensus presentation, while this paper captures the key ideas and tradeoffs, it does not exactly match the viewpoint of any single contributor, and there remains much room for constructive disagreement and refinement of the essential conclusions.

    Yearly update: exascale projections for 2013.

    The HPC architectures of today are significantly different from those of a decade ago, with high odds that further changes will occur on the road to Exascale. This paper discusses the "perfect storm" in technology that produced this change, the classes of architectures we are dealing with, and probable trends in how they will evolve. These properties and trends are then evaluated in terms of what they likely mean for future Exascale systems and applications.

    Application Performance of Physical System Simulations

    Various parallel computer benchmarking projects have been around since the early 1990s, but the approaches adopted so far for performance analysis require a significant revision in view of recent developments in both the application domain and computer technologies. This paper presents a novel performance evaluation methodology based on assessing the processing rate of two orthogonal use cases – dense and sparse physical systems – as well as the energy efficiency of both. Evaluation results with two popular codes — HPL and HPCG — validate our approach and demonstrate its use for analysis and interpretation, in order to identify and confirm current technological challenges as well as to track and roadmap the future application performance of physical system simulations.
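    The two headline metrics the abstract names — processing rate and energy efficiency — can be illustrated with a minimal sketch. The function names and all numeric inputs below are hypothetical, not taken from the paper; HPL-like (dense) and HPCG-like (sparse) runs would each report a floating-point operation count, a wall-clock time, and an energy measurement.

    ```python
    # Hypothetical sketch: turning raw benchmark measurements into the two
    # metrics named in the abstract. Numbers are invented for illustration.

    def processing_rate(flops: float, seconds: float) -> float:
        """Sustained processing rate in GFLOP/s."""
        return flops / seconds / 1e9

    def energy_efficiency(flops: float, joules: float) -> float:
        """Energy efficiency in GFLOP/J (equivalently, GFLOP/s per watt)."""
        return flops / joules / 1e9

    # Dense (HPL-like) run: 2 PFLOP of work in 1000 s using 500 kJ.
    dense_rate = processing_rate(flops=2.0e15, seconds=1000.0)  # GFLOP/s
    dense_eff = energy_efficiency(flops=2.0e15, joules=5.0e5)   # GFLOP/J

    # Sparse (HPCG-like) run: far less work sustained over the same window.
    sparse_rate = processing_rate(flops=5.0e13, seconds=1000.0)
    sparse_eff = energy_efficiency(flops=5.0e13, joules=5.0e5)
    ```

    Reporting the dense and sparse figures side by side is what makes the two use cases "orthogonal": the gap between them exposes how memory-bound a machine is relative to its peak arithmetic capability.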

    Towards Advantages of Parameterized Quantum Pulses

    The advantages of quantum pulses over quantum gates have attracted increasing attention from researchers. Quantum pulses offer benefits such as flexibility, high fidelity, scalability, and real-time tuning. However, while there are established workflows and processes to evaluate the performance of quantum gates, there has been limited research on profiling parameterized pulses and providing guidance for pulse circuit design. To address this gap, our study proposes a set of design spaces for parameterized pulses, evaluating these pulses based on metrics such as expressivity, entanglement capability, and effective parameter dimension. Using these design spaces, we demonstrate the advantages of parameterized pulses over gate circuits in terms of both duration and performance, thus enabling high-performance quantum computing. Our proposed design space for parameterized pulse circuits has shown promising results in quantum chemistry benchmarks.
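    As a toy illustration of one metric the abstract names — effective parameter dimension — it can be estimated as the numerical rank of the Jacobian of the parameterized state with respect to its parameters. This sketch is not the paper's implementation; the single-qubit ansatz and all function names below are invented for illustration.

    ```python
    # Hypothetical sketch: effective parameter dimension of a parameterized
    # state, estimated as the numerical rank of the finite-difference Jacobian.
    import numpy as np

    def toy_state(params):
        """Toy single-qubit ansatz |psi(theta, phi)> (invented, not from the paper)."""
        theta, phi = params
        return np.array([np.cos(theta / 2),
                         np.exp(1j * phi) * np.sin(theta / 2)])

    def effective_dimension(state_fn, params, eps=1e-6):
        """Rank of d|psi>/d(params), computed by forward finite differences."""
        p = np.asarray(params, dtype=float)
        base = state_fn(p)
        cols = []
        for i in range(len(p)):
            dp = p.copy()
            dp[i] += eps
            cols.append((state_fn(dp) - base) / eps)
        # Stack real and imaginary parts so the rank is taken over the reals.
        jac = np.column_stack([np.concatenate([c.real, c.imag]) for c in cols])
        return int(np.linalg.matrix_rank(jac, tol=1e-4))
    ```

    At a generic parameter point both directions change the state independently, so the effective dimension equals the parameter count; at theta = 0 the phase phi has no effect and the effective dimension drops, which is the kind of redundancy this metric is designed to expose.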

    Polygonal path simplification with angle constraints

    We present efficient geometric algorithms for simplifying polygonal paths in R^2 and R^3 that have angle constraints, improving by nearly a linear factor over the graph-theoretic solutions based on known techniques. The algorithms we present match the time bounds for their unconstrained counterparts. As a key step in our solutions, we formulate and solve an off-line ball exclusion search problem, which may be of interest in its own right.
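    The angle constraint itself is easy to state concretely: at every interior vertex of the simplified path, the turning angle must not exceed a given bound. A minimal 2D sketch of that feasibility check (the paper's algorithms are far more sophisticated, and the function names here are invented):

    ```python
    # Hypothetical sketch: checking whether a simplified polygonal path in R^2
    # satisfies a maximum-turning-angle constraint at each interior vertex.
    import math

    def turn_angle(a, b, c):
        """Turning angle at b along the path a -> b -> c, in radians."""
        v1 = (b[0] - a[0], b[1] - a[1])
        v2 = (c[0] - b[0], c[1] - b[1])
        dot = v1[0] * v2[0] + v1[1] * v2[1]
        norm = math.hypot(*v1) * math.hypot(*v2)
        # Clamp to guard against floating-point drift outside [-1, 1].
        cos_angle = max(-1.0, min(1.0, dot / norm))
        return math.acos(cos_angle)

    def satisfies_angle_constraint(path, max_turn):
        """True iff every interior vertex turns by at most max_turn radians."""
        return all(turn_angle(path[i - 1], path[i], path[i + 1]) <= max_turn
                   for i in range(1, len(path) - 1))
    ```

    A simplification algorithm under this constraint must only keep shortcuts whose resulting path passes such a check, which is what couples vertex removal to the geometry of the remaining vertices.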

    Exascale Research: Preparing for the Post-Moore Era

    (i) Achieving exascale performance at the end of this decade or the beginning of the next is essential for progress in science – including progress on problems of major societal impact (such as weather or environmental impact); essential for the continued certification of the nuclear stockpile; and essential to our national security. (ii) The rate of advance in the performance of CMOS technology is slowing down and is likely to plateau in the middle of the next decade. No alternative technology is ready for deployment. (iii) Therefore, achieving exascale performance in 20 years may not be significantly cheaper than achieving it in 10 years – even if we could afford the wait. (iv) It is essential (for continued progress in solving major societal problems, the nuclear stockpile, and security) to have sustained growth in supercomputer performance and a sustained advantage over competitors and potential enemies. (v) To achieve this continued growth, we need research on (a) using CMOS more efficiently and (b) accelerating the development and deployment of a CMOS replacement. (vi) (a) is (or should be) the focus of exascale research: how to get significantly higher compute efficiencies from a fixed transistor or energy budget. (b) is essential to explore, even if not for exascale in 10 years, as it will be necessary to continue beyond exascale. Unpublished; not peer reviewed.

    Migratory Memory-Side Processing Breakthrough Architecture for Graph Analytics

    Presented on November 2, 2018 at 1:45 p.m. in the Klaus Advanced Computing Building, Room 1116 East/West, Georgia Institute of Technology (Georgia Tech), as part of the Second Annual Center for Research into Novel Computing Hierarchies (CRNCH) Summit. Keynote speaker Dr. Peter Kogge is a Chaired Professor in Notre Dame's Department of Computer Science and Engineering. Peter is an IBM Fellow and the 2012 Seymour Cray Award winner, among other awards. Prior to academia, he spent 26 years with IBM Federal. His undergraduate degree is from Notre Dame and he has a Ph.D. in Electrical Engineering from Stanford. Runtime: 40:52 minutes. Today's data-intensive applications, such as sparse-matrix linear algebra and graph analytics, do not exhibit the same locality traits as compute-intensive applications, resulting in the latency of individual memory accesses overwhelming the advantages of deeply pipelined fast cores. The Emu Migratory Memory-Side Processing architecture provides a highly efficient, fine-grained memory system and migrating threads that move the thread state as new memory locations are accessed, without explicit program directives. The "put-only" communication model dramatically reduces thread latency and total network bandwidth load, as return trips and cache coherency are eliminated. Working with the CRNCH team, Emu shares results that validate the viability of the architecture, presents a roadmap for scalability, and discusses how the architecture delivers orders-of-magnitude reductions in data movement, inter-process communication, and energy requirements. The talk touches on the familiar programming model selected for the architecture, which makes it accessible to programmers and data scientists, and reveals upcoming areas of joint research with Georgia Tech in the area of Migratory Threads.