10 research outputs found

The future of computing beyond Moore's Law.

Author: Shalf John
Publication venue: eScholarship, University of California
Publication date: 01/03/2020

Moore's Law is a techno-economic model that has enabled the information technology industry to double the performance and functionality of digital electronics roughly every 2 years within a fixed cost, power and area. Advances in silicon lithography have enabled this exponential miniaturization of electronics, but, as transistors reach atomic scale and fabrication costs continue to rise, the classical technological driver that has underpinned Moore's Law for 50 years is failing and is anticipated to flatten by 2025. This article provides an updated view of what a post-exascale system will look like and the challenges ahead, based on our most recent understanding of technology roadmaps. It also discusses the tapering of historical improvements and how it affects the options available to continue scaling successors to the first exascale machine. Lastly, this article covers the many different opportunities and strategies available to continue computing performance improvements in the absence of historical technology drivers. This article is part of a discussion meeting issue 'Numerical algorithms for high-performance computational science'.
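As a rough illustration of the doubling rule in this abstract, the sketch below computes the idealized improvement factor for a fixed doubling period. The function name `doubling_factor` is hypothetical, and an exact 2-year period is an assumption; the abstract only says "roughly every 2 years".

```python
def doubling_factor(years, doubling_period_years=2.0):
    """Idealized improvement factor after `years` of steady doubling.

    Assumes an exact, constant doubling period, which real technology
    roadmaps only approximate.
    """
    return 2.0 ** (years / doubling_period_years)


# 50 years of doubling every 2 years is 25 doublings.
print(f"{doubling_factor(50):,.0f}")  # 33,554,432
```

Under this idealization, the 50 years mentioned in the abstract correspond to a factor of about 33 million.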

Ezid

eScholarship - University of California

Modelling and Simulation for Power Distribution Grids of 3D Tiled Computing Arrays

Author: Thuphairo Pakon
Publication date: 01/04/2023

This thesis presents modelling and simulation developments for the power distribution grids of 3D tiled computing arrays (TCAs), a novel paradigm for HPC systems, and tests the feasibility of such systems in HPC domains. Exploring a complex power grid such as that of the TCA concept requires detailed simulation of systems with hundreds and possibly thousands of modular nodes, each contributing to the collective behaviour of the system. Power, voltage, and current behaviours are critically important observations. To facilitate this investigation and to test the hypothesis that such systems can scale, a bespoke simulation platform has been developed and, importantly, validated against hardware prototypes of small systems. A number of systems are simulated, including arrays of 'balls': collections of modular tiles that form a ball-like modular unit and can themselves be tiled into large-scale systems. Evaluations typically involved simulation of cubic arrays ranging from 2x2x2 balls up to 10x10x10. Larger systems require extended simulation times, so models are developed to extrapolate system behaviour to larger systems and to gauge the ultimate scalability of such TCA systems. It is found that systems of 40x40x40 are quite feasible with appropriate configurations. Data connectivity is explored to a lesser degree, but comparisons were made between TCA systems and well-known comparable HPC systems, and it is concluded that TCA systems can be built with comparable data flow and scalability, and that the electrical and engineering challenges associated with the novelty of 3D tiled systems can be met with practical solutions.

White Rose E-theses Online

Horizons of modern molecular dynamics simulation in digitalized solid freeform fabrication with advanced materials

Author: Bizarri Gregory
Goel Gaurav
Goel Saurav
Knaggs Michael
Kumar Vinod
Matthews Allan
Murphy Adrian
Stukowski Alexander
Thakur Vijay Kumar
Tiwari Ashutosh
Upadhyaya Hari M.
Zhou Xiaowang W.
Publication venue: Elsevier BV
Publication date: 22/09/2020

Our ability to shape and finish a component by combined methods of fabrication, including (but not limited to) subtractive methods, additive methods, and methods with no theoretical mass loss or addition, is now popularly known as solid freeform fabrication (SFF). Fabrication of a telescope mirror is a typical example: grinding and polishing processes are first applied to shape the mirror, and an optical coating is then usually applied to enhance its optical performance. The area of nanomanufacturing cannot grow without a deep knowledge of the fundamentals of materials, and consequently the use of computer simulations is becoming ubiquitous. This article is intended to highlight the most recent computational advances specific to the area of precision SFF as these systems move through digitalization and Industry 4.0. Specifically, this article demonstrates that the application of the latest materials modelling approaches, based on techniques such as molecular dynamics, is enabling breakthroughs in applied precision manufacturing techniques.

Queen's University Belfast Research Portal

Cranfield CERES

The University of Manchester - Institutional Repository

White Rose Research Online

SRUC - Scotland's Rural College

A stable learning algorithm for Bayesian deep neural networks aimed at low-power execution on dedicated processors

Author: Nishida Keigo (西田圭吾)

Osaka University Knowledge Archive

Collective analog bioelectronic computation

Author: Mandal Soumyajit, 1979-
Publication venue: Massachusetts Institute of Technology
Publication date: 01/01/2009

Thesis (Ph. D.)--Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 2009. This electronic version was submitted by the student author. The certified thesis is available in the Institute Archives and Special Collections. Cataloged from student submitted PDF version of thesis. Includes bibliographical references (p. 677-710).
In this thesis, I present two examples of fast-and-highly-parallel analog computation inspired by architectures in biology. The first example, an RF cochlea, maps the partial differential equations that describe fluid-membrane-hair-cell wave propagation in the biological cochlea to an equivalent inductor-capacitor-transistor integrated circuit. It allows ultra-broadband spectrum analysis of RF signals to be performed in a rapid low-power fashion, thus enabling applications for universal or software radio. The second example exploits detailed similarities between the equations that describe chemical-reaction dynamics and the equations that describe subthreshold current flow in transistors to create fast-and-highly-parallel integrated-circuit models of protein-protein and gene-protein networks inside a cell. Due to a natural mapping between the Poisson statistics of molecular flows in a chemical reaction and Poisson statistics of electronic current flow in a transistor, stochastic effects are automatically incorporated into the circuit architecture, allowing highly computationally intensive stochastic simulations of large-scale biochemical reaction networks to be performed rapidly. I show that the exponentially tapered transmission-line architecture of the mammalian cochlea performs constant-fractional-bandwidth spectrum analysis with O(N) expenditure of both analysis time and hardware, where N is the number of analyzed frequency bins. This is the best known performance of any spectrum-analysis architecture, including the constant-resolution Fast Fourier Transform (FFT), which scales as O(N log N), or a constant-fractional-bandwidth filterbank, which scales as O(N^2). The RF cochlea uses this bio-inspired architecture to perform real-time, on-chip spectrum analysis at radio frequencies. I demonstrate two cochlea chips, implemented in standard 0.13 µm CMOS technology, that decompose the RF spectrum from 600 MHz to 8 GHz into 50 log-spaced channels, consume < 300 mW of power, and possess 70 dB of dynamic range. The real-time spectrum analysis capabilities of my chips make them uniquely suitable for ultra-broadband universal or software radio receivers of the future. I show that the protein-protein and gene-protein chips that I have built are particularly suitable for simulation, parameter discovery and sensitivity analysis of interaction networks in cell biology, such as signaling, metabolic, and gene regulation pathways. Importantly, the chips carry out massively parallel computations, resulting in simulation times that are independent of model complexity, i.e., O(1). They also automatically model stochastic effects, which are of importance in many biological systems, but are numerically stiff and simulate slowly on digital computers. Currently, non-fundamental data-acquisition limitations show that my proof-of-concept chips simulate small-scale biochemical reaction networks at least 100 times faster than modern desktop machines. It should be possible to get 10^3 to 10^6 simulation speedups of genome-scale and organ-scale intracellular and extracellular biochemical reaction networks with improved versions of my chips. Such chips could be important both as analysis tools in systems biology and design tools in synthetic biology. by Soumyajit Mandal. Ph.D.
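To make the three growth rates quoted in this abstract concrete, the sketch below evaluates the leading terms N, N log N and N^2 for a given number of frequency bins. The function name `scaling_costs` is hypothetical, constant factors are ignored, and the base-2 logarithm is an assumption, so the numbers compare growth trends only and are not measurements of any chip or algorithm.

```python
import math


def scaling_costs(n):
    """Leading-order cost terms for n frequency bins (constant factors ignored)."""
    return {
        "cochlea": n,               # O(N), as stated for the cochlear architecture
        "fft": n * math.log2(n),    # O(N log N), log base 2 assumed
        "filterbank": n ** 2,       # O(N^2)
    }


for bins in (50, 500, 5000):
    print(bins, scaling_costs(bins))
```

The gap between the linear term and the other two widens as the number of bins grows, which is the point the abstract makes about the cochlear architecture.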

DSpace@MIT

Research on the fundamental properties and applications of organosilane self-assembled monolayers

Author: Yamamoto Hideaki
Publication date: 01/01/2009

System: New; Report number: Kō No. 2813; Degree type: Doctor of Engineering; Date conferred: 2009/3/15; Waseda University diploma number: New 503

Waseda University Repository

Complexity, Emergent Systems and Complex Biological Systems: Complex Systems Theory and Biodynamics. [Edited book by I.C. Baianu, with listed contributors (2011)]

Author: Baianu Prof. Dr I.C.
Publication venue: PediaPress: Mainz, Germany
Publication date: 03/03/2011

An overview is presented of system dynamics (the study of the behaviour of complex systems), dynamical systems in mathematics, dynamic programming in computer science and control theory, complex systems biology, neurodynamics and psychodynamics.

CogPrints Cognitive Sciences Eprint Archive

Structured parallelism discovery with hybrid static-dynamic analysis and evaluation technique

Author: Vasiladiotis Christos
Publication venue: The University of Edinburgh
Publication date: 17/01/2023

Parallel computer architectures have dominated the computing landscape for the past two decades; a trend that is only expected to continue and intensify, with increasing specialization and heterogeneity. This creates huge pressure across the software stack to produce programming languages, libraries, frameworks and tools which will efficiently exploit the capabilities of parallel computers, not only for new software, but also revitalizing existing sequential code. Automatic parallelization, despite decades of research, has had limited success in transforming sequential software to take advantage of efficient parallel execution. This thesis investigates three approaches that use commutativity analysis as the enabler for parallelization. This has the potential to overcome limitations of traditional techniques. We introduce the concept of liveness-based commutativity for sequential loops. We examine the use of a practical analysis utilizing liveness-based commutativity in a symbolic execution framework. Symbolic execution represents input values as groups of constraints, consequently deriving the output as a function of the input and enabling the identification of further program properties. We employ this feature to develop an analysis and discern commutativity properties between loop iterations. We study the application of this approach on loops taken from real-world programs in the OLDEN and NAS Parallel Benchmark (NPB) suites, and identify its limitations and related overheads. Informed by these findings, we develop Dynamic Commutativity Analysis (DCA), a new technique that leverages profiling information from program execution with specific input sets. Using profiling information, we track liveness information and detect loop commutativity by examining the code’s live-out values. We evaluate DCA against almost 1400 loops of the NPB suite, discovering 86% of them as parallelizable. 
Comparing our results against dependence-based methods, we match the detection efficacy of two dynamic approaches and outperform three static ones. Additionally, DCA is able to automatically detect parallelism in loops which iterate over Pointer-Linked Data Structures (PLDSs), taken from a wide range of benchmarks used in the literature, where all other techniques we considered failed. Parallelizing the discovered loops, our methodology achieves an average speedup of 3.6× across NPB (and up to 55×) and up to 36.9× for the PLDS-based loops on a 72-core host. We also demonstrate that our methodology, despite relying on specific input values for profiling each program, is able to correctly identify parallelism that is valid for all potential input sets. Lastly, we develop a methodology to utilize liveness-based commutativity, as implemented in DCA, to detect latent loop parallelism in the shape of patterns. Our approach applies a series of transformations which subsequently enable multiple applications of DCA over the generated multi-loop code section and match its loop commutativity outcomes against the expected criteria for each pattern. Applying our methodology on sets of sequential loops, we are able to identify well-known parallel patterns (i.e., maps, reductions and scans). This extends the scope of parallelism detection to loops, such as those performing scan operations, which cannot be determined as parallelizable by simply evaluating liveness-based commutativity conditions on their original form.
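The idea of judging loop commutativity from live-out values can be illustrated with a small test-based sketch. This is not the thesis's DCA implementation: the helper `looks_commutative`, the dictionary-based loop state, and the two example loop bodies are all hypothetical, and the check is a heuristic on a single input, not a proof.

```python
import copy
import random


def looks_commutative(body, initial_state, iterations, live_out, trials=20, seed=0):
    """Return True if every tested iteration order yields the same live-out values.

    This is a test-based heuristic on one input, not a proof of commutativity.
    """
    def run(order):
        state = copy.deepcopy(initial_state)  # each run starts from the same state
        for i in order:
            body(state, i)
        return {name: state[name] for name in live_out}

    original = list(iterations)
    reference = run(original)

    # Always try the reversed order, then a few random shuffles.
    orders = [original[::-1]]
    rng = random.Random(seed)
    for _ in range(trials):
        shuffled = original[:]
        rng.shuffle(shuffled)
        orders.append(shuffled)

    return all(run(order) == reference for order in orders)


def sum_body(state, i):
    # Reduction: the final total does not depend on iteration order.
    state["total"] += state["data"][i]


def scan_body(state, i):
    # Prefix sum: each output depends on which iterations ran before it.
    state["acc"] += state["data"][i]
    state["out"][i] = state["acc"]


data = [1, 2, 3, 4]
sum_ok = looks_commutative(sum_body, {"data": data, "total": 0}, range(4), ["total"])
scan_ok = looks_commutative(
    scan_body, {"data": data, "acc": 0, "out": [0] * 4}, range(4), ["out"]
)
print(sum_ok, scan_ok)  # True False
```

The reduction passes the check, while the scan in its original form does not, which is consistent with the abstract's remark that scan loops need additional transformations before commutativity-based detection applies.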

Edinburgh Research Archive

Assessment of Molecular Modeling & Simulation

Publication venue: Office of Scientific and Technical Information (OSTI)

Crossref