Processor allocation strategies for modified hypercubes
Parallel processing has been widely accepted as the future of high-speed computing. Among the various parallel architectures proposed or implemented, the hypercube has shown great promise because of its powerful properties: regular topology, fault tolerance, low diameter, simple routing, and the ability to efficiently emulate other architectures. The major drawback of the hypercube network is that it cannot be expanded in practice, because the number of communication ports per processor grows as the logarithm of the total number of processors in the system. Therefore, once a hypercube supercomputer of a certain dimensionality has been built, any future expansion can be accomplished only by replacing the VLSI chips. This is an undesirable feature, and much work has been in progress to eliminate this obstacle and thus provide a platform for easier expansion.
Modified hypercubes (MHs) have been proposed as the building blocks of hypercube-based systems supporting incremental growth techniques without introducing extra resources for individual hypercubes.
However, processor allocation on MHs proves to be a challenge because their topology deviates slightly from that of the standard hypercube network. This thesis addresses the issue of processor allocation on MHs and proposes various strategies based, partially or entirely, on table look-up approaches. A study of the various task allocation strategies for standard hypercubes is conducted and their suitability for MHs is evaluated. Existing processor allocation strategies for pure hypercube networks are shown to be ineffective for MHs, in light of their inability to recognize all available subcubes, whereas the proposed strategies have perfect subcube recognition ability and superior performance. A comparative analysis involving the buddy strategy and the new strategies is carried out using simulation results.
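The buddy strategy used as the comparison baseline can be sketched for a standard hypercube as follows (a minimal illustration, not the thesis's implementation; class and method names are hypothetical). It satisfies a request for a k-dimensional subcube with the first free, aligned block of 2^k processors, which is exactly why it misses the unaligned subcubes that table look-up strategies can recognize:

```python
class BuddyAllocator:
    """Buddy processor allocation for a standard hypercube Q_n.

    A busy bit per processor tracks the 2**n nodes; a request for a
    k-dimensional subcube is satisfied by the first free block of
    2**k processors aligned at a multiple of 2**k.  The strategy
    therefore recognises only the aligned subcubes, not all of them.
    """

    def __init__(self, n):
        self.n = n
        self.busy = [False] * (2 ** n)

    def allocate(self, k):
        """Return the base address of a free k-subcube, or None."""
        size = 2 ** k
        for base in range(0, 2 ** self.n, size):
            if not any(self.busy[base:base + size]):
                for p in range(base, base + size):
                    self.busy[p] = True
                return base
        return None  # request blocked

    def release(self, base, k):
        """Free the k-subcube previously allocated at `base`."""
        for p in range(base, base + 2 ** k):
            self.busy[p] = False
```

For example, on a 3-cube, allocating a 1-cube takes processors 0-1, so a subsequent 2-cube request must skip the partially busy block 0-3 and land on 4-7.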
Predictive Scale-Bridging Simulations through Active Learning
Throughout computational science, there is a growing need to utilize the
continual improvements in raw computational horsepower to achieve greater
physical fidelity through scale-bridging over brute-force increases in the
number of mesh elements. For instance, quantitative predictions of transport in
nanoporous media, critical to hydrocarbon extraction from tight shale
formations, are impossible without accounting for molecular-level interactions.
Similarly, inertial confinement fusion simulations rely on numerical diffusion
to simulate molecular effects such as non-local transport and mixing without
truly accounting for molecular interactions. With these two disparate
applications in mind, we develop a novel capability which uses an active
learning approach to optimize the use of local fine-scale simulations for
informing coarse-scale hydrodynamics. Our approach addresses three challenges:
forecasting continuum coarse-scale trajectory to speculatively execute new
fine-scale molecular dynamics calculations, dynamically updating coarse-scale
from fine-scale calculations, and quantifying uncertainty in neural network
models.
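The speculative-execution idea can be illustrated with a toy uncertainty-triggered loop (not the paper's method; `fine_scale_model` and the distance-based uncertainty proxy are illustrative stand-ins for molecular dynamics and a neural-network uncertainty estimate):

```python
import math

def fine_scale_model(x):
    # Stand-in for an expensive fine-scale (e.g. molecular dynamics) run.
    return math.sin(x)

class ActiveSurrogate:
    """Uncertainty-triggered active learning: the surrogate reuses stored
    fine-scale results, and the distance to the nearest stored sample
    serves as a crude uncertainty proxy for deciding when a new
    fine-scale calculation must be launched."""

    def __init__(self, tol):
        self.tol = tol
        self.data = {}  # coarse-scale state -> fine-scale result

    def predict(self, x):
        if self.data:
            nearest = min(self.data, key=lambda s: abs(s - x))
            if abs(nearest - x) <= self.tol:    # confident: reuse
                return self.data[nearest]
        self.data[x] = fine_scale_model(x)      # uncertain: refine
        return self.data[x]
```

Sweeping a forecast coarse-scale trajectory through `predict` then triggers only a fraction as many fine-scale runs as there are coarse-scale queries.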
Peer-to-Peer Networks and Computation: Current Trends and Future Perspectives
This research paper examines the state of the art in P2P networks and computation. It attempts to identify the challenges confronting the community of P2P researchers and developers, which need to be addressed before the potential of P2P-based systems can be realized beyond content distribution and file-sharing applications, to build real-world, intelligent, and commercial software systems. Future perspectives and some thoughts on the evolution of P2P-based systems are also provided.
Gaussian Process Regression models for the properties of micro-tearing modes in spherical tokamaks
Spherical tokamaks (STs) have many desirable features that make them an
attractive choice for a future fusion power plant. Power plant viability is
intrinsically related to plasma heat and particle confinement and this is often
determined by the level of micro-instability driven turbulence. Accurate
calculation of the properties of turbulent micro-instabilities is therefore
critical for tokamak design, however, the evaluation of these properties is
computationally expensive. The considerable number of geometric and
thermodynamic parameters and the high resolutions required to accurately
resolve these instabilities makes repeated use of direct numerical simulations
in integrated modelling workflows extremely computationally challenging and
creates the need for fast, accurate, reduced-order models.
This paper outlines the development of a data-driven reduced-order model,
often termed a {\it surrogate model} for the properties of micro-tearing modes
(MTMs) across a spherical tokamak reactor-relevant parameter space utilising
Gaussian Process Regression (GPR) and classification, two techniques from
machine learning. These two components are used in an active learning loop to
maximise the efficiency of data acquisition, thus minimising computational cost. The
high-fidelity gyrokinetic code GS2 is used to calculate the linear properties
of the MTMs: the mode growth rate, frequency and normalised electron heat flux;
core components of a quasi-linear transport model. Five-fold cross-validation
and direct validation on unseen data are used to ascertain the performance of
the resulting surrogate models.
Gaussian process for ground-motion prediction and emulation of systems of computer models
In this thesis, several challenges in both ground-motion modelling and surrogate modelling are addressed by developing methods based on Gaussian processes (GPs). The first chapter contains an overview of the GP and summarises the key findings of the rest of the thesis. In the second chapter, an estimation algorithm, called the Scoring estimation approach, is developed to train GP-based ground-motion models with spatial correlation. The Scoring estimation approach is introduced theoretically and numerically, and it is proven to have desirable convergence and computational properties. It is a statistically robust method, producing consistent and statistically efficient estimators of spatial correlation parameters. The predictive performance of the estimated ground-motion model is assessed in a simulation-based application, which has important implications for seismic risk assessment. In the third chapter, a GP-based surrogate model, called the integrated emulator, is introduced to emulate a system of multiple computer models. It generalises the state-of-the-art linked emulator for a system of two computer models and considers a variety of kernels (exponential, squared exponential, and two key Matérn kernels) that are essential in advanced applications. By learning the system structure, the integrated emulator outperforms the composite emulator, which emulates the entire system using only global inputs and outputs. Furthermore, its analytic expressions allow a fast and efficient design algorithm that can yield significant computational and predictive gains by allocating different runs to individual computer models based on their heterogeneous functional complexity. The benefits of the integrated emulator are demonstrated in a series of synthetic experiments and a feedback-coupled fire-detection satellite model.
Finally, the method underlying the integrated emulator is used to construct a non-stationary Gaussian process model based on a deep Gaussian hierarchy.
Simulation Of Multi-core Systems And Interconnections And Evaluation Of Fat-Mesh Networks
Simulators are very important in computer architecture research because they enable the exploration of new architectures and detailed performance evaluation without building costly physical hardware. Simulation is even more critical for studying future many-core architectures, as it provides the opportunity to assess computer systems that do not yet exist. In this thesis, a multiprocessor simulator is presented based on a cycle-accurate architecture simulator called SESC. The shared L2 cache system is extended into a distributed shared cache (DSC) with a directory-based cache coherency protocol. A mesh network module is extended and integrated into SESC to replace the bus for scalable inter-processor communication. While these efforts complete an extended multiprocessor simulation infrastructure, two interconnection enhancements are proposed and evaluated. A novel non-uniform fat-mesh network structure, similar in spirit to the fat-tree, is proposed. This non-uniform mesh network takes advantage of the average traffic pattern, typically all-to-all in a DSC, to dedicate additional links to connections with heavy traffic (e.g., near the center) and fewer links to lighter traffic (e.g., near the periphery). Two fat-mesh schemes are implemented based on different routing algorithms. Analytical fat-mesh models are constructed by deriving expressions for the traffic requirements of personalized all-to-all traffic. Performance improvements over the uniform mesh are demonstrated in the results from the simulator. As the second enhancement, a hybrid network consisting of one packet-switching plane and multiple circuit-switching planes is constructed. The circuit-switching planes provide fast paths between neighbors with heavy communication traffic. A compiler technique that extracts symbolic expressions for benchmarks' communication patterns can be used to help facilitate circuit establishment.
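The analytical argument for a fat mesh can be sketched in one dimension (a toy model, not the thesis's expressions): under dimension-order routing, the personalised all-to-all traffic crossing the cut between positions c and c+1 of a k-node row is (c+1)(k-c-1), so the central cuts need proportionally more links than the peripheral ones.

```python
import math

def cut_load(k, c):
    """Messages of a personalised all-to-all exchange crossing the cut
    between nodes c and c+1 in a k-node row: one message per
    (source, destination) pair on opposite sides of the cut,
    i.e. (c + 1) * (k - c - 1)."""
    return (c + 1) * (k - c - 1)

def fat_widths(k):
    """Link count per cut, scaled to the lightest (edge) cut and rounded
    up -- more links near the centre, fewer near the periphery."""
    loads = [cut_load(k, c) for c in range(k - 1)]
    return [math.ceil(load / loads[0]) for load in loads]
```

For an 8-node row this gives widths [1, 2, 3, 3, 3, 2, 1], the characteristic "fat in the middle" profile the thesis exploits.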
Investigation of hybrid message-passing and shared-memory architectures for parallel computers: a case study: TurboNet
Several DSP (Digital Signal Processing) algorithms are developed for the MIT TurboNet parallel computer. In contrast to other parallel computers that implement exclusively in hardware either the message-passing or the shared-memory communication paradigm, or that employ distributed shared-memory architectures characterized by inefficient implementation of the shared-memory paradigm, the hybrid architecture of TurboNet supports direct, efficient implementation of both paradigms. Where possible, three versions of each algorithm are developed, corresponding to message-passing, shared-memory, and hybrid communications, respectively. Theoretical and experimental comparisons of the algorithms are employed in the analysis of performance. The results show that the hybrid versions generally achieve better performance than the other two. The main conclusion of this research is that small-scale and medium-scale parallel computers should implement both communication paradigms directly in hardware, for high performance, robustness with respect to the application space, and ease of algorithm development. To facilitate theoretical comparisons, a methodology is developed for highly accurate prediction of algorithm performance. The success of this methodology shows that such prediction is possible for complex parallel computers, such as TurboNet, if enough information is provided by the data dependence graphs.
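The kind of prediction such a methodology enables can be illustrated with a longest-path computation over a data dependence graph (a generic sketch under assumed conventions; the graph encoding and task costs are hypothetical, not TurboNet's):

```python
def critical_path(tasks):
    """Lower bound on parallel execution time: the longest cost-weighted
    path through a data dependence graph, given as
    {task: (cost, [tasks it depends on])}."""
    memo = {}

    def finish(name):
        # Earliest finish time: own cost after all dependencies finish.
        if name not in memo:
            cost, deps = tasks[name]
            memo[name] = cost + max((finish(d) for d in deps), default=0)
        return memo[name]

    return max(finish(t) for t in tasks)
```

For a diamond-shaped graph where task d waits on b and c, which both wait on a, the prediction is the cost of the heavier branch plus the endpoints, regardless of how many processors are available.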
A study of aspects of synchronisation and communication in certain parallel computer architectures
This paper examines methods for synchronisation and communication between tasks in highly parallel arrays of processors. The development of various methods is reviewed, and simulation techniques are applied to specific structures to examine their effectiveness. Two approaches to simulation are presented: in the first, a discrete event simulator is applied to task synchronisation implemented with semaphores in a closely coupled environment; in the second, the concurrent programming language Occam is used to simulate a systolic configuration of processors. In the latter case the design is verified through actual system construction.
Conclusions are drawn regarding the design disciplines and structure imposed by the use of these simulation techniques. A close relationship is found between the behaviour of a simulation written in Occam and the same structure constructed from multiple processors.
Further research is suggested into the subject of dataflow processors, to find suitable means of simulating such systems prior to implementation. A test vehicle is proposed that would operate a dataflow processor under the control of the development system.
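The first simulation approach, semaphore-based task synchronisation driven by discrete events, can be sketched as a toy first-come-first-served model (the thesis's simulator is more elaborate; the task encoding here is hypothetical):

```python
import heapq

def simulate(tasks, permits):
    """Discrete-event sketch of tasks synchronising on a counting
    semaphore: each (arrival, hold) task acquires one of `permits`
    permits, holds it for `hold` time units, then releases it.
    Returns finish times in arrival (FCFS) order."""
    free = [0.0] * permits          # time each permit next becomes free
    heapq.heapify(free)
    finish = []
    for arrival, hold in sorted(tasks):
        start = max(arrival, heapq.heappop(free))  # wait for a permit
        end = start + hold
        heapq.heappush(free, end)                  # release at `end`
        finish.append(end)
    return finish
```

With three simultaneous tasks and only two permits, the third task blocks until a release event, which is exactly the contention behaviour a discrete event simulator of semaphores is built to expose.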