Processor allocation strategies for modified hypercubes
Parallel processing has been widely accepted as the future of high-speed computing. Among the various parallel architectures proposed or implemented, the hypercube has shown great promise because of its powerful properties: regular topology, fault tolerance, low diameter, simple routing, and the ability to efficiently emulate other architectures. The major drawback of the hypercube network is that it cannot be expanded in practice, because the number of communication ports per processor grows as the logarithm of the total number of processors in the system. Therefore, once a hypercube supercomputer of a certain dimensionality has been built, any future expansion can be accomplished only by replacing the VLSI chips. This is an undesirable feature, and much work has been in progress to eliminate this obstacle and thus provide a platform for easier expansion.
Modified hypercubes (MHs) have been proposed as the building blocks of hypercube-based systems supporting incremental growth techniques without introducing extra resources for individual hypercubes.
However, processor allocation on MHs proves to be a challenge because their topology deviates slightly from that of the standard hypercube network. This thesis addresses the issue of processor allocation on MHs and proposes various strategies based, partially or entirely, on table look-up approaches. A study of the various task allocation strategies for standard hypercubes is conducted and their suitability for MHs is evaluated. Existing processor allocation strategies for pure hypercube networks are shown to be ineffective for MHs, in light of their inability to recognize all available subcubes, whereas the proposed strategies have perfect subcube recognition ability and superior performance. A comparative analysis involving the buddy strategy and the new strategies is carried out using simulation results.
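The buddy strategy used as the comparison baseline can be sketched for a standard hypercube as follows (a minimal illustration, not the thesis's implementation; class and method names are hypothetical). It satisfies a request for a k-dimensional subcube with the first free, aligned block of 2^k processors, which is exactly why it misses the unaligned subcubes that table look-up strategies can recognize:

```python
class BuddyAllocator:
    """Buddy processor allocation for a standard hypercube Q_n.

    A busy bit per processor tracks the 2**n nodes; a request for a
    k-dimensional subcube is satisfied by the first free block of
    2**k processors aligned at a multiple of 2**k.  The strategy
    therefore recognises only the aligned subcubes, not all of them.
    """

    def __init__(self, n):
        self.n = n
        self.busy = [False] * (2 ** n)

    def allocate(self, k):
        """Return the base address of a free k-subcube, or None."""
        size = 2 ** k
        for base in range(0, 2 ** self.n, size):
            if not any(self.busy[base:base + size]):
                for p in range(base, base + size):
                    self.busy[p] = True
                return base
        return None  # request blocked

    def release(self, base, k):
        """Free the k-subcube previously allocated at `base`."""
        for p in range(base, base + 2 ** k):
            self.busy[p] = False
```

For example, on a 3-cube, allocating a 1-cube takes processors 0-1, so a subsequent 2-cube request must skip the partially busy block 0-3 and land on 4-7.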
Predictive Scale-Bridging Simulations through Active Learning
Throughout computational science, there is a growing need to utilize the
continual improvements in raw computational horsepower to achieve greater
physical fidelity through scale-bridging over brute-force increases in the
number of mesh elements. For instance, quantitative predictions of transport in
nanoporous media, critical to hydrocarbon extraction from tight shale
formations, are impossible without accounting for molecular-level interactions.
Similarly, inertial confinement fusion simulations rely on numerical diffusion
to simulate molecular effects such as non-local transport and mixing without
truly accounting for molecular interactions. With these two disparate
applications in mind, we develop a novel capability which uses an active
learning approach to optimize the use of local fine-scale simulations for
informing coarse-scale hydrodynamics. Our approach addresses three challenges:
forecasting continuum coarse-scale trajectory to speculatively execute new
fine-scale molecular dynamics calculations, dynamically updating coarse-scale
from fine-scale calculations, and quantifying uncertainty in neural network
models.
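The speculative-execution idea can be illustrated with a toy uncertainty-triggered loop (not the paper's method; `fine_scale_model` and the distance-based uncertainty proxy are illustrative stand-ins for molecular dynamics and a neural-network uncertainty estimate):

```python
import math

def fine_scale_model(x):
    # Stand-in for an expensive fine-scale (e.g. molecular dynamics) run.
    return math.sin(x)

class ActiveSurrogate:
    """Uncertainty-triggered active learning: the surrogate reuses stored
    fine-scale results, and the distance to the nearest stored sample
    serves as a crude uncertainty proxy for deciding when a new
    fine-scale calculation must be launched."""

    def __init__(self, tol):
        self.tol = tol
        self.data = {}  # coarse-scale state -> fine-scale result

    def predict(self, x):
        if self.data:
            nearest = min(self.data, key=lambda s: abs(s - x))
            if abs(nearest - x) <= self.tol:    # confident: reuse
                return self.data[nearest]
        self.data[x] = fine_scale_model(x)      # uncertain: refine
        return self.data[x]
```

Sweeping a forecast coarse-scale trajectory through `predict` then triggers only a fraction as many fine-scale runs as there are coarse-scale queries.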
Peer-to-Peer Networks and Computation: Current Trends and Future Perspectives
This research paper examines the state of the art in P2P networks and computation. It attempts to identify the challenges confronting the community of P2P researchers and developers, which need to be addressed before the potential of P2P-based systems can be realized beyond content distribution and file-sharing applications, to build real-world, intelligent, and commercial software systems. Future perspectives and some thoughts on the evolution of P2P-based systems are also provided.
Gaussian Process Regression models for the properties of micro-tearing modes in spherical tokamaks
Spherical tokamaks (STs) have many desirable features that make them an
attractive choice for a future fusion power plant. Power plant viability is
intrinsically related to plasma heat and particle confinement and this is often
determined by the level of micro-instability driven turbulence. Accurate
calculation of the properties of turbulent micro-instabilities is therefore
critical for tokamak design, however, the evaluation of these properties is
computationally expensive. The considerable number of geometric and
thermodynamic parameters and the high resolutions required to accurately
resolve these instabilities makes repeated use of direct numerical simulations
in integrated modelling workflows extremely computationally challenging and
creates the need for fast, accurate, reduced-order models.
This paper outlines the development of a data-driven reduced-order model,
often termed a {\it surrogate model} for the properties of micro-tearing modes
(MTMs) across a spherical tokamak reactor-relevant parameter space utilising
Gaussian Process Regression (GPR) and classification, two techniques from
machine learning. These two components are used in an active learning loop to
maximise the efficiency of data acquisition, thus minimising computational cost. The
high-fidelity gyrokinetic code GS2 is used to calculate the linear properties
of the MTMs: the mode growth rate, frequency and normalised electron heat flux;
core components of a quasi-linear transport model. Five-fold cross-validation
and direct validation on unseen data are used to ascertain the performance of
the resulting surrogate models.
Gaussian process for ground-motion prediction and emulation of systems of computer models
In this thesis, several challenges in both ground-motion modelling and surrogate modelling are addressed by developing methods based on Gaussian processes (GPs). The first chapter contains an overview of the GP and summarises the key findings of the rest of the thesis. In the second chapter, an estimation algorithm, called the Scoring estimation approach, is developed to train GP-based ground-motion models with spatial correlation. The Scoring estimation approach is introduced theoretically and numerically, and it is proven to have desirable convergence and computational properties. It is a statistically robust method, producing consistent and statistically efficient estimators of spatial correlation parameters. The predictive performance of the estimated ground-motion model is assessed in a simulation-based application, which has important implications for seismic risk assessment. In the third chapter, a GP-based surrogate model, called the integrated emulator, is introduced to emulate a system of multiple computer models. It generalises the state-of-the-art linked emulator for a system of two computer models and considers a variety of kernels (exponential, squared exponential, and two key Matérn kernels) that are essential in advanced applications. By learning the system structure, the integrated emulator outperforms the composite emulator, which emulates the entire system using only global inputs and outputs. Furthermore, its analytic expressions allow a fast and efficient design algorithm that can yield significant computational and predictive gains by allocating different runs to individual computer models based on their heterogeneous functional complexity. The benefits of the integrated emulator are demonstrated in a series of synthetic experiments and a feedback-coupled fire-detection satellite model.
Finally, the method underlying the integrated emulator is used to construct a non-stationary Gaussian process model based on a deep Gaussian hierarchy.
Simulation Of Multi-core Systems And Interconnections And Evaluation Of Fat-Mesh Networks
Simulators are very important in computer architecture research because they enable the exploration of new architectures and detailed performance evaluation without building costly physical hardware. Simulation is even more critical for studying future many-core architectures, as it provides the opportunity to assess computer systems that do not yet exist. In this thesis, a multiprocessor simulator is presented based on a cycle-accurate architecture simulator called SESC. The shared L2 cache system is extended into a distributed shared cache (DSC) with a directory-based cache coherency protocol. A mesh network module is extended and integrated into SESC to replace the bus for scalable inter-processor communication. While these efforts complete an extended multiprocessor simulation infrastructure, two interconnection enhancements are proposed and evaluated. A novel non-uniform fat-mesh network structure, similar in spirit to the fat-tree, is proposed. This non-uniform mesh network takes advantage of the average traffic pattern, typically all-to-all in a DSC, to dedicate additional links to connections with heavy traffic (e.g., near the center) and fewer links to lighter traffic (e.g., near the periphery). Two fat-mesh schemes are implemented based on different routing algorithms. Analytical fat-mesh models are constructed by deriving expressions for the traffic requirements of personalized all-to-all traffic. Performance improvements over the uniform mesh are demonstrated in the results from the simulator. As the second enhancement, a hybrid network consisting of one packet-switching plane and multiple circuit-switching planes is constructed. The circuit-switching planes provide fast paths between neighbors with heavy communication traffic. A compiler technique that extracts symbolic expressions for benchmarks' communication patterns can be used to help facilitate circuit establishment.
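The analytical argument for a fat mesh can be sketched in one dimension (a toy model, not the thesis's expressions): under dimension-order routing, the personalised all-to-all traffic crossing the cut between positions c and c+1 of a k-node row is (c+1)(k-c-1), so the central cuts need proportionally more links than the peripheral ones.

```python
import math

def cut_load(k, c):
    """Messages of a personalised all-to-all exchange crossing the cut
    between nodes c and c+1 in a k-node row: one message per
    (source, destination) pair on opposite sides of the cut,
    i.e. (c + 1) * (k - c - 1)."""
    return (c + 1) * (k - c - 1)

def fat_widths(k):
    """Link count per cut, scaled to the lightest (edge) cut and rounded
    up -- more links near the centre, fewer near the periphery."""
    loads = [cut_load(k, c) for c in range(k - 1)]
    return [math.ceil(load / loads[0]) for load in loads]
```

For an 8-node row this gives widths [1, 2, 3, 3, 3, 2, 1], the characteristic "fat in the middle" profile the thesis exploits.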
Investigation of hybrid message-passing and shared-memory architectures for parallel computers: a case study: TurboNet
Several DSP (Digital Signal Processing) algorithms are developed for the MIT TurboNet parallel computer. In contrast to other parallel computers that implement exclusively in hardware either the message-passing or the shared-memory communication paradigm, or that employ distributed shared-memory architectures characterized by inefficient implementation of the shared-memory paradigm, the hybrid architecture of TurboNet supports direct, efficient implementation of both paradigms. Where possible, three versions of each algorithm are developed, corresponding to message-passing, shared-memory, and hybrid communications, respectively. Theoretical and experimental comparisons of the algorithms are employed in the analysis of performance. The results show that the hybrid versions generally achieve better performance than the other two. The main conclusion of this research is that small-scale and medium-scale parallel computers should implement both communication paradigms directly in hardware, for high performance, robustness with respect to the application space, and ease of algorithm development. To facilitate theoretical comparisons, a methodology is developed for highly accurate prediction of algorithm performance. The success of this methodology shows that such prediction is possible for complex parallel computers, such as TurboNet, if enough information is provided by the data dependence graphs.
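The kind of prediction such a methodology enables can be illustrated with a longest-path computation over a data dependence graph (a generic sketch under assumed conventions; the graph encoding and task costs are hypothetical, not TurboNet's):

```python
def critical_path(tasks):
    """Lower bound on parallel execution time: the longest cost-weighted
    path through a data dependence graph, given as
    {task: (cost, [tasks it depends on])}."""
    memo = {}

    def finish(name):
        # Earliest finish time: own cost after all dependencies finish.
        if name not in memo:
            cost, deps = tasks[name]
            memo[name] = cost + max((finish(d) for d in deps), default=0)
        return memo[name]

    return max(finish(t) for t in tasks)
```

For a diamond-shaped graph where task d waits on b and c, which both wait on a, the prediction is the cost of the heavier branch plus the endpoints, regardless of how many processors are available.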
A study of aspects of synchronisation and communication in certain parallel computer architectures
This paper examines methods for synchronisation and communication between tasks in highly parallel arrays of processors. The development of various methods is reviewed, and simulation techniques are applied to specific structures to examine their effectiveness. Two approaches to simulation are presented: in the first, a discrete event simulator is applied to task synchronisation implemented with semaphores in a closely coupled environment; in the second, the concurrent programming language Occam is used to simulate a systolic configuration of processors. In the latter case the design is verified through actual system construction.
Conclusions are drawn regarding the design disciplines and structure imposed by the use of these simulation techniques. A close relationship is found between the behaviour of a simulation written in Occam and the same structure constructed from multiple processors.
Further research is suggested into the subject of dataflow processors, to find suitable means of simulating such systems prior to implementation. A test vehicle is proposed that would operate a dataflow processor under the control of the development system.
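The first simulation approach, semaphore-based task synchronisation driven by discrete events, can be sketched as a toy first-come-first-served model (the thesis's simulator is more elaborate; the task encoding here is hypothetical):

```python
import heapq

def simulate(tasks, permits):
    """Discrete-event sketch of tasks synchronising on a counting
    semaphore: each (arrival, hold) task acquires one of `permits`
    permits, holds it for `hold` time units, then releases it.
    Returns finish times in arrival (FCFS) order."""
    free = [0.0] * permits          # time each permit next becomes free
    heapq.heapify(free)
    finish = []
    for arrival, hold in sorted(tasks):
        start = max(arrival, heapq.heappop(free))  # wait for a permit
        end = start + hold
        heapq.heappush(free, end)                  # release at `end`
        finish.append(end)
    return finish
```

With three simultaneous tasks and only two permits, the third task blocks until a release event, which is exactly the contention behaviour a discrete event simulator of semaphores is built to expose.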