Parallel implementation of the TRANSIMS micro-simulation
This paper describes the parallel implementation of the TRANSIMS traffic
micro-simulation. The parallelization method is domain decomposition, which
means that each CPU of the parallel computer is responsible for a different
geographical area of the simulated region. We describe how information between
domains is exchanged, and how the transportation network graph is partitioned.
An adaptive scheme is used to optimize load balancing. We then demonstrate how
computing speeds of our parallel micro-simulations can be systematically
predicted once the scenario and the computer architecture are known. This makes
it possible, for example, to decide if a certain study is feasible with a
certain computing budget, and how to invest that budget. The main ingredients
of the prediction are knowledge about the parallel implementation of the
micro-simulation, knowledge about the characteristics of the partitioning of
the transportation network graph, and knowledge about the interaction of these
quantities with the computer system. In particular, we investigate the
differences between switched and non-switched topologies, and the effects of 10
Mbit, 100 Mbit, and Gbit Ethernet. Keywords: traffic simulation, parallel
computing, transportation planning, TRANSIMS
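The kind of speed prediction described in this abstract can be illustrated with a toy cost model. This is a minimal sketch, not the paper's model: it assumes the time per simulation step is the serial compute time divided evenly across CPUs, plus a boundary exchange with each neighbouring domain that costs one latency plus message size over bandwidth. All names and numbers (`t_serial`, `n_neighbors`, `msg_bits`, `latency_s`, `bandwidth_bps`) are hypothetical.

```python
def step_time(t_serial, n_cpus, n_neighbors, msg_bits, latency_s, bandwidth_bps):
    """Estimated wall-clock seconds per simulation step (toy model, assumed)."""
    compute = t_serial / n_cpus  # assumes perfect load balance
    if n_cpus == 1:
        return compute  # a single domain has no boundaries to exchange
    # One message per neighbouring domain: latency plus transfer time.
    comm = n_neighbors * (latency_s + msg_bits / bandwidth_bps)
    return compute + comm


def speedup(t_serial, n_cpus, **net):
    """Ratio of single-CPU step time to predicted parallel step time."""
    return t_serial / step_time(t_serial, n_cpus, **net)


# Hypothetical network parameters for 10 Mbit and 100 Mbit Ethernet.
net10 = dict(n_neighbors=4, msg_bits=8000, latency_s=0.0005, bandwidth_bps=10e6)
net100 = dict(n_neighbors=4, msg_bits=8000, latency_s=0.0005, bandwidth_bps=100e6)
for n in (1, 4, 16, 64):
    print(n, round(speedup(1.0, n, **net10), 2), round(speedup(1.0, n, **net100), 2))
```

In a model of this shape the communication term does not shrink as CPUs are added, so it eventually dominates. That is one way to see why the network type matters for the achievable speedup.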
Parallel Computation in Econometrics: A Simplified Approach
Parallel computation has a long history in econometric computing, but is not at all widespread. We believe that a major impediment is the labour cost of coding for parallel architectures. Moreover, programs for specific hardware often become obsolete quite quickly. Our approach is to take a popular matrix programming language (Ox), and implement a message-passing interface using MPI. Next, object-oriented programming allows us to hide the specific parallelization code, so that a program does not need to be rewritten when it is ported from the desktop to a distributed network of computers. Our focus is on so-called embarrassingly parallel computations, and we address the issue of parallel random number generation. Keywords: Code optimization; Econometrics; High-performance computing; Matrix-programming language; Monte Carlo; MPI; Ox; Parallel computing; Random number generation.
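An embarrassingly parallel Monte Carlo computation with one random stream per worker might look like the sketch below. It is written in Python rather than Ox, and a thread pool stands in for MPI ranks, so it only illustrates the structure, not the paper's implementation. The names `worker`, `estimate_pi` and `base_seed` are hypothetical. Seeding each worker from a distinct string is the simplest scheme and does not by itself guarantee statistically independent streams.

```python
import random
from concurrent.futures import ThreadPoolExecutor


def worker(rank, n_draws, base_seed):
    """One independent task: count random points inside the unit quarter circle."""
    rng = random.Random(f"{base_seed}-{rank}")  # private generator per worker
    return sum(1 for _ in range(n_draws)
               if rng.random() ** 2 + rng.random() ** 2 <= 1.0)


def estimate_pi(n_workers, n_draws, base_seed=12345):
    """Run the tasks with no communication between them, then combine the counts."""
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        hits = pool.map(worker, range(n_workers),
                        [n_draws] * n_workers, [base_seed] * n_workers)
        total_hits = sum(hits)
    return 4.0 * total_hits / (n_workers * n_draws)


print(estimate_pi(n_workers=4, n_draws=5000))
```

Because each task depends only on its own rank and seed, the result is reproducible regardless of the order in which the workers finish.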
Distributed computing methodology for training neural networks in an image-guided diagnostic application
Distributed computing is a process through which a set of computers connected by a network is used collectively to solve a single problem. In this paper, we propose a distributed computing methodology for training neural networks for the detection of lesions in colonoscopy. Our approach is based on partitioning the training set across multiple processors using a parallel virtual machine. In this way, interconnected computers of varied architectures can be used for the distributed evaluation of the error function and gradient values and, thus, for training neural networks with various learning methods. The proposed methodology has large granularity and low synchronization, and has been implemented and tested. Our results indicate that the parallel virtual machine implementation of the training algorithms developed leads to considerable speedup, especially when large network architectures and training sets are used.
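The idea of partitioning the training set and evaluating the error function and gradient per partition can be sketched as follows. This is a minimal illustration under stated assumptions, not the paper's code: a one-parameter model `y ~ w * x` replaces the neural network, and a plain loop replaces the parallel virtual machine. The function names are hypothetical.

```python
def shard_error_and_grad(w, shard):
    """Sum of squared errors and its gradient w.r.t. w on one shard of (x, y) pairs."""
    err = sum((w * x - y) ** 2 for x, y in shard)
    grad = sum(2.0 * (w * x - y) * x for x, y in shard)
    return err, grad


def partition(data, n_parts):
    """Split the training set into n_parts roughly equal shards."""
    return [data[i::n_parts] for i in range(n_parts)]


def total_error_and_grad(w, shards):
    # In a distributed setup each call would run on a different machine.
    # Only two numbers per shard have to be sent back and summed.
    results = [shard_error_and_grad(w, s) for s in shards]
    return sum(e for e, _ in results), sum(g for _, g in results)


data = [(float(x), 3.0 * x) for x in range(1, 9)]  # toy data with true w = 3
print(total_error_and_grad(1.0, partition(data, 4)))
```

Because the error and gradient are sums over training examples, the sharded result equals the single-machine result, and the per-step communication is small relative to the work done on each shard.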
A low-cost parallel implementation of direct numerical simulation of wall turbulence
A numerical method for the direct numerical simulation of incompressible wall
turbulence in rectangular and cylindrical geometries is presented. The
distinctive feature resides in its design being targeted towards an efficient
distributed-memory parallel computing on commodity hardware. The adopted
discretization is spectral in the two homogeneous directions; fourth-order
accurate, compact finite-difference schemes over a variable-spacing mesh in the
wall-normal direction are key to our parallel implementation. The parallel
algorithm is designed in such a way as to minimize data exchange among the
computing machines, and in particular to avoid taking a global transpose of the
data during the pseudo-spectral evaluation of the non-linear terms. The
computing machines can then be connected to each other through low-cost network
devices. The code is optimized for memory requirements, which can moreover be
subdivided among the computing nodes. The layout of a simple, dedicated and
optimized computing system based on commodity hardware is described. The
performance of the numerical method on this computing system is evaluated and
compared with that of other codes described in the literature, as well as with
that of the same code implementing a commonly employed strategy for the
pseudo-spectral calculation. Comment: To be published in J. Comp. Physics
A Novel Cross Entropy Approach for Offloading Learning in Mobile Edge Computing
In this letter, we propose a novel offloading learning approach to trade off energy consumption against latency in a multi-tier network with mobile edge computing. To solve this integer programming problem, instead of using conventional optimization tools, we apply a cross entropy approach with iterative learning of the probability of elite solution samples. Compared to existing methods, the proposed approach permits a parallel computing architecture and is verified to be computationally very efficient. Specifically, it achieves close-to-optimal performance and performs well across different choices of its hyperparameter values.
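A generic cross-entropy method for binary decision vectors can be sketched as below. This is an illustration of the general technique, not the letter's algorithm: the cost function, the smoothing update and every parameter name (`n_samples`, `n_elite`, `smoothing`) are assumptions, and the `energy` and `latency` values are made-up toy data.

```python
import random


def cross_entropy_binary(cost, n_vars, n_samples=50, n_elite=10,
                         n_iters=30, smoothing=0.7, seed=0):
    """Minimize cost(x) over x in {0,1}^n_vars by learning Bernoulli probabilities."""
    rng = random.Random(seed)
    probs = [0.5] * n_vars  # P(x_i = 1), e.g. "offload task i"
    best_x, best_cost = None, float("inf")
    for _ in range(n_iters):
        # Samples are drawn and scored independently, so this step could run in parallel.
        samples = [[1 if rng.random() < p else 0 for p in probs]
                   for _ in range(n_samples)]
        samples.sort(key=cost)
        elite = samples[:n_elite]
        if cost(elite[0]) < best_cost:
            best_x, best_cost = elite[0], cost(elite[0])
        # Move each probability toward the frequency of 1s among the elite samples.
        for i in range(n_vars):
            freq = sum(x[i] for x in elite) / n_elite
            probs[i] = smoothing * freq + (1 - smoothing) * probs[i]
    return best_x, best_cost


energy = [3.0, 1.0, 4.0, 2.0]   # hypothetical cost if task i runs locally (x_i = 0)
latency = [1.0, 2.0, 1.5, 2.5]  # hypothetical cost if task i is offloaded (x_i = 1)


def toy_cost(x):
    return sum(l if xi else e for xi, e, l in zip(x, energy, latency))


print(cross_entropy_binary(toy_cost, n_vars=4))
```

The toy cost is separable, so its optimum can be checked by hand: offload the tasks where the latency cost is below the energy cost. A real offloading objective would couple the decisions through shared resources.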