Parallel implementation of the TRANSIMS micro-simulation
This paper describes the parallel implementation of the TRANSIMS traffic
micro-simulation. The parallelization method is domain decomposition, which
means that each CPU of the parallel computer is responsible for a different
geographical area of the simulated region. We describe how information between
domains is exchanged, and how the transportation network graph is partitioned.
An adaptive scheme is used to optimize load balancing. We then demonstrate how
computing speeds of our parallel micro-simulations can be systematically
predicted once the scenario and the computer architecture are known. This makes
it possible, for example, to decide if a certain study is feasible with a
certain computing budget, and how to invest that budget. The main ingredients
of the prediction are knowledge about the parallel implementation of the
micro-simulation, knowledge about the characteristics of the partitioning of
the transportation network graph, and knowledge about the interaction of these
quantities with the computer system. In particular, we investigate the
differences between switched and non-switched topologies, and the effects of 10
Mbit, 100 Mbit, and Gbit Ethernet. keywords: Traffic simulation, parallel
computing, transportation planning, TRANSIM
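The abstract says computing speed can be predicted from the parallel implementation, the partitioning characteristics, and the network. As a rough illustration only (not the paper's actual prediction model), the sketch below estimates time per step as computation split evenly over p CPUs plus boundary-exchange messages. On a non-switched (shared) Ethernet it assumes all CPUs' messages are serialized on one medium. The latency, message size, neighbor count, and serial compute time are hypothetical values.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Network:
    name: str
    latency_s: float      # per-message latency in seconds (hypothetical value)
    bandwidth_bps: float  # link bandwidth in bits per second
    switched: bool        # True: CPUs send concurrently; False: one shared medium


def time_per_step(t_comp_serial: float, cpus: int, neighbors: int,
                  bytes_per_msg: float, net: Network) -> float:
    """Predicted wall-clock seconds per simulation time step (illustrative model)."""
    compute = t_comp_serial / cpus  # assumes perfect load balance
    if cpus == 1:
        return compute  # a single domain has no boundaries to exchange
    per_msg = net.latency_s + 8 * bytes_per_msg / net.bandwidth_bps
    messages = neighbors  # each CPU sends one message per neighboring domain per step
    if not net.switched:
        messages *= cpus  # shared medium: every CPU's messages are serialized
    return compute + messages * per_msg


def best_cpu_count(t_comp_serial, neighbors, bytes_per_msg, net, max_cpus=256):
    """CPU count that minimizes the predicted time per step."""
    return min(range(1, max_cpus + 1),
               key=lambda p: time_per_step(t_comp_serial, p, neighbors,
                                           bytes_per_msg, net))


if __name__ == "__main__":
    networks = [
        Network("10 Mbit shared", 1e-4, 10e6, False),
        Network("100 Mbit switched", 1e-4, 100e6, True),
        Network("1 Gbit switched", 1e-4, 1e9, True),
    ]
    for net in networks:
        p = best_cpu_count(2.0, 6, 10_000, net)
        print(f"{net.name}: best p = {p}, "
              f"{time_per_step(2.0, p, 6, 10_000, net):.4f} s/step")
```

Under these assumed numbers the shared 10 Mbit network stops benefiting from extra CPUs after a handful, while the switched networks keep improving up to the cap. A real prediction would also let the neighbor count and message size depend on how the network graph is partitioned.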
PyCARL: A PyNN Interface for Hardware-Software Co-Simulation of Spiking Neural Network
We present PyCARL, a PyNN-based common Python programming interface for
hardware-software co-simulation of spiking neural networks (SNNs). Through
PyCARL, we make the following two key contributions. First, we provide an
interface of PyNN to CARLsim, a computationally-efficient, GPU-accelerated and
biophysically-detailed SNN simulator. PyCARL facilitates joint development of
machine learning models and code sharing between CARLsim and PyNN users,
promoting an integrated and larger neuromorphic community. Second, we integrate
cycle-accurate models of state-of-the-art neuromorphic hardware such as
TrueNorth, Loihi, and DynapSE in PyCARL, to accurately model hardware latencies
that delay spikes between communicating neurons and degrade performance. PyCARL
allows users to analyze and optimize the performance difference between
software-only simulation and hardware-software co-simulation of their machine
learning models. We show that system designers can also use PyCARL to perform
design-space exploration early in the product development stage, facilitating
faster time-to-deployment of neuromorphic products. We evaluate the memory
usage and simulation time of PyCARL using functionality tests, synthetic SNNs,
and realistic applications. Our results demonstrate that for large SNNs, PyCARL
does not lead to any significant overhead compared to CARLsim. We also use
PyCARL to analyze these SNNs for a state-of-the-art neuromorphic hardware and
demonstrate a significant performance deviation from software-only simulations.
PyCARL allows users to evaluate and minimize such differences early during model
development. Comment: 10 pages, 25 figures. Accepted for publication at International Joint
Conference on Neural Networks (IJCNN) 202
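To make the effect of hardware latency concrete, here is a hypothetical model. It is not PyCARL's or CARLsim's API and not a model of TrueNorth, Loihi, or DynapSE: neurons are placed on cores of an assumed 2D mesh, and each spike pays a fixed, assumed latency per router hop on top of its synaptic delay. Comparing placements by their average added latency is a toy version of the design-space exploration the abstract describes.

```python
from typing import Dict, List, Tuple

Core = Tuple[int, int]  # (x, y) position of a core in a hypothetical 2D mesh


def hops(src: Core, dst: Core) -> int:
    """Router hops under dimension-order (XY) routing on a 2D mesh."""
    return abs(src[0] - dst[0]) + abs(src[1] - dst[1])


def arrival_times(spike_times_ms: List[float], synaptic_delay_ms: float,
                  src: Core, dst: Core,
                  per_hop_ms: float) -> Tuple[List[float], List[float]]:
    """Spike arrival times: software-only vs. with added interconnect latency."""
    software = [t + synaptic_delay_ms for t in spike_times_ms]
    extra = hops(src, dst) * per_hop_ms  # hypothetical per-hop cost
    hardware = [t + extra for t in software]
    return software, hardware


def mean_extra_latency(synapses: List[Tuple[str, str]],
                       placement: Dict[str, Core], per_hop_ms: float) -> float:
    """Average added latency per synapse for a given neuron-to-core placement."""
    total_hops = sum(hops(placement[pre], placement[post])
                     for pre, post in synapses)
    return per_hop_ms * total_hops / len(synapses)


if __name__ == "__main__":
    synapses = [("in", "h1"), ("h1", "out"), ("in", "h2"), ("h2", "out")]
    compact = {"in": (0, 0), "h1": (0, 0), "h2": (1, 0), "out": (1, 0)}
    spread = {"in": (0, 0), "h1": (3, 3), "h2": (0, 3), "out": (3, 0)}
    for name, placement in [("compact", compact), ("spread", spread)]:
        print(name, mean_extra_latency(synapses, placement, per_hop_ms=0.01))
```

A placement that keeps communicating neurons on nearby cores lowers the added latency, which is the kind of trade-off a co-simulation makes visible before hardware is built.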
RepFlow: Minimizing Flow Completion Times with Replicated Flows in Data Centers
Short TCP flows that are critical for many interactive applications in data
centers are plagued by large flows and head-of-line blocking in switches.
Hash-based load balancing schemes such as ECMP aggravate the matter and result
in long-tailed flow completion times (FCT). Previous work on reducing FCT
usually requires custom switch hardware and/or protocol changes. We propose
RepFlow, a simple yet practically effective approach that replicates each short
flow to reduce the completion times, without any change to switches or host
kernels. With ECMP, the original and replicated flows traverse distinct paths
with different congestion levels, thereby reducing the probability of having
long queueing delay. We develop a simple analytical model to demonstrate the
potential improvement of RepFlow. Extensive NS-3 simulations and a Mininet
implementation show that RepFlow provides a 50%-70% speedup in both mean and
99th-percentile FCT for all loads, and offers near-optimal FCT when used with
DCTCP. Comment: To appear in IEEE INFOCOM 201
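The replication idea can be illustrated with a toy Monte Carlo model. The sketch below assumes independent exponential queueing delays on each of n equal-cost paths (an assumption, not the paper's analytical model). It picks a path by hashing the 5-tuple with CRC-32 as a stand-in for a switch's ECMP hash, and gives the replica a different source port so it usually, but not always, lands on another path. A flow finishes when its first copy does, so its delay is the minimum over the two copies.

```python
import random
import statistics
import zlib


def ecmp_path(five_tuple, n_paths):
    """Pick a path by hashing the flow's 5-tuple (CRC-32 stands in for the switch hash)."""
    key = ",".join(map(str, five_tuple)).encode()
    return zlib.crc32(key) % n_paths


def simulate(n_flows, n_paths, seed=1):
    """Per-flow queueing delays without and with one replica per short flow."""
    rng = random.Random(seed)
    single, replicated = [], []
    for i in range(n_flows):
        # Hypothetical congestion snapshot: i.i.d. exponential delay on each path.
        delays = [rng.expovariate(1.0) for _ in range(n_paths)]
        sport = 1024 + i % 30000
        original = ("10.0.0.1", "10.0.0.2", sport, 80, "tcp")
        replica = ("10.0.0.1", "10.0.0.2", sport + 30000, 80, "tcp")  # new source port
        d_orig = delays[ecmp_path(original, n_paths)]
        d_rep = delays[ecmp_path(replica, n_paths)]  # may hash to the same path
        single.append(d_orig)
        replicated.append(min(d_orig, d_rep))  # done when either copy finishes
    return single, replicated


def p99(samples):
    """99th-percentile value of the samples."""
    return statistics.quantiles(samples, n=100)[98]


if __name__ == "__main__":
    single, rep = simulate(20000, n_paths=4)
    print(f"mean: {statistics.mean(single):.3f} -> {statistics.mean(rep):.3f}")
    print(f"p99:  {p99(single):.3f} -> {p99(rep):.3f}")
```

Because the replica's delay can only lower the minimum, both the mean and the tail shrink; how much depends on how often the two copies collide on one path and on how correlated the paths' congestion is, which this toy model ignores.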
Mobility Study for Named Data Networking in Wireless Access Networks
Information centric networking (ICN) proposes to redesign the Internet by
replacing its host-centric design with information-centric design.
Communication among entities is established at the naming level, with the
receiver side (referred to as the Consumer) acting as the driving force behind
content delivery, by interacting with the network through Interest message
transmissions. One of the proposed advantages for ICN is its support for
mobility, by de-coupling applications from transport semantics. However, so
far, little research has been conducted to understand the interaction between
ICN and mobility of consuming and producing applications, in protocols purely
based on information-centric principles, particularly in the case of NDN. In
this paper, we present our findings on the mobility-based performance of Named
Data Networking (NDN) in wireless access networks. Through simulations, we show
that the current NDN architecture is not efficient in handling mobility and
that architectural enhancements are needed to fully support mobility of
Consumers and Producers. Comment: To appear in IEEE ICC 201
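In NDN, Data returns along the reverse path recorded in each router's Pending Interest Table (PIT). The toy forwarder below omits the FIB, Content Store, timers, and upstream forwarding, and simply assumes the Data arrives. It shows one consequence for a moving Consumer: Data for an Interest sent before a handover is delivered to the old access point, so the Consumer must re-express the Interest. The face names `ap1`/`ap2`, the content name, and the helper `consumer_receives` are hypothetical.

```python
from typing import Dict, Set


class Router:
    """Toy NDN forwarder: only a Pending Interest Table (no FIB, no Content Store)."""

    def __init__(self) -> None:
        self.pit: Dict[str, Set[str]] = {}  # content name -> downstream faces

    def on_interest(self, name: str, in_face: str) -> None:
        # Record the "breadcrumb": which face asked for this name.
        self.pit.setdefault(name, set()).add(in_face)

    def on_data(self, name: str) -> Set[str]:
        # Data goes back on every face in the PIT entry, which is then consumed.
        return self.pit.pop(name, set())


def consumer_receives(router: Router, name: str, consumer_face: str) -> bool:
    """True if returning Data reaches the Consumer's current attachment face."""
    return consumer_face in router.on_data(name)


if __name__ == "__main__":
    r = Router()
    r.on_interest("/video/seg1", "ap1")  # Consumer sends Interest via ap1
    current = "ap2"                      # ...then hands over to ap2
    print(consumer_receives(r, "/video/seg1", current))  # False: Data went to ap1
    r.on_interest("/video/seg1", current)                # Consumer re-expresses it
    print(consumer_receives(r, "/video/seg1", current))  # True
```

Producer mobility is harder than this case, since Interests are routed toward the Producer by name prefix and the routers' forwarding state must learn its new location.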
Scalable Interactive Volume Rendering Using Off-the-shelf Components
This paper describes an application of a second-generation implementation of the Sepia architecture (Sepia-2) to interactive volumetric visualization of large rectilinear scalar fields. By employing pipelined associative blending operators in a sort-last configuration, a demonstration system with 8 rendering computers sustains 24 to 28 frames per second while interactively rendering large data volumes (1024x256x256 voxels and 512x512x512 voxels). We believe interactive performance at these frame rates and data sizes is unprecedented. We also believe these results can be extended to other types of structured and unstructured grids and to a variety of GL rendering techniques, including surface rendering and shadow mapping. We show how to extend our single-stage crossbar demonstration system to multi-stage networks in order to support much larger data sizes and higher image resolutions. This requires solving a dynamic mapping problem for a class of blending operators that includes Porter-Duff compositing operators.
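Sort-last compositing works because the blending operator is associative: partial images from different renderers can be combined in any grouping, as long as front-to-back order is kept. A minimal per-pixel sketch, assuming premultiplied RGBA and the Porter-Duff "over" operator (the functions are illustrative and do not model Sepia-2's hardware pipeline), checks that strictly sequential blending and pairwise tree blending agree. Exact fractions are used so floating-point rounding does not blur the equality.

```python
from fractions import Fraction
from functools import reduce
from typing import Sequence, Tuple

RGBA = Tuple[Fraction, Fraction, Fraction, Fraction]  # premultiplied, alpha last


def over(front: RGBA, back: RGBA) -> RGBA:
    """Porter-Duff 'over' on premultiplied RGBA: front + (1 - alpha_front) * back."""
    k = 1 - front[3]
    return tuple(f + k * b for f, b in zip(front, back))


def composite_sequential(layers: Sequence[RGBA]) -> RGBA:
    """Blend partial results strictly front to back, one after another."""
    return reduce(over, layers)


def composite_tree(layers: Sequence[RGBA]) -> RGBA:
    """Blend neighboring pairs in rounds; valid because 'over' is associative."""
    level = list(layers)
    while len(level) > 1:
        level = [over(level[i], level[i + 1]) if i + 1 < len(level) else level[i]
                 for i in range(0, len(level), 2)]
    return level[0]


if __name__ == "__main__":
    F = Fraction
    # One pixel's partial colors from four renderers, sorted front to back.
    layers = [(F(1, 4), F(0), F(0), F(1, 2)), (F(0), F(1, 5), F(0), F(1, 4)),
              (F(0), F(0), F(3, 10), F(1, 3)), (F(1, 10), F(1, 10), F(1, 10), F(1, 5))]
    print(composite_sequential(layers) == composite_tree(layers))  # True
```

The tree form is what lets several compositing stages work in parallel or in a pipeline; a non-associative operator would force a single sequential chain.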