10,637 research outputs found

    Parallel implementation of the TRANSIMS micro-simulation

    Full text link
    This paper describes the parallel implementation of the TRANSIMS traffic micro-simulation. The parallelization method is domain decomposition, which means that each CPU of the parallel computer is responsible for a different geographical area of the simulated region. We describe how information between domains is exchanged, and how the transportation network graph is partitioned. An adaptive scheme is used to optimize load balancing. We then demonstrate how computing speeds of our parallel micro-simulations can be systematically predicted once the scenario and the computer architecture are known. This makes it possible, for example, to decide if a certain study is feasible with a certain computing budget, and how to invest that budget. The main ingredients of the prediction are knowledge about the parallel implementation of the micro-simulation, knowledge about the characteristics of the partitioning of the transportation network graph, and knowledge about the interaction of these quantities with the computer system. In particular, we investigate the differences between switched and non-switched topologies, and the effects of 10 Mbit, 100 Mbit, and Gbit Ethernet. keywords: Traffic simulation, parallel computing, transportation planning, TRANSIM

    PyCARL: A PyNN Interface for Hardware-Software Co-Simulation of Spiking Neural Network

    Full text link
    We present PyCARL, a PyNN-based common Python programming interface for hardware-software co-simulation of spiking neural network (SNN). Through PyCARL, we make the following two key contributions. First, we provide an interface of PyNN to CARLsim, a computationally-efficient, GPU-accelerated and biophysically-detailed SNN simulator. PyCARL facilitates joint development of machine learning models and code sharing between CARLsim and PyNN users, promoting an integrated and larger neuromorphic community. Second, we integrate cycle-accurate models of state-of-the-art neuromorphic hardware such as TrueNorth, Loihi, and DynapSE in PyCARL, to accurately model hardware latencies that delay spikes between communicating neurons and degrade performance. PyCARL allows users to analyze and optimize the performance difference between software-only simulation and hardware-software co-simulation of their machine learning models. We show that system designers can also use PyCARL to perform design-space exploration early in the product development stage, facilitating faster time-to-deployment of neuromorphic products. We evaluate the memory usage and simulation time of PyCARL using functionality tests, synthetic SNNs, and realistic applications. Our results demonstrate that for large SNNs, PyCARL does not lead to any significant overhead compared to CARLsim. We also use PyCARL to analyze these SNNs for a state-of-the-art neuromorphic hardware and demonstrate a significant performance deviation from software-only simulations. PyCARL allows to evaluate and minimize such differences early during model development.Comment: 10 pages, 25 figures. Accepted for publication at International Joint Conference on Neural Networks (IJCNN) 202

    RepFlow: Minimizing Flow Completion Times with Replicated Flows in Data Centers

    Full text link
    Short TCP flows that are critical for many interactive applications in data centers are plagued by large flows and head-of-line blocking in switches. Hash-based load balancing schemes such as ECMP aggravate the matter and result in long-tailed flow completion times (FCT). Previous work on reducing FCT usually requires custom switch hardware and/or protocol changes. We propose RepFlow, a simple yet practically effective approach that replicates each short flow to reduce the completion times, without any change to switches or host kernels. With ECMP the original and replicated flows traverse distinct paths with different congestion levels, thereby reducing the probability of having long queueing delay. We develop a simple analytical model to demonstrate the potential improvement of RepFlow. Extensive NS-3 simulations and Mininet implementation show that RepFlow provides 50%--70% speedup in both mean and 99-th percentile FCT for all loads, and offers near-optimal FCT when used with DCTCP.Comment: To appear in IEEE INFOCOM 201

    Mobility Study for Named Data Networking in Wireless Access Networks

    Full text link
    Information centric networking (ICN) proposes to redesign the Internet by replacing its host-centric design with information-centric design. Communication among entities is established at the naming level, with the receiver side (referred to as the Consumer) acting as the driving force behind content delivery, by interacting with the network through Interest message transmissions. One of the proposed advantages for ICN is its support for mobility, by de-coupling applications from transport semantics. However, so far, little research has been conducted to understand the interaction between ICN and mobility of consuming and producing applications, in protocols purely based on information-centric principles, particularly in the case of NDN. In this paper, we present our findings on the mobility-based performance of Named Data Networking (NDN) in wireless access networks. Through simulations, we show that the current NDN architecture is not efficient in handling mobility and architectural enhancements needs to be done to fully support mobility of Consumers and Producers.Comment: to appear in IEEE ICC 201

    Scalable Interactive Volume Rendering Using Off-the-shelf Components

    Get PDF
    This paper describes an application of a second generation implementation of the Sepia architecture (Sepia-2) to interactive volu-metric visualization of large rectilinear scalar fields. By employingpipelined associative blending operators in a sort-last configuration a demonstration system with 8 rendering computers sustains 24 to 28 frames per second while interactively rendering large data volumes (1024x256x256 voxels, and 512x512x512 voxels). We believe interactive performance at these frame rates and data sizes is unprecedented. We also believe these results can be extended to other types of structured and unstructured grids and a variety of GL rendering techniques including surface rendering and shadow map-ping. We show how to extend our single-stage crossbar demonstration system to multi-stage networks in order to support much larger data sizes and higher image resolutions. This requires solving a dynamic mapping problem for a class of blending operators that includes Porter-Duff compositing operators
    corecore