1,859 research outputs found

    NBBS: A Non-blocking Buddy System for Multi-core Machines

    Common implementations of core memory allocation components, like the Linux buddy system, handle concurrent allocation/release requests by synchronizing threads via spinlocks. This approach does not scale well to large thread counts, a problem that the literature has addressed by introducing layered allocation services or by replicating the core allocators (the bottommost ones within the layered architecture). Both solutions tend to reduce the pressure of actual concurrent accesses on each individual core allocator. In this article we explore an alternative approach to scalable memory allocation/release, which can still be combined with those proposals. We present a fully non-blocking buddy system, where threads performing concurrent allocations/releases do not undergo any spinlock-based synchronization. Our solution allows threads to proceed in parallel and commit their allocations/releases unless a conflict materializes while handling the allocator's metadata. Conflict detection relies on conventional atomic machine instructions in the Read-Modify-Write (RMW) class. Beyond improving scalability and performance, our solution also avoids wasting clock cycles on spinlock operations by threads that could in principle carry out their memory allocation/release in full concurrency. Thus, it is resilient to performance degradation in the face of concurrent accesses, independently of the current level of fragmentation of the handled memory blocks.

    Optical interconnection networks based on microring resonators

    Optical microring resonators can be integrated on a chip to perform switching operations directly in the optical domain. They can thus serve as building blocks for switching elements in on-chip optical interconnection networks, which promise to overcome some of the limitations of current electronic networks. However, the peculiar asymmetric power losses of microring resonators impose new constraints on the design and control of on-chip optical networks. In this work, we study the design of multistage interconnection networks optimized for a particular metric that we name the degradation index, which characterizes the asymmetric behavior of microrings. We also propose a routing control algorithm to maximize the overall throughput, considering the maximum allowed degradation index as a constraint.

    Safety Analysis of Parameterised Networks with Non-Blocking Rendez-Vous

    We consider networks of processes that all execute the same finite-state protocol and communicate via a rendez-vous mechanism. When a process requests a rendez-vous, another process can respond to it and they both change their control states accordingly. We focus here on a specific semantics, called non-blocking, where the process requesting a rendez-vous can change its state even if no process can respond to it. In this context, we study the parameterised coverability problem of a configuration, which consists in determining whether there exist an initial number of processes and an execution that reach a configuration bigger than a given one. We show that this problem is EXPSPACE-complete and can be solved in polynomial time if the protocol is partitioned into two sets of states, the states from which a process can request a rendez-vous and the ones from which it can answer one. We also prove that the problem of the existence of an execution bringing all the processes to a final state is undecidable in our context. These two problems can be solved in polynomial time with the classical rendez-vous semantics.
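
    The difference between the two semantics described in this abstract can be illustrated with a small sketch. It is not taken from the paper: the function `step`, the state names, and the representation of a configuration as a count of processes per control state are all assumptions made for illustration, and the paper's formal definition (for example, its use of message labels) may differ in detail.

```python
from collections import Counter


def step(config, request, responses, non_blocking=True):
    """Fire one rendez-vous request on a configuration (hypothetical helper).

    config:       Counter mapping control state -> number of processes in it
    request:      (src, dst) state change of the requesting process
    responses:    list of (src, dst) state changes able to answer the request
    non_blocking: if True, the requester moves even when nobody can answer;
                  if False (classical semantics), the step is then disabled.
    Returns the new configuration, or None if the step cannot be taken.
    """
    src, dst = request
    if config[src] == 0:
        return None  # no process is in the requesting state

    new = Counter(config)
    new[src] -= 1  # remove the requester first so it cannot answer itself

    answered = False
    for r_src, r_dst in responses:
        if new[r_src] > 0:
            # Another process answers: it changes state together with the requester.
            new[r_src] -= 1
            new[r_dst] += 1
            answered = True
            break

    if not answered and not non_blocking:
        return None  # classical rendez-vous: the requester is blocked

    new[dst] += 1
    return +new  # unary plus drops states whose count fell to zero


# Hypothetical protocol: an idle process asks for service, another idle one serves it.
request = ("idle", "waiting")
responses = [("idle", "serving")]

two = step(Counter({"idle": 2}), request, responses)
alone = step(Counter({"idle": 1}), request, responses)
alone_classical = step(Counter({"idle": 1}), request, responses, non_blocking=False)
```

    With two idle processes the request is answered and both change state. With a single process, the non-blocking step still moves the requester, whereas the classical variant returns `None` because the process is blocked.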

    A Non-blocking Buddy System for Scalable Memory Allocation on Multi-core Machines

    Common implementations of core memory allocation components handle concurrent allocation/release requests by synchronizing threads via spinlocks. This approach does not scale well to large thread counts, a problem that the literature has addressed by introducing layered allocation services or by replicating the core allocators (the bottommost ones within the layered architecture). Both solutions tend to reduce the pressure of actual concurrent accesses on each individual core allocator. In this article we explore an alternative approach to scalable memory allocation/release, which can still be combined with those proposals. We present a fully non-blocking buddy system that allows threads to proceed in parallel and commit their allocations/releases unless a conflict materializes while handling the allocator's metadata. Beyond improving scalability and performance, it is resilient to performance degradation in the face of concurrent accesses, independently of the current level of fragmentation of the handled memory blocks.

    Time4: Time for SDN

    With the rise of Software Defined Networks (SDN), there is growing interest in dynamic and centralized traffic engineering, where decisions about forwarding paths are taken dynamically from a network-wide perspective. Frequent path reconfiguration can significantly improve network performance, but should be handled with care so as to minimize disruptions that may occur during network updates. In this paper we introduce Time4, an approach that uses accurate time to coordinate network updates. Time4 is a powerful tool in softwarized environments, which can be used for various network update scenarios. Specifically, we characterize a set of update scenarios called flow swaps, for which Time4 is the optimal update approach, yielding less packet loss than existing update approaches. We define the lossless flow allocation problem, and formally show that in environments with frequent path allocation, scenarios that require simultaneous changes at multiple network devices are inevitable. We present the design, implementation, and evaluation of a Time4-enabled OpenFlow prototype. The prototype is publicly available as open source. Our work includes an extension to the OpenFlow protocol that has been adopted by the Open Networking Foundation (ONF), and is now included in OpenFlow 1.5. Our experimental results show the significant advantages of Time4 compared to other network update approaches, and demonstrate an SDN use case that is infeasible without Time4. Comment: This report is an extended version of "Software Defined Networks: It's About Time", which was accepted to IEEE INFOCOM 2016. A preliminary version of this report was published in arXiv in May, 201

    Using GPI-2 for Distributed Memory Paralleliziation of the Caffe Toolbox to Speed up Deep Neural Network Training

    Deep Neural Networks (DNNs) are currently of great interest in research and application. The training of these networks is a compute-intensive and time-consuming task. To reduce training times to a bearable amount at reasonable cost, we extend the popular Caffe toolbox for DNNs with an efficient distributed memory communication pattern. To achieve good scalability we emphasize the overlap of computation and communication and prefer fine-granular synchronization patterns over global barriers. To implement these communication patterns we rely on the Global address space Programming Interface version 2 (GPI-2) communication library. This interface provides a light-weight set of asynchronous one-sided communication primitives supplemented by non-blocking fine-granular data synchronization mechanisms. Therefore, CaffeGPI is the name of our parallel version of Caffe. First benchmarks demonstrate better scaling behavior compared with other extensions, e.g., the Intel™ Caffe. Even within a single symmetric multiprocessing machine with four graphics processing units, CaffeGPI scales better than the standard Caffe toolbox. These first results demonstrate that the use of standard High Performance Computing (HPC) hardware is a valid cost-saving approach to train large DNNs. I/O is another bottleneck when working with DNNs in a standard parallel HPC setting, which we will consider in more detail in a forthcoming paper.

    MPWide: a light-weight library for efficient message passing over wide area networks

    We present MPWide, a light-weight communication library which allows efficient message passing over a distributed network. MPWide has been designed to connect applications running on distributed (super)computing resources, and to maximize the communication performance on wide area networks for those without administrative privileges. It can be used to provide message passing between applications, move files, and make very fast connections in client-server environments. MPWide has already been applied to enable distributed cosmological simulations across up to four supercomputers on two continents, and to couple two different bloodflow simulations to form a multiscale simulation. Comment: accepted by the Journal Of Open Research Software, 13 pages, 4 figures, 1 table
