4,080 research outputs found

    A Scalable, Portable, and Memory-Efficient Lock-Free FIFO Queue

    Get PDF
    We present a new lock-free multiple-producer and multiple-consumer (MPMC) FIFO queue design which is scalable and, unlike existing high-performant queues, very memory efficient. Moreover, the design is ABA safe and does not require any external memory allocators or safe memory reclamation techniques, typically needed by other scalable designs. In fact, this queue itself can be leveraged for object allocation and reclamation, as in data pools. We use FAA (fetch-and-add), a specialized and more scalable than CAS (compare-and-set) instruction, on the most contended hot spots of the algorithm. However, unlike prior attempts with FAA, our queue is both lock-free and linearizable. We propose a general approach, SCQ, for bounded queues. This approach can easily be extended to support unbounded FIFO queues which can store an arbitrary number of elements. SCQ is portable across virtually all existing architectures and flexible enough for a wide variety of uses. We measure the performance of our algorithm on the x86-64 and PowerPC architectures. Our evaluation validates that our queue has exceptional memory efficiency compared to other algorithms and its performance is often comparable to, or exceeding that of state-of-the-art scalable algorithms

    FPGA Based Data Read-Out System of the Belle 2 Pixel Detector

    Full text link
    The upgrades of the Belle experiment and the KEKB accelerator aim to increase the data set of the experiment by the factor 50. This will be achieved by increasing the luminosity of the accelerator which requires a significant upgrade of the detector. A new pixel detector based on DEPFET technology will be installed to handle the increased reaction rate and provide better vertex resolution. One of the features of the DEPFET detector is a long integration time of 20 {\mu}s, which increases detector occupancy up to 3 %. The detector will generate about 2 GB/s of data. An FPGA-based two-level read-out system, the Data Handling Hybrid, was developed for the Belle 2 pixel detector. The system consists of 40 read-out and 8 controller modules. All modules are built in {\mu}TCA form factor using Xilinx Virtex-6 FPGA and can utilize up to 4 GB DDR3 RAM. The system was successfully tested in the beam test at DESY in January 2014. The functionality and the architecture of the Belle 2 Data Handling Hybrid system as well as the performance of the system during the beam test are presented in the paper.Comment: Transactions on Nuclear Science, Proceedings of the 19th Real Time Conference, Preprin

    Design and Performance of the Data Acquisition System for the NA61/SHINE Experiment at CERN

    Get PDF
    This paper describes the hardware, firmware and software systems used in data acquisition for the NA61/SHINE experiment at the CERN SPS accelerator. Special emphasis is given to the design parameters of the readout electronics for the 40m^3 volume Time Projection Chamber detectors, as these give the largest contribution to event data among all the subdetectors: events consisting of 8bit ADC values from 256 timeslices of 200k electronic channels are to be read out with ~100Hz rate. The data acquisition system is organized in "push-data mode", i.e. local systems transmit data asynchronously. Techniques of solving subevent synchronization are also discussed.Comment: 14 pages, 13 figure

    A 16-channel Digital TDC Chip with internal buffering and selective readout for the DIRC Cherenkov counter of the BABAR experiment

    Full text link
    A 16-channel digital TDC chip has been built for the DIRC Cherenkov counter of the BaBar experiment at the SLAC B-factory (Stanford, USA). The binning is 0.5 ns, the conversion time 32 ns and the full-scale 32 mus. The data driven architecture integrates channel buffering and selective readout of data falling within a programmable time window. The time measuring scale is constantly locked to the phase of the (external) clock. The linearity is better than 80 ps rms. The dead time loss is less than 0.1% for incoherent random input at a rate of 100 khz on each channel. At such a rate the power dissipation is less than 100 mw. The die size is 36 mm2.Comment: Latex, 18 pages, 13 figures (14 .eps files), submitted to NIM

    Boosting the Performance of PC-based Software Routers with FPGA-enhanced Network Interface Cards

    Get PDF
    The research community is devoting increasing attention to software routers based on off-the-shelf hardware and open-source operating systems running on the personalcomputer (PC) architecture. Today's high-end PCs are equipped with peripheral component interconnect (PCI) shared buses enabling them to easily fit into the multi-gigabit-per-second routing segment, for a price much lower than that of commercial routers. However, commercially-available PC network interface cards (NICs) lack programmability, and require not only packets to cross the PCI bus twice, but also to be processed in software by the operating system, strongly reducing the achievable forwarding rate. It is therefore interesting to explore the performance of customizable NICs based on field-programmable gate array (FPGA) logic devices we developed and assess how well they can overcome the limitations of today's commercially-available NIC

    A unified approach to the performance analysis of caching systems

    Get PDF
    We propose a unified methodology to analyse the performance of caches (both isolated and interconnected), by extending and generalizing a decoupling technique originally known as Che's approximation, which provides very accurate results at low computational cost. We consider several caching policies, taking into account the effects of temporal locality. In the case of interconnected caches, our approach allows us to do better than the Poisson approximation commonly adopted in prior work. Our results, validated against simulations and trace-driven experiments, provide interesting insights into the performance of caching systems.Comment: in ACM TOMPECS 20016. Preliminary version published at IEEE Infocom 201

    Life-cycle cost analysis task summary

    Get PDF
    The DSN life cycle cost (LCC) analysis methodology was completed. The LCC analysis methodology goals and objectives are summarized, as well as the issues covered by the methodology, its expected use, and its long range implications
    • 

    corecore