3,189 research outputs found

    FPGA Based Data Read-Out System of the Belle 2 Pixel Detector

    Full text link
    The upgrades of the Belle experiment and the KEKB accelerator aim to increase the data set of the experiment by the factor 50. This will be achieved by increasing the luminosity of the accelerator which requires a significant upgrade of the detector. A new pixel detector based on DEPFET technology will be installed to handle the increased reaction rate and provide better vertex resolution. One of the features of the DEPFET detector is a long integration time of 20 {\mu}s, which increases detector occupancy up to 3 %. The detector will generate about 2 GB/s of data. An FPGA-based two-level read-out system, the Data Handling Hybrid, was developed for the Belle 2 pixel detector. The system consists of 40 read-out and 8 controller modules. All modules are built in {\mu}TCA form factor using Xilinx Virtex-6 FPGA and can utilize up to 4 GB DDR3 RAM. The system was successfully tested in the beam test at DESY in January 2014. The functionality and the architecture of the Belle 2 Data Handling Hybrid system as well as the performance of the system during the beam test are presented in the paper.Comment: Transactions on Nuclear Science, Proceedings of the 19th Real Time Conference, Preprin

    A load-sharing architecture for high performance optimistic simulations on multi-core machines

    Get PDF
    In Parallel Discrete Event Simulation (PDES), the simulation model is partitioned into a set of distinct Logical Processes (LPs) which are allowed to concurrently execute simulation events. In this work we present an innovative approach to load-sharing on multi-core/multiprocessor machines, targeted at the optimistic PDES paradigm, where LPs are speculatively allowed to process simulation events with no preventive verification of causal consistency, and actual consistency violations (if any) are recovered via rollback techniques. In our approach, each simulation kernel instance, in charge of hosting and executing a specific set of LPs, runs a set of worker threads, which can be dynamically activated/deactivated on the basis of a distributed algorithm. The latter relies in turn on an analytical model that provides indications on how to reassign processor/core usage across the kernels in order to handle the simulation workload as efficiently as possible. We also present a real implementation of our load-sharing architecture within the ROme OpTimistic Simulator (ROOT-Sim), namely an open-source C-based simulation platform implemented according to the PDES paradigm and the optimistic synchronization approach. Experimental results for an assessment of the validity of our proposal are presented as well

    Asynchronous Distributed Semi-Stochastic Gradient Optimization

    Full text link
    With the recent proliferation of large-scale learning problems,there have been a lot of interest on distributed machine learning algorithms, particularly those that are based on stochastic gradient descent (SGD) and its variants. However, existing algorithms either suffer from slow convergence due to the inherent variance of stochastic gradients, or have a fast linear convergence rate but at the expense of poorer solution quality. In this paper, we combine their merits by proposing a fast distributed asynchronous SGD-based algorithm with variance reduction. A constant learning rate can be used, and it is also guaranteed to converge linearly to the optimal solution. Experiments on the Google Cloud Computing Platform demonstrate that the proposed algorithm outperforms state-of-the-art distributed asynchronous algorithms in terms of both wall clock time and solution quality
    • …
    corecore