2,320 research outputs found

    Efficient Implementations of Molecular Dynamics Simulations for Lennard-Jones Systems

    Full text link
    Efficient implementations of the classical molecular dynamics (MD) method for Lennard-Jones particle systems are considered. Not only general algorithms but also techniques that are efficient for some specific CPU architectures are also explained. A simple spatial-decomposition-based strategy is adopted for parallelization. By utilizing the developed code, benchmark simulations are performed on a HITACHI SR16000/J2 system consisting of IBM POWER6 processors which are 4.7 GHz at the National Institute for Fusion Science (NIFS) and an SGI Altix ICE 8400EX system consisting of Intel Xeon processors which are 2.93 GHz at the Institute for Solid State Physics (ISSP), the University of Tokyo. The parallelization efficiency of the largest run, consisting of 4.1 billion particles with 8192 MPI processes, is about 73% relative to that of the smallest run with 128 MPI processes at NIFS, and it is about 66% relative to that of the smallest run with 4 MPI processes at ISSP. The factors causing the parallel overhead are investigated. It is found that fluctuations of the execution time of each process degrade the parallel efficiency. These fluctuations may be due to the interference of the operating system, which is known as OS Jitter.Comment: 33 pages, 19 figures, add references and figures are revise

    Event Stream Processing with Multiple Threads

    Full text link
    Current runtime verification tools seldom make use of multi-threading to speed up the evaluation of a property on a large event trace. In this paper, we present an extension to the BeepBeep 3 event stream engine that allows the use of multiple threads during the evaluation of a query. Various parallelization strategies are presented and described on simple examples. The implementation of these strategies is then evaluated empirically on a sample of problems. Compared to the previous, single-threaded version of the BeepBeep engine, the allocation of just a few threads to specific portions of a query provides dramatic improvement in terms of running time

    On-Chip Transparent Wire Pipelining (invited paper)

    Get PDF
    Wire pipelining has been proposed as a viable mean to break the discrepancy between decreasing gate delays and increasing wire delays in deep-submicron technologies. Far from being a straightforwardly applicable technique, this methodology requires a number of design modifications in order to insert it seamlessly in the current design flow. In this paper we briefly survey the methods presented by other researchers in the field and then we thoroughly analyze the solutions we recently proposed, ranging from system-level wire pipelining to physical design aspects

    Modular Hardware Design with Timeline Types

    Full text link
    Modular design is a key challenge for enabling large-scale reuse of hardware modules. Unlike software, however, hardware designs correspond to physical circuits and inherit constraints from them. Timing constraints -- which cycle a signal arrives, when an input is read -- and structural constraints -- how often a multiplier accepts new inputs -- are fundamental to hardware interfaces. Existing hardware design languages do not provide a way to encode these constraints; a user must read documentation, build scripts, or in the worst case, a module's implementation to understand how to use it. We present Filament, a language for modular hardware design that supports the specification and enforcement of timing and structural constraints for statically scheduled pipelines. Filament uses timeline types, which describe the intervals of clock-cycle time when a given signal is available or required. Filament enables safe composition of hardware modules, ensures that the resulting designs are correctly pipelined, and predictably lowers them to efficient hardware.Comment: Extended version of PLDI '23 pape

    Review of high-speed phase accumulator for direct digital frequency synthesizer

    Get PDF
    A review of high-speed pipelined phase accumulator (PA) is proposed in this paper. The detail explanation of ideas, methods and techniques used in previous researches to improve the PA throughput designs were surveyed. The Brent–Kung (BK) adder was modified in this paper to be applied in pipelined PA architecture. A comparison of different adder circuits, includes a modified BK, ripple carry adder (RCA), Kogge-Stone adder (KS) and other prefix adders were applied to architect the PA based on Pipeline technique. The presented pipelined PA design circuit with multiple frequency control word (FCW) and different adders were coded Verilog hardware description language (HDL) code, compiled and verified with field programmable gate array (FPGA) kit platform. The comparison result shows that the modified BK adder has fast performances. The shifted clocking technique is utilized in the proposed pipelined PA circuit to reduce the unwanted repetitive D-flip flop (DFF) registers (coming from the pipeline technique), while preserving the high speed
    • 

    corecore