11 research outputs found

    Nano-Sim: A Step Wise Equivalent Conductance based Statistical Simulator for Nanotechnology Circuit Design

    Full text link
    New nanotechnology based devices are replacing CMOS devices to overcome CMOS technology's scaling limitations. However, many such devices exhibit non-monotonic I-V characteristics and uncertain properties which lead to the negative differential resistance (NDR) problem and the chaotic performance. This paper proposes a new circuit simulation approach that can effectively simulate nanotechnology devices with uncertain input sources and negative differential resistance (NDR) problem. The experimental results show a 20-30 times speedup comparing with existing simulators.Comment: Submitted on behalf of EDAA (http://www.edaa.com/

    GPU acceleration of a production molecular docking code

    Full text link
    Abstract: Modeling the interactions of biological molecules, or docking, is critical to both understand-ing basic life processes and to designing new drugs. Here we describe the GPU-based acceleration of a recently developed, complex, production docking code. We show how the various functions can be mapped to the GPU and present numerous optimizations. We find which parts of the problem domain are best suited to the different correlation methods. The GPU-accelerated system achieves a speedup of at least 16x for all likely problems sizes. This makes it competitive with FPGA-based systems for small molecule docking, and superior for protein-protein docking.

    RackBlox: A Software-Defined Rack-Scale Storage System with Network-Storage Co-Design

    Full text link
    Software-defined networking (SDN) and software-defined flash (SDF) have been serving as the backbone of modern data centers. They are managed separately to handle I/O requests. At first glance, this is a reasonable design by following the rack-scale hierarchical design principles. However, it suffers from suboptimal end-to-end performance, due to the lack of coordination between SDN and SDF. In this paper, we co-design the SDN and SDF stack by redefining the functions of their control plane and data plane, and splitting up them within a new architecture named RackBlox. RackBlox decouples the storage management functions of flash-based solid-state drives (SSDs), and allow the SDN to track and manage the states of SSDs in a rack. Therefore, we can enable the state sharing between SDN and SDF, and facilitate global storage resource management. RackBlox has three major components: (1) coordinated I/O scheduling, in which it dynamically adjusts the I/O scheduling in the storage stack with the measured and predicted network latency, such that it can coordinate the effort of I/O scheduling across the network and storage stack for achieving predictable end-to-end performance; (2) coordinated garbage collection (GC), in which it will coordinate the GC activities across the SSDs in a rack to minimize their impact on incoming I/O requests; (3) rack-scale wear leveling, in which it enables global wear leveling among SSDs in a rack by periodically swapping data, for achieving improved device lifetime for the entire rack. We implement RackBlox using programmable SSDs and switch. Our experiments demonstrate that RackBlox can reduce the tail latency of I/O requests by up to 5.8x over state-of-the-art rack-scale storage systems.Comment: 14 pages. Published in published in ACM SIGOPS 29th Symposium on Operating Systems Principles (SOSP'23

    Single pass, blast-like, approximate string matching on fpgas

    No full text
    Abstract: Approximate string matching is fundamental to bioinformatics, and has been the subject of numerous FPGA acceleration studies. We address issues with respect to FPGA implementations of both BLAST- and dynamic-programming- (DP) based methods. Our primary contributions are two new algorithms for emulating the seeding and extension phases of BLAST. These operate in a single pass through a database at streaming rate (110 Maa/sec on a VP70 for query sizes up to 600 and 170 Maa/sec on a Virtex4 for query sizes up to 1024), and with no preprocessing other than loading the query string. Further, they use very high sensitivity with no slowdown. While current DP-based methods also operate at streaming rate, generating results can be cumbersome. We address this with a new structure for data extraction. We present results from several implementations.

    Simulation and Design of Nanocircuits With Resonant Tunneling Devices

    No full text

    Computing Models for FPGA-Based Accelerators

    No full text

    Achieving High Performance with FPGA-Based Computing

    No full text
    corecore