
    On Multicast in Asynchronous Networks-on-Chip: Techniques, Architectures, and FPGA Implementation

    In this era of exascale computing, conventional synchronous design techniques are facing unprecedented challenges. The consumer electronics market is replete with many-core systems in the range of 16 cores to thousands of cores on chip, integrating billions of transistors. However, with this ever-increasing complexity, the traditional design approaches are facing key issues such as increasing chip power, process variability, aging, thermal problems, and scalability. An alternative paradigm that has gained significant interest in the last decade is asynchronous design. Asynchronous designs have several potential advantages: they are naturally energy proportional, burning power only when active, do not require complex clock distribution, are robust to different forms of variability, and provide ease of composability for heterogeneous platforms. Networks-on-chip (NoCs) are an interconnect paradigm that has been introduced to deal with the ever-increasing system complexity. NoCs provide a distributed, scalable, and efficient interconnect solution for today’s many-core systems. Moreover, NoCs are a natural match with asynchronous design techniques, as they separate communication infrastructure and timing from the computational elements. To this end, globally-asynchronous locally-synchronous (GALS) systems that interconnect multiple processing cores, operating at different clock speeds, using an asynchronous NoC, have gained significant interest. While asynchronous NoCs have several advantages, they also face a key challenge of supporting new types of traffic patterns. One such pattern is multicast communication, where a source sends packets to an arbitrary number of destinations. Multicast is common not only in parallel computing, such as for cache coherency, but also in emerging areas such as neuromorphic computing. This important capability has been largely missing from asynchronous NoCs. 
This thesis introduces several efficient multicast solutions for these interconnects. In particular, techniques and network architectures are introduced to support high-performance and low-power multicast. Two leading network topologies are the focus: a variant mesh-of-trees (MoT) and a 2D mesh. In addition, for a more realistic implementation and analysis, as well as to significantly advance the field of asynchronous NoCs, this thesis also targets synthesis of these NoCs on commercial FPGAs. While there have been significant advances in FPGA technologies, there has been only limited research on implementing asynchronous NoCs on FPGAs. To this end, a systematic computer-aided design (CAD) methodology has been introduced to efficiently and safely map asynchronous NoCs on FPGAs. Overall, this thesis makes the following three contributions. The first contribution is a multicast solution for a variant MoT network topology. This topology consists of simple low-radix switches, and has been used in high-performance computing platforms. A novel local speculation technique is introduced, where a subset of the network’s switches are speculative and always broadcast every packet. These switches are very simple and have high performance. Speculative switches are surrounded by non-speculative ones that route packets based on their destinations and also throttle any redundant copies created by the former. This hybrid network architecture achieved significant performance and power benefits over other multicast approaches. The second contribution is a multicast solution for a 2D-mesh topology, which is more complex with higher-radix switches and also is more commonly used. A novel continuous-time replication strategy is introduced to optimize the critical multi-way forking operation of a multicast transmission. 
In this technique, a multicast packet is first stored in an input port of a switch, from where it is sent through distinct output ports towards different destinations concurrently, at each output’s own rate and in continuous time. This strategy is shown to have significant latency and energy benefits over an approach that performs multicast using multiple distinct serial unicasts to each destination. Finally, a systematic CAD methodology is introduced to synthesize asynchronous NoCs on commercial FPGAs. A two-fold goal is targeted: correctness and high performance. For ease of implementation, only existing FPGA synthesis tools are used. Moreover, since asynchronous NoCs involve special asynchronous components, a comprehensive guide is introduced to map these elements correctly and efficiently. Two asynchronous NoC switches are synthesized using the proposed approach on a leading Xilinx FPGA in 28 nm: one that only handles unicast, and the other that also supports multicast. Both showed significant energy benefits with some performance gains over a state-of-the-art synchronous switch.
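
The local speculation idea described above (speculative switches that always broadcast, surrounded by non-speculative switches that route by destination and throttle redundant copies) can be illustrated with a toy model. The sketch below is not the thesis's architecture: it assumes a plain binary fanout tree, and the function `route` and its parameters `depth` and `speculative_levels` are hypothetical names chosen for illustration. It only counts link traversals, as a rough proxy for the redundant traffic that speculation creates.

```python
def route(dests, depth, speculative_levels):
    """Count link traversals for one multicast in a binary fanout tree.

    Toy model (an assumption, not the thesis's MoT variant):
    dests              -- set of destination leaf indices in range(2**depth)
    speculative_levels -- set of tree levels whose switches broadcast blindly
    Returns (set of leaves that accepted the packet, number of links used).
    """
    links = 0
    delivered = set()
    # Each entry is a switch (or leaf) covering leaves lo..hi-1 at a level.
    frontier = [(0, 0, 2 ** depth)]
    while frontier:
        level, lo, hi = frontier.pop()
        needed = any(lo <= d < hi for d in dests)
        if level == depth:
            # A leaf accepts the packet only if it is addressed to it.
            if needed:
                delivered.add(lo)
            continue
        speculative = level in speculative_levels
        if not speculative and not needed:
            # Non-speculative switch throttles a redundant copy.
            continue
        mid = (lo + hi) // 2
        for child_lo, child_hi in ((lo, mid), (mid, hi)):
            child_needed = any(child_lo <= d < child_hi for d in dests)
            # Speculative: always forward. Non-speculative: route by address.
            if speculative or child_needed:
                links += 1
                frontier.append((level + 1, child_lo, child_hi))
    return delivered, links


if __name__ == "__main__":
    dests = {0, 1}
    for spec in (set(), {1}, {0, 1, 2}):
        print(sorted(spec), route(dests, 3, spec))
```

In this model, making only the middle level speculative costs one extra link over fully address-based routing, because the neighbouring non-speculative switch drops the redundant copy immediately, whereas broadcasting at every level floods the whole tree.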

    Miniature mobile sensor platforms for condition monitoring of structures

    In this paper, a wireless, multisensor inspection system for nondestructive evaluation (NDE) of materials is described. The sensor configuration enables two inspection modes: magnetic (flux leakage and eddy current) and noncontact ultrasound. Each is designed to function in a complementary manner, maximizing the potential for detection of both surface and internal defects. Particular emphasis is placed on the generic architecture of a novel, intelligent sensor platform, and its positioning on the structure under test. The sensor units are capable of wireless communication with a remote host computer, which controls manipulation and data interpretation. Results are presented in the form of automatic scans with different NDE sensors in a series of experiments on thin plate structures. To highlight the advantage of utilizing multiple inspection modalities, data fusion approaches are employed to combine data collected by complementary sensor systems. Fusion of data is shown to demonstrate the potential for improved inspection reliability.

    pForest: In-Network Inference with Random Forests

    The concept of "self-driving networks" has recently emerged as a possible solution to manage the ever-growing complexity of modern network infrastructures. In a self-driving network, network devices adapt their decisions in real time by observing network traffic and by performing in-line inference according to machine learning models. The recent advent of programmable data planes gives us a unique opportunity to implement this vision. One open question, though, is whether these devices are powerful enough to run such complex tasks. We answer positively by presenting pForest, a system for performing in-network inference according to supervised machine learning models on top of programmable data planes. The key challenge is to design classification models that fit the constraints of programmable data planes (e.g., no floating points, no loops, and limited memory) while providing high accuracy. pForest addresses this challenge in three phases: (i) it optimizes the feature selection according to the capabilities of programmable network devices; (ii) it trains random forest models tailored for different phases of a flow; and (iii) it applies these models in real time, on a per-packet basis. We fully implemented pForest in Python (training), and in P4_16 (inference). Our evaluation shows that pForest can classify traffic at line rate for hundreds of thousands of flows, with an accuracy that is on par with software-based solutions. We further show the practicality of pForest by deploying it on existing hardware devices (Barefoot Tofino).
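
To make the data-plane constraints above concrete (no floating point, no unbounded loops, limited memory), the following is a minimal sketch of random-forest inference using only integer comparisons and a fixed traversal bound. It is not pForest's implementation: the tree tables, the features (packet length, packets seen so far), the labels, and the names `classify_tree` and `classify_forest` are all hypothetical, and the per-flow-phase models from the abstract are not modelled.

```python
from collections import Counter

# Hypothetical trees. Internal node: (feature_index, threshold, left, right),
# all integers. Leaf node: a string label.
# Feature 0 = packet length in bytes, feature 1 = packets seen so far.
TREE_A = {0: (0, 500, 1, 2), 1: "dns", 2: (1, 20, 3, 4), 3: "web", 4: "video"}
TREE_B = {0: (1, 10, 1, 2), 1: "web", 2: "video"}
TREE_C = {0: (0, 1200, 1, 2), 1: "web", 2: "video"}
FOREST = (TREE_A, TREE_B, TREE_C)
MAX_DEPTH = 2  # deepest tree above has two levels of internal nodes


def classify_tree(tree, features, max_depth):
    """Walk one tree with integer comparisons and a fixed step bound.

    The bounded loop stands in for a fixed number of pipeline stages:
    it could be unrolled into max_depth sequential table lookups.
    """
    node = tree[0]
    for _ in range(max_depth):
        if isinstance(node, str):
            return node
        feat, threshold, left, right = node
        node = tree[left if features[feat] <= threshold else right]
    if isinstance(node, str):
        return node
    raise ValueError("tree is deeper than max_depth")


def classify_forest(forest, features, max_depth=MAX_DEPTH):
    """Majority vote over the per-tree labels."""
    votes = Counter(classify_tree(t, features, max_depth) for t in forest)
    return votes.most_common(1)[0][0]


if __name__ == "__main__":
    print(classify_forest(FOREST, (1400, 50)))
    print(classify_forest(FOREST, (300, 5)))
```

Storing each tree as a flat table keyed by node ID loosely mirrors how a match-action pipeline would look up one node per stage, which is why the traversal is bounded by a constant instead of running until a leaf is reached.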

    Proceedings of the 7th Junior Researcher Workshop on Real-Time Computing: JRWRTC 2013: Sophia Antipolis, France, October 16-18, 2013
