23 research outputs found

    High-level asynchronous system design using the ACK framework

    Get PDF
    Journal ArticleDesigning asynchronous circuits is becoming easier as a number of design styles are making the transition from research projects to real, usable tools. However, designing asynchronous "systems" is still a difficult problem. We define asynchronous systems to be medium to large digital systems whose descriptions include both datapath and control, that may involve non-trivial interface requirements, and whose control is too large to be synthesized in one large controller. ACK is a framework for designing high performance asynchronous systems of this type. In ACK we advocate an approach that begins with procedural level descriptions of control and datapath and results in a hybrid system that mixes a variety of hardware implementation styles including burst-mode AFSMs, macromodule circuits, and programmable control. We present our views on what makes asynchronous high level system design different from lower level circuit design, motivate our ACK approach, and demonstrate using an example system design

    Application specific asynchronous microengines for efficient high-level control

    Get PDF
    technical reportDespite the growing interest in asynchronous circuits programmable asynchronous controllers based on the idea of microprogramming have not been actively pursued Since programmable control is widely used in many commercial ASICs to allow late correction of design errors to easily upgrade product families to meet the time to market and even efficient run time modications to control in adaptive systems we consider it crucial that self timed techniques support efficient programmable control This is especially true given that asynchronous (self-timed) circuits are well suited for realizing reactive and control intensive designs We offer a practical solution to programmable asynchronous control in the form of application-speciffic microprogrammed asynchronous controllers (or microengines). The features of our solution include a modular and easily extensible datapath structure support for two main styles of hand shaking (namely two phase and four phase), and many efficiency measures based on exploiting concurrency between operations and employing efficient circuit structures Our results demonstrate that the proposed microengine can yield high performance-in fact performance close to that offered by automated high level synthesis tools targeting custom hard wired burstmode machines

    Application specific asynchronous microgengines for efficient high-level control

    Get PDF
    technical reportDespite the growing interest in asynchronous circuits, programmable asynchronous controllers based on the idea of microprogramming have not been actively pursued. Since programmable control is widely used in many commercial ASICs to allow late correction of design errors, to easily upgrade product families, to meet the time to market, and even effect run-time modifications to control in adaptive systems, we consider it crucial that self-timed techniques support efficient programmable control. This is especially true given that asynchronous (self-timed) circuits are well suited for realizing reactive and control-intensive designs. We offer a practical solution to programmable asynchronous control in the form of application-specific micro-programmed asynchronous controllers (or microengines). The features of our solution include a modular and easily extensible datapath structure, support for two main styles of handshaking (namely two-phase and four-phase), and many efficiency measures based on exploiting concurrency between operations and employing efficient circuit structures. Our results demonstrate that the proposed microengine can yield high performance?in fact performance close to that offered by automated high-level synthesis tools targeting custom hard-wired burstmode machines

    Network Processors and Next Generation Networks: Design, Applications, and Perspectives

    Get PDF
    Network Processors (NPs) are hardware platforms born as appealing solutions for packet processing devices in networking applications. Nowadays, a plethora of solutions exists, with no agreement on a common architecture. Each vendor has proposed its specific solution and no official standard still exists. The common features of all proposals are a hierarchy of processors, with a general purpose processor and several units specialized for packet processing, a series of memory devices with different sizes and latencies, a low-level programmability. The target is a platform for networking applications with low time to market and high time in market, thanks to a high flexibility and a programmability simpler than that of ASICs, for example. After about ten years since the "birth" of network processors, this research activity wants to make an analytical balance of their development and usage. Many authoritative opinions suggest that NPs have been "outdated" by multicore or manycore systems, which provide general purpose environments and some specialized cores. The main reasons of these negative opinions are the hard programmability of NPs, which often requires the knowledge of private microcode, or the excessive architectural limits, such as reduced memories and minimal instruction store. Our research shows that Network Processors can be appealing for different applications in networking area, and many interesting solutions can be obtained, which present very high performance, outscoring current solutions. However, the issues of hard programming and remarkable limits exist, and they could be alleviated only by providing almost a comprehensive programming environment and a proper design in terms of processing and memory resources. More e cient solutions can be surely provided, but the experience of network processors has produced an important legacy in developing packet processing engines. In this work, we have realized many devices for networking purposes based on NP platform, in order to understand the complexity of programming, the flexibility of design, the complexity of tasks that can be implemented, the maximum depth of packet processing, the performance of such devices, the real usefulness of NPs in network devices. All these features have been accurately analyzed and will be illustrated in this thesis. Many remarkable results have been obtained, which confirm the Network Processors as appealing solutions for network devices. Moreover, the research on NPs have lead us to analyze and solve more general issues, related for instance to multiprocessor systems or to processors with no big available memory. In particular, the latter issue lead us to design many interesting data structures for set representation and membership query, which are based on randomized techniques and allow for big memory savings

    NFComms: A synchronous communication framework for the CPU-NFP heterogeneous system

    Get PDF
    This work explores the viability of using a Network Flow Processor (NFP), developed by Netronome, as a coprocessor for the construction of a CPU-NFP heterogeneous platform in the domain of general processing. When considering heterogeneous platforms involving architectures like the NFP, the communication framework provided is typically represented as virtual network interfaces and is thus not suitable for generic communication. To enable a CPU-NFP heterogeneous platform for use in the domain of general computing, a suitable generic communication framework is required. A feasibility study for a suitable communication medium between the two candidate architectures showed that a generic framework that conforms to the mechanisms dictated by Communicating Sequential Processes is achievable. The resulting NFComms framework, which facilitates inter- and intra-architecture communication through the use of synchronous message passing, supports up to 16 unidirectional channels and includes queuing mechanisms for transparently supporting concurrent streams exceeding the channel count. The framework has a minimum latency of between 15.5 μs and 18 μs per synchronous transaction and can sustain a peak throughput of up to 30 Gbit/s. The framework also supports a runtime for interacting with the Go programming language, allowing user-space processes to subscribe channels to the framework for interacting with processes executing on the NFP. The viability of utilising a heterogeneous CPU-NFP system for use in the domain of general and network computing was explored by introducing a set of problems or applications spanning general computing, and network processing. These were implemented on the heterogeneous architecture and benchmarked against equivalent CPU-only and CPU/GPU solutions. The results recorded were used to form an opinion on the viability of using an NFP for general processing. It is the author’s opinion that, beyond very specific use cases, it appears that the NFP-400 is not currently a viable solution as a coprocessor in the field of general computing. This does not mean that the proposed framework or the concept of a heterogeneous CPU-NFP system should be discarded as such a system does have acceptable use in the fields of network and stream processing. Additionally, when comparing the recorded limitations to those seen during the early stages of general purpose GPU development, it is clear that general processing on the NFP is currently in a similar state

    Fast Packet Processing on High Performance Architectures

    Get PDF
    The rapid growth of Internet and the fast emergence of new network applications have brought great challenges and complex issues in deploying high-speed and QoS guaranteed IP network. For this reason packet classication and network intrusion detection have assumed a key role in modern communication networks in order to provide Qos and security. In this thesis we describe a number of the most advanced solutions to these tasks. We introduce NetFPGA and Network Processors as reference platforms both for the design and the implementation of the solutions and algorithms described in this thesis. The rise in links capacity reduces the time available to network devices for packet processing. For this reason, we show different solutions which, either by heuristic and randomization or by smart construction of state machine, allow IP lookup, packet classification and deep packet inspection to be fast in real devices based on high speed platforms such as NetFPGA or Network Processors

    Cellule: Lightweight Execution Environment for Accelerator-based Systems

    Get PDF
    The increasing prevalence of accelerators is changing the high performance computing (HPC) landscape to one in which future platforms will consist of heterogeneous multi-core chips comprised of both general purpose and specialized cores. Coupled with this trend is increased support for virtualization, which can abstract underlying hardware to aid in dynamically managing its use by HPC applications while at the same time, provide lightweight, efficient, and specialized execution environments (SEE) for applications to maximally exploit the hardware. This paper describes the Cellule architecture which uses virtualization to create high performance, low noise SEEs for accelerators. The paper describes important properties of Cellule and illustrates its advantages with an implementation on the IBM Cell processor. With compute-intensive workloads, performance improvements of up to 60% are attained when using Cellule’s SEE vs. the current Linux-based runtime, resulting in a system architecture that is suitable for future accelerators and specialized cores irrespective of whether they are on-chip or off-chip. A key principle, coordinated resource management for accelerator and general purpose resources, is shown to extend beyond Cell, using experimental results obtained on a different accelerator platform

    HAIL: An Algorithm for the Hardware Accelerated Identification of Languages, Master\u27s Thesis, May 2006

    Get PDF
    This thesis examines in detail the Hardware-Accelerated Identification of Languages (HAIL) project. The goal of HAIL is to provide an accurate means to identify the language and encoding used in streaming content, such as documents passed over a high-speed network. HAIL has been implemented on the Field-programmable Port eXtender (FPX), an open hardware platform developed at Washington University in St. Louis. HAIL can accurately identify the primary languages and encodings used in text at rates much higher than what can be achieved by software algorithms running on microprocessors

    Modern Computational Techniques for the HMMER Sequence Analysis

    Get PDF

    Physics of Microswimmers - Single Particle Motion and Collective Behavior

    Full text link
    Locomotion and transport of microorganisms in fluids is an essential aspect of life. Search for food, orientation toward light, spreading of off-spring, and the formation of colonies are only possible due to locomotion. Swimming at the microscale occurs at low Reynolds numbers, where fluid friction and viscosity dominates over inertia. Here, evolution achieved propulsion mechanisms, which overcome and even exploit drag. Prominent propulsion mechanisms are rotating helical flagella, exploited by many bacteria, and snake-like or whip-like motion of eukaryotic flagella, utilized by sperm and algae. For artificial microswimmers, alternative concepts to convert chemical energy or heat into directed motion can be employed, which are potentially more efficient. The dynamics of microswimmers comprises many facets, which are all required to achieve locomotion. In this article, we review the physics of locomotion of biological and synthetic microswimmers, and the collective behavior of their assemblies. Starting from individual microswimmers, we describe the various propulsion mechanism of biological and synthetic systems and address the hydrodynamic aspects of swimming. This comprises synchronization and the concerted beating of flagella and cilia. In addition, the swimming behavior next to surfaces is examined. Finally, collective and cooperate phenomena of various types of isotropic and anisotropic swimmers with and without hydrodynamic interactions are discussed.Comment: 54 pages, 59 figures, review article, Reports of Progress in Physics (to appear
    corecore