179 research outputs found

    Modular Acquisition and Stimulation System for Timestamp-Driven Neuroscience Experiments

    Full text link
    Dedicated systems are fundamental for neuroscience experimental protocols that require timing determinism and synchronous stimuli generation. We developed a data acquisition and stimuli generator system for neuroscience research, optimized for recording timestamps from up to 6 spiking neurons and entirely specified in a high-level Hardware Description Language (HDL). Despite the logic complexity penalty of synthesizing from such a language, it was possible to implement our design in a low-cost small reconfigurable device. Under a modular framework, we explored two different memory arbitration schemes for our system, evaluating both their logic element usage and resilience to input activity bursts. One of them was designed with a decoupled and latency insensitive approach, allowing for easier code reuse, while the other adopted a centralized scheme, constructed specifically for our application. The usage of a high-level HDL allowed straightforward and stepwise code modifications to transform one architecture into the other. The achieved modularity is very useful for rapidly prototyping novel electronic instrumentation systems tailored to scientific research.Comment: Preprint submitted to ARC 2015. Extended: 16 pages, 10 figures. The final publication is available at link.springer.co

    FOS: A Modular FPGA Operating System for Dynamic Workloads

    Get PDF
    With FPGAs now being deployed in the cloud and at the edge, there is a need for scalable design methods which can incorporate the heterogeneity present in the hardware and software components of FPGA systems. Moreover, these FPGA systems need to be maintainable and adaptable to changing workloads while improving accessibility for the application developers. However, current FPGA systems fail to achieve modularity and support for multi-tenancy due to dependencies between system components and lack of standardised abstraction layers. To solve this, we introduce a modular FPGA operating system -- FOS, which adopts a modular FPGA development flow to allow each system component to be changed and be agnostic to the heterogeneity of EDA tool versions, hardware and software layers. Further, to dynamically maximise the utilisation transparently from the users, FOS employs resource-elastic scheduling to arbitrate the FPGA resources in both time and spatial domain for any type of accelerators. Our evaluation on different FPGA boards shows that FOS can provide performance improvements in both single-tenant and multi-tenant environments while substantially reducing the development time and, at the same time, improving flexibility

    Towards Power- and Energy-Efficient Datacenters

    Full text link
    As the Internet evolves, cloud computing is now a dominant form of computation in modern lives. Warehouse-scale computers (WSCs), or datacenters, comprising the foundation of this cloud-centric web have been able to deliver satisfactory performance to both the Internet companies and the customers. With the increased focus and popularity of the cloud, however, datacenter loads rise and grow rapidly, and Internet companies are in need of boosted computing capacity to serve such demand. Unfortunately, power and energy are often the major limiting factors prohibiting datacenter growth: it is often the case that no more servers can be added to datacenters without surpassing the capacity of the existing power infrastructure. This dissertation aims to investigate the issues of power and energy usage in a modern datacenter environment. We identify the source of power and energy inefficiency at three levels in a modern datacenter environment and provides insights and solutions to address each of these problems, aiming to prepare datacenters for critical future growth. We start at the datacenter-level and find that the peak provisioning and improper service placement in multi-level power delivery infrastructures fragment the power budget inside production datacenters, degrading the compute capacity the existing infrastructure can support. We find that the heterogeneity among datacenter workloads is key to address this issue and design systematic methods to reduce the fragmentation and improve the utilization of the power budget. This dissertation then narrow the focus to examine the energy usage of individual servers running cloud workloads. Especially, we examine the power management mechanisms employed in these servers and find that the coarse time granularity of these mechanisms is one critical factor that leads to excessive energy consumption. We propose an intelligent and low overhead solution on top of the emerging finer granularity voltage/frequency boosting circuit to effectively pinpoints and boosts queries that are likely to increase the tail distribution and can reap more benefit from the voltage/frequency boost, improving energy efficiency without sacrificing the quality of services. The final focus of this dissertation takes a further step to investigate how using a fundamentally more efficient computing substrate, field programmable gate arrays (FPGAs), benefit datacenter power and energy efficiency. Different from other types of hardware accelerations, FPGAs can be reconfigured on-the-fly to provide fine-grain control over hardware resource allocation and presents a unique set of challenges for optimal workload scheduling and resource allocation. We aim to design a set coordinated algorithms to manage these two key factors simultaneously and fully explore the benefit of deploying FPGAs in the highly varying cloud environment.PHDComputer Science & EngineeringUniversity of Michigan, Horace H. Rackham School of Graduate Studieshttps://deepblue.lib.umich.edu/bitstream/2027.42/144043/1/hsuch_1.pd

    Improving low latency applications for reconfigurable devices

    Get PDF
    This thesis seeks to improve low latency application performance via architectural improvements in reconfigurable devices. This is achieved by improving resource utilisation and access, and by exploiting the different environments within which reconfigurable devices are deployed. Our first contribution leverages devices deployed at the network level to enable the low latency processing of financial market data feeds. Financial exchanges transmit messages via two identical data feeds to reduce the chance of message loss. We present an approach to arbitrate these redundant feeds at the network level using a Field-Programmable Gate Array (FPGA). With support for any messaging protocol, we evaluate our design using the NASDAQ TotalView-ITCH, OPRA, and ARCA data feed protocols, and provide two simultaneous outputs: one prioritising low latency, and one prioritising high reliability with three dynamically configurable windowing methods. Our second contribution is a new ring-based architecture for low latency, parallel access to FPGA memory. Traditional FPGA memory is formed by grouping block memories (BRAMs) together and accessing them as a single device. Our architecture accesses these BRAMs independently and in parallel. Targeting memory-based computing, which stores pre-computed function results in memory, we benefit low latency applications that rely on: highly-complex functions; iterative computation; or many parallel accesses to a shared resource. We assess square root, power, trigonometric, and hyperbolic functions within the FPGA, and provide a tool to convert Python functions to our new architecture. Our third contribution extends the ring-based architecture to support any FPGA processing element. We unify E heterogeneous processing elements within compute pools, with each element implementing the same function, and the pool serving D parallel function calls. Our implementation-agnostic approach supports processing elements with different latencies, implementations, and pipeline lengths, as well as non-deterministic latencies. Compute pools evenly balance access to processing elements across the entire application, and are evaluated by implementing eight different neural network activation functions within an FPGA.Open Acces

    Efficient runtime placement management for high performance and reliability in COTS FPGAs

    Get PDF
    Designing high-performance, fault-tolerant multisensory electronic systems for hostile environments such as nuclear plants and outer space within the constraints of cost, power and flexibility is challenging. Issues such as ionizing radiation, extreme temperature and ageing can lead to faults in the electronics of these systems. In addition, the remote nature of these environments demands a level of flexibility and autonomy in their operations. The standard practice of using specially hardened electronic devices for such systems is not only very expensive but also has limited flexibility. This thesis proposes novel techniques that promote the use of Commercial Off-The- Shelf (COTS) reconfigurable devices to meet the challenges of high-performance systems for hostile environments. Reconfigurable hardware such as Field Programmable Gate Arrays (FPGA) have a unique combination of flexibility and high performance. The flexibility offered through features such as dynamic partial reconfiguration (DPR) can be harnessed not only to achieve cost-effective designs as a smaller area can be used to execute multiple tasks, but also to improve the reliability of a system as a circuit on one portion of the device can be physically relocated to another portion in the case of fault occurrence. However, to harness these potentials for high performance and reliability in a cost-effective manner, novel runtime management tools are required. Most runtime support tools for reconfigurable devices are based on ideal models which do not adequately consider the limitations of realistic FPGAs, in particular modern FPGAs which are increasingly heterogeneous. Specifically, these tools lack efficient mechanisms for ensuring a high utilization of FPGA resources, including the FPGA area and the configuration port and clocking resources, in a reliable manner. To ensure high utilization of reconfigurable device area, placement management is a key aspect of these tools. This thesis presents novel techniques for the management of hardware task placement on COTS reconfigurable devices for high performance and reliability. To this end, it addresses design-time issues that affect efficient hardware task placement, with a focus on reliability. It also presents techniques to maximize the utilization of the FPGA area in runtime, including techniques to minimize fragmentation. Fragmentation leads to the creation of unusable areas due to dynamic placement of tasks and the heterogeneity of the resources on the chip. Moreover, this thesis also presents an efficient task reuse mechanism to improve the availability of the internal configuration infrastructure of the FPGA for critical responsibilities like error mitigation. The task reuse scheme, unlike previous approaches, also improves the utilization of the chip area by offering defragmentation. Task relocation, which involves changing the physical location of circuits is a technique for error mitigation and high performance. Hence, this thesis also provides a functionality-based relocation mechanism for improving the number of locations to which tasks can be relocated on heterogeneous FPGAs. As tasks are relocated, clock networks need to be routed to them. As such, a reliability-aware technique of clock network routing to tasks after placement is also proposed. Finally, this thesis offers a prototype implementation and characterization of a placement management system (PMS) which is an integration of the aforementioned techniques. The performance of most of the proposed techniques are tested using data processing tasks of a NASA JPL spectrometer application. The results show that the proposed techniques have potentials to improve the reliability and performance of applications in hostile environment compared to state-of-the-art techniques. The task optimization technique presented leads to better capacity to circumvent permanent faults on COTS FPGAs compared to state-of-the-art approaches (48.6% more errors were circumvented for the JPL spectrometer application). The proposed task reuse scheme leads to approximately 29% saving in the amount of configuration time. This frees up the internal configuration interface for more error mitigation operations. In addition, the proposed PMS has a worst-case latency of less than 50% of that of state-of- the-art runtime placement systems, while maintaining the same level of placement quality and resource overhead

    Solutions and application areas of flip-flop metastability

    Get PDF
    PhD ThesisThe state space of every continuous multi-stable system is bound to contain one or more metastable regions where the net attraction to the stable states can be infinitely-small. Flip-flops are among these systems and can take an unbounded amount of time to decide which logic state to settle to once they become metastable. This problematic behavior is often prevented by placing the setup and hold time conditions on the flip-flop’s input. However, in applications such as clock domain crossing where these constraints cannot be placed flip-flops can become metastable and induce catastrophic failures. These events are fundamentally impossible to prevent but their probability can be significantly reduced by employing synchronizer circuits. The latter grant flip-flops longer decision time at the expense of introducing latency in processing the synchronized input. This thesis presents a collection of research work involving the phenomenon of flip-flop metastability in digital systems. The main contributions include three novel solutions for the problem of synchronization. Two of these solutions are speculative methods that rely on duplicate state machines to pre-compute data-dependent states ahead of the completion of synchronization. Speculation is a core theme of this thesis and is investigated in terms of its functional correctness, cost efficacy and fitness for being automated by electronic design automation tools. It is shown that speculation can outperform conventional synchronization solutions in practical terms and is a viable option for future technologies. The third solution attempts to address the problem of synchronization in the more-specific context of variable supply voltages. Finally, the thesis also identifies a novel application of metastability as a means of quantifying intra-chip physical parameters. A digital sensor is proposed based on the sensitivity of metastable flip-flops to changes in their environmental parameters and is shown to have better precision while being more compact than conventional digital sensors

    Design Space Exploration and Resource Management of Multi/Many-Core Systems

    Get PDF
    The increasing demand of processing a higher number of applications and related data on computing platforms has resulted in reliance on multi-/many-core chips as they facilitate parallel processing. However, there is a desire for these platforms to be energy-efficient and reliable, and they need to perform secure computations for the interest of the whole community. This book provides perspectives on the aforementioned aspects from leading researchers in terms of state-of-the-art contributions and upcoming trends

    Low-Impact Profiling of Streaming, Heterogeneous Applications

    Get PDF
    Computer engineers are continually faced with the task of translating improvements in fabrication process technology: i.e., Moore\u27s Law) into architectures that allow computer scientists to accelerate application performance. As feature-size continues to shrink, architects of commodity processors are designing increasingly more cores on a chip. While additional cores can operate independently with some tasks: e.g. the OS and user tasks), many applications see little to no improvement from adding more processor cores alone. For many applications, heterogeneous systems offer a path toward higher performance. Significant performance and power gains have been realized by combining specialized processors: e.g., Field-Programmable Gate Arrays, Graphics Processing Units) with general purpose multi-core processors. Heterogeneous applications need to be programmed differently than traditional software. One approach, stream processing, fits these systems particularly well because of the segmented memories and explicit expression of parallelism. Unfortunately, debugging and performance tools that support streaming, heterogeneous applications do not exist. This dissertation presents TimeTrial, a performance measurement system that enables performance optimization of streaming applications by profiling the application deployed on a heterogeneous system. TimeTrial performs low-impact measurements by dedicating computing resources to monitoring and by aggressively compressing performance traces into statistical summaries guided by user specification of the performance queries of interest
    • …
    corecore