49 research outputs found

    Improving low latency applications for reconfigurable devices

    This thesis seeks to improve low latency application performance via architectural improvements in reconfigurable devices. This is achieved by improving resource utilisation and access, and by exploiting the different environments within which reconfigurable devices are deployed. Our first contribution leverages devices deployed at the network level to enable the low latency processing of financial market data feeds. Financial exchanges transmit messages via two identical data feeds to reduce the chance of message loss. We present an approach to arbitrate these redundant feeds at the network level using a Field-Programmable Gate Array (FPGA). With support for any messaging protocol, we evaluate our design using the NASDAQ TotalView-ITCH, OPRA, and ARCA data feed protocols, and provide two simultaneous outputs: one prioritising low latency, and one prioritising high reliability with three dynamically configurable windowing methods. Our second contribution is a new ring-based architecture for low latency, parallel access to FPGA memory. Traditional FPGA memory is formed by grouping block memories (BRAMs) together and accessing them as a single device. Our architecture accesses these BRAMs independently and in parallel. Targeting memory-based computing, which stores pre-computed function results in memory, we benefit low latency applications that rely on highly complex functions, iterative computation, or many parallel accesses to a shared resource. We assess square root, power, trigonometric, and hyperbolic functions within the FPGA, and provide a tool to convert Python functions to our new architecture. Our third contribution extends the ring-based architecture to support any FPGA processing element. We unify E heterogeneous processing elements within compute pools, with each element implementing the same function, and the pool serving D parallel function calls. Our implementation-agnostic approach supports processing elements with different latencies, implementations, and pipeline lengths, as well as non-deterministic latencies. Compute pools evenly balance access to processing elements across the entire application, and are evaluated by implementing eight different neural network activation functions within an FPGA.
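    A minimal sketch of the memory-based computing idea above, where pre-computed function results are stored in memory and a function call becomes a lookup, can be given in ordinary Python. The table size, nearest-entry indexing, and helper names below are illustrative assumptions, not the thesis's architecture; on the FPGA each table would be spread across the independently accessed BRAMs of the ring.

```python
import math

def build_lut(fn, lo, hi, entries):
    """Pre-compute fn at `entries` evenly spaced points in [lo, hi].

    On an FPGA each entry would occupy a BRAM word; here it is a list.
    """
    step = (hi - lo) / (entries - 1)
    return [fn(lo + i * step) for i in range(entries)], step

def lut_eval(table, step, lo, x):
    """Answer a function call with a single memory lookup (nearest entry)."""
    i = round((x - lo) / step)
    i = max(0, min(len(table) - 1, i))  # clamp to the table's domain
    return table[i]

# Example: a 1024-entry square-root table over [0, 4).
table, step = build_lut(math.sqrt, 0.0, 4.0, 1024)
print(lut_eval(table, step, 0.0, 2.0))  # ~= math.sqrt(2.0)
```

    A design like this trades memory for latency: accuracy is set by the entry count, and on a ring architecture of independent BRAMs many such lookups could proceed in parallel.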

    The Customizable Virtual FPGA: Generation, System Integration and Configuration of Application-Specific Heterogeneous FPGA Architectures

    Over the past three decades, the development of Field Programmable Gate Arrays (FPGAs) has been strongly influenced by Moore's law, process technology (scaling), and commercial markets. State-of-the-art FPGAs are moving closer to general-purpose devices on the one hand; on the other hand, as FPGAs replace more and more traditional domains of Application-Specific Integrated Circuits (ASICs), efficiency expectations rise. With the end of Dennard scaling, efficiency gains can no longer rely on technology scaling alone. These facets, together with trends towards reconfigurable Systems-on-Chip (SoCs) and new low-power applications such as cyber-physical systems and the Internet of Things, demand better adaptation of the target FPGAs. Beyond the trend towards mainstream use of FPGAs in everyday products and services, the recent moves to deploy FPGAs in data centres and cloud services in particular make it necessary to guarantee immediate portability of applications across current and future FPGA devices. In this context, hardware virtualisation can be a seamless means of achieving platform independence and portability. Admittedly, the goals of customisation and virtualisation are in conflict, since customisation aims at efficiency gains while virtualisation adds area overhead. However, virtualisation not only benefits from customisation, it also adds flexibility, since the architecture can be changed at any time. This property can be exploited in adaptive systems. Both the customisation and the virtualisation of FPGA architectures have so far barely been addressed by industry. Despite some existing academic work, these techniques can still be considered largely unexplored and are emerging research areas. The main goal of this work is the generation of FPGA architectures tailored for efficient adaptation to the application. In contrast to the usual approach with commercial FPGAs, where the FPGA architecture is taken as given and the application is mapped onto the available resources, this work follows a new paradigm in which the application or application class is fixed and the target architecture is tailored to it. This results in customised, application-specific FPGAs. The three pillars of this work are virtualisation, customisation, and the supporting framework. The central element is a largely parameterisable virtual FPGA architecture called the V-FPGA, whose primary target is to be mapped onto any commercial FPGA while applications execute on the virtual layer. This provides portability and migration down to the bitstream level, since the specification of the virtual layer remains fixed while the physical platform can be exchanged. Furthermore, this technique is used to enable dynamic and partial reconfiguration on platforms that do not support it natively. Beyond virtualisation, the V-FPGA architecture is also intended for integration as an embedded FPGA into an ASIC, offering efficient yet flexible system-on-chip solutions.
Therefore, target-technology mapping methods are addressed for both virtualisation and physical implementation, and an example of a physical implementation in a 45 nm standard-cell approach is presented. The highly flexible V-FPGA architecture can be customised with more than 20 parameters, including LUT size, clustering, 3D stacking, routing structure, and much more. The effects of these parameters on the area and performance of the architecture are investigated, and an extensive analysis of over 1400 benchmark runs shows high parameter sensitivity, with deviations of up to ±95.9% in area and ±78.1% in performance, underlining the importance of customisation for efficiency. To adapt the parameters systematically to the needs of the application, a parametric design-space exploration method based on suitable area and timing models is proposed. One challenge of customised architectures is the design effort and the need for customised tools. This work therefore includes a framework for architecture generation, design-space exploration, application mapping, and evaluation. Above all, the V-FPGA is designed as fully synthesisable, generic Very High Speed Integrated Circuit Hardware Description Language (VHDL) code, which is highly flexible and eliminates the need for external code generators. System designers can benefit from several kinds of generic SoC architecture templates to reduce development time. All design steps required for application development and mapping onto the V-FPGA are supported by a design-automation tool flow that leverages a collection of existing commercial and academic tools, adapted through suitable models and complemented by a new tool called the V-FPGA-Explorer. This new tool not only acts as the back-end tool for application mapping onto the V-FPGA, but is also a graphical configuration and layout editor, a bitstream generator, an architecture-file generator for the place & route tools, a script generator, and a testbench generator. A special feature is support for just-in-time compilation, with fast algorithms for in-system application mapping. The work concludes with several use cases from the fields of industrial process automation, medical imaging, adaptive systems, and teaching in which the V-FPGA is employed.
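    The parametric design-space exploration described above, which weighs area against timing across architecture parameters, can be sketched as a small exhaustive search. This is a minimal illustration with invented toy cost models and a reduced parameter set (LUT size, cluster size, channel width); the thesis's actual area and timing models are not reproduced here.

```python
from itertools import product

# Hypothetical architecture parameters (the real V-FPGA exposes 20+).
LUT_SIZES = [3, 4, 5, 6]
CLUSTER_SIZES = [1, 2, 4, 8]
CHANNEL_WIDTHS = [16, 32, 64]

def area_model(k, n, w):
    """Toy area model: LUT area grows ~2^k per LUT, plus routing ~ n * w."""
    return n * (2 ** k) + 0.1 * n * w

def delay_model(k, n, w):
    """Toy delay model: larger LUTs cut logic depth; narrow channels add detours."""
    return 10.0 / k + 5.0 / n + 50.0 / w

# Exhaustive search minimising the area-delay product.
best = min(
    product(LUT_SIZES, CLUSTER_SIZES, CHANNEL_WIDTHS),
    key=lambda p: area_model(*p) * delay_model(*p),
)
print("best (LUT size, cluster size, channel width):", best)
```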

    On microelectronic self-learning cognitive chip systems

    After a brief review of machine learning techniques and applications, this Ph.D. thesis examines several approaches for implementing machine learning architectures and algorithms in hardware within our laboratory. This interdisciplinary background motivates novel approaches that we intend to pursue, with the objective of innovative hardware implementations of dynamically self-reconfigurable logic for enhanced self-adaptive, self-(re)organising and eventually self-assembling machine learning systems, while developing this new area of research. After reviewing relevant background on robotic control methods and the most recent advanced cognitive controllers, this thesis suggests that, amongst the many well-known ways of designing operational technologies, the design methodologies of leading-edge high-tech devices such as cognitive chips, which may well lead to intelligent machines exhibiting conscious phenomena, should crucially be restricted to extremely well-defined constraints. Roboticists also need these constraints as specifications, to help decide upfront on otherwise infinitely free hardware/software design details. In addition, and most importantly, we propose these specifications as methodological guidelines tightly related to ethics and to the now well-identified workings of the human body and of its psyche.

    Materializing interaction

    Thesis (Ph.D.)--Massachusetts Institute of Technology, School of Architecture and Planning, Program in Media Arts and Sciences, February 2013. Cataloged from the PDF version of the thesis. "September 2012." Includes bibliographical references (p. 141-148). At the boundary between people, objects and spaces, we encounter a broad range of surfaces. Their properties perform functional roles such as permeability, comfort or illumination, while conveying information such as an object's affordances, composition, or history of use. However, today's surfaces are static and can neither adapt to our changing needs, nor communicate dynamic information and sense user input. As technology advances and we progress towards a world imbued with programmable materials, how will designers create physical surfaces that are adaptive and can take full advantage of our sensory apparatus? This dissertation looks at this question through the lens of a three-tier methodology consisting of the development of programmable composites; their application in design and architecture; and contextualization through a broader material and surface taxonomy. The focus is placed primarily on how materials and their aggregate surface properties can be used to engage our senses. A series of design probes and four final implementations are presented, each addressing specific programmable material and surface properties. Surflex, Sprout I/O, and Shutters are continuous surfaces which can change shape to modify their topology, texture and permeability, and Six-Forty by Four-Eighty is a light-emitting display surface composed of autonomous and reconfigurable physical pixels. The technical and conceptual objectives of these designs are evaluated through exhibitions in a variety of public spaces, such as museums, galleries, and fairs, as well as art and design festivals. This dissertation seeks to provide contributions on multiple levels, including: the development of techniques for the creation and control of programmable surfaces; the definition of a vocabulary and taxonomy to describe and compare previous work in this area; and finally, uncovering design principles for the underlying development of future programmable surface aesthetics. By Marcelo Coelho, Ph.D.

    Online learning on the programmable dataplane

    This thesis makes the case for managing computer networks with data-driven methods (automated statistical inference and control based on measurement data and runtime observations) and argues for their tight integration with programmable dataplane hardware to make management decisions faster and from more precise data. Optimisation, defence, and measurement of networked infrastructure are each challenging tasks in their own right, currently dominated by hand-crafted heuristic methods. These become harder to reason about and deploy as networks scale in rates and number of forwarding elements, and their design requires expert knowledge and care around unexpected protocol interactions. This makes tailored, per-deployment or per-workload solutions infeasible to develop. Recent advances in machine learning offer capable function approximation and closed-loop control which suit many of these tasks. New, programmable dataplane hardware enables more agility in the network: runtime reprogrammability, precise traffic measurement, and low latency on-path processing. The synthesis of these two developments allows complex decisions to be made on previously unusable state, and made quicker by offloading inference to the network. To justify this argument, I advance the state of the art in data-driven defence of networks, novel dataplane-friendly online reinforcement learning algorithms, and in-network data reduction to allow classification of switch-scale data. Each requires co-design aware of the network, and of the failure modes of systems and carried traffic. To make online learning possible in the dataplane, I use fixed-point arithmetic and modify classical (non-neural) approaches to take advantage of the SmartNIC compute model and make use of rich device-local state. I show that data-driven solutions still require great care to design correctly, but with the right domain expertise they can improve on pathological cases in DDoS defence, such as protecting legitimate UDP traffic. In-network aggregation into histograms is shown to enable accurate classification from fine temporal effects, and allows hosts to scale such classification to far larger flow counts and traffic volumes. Moving reinforcement learning to the dataplane is shown to offer substantial benefits in state-action latency and online learning throughput versus host machines, allowing policies to react faster to fine-grained network events. The dataplane environment is key in making reactive online learning feasible; to port further algorithms and learnt functions, I collate and analyse the strengths of current and future hardware designs, as well as individual algorithms.
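    As a rough illustration of the fixed-point, non-neural online learning argued for above, the sketch below implements a Q-learning-style update in integer arithmetic, of the kind a dataplane target without floating point would need. The 16.16 format, constants, and table layout are assumptions for illustration, not the thesis's actual algorithms.

```python
# Q-learning update in 16.16 fixed-point arithmetic, as a SmartNIC-style
# target without an FPU might implement it. Values are plain Python ints.
SHIFT = 16                      # 16 fractional bits
ONE = 1 << SHIFT

def to_fx(x: float) -> int:
    """Convert a float to 16.16 fixed point."""
    return int(round(x * ONE))

def fx_mul(a: int, b: int) -> int:
    """Multiply two 16.16 values, keeping the result in 16.16."""
    return (a * b) >> SHIFT

ALPHA = to_fx(0.1)              # learning rate (assumed)
GAMMA = to_fx(0.9)              # discount factor (assumed)

# Q-table: states x actions, stored as fixed-point ints (register arrays
# or match-action table entries on real hardware).
q = [[0] * 4 for _ in range(8)]

def q_update(s: int, a: int, reward_fx: int, s_next: int) -> None:
    """One online step: Q[s][a] += alpha * (r + gamma * max_a' Q[s'][a'] - Q[s][a])."""
    target = reward_fx + fx_mul(GAMMA, max(q[s_next]))
    q[s][a] += fx_mul(ALPHA, target - q[s][a])

q_update(s=0, a=1, reward_fx=to_fx(1.0), s_next=2)
print(q[0][1] / ONE)  # ~0.1 after one step
```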

    Assessment of monthly rain fade in the equatorial region at C & Ku-band using MEASAT-3 satellite links

    C- and Ku-band links are the most commonly used for equatorial satellite communication. Severe rainfall rates in equatorial regions can cause rain attenuation far larger in reality than predicted. The ITU-R P.618 recommendation is commonly used to predict satellite rain fade when designing satellite communication networks. However, the ITU-R prediction is still found to be inaccurate, which hinders reliable operational satellite communication links in the equatorial region. This paper aims to provide accurate insight through an assessment of monthly C- and Ku-band rain fade performance, using data collected from commercial earth stations whose C-band and Ku-band antennas have diameters of 11 m and 13 m respectively. The antennas measure the C- and Ku-band beacon signals from MEASAT-3 under equatorial rain conditions. The data were collected over one year, in 2015, and monthly cumulative distribution functions were developed from them. An RMSE analysis compares the monthly measured C-band and Ku-band data against predictions based on the ITU-R P.618, P.837, P.838 and P.839 recommendations. The findings show an average RMSE value of 25 for Ku-band and an average RMSE value of 2 for C-band rain attenuation. The ITU-R model therefore still under-predicts rain attenuation in the equatorial region, which calls for the fundamental quantities used in determining rain fade to be re-evaluated.
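    The RMSE comparison used above can be sketched in a few lines of Python; the monthly attenuation figures below are placeholders, not the paper's MEASAT-3 measurements.

```python
import math

# Hypothetical monthly rain attenuation values (dB); placeholders only,
# not the paper's measured or ITU-R-predicted data.
measured_db  = [12.1, 14.3, 18.0, 22.5, 19.8, 9.4]
predicted_db = [10.0, 11.2, 14.1, 16.0, 15.3, 8.1]  # e.g. ITU-R P.618 output

def rmse(measured, predicted):
    """Root-mean-square error between measured and predicted attenuation."""
    return math.sqrt(
        sum((m - p) ** 2 for m, p in zip(measured, predicted)) / len(measured)
    )

print(f"RMSE = {rmse(measured_db, predicted_db):.2f} dB")
```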

    WEHST: Wearable Engine for Human-Mediated Telepresence

    This dissertation reports on the industrial design of a wearable computational device created to enable better emergency medical intervention in situations where electronic remote assistance is necessary. The design created for this doctoral project, which assists paramedics with mandates for search-and-rescue (SAR) in hazardous environments, contributes to the field of human-mediated teleparamedicine (HMTPM). The ethnographic and industrial design aspects of this research considered the intricate relationships at play in search-and-rescue operations, which led to the design of the system created for this project, known as WEHST: Wearable Engine for Human-Mediated Telepresence. Three case studies of different teams were carried out, each focusing on improving the practices of teams of paramedics and search-and-rescue technicians who use combinations of ambulance, airplane, and helicopter transport in specific chemical, biological, radiological, nuclear and explosive (CBRNE) scenarios. The three paramedicine groups included are the Canadian Air Force 442 Rescue Squadron, Nelson Search and Rescue, and the British Columbia Ambulance Service Infant Transport Team. Data was gathered over a seven-year period through a variety of methods including observation, interviews, examination of documents, and industrial design. The data collected included physiological, social, technical, and ecological information about the rescuers. Actor-network theory guided the research design, data analysis, and design synthesis. All of this led to the creation of the WEHST system. The WEHST design created in this dissertation project addresses the difficulty case-study participants found in using their radios in hazardous settings. As the research identified, a means of controlling these radios without depending on hands, voice, or speech would greatly improve communication, as would wearing sensors and other computing resources that better link operators, radios, and environments. WEHST responds to this need. WEHST is an instance of industrial design for a wearable "engine" for human-situated telepresence that includes eight interoperable families of wearable electronic modules and accompanying textiles. These make up a platform technology for modular, scalable and adaptable toolsets for field practice, pedagogy, or research. This document details the considerations that went into the creation of the WEHST design.

    Playful User Interfaces: Interfaces that Invite Social and Physical Interaction
