19 research outputs found

    An FPGA implementation of an investigative many-core processor, Fynbos : in support of a Fortran autoparallelising software pipeline

    Get PDF
    Includes bibliographical references.In light of the power, memory, ILP, and utilisation walls facing the computing industry, this work examines the hypothetical many-core approach to finding greater compute performance and efficiency. In order to achieve greater efficiency in an environment in which Moore’s law continues but TDP has been capped, a means of deriving performance from dark and dim silicon is needed. The many-core hypothesis is one approach to exploiting these available transistors efficiently. As understood in this work, it involves trading in hardware control complexity for hundreds to thousands of parallel simple processing elements, and operating at a clock speed sufficiently low as to allow the efficiency gains of near threshold voltage operation. Performance is there- fore dependant on exploiting a new degree of fine-grained parallelism such as is currently only found in GPGPUs, but in a manner that is not as restrictive in application domain range. While removing the complex control hardware of traditional CPUs provides space for more arithmetic hardware, a basic level of control is still required. For a number of reasons this work chooses to replace this control largely with static scheduling. This pushes the burden of control primarily to the software and specifically the compiler, rather not to the programmer or to an application specific means of control simplification. An existing legacy tool chain capable of autoparallelising sequential Fortran code to the degree of parallelism necessary for many-core exists. This work implements a many-core architecture to match it. Prototyping the design on an FPGA, it is possible to examine the real world performance of the compiler-architecture system to a greater degree than simulation only would allow. Comparing theoretical peak performance and real performance in a case study application, the system is found to be more efficient than any other reviewed, but to also significantly under perform relative to current competing architectures. This failing is apportioned to taking the need for simple hardware too far, and an inability to implement static scheduling mitigating tactics due to lack of support for such in the compiler

    Strategies towards high performance (high-resolution/linearity) time-to-digital converters on field-programmable gate arrays

    Get PDF
    Time-correlated single-photon counting (TCSPC) technology has become popular in scientific research and industrial applications, such as high-energy physics, bio-sensing, non-invasion health monitoring, and 3D imaging. Because of the increasing demand for high-precision time measurements, time-to-digital converters (TDCs) have attracted attention since the 1970s. As a fully digital solution, TDCs are portable and have great potential for multichannel applications compared to bulky and expensive time-to-amplitude converters (TACs). A TDC can be implemented in ASIC and FPGA devices. Due to the low cost, flexibility, and short development cycle, FPGA-TDCs have become promising. Starting with a literature review, three original FPGA-TDCs with outstanding performance are introduced. The first design is the first efficient wave union (WU) based TDC implemented in Xilinx UltraScale (20 nm) FPGAs with a bubble-free sub-TDL structure. Combining with other existing methods, the resolution is further enhanced to 1.23 ps. The second TDC has been designed for LiDAR applications, especially in driver-less vehicles. Using the proposed new calibration method, the resolution is adjustable (50, 80, and 100 ps), and the linearity is exceptionally high (INL pk-pk and INL pk-pk are lower than 0.05 LSB). Meanwhile, a software tool has been open-sourced with a graphic user interface (GUI) to predict TDCs’ performance. In the third TDC, an onboard automatic calibration (AC) function has been realized by exploiting Xilinx ZYNQ SoC architectures. The test results show the robustness of the proposed method. Without the manual calibration, the AC function enables FPGA-TDCs to be applied in commercial products where mass production is required.Time-correlated single-photon counting (TCSPC) technology has become popular in scientific research and industrial applications, such as high-energy physics, bio-sensing, non-invasion health monitoring, and 3D imaging. Because of the increasing demand for high-precision time measurements, time-to-digital converters (TDCs) have attracted attention since the 1970s. As a fully digital solution, TDCs are portable and have great potential for multichannel applications compared to bulky and expensive time-to-amplitude converters (TACs). A TDC can be implemented in ASIC and FPGA devices. Due to the low cost, flexibility, and short development cycle, FPGA-TDCs have become promising. Starting with a literature review, three original FPGA-TDCs with outstanding performance are introduced. The first design is the first efficient wave union (WU) based TDC implemented in Xilinx UltraScale (20 nm) FPGAs with a bubble-free sub-TDL structure. Combining with other existing methods, the resolution is further enhanced to 1.23 ps. The second TDC has been designed for LiDAR applications, especially in driver-less vehicles. Using the proposed new calibration method, the resolution is adjustable (50, 80, and 100 ps), and the linearity is exceptionally high (INL pk-pk and INL pk-pk are lower than 0.05 LSB). Meanwhile, a software tool has been open-sourced with a graphic user interface (GUI) to predict TDCs’ performance. In the third TDC, an onboard automatic calibration (AC) function has been realized by exploiting Xilinx ZYNQ SoC architectures. The test results show the robustness of the proposed method. Without the manual calibration, the AC function enables FPGA-TDCs to be applied in commercial products where mass production is required

    Distributed services across the network from edge to core

    Get PDF
    The current internet architecture is evolving from a simple carrier of bits to a platform able to provide multiple complex services running across the entire Network Service Provider (NSP) infrastructure. This calls for increased flexibility in resource management and allocation to provide dedicated, on-demand network services, leveraging a distributed infrastructure consisting of heterogeneous devices. More specifically, NSPs rely on a plethora of low-cost Customer Premise Equipment (CPE), as well as more powerful appliances at the edge of the network and in dedicated data-centers. Currently a great research effort is spent to provide this flexibility through Fog computing, Network Functions Virtualization (NFV), and data plane programmability. Fog computing or Edge computing extends the compute and storage capabilities to the edge of the network, closer to the rapidly growing number of connected devices and applications that consume cloud services and generate massive amounts of data. A complementary technology is NFV, a network architecture concept targeting the execution of software Network Functions (NFs) in isolated Virtual Machines (VMs), potentially sharing a pool of general-purpose hosts, rather than running on dedicated hardware (i.e., appliances). Such a solution enables virtual network appliances (i.e., VMs executing network functions) to be provisioned, allocated a different amount of resources, and possibly moved across data centers in little time, which is key in ensuring that the network can keep up with the flexibility in the provisioning and deployment of virtual hosts in today’s virtualized data centers. Moreover, recent advances in networking hardware have introduced new programmable network devices that can efficiently execute complex operations at line rate. As a result, NFs can be (partially or entirely) folded into the network, speeding up the execution of distributed services. The work described in this Ph.D. thesis aims at showing how various network services can be deployed throughout the NSP infrastructure, accommodating to the different hardware capabilities of various appliances, by applying and extending the above-mentioned solutions. First, we consider a data center environment and the deployment of (virtualized) NFs. In this scenario, we introduce a novel methodology for the modelization of different NFs aimed at estimating their performance on different execution platforms. Moreover, we propose to extend the traditional NFV deployment outside of the data center to leverage the entire NSP infrastructure. This can be achieved by integrating native NFs, commonly available in low-cost CPEs, with an existing NFV framework. This facilitates the provision of services that require NFs close to the end user (e.g., IPsec terminator). On the other hand, resource-hungry virtualized NFs are run in the NSP data center, where they can take advantage of the superior computing and storage capabilities. As an application, we also present a novel technique to deploy a distributed service, specifically a web filter, to leverage both the low latency of a CPE and the computational power of a data center. We then show that also the core network, today dedicated solely to packet routing, can be exploited to provide useful services. In particular, we propose a novel method to provide distributed network services in core network devices by means of task distribution and a seamless coordination among the peers involved. The aim is to transform existing network nodes (e.g., routers, switches, access points) into a highly distributed data acquisition and processing platform, which will significantly reduce the storage requirements at the Network Operations Center and the packet duplication overhead. Finally, we propose to use new programmable network devices in data center networks to provide much needed services to distributed applications. By offloading part of the computation directly to the networking hardware, we show that it is possible to reduce both the network traffic and the overall job completion time

    Split and Shift Methodology: Overcoming Hardware Limitations on Cellular Processor Arrays for Image Processing

    Get PDF
    Na era multimedia, o procesado de imaxe converteuse nun elemento de singular importancia nos dispositivos electrónicos. Dende as comunicacións (p.e. telemedicina), a seguranza (p.e. recoñecemento retiniano) ou control de calidade e de procesos industriais (p.e. orientación de brazos articulados, detección de defectos do produto), pasando pola investigación (p.e. seguimento de partículas elementais) e diagnose médica (p.e. detección de células estrañas, identificaciónn de veas retinianas), hai un sinfín de aplicacións onde o tratamento e interpretación automáticas de imaxe e fundamental. O obxectivo último será o deseño de sistemas de visión con capacidade de decisión. As tendencias actuais requiren, ademais, a combinación destas capacidades en dispositivos pequenos e portátiles con resposta en tempo real. Isto propón novos desafíos tanto no deseño hardware como software para o procesado de imaxe, buscando novas estruturas ou arquitecturas coa menor area e consumo de enerxía posibles sen comprometer a funcionalidade e o rendemento

    Late-bound code generation

    Get PDF
    Each time a function or method is invoked during the execution of a program, a stream of instructions is issued to some underlying hardware platform. But exactly what underlying hardware, and which instructions, is usually left implicit. However in certain situations it becomes important to control these decisions. For example, particular problems can only be solved in real-time when scheduled on specialised accelerators, such as graphics coprocessors or computing clusters. We introduce a novel operator for hygienically reifying the behaviour of a runtime function instance as a syntactic fragment, in a language which may in general differ from the source function definition. Translation and optimisation are performed by recursively invoked, dynamically dispatched code generators. Side-effecting operations are permitted, and their ordering is preserved. We compare our operator with other techniques for pragmatic control, observing that: the use of our operator supports lifting arbitrary mutable objects, and neither requires rewriting sections of the source program in a multi-level language, nor interferes with the interface to individual software components. Due to its lack of interference at the abstraction level at which software is composed, we believe that our approach poses a significantly lower barrier to practical adoption than current methods. The practical efficacy of our operator is demonstrated by using it to offload the user interface rendering of a smartphone application to an FPGA coprocessor, including both statically and procedurally defined user interface components. The generated pipeline is an application-specific, statically scheduled processor-per-primitive rendering pipeline, suitable for place-and-route style optimisation. To demonstrate the compatibility of our operator with existing languages, we show how it may be defined within the Python programming language. We introduce a transformation for weakening mutable to immutable named bindings, termed let-weakening, to solve the problem of propagating information pertaining to named variables between modular code generating units.Open Acces

    Resilience of an embedded architecture using hardware redundancy

    Get PDF
    In the last decade the dominance of the general computing systems market has being replaced by embedded systems with billions of units manufactured every year. Embedded systems appear in contexts where continuous operation is of utmost importance and failure can be profound. Nowadays, radiation poses a serious threat to the reliable operation of safety-critical systems. Fault avoidance techniques, such as radiation hardening, have been commonly used in space applications. However, these components are expensive, lag behind commercial components with regards to performance and do not provide 100% fault elimination. Without fault tolerant mechanisms, many of these faults can become errors at the application or system level, which in turn, can result in catastrophic failures. In this work we study the concepts of fault tolerance and dependability and extend these concepts providing our own definition of resilience. We analyse the physics of radiation-induced faults, the damage mechanisms of particles and the process that leads to computing failures. We provide extensive taxonomies of 1) existing fault tolerant techniques and of 2) the effects of radiation in state-of-the-art electronics, analysing and comparing their characteristics. We propose a detailed model of faults and provide a classification of the different types of faults at various levels. We introduce an algorithm of fault tolerance and define the system states and actions necessary to implement it. We introduce novel hardware and system software techniques that provide a more efficient combination of reliability, performance and power consumption than existing techniques. We propose a new element of the system called syndrome that is the core of a resilient architecture whose software and hardware can adapt to reliable and unreliable environments. We implement a software simulator and disassembler and introduce a testing framework in combination with ERA’s assembler and commercial hardware simulators
    corecore