17 research outputs found

    Toolflows for Mapping Convolutional Neural Networks on FPGAs: A Survey and Future Directions

    Get PDF
    In the past decade, Convolutional Neural Networks (CNNs) have demonstrated state-of-the-art performance in various Artificial Intelligence tasks. To accelerate the experimentation and development of CNNs, several software frameworks have been released, primarily targeting power-hungry CPUs and GPUs. In this context, reconfigurable hardware in the form of FPGAs constitutes a potential alternative platform that can be integrated in the existing deep learning ecosystem to provide a tunable balance between performance, power consumption and programmability. In this paper, a survey of the existing CNN-to-FPGA toolflows is presented, comprising a comparative study of their key characteristics which include the supported applications, architectural choices, design space exploration methods and achieved performance. Moreover, major challenges and objectives introduced by the latest trends in CNN algorithmic research are identified and presented. Finally, a uniform evaluation methodology is proposed, aiming at the comprehensive, complete and in-depth evaluation of CNN-to-FPGA toolflows.Comment: Accepted for publication at the ACM Computing Surveys (CSUR) journal, 201

    Optimising runtime reconfigurable designs for high performance applications

    Get PDF
    This thesis proposes novel optimisations for high performance runtime reconfigurable designs. For a reconfigurable design, the proposed approach investigates idle resources introduced by static design approaches, and exploits runtime reconfiguration to eliminate the inefficient resources. The approach covers the circuit level, the function level, and the system level. At the circuit level, a method is proposed for tuning reconfigurable designs with two analytical models: a resource model for computational and memory resources and memory bandwidth, and a performance model for estimating execution time. This method is applied to tuning implementations of finite-difference algorithms, optimising arithmetic operators and memory bandwidth based on algorithmic parameters, and eliminating idle resources by runtime reconfiguration. At the function level, a method is proposed to automatically identify and exploit runtime reconfiguration opportunities while optimising resource utilisation. The method is based on Reconfiguration Data Flow Graph, a new hierarchical graph structure enabling runtime reconfigurable designs to be synthesised in three steps: function analysis, configuration organisation, and runtime solution generation. At the system level, a method is proposed for optimising reconfigurable designs by dynamically adapting the designs to available runtime resources in a reconfigurable system. This method includes two steps: compile-time optimisation and runtime scaling, which enable efficient workload distribution, asynchronous communication scheduling, and domain-specific optimisations. It can be used in developing effective servers for high performance applications.Open Acces

    A Feature-based Configurtor for CAM

    Get PDF

    Protein Design and Structure Determination at High-Precision

    Get PDF
    Due to the complementarity of the protein design and folding problems, progress on either front has consistently advanced the other. Although both problems remain major challenges, computational protein design has benefited amply from protein structure prediction methods. Likewise, the fields of structure prediction and structural biology have widely adopted techniques from the protein design field. The work I present here aims to put forward new protein design as well as structure determination strategies with the objective of achieving maximum precision. Both strategies capitalise on two posits: the first is that localising the sampling problem allows for exhaustive and finer granularity solution searching, while the second is that accelerated temporal dynamics can allow for directed and accurate exploration of energy landscapes. In the presented protein design projects, the level of precision was evaluated by comparing the coordinates from the experimental structures of the designs to their in silico models. Whereas in the structure determination projects, the precision was evaluated by how well a determined structure ensemble reproduces various experimental observables. Since all of the previous design work utilising conserved supersecondary structures has aimed at constructing repeat proteins from amplifying a single fragment, my first project aims at designing an asymmetric globular (i.e. non-repetitive) fold from two unrelated supersecondary structures. I thereby conceive an interface-driven strategy aiming at constructing a viable intramolecular interface across the participating supersecondary structures. I report the successful design of the target fold that agrees with the experimental NMR structure at atomic precision (backbone RMSD of 0.9 Å), where the designed protein was substantially more stable than its closest natural counterpart. Through the second project I aim to demonstrate the capacity of this interface-driven strategy to tackle the more difficult problem of novel fold design. The computational design of novel folds persists as a profound challenge, as in this case the association between structural and sequence features is absent a priori. This has kept most of the previous design efforts within the known fold space. I accordingly have expanded my interface-design methods, with the goal of achieving efficient sampling at maximum topological control. As a demonstration I conceive and design a novel corrugated protein architecture that does not exist in nature. The resulting NMR and X-ray structures for two different designs agree with the in silico models at atomic precision. On the third project I develop a new generalised method for mapping protein conformational populations from NMR data by unravelling the distribution of states that underlie the experimentally acquired average quantities. The CoMAND method does not only provide a quantitative mapping of the probabilities of the constituent microstates, but is also capable of extracting previously untapped structural information and solving structures de novo from a single NOESY experiment. I further present a detailed protocol that produces highly refined, dynamics-based ensembles without any recourse to heuristics or knowledge-based scoring. Finally, I validate the method’s precision by using the refined ensemble to quantitatively predict NMR observables that are orthogonal to the NOESY data

    A distributed architecture for unmanned aerial systems based on publish/subscribe messaging and simultaneous localisation and mapping (SLAM) testbed

    Get PDF
    A dissertation submitted in fulfilment for the degree of Master of Science. School of Computational and Applied Mathematics, University of the Witwatersrand, Johannesburg, South Africa, November 2017The increased capabilities and lower cost of Micro Aerial Vehicles (MAVs) unveil big opportunities for a rapidly growing number of civilian and commercial applications. Some missions require direct control using a receiver in a point-to-point connection, involving one or very few MAVs. An alternative class of mission is remotely controlled, with the control of the drone automated to a certain extent using mission planning software and autopilot systems. For most emerging missions, there is a need for more autonomous, cooperative control of MAVs, as well as more complex data processing from sensors like cameras and laser scanners. In the last decade, this has given rise to an extensive research from both academia and industry. This research direction applies robotics and computer vision concepts to Unmanned Aerial Systems (UASs). However, UASs are often designed for specific hardware and software, thus providing limited integration, interoperability and re-usability across different missions. In addition, there are numerous open issues related to UAS command, control and communication(C3), and multi-MAVs. We argue and elaborate throughout this dissertation that some of the recent standardbased publish/subscribe communication protocols can solve many of these challenges and meet the non-functional requirements of MAV robotics applications. This dissertation assesses the MQTT, DDS and TCPROS protocols in a distributed architecture of a UAS control system and Ground Control Station software. While TCPROS has been the leading robotics communication transport for ROS applications, MQTT and DDS are lightweight enough to be used for data exchange between distributed systems of aerial robots. Furthermore, MQTT and DDS are based on industry standards to foster communication interoperability of “things”. Both protocols have been extensively presented to address many of today’s needs related to networks based on the internet of things (IoT). For example, MQTT has been used to exchange data with space probes, whereas DDS was employed for aerospace defence and applications of smart cities. We designed and implemented a distributed UAS architecture based on each publish/subscribe protocol TCPROS, MQTT and DDS. The proposed communication systems were tested with a vision-based Simultaneous Localisation and Mapping (SLAM) system involving three Parrot AR Drone2 MAVs. Within the context of this study, MQTT and DDS messaging frameworks serve the purpose of abstracting UAS complexity and heterogeneity. Additionally, these protocols are expected to provide low-latency communication and scale up to meet the requirements of real-time remote sensing applications. The most important contribution of this work is the implementation of a complete distributed communication architecture for multi-MAVs. Furthermore, we assess the viability of this architecture and benchmark the performance of the protocols in relation to an autonomous quadcopter navigation testbed composed of a SLAM algorithm, an extended Kalman filter and a PID controller.XL201

    Efficient architectures and power modelling of multiresolution analysis algorithms on FPGA

    Get PDF
    In the past two decades, there has been huge amount of interest in Multiresolution Analysis Algorithms (MAAs) and their applications. Processing some of their applications such as medical imaging are computationally intensive, power hungry and requires large amount of memory which cause a high demand for efficient algorithm implementation, low power architecture and acceleration. Recently, some MAAs such as Finite Ridgelet Transform (FRIT) Haar Wavelet Transform (HWT) are became very popular and they are suitable for a number of image processing applications such as detection of line singularities and contiguous edges, edge detection (useful for compression and feature detection), medical image denoising and segmentation. Efficient hardware implementation and acceleration of these algorithms particularly when addressing large problems are becoming very chal-lenging and consume lot of power which leads to a number of issues including mobility, reliability concerns. To overcome the computation problems, Field Programmable Gate Arrays (FPGAs) are the technology of choice for accelerating computationally intensive applications due to their high performance. Addressing the power issue requires optimi- sation and awareness at all level of abstractions in the design flow. The most important achievements of the work presented in this thesis are summarised here. Two factorisation methodologies for HWT which are called HWT Factorisation Method1 and (HWTFM1) and HWT Factorasation Method2 (HWTFM2) have been explored to increase number of zeros and reduce hardware resources. In addition, two novel efficient and optimised architectures for proposed methodologies based on Distributed Arithmetic (DA) principles have been proposed. The evaluation of the architectural results have shown that the proposed architectures results have reduced the arithmetics calculation (additions/subtractions) by 33% and 25% respectively compared to direct implementa-tion of HWT and outperformed existing results in place. The proposed HWTFM2 is implemented on advanced and low power FPGA devices using Handel-C language. The FPGAs implementation results have outperformed other existing results in terms of area and maximum frequency. In addition, a novel efficient architecture for Finite Radon Trans-form (FRAT) has also been proposed. The proposed architecture is integrated with the developed HWT architecture to build an optimised architecture for FRIT. Strategies such as parallelism and pipelining have been deployed at the architectural level for efficient im-plementation on different FPGA devices. The proposed FRIT architecture performance has been evaluated and the results outperformed some other existing architecture in place. Both FRAT and FRIT architectures have been implemented on FPGAs using Handel-C language. The evaluation of both architectures have shown that the obtained results out-performed existing results in place by almost 10% in terms of frequency and area. The proposed architectures are also applied on image data (256 £ 256) and their Peak Signal to Noise Ratio (PSNR) is evaluated for quality purposes. Two architectures for cyclic convolution based on systolic array using parallelism and pipelining which can be used as the main building block for the proposed FRIT architec-ture have been proposed. The first proposed architecture is a linear systolic array with pipelining process and the second architecture is a systolic array with parallel process. The second architecture reduces the number of registers by 42% compare to first architec-ture and both architectures outperformed other existing results in place. The proposed pipelined architecture has been implemented on different FPGA devices with vector size (N) 4,8,16,32 and word-length (W=8). The implementation results have shown a signifi-cant improvement and outperformed other existing results in place. Ultimately, an in-depth evaluation of a high level power macromodelling technique for design space exploration and characterisation of custom IP cores for FPGAs, called func-tional level power modelling approach have been presented. The mathematical techniques that form the basis of the proposed power modeling has been validated by a range of custom IP cores. The proposed power modelling is scalable, platform independent and compares favorably with existing approaches. A hybrid, top-down design flow paradigm integrating functional level power modelling with commercially available design tools for systematic optimisation of IP cores has also been developed. The in-depth evaluation of this tool enables us to observe the behavior of different custom IP cores in terms of power consumption and accuracy using different design methodologies and arithmetic techniques on virous FPGA platforms. Based on the results achieved, the proposed model accuracy is almost 99% true for all IP core's Dynamic Power (DP) components.EThOS - Electronic Theses Online ServiceThomas Gerald Gray Charitable TrustGBUnited Kingdo

    System-Level Power Estimation Methodology for MPSoC based Platforms

    Get PDF
    Avec l'essor des nouvelles technologies d'intégration sur silicium submicroniques, la consommation de puissance dans les systèmes sur puce multiprocesseur (MPSoC) est devenue un facteur primordial au niveau du flot de conception. La prise en considération de ce facteur clé dès les premières phases de conception, joue un rôle primordial puisqu'elle permet d'augmenter la fiabilité des composants et de réduire le temps d'arrivée sur le marché du produit final.Shifting the design entry point up to the system-level is the most important countermeasure adopted to manage the increasing complexity of Multiprocessor System on Chip (MPSoC). The reason is that decisions taken at this level, early in the design cycle, have the greatest impact on the final design in terms of power and energy efficiency. However, taking decisions at this level is very difficult, since the design space is extremely wide and it has so far been mostly a manual activity. Efficient system-level power estimation tools are therefore necessary to enable proper Design Space Exploration (DSE) based on power/energy and timing.VALENCIENNES-Bib. électronique (596069901) / SudocSudocFranceF
    corecore