7 research outputs found

    Improving Performance Estimation for FPGA-based Accelerators for Convolutional Neural Networks

    Field-programmable gate array (FPGA) based accelerators are widely used to accelerate convolutional neural networks (CNNs) due to their potential for improving performance and their reconfigurability for specific application instances. To determine the optimal configuration of an FPGA-based accelerator, it is necessary to explore the design space, and accurate performance prediction plays an important role during this exploration. This work introduces a novel method for fast and accurate latency estimation based on a Gaussian process parametrised by an analytic approximation and coupled with runtime data. Experiments with three different CNNs on an FPGA-based accelerator implemented on an Intel Arria 10 GX 1150 demonstrated a 30.7% improvement in mean absolute error compared to a standard analytic method under leave-one-out cross-validation. Comment: This article is accepted for publication at ARC'202
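
    As a hedged illustration of the estimation idea described above (not the authors' implementation), the sketch below treats an analytic latency model as the prior mean of a Gaussian process and fits the GP to measured runtime data. The feature layout, the roofline-style analytic_latency helper, and all numeric values are illustrative assumptions.

```python
# Minimal sketch, assuming the abstract's setup: an analytic latency model acts
# as the prior mean of a Gaussian process, which is then refined with measured
# runtime data. analytic_latency, the features and all numbers are assumptions.
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, WhiteKernel

def analytic_latency(ops, bytes_moved, peak_gops=1366.0, bw_gbps=34.1):
    """Roofline-style analytic estimate in milliseconds (illustrative numbers)."""
    return 1e3 * max(ops / (peak_gops * 1e9), bytes_moved / (bw_gbps * 1e9))

# X: design-point features (operations, bytes moved); y: measured latency in ms.
X = np.array([[2.0e9, 5.0e7], [4.5e9, 1.2e8], [9.0e9, 2.4e8], [1.8e10, 4.8e8]])
y_measured = np.array([2.1, 4.4, 8.9, 17.5])

# Fit the GP on the residual so the analytic model serves as the prior mean.
y_analytic = np.array([analytic_latency(o, b) for o, b in X])
kernel = RBF(length_scale=1e9, length_scale_bounds=(1e6, 1e12)) + WhiteKernel()
gp = GaussianProcessRegressor(kernel=kernel, normalize_y=True)
gp.fit(X, y_measured - y_analytic)

# Predict latency (and uncertainty) for an unseen design point.
x_new = np.array([[6.0e9, 1.5e8]])
residual, std = gp.predict(x_new, return_std=True)
print(f"predicted: {analytic_latency(*x_new[0]) + residual[0]:.2f} ms "
      f"(+/- {std[0]:.2f} ms)")
```

    Modelling the residual between measurement and the analytic estimate, rather than the raw latency, means the GP only has to learn where the analytic model is wrong, which mirrors the coupling of analytic approximation and runtime data described in the abstract.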

    Visual, navigation and communication aid for visually impaired person

    Loss of vision restrains visually impaired people from performing their daily tasks, impeding their free movement and making them dependent on others. Until recently, few technologies addressed their situation; with the advent of computer vision and artificial intelligence, it has improved considerably. The proposed design implements a wearable device capable of performing a wide set of features. It provides visual assistance by recognising objects and identifying chosen faces, running a pre-trained model that classifies common objects ranging from household items to automobiles. Optical character recognition reads text from images, and Google Translate is used to convert the user's speech to text. The user can also search for a topic of interest through spoken commands. Ultrasonic sensors fixed at three positions sense obstacles during navigation, an attached display helps in communicating with deaf persons, and GPS and GSM modules aid in tracing the user. All features are driven by voice commands passed through the microphone of any earphone; visual input is received through the camera, and computation is performed on the Raspberry Pi board. The device proved effective during testing and validation.
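
    The obstacle-sensing part of such a device can be sketched as below. This is a minimal, hypothetical example assuming HC-SR04-style ultrasonic sensors wired to a Raspberry Pi; the GPIO pin numbers and the 1 m warning threshold are illustrative assumptions, not the authors' values.

```python
# Hedged sketch of the obstacle-sensing loop only, assuming three HC-SR04-style
# ultrasonic sensors on a Raspberry Pi. Pin assignments and the threshold are
# illustrative assumptions.
import time
import RPi.GPIO as GPIO

SENSORS = {"left": (17, 27), "centre": (22, 23), "right": (24, 25)}  # (trigger, echo)

GPIO.setmode(GPIO.BCM)
for trig, echo in SENSORS.values():
    GPIO.setup(trig, GPIO.OUT)
    GPIO.setup(echo, GPIO.IN)

def distance_cm(trig, echo):
    """Fire a 10 us trigger pulse and time the echo to estimate distance."""
    GPIO.output(trig, True)
    time.sleep(10e-6)
    GPIO.output(trig, False)
    start = stop = time.time()
    while GPIO.input(echo) == 0:
        start = time.time()
    while GPIO.input(echo) == 1:
        stop = time.time()
    return (stop - start) * 34300 / 2  # speed of sound ~343 m/s

try:
    while True:
        for name, (trig, echo) in SENSORS.items():
            d = distance_cm(trig, echo)
            if d < 100:  # warn the user (e.g. via text-to-speech) when close
                print(f"obstacle {d:.0f} cm to the {name}")
        time.sleep(0.2)
finally:
    GPIO.cleanup()
```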

    Optimising algorithm and hardware for deep neural networks on FPGAs

    This thesis proposes novel algorithm and hardware optimisation approaches to accelerate Deep Neural Networks (DNNs), including both Convolutional Neural Networks (CNNs) and Bayesian Neural Networks (BayesNNs). The first contribution is an adaptable and reconfigurable hardware design to accelerate CNNs. By analysing the computational patterns of different CNNs, a unified hardware architecture is proposed for both 2-dimensional and 3-dimensional CNNs. The accelerator is also designed with runtime adaptability, adopting different parallelism strategies for different convolutional layers at runtime. The second contribution is a novel neural network architecture and hardware design co-optimisation approach, which improves the performance of CNNs at both the algorithm and hardware levels. The proposed three-phase co-design framework decouples network training from design space exploration, which significantly reduces the time cost of the co-optimisation process. The third contribution is an algorithmic and hardware co-optimisation framework for accelerating BayesNNs. At the algorithmic level, three categories of structured sparsity are explored to reduce the computational complexity of BayesNNs. At the hardware level, a novel hardware architecture is proposed with the aim of exploiting the structured sparsity of BayesNNs. Both algorithmic and hardware optimisations are jointly applied to push the performance limit.
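
    To make the third contribution more concrete, the sketch below shows one way structured sparsity can appear in a BayesNN: a Monte Carlo dropout CNN in which whole channels are dropped, which is the kind of regular structure an accelerator can exploit by skipping the corresponding computation. The network shape, drop rate and sample count are assumptions, not the thesis' actual configuration.

```python
# Hedged sketch: a Bayesian CNN approximated by Monte Carlo dropout with a
# structured (channel-wise) dropout mask. Layer sizes and rates are assumptions.
import torch
import torch.nn as nn

class MCDropoutCNN(nn.Module):
    def __init__(self, num_classes=10, p=0.25):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 32, 3, padding=1), nn.ReLU(),
            nn.Dropout2d(p),                     # drops whole channels (structured)
            nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(),
            nn.Dropout2d(p),
            nn.AdaptiveAvgPool2d(1),
        )
        self.classifier = nn.Linear(64, num_classes)

    def forward(self, x):
        return self.classifier(self.features(x).flatten(1))

def mc_predict(model, x, samples=8):
    """Keep dropout active at inference and average several stochastic passes."""
    model.train()  # leaves Dropout2d enabled; acceptable here (no BatchNorm used)
    with torch.no_grad():
        probs = torch.stack([model(x).softmax(dim=-1) for _ in range(samples)])
    return probs.mean(0), probs.var(0)   # predictive mean and uncertainty

mean, var = mc_predict(MCDropoutCNN(), torch.randn(4, 3, 32, 32))
```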

    Recent Advances in Embedded Computing, Intelligence and Applications

    The latest proliferation of Internet of Things deployments and edge computing, combined with artificial intelligence, has led to new exciting application scenarios where embedded digital devices are essential enablers. Moreover, new powerful and efficient devices are appearing to cope with workloads formerly reserved for the cloud, such as deep learning. These devices allow processing close to where data are generated, avoiding bottlenecks due to communication limitations. The efficient integration of hardware, software and artificial intelligence capabilities deployed in real sensing contexts empowers the edge intelligence paradigm, which will ultimately contribute to fostering the offloading of processing functionalities to the edge. In this Special Issue, researchers have contributed nine peer-reviewed papers covering a wide range of topics in the area of edge intelligence. Among them are hardware-accelerated implementations of deep neural networks, IoT platforms for extreme edge computing, neuro-evolvable and neuromorphic machine learning, and embedded recommender systems.

    Leveraging disaggregated accelerators and non-volatile memories to improve the efficiency of modern datacenters

    Traditional data centers consist of computing nodes that have all their resources physically attached. When greater demands had to be met, the solution has been either to add more nodes (scaling out) or to increase the capacity of existing ones (scaling up). Workload requirements are traditionally fulfilled by selecting compute platforms from pools that best satisfy their average or maximum resource requirements, depending on the price the user is willing to pay. The amount of processor, memory, storage, and network bandwidth of a selected platform needs to meet or exceed the platform requirements of the workload. Beyond those explicitly required by the workload, additional resources are considered stranded resources (if not used) or bonus resources (if used). Meanwhile, workloads in all market segments have evolved significantly during the last decades. Today, workloads have a larger variety of requirements in terms of characteristics related to the computing platforms, including new technologies such as GPUs, FPGAs, NVMe, etc. These new technologies are more expensive and therefore more limited in quantity. It is no longer feasible to provision resources according to potential peak demands, as this significantly raises the total cost of ownership. Software-Defined Infrastructure (SDI), a new concept for data center architecture, is being developed to address these issues. The main SDI proposition is to disaggregate all the resources over the fabric to enable the required flexibility. Under SDI, instead of pools of computational nodes, the pools consist of individual units of resources (CPU, memory, FPGA, NVMe, GPU, etc.). When an application needs to be executed, SDI identifies its computational requirements and assembles all the resources required, creating a composite node. Resource disaggregation brings new challenges and opportunities that this thesis explores. This thesis demonstrates that resource disaggregation brings opportunities to increase the efficiency of modern data centers. It demonstrates that resource disaggregation may increase workloads' performance when sharing a single resource, thus needing fewer resources to achieve similar results. It also demonstrates how, through disaggregation, resources can be aggregated to increase a workload's performance. However, to take maximum advantage of these characteristics and this flexibility, orchestrators must be aware of them. This thesis demonstrates how workload-aware techniques applied at the resource management level improve quality of service by leveraging resource disaggregation. With resource disaggregation enabled, this thesis demonstrates a reduction of up to 49% in missed deadlines compared to a traditional scheme; this reduction can rise to 100% when workload awareness is enabled. Moreover, this thesis demonstrates that GPU partitioning and disaggregation further enhance data center flexibility, achieving the same results with half the resources: a single physical GPU that is partitioned and disaggregated achieves the same results as two GPUs that are disaggregated but not partitioned. Finally, this thesis demonstrates that resource fragmentation becomes key when there is a limited set of heterogeneous resources, namely NVMe and GPU. For the case of a heterogeneous set of resources, and specifically when some of those resources are highly demanded but limited in quantity (that is, when the demand for a resource is unexpectedly high), this thesis proposes a technique to minimize fragmentation that reduces missed deadlines by up to 86% compared to a disaggregation-aware policy.
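
    The composite-node idea can be illustrated with a small scheduling sketch: disaggregated pools of individual resources are assembled on demand, with a best-fit heuristic standing in for a fragmentation-aware policy. The resource kinds, capacities and scoring rule below are illustrative assumptions, not the thesis' actual orchestrator.

```python
# Hedged sketch of composite-node assembly over disaggregated resource pools.
# A best-fit (tightest-fit) choice is used as a simple stand-in for a
# fragmentation-aware placement policy; all names and numbers are assumptions.
from dataclasses import dataclass, field

@dataclass
class Resource:
    kind: str          # e.g. "cpu", "gpu_slice", "nvme"
    capacity: float
    free: float = field(init=False)
    def __post_init__(self):
        self.free = self.capacity

def compose_node(pools, demand):
    """Pick one unit per requested kind, tightest fit first to limit fragmentation."""
    chosen = {}
    for kind, amount in demand.items():
        candidates = [r for r in pools.get(kind, []) if r.free >= amount]
        if not candidates:
            return None                      # demand cannot be met right now
        best = min(candidates, key=lambda r: r.free - amount)  # smallest leftover
        best.free -= amount
        chosen[kind] = best
    return chosen

pools = {
    "cpu":       [Resource("cpu", 64), Resource("cpu", 64)],
    "gpu_slice": [Resource("gpu_slice", 4)],     # one partitioned physical GPU
    "nvme":      [Resource("nvme", 2000), Resource("nvme", 1000)],
}
node = compose_node(pools, {"cpu": 16, "gpu_slice": 1, "nvme": 500})
print({kind: (r.capacity, r.free) for kind, r in node.items()})
```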

    Recognition of objects to grasp and Neuro-Prosthesis control
