222 research outputs found

    Towards efficient on-board deployment of DNNs on intelligent autonomous systems

    Get PDF
    With their unprecedented performance in major AI tasks, deep neural networks (DNNs) have emerged as a primary building block in modern autonomous systems. Intelligent systems such as drones, mobile robots and driverless cars largely base their perception, planning and application-specific tasks on DNN models. Nevertheless, due to the nature of these applications, such systems require on-board local processing in order to retain their autonomy and meet latency and throughput constraints. In this respect, the large computational and memory demands of DNN workloads pose a significant barrier on their deployment on the resource-and power-constrained compute platforms that are available on-board. This paper presents an overview of recent methods and hardware architectures that address the system-level challenges of modern DNN-enabled autonomous systems at both the algorithmic and hardware design level. Spanning from latency-driven approximate computing techniques to high-throughput mixed-precision cascaded classifiers, the presented set of works paves the way for the on-board deployment of sophisticated DNN models on robots and autonomous systems

    A throughput-latency co-optimised cascade of convolutional neural network classifiers

    Get PDF
    Convolutional Neural Networks constitute a promi-nent AI model for classification tasks, serving a broad span ofdiverse application domains. To enable their efficient deploymentin real-world tasks, the inherent redundancy of CNNs is fre-quently exploited to eliminate unnecessary computational costs.Driven by the fact that not all inputs require the same amount ofcomputation to drive a confident prediction, multi-precision cas-cade classifiers have been recently introduced. FPGAs comprise apromising platform for the deployment of such input-dependentcomputation models, due to their enhanced customisation ca-pabilities. Current literature, however, is limited to throughput-optimised cascade implementations, employing large batching atthe expense of a substantial latency aggravation prohibiting theirdeployment on real-time scenarios. In this work, we introduce anovel methodology for throughput-latency co-optimised cascadedCNN classification, deployed on a custom FPGA architecturetailored to the target application and deployment platform,with respect to a set of user-specified requirements on accuracyand performance. Our experiments indicate that the proposedapproach achieves comparable throughput gains with relatedstate-of-the-art works, under substantially reduced overhead inlatency, enabling its deployment on latency-sensitive applications

    Large-distance behaviour of the graviton two-point function in de Sitter spacetime

    Full text link
    It is known that the graviton two-point function for the de Sitter invariant "Euclidean" vacuum in a physical gauge grows logarithmically with distance in spatially-flat de Sitter spacetime. We show that this logarithmic behaviour is a gauge artifact by explicitly demonstrating that the same behaviour can be reproduced by a pure-gauge two-point function.Comment: 19 pages, no figures, misprints and minor errors correcte

    A Note on Gradient/Fractional One-Dimensional Elasticity and Viscoelasticity

    Get PDF
    An introductory discussion on a (weakly non-local) gradient generalization of some one-dimensional elastic and viscoelastic models, and their fractional extension is provided. Emphasis is placed on the possible implications of micro-and nano-engineering problems, including small-scale structural mechanics and composite materials, as well as collagen biomechanics and nanomaterials

    On the scalar sector of the covariant graviton two-point function in de Sitter spacetime

    Get PDF
    We examine the scalar sector of the covariant graviton two-point function in de Sitter spacetime. This sector consists of the pure-trace part and another part described by a scalar field. We show that it does not contribute to two-point functions of gauge-invariant quantities. We also demonstrate that the long-distance growth present in some gauges is absent in this sector for a wide range of gauge parameters.Comment: 15 pages, no figures, LaTeX, considerably shortene

    HAPI: Hardware-Aware Progressive Inference

    Full text link
    Convolutional neural networks (CNNs) have recently become the state-of-the-art in a diversity of AI tasks. Despite their popularity, CNN inference still comes at a high computational cost. A growing body of work aims to alleviate this by exploiting the difference in the classification difficulty among samples and early-exiting at different stages of the network. Nevertheless, existing studies on early exiting have primarily focused on the training scheme, without considering the use-case requirements or the deployment platform. This work presents HAPI, a novel methodology for generating high-performance early-exit networks by co-optimising the placement of intermediate exits together with the early-exit strategy at inference time. Furthermore, we propose an efficient design space exploration algorithm which enables the faster traversal of a large number of alternative architectures and generates the highest-performing design, tailored to the use-case requirements and target hardware. Quantitative evaluation shows that our system consistently outperforms alternative search mechanisms and state-of-the-art early-exit schemes across various latency budgets. Moreover, it pushes further the performance of highly optimised hand-crafted early-exit CNNs, delivering up to 5.11x speedup over lightweight models on imposed latency-driven SLAs for embedded devices.Comment: Accepted at the 39th International Conference on Computer-Aided Design (ICCAD), 202

    SPINN: Synergistic Progressive Inference of Neural Networks over Device and Cloud

    Full text link
    Despite the soaring use of convolutional neural networks (CNNs) in mobile applications, uniformly sustaining high-performance inference on mobile has been elusive due to the excessive computational demands of modern CNNs and the increasing diversity of deployed devices. A popular alternative comprises offloading CNN processing to powerful cloud-based servers. Nevertheless, by relying on the cloud to produce outputs, emerging mission-critical and high-mobility applications, such as drone obstacle avoidance or interactive applications, can suffer from the dynamic connectivity conditions and the uncertain availability of the cloud. In this paper, we propose SPINN, a distributed inference system that employs synergistic device-cloud computation together with a progressive inference method to deliver fast and robust CNN inference across diverse settings. The proposed system introduces a novel scheduler that co-optimises the early-exit policy and the CNN splitting at run time, in order to adapt to dynamic conditions and meet user-defined service-level requirements. Quantitative evaluation illustrates that SPINN outperforms its state-of-the-art collaborative inference counterparts by up to 2x in achieved throughput under varying network conditions, reduces the server cost by up to 6.8x and improves accuracy by 20.7% under latency constraints, while providing robust operation under uncertain connectivity conditions and significant energy savings compared to cloud-centric execution.Comment: Accepted at the 26th Annual International Conference on Mobile Computing and Networking (MobiCom), 202
    corecore