
    Towards efficient on-board deployment of DNNs on intelligent autonomous systems

    With their unprecedented performance in major AI tasks, deep neural networks (DNNs) have emerged as a primary building block in modern autonomous systems. Intelligent systems such as drones, mobile robots and driverless cars largely base their perception, planning and application-specific tasks on DNN models. Nevertheless, due to the nature of these applications, such systems require on-board local processing in order to retain their autonomy and meet latency and throughput constraints. In this respect, the large computational and memory demands of DNN workloads pose a significant barrier to their deployment on the resource- and power-constrained compute platforms that are available on-board. This paper presents an overview of recent methods and hardware architectures that address the system-level challenges of modern DNN-enabled autonomous systems at both the algorithmic and hardware design level. Spanning from latency-driven approximate computing techniques to high-throughput mixed-precision cascaded classifiers, the presented set of works paves the way for the on-board deployment of sophisticated DNN models on robots and autonomous systems.

    A throughput-latency co-optimised cascade of convolutional neural network classifiers

    Convolutional Neural Networks constitute a prominent AI model for classification tasks, serving a broad span of diverse application domains. To enable their efficient deployment in real-world tasks, the inherent redundancy of CNNs is frequently exploited to eliminate unnecessary computational costs. Driven by the fact that not all inputs require the same amount of computation to drive a confident prediction, multi-precision cascade classifiers have been recently introduced. FPGAs comprise a promising platform for the deployment of such input-dependent computation models, due to their enhanced customisation capabilities. Current literature, however, is limited to throughput-optimised cascade implementations, employing large batching at the expense of a substantial latency aggravation that prohibits their deployment in real-time scenarios. In this work, we introduce a novel methodology for throughput-latency co-optimised cascaded CNN classification, deployed on a custom FPGA architecture tailored to the target application and deployment platform, with respect to a set of user-specified requirements on accuracy and performance. Our experiments indicate that the proposed approach achieves throughput gains comparable with related state-of-the-art works, under substantially reduced latency overhead, enabling its deployment in latency-sensitive applications.
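The cascade idea described above — route each input through a cheap model first and escalate only the low-confidence samples — can be illustrated with a minimal sketch. This is a hypothetical software analogue, not the paper's FPGA implementation; the toy "models", class counts and threshold are assumptions for illustration only.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of raw scores."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]

def cascade_predict(x, cheap_model, full_model, threshold=0.9):
    """Return (label, stage): stage 0 if the cheap model was confident enough,
    stage 1 if the input fell through to the full-precision model."""
    probs = softmax(cheap_model(x))
    top = max(probs)
    if top >= threshold:                  # confident early prediction: stop here
        return probs.index(top), 0
    probs = softmax(full_model(x))        # fall back to the expensive model
    return probs.index(max(probs)), 1

# Toy stand-in "models" returning logits over 3 classes (purely hypothetical).
cheap = lambda x: [4.0, 0.1, 0.2] if x == "easy" else [1.0, 0.9, 1.1]
full = lambda x: [0.2, 0.1, 4.0]

print(cascade_predict("easy", cheap, full))   # easy input exits at stage 0
print(cascade_predict("hard", cheap, full))   # hard input reaches stage 1
```

The confidence threshold is the knob that trades accuracy against average cost: raising it sends more inputs to the expensive model.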

    Large-distance behaviour of the graviton two-point function in de Sitter spacetime

    It is known that the graviton two-point function for the de Sitter invariant "Euclidean" vacuum in a physical gauge grows logarithmically with distance in spatially-flat de Sitter spacetime. We show that this logarithmic behaviour is a gauge artifact by explicitly demonstrating that the same behaviour can be reproduced by a pure-gauge two-point function. Comment: 19 pages, no figures, misprints and minor errors corrected.

    A Note on Gradient/Fractional One-Dimensional Elasticity and Viscoelasticity

    An introductory discussion on a (weakly non-local) gradient generalization of some one-dimensional elastic and viscoelastic models, and their fractional extension, is provided. Emphasis is placed on the possible implications for micro- and nano-engineering problems, including small-scale structural mechanics and composite materials, as well as collagen biomechanics and nanomaterials.

    On the scalar sector of the covariant graviton two-point function in de Sitter spacetime

    We examine the scalar sector of the covariant graviton two-point function in de Sitter spacetime. This sector consists of the pure-trace part and another part described by a scalar field. We show that it does not contribute to two-point functions of gauge-invariant quantities. We also demonstrate that the long-distance growth present in some gauges is absent in this sector for a wide range of gauge parameters. Comment: 15 pages, no figures, LaTeX, considerably shortened.

    HAPI: Hardware-Aware Progressive Inference

    Convolutional neural networks (CNNs) have recently become the state-of-the-art in a diversity of AI tasks. Despite their popularity, CNN inference still comes at a high computational cost. A growing body of work aims to alleviate this by exploiting the difference in classification difficulty among samples and early-exiting at different stages of the network. Nevertheless, existing studies on early exiting have primarily focused on the training scheme, without considering the use-case requirements or the deployment platform. This work presents HAPI, a novel methodology for generating high-performance early-exit networks by co-optimising the placement of intermediate exits together with the early-exit strategy at inference time. Furthermore, we propose an efficient design space exploration algorithm which enables the faster traversal of a large number of alternative architectures and generates the highest-performing design, tailored to the use-case requirements and target hardware. Quantitative evaluation shows that our system consistently outperforms alternative search mechanisms and state-of-the-art early-exit schemes across various latency budgets. Moreover, it pushes further the performance of highly optimised hand-crafted early-exit CNNs, delivering up to 5.11x speedup over lightweight models on imposed latency-driven SLAs for embedded devices. Comment: Accepted at the 39th International Conference on Computer-Aided Design (ICCAD), 2020.
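The early-exit mechanism the abstract refers to can be sketched in a few lines: run the backbone stage by stage, query a classifier after each stage, and stop at the first exit whose confidence clears a threshold. This is a generic illustration of the technique, not HAPI's architecture or exit-placement algorithm; the toy stages, exit heads and threshold are assumed for the example.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of raw scores."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]

def early_exit_infer(x, stages, exits, threshold=0.8):
    """Run backbone stages in order; return (label, exit_index) from the first
    intermediate exit whose top-1 confidence clears the threshold.
    The final exit always fires."""
    feat = x
    for i, (stage, exit_head) in enumerate(zip(stages, exits)):
        feat = stage(feat)                   # next chunk of the backbone
        probs = softmax(exit_head(feat))     # classifier attached at this exit
        if max(probs) >= threshold or i == len(stages) - 1:
            return probs.index(max(probs)), i

# Toy two-stage "network" over scalar features (hypothetical, for illustration).
stages = [lambda f: f + 1, lambda f: f * 2]
exits = [lambda f: [f, 0.0], lambda f: [0.0, f]]

print(early_exit_infer(3, stages, exits))   # confident sample: exits at stage 0
print(early_exit_infer(0, stages, exits))   # ambiguous sample: runs to the end
```

Easy samples pay only for the first stage, which is where the average-latency savings come from; the co-optimisation problem HAPI addresses is where to place these exits and how to set the policy for a given latency budget.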

    Stochastic Dynamic Analysis of Cultural Heritage Towers up to Collapse

    This paper deals with the seismic vulnerability of monumental unreinforced masonry (URM) towers, the fragility of which has not yet been sufficiently studied. The present paper fills this gap by developing models to investigate the seismic response of URM towers up to collapse. On Mount Athos, Greece, there exist more than a hundred medieval towers, having served mainly as campaniles or fortifications. Eight representative towers were selected for a thorough investigation to estimate their seismic response characteristics. Their history and architectural features are initially discussed and a two-step analysis follows: (i) limit analysis is performed to estimate the collapse mechanism and the locations of critical cracks; (ii) non-linear explicit dynamic analyses are then carried out, developing finite element (FE) simulations with cracks modelled as interfacial surfaces, to derive the capacity curves. A meaningful definition of the damage states is proposed based on the characteristics of the capacity curves, with the ultimate limit state related to collapse. The onset of the slight damage state is characterised by the formation and development of the cracks responsible for the collapse mechanism of the structure. In addition to these two, two further limit states are specified: the moderate and the extensive damage state. Fragility and vulnerability curves are finally generated, which can support the assessment and preservation of cultural heritage URM towers.

    SPINN: Synergistic Progressive Inference of Neural Networks over Device and Cloud

    Despite the soaring use of convolutional neural networks (CNNs) in mobile applications, uniformly sustaining high-performance inference on mobile has been elusive due to the excessive computational demands of modern CNNs and the increasing diversity of deployed devices. A popular alternative comprises offloading CNN processing to powerful cloud-based servers. Nevertheless, by relying on the cloud to produce outputs, emerging mission-critical and high-mobility applications, such as drone obstacle avoidance or interactive applications, can suffer from the dynamic connectivity conditions and the uncertain availability of the cloud. In this paper, we propose SPINN, a distributed inference system that employs synergistic device-cloud computation together with a progressive inference method to deliver fast and robust CNN inference across diverse settings. The proposed system introduces a novel scheduler that co-optimises the early-exit policy and the CNN splitting at run time, in order to adapt to dynamic conditions and meet user-defined service-level requirements. Quantitative evaluation illustrates that SPINN outperforms its state-of-the-art collaborative inference counterparts by up to 2x in achieved throughput under varying network conditions, reduces the server cost by up to 6.8x and improves accuracy by 20.7% under latency constraints, while providing robust operation under uncertain connectivity conditions and significant energy savings compared to cloud-centric execution. Comment: Accepted at the 26th Annual International Conference on Mobile Computing and Networking (MobiCom), 2020.
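The CNN-splitting side of the problem described above can be illustrated with a deliberately simplified cost model: given per-layer latencies on the device and the cloud, plus the cost of shipping activations at each candidate split, pick the split that minimises estimated end-to-end latency. This exhaustive sketch is a hypothetical stand-in, not SPINN's run-time scheduler (which also co-optimises the early-exit policy); all profile numbers below are made up for illustration.

```python
def best_split(device_ms, cloud_ms, transfer_ms):
    """Exhaustively pick the layer split minimising estimated latency.
    device_ms[i] / cloud_ms[i]: per-layer latency on each side;
    transfer_ms[s]: cost of shipping the input of layer s to the cloud.
    Split s runs layers [0, s) on-device and [s, n) in the cloud;
    s == n keeps everything on-device (no transfer)."""
    n = len(device_ms)
    best_s, best_cost = n, float("inf")
    for s in range(n + 1):
        cost = sum(device_ms[:s]) + sum(cloud_ms[s:])
        if s < n:
            cost += transfer_ms[s]        # pay the uplink once, at the split
        if cost < best_cost:
            best_s, best_cost = s, cost
    return best_s, best_cost

# Hypothetical 4-layer profile: a slow device, a fast cloud, and activation
# sizes that shrink deeper in the network, making layer 1 the cheapest handover.
split, cost = best_split(
    device_ms=[5, 5, 20, 20],
    cloud_ms=[1, 1, 2, 2],
    transfer_ms=[50, 8, 8, 4],
)
print(split, cost)   # prints: 1 18
```

Because the transfer terms depend on current network conditions, a run-time scheduler would re-evaluate this choice as connectivity changes rather than fix the split offline.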