47 research outputs found

    Vertical Optimizations of Convolutional Neural Networks for Embedded Systems

    The abstract is in the attachment.

    Dynamic ConvNets on Tiny Devices via Nested Sparsity

    This work introduces a new training and compression pipeline to build nested sparse convolutional neural networks (ConvNets), a class of dynamic ConvNets suited for inference tasks deployed on resource-constrained devices at the edge of the Internet of Things. A nested sparse ConvNet consists of a single ConvNet architecture containing N sparse subnetworks with nested weight subsets, like a Matryoshka doll, and can trade accuracy for latency at runtime, using the model sparsity as a dynamic knob. To attain high accuracy at training time, we propose a gradient masking technique that optimally routes the learning signals across the nested weight subsets. To minimize the storage footprint and efficiently process the obtained models at inference time, we introduce a new sparse matrix compression format with dedicated compute kernels that exploit the characteristics of the nested weight subsets. Tested on image classification and object detection tasks on an off-the-shelf ARM-M7 microcontroller unit (MCU), nested sparse ConvNets outperform variable-latency solutions naively built by assembling single sparse models trained as stand-alone instances, achieving 1) comparable accuracy; 2) remarkable storage savings; and 3) high performance. Moreover, when compared to state-of-the-art dynamic strategies, such as dynamic pruning and layer width scaling, nested sparse ConvNets turn out to be Pareto optimal in the accuracy versus latency space.
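The nesting property described in this abstract can be illustrated with a minimal sketch. It is not the authors' pipeline: the abstract does not say how the subsets are chosen, so magnitude ranking is assumed here purely for illustration, and the names `nested_masks` and `apply_mask` are hypothetical.

```python
def nested_masks(weights, sparsities):
    """Build one keep-mask per sparsity level so that sparser masks are
    subsets of denser ones (assumption: weights ranked by magnitude)."""
    # Rank weight indices once, from largest to smallest magnitude.
    order = sorted(range(len(weights)), key=lambda i: -abs(weights[i]))
    masks = []
    for s in sorted(sparsities):
        keep = round((1.0 - s) * len(weights))
        # Every level keeps a prefix of the same ranking, hence the nesting.
        kept = set(order[:keep])
        masks.append([i in kept for i in range(len(weights))])
    return masks


def apply_mask(weights, mask):
    """Zero out the weights that the chosen subnetwork does not use."""
    return [w if m else 0.0 for w, m in zip(weights, mask)]


weights = [0.9, -0.1, 0.4, -0.7, 0.05, 0.3, -0.6, 0.2]
masks = nested_masks(weights, [0.25, 0.5, 0.75])

# Runtime "knob": pick a sparser mask for lower latency.
sparsest = apply_mask(weights, masks[2])
print(sparsest)
```

Because every level keeps a prefix of one shared ranking, a single stored weight array can serve all subnetworks, which is the Matryoshka-like property the abstract refers to.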

    Design and Characterization of a Stand-Alone Merging Unit

    Merging Units (MUs) play a key role in enhancing the security and reliability of power systems, allowing for advanced remote diagnostics. Some of the benefits are a more efficient transmission of electricity and a better integration with renewable energy systems. In this article, an implementation of a Stand-Alone Merging Unit (SAMU), compliant with the IEC 61850-9-2 standard and based on a low-cost ARM microcontroller, is described. It acquires two signals, one voltage and one current, and sends the samples over an Ethernet connection. A high-resolution Analogue-to-Digital Converter (ADC), synchronised to Coordinated Universal Time (UTC) through a Global Positioning System (GPS) disciplined oscillator, is used. A suitable insulation and conditioning stage has been designed. Several tests, varying the amplitude, frequency, and phase of the input signals, have been performed to evaluate the metrological performance of the proposed SAMU, and their results are discussed here.

    Next-generation HPC models for future Rotorcraft applications

    Rotorcraft technologies pose great scientific and industrial challenges for numerical computing. As available computational resources approach the exascale, finer scales and therefore more accurate simulations of engineering test cases become accessible. However, shifting legacy workflows and optimizing parallel efficiency and scalability of existing software on new hardware is often demanding. This paper reports preliminary results in CFD and structural dynamics simulations using the T106A Low Pressure Turbine (LPT) blade geometry on Leonardo S.p.A.'s davinci-1 high-performance computing (HPC) facility. Time to solution and scalability are assessed for commercial packages Ansys Fluent, STAR-CCM+, and ABAQUS, and the open-source scientific computing framework PyFR. In direct numerical simulations of compressible fluid flow, normalized time to solution values obtained using PyFR are found to be up to 8 times smaller than those obtained using Fluent and STAR-CCM+. The findings extend to the incompressible case. All models offer weak and strong scaling in tests performed on up to 48 compute nodes, each with 4 Nvidia A100 GPUs. In linear elasticity simulations with ABAQUS, both the iterative solver and the direct solver provide speedup in preliminary scaling tests, with the iterative solver outperforming the direct solver in terms of time-to-solution and memory usage. The results provide a first indication of the potential of HPC architectures in scaling engineering applications towards certification by simulation, and the first step for the Company towards the use of cutting-edge HPC toolkits in the field of Rotorcraft technologies.

    SARS-CoV-2 Gamma and Delta Variants of Concern Might Undermine Neutralizing Activity Generated in Response to BNT162b2 mRNA Vaccination

    The Delta variant raised concern regarding its ability to evade SARS-CoV-2 vaccines. We evaluated the serum neutralizing response of 172 Italian healthcare workers, including 36 subjects with a previous SARS-CoV-2 infection, three months after complete Comirnaty (BNT162b2 mRNA, BioNTech-Pfizer) vaccination, testing their sera against viral isolates of the Alpha, Gamma and Delta variants. We assessed whether IgG anti-spike TRIM levels and serum neutralizing activity, measured by seroneutralization assay, were associated. For the Gamma variant, a two-fold reduction in neutralizing titres compared to the Alpha variant was observed, while a four-fold reduction was found for the Delta variant compared to Alpha. A gender difference was observed in neutralizing titres only for the Gamma variant. The serum samples of the 36 individuals previously infected with SARS-CoV-2 neutralized the Alpha, Gamma and Delta variants, demonstrating respectively a nearly three-fold and a five-fold reduction in neutralizing titres for Gamma and Delta compared to the Alpha variant. IgG anti-spike TRIM levels were positively correlated with serum neutralizing titres against the three variants. The Comirnaty vaccine provides sustained neutralizing antibody activity towards the Alpha variant, but it is less effective against the Gamma variant and even less against the Delta variant.

    Investigation on Anthrax in Bangladesh during the Outbreaks of 2011 and Definition of the Epidemiological Correlations

    In 2011, 11 anthrax outbreaks occurred in six districts of Bangladesh. Different types of samples were collected from May to September in the six districts where anthrax had occurred in order to detect and type Bacillus anthracis (B. anthracis) strains. Anthrax was detected in 46.6% of the samples analysed, in particular in soils, but also in bone samples, water, animal feed, and rumen ingesta of dead animals. Canonical single nucleotide polymorphisms (CanSNPs) analysis showed that all the isolates belonged to the major lineage A, sublineage A.Br.001/002 of China and Southeast Asia, while the multi-locus variable number of tandem repeats (VNTRs) analysis (MLVA) with 15 VNTRs demonstrated the presence of five genotypes, two of which were new. The single nucleotide repeats (SNRs) analysis showed 13 SNR types; nevertheless, due to its higher discriminatory power, the presence of two isolates with different SNR-type polymorphisms was detected within two MLVA genotypes. This study assumes that soil is not the only reason for the spread of the disease in Bangladesh; contaminated feed and water can also play an important role in the epidemiology of anthrax. Possible explanations for these epidemiological relationships are discussed.

    COVID-19 in rheumatic diseases in Italy: first results from the Italian registry of the Italian Society for Rheumatology (CONTROL-19)

    OBJECTIVES: Italy was one of the first countries significantly affected by the coronavirus disease 2019 (COVID-19) epidemic. The Italian Society for Rheumatology promptly launched a retrospective and anonymised data collection to monitor COVID-19 in patients with rheumatic and musculoskeletal diseases (RMDs), the CONTROL-19 surveillance database, which is part of the COVID-19 Global Rheumatology Alliance. METHODS: CONTROL-19 includes patients with RMDs and proven severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) updated until May 3rd 2020. In this analysis, only molecular diagnoses were included. The data collection covered demographic data, medical history (general and RMD-related), treatments and COVID-19 related features, treatments, and outcome. In this paper, we report the first descriptive data from the CONTROL-19 registry. RESULTS: The population of the first 232 patients (36% males) consisted mainly of elderly patients (mean age 62.2 years), who used corticosteroids (51.7%), and suffered from multi-morbidity (median comorbidities 2). Rheumatoid arthritis was the most frequent disease (34.1%), followed by spondyloarthritis (26.3%), connective tissue disease (21.1%) and vasculitis (11.2%). Most cases had an active disease (69.4%). Clinical presentation of COVID-19 was typical, with systemic symptoms (fever and asthenia) and respiratory symptoms. The overall outcome was severe, with high frequencies of hospitalisation (69.8%), respiratory support oxygen (55.7%), non-invasive ventilation (20.9%) or mechanical ventilation (7.5%), and 19% of deaths. Male patients typically manifested a worse prognosis. Immunomodulatory treatments were not significantly associated with an increased risk of intensive care unit admission/mechanical ventilation/death. CONCLUSIONS: Although the report mainly includes the most severe cases, its temporal and spatial trend supports the validity of the national surveillance system. More complete data are being acquired in order to both test the hypothesis that RMD patients may have a different outcome from that of the general population and determine the safety of immunomodulatory treatments.

    A CoSimulation Framework for Assessment of Power Knobs in Deep-Learning Accelerators

    This thesis provides a tool that supports collaborative work between machine-learning experts and digital designers, with particular attention to hardware accelerators implementing a spatial architecture. Such an architecture consists of a large number of processing elements, interconnected by a network-on-chip that allows operands to be shared and computations to be carried out spatially. In particular, the contribution of this work consists in the development of a co-simulation framework able to:
    • Evaluate the effective energy efficiency of realistic workloads on spatial accelerators, avoiding the simulation of the entire accelerator micro-architecture.
    • Explore the design space to evaluate the pros and cons of the designated HW architecture and power-management strategy.
    • Enable early efficiency testing of energy-aware neural network models, without waiting for the complete design of the accelerator, thanks to an accurate estimation of the energy profile of the real hardware platform.
    The tool has been designed to be easily interfaced with common frameworks for machine learning and with the industrial ASIC design flow. The general philosophy behind the co-simulation framework is to have a behavioral neural network inference engine that communicates with a gate-level simulator: the inference engine can provide stimuli to the circuit, collect responses and status signals, and modify the configuration of power knobs. The aim is to simulate the system also from a non-functional perspective, hence the need for a gate-level simulator, but with only the minimum hardware required to verify the impact of a specific power management strategy on the network accuracy. In particular, the effect of power knobs on the system is emulated through a library of SDF files, one for each working condition, which can be loaded by the gate-level simulator when a power-context switch is performed.

    Going Further With Winograd Convolutions: Tap-Wise Quantization for Efficient Inference on 4x4 Tile

    Most of today's computer vision pipelines are built around deep neural networks, where convolution operations require most of the generally high compute effort. The Winograd convolution algorithm computes convolutions with fewer MACs compared to the standard algorithm, reducing the operation count by a factor of 2.25x for 3x3 convolutions when using the version with 2x2-sized tiles F2. Even though the gain is significant, the Winograd algorithm with larger tile sizes, i.e., F4, offers even more potential in improving throughput and energy efficiency, as it reduces the required MACs by 4x. Unfortunately, the Winograd algorithm with larger tile sizes introduces numerical issues that prevent its use on integer domain-specific accelerators and higher computational overhead to transform input and output data between spatial and Winograd domains. To unlock the full potential of Winograd F4, we propose a novel tap-wise quantization method that overcomes the numerical issues of using larger tiles, enabling integer-only inference. Moreover, we present custom hardware units that process the Winograd transformations in a power- and area-efficient way, and we show how to integrate such custom modules in an industrial-grade, programmable DSA. An extensive experimental evaluation on a large set of state-of-the-art computer vision benchmarks reveals that the tap-wise quantization algorithm makes the quantized Winograd F4 network almost as accurate as the FP32 baseline. The Winograd-enhanced DSA achieves up to 1.85x gain in energy efficiency and up to 1.83x end-to-end speed-up for state-of-the-art segmentation and detection networks
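The 2.25x and 4x figures quoted in this abstract can be checked with a short calculation. The sketch below assumes the standard multiplication count for 2D Winograd convolution, where an m x m output tile with an r x r filter needs (m + r - 1)^2 element-wise multiplications, and it ignores the cost of the input, filter, and output transforms. The function name `winograd_mac_reduction` is hypothetical.

```python
def winograd_mac_reduction(m, r=3):
    """Ratio of direct-convolution MACs to Winograd multiplications for
    one m x m output tile and an r x r filter (transform cost ignored)."""
    direct = (m * m) * (r * r)   # one r x r dot product per output pixel
    tile = m + r - 1             # side of the input tile in the Winograd domain
    winograd = tile * tile       # one multiplication per Winograd-domain tap
    return direct / winograd


print(winograd_mac_reduction(2))  # F2: 36 / 16
print(winograd_mac_reduction(4))  # F4: 144 / 36
```

For F2 this gives 36 / 16 = 2.25 and for F4 it gives 144 / 36 = 4, matching the abstract. The larger 6 x 6 Winograd-domain tile behind the F4 figure is also where the transform overhead and numerical issues mentioned in the abstract arise.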