19 research outputs found

    Impact of NCFET on Neural Network Accelerators

    This is the first work to investigate the impact of the Negative Capacitance Field-Effect Transistor (NCFET) on the efficiency and accuracy of future Neural Networks (NN). NCFET is at the forefront of emerging technologies, especially now that it has become compatible with the existing CMOS fabrication process. Neural Network inference accelerators are becoming ubiquitous in modern SoCs, and there is an ever-increasing demand for tighter throughput constraints and lower energy consumption. To explore the benefits that NCFET brings to NN inference regarding frequency, energy, and accuracy, we investigate different configurations of the multiply-add (MADD) circuit, which is the core computational unit in any NN accelerator. We demonstrate that, compared to the baseline 7nm FinFET technology, its negative capacitance counterpart reduces the energy by 55% without any frequency reduction. In addition, it enables higher computational precision, which results in a considerable improvement in inference accuracy. Importantly, this accuracy improvement comes together with a significant energy reduction and without any loss in frequency.

    Abnormal mitochondrial respiration in skeletal muscle in patients with peripheral arterial disease

    Objective: Discrete morphologic, enzymatic, and functional changes in skeletal muscle mitochondria have been demonstrated in patients with peripheral arterial disease (PAD). We examined mitochondrial respiration in the gastrocnemius muscle of nine patients (10 legs) with advanced PAD and in nine control patients (nine legs) without evidence of PAD. Methods: Mitochondrial respiratory rates were determined with a Clark electrode in an oxygraph cell containing saponin-skinned muscle bundles. Muscle samples were obtained from the anteromedial aspect of the gastrocnemius muscle, at a level 10 cm distal to the tibial tuberosity. Mitochondrial respiratory rates, calculated as nanoatoms of oxygen consumed per minute per milligram of noncollagen protein, were measured at baseline (V0), after addition of substrates (malate and glutamate; VSUB), after addition of adenosine diphosphate (ADP) (VADP), and finally, after adenine nucleotide translocase inhibition with atractyloside (VAT). The acceptor control ratio, a sensitive indicator of overall mitochondrial function, was calculated as the ratio of the respiratory rate after the addition of ADP to the respiratory rate after adenine nucleotide translocase inhibition with atractyloside (VADP/VAT). Results: Respiratory rates in muscle mitochondria from patients with PAD were not significantly different from control values at baseline (0.31 ± 0.06 vs 0.55 ± 0.12; P = .09), but VSUB was significantly lower in patients with PAD than in control subjects (0.43 ± 0.07 vs 0.89 ± 0.20; P < .05), as was VADP (0.69 ± 0.13 vs 1.24 ± 0.20; P < .05). Respiratory rates after atractyloside inhibition in patients with PAD were no different from those in control patients (0.47 ± 0.07 vs 0.45 ± P = .08). Compared with control values, mitochondria from patients with PAD had a significantly lower acceptor control ratio (1.41 ± 0.10 vs 2.90 ± 0.20; P < .001). Conclusion: Mitochondrial respiratory activity is abnormal in lower extremity skeletal muscle in patients with PAD. When considered in concert with the ultrastructural and enzymatic abnormalities previously documented in mitochondria of chronically ischemic muscle, these data support the concept of defective mitochondrial function as a pathophysiologic component of PAD.

    The Current State of Electrification in Russia

    The article shows that the current level of electrification in the Russian Federation is clearly insufficient compared with that of the G8 member states. At the same time, there is considerable potential for electricity savings across the sectors of the economy. Household electricity consumption depends substantially on the level of monetary income and on the growth rate of electricity tariffs.

    OmniBoost: Boosting Throughput of Heterogeneous Embedded Devices under Multi-DNN Workload

    Modern Deep Neural Networks (DNNs) exhibit profound efficiency and accuracy properties. This has introduced application workloads that comprise multiple DNN applications, raising new challenges regarding workload distribution. Equipped with a diverse set of accelerators, newer embedded systems present architectural heterogeneity, which current run-time controllers are unable to fully utilize. To enable high throughput in multi-DNN workloads, such a controller has to explore hundreds of thousands of possible solutions to exploit the underlying heterogeneity. In this paper, we propose OmniBoost, a lightweight and extensible multi-DNN manager for heterogeneous embedded devices. We combine stochastic space exploration with a highly accurate performance estimator and observe a 4.6x average throughput boost compared to other state-of-the-art methods. The evaluation was performed on the HiKey970 development board.

    Improving GPU Performance with a Power-Aware Streaming Multiprocessor Allocation Methodology

    Graphics processing units (GPUs) are extensively used as accelerators across multiple application domains, ranging from general-purpose applications to neural networks and cryptocurrency mining. The initial utilization paradigm for GPUs was one application accessing all the resources of the GPU. In recent years, time sharing has been broadly used among the applications of a GPU; spatial sharing, however, has not been fully explored. When concurrent applications share the computational resources of a GPU, performance can be improved by eliminating idle resources. Additionally, the incorporation of GPUs in embedded and mobile devices increases the demand for power-efficient computation due to battery limitations. In this article, we present an allocation methodology for streaming multiprocessors (SMs). The presented methodology works for two concurrent applications on a GPU and determines an allocation scheme that provides power-efficient application execution combined with improved GPU performance. Experimental results show that the developed methodology yields higher throughput while achieving improved power efficiency, compared to other SM power-aware and performance-aware policies. Adopting the presented methodology leads to higher performance for applications that execute concurrently on a GPU, and thus to faster and more efficient acceleration, even for devices with constrained energy sources.

    Trusted computing for embedded systems

    This book describes the state-of-the-art in trusted computing for embedded systems. It shows how a variety of security and trusted computing problems are addressed currently and what solutions are expected to emerge in the coming years. The discussion focuses on attacks aimed at hardware and software for embedded systems, and the authors describe specific solutions to create security features. Case studies are used to present new techniques designed as industrial security solutions. Coverage includes development of tamper resistant hardware and firmware mechanisms for lightweight embedded devices, as well as those serving as security anchors for embedded platforms required by applications such as smart power grids, smart networked and home appliances, environmental and infrastructure sensor networks, etc.
    · Enables readers to address a variety of security threats to embedded hardware and software;
    · Describes design of secure wireless sensor networks, to address secure authentication of trusted portable devices for embedded systems;
    · Presents secure solutions for the design of smart-grid applications and their deployment in large-scale networked and systems.

    Run-time resource management and application customization for many-core embedded platforms

    149 pp. In this Ph.D. thesis, we present (i) memory management middleware acceleration and customization methodologies for applying customized dynamic memory managers (allocators) and (ii) frameworks for distributed run-time resource management on many-core platforms. First, the customization is achieved by applying custom microcoded memory allocators at the middleware level. Second, run-time resource management on the platform is achieved by using cores in different roles and by applying a distributed on-chip communication scheme. The proposed methodologies showed that the microcode approach is a good alternative for overcoming the performance-flexibility dilemma, offering a programmable and flexible solution for accelerating a wide range of applications. Thus, we adopt the microcoded approach to address memory management issues on Distributed Shared Memory (DSM) many-core embedded platforms, aiming for hardware performance while maintaining the flexibility of programs. Also, the developed framework provides a flexible solution to the run-time mapping problem, offering different levels of platform utilization according to the application's needs and without a central point of failure. Concerning microcoded memory management services, experimental results show that the gain of the proposed approaches for designing customized microcoded memory managers was approximately 7x for served allocation requests, with a small increase of approximately 14% in average energy consumption per allocation. The run-time resource management framework adapts to the application's needs and execution restrictions by using the matching factor parameter and produces on average 21% and 10% better on-chip communication cost for homogeneous and heterogeneous platforms, respectively. Last, concerning malleable parallel applications, the developed framework has on average 70% fewer messages, 64% smaller message size, and a 20% application speed-up gain. Ηρακλής Ν. Αναγνωστόπουλο