Search CORE

247 research outputs found

An Application-Specific VLIW Processor with Vector Instruction Set for CNN Acceleration

Author: Ascheid Gerd
Bytyn Andreas
Leupers Rainer
Publication venue: Institute of Electrical and Electronics Engineers (IEEE)
Publication date: 01/01/2019

In recent years, neural networks have surpassed classical algorithms in areas such as object recognition, e.g. in the well-known ImageNet challenge. As a result, great effort is being put into developing fast and efficient accelerators, especially for Convolutional Neural Networks (CNNs). In this work we present ConvAix, a fully C-programmable processor which, contrary to many existing architectures, does not rely on a hard-wired array of multiply-and-accumulate (MAC) units. Instead, it maps computations onto independent vector lanes using a carefully designed vector instruction set. The processor targets latency-sensitive applications and can execute up to 192 MAC operations per cycle. ConvAix operates at a target clock frequency of 400 MHz in 28 nm CMOS, offering state-of-the-art performance with adequate flexibility within its target domain. Simulation results for several 2D convolutional layers from well-known CNNs (AlexNet, VGG-16) show an average ALU utilization of 72.5% using vector instructions with 16-bit fixed-point arithmetic. Compared to other well-known but less flexible designs, ConvAix offers competitive energy efficiency of up to 497 GOP/s/W while surpassing them in area efficiency and processing speed.

Comment: Accepted for publication in the proceedings of the 2019 IEEE International Symposium on Circuits and Systems (ISCAS).
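The abstract describes computations mapped onto independent vector lanes with 16-bit fixed-point MACs. The sketch below illustrates that idea under stated assumptions and is not ConvAix's instruction set. Only the 192 MACs/cycle, 400 MHz and 16-bit figures come from the abstract. The Q7.8 format, the lane count, the two-operations-per-MAC convention and all function names are hypothetical. Each group of `lanes` output columns keeps one wide accumulator per lane, and each kernel weight is broadcast to all lanes, roughly as one vector MAC per kernel tap would.

```python
# Hypothetical sketch: lane-parallel 2D convolution in 16-bit fixed point.
# NOT the ConvAix ISA; FRAC_BITS, lane count and function names are assumptions.

MACS_PER_CYCLE = 192   # from the abstract
CLOCK_HZ = 400e6       # from the abstract
FRAC_BITS = 8          # assumed Q7.8 format for activations and weights


def peak_gops(macs_per_cycle=MACS_PER_CYCLE, clock_hz=CLOCK_HZ, ops_per_mac=2):
    """Peak throughput in GOP/s, assuming one MAC counts as two operations."""
    return macs_per_cycle * clock_hz * ops_per_mac / 1e9


def saturate16(value):
    """Clamp an integer to the signed 16-bit range."""
    return max(-32768, min(32767, value))


def to_fixed(x, frac_bits=FRAC_BITS):
    """Quantize a real number to a saturated 16-bit fixed-point integer."""
    return saturate16(round(x * (1 << frac_bits)))


def conv2d_lanes(image, kernel, lanes=16, frac_bits=FRAC_BITS):
    """'Valid' 2D convolution (cross-correlation, as used in CNN layers).

    Output columns are processed in groups of `lanes`; each lane owns one
    wide accumulator and the current weight is broadcast to all lanes.
    """
    kh, kw = len(kernel), len(kernel[0])
    oh, ow = len(image) - kh + 1, len(image[0]) - kw + 1
    out = [[0] * ow for _ in range(oh)]
    for y in range(oh):
        for x0 in range(0, ow, lanes):
            cols = range(x0, min(x0 + lanes, ow))
            acc = [0] * len(cols)  # wide accumulators, one per lane
            for ky in range(kh):
                for kx in range(kw):
                    w = kernel[ky][kx]  # weight broadcast to every lane
                    for lane, x in enumerate(cols):
                        acc[lane] += image[y + ky][x + kx] * w
            for lane, x in enumerate(cols):
                # Products carry 2*frac_bits fractional bits: round half up,
                # shift back to frac_bits, then saturate to 16 bits.
                out[y][x] = saturate16((acc[lane] + (1 << (frac_bits - 1))) >> frac_bits)
    return out
```

For example, a 5×5 image of 1.0 convolved with a 3×3 kernel of 0.5 yields 1152 in every output position, which is 4.5 in Q7.8. Under the two-operations-per-MAC assumption, `peak_gops()` gives 153.6 GOP/s for 192 MACs at 400 MHz.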

arXiv.org e-Print Archive

Crossref

Publikationsserver der RWTH Aachen University

Enabling multi-segment 5G service provisioning and maintenance through network slicing

Author: Agraz Bujan Fernando
Montero Herrera Rafael
Pagès Cruz Albert
Spadaro Salvatore
Publication venue: Springer Science and Business Media LLC
Publication date: 03/01/2020

This is a post-peer-review, pre-copyedit version of an article published in Journal of Network and Systems Management. The final authenticated version is available online at: http://dx.doi.org/10.1007/s10922-019-09509-9

The current deployment of 5G networks, aimed at supporting the highly demanding service types defined for 5G, has created the need for new techniques to adapt legacy networks to such requirements. Network slicing, in turn, lets services with different requirements share the same underlying physical infrastructure. It also isolates these services from one another to guarantee that each functions properly. In this work, we analyse from an architectural point of view the coordination required to provision 5G services over multiple network segments/domains by means of network slicing. We also consider the use of sensors and actuators to maintain each slice's performance throughout its lifetime. We set up an experimental multi-segment testbed to demonstrate end-to-end service provisioning and its guarantee in terms of specific QoS parameters such as latency, throughput, and Virtual Network Function (VNF) CPU/RAM consumption. The results demonstrate the workflow between different network components to coordinate the deployment of slices. They also provide a set of examples of slice maintenance through service monitoring and policy-based actuations.

Peer Reviewed. Postprint (author's final draft).
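The abstract mentions slice maintenance through monitoring (sensors) and policy-based actuations (actuators) on QoS parameters such as latency, throughput and VNF CPU/RAM consumption. A simple threshold-rule evaluator is one way to picture this; the sketch below is written under that assumption. It is not the testbed's implementation. The class names, thresholds and action strings are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class SliceMetrics:
    """One monitoring sample for a slice (the 'sensor' side)."""
    latency_ms: float
    throughput_mbps: float
    vnf_cpu_pct: float
    vnf_ram_pct: float


@dataclass
class Policy:
    """Hypothetical threshold rule: trigger `action` when `metric` crosses `threshold`."""
    metric: str        # attribute name of SliceMetrics
    threshold: float
    upper_bound: bool  # True: violated when value > threshold; False: when value < threshold
    action: str        # hypothetical actuation name


def evaluate_policies(sample, policies):
    """Return actions (the 'actuator' side) for every violated policy, in policy order."""
    actions = []
    for policy in policies:
        value = getattr(sample, policy.metric)
        if policy.upper_bound:
            violated = value > policy.threshold
        else:
            violated = value < policy.threshold
        if violated:
            actions.append(policy.action)
    return actions


# Hypothetical policy set; real thresholds would come from the slice's SLA.
POLICIES = [
    Policy("latency_ms", 20.0, True, "reroute_slice"),
    Policy("throughput_mbps", 100.0, False, "increase_bandwidth"),
    Policy("vnf_cpu_pct", 85.0, True, "scale_out_vnf"),
    Policy("vnf_ram_pct", 90.0, True, "add_vnf_memory"),
]
```

For a sample with 25 ms latency, 80 Mbit/s throughput, 92% VNF CPU and 40% VNF RAM, `evaluate_policies` returns `["reroute_slice", "increase_bandwidth", "scale_out_vnf"]`. Keeping rules as data rather than code lets an orchestrator update them per slice without redeploying the monitoring loop.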

UPCommons. Portal del coneixement obert de la UPC

Programmatic Orchestration of WiFi Networks

Author: Feldmann A.
Hühn T.
Merz R.
Sarrar N.
Schulz-Zander J.
Suresh L.
Publication date: 01/01/2014

MPG.PuRe

Using a Parallel Programming Model for Architecture Synthesis (Χρήση μοντέλου παράλληλου προγραμματισμού για σύνθεση αρχιτεκτονικών)

Author: Owaida Muhsen
Publication date: 01/01/2012

The problem of automatically generating hardware modules from high-level application representations has been at the forefront of EDA research in recent years. In this dissertation we introduce a methodology to automatically synthesize hardware accelerators from OpenCL applications. OpenCL is a recent, industry-supported standard for writing programs that execute on multicore platforms and accelerators such as GPUs. Our methodology maps OpenCL kernels into hardware accelerators based on architectural templates that explicitly decouple computation from memory communication whenever possible. The templates can be tuned to provide a wide repertoire of accelerators that meet user performance requirements and FPGA device characteristics. Furthermore, a set of high- and low-level compiler optimizations is applied to generate optimized accelerators. Our experimental evaluation shows that the generated accelerators are efficiently tuned to each application's memory access pattern and computational complexity, and that they meet user performance requirements. An important objective of our tool is to expand the FPGA user base to software engineers. This would let them develop FPGA systems without hardware design experience, extending the scope of FPGAs beyond the realm of hardware design.
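The architectural templates described above explicitly decouple computation from memory communication. The sketch below is a software analogy of that idea, assuming a three-stage load/compute/store pipeline connected by bounded FIFOs. It uses Python threads, so it is not hardware produced by the dissertation's tool. The function names, burst size and FIFO depth are hypothetical. The load stage only performs memory reads, the compute stage only applies the kernel, and the bounded queues let the stages overlap as a streaming accelerator's would.

```python
import queue
import threading

_DONE = object()  # sentinel marking the end of the stream


def load_stage(memory, burst, out_q):
    """Memory-access unit: reads contiguous bursts, performs no arithmetic."""
    for i in range(0, len(memory), burst):
        out_q.put(memory[i:i + burst])
    out_q.put(_DONE)


def compute_stage(kernel, in_q, out_q):
    """Compute unit: applies the kernel to each burst, never touches memory directly."""
    while True:
        block = in_q.get()
        if block is _DONE:
            out_q.put(_DONE)
            return
        out_q.put([kernel(v) for v in block])


def store_stage(in_q, result):
    """Write-back unit: drains computed bursts into the result buffer."""
    while True:
        block = in_q.get()
        if block is _DONE:
            return
        result.extend(block)


def run_decoupled(memory, kernel, burst=4, depth=2):
    """Run load/compute/store concurrently, linked by FIFOs of `depth` bursts."""
    to_compute = queue.Queue(maxsize=depth)
    to_store = queue.Queue(maxsize=depth)
    result = []
    stages = [
        threading.Thread(target=load_stage, args=(memory, burst, to_compute)),
        threading.Thread(target=compute_stage, args=(kernel, to_compute, to_store)),
        threading.Thread(target=store_stage, args=(to_store, result)),
    ]
    for stage in stages:
        stage.start()
    for stage in stages:
        stage.join()
    return result
```

For instance, `run_decoupled(list(range(10)), lambda v: 3 * v + 1)` returns `[1, 4, 7, ..., 28]`. Each FIFO has a single producer and a single consumer, so element order is preserved.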

Hellenic National Archive of Doctoral Dissertations

University of Thessaly Institutional Repository

Optimization and implementation of a Viterbi decoder under flexibility constraints

Author: Anderson John B
Kamuf Matthias
Öwall Viktor
Publication venue: Institute of Electrical and Electronics Engineers (IEEE)
Publication date: 01/01/2008

This paper discusses the impact of flexibility when designing a Viterbi decoder for both convolutional and TCM codes. Different trade-offs have to be considered in choosing the right architecture for the processing blocks, and the resulting hardware penalty is evaluated. We study the impact of symbol quantization, which degrades performance and affects the wordlength of the rate-flexible trellis datapath. A radix-2-based architecture for this datapath substantially relaxes the hardware requirements on the branch metric and survivor path blocks. The cost of flexibility in terms of cell area and power consumption is explored by investigating synthesized designs that provide different transmission rates. Two designs are fabricated in a digital 0.13-µm CMOS process. Based on post-layout simulations, a symbol rate of 168 Mbaud is achieved in TCM mode, equivalent to a maximum throughput of 840 Mbit/s using a 64-QAM constellation.
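To make the trellis terminology concrete, here is a minimal hard-decision Viterbi decoder with a radix-2 trellis (one input bit, two branches per state per step). It covers the branch metric computation, add-compare-select and survivor-path traceback. The choice of code is an assumption for illustration: the textbook rate-1/2, constraint-length-3 convolutional code with generators 7 and 5 (octal). This is not the paper's rate-flexible convolutional/TCM decoder, and the function names are hypothetical.

```python
G = (0b111, 0b101)  # assumed generator polynomials (7, 5 octal), constraint length 3
N_STATES = 4        # 2**(K - 1) trellis states


def parity(x):
    return bin(x).count("1") & 1


def branch(state, bit):
    """Encoder output symbols and next state for one trellis branch."""
    reg = (bit << 2) | state
    return tuple(parity(reg & g) for g in G), reg >> 1


def encode(bits):
    state, out = 0, []
    for b in list(bits) + [0, 0]:  # two tail bits flush the encoder back to state 0
        symbols, state = branch(state, b)
        out.extend(symbols)
    return out


def viterbi_decode(received):
    inf = float("inf")
    metrics = [0] + [inf] * (N_STATES - 1)  # decoding starts in state 0
    history = []                            # survivor (prev_state, bit) per step
    for i in range(0, len(received), 2):
        r = received[i:i + 2]
        new = [inf] * N_STATES
        survivors = [None] * N_STATES
        for state in range(N_STATES):
            if metrics[state] == inf:
                continue
            for bit in (0, 1):
                symbols, nxt = branch(state, bit)
                bm = sum(a != b for a, b in zip(symbols, r))  # Hamming branch metric
                cand = metrics[state] + bm                     # add
                if cand < new[nxt]:                            # compare-select
                    new[nxt] = cand
                    survivors[nxt] = (state, bit)
        metrics = new
        history.append(survivors)
    state, bits = 0, []  # tail bits guarantee termination in state 0
    for survivors in reversed(history):  # traceback
        state, bit = survivors[state]
        bits.append(bit)
    bits.reverse()
    return bits[:-2]  # drop the tail bits
```

With `msg = [1, 0, 1, 1, 0, 0, 1, 0]`, decoding `encode(msg)` recovers `msg`, even after flipping any single coded bit. This holds because the (7, 5) code has free distance 5.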

Lund University Publications

FLEX Testbed: a platform for 4G/5G wireless networking research

Author: Garcia Perez CA
Korakis T
Lyberopoulos G
Maglogiannis Vasilis
Makris N
Merino Gomez P
Milosevic N
Naudts Dries
Nikaein N
Seskar I
Spirou S
Theodoropoulou E
Tosic M
Publication venue: ICT Fire Book, November 2016
Publication date: 01/01/2016

Ghent University Academic Bibliography

A Tutorial on the Implementation of Block Ciphers: Software and Hardware Applications

Author: Howard M. Heys
Publication venue: International Association for Cryptologic Research (IACR)
Publication date: 13/12/2020

In this article, we discuss basic strategies that can be used to implement block ciphers in both software and hardware environments. As models for discussion, we use substitution-permutation networks (SPNs), which form the basis for many practical block cipher structures. For software implementation, we discuss approaches such as table lookups and bit-slicing. For hardware implementation, we examine a broad range of architectures, from high-speed pipelined structures to compact structures based on serialization. To illustrate different implementation concepts, we present example data associated with specific methods and discuss sample designs that can be employed to realize different implementation strategies. We expect the article to be of particular interest to researchers, scientists, and engineers who are new to the field of cryptographic implementation.
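The table-lookup strategy mentioned above can be illustrated on a toy 16-bit SPN. The SPN is an assumption made for illustration, not a design from the article: four 4-bit S-boxes followed by a bit permutation. The S-box is borrowed from the first row of DES S-box S1. The bit permutation is linear, so the S-box and permutation for each nibble position can be merged into one precomputed 16-entry table. A round then reduces to a key XOR and four lookups combined with OR. This is the idea behind table-based software implementations. Round keys and function names are hypothetical, and the final round is simplified to a key XOR only.

```python
# Hypothetical toy SPN: 16-bit block, four 4-bit S-boxes, bit-permutation layer.
SBOX = [0xE, 0x4, 0xD, 0x1, 0x2, 0xF, 0xB, 0x8,
        0x3, 0xA, 0x6, 0xC, 0x5, 0x9, 0x0, 0x7]  # first row of DES S1


def permute(x):
    """Bit permutation: bit b of nibble n moves to bit n of nibble b (4x4 transpose)."""
    y = 0
    for pos in range(16):
        if (x >> pos) & 1:
            nib, bit = divmod(pos, 4)
            y |= 1 << (bit * 4 + nib)
    return y


def round_direct(state, key):
    """Straightforward round: key mixing, nibble-wise substitution, permutation."""
    state ^= key
    s = 0
    for j in range(4):
        s |= SBOX[(state >> (4 * j)) & 0xF] << (4 * j)
    return permute(s)


# Table-lookup approach: per nibble position, precompute S-box followed by permutation.
TABLES = [[permute(SBOX[v] << (4 * j)) for v in range(16)] for j in range(4)]


def round_table(state, key):
    """Same round using four lookups; valid because permute() is linear over bits."""
    state ^= key
    out = 0
    for j in range(4):
        out |= TABLES[j][(state >> (4 * j)) & 0xF]
    return out


def encrypt(block, round_keys, round_fn=round_table):
    """Toy cipher: full rounds with all but the last key, then a final key XOR."""
    for k in round_keys[:-1]:
        block = round_fn(block, k)
    return block ^ round_keys[-1]
```

Both round functions produce identical results for every input. The table version trades 4 × 16 precomputed 16-bit words of memory for removing the per-bit permutation loop. In a real cipher, the same trade-off appears with larger tables.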

Cryptology ePrint Archive

Biologically Inspired Vision Architectures: a Software/Hardware Perspective

Author: Antonio Gentile
Francesco S. Fabiano
Marco La Cascia
Roberto Pirrone
Publication venue: IntechOpen
Publication date: 01/01/2007

IntechOpen

CiteSeerX

Crossref

Archivio istituzionale della ricerca - Università di Palermo