Search CORE

411 research outputs found

MultiTASC: A Multi-Tenancy-Aware Scheduler for Cascaded DNN Inference at the Consumer Edge

Author: Nikolaidis Sokratis
Venieris Iakovos S.
Venieris Stylianos I.
Publication venue
Publication date: 22/06/2023
Field of study

Cascade systems comprise a two-model sequence, with a lightweight model processing all samples and a heavier, higher-accuracy model conditionally refining harder samples to improve accuracy. By placing the light model on the device side and the heavy model on a server, model cascades constitute a widely used distributed inference approach. With the rapid expansion of intelligent indoor environments, such as smart homes, the new setting of Multi-Device Cascade is emerging where multiple and diverse devices are to simultaneously use a shared heavy model on the same server, typically located within or close to the consumer environment. This work presents MultiTASC, a multi-tenancy-aware scheduler that adaptively controls the forwarding decision functions of the devices in order to maximize the system throughput, while sustaining high accuracy and low latency. By explicitly considering device heterogeneity, our scheduler improves the latency service-level objective (SLO) satisfaction rate by 20-25 percentage points (pp) over state-of-the-art cascade methods in highly heterogeneous setups, while serving over 40 devices, showcasing its scalability.Comment: Accepted at 28th IEEE Symposium on Computers and Communications (ISCC), 202

arXiv.org e-Print Archive

Exploring the Performance and Efficiency of Transformer Models for NLP on Mobile Devices

Author: Nikolaidis Sokratis
Panopoulos Ioannis
Venieris Iakovos S.
Venieris Stylianos I.
Publication venue
Publication date: 20/06/2023
Field of study

Deep learning (DL) is characterised by its dynamic nature, with new deep neural network (DNN) architectures and approaches emerging every few years, driving the field's advancement. At the same time, the ever-increasing use of mobile devices (MDs) has resulted in a surge of DNN-based mobile applications. Although traditional architectures, like CNNs and RNNs, have been successfully integrated into MDs, this is not the case for Transformers, a relatively new model family that has achieved new levels of accuracy across AI tasks, but poses significant computational challenges. In this work, we aim to make steps towards bridging this gap by examining the current state of Transformers' on-device execution. To this end, we construct a benchmark of representative models and thoroughly evaluate their performance across MDs with different computational capabilities. Our experimental results show that Transformers are not accelerator-friendly and indicate the need for software and hardware optimisations to achieve efficient deployment.Comment: Accepted at the 3rd IEEE International Workshop on Distributed Intelligent Systems (DistInSys), 202

arXiv.org e-Print Archive

Toolflows for Mapping Convolutional Neural Networks on FPGAs: A Survey and Future Directions

Author: Bouganis Christos-Savvas
Kouris Alexandros
Venieris Stylianos I.
Publication venue
Publication date: 19/02/2018
Field of study

In the past decade, Convolutional Neural Networks (CNNs) have demonstrated state-of-the-art performance in various Artificial Intelligence tasks. To accelerate the experimentation and development of CNNs, several software frameworks have been released, primarily targeting power-hungry CPUs and GPUs. In this context, reconfigurable hardware in the form of FPGAs constitutes a potential alternative platform that can be integrated in the existing deep learning ecosystem to provide a tunable balance between performance, power consumption and programmability. In this paper, a survey of the existing CNN-to-FPGA toolflows is presented, comprising a comparative study of their key characteristics which include the supported applications, architectural choices, design space exploration methods and achieved performance. Moreover, major challenges and objectives introduced by the latest trends in CNN algorithmic research are identified and presented. Finally, a uniform evaluation methodology is proposed, aiming at the comprehensive, complete and in-depth evaluation of CNN-to-FPGA toolflows.Comment: Accepted for publication at the ACM Computing Surveys (CSUR) journal, 201

arXiv.org e-Print Archive

Spiral - Imperial College Digital Repository

Approximate FPGA-based LSTMs under Computation Time Constraints

Author: Bouganis Christos-Savvas
Kouris Alexandros
Rizakis Michalis
Venieris Stylianos I.
Publication venue
Publication date: 05/01/2018
Field of study

Recurrent Neural Networks and in particular Long Short-Term Memory (LSTM) networks have demonstrated state-of-the-art accuracy in several emerging Artificial Intelligence tasks. However, the models are becoming increasingly demanding in terms of computational and memory load. Emerging latency-sensitive applications including mobile robots and autonomous vehicles often operate under stringent computation time constraints. In this paper, we address the challenge of deploying computationally demanding LSTMs at a constrained time budget by introducing an approximate computing scheme that combines iterative low-rank compression and pruning, along with a novel FPGA-based LSTM architecture. Combined in an end-to-end framework, the approximation method's parameters are optimised and the architecture is configured to address the problem of high-performance LSTM execution in time-constrained applications. Quantitative evaluation on a real-life image captioning application indicates that the proposed methods required up to 6.5x less time to achieve the same application-level accuracy compared to a baseline method, while achieving an average of 25x higher accuracy under the same computation time constraints.Comment: Accepted at the 14th International Symposium in Applied Reconfigurable Computing (ARC) 201

arXiv.org e-Print Archive

Crossref

Spiral - Imperial College Digital Repository

A methodological approach to BISDN signalling performance

Author: Hou X.
Kalogeropoulos N.D.
Lekkou M.E.
Niemegeers I.G.
Venieris I.S.
Publication venue: Wiley
Publication date: 01/01/1994
Field of study

Sophisticated signalling protocols are required to properly handle the complex multimedia, multiparty services supported by the forthcoming BISDN. The implementation feasibility of these protocols should be evaluated during their design phase, so that possible performance bottlenecks are identified and removed. In this paper we present a methodology for evaluating the performance of BISDN signalling systems under design. New performance parameters are introduced and their network-dependent values are extracted through a message flow model which has the capability to describe the impact of call and bearer control separation on the signalling performance. Signalling protocols are modelled through a modular decomposition of the seven OSI layers including the service user to three submodels. The workload model is user descriptive in the sense that it does not approximate the direct input traffic required for evaluating the performance of a layer protocol; instead, through a multi-level approach, it describes the actual implications of user signalling activity for the general signalling traffic. The signalling protocol model is derived from the global functional model of the signalling protocols and information flows using a network of queues incorporating synchronization and dependency functions. The same queueing approach is followed for the signalling transfer network which is used to define processing speed and signalling bandwidth requirements and to identify possible performance bottlenecks stemming from the realization of the related protocols

DSpace at NTUA

University of Twente Research Information

Preprocessing algorithm for source localisation in a multipath environment

Author: Manikas A
Venieris E
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/01/2014
Field of study

Several methods have been developed which allow the estimation of the location of an existing source with considerable accuracy in the absence of multipaths. However, if, in addition to the Line-of-Sight (LOS) path, non-LOS (NLOS) paths are also present, then all existing localisation algorithms dramatically fail to estimate the location of the source. In this paper, a passive array processing algorithm is proposed, which, if used prior to a localisation approach, suppresses all the multipath contributions in the received signal except for that of the LOS path. The performance of the proposed algorithm is evaluated through computer simulation studies

Crossref

Spiral - Imperial College Digital Repository