Search CORE

14 research outputs found

Virtual Infrastructure Optimisation

Author: A Bulut
A Chebotko
A Kritikakou
A Nussbaum
A Papageorgiou
A Taal
C Müller
D Downey
D Kreutz
D Li
G Casale
H Zhou
I Foster
J Wang
MA Rodriguez
N Laranjeiro
N Serrano
P Ingwersen
P Štefanič
PA Laplante
S Abrishami
S Alawneh
S Koulouzis
S Taherizadeh
SE Dashti
X Liao
Y Hu
Z Cai
Z Fu
Z Usmani
Z Zhao
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2020
Field of study

Crossref

International Migration, Integration and Social Cohesion online publications

UvA-DARE

A high-performance matrix-matrix multiplication methodology for CPU and GPU architectures

Author: A. Kritikakou
B Moon
DF Bacon
F Desprez
G Shobaki
HR Arabnia
Iosif Mporas
J Kurzak
K Goto
KD Cooper
M Hattori
M Kulkarni
M Stephenson
M Tartara
MA Wani
N Binkert
N Nethercote
P Bjørstad
P Kulkarni
PA Kulkarni
R Nath
RC Whaley
RD Blumofe
SM Bhandarkar
SS Pinter
T Austin
V Strassen
Vasilios Kelefouras
Vasilios Kolonias
VI Kelefouras
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2016
Field of study

Current compilers cannot generate code that competes with hand-tuned code in efficiency, even for a simple kernel like matrix–matrix multiplication (MMM). A key step in program optimization is estimating optimal values for parameters such as tile sizes and the number of tiling levels. Selecting scheduling parameter values is a difficult and time-consuming task because the values depend on each other. This is why they are usually found with search methods and empirical techniques. To overcome this problem, the scheduling sub-problems must be optimized together as one problem rather than separately. In this paper, an MMM methodology is presented in which the optimum scheduling parameters are found by reducing the search space theoretically, while the major scheduling sub-problems are addressed together, as one problem, according to the hardware architecture parameters and the input size; for different hardware architecture parameters and/or input sizes, a different implementation is produced. This is achieved by fully exploiting the software characteristics (e.g., data reuse) and the hardware architecture parameters (e.g., data cache sizes and associativities), giving high-quality solutions and a smaller search space. The methodology applies to a wide range of CPU and GPU architectures.
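To make "tile size" concrete, the sketch below shows one level of loop tiling for MMM in Python, together with a simple rule that derives the tile size from a cache size. Both the rule (three T x T tiles of 8-byte elements must fit in one cache) and the 32 KiB cache are illustrative assumptions; the methodology described above goes further, accounting for associativity, multiple tiling levels and the input size.

```python
# Sketch of one level of loop tiling for C = A x B with square n x n matrices.
# ASSUMPTION: the tile-size rule is a common textbook heuristic (three T x T
# tiles of 8-byte elements must fit in one cache); it is NOT the paper's
# methodology, which also accounts for associativity and multiple tiling levels.


def pick_tile_size(cache_bytes: int, elem_bytes: int = 8) -> int:
    """Largest T (at least 1) with 3 * T * T * elem_bytes <= cache_bytes."""
    t = 1
    while 3 * (t + 1) ** 2 * elem_bytes <= cache_bytes:
        t += 1
    return t


def tiled_matmul(a, b, tile):
    """Multiply square matrices a and b (lists of lists) block by block."""
    n = len(a)
    c = [[0] * n for _ in range(n)]
    for ii in range(0, n, tile):
        for kk in range(0, n, tile):
            for jj in range(0, n, tile):
                # Update one tile of C from one tile of A and one tile of B,
                # so all three can stay cache-resident while they are reused.
                for i in range(ii, min(ii + tile, n)):
                    for k in range(kk, min(kk + tile, n)):
                        a_ik = a[i][k]
                        for j in range(jj, min(jj + tile, n)):
                            c[i][j] += a_ik * b[k][j]
    return c


tile = pick_tile_size(32 * 1024)  # hypothetical 32 KiB L1 data cache -> 36
```

The tile size only changes the order in which the products are accumulated, not the result, which is why tiling can be tuned purely for the memory hierarchy.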

HAL-CentraleSupelec

Crossref

INRIA a CCSD electronic archive server

Sheffield Hallam University Research Archive

HAL Descartes

University of Hertfordshire Research Archive

Hal-Diderot

HAL-Rennes 1

Conclusions and Future Directions

Author: A Kritikakou
A. Kritikakou
F Catthoor
Publication venue: 'Springer Science and Business Media LLC'
Publication date
Field of study

Crossref

Experimental evaluation of neutron-induced errors on a multicore RISC-V platform

Author: Dos Santos FF
Kritikakou A
Sentieys O
Publication venue
Publication date: 01/01/2022
Field of study

ePubs: the open archive for STFC research publications

Neutrons Sensitivity of Deep Reinforcement Learning Policies on EdgeAI Accelerators

Author: Bodmann PR
Kritikakou A
Rech P
Saveriano M
Publication venue
Publication date: 01/01/2024
Field of study

ePubs: the open archive for STFC research publications

Authentication with RIPEMD-160 and other alternatives: A Hardware Design Perspective

Author: A Gregoriades
A Kritikakou
C Goutis
G Athanasiou
H Michail
V Kelefouras
Publication venue
Publication date: 03/04/2020
Field of study

Given the rapid evolution of communication standards that include message authentication and integrity verification, constructions such as MAC and HMAC are widely used in the most popular cryptographic schemes, since a way to check the integrity of information transmitted over, or stored in, an unreliable medium is a prime necessity in open computing and communications. MACs protect both a message's integrity and its authenticity by allowing verifiers (who also possess the secret key) to detect any change to the message content. Every modern cryptographic scheme used to secure a security-critical application incorporates a keyed-hash message authentication code, or HMAC. Beyond HMAC, a block cipher algorithm (e.g., AES) is also incorporated, which completes the security scheme. The proposed hardware design applies a number of optimization techniques, such as pipelining, evaluation-based partial unrolling, certain algorithmic transformations in space and time, and computational reordering, leading to a high-throughput, low-power design for the whole HMAC construction. Finally, CMAC, a new algorithm for producing message authentication codes (MACs) that was recently proposed by NIST, is also described. CMAC incorporates a FIPS-approved, secure block cipher algorithm (which may already be deployed in the security scheme) and was standardized by NIST in May 2005. This work concludes with an efficient hardware implementation of the CMAC standard.

CiteSeerX

Near-Optimal Microprocessor and Accelerators Codesign with Latency and Throughput Constraints

Author: Angeliki Kritikakou
Costas Goutis
Dimond R.
Ferrandi F.
Francky Catthoor
George S. Athanasiou
Hennessy J.
Jozwiak L.
Kritikakou A.
Liao J.
Vasilios Kelefouras
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 01/05/2013
Field of study

A systematic methodology for near-optimal software/hardware codesign mapping onto an FPGA platform with a microprocessor and hardware accelerators is proposed. The mapping steps deal with the inter-organization, the foreground memory management and the datapath mapping. Each step is described by parameters and equations combined in a scalable template. Mapping decisions are propagated as design constraints to prune suboptimal options in subsequent steps. Several performance-area Pareto points are produced by instantiating the parameters. To evaluate our methodology, we map a real-time bio-imaging application and loop-dominated benchmarks.
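The idea of pruning options with propagated constraints and keeping only performance-area Pareto points can be sketched as a small filter. The design points below are hypothetical; in the methodology they would come from instantiating the template parameters of each mapping step, and the latency bound stands in for a propagated design constraint.

```python
# Sketch: keep only Pareto-optimal (latency, area) design points.
# ASSUMPTION: candidate points and the latency bound are hypothetical examples.
from typing import NamedTuple, Optional


class DesignPoint(NamedTuple):
    name: str
    latency: float  # lower is better (e.g., cycles)
    area: float     # lower is better (e.g., LUTs)


def dominates(p: DesignPoint, q: DesignPoint) -> bool:
    """p dominates q if it is no worse in both metrics and strictly better in one."""
    return (p.latency <= q.latency and p.area <= q.area
            and (p.latency < q.latency or p.area < q.area))


def pareto_front(points: list[DesignPoint],
                 max_latency: Optional[float] = None) -> list[DesignPoint]:
    """Drop points that violate the latency bound, then drop dominated points."""
    if max_latency is not None:
        # A constraint from an earlier step prunes options before comparison.
        points = [p for p in points if p.latency <= max_latency]
    front = [p for p in points if not any(dominates(q, p) for q in points)]
    return sorted(front, key=lambda p: p.latency)


candidates = [
    DesignPoint("A", 100, 900),
    DesignPoint("B", 150, 500),
    DesignPoint("C", 160, 600),  # dominated by B: slower and larger
    DesignPoint("D", 300, 200),
]
print([p.name for p in pareto_front(candidates)])                   # ['A', 'B', 'D']
print([p.name for p in pareto_front(candidates, max_latency=200)])  # ['A', 'B']
```

The quadratic dominance check is fine for a handful of points; pruning early is what keeps the candidate set small in later steps.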

Crossref

Sheffield Hallam University Research Archive

Authentication with RIPEMD-160 and other alternatives: A Hardware Design Perspective

Author: A. Gregoriades
A. Kritikakou
C. Goutis
G. Athanasiou
H. Michail
V. Kelefouras
Publication venue: 'IntechOpen'
Publication date: 01/03/2010
Field of study


IntechOpen

Crossref

Ktisis

Sheffield Hallam University Research Archive

A scalable and near-optimal representation of access schemes for memory management

Author: Angeliki Kritikakou
Bartzas A.
Catthoor F.
Costas Goutis
Danckaert K.
Francky Catthoor
Janjusic T.
Jha P. K.
Kritikakou A.
Lee C.
Lippens P. E. R.
Nachtergaele L.
So B.
Swaaij M.
Vasilios Kelefouras
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 01/01/2014
Field of study

Memory management searches for the resources required to store the concurrently alive elements. The quality of the solution depends on how the element accesses are represented: a sub-optimal representation leads to overestimation, and a non-scalable representation increases the exploration time. We propose a methodology to represent regular and irregular accesses in a near-optimal and scalable way. The representation consists of a set of pattern entries that compactly describe the behavior of the memory accesses and of pattern operations that consistently combine the pattern entries. The result is a final sequence of pattern entries that represents the global access scheme without unnecessary overestimation.
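The difference between an exact representation and an overestimating one can be sketched with strided access entries. The (start, stride, length) entry and the merge rule below are hypothetical simplifications chosen for illustration, not the pattern entries and operations defined in the paper; the point is only that combining entries precisely keeps the footprint exact, whereas a bounding box counts every index between the extremes.

```python
# Sketch: strided pattern entries vs. a bounding-box estimate of storage needs.
# ASSUMPTION: Pattern and merge() are hypothetical simplifications, not the
# representation or the pattern operations defined in the paper.
from typing import NamedTuple


class Pattern(NamedTuple):
    """Hypothetical pattern entry: indices start, start + stride, ... (length of them)."""
    start: int
    stride: int  # > 0
    length: int  # >= 1

    def last(self) -> int:
        return self.start + self.stride * (self.length - 1)

    def indices(self) -> set[int]:
        return set(range(self.start, self.last() + 1, self.stride))


def merge(p: Pattern, q: Pattern) -> list[Pattern]:
    """Fuse q into p when q continues p with the same stride; otherwise keep both."""
    if p.stride == q.stride and q.start == p.last() + p.stride:
        return [Pattern(p.start, p.stride, p.length + q.length)]
    return [p, q]


def footprint(entries: list[Pattern]) -> int:
    """Exact number of distinct indices touched by the entries."""
    return len(set().union(*(e.indices() for e in entries)))


def bounding_box(entries: list[Pattern]) -> int:
    """Coarse estimate that keeps only the lowest and highest index."""
    return max(e.last() for e in entries) - min(e.start for e in entries) + 1


entries = merge(Pattern(0, 2, 4), Pattern(8, 2, 4)) + [Pattern(100, 1, 4)]
print(entries[0], footprint(entries), bounding_box(entries))
# Pattern(start=0, stride=2, length=8) 12 104
```

Here the exact footprint is 12 locations, while the bounding box reserves 104, which is the kind of overestimation a compact but precise representation avoids.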

Lirias

Crossref

Sheffield Hallam University Research Archive

Ultra low energy domain specific instruction-set processor for on-line surveillance

Author: Catthoor F
Huisken JA Jos
Kritikakou A
Novo D
Perre L Van Der
Raghavan P
Publication venue: IEEE Computer Society
Publication date: 01/01/2010
Field of study

Many signal processing applications demand highly energy-efficient, flexible implementations. In this paper, we propose a novel Domain Specific Instruction-set Processor (DSIP) architecture template, which is tuned for deployment in the targeted domain of on-line surveillance. The architecture, when implemented using a 40-nm CMOS standard cell library, executes a representative test vehicle with an energy efficiency of nearly 900 MOPS/mW, including instruction and data memories. This is about 20 times higher than a state-of-the-art low-power DSP architecture and less than a factor of 2 below a heavily optimized ASIC realization for the same application benchmark.
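The efficiency figure can be read more intuitively as energy per operation. The conversion below uses only the numbers quoted above (1 mW = 10^-3 J/s); the DSP and ASIC values are rough bounds implied by "about 20 times" and "less than a factor of 2", not measured results.

```latex
\begin{align*}
900~\tfrac{\text{MOPS}}{\text{mW}}
  &= \frac{900 \times 10^{6}~\text{op/s}}{10^{-3}~\text{J/s}}
   = 9 \times 10^{11}~\text{op/J}, \\
E_{\text{op}}^{\text{DSIP}} &= \frac{1}{9 \times 10^{11}~\text{op/J}} \approx 1.1~\text{pJ per operation}, \\
E_{\text{op}}^{\text{DSP}} &\approx 20 \times 1.1~\text{pJ} \approx 22~\text{pJ}, \qquad
E_{\text{op}}^{\text{ASIC}} > \tfrac{1.1}{2}~\text{pJ} \approx 0.56~\text{pJ}.
\end{align*}
```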

Repository TU/e