14 research outputs found

COMET: Communication-optimised multi-threaded error-detection technique

Author: Jones TM
Mitropoulou K
Porpodas V
Publication venue: Proceedings of the International Conference on Compilers, Architectures and Synthesis for Embedded Systems, CASES 2016
Publication date: 01/01/2016

© 2016 ACM. Relentless technology scaling has made transistors more vulnerable to soft, or transient, errors. To keep systems robust against these, current error-detection techniques use different types of redundancy at the hardware or the software level. As a consequence of these additional protection mechanisms, such systems tend to become slower. In particular, software error-detection techniques degrade performance considerably, limiting their uptake. This paper focuses on software redundant multi-threading error detection, a compiler-based technique that makes use of redundant cores within a multi-core system to perform error checking. Implementations of this scheme feature two threads that execute almost the same code: the main thread runs the original code and the checker thread executes code to verify the correctness of the original. The main thread sends the values that require checking to the checker thread, which uses them in its comparisons. We identify a major performance bottleneck in existing schemes: poorly performing inter-core communication and the generated code associated with it. Our study shows that this is a major performance impediment within existing techniques, since the two threads require extremely fine-grained communication, on the order of once every few instructions. We alleviate this bottleneck with a series of code-generation optimisations at the compiler level. We propose COMET (Communication-Optimised Multi-threaded Error-detection Technique), which improves performance across the NAS parallel benchmarks by 31.4% on average compared to the state of the art, without affecting fault coverage.

Apollo (Cambridge)
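
To make the main-thread/checker-thread scheme in the COMET abstract concrete, here is a minimal Python sketch of software redundant multi-threading under simplifying assumptions: `original_kernel`, `main_thread`, `checker_thread` and `run_with_checking` are hypothetical names, a `queue.Queue` stands in for the inter-core channel, and the checker recomputes each value from the forwarded input rather than running compiler-duplicated code. It shows only the baseline pattern (one message per checked value) whose communication cost the paper identifies as the bottleneck; it does not reproduce COMET's code-generation optimisations.

```python
import queue
import threading
from typing import List, Optional, Tuple


def original_kernel(x: int) -> int:
    """Hypothetical stand-in for the protected program code."""
    return 3 * x + 1


def main_thread(inputs: List[int], channel: queue.Queue, fault_at: Optional[int]) -> List[int]:
    """Run the original code and forward every value that needs checking."""
    results = []
    for i, x in enumerate(inputs):
        y = original_kernel(x)
        if i == fault_at:
            y ^= 1  # simulate a transient single-bit error in the main thread
        results.append(y)
        channel.put((x, y))  # one fine-grained message per checked value
    channel.put(None)  # end-of-stream marker for the checker
    return results


def checker_thread(channel: queue.Queue, errors: List[Tuple[int, int]]) -> None:
    """Redundantly recompute each value and compare it with the main thread's."""
    while True:
        item = channel.get()
        if item is None:
            break
        x, y = item
        if original_kernel(x) != y:
            errors.append((x, y))  # mismatch: an error has been detected


def run_with_checking(inputs: List[int], fault_at: Optional[int] = None):
    """Run main and checker threads side by side; return (results, detected errors)."""
    channel: queue.Queue = queue.Queue()
    errors: List[Tuple[int, int]] = []
    checker = threading.Thread(target=checker_thread, args=(channel, errors))
    checker.start()
    results = main_thread(inputs, channel, fault_at)
    checker.join()
    return results, errors
```

Calling `run_with_checking([1, 2, 3], fault_at=1)` flips one bit of the second result, so it returns `([4, 6, 10], [(2, 6)])`: the checker detects the corrupted value, while a fault-free run returns an empty error list.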

Lynx: Using OS and hardware support for fast fine-grained inter-core communication

Author: Jones TM
Mitropoulou K
Porpodas V
Zhang X
Publication venue: Proceedings of the International Conference on Supercomputing
Publication date: 01/06/2016

Designing high-performance software queues for fast inter-core communication is challenging, but critical for maximising software parallelism. State-of-the-art single-producer/single-consumer (SP/SC) queues for streaming applications contain multiple sections, requiring the producer and consumer to operate independently on different sections. While these queues perform well for coarse-grained data transfers, they perform poorly in the fine-grained case. This paper proposes Lynx, a novel SP/SC queue specifically tuned for fine-grained communication. Lynx is built from the ground up, reducing the generated code on the critical path to just two operations per enqueue and dequeue. To achieve this, it relies on existing commodity processor hardware and operating-system exception-handling support to deal with infrequent queue maintenance operations. Lynx outperforms the state of the art by up to 1.57× in total 64-bit throughput, reaching a peak throughput of 15.7 GB/s on a common desktop system. Real applications using Lynx get a performance improvement of up to 1.4×.

This work was supported by the Engineering and Physical Sciences Research Council (EPSRC), through grant reference EP/K026399/1. This is the author accepted manuscript. The final version is available from Association for Computing Machinery via http://dx.doi.org/10.1145/2925426.2926274

Apollo (Cambridge)
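
The Lynx abstract describes moving infrequent queue maintenance off the critical path and into exception handling. The toy Python class below is a hypothetical, single-threaded analogy of that idea, not Lynx itself: Python's `IndexError` stands in for the hardware/OS exception, the fast path of `enqueue` and `dequeue` is just one buffer access plus one index increment, and wrap-around happens only in the handler. The class name `ExceptionWrapRing` is invented for illustration, and cross-core synchronisation and full/empty detection are deliberately left out, so the caller must never let the producer run more than `capacity` items ahead of the consumer.

```python
from typing import Any, List


class ExceptionWrapRing:
    """Toy ring buffer whose fast path contains no explicit wrap-around check.

    Hypothetical analogy only: IndexError plays the role of the hardware/OS
    exception, and the handler performs the rare maintenance step (wrapping
    the index back to 0). No synchronisation, no full/empty detection.
    """

    def __init__(self, capacity: int) -> None:
        self._buf: List[Any] = [None] * capacity
        self._head = 0  # next slot to write (producer side)
        self._tail = 0  # next slot to read (consumer side)

    def enqueue(self, value: Any) -> None:
        try:
            self._buf[self._head] = value  # fast path: store ...
        except IndexError:
            self._head = 0                 # slow path: wrap around, then store
            self._buf[0] = value
        self._head += 1                    # ... and advance the index

    def dequeue(self) -> Any:
        try:
            value = self._buf[self._tail]  # fast path: load ...
        except IndexError:
            self._tail = 0                 # slow path: wrap around, then load
            value = self._buf[0]
        self._tail += 1                    # ... and advance the index
        return value
```

With `capacity=4`, enqueueing 0 to 3, dequeueing three items, and then enqueueing 4 and 5 makes the producer wrap once; the next three dequeues return 3, 4 and 5, with the consumer wrapping as it reaches the end. The common case pays for no explicit bounds check; the rare end-of-buffer case is detected by the exception itself.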

Cooperative partitioning: Energy-efficient cache partitioning for high-performance CMPs

Author: Franke B.
Jones Timothy M.
Porpodas V.
Sundararajan K.T.
Topham N.P.
Publication venue: Institute of Electrical and Electronics Engineers (IEEE)
Publication date: 01/02/2012

Edinburgh Research Explorer

A Common Left Occipito-Temporal Dysfunction in Developmental Dyslexia and Acquired Letter-By-Letter Reading?

Author: A Mechelli
AL Bokde
B Hagtvet
BD McCandliss
C Beaulieu
C Henry
CD Porpodas
CJ Fiebach
D Spinelli
Denise Sturm
DL Share
F Cao
F Richlan
Fabio Richlan
G Silani
GK Deutsch
Gunther Ladurner
H Lyytinen
H Wimmer
Hans P Op de Beeck
Heinz Wimmer
J Bergmann
JC Ziegler
JEJ González
JF Démonet
JL Bruno
JR Binder
K Landerl
K Tsapkini
KJ Friston
KR Pugh
L Barca
L Cohen
M Aro
M Ben-Shachar
M Coltheart
M De Luca
M Kronbichler
M Schurz
MA Eckert
Martin Kronbichler
Matthias Schurz
MJ Snowling
P Zoccolotti
PH Seymour
R Davies
R Gaillard
R Sandak
R Yap
RE Frye
RH Baayen
RNA Henson
S Epelbaum
S Hawelka
S Heim
S Van der Mark
SE Shaywitz
SM Brambati
T Bitan
T Klingberg
T Richards
TD Wager
U Maurer
U Tewes
V Blau
Publication venue: Public Library of Science
Publication date: 01/01/2010

We used fMRI to examine functional brain abnormalities of German-speaking dyslexics who suffer from slow, effortful reading but not from a reading-accuracy problem. Similar to acquired cases of letter-by-letter reading, the developmental cases exhibited an abnormally strong effect of length (i.e., number of letters) on response time for words and pseudowords. Corresponding to lesions of left occipito-temporal (OT) regions in acquired cases, we found a dysfunction of this region in our developmental cases, who failed to exhibit responsiveness of left OT regions to the length of words and pseudowords. This abnormality in the left OT cortex was accompanied by absent responsiveness to increased sublexical reading demands in phonological inferior frontal gyrus (IFG) regions. Interestingly, there was no abnormality in the left superior temporal cortex, which, according to the phonological deficit explanation, is considered to be the prime locus of the reading difficulties of developmental dyslexia cases. The present functional imaging results suggest that developmental dyslexia, similar to acquired letter-by-letter reading, is due to a primary dysfunction of left OT regions.

Paris Lodron University of Salzburg

Public Library of Science (PLOS)

Crossref

Directory of Open Access Journals

PubMed Central

COMET: Communication-optimised multi-threaded error-detection technique

Author: Jones Timothy M.
Mitropoulou K
Porpodas V
Publication venue: Proceedings of the International Conference on Compilers, Architectures and Synthesis for Embedded Systems, CASES 2016
Publication date: 06/07/2016

Apollo (Cambridge)

Vectorization-aware loop unrolling with seed forwarding

Author: Allen John R
Anderson Andrew
Andrade Diego
Callahan David
Carr Steve
Davidson Jack W.
Davies James
Eichenberger Alexandre E.
Ferrer Roger
Guthaus M. R.
Huang J. C.
Huh Joonmoo
Hwang J.
Karrenberg R.
Karrenberg Ralf
Kisuki Toru
Knijnenburg P. M. W.
Kuck D. J.
Larsen S.
Leather Hugh
Liu J.
Maleki Saeed
Masten Matt
Mendis Charith
Moll Simon
Nuzman Dorit
Petkov D.
Porpodas Vasileios
Ren Gang
Sarkar Vivek
Shin J.
Stephenson Mark
Stock Kevin
van Engelen Robert
Wolfe Michael
Zhou Hao
Zima Eugene V.
Publication venue: Association for Computing Machinery (ACM)
Publication date: 23/12/2019

Loop unrolling is a widely adopted loop transformation, commonly used to enable subsequent optimizations. Straight-line-code vectorization (SLP) is an optimization that benefits from unrolling: SLP converts isomorphic instruction sequences into vector code. Since unrolling generates repeated isomorphic instruction sequences, it enables SLP to vectorize more code. However, most production compilers apply these optimizations independently, without coordination. Unrolling is commonly tuned to avoid code bloat rather than to maximize the potential for vectorization, leading to missed vectorization opportunities. We propose VALU, a novel loop-unrolling heuristic that takes vectorization into account when making unrolling decisions. Our heuristic is powered by an analysis that estimates the potential benefit of SLP vectorization for the unrolled version of the loop, and it then selects the unrolling factor that maximizes the utilization of the vector units. VALU also forwards the vectorizable code to SLP, allowing it to bypass its greedy search for vectorizable seed instructions and exposing more vectorization opportunities. Our evaluation on a production compiler shows that VALU uncovers many vectorization opportunities that were missed by the default loop unroller and vectorizers. This results in more vectorized code and significant speedups for 17 kernels of the TSVC benchmark suite, reaching up to 2× over the already highly optimized -O3. Our evaluation on full benchmarks from FreeBench and MiBench shows that VALU results in a geometric-mean speedup of 1.06×.

Crossref

Edinburgh Research Explorer

The University of Manchester - Institutional Repository

White Rose Research Online
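
As a rough illustration of choosing an unroll factor by vector-unit utilization, the Python sketch below uses a deliberately simplified, hypothetical cost model rather than VALU's actual SLP benefit analysis. Each statement in the loop body is described only by its element width, an unroll factor `u` yields `u` isomorphic copies of each statement, and only copies that fill complete vectors count as vectorized. The function names, the power-of-two candidate factors and the 128-bit default vector width are assumptions, and seed forwarding is not modelled.

```python
from typing import Sequence


def lanes_per_vector(vector_bits: int, element_bits: int) -> int:
    """Number of elements of `element_bits` that fit in one vector register."""
    lanes = vector_bits // element_bits
    if lanes < 1:
        raise ValueError("element wider than the vector register")
    return lanes


def vectorized_fraction(unroll: int, lanes: int) -> float:
    """Fraction of the `unroll` isomorphic copies of one statement that fill complete vectors."""
    return (unroll // lanes) * lanes / unroll


def pick_unroll_factor(element_bits: Sequence[int], vector_bits: int = 128,
                       max_factor: int = 16) -> int:
    """Pick the power-of-two unroll factor with the best average vector utilization (toy model)."""
    if not element_bits:
        raise ValueError("loop body must contain at least one statement")
    best_factor, best_score = 1, -1.0
    factor = 1
    while factor <= max_factor:
        scores = [vectorized_fraction(factor, lanes_per_vector(vector_bits, bits))
                  for bits in element_bits]
        score = sum(scores) / len(scores)
        # Strict comparison: on a tie, keep the smaller factor (less code growth).
        if score > best_score:
            best_factor, best_score = factor, score
        factor *= 2
    return best_factor
```

For a body mixing 32-bit and 16-bit operations on 128-bit vectors, `pick_unroll_factor([32, 16])` returns 8: a factor of 4 fills the 4-lane vectors of the 32-bit statement but leaves the 8-lane 16-bit statement scalar. Capping the factor with `max_factor=4` returns 4, and ties keep the smaller factor to reflect the code-size concern the abstract mentions.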