3,536 research outputs found

Speculative Segmented Sum for Sparse Matrix-Vector Multiplication on Heterogeneous Processors

Author: Liu Weifeng
Vinter Brian
Publication venue: 'Elsevier BV'
Publication date: 14/09/2015

Sparse matrix-vector multiplication (SpMV) is a central building block for scientific software and graph applications. Recently, heterogeneous processors composed of different types of cores have attracted much attention because of their flexible core configuration and high energy efficiency. In this paper, we propose a compressed sparse row (CSR) format based SpMV algorithm utilizing both types of cores in a CPU-GPU heterogeneous processor. We first speculatively execute segmented sum operations on the GPU part of a heterogeneous processor and generate possibly incorrect results. Then the CPU part of the same chip is triggered to re-arrange the predicted partial sums for a correct resulting vector. On three heterogeneous processors from Intel, AMD and nVidia, using 20 sparse matrices as a benchmark suite, the experimental results show that our method obtains significant performance improvement over the best existing CSR-based SpMV algorithms. The source code of this work is downloadable at https://github.com/bhSPARSE/Benchmark_SpMV_using_CSR. Comment: 22 pages, 8 figures, published at Parallel Computing (PARCO).
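For readers unfamiliar with the terms in the abstract above, the following is a minimal serial sketch of CSR-based SpMV written as elementwise products followed by a segmented sum. It is not the authors' implementation and does not reproduce the speculative GPU/CPU split. The function name `spmv_csr_segmented_sum` and the array names `row_ptr`, `col_idx` and `values` are illustrative assumptions, following the usual CSR convention.

```python
def spmv_csr_segmented_sum(row_ptr, col_idx, values, x):
    """Compute y = A @ x for a matrix A stored in CSR form.

    row_ptr has one entry per row plus one; row i owns the stored
    nonzeros at positions row_ptr[i] .. row_ptr[i + 1] - 1.
    """
    n_rows = len(row_ptr) - 1

    # Step 1: one product per stored nonzero (independent of each other).
    products = [v * x[c] for v, c in zip(values, col_idx)]

    # Step 2: segmented sum, where each row is one segment of `products`.
    # Empty rows (row_ptr[i] == row_ptr[i + 1]) yield 0.0.
    y = [0.0] * n_rows
    for i in range(n_rows):
        y[i] = sum(products[row_ptr[i]:row_ptr[i + 1]], 0.0)
    return y


# Hypothetical 3x3 example with an empty middle row:
# A = [[1, 0, 2],
#      [0, 0, 0],
#      [0, 3, 4]]
row_ptr = [0, 2, 2, 4]
col_idx = [0, 2, 1, 2]
values = [1.0, 2.0, 3.0, 4.0]
x = [1.0, 2.0, 3.0]
y = spmv_csr_segmented_sum(row_ptr, col_idx, values, x)
```

The empty row in the example is the kind of case that makes segment boundaries hard to predict in parallel, which is presumably why a correction step after a speculative pass is useful.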

arXiv.org e-Print Archive

Copenhagen University Research Information System

E-QED: Electrical Bug Localization During Post-Silicon Validation Enabled by Quick Error Detection and Formal Methods

Author: B Vermeulen
D Lin
E Clarke
FM Paula De
HF Ko
M Abramovici
M Dusanapudi
NR Saxena
PH Bardell
PN Sanda
RB Jones
S-B Park
T Larrabee
Publication date: 23/07/2017

During post-silicon validation, manufactured integrated circuits are extensively tested in actual system environments to detect design bugs. Bug localization involves identification of a bug trace (a sequence of inputs that activates and detects the bug) and a hardware design block where the bug is located. Existing bug localization practices during post-silicon validation are mostly manual and ad hoc, and, hence, extremely expensive and time consuming. This is particularly true for subtle electrical bugs caused by unexpected interactions between a design and its electrical state. We present E-QED, a new approach that automatically localizes electrical bugs during post-silicon validation. Our results on the OpenSPARC T2, an open-source 500-million-transistor multicore chip design, demonstrate the effectiveness and practicality of E-QED: starting with a failed post-silicon test, in a few hours (9 hours on average) we can automatically narrow the location of the bug to (the fan-in logic cone of) a handful of candidate flip-flops (18 flip-flops on average for a design with ~1 million flip-flops) and also obtain the corresponding bug trace. The area impact of E-QED is ~2.5%. In contrast, determining this same information might take weeks (or even months) of mostly manual work using traditional approaches.

arXiv.org e-Print Archive

Crossref

Improved Architectures for Secure Intra-process Isolation

Author: Connor Richard J, III
Publication venue: TRACE: Tennessee Research and Creative Exchange
Publication date: 01/08/2021

Intra-process memory isolation can improve security by enforcing least-privilege at a finer granularity than traditional operating system controls without the context-switch overhead associated with inter-process communication. Because the process has traditionally been a fundamental security boundary, assigning different levels of trust to components within a process is a fundamental change in secure systems design. However, so far there has been little research on the challenges of securely implementing intra-process isolation on top of existing operating system abstractions. We find that frequently-used assumptions in secure system design do not precisely hold under realistic conditions, and that these discrepancies lead to exploitable vulnerabilities. We evaluate two recently-proposed memory isolation systems and show that both are vulnerable to the same generic attacks that break their security model. We then extend a subset of these attacks by applying them to a fully-precise model of control-flow integrity, demonstrating a data-only attack that bypasses both static and dynamic control-flow integrity enforcement by overwriting executable code in-memory even under typical w^x assumptions. From these two results, we propose a set of kernel modifications called Xlock that systemically addresses weaknesses in memory permissions enforcement on Linux, bringing them into line with w^x assumptions. Finally, we present modifications to intra-process isolation systems that preserve efficient userspace component transitions while drastically reducing risk of accidental kernel mismanagement by modeling intra-process components as separate processes from the kernel's perspective. Taken together, these mitigations represent a more robust architecture for efficient and secure intra-process isolation.

University of Tennessee, Knoxville: Trace

A scalable parallel finite element framework for growing geometries. Application to metal additive manufacturing

Author: Ayachit U
Burstedde C
Carslaw HS
Cole KD
Ern A
Kaufman L
Kergaßner A
Lindgren LE
Mozaffar M
Schroeder WJ
Wohlers Associates Inc
Publication venue: 'Wiley'
Publication date: 01/01/2019

This work introduces an innovative parallel, fully-distributed finite element framework for growing geometries and its application to metal additive manufacturing. It is well-known that virtual part design and qualification in additive manufacturing requires highly-accurate multiscale and multiphysics analyses. Only high performance computing tools are able to handle such complexity in time frames compatible with time-to-market. However, efficiency, without loss of accuracy, has rarely held the centre stage in the numerical community. Here, in contrast, the framework is designed to adequately exploit the resources of high-end distributed-memory machines. It is grounded on three building blocks: (1) hierarchical adaptive mesh refinement with octree-based meshes; (2) a parallel strategy to model the growth of the geometry; (3) state-of-the-art parallel iterative linear solvers. Computational experiments consider the heat transfer analysis at the part scale of the printing process by powder-bed technologies. After verification against a 3D benchmark, a strong-scaling analysis assesses performance and identifies major sources of parallel overhead. A third numerical example examines the efficiency and robustness of (2) in a curved 3D shape. Unprecedented parallelism and scalability were achieved in this work. Hence, this framework helps to take on higher complexity and/or accuracy, not only in part-scale simulations of metal or polymer additive manufacturing, but also in welding, sedimentation, atherosclerosis, or any other physical problem where the physical domain of interest grows in time.

arXiv.org e-Print Archive

LAReferencia - Red Federada de Repositorios Institucionales de Publicaciones Científicas Latinoamericanas

Crossref

UPCommons. Portal del coneixement obert de la UPC

Scipedia

An integrated framework to support remote IEEE 1149.1 /1149.4 design for test experiments

Author: António M. Cardoso
José M. M. Ferreira
Manuel G. O. Gericota
Publication date: 01/01/2006

Remote experiments for academic purposes can only achieve their educational goals if an appropriate framework provides a basic set of features, namely remote laboratory management, collaborative learning tools, and content management and delivery. This paper presents a framework developed to support remote experiments in a design for test class offered to final year students in the Electrical and Computer Engineering degree at the University of Porto. The proposed solution combines a test language command interpreter and various virtual instruments (VIs) with a demonstration board that comprises a boundary-scan IEEE 1149.1 / 1149.4 test infrastructure. The experiments are presented as embedded learning objects, with no distinction from other e-learning contents (e.g. lessons, lecture notes, etc.).

Directory of Open Access Journals

Repositório Aberto da Universidade do Porto

Online-Journals.org (International Association of Online Engineering)

Comprehensive analysis of high-performance computing methods for filtered back-projection

Author: Boulanger Pierre
Eliuk Steven
Mendl Christian B.
Noga Michelle
Publication venue: 'Universitat Autonoma de Barcelona'
Publication date: 01/01/2013

This paper provides an extensive runtime, accuracy, and noise analysis of Computed Tomography (CT) reconstruction algorithms using various High-Performance Computing (HPC) frameworks such as "conventional" multi-core, multi-threaded CPUs, Compute Unified Device Architecture (CUDA), and DirectX or OpenGL graphics pipeline programming. The proposed algorithms exploit various built-in hardwired features of GPUs such as rasterization and texture filtering. We compare implementations of the Filtered Back-Projection (FBP) algorithm with fan-beam geometry for all frameworks. The accuracy of the reconstruction is validated using an ACR-accredited phantom, with the raw attenuation data acquired by a clinical CT scanner. Our analysis shows that a single GPU can run an FBP reconstruction 23 times faster than a 64-core multi-threaded CPU machine for an image of 1024 × 1024. Moreover, directly programming the graphics pipeline using DirectX or OpenGL can further increase the performance compared to a CUDA implementation.

Crossref

LAReferencia - Red Federada de Repositorios Institucionales de Publicaciones Científicas Latinoamericanas

Directory of Open Access Journals

Revistes Catalanes amb Accés Obert

Electronic Letters on Computer Vision and Image Analysis (ELCVIA - Universitat Autònoma de Barcelona)

Diposit Digital de Documents de la UAB