Search CORE

7,034 research outputs found

Evaluation of Directive-Based GPU Programming Models on a Block Eigensolver with Consideration of Large Sparse Matrices

Author: A Dziekonski
AV Knyazev
AV Knyazev
C Yang
G Ortega
JW Choi
M Knap
M Shao
P Maris
P Maris
X Yang
Y Wang
Publication venue: eScholarship, University of California
Publication date: 01/01/2020
Field of study

Achieving high performance and performance portability for large-scale scientific applications is a major challenge on heterogeneous computing systems such as many-core CPUs and accelerators like GPUs. In this work, we implement a widely used block eigensolver, Locally Optimal Block Preconditioned Conjugate Gradient (LOBPCG), using two popular directive based programming models (OpenMP and OpenACC) for GPU-accelerated systems. Our work differs from existing work in that it adopts a holistic approach that optimizes the full solver performance rather than narrowing the problem into small kernels (e.g., SpMM, SpMV). Our LOPBCG GPU implementation achieves a 2.8

{\times }

–4.3

{\times }

speedup over an optimized CPU implementation when tested with four different input matrices. The evaluated configuration compared one Skylake CPU to one Skylake CPU and one NVIDIA V100 GPU. Our OpenMP and OpenACC LOBPCG GPU implementations gave nearly identical performance. We also consider how to create an efficient LOBPCG solver that can solve problems larger than GPU memory capacity. To this end, we create microbenchmarks representing the two dominant kernels (inner product and SpMM kernel) in LOBPCG and then evaluate performance when using two different programming approaches: tiling the kernels, and using Unified Memory with the original kernels. Our tiled SpMM implementation achieves a 2.9

{\times }

and 48.2

{\times }

speedup over the Unified Memory implementation on supercomputers with PCIe Gen3 and NVLink 2.0 CPU to GPU interconnects, respectively

Crossref

eScholarship - University of California

A formal verification framework and associated tools for enterprise modeling : application to UEML

Author: Allen
B. Kamsu-Foguem
Bernus
Booch
Bérard
Chapurlat
Chapurlat
Chen
Chen
Chen
Chen
Chrorafas
Doumeingts
F. Prunet
Fox
Jackson
Kamsu-Foguem
Kosanke
Mann
Manna
Menzel
Molina
Nijssen
Pearl
Revelle
Scheer
Schekkerman
Sowa
Sowa
Tixier
Uschold
V. Chapurlat
Van Lamsweerde
Vernadat
Vernadat
Publication venue: 'Elsevier BV'
Publication date: 01/01/2005
Field of study

The aim of this paper is to propose and apply a verification and validation approach to Enterprise Modeling that enables the user to improve the relevance and correctness, the suitability and coherence of a model by using properties specification and formal proof of properties

Crossref

Open Archive Toulouse Archive Ouverte

HAL Descartes

Hal-Diderot

Castell: a heterogeneous cmp architecture scalable to hundreds of processors

Author: Cabarcas Jaramillo Felipe
Publication venue: Universitat Politècnica de Catalunya
Publication date: 01/01/2011
Field of study

Technology improvements and power constrains have taken multicore architectures to dominate microprocessor designs over uniprocessors. At the same time, accelerator based architectures have shown that heterogeneous multicores are very efficient and can provide high throughput for parallel applications, but with a high-programming effort. We propose Castell a scalable chip multiprocessor architecture that can be programmed as uniprocessors, and provides the high throughput of accelerator-based architectures. Castell relies on task-based programming models that simplify software development. These models use a runtime system that dynamically finds, schedules, and adds hardware-specific features to parallel tasks. One of these features is DMA transfers to overlap computation and data movement, which is known as double buffering. This feature allows applications on Castell to tolerate large memory latencies and lets us design the memory system focusing on memory bandwidth. In addition to provide programmability and the design of the memory system, we have used a hierarchical NoC and added a synchronization module. The NoC design distributes memory traffic efficiently to allow the architecture to scale. The synchronization module is a consequence of the large performance degradation of application for large synchronization latencies. Castell is mainly an architecture framework that enables the definition of domain-specific implementations, fine-tuned to a particular problem or application. So far, Castell has been successfully used to propose heterogeneous multicore architectures for scientific kernels, video decoding (using H.264), and protein sequence alignment (using Smith-Waterman and clustalW). It has also been used to explore a number of architecture optimizations such as enhanced DMA controllers, and architecture support for task-based programming models. ii

LAReferencia - Red Federada de Repositorios Institucionales de Publicaciones Científicas Latinoamericanas

UPCommons. Portal del coneixement obert de la UPC

Tesis Doctorals en Xarxa

Secretaría de Estado de Cultura

The complexity of asynchronous model based testing

Author: AboElFotoh
Aho
Aho
Barnett
Bhateja
Boreale
Boreale
Castellani
Chen
Chen
Chen
Chow
da Silva Simão
Farchi
Grieskamp
Grieskamp
Haar
Harel
Hierons
Hierons
Hierons
Hierons
Hierons
Hierons
Huo
Huo
Jard
Jard
Jefferson Offutt
Lee
Lee
Luo
Owe
Petrenko
Petrenko
Petrenko
Rabin
Reif
Robert M. Hierons
Tarjan
Verhaard
von Bochmann
Weiglhofer
Weyuker
Zhang
Publication venue: 'Elsevier BV'
Publication date: 01/09/2012
Field of study

This is the post-print version of the final paper published in Theoretical Computer Science. The published article is available from the link below. Changes resulting from the publishing process, such as peer review, editing, corrections, structural formatting, and other quality control mechanisms may not be reflected in this document. Changes may have been made to this work since it was submitted for publication. Copyright @ 2012 Elsevier B.V.In model based testing (MBT), testing is based on a model MM that typically is expressed using a state-based language such as an input output transition system (IOTS). Most approaches to MBT assume that communications between the system under test (SUT) and its environment are synchronous. However, many systems interact with their environment through asynchronous channels and the presence of such channels changes the nature of testing. In this paper we investigate the situation in which the SUT interacts with its environment through asynchronous channels and the problems of producing test cases to reach a state, execute a transition, or to distinguish two states. In addition, we investigate the Oracle Problem. All four problems are explored for both FIFO and non-FIFO channels. It is known that the Oracle Problem can be solved in polynomial time for FIFO channels but we also show that the three test case generation problems can also be solved in polynomial time in the case where the IOTS is observable but the general test generation problems are EXPTIME-hard. For non-FIFO channels we prove that all of the test case generation problems are EXPTIME-hard and the Oracle Problem in NP-hard, even if we restrict attention to deterministic IOTSs

Crossref

Elsevier - Publisher Connector

Brunel University Research Archive

Aggregation of Descriptive Regularization Methods with Hardware/Software Co-Design for Remote Sensing Imaging

Author: Castillo-Atoche Alejandro
Villalón-Turrubiates Iván E.
Vázquez-Castillo Javier
Publication venue: Nova Science Publishers, Inc.
Publication date: 01/01/2010
Field of study

This study consider the problem of high-resolution imaging of the remote sensing (RS) environment formalized in terms of a nonlinear ill- posed inverse problem of nonparametric estimation of the power spatial spectrum pattern (SSP) of the wavefield scattered from an extended remotely sensed scene (referred to as the scene image). However, the remote sensing techniques for reconstructive imaging in many RS application areas are relatively unacceptable for being implemented in a (near) real time implementation. In this work, we address a new aggregated descriptive-regularization (DR) method and the Hardware/Software (HW/SW) co-design for the SSP reconstruction from the uncertain speckle-corrupted measurement data in a computationally efficient parallel fashion that meets the (near) real time image processing requirements. The hardware design is performed via efficient systolic arrays (SAs). Finally, the efficiency both in resolution enhancement and in computational complexity reduction metrics of the aggregated descriptive-regularized and the HW/SW co-design method is presented via numerical simulations and by the performance analysis of the implementation based on a Xilinx Field Programmable Gate Array (FPGA) XC4VSX35-10ff668.Universidad de GuadalajaraUniversidad Autónoma de YucatánInstituto Tecnológico de Mérid

Repositorio Institucional del ITESO

Towards the use of sequence diagrams as a learning aid

Author: Barros João Paulo
Biscaia Luís
Vitória Miguel
Publication venue
Publication date: 01/01/2011
Field of study

Comunicação apresentada no "Sixth Program Visualization Workshop", Darmstadt (Alemanha), Junho de 2011Compared to imperative programing, object-oriented programming brings additional complexities. These complexities are especially challenging for the novice and, as a consequence to the teacher. Hence, it is no surprise that the teaching and learning of object-oriented programming is an extremely popular topic in computer science education research. This work in progress paper presents the objectives and structure of a tool under development for novice object-oriented programmers that intends to ease code understanding. That is accomplished through the use of sequence diagrams, one of the most popular behavior diagrams in the Unified Modeling Language (OMG, 2011), the de facto standard for object- oriented modelling. More specifically, the tool allows the generation of execution traces as sequence diagrams: for a given program run, the student is able to visualize the respective execution as a sequence diagram. Next, we present the Java2Sequence tool

Repositório Comum

RCAAP - Repositório Científico de Acesso Aberto de Portugal

Repositório Digital IPBeja

Autonomic Computing

Author: Sterritt R
Publication venue
Publication date: 01/01/2005
Field of study

Autonomic computing (AC) has as its vision the creation of self-managing systems to address today’s con-cerns of complexity and total cost of ownership while meeting tomorrow’s needs for pervasive and ubiquitous computation and communication. This paper reports on the latest auto-nomic systems research and technologies to influence the industry; it looks behind AC, summarising what it is, the current state-of-the-art research, related work and initiatives, highlights research and technology transfer issues and concludes with further and recommended reading

CiteSeerX

Ulster University's Research Portal