7,034 research outputs found

    Evaluation of Directive-Based GPU Programming Models on a Block Eigensolver with Consideration of Large Sparse Matrices

    Achieving high performance and performance portability for large-scale scientific applications is a major challenge on heterogeneous computing systems such as many-core CPUs and accelerators like GPUs. In this work, we implement a widely used block eigensolver, Locally Optimal Block Preconditioned Conjugate Gradient (LOBPCG), using two popular directive-based programming models (OpenMP and OpenACC) for GPU-accelerated systems. Our work differs from existing work in that it adopts a holistic approach that optimizes the full solver performance rather than narrowing the problem into small kernels (e.g., SpMM, SpMV). Our LOBPCG GPU implementation achieves a 2.8×–4.3× speedup over an optimized CPU implementation when tested with four different input matrices. The evaluated configuration compared one Skylake CPU to one Skylake CPU and one NVIDIA V100 GPU. Our OpenMP and OpenACC LOBPCG GPU implementations gave nearly identical performance. We also consider how to create an efficient LOBPCG solver that can solve problems larger than GPU memory capacity. To this end, we create microbenchmarks representing the two dominant kernels (inner product and SpMM kernel) in LOBPCG and then evaluate performance when using two different programming approaches: tiling the kernels, and using Unified Memory with the original kernels. Our tiled SpMM implementation achieves a 2.9× and 48.2× speedup over the Unified Memory implementation on supercomputers with PCIe Gen3 and NVLink 2.0 CPU to GPU interconnects, respectively.
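    The tiling idea mentioned in the abstract can be illustrated with a minimal CPU-side sketch. It assumes a CSR matrix stored as plain Python lists and a dense block X stored as a list of rows; the function names (`spmm_csr_rows`, `tiled_spmm`) and the `tile_rows` parameter are hypothetical and do not come from the paper, whose OpenMP/OpenACC GPU code is not shown in the abstract. Each row tile stands in for the chunk that would be transferred to and processed on the GPU at one time.

```python
def spmm_csr_rows(indptr, indices, data, X, row_start, row_end):
    """Multiply rows [row_start, row_end) of a CSR matrix by the dense block X."""
    ncols = len(X[0])
    out = []
    for i in range(row_start, row_end):
        acc = [0.0] * ncols
        # Nonzeros of row i live in data[indptr[i]:indptr[i + 1]].
        for k in range(indptr[i], indptr[i + 1]):
            a, j = data[k], indices[k]
            for c in range(ncols):
                acc[c] += a * X[j][c]
        out.append(acc)
    return out


def tiled_spmm(indptr, indices, data, X, tile_rows):
    """Compute Y = A @ X one row tile at a time (illustrative only)."""
    if tile_rows <= 0:
        raise ValueError("tile_rows must be positive")
    nrows = len(indptr) - 1
    Y = []
    for start in range(0, nrows, tile_rows):
        end = min(start + tile_rows, nrows)
        # In a GPU setting, only this tile of A would need to be resident
        # in device memory while it is being processed.
        Y.extend(spmm_csr_rows(indptr, indices, data, X, start, end))
    return Y


# A = [[1, 0, 2], [0, 3, 0], [4, 0, 5]] in CSR form.
indptr = [0, 2, 3, 5]
indices = [0, 2, 1, 0, 2]
data = [1.0, 2.0, 3.0, 4.0, 5.0]
X = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
Y = tiled_spmm(indptr, indices, data, X, tile_rows=2)
```

    The result does not depend on the tile size; only the amount of the matrix that must be held at once changes, which is the property a tiled GPU kernel would rely on when the matrix exceeds device memory.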

    A formal verification framework and associated tools for enterprise modeling : application to UEML

    The aim of this paper is to propose and apply a verification and validation approach to Enterprise Modeling that enables the user to improve the relevance, correctness, suitability, and coherence of a model by using property specification and formal proof of properties.

    Castell: a heterogeneous cmp architecture scalable to hundreds of processors

    Technology improvements and power constraints have led multicore architectures to dominate microprocessor designs over uniprocessors. At the same time, accelerator-based architectures have shown that heterogeneous multicores are very efficient and can provide high throughput for parallel applications, but at the cost of a high programming effort. We propose Castell, a scalable chip multiprocessor architecture that can be programmed like a uniprocessor and provides the high throughput of accelerator-based architectures. Castell relies on task-based programming models that simplify software development. These models use a runtime system that dynamically finds, schedules, and adds hardware-specific features to parallel tasks. One of these features is the use of DMA transfers to overlap computation and data movement, which is known as double buffering. This feature allows applications on Castell to tolerate large memory latencies and lets us design the memory system focusing on memory bandwidth. In addition to providing programmability and designing the memory system, we have used a hierarchical NoC and added a synchronization module. The NoC design distributes memory traffic efficiently to allow the architecture to scale. The synchronization module was added because applications suffer large performance degradation when synchronization latencies are large. Castell is mainly an architecture framework that enables the definition of domain-specific implementations, fine-tuned to a particular problem or application. So far, Castell has been successfully used to propose heterogeneous multicore architectures for scientific kernels, video decoding (using H.264), and protein sequence alignment (using Smith-Waterman and ClustalW). It has also been used to explore a number of architecture optimizations such as enhanced DMA controllers, and architecture support for task-based programming models.

    The complexity of asynchronous model based testing

    This is the post-print version of the final paper published in Theoretical Computer Science. The published article is available from the link below. Changes resulting from the publishing process, such as peer review, editing, corrections, structural formatting, and other quality control mechanisms may not be reflected in this document. Changes may have been made to this work since it was submitted for publication. Copyright © 2012 Elsevier B.V. In model based testing (MBT), testing is based on a model M that typically is expressed using a state-based language such as an input output transition system (IOTS). Most approaches to MBT assume that communications between the system under test (SUT) and its environment are synchronous. However, many systems interact with their environment through asynchronous channels and the presence of such channels changes the nature of testing. In this paper we investigate the situation in which the SUT interacts with its environment through asynchronous channels and the problems of producing test cases to reach a state, execute a transition, or to distinguish two states. In addition, we investigate the Oracle Problem. All four problems are explored for both FIFO and non-FIFO channels. It is known that the Oracle Problem can be solved in polynomial time for FIFO channels, but we also show that the three test case generation problems can be solved in polynomial time in the case where the IOTS is observable, whereas the general test generation problems are EXPTIME-hard. For non-FIFO channels we prove that all of the test case generation problems are EXPTIME-hard and the Oracle Problem is NP-hard, even if we restrict attention to deterministic IOTSs.
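    As a point of reference for the "reach a state" problem, the following minimal sketch searches a small transition system for a shortest trace to a target state. It deliberately ignores the asynchronous channels that are the subject of the paper, so it illustrates only the simpler synchronous baseline. The representation (a dict from state to a list of `(label, next_state)` pairs, with `?` marking inputs and `!` marking outputs) and the function name `shortest_trace_to` are assumptions made for illustration, not the paper's notation or algorithm.

```python
from collections import deque


def shortest_trace_to(transitions, initial, target):
    """Breadth-first search for a shortest labelled trace from initial to target.

    Returns a list of labels, or None if target is unreachable.
    """
    parent = {initial: None}  # state -> (previous_state, label) or None
    queue = deque([initial])
    while queue:
        state = queue.popleft()
        if state == target:
            trace = []
            # Walk the parent links back to the initial state.
            while parent[state] is not None:
                prev, label = parent[state]
                trace.append(label)
                state = prev
            return trace[::-1]
        for label, nxt in transitions.get(state, []):
            if nxt not in parent:
                parent[nxt] = (state, label)
                queue.append(nxt)
    return None


# Hypothetical example system: "?a"/"?b" are inputs, "!x"/"!y" are outputs.
iots = {
    "s0": [("?a", "s1"), ("?b", "s2")],
    "s1": [("!x", "s3")],
    "s2": [("!y", "s2")],
    "s3": [],
}
trace = shortest_trace_to(iots, "s0", "s3")
```

    A trace found this way only shows that the state is reachable in the graph. Because outputs are chosen by the SUT, and because asynchronous channels delay or reorder what the tester observes, turning such a trace into a test case is the harder problem the abstract refers to.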

    Aggregation of Descriptive Regularization Methods with Hardware/Software Co-Design for Remote Sensing Imaging

    This study considers the problem of high-resolution imaging of the remote sensing (RS) environment, formalized in terms of a nonlinear ill-posed inverse problem of nonparametric estimation of the power spatial spectrum pattern (SSP) of the wavefield scattered from an extended remotely sensed scene (referred to as the scene image). However, the remote sensing techniques for reconstructive imaging in many RS application areas are poorly suited to (near) real-time implementation. In this work, we address a new aggregated descriptive-regularization (DR) method and the Hardware/Software (HW/SW) co-design for the SSP reconstruction from the uncertain speckle-corrupted measurement data in a computationally efficient parallel fashion that meets the (near) real-time image processing requirements. The hardware design is performed via efficient systolic arrays (SAs). Finally, the efficiency of the aggregated descriptive-regularized method and the HW/SW co-design, both in resolution enhancement and in computational complexity reduction, is presented via numerical simulations and by the performance analysis of the implementation based on a Xilinx Field Programmable Gate Array (FPGA) XC4VSX35-10ff668. Universidad de Guadalajara, Universidad Autónoma de Yucatán, Instituto Tecnológico de Mérid

    Towards the use of sequence diagrams as a learning aid

    Paper presented at the "Sixth Program Visualization Workshop", Darmstadt (Germany), June 2011. Compared to imperative programming, object-oriented programming brings additional complexities. These complexities are especially challenging for the novice and, as a consequence, for the teacher. Hence, it is no surprise that the teaching and learning of object-oriented programming is an extremely popular topic in computer science education research. This work-in-progress paper presents the objectives and structure of a tool under development for novice object-oriented programmers that intends to ease code understanding. That is accomplished through the use of sequence diagrams, one of the most popular behavior diagrams in the Unified Modeling Language (OMG, 2011), the de facto standard for object-oriented modelling. More specifically, the tool allows the generation of execution traces as sequence diagrams: for a given program run, the student is able to visualize the respective execution as a sequence diagram. Next, we present the Java2Sequence tool.

    Autonomic Computing

    Autonomic computing (AC) has as its vision the creation of self-managing systems to address today’s concerns of complexity and total cost of ownership while meeting tomorrow’s needs for pervasive and ubiquitous computation and communication. This paper reports on the latest autonomic systems research and technologies to influence the industry; it looks behind AC, summarising what it is, the current state-of-the-art research, related work and initiatives, highlights research and technology transfer issues and concludes with further and recommended reading.