
    Optimizing compilation with preservation of structural code coverage metrics to support software testing

    Get PDF
    Code-coverage-based testing is a widely used testing strategy that aims to provide a meaningful decision criterion for the adequacy of a test suite. Code-coverage-based testing is also mandated for the development of safety-critical applications; for example, the DO-178B document requires the application of modified condition/decision coverage. One critical issue of code-coverage-based testing is that structural code-coverage criteria are typically applied to the source code, whereas the generated machine code may have a different code structure because of the code optimizations performed by the compiler. In this work, we present the automatic calculation of coverage profiles describing which structural code-coverage criteria are preserved by which code optimizations, independently of the concrete test suite. These coverage profiles make it easy to extend a compiler so that it preserves any given code-coverage criterion, by enabling only those code optimizations that preserve it. Furthermore, we describe the integration of these coverage profiles into the GCC compiler. With these coverage profiles, we answer the question of how much code optimization is possible without compromising the error-detection likelihood of a given test suite. Experimental results show that the performance cost of preserving structural code coverage in GCC is rather low.
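
    The central mechanism can be pictured as a table that maps each optimization pass to the structural coverage criteria it is known to preserve; the compiler then enables only the passes compatible with the criterion the tester requires. Below is a minimal C++ sketch of that selection step; the pass names, the criteria enum, and the profile contents are invented for illustration and are not the profiles computed in the paper.

        #include <iostream>
        #include <map>
        #include <set>
        #include <string>
        #include <vector>

        // Structural coverage criteria considered (illustrative subset).
        enum class Criterion { Statement, Decision, MCDC };

        // A coverage profile: for each optimization pass, the set of criteria whose
        // preservation is guaranteed. The contents used below are hypothetical.
        using CoverageProfile = std::map<std::string, std::set<Criterion>>;

        // Return the passes that may be enabled without breaking `required`.
        std::vector<std::string> passes_preserving(const CoverageProfile& profile,
                                                   Criterion required) {
            std::vector<std::string> enabled;
            for (const auto& [pass, preserved] : profile)
                if (preserved.count(required))
                    enabled.push_back(pass);
            return enabled;
        }

        int main() {
            CoverageProfile profile = {
                {"constant-folding", {Criterion::Statement, Criterion::Decision, Criterion::MCDC}},
                {"dead-code-elim",   {Criterion::Statement}},
                {"branch-merging",   {}},  // hypothetical pass preserving none of the criteria
            };
            for (const auto& pass : passes_preserving(profile, Criterion::Decision))
                std::cout << pass << '\n';  // only passes safe for decision coverage
        }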

    pocl: A Performance-Portable OpenCL Implementation

    Get PDF
    OpenCL is a standard for parallel programming of heterogeneous systems. The benefits of a common programming standard are clear; multiple vendors can provide support for application descriptions written according to the standard, thus reducing the program porting effort. While the standard brings the obvious benefits of platform portability, the performance-portability aspects are largely left to the programmer. The situation is made worse by multiple proprietary vendor implementations with different characteristics and, thus, different required optimization strategies. In this paper, we propose an OpenCL implementation that is both portable and performance portable. At its core is a kernel compiler that can be used to exploit the data parallelism of OpenCL programs on multiple platforms with different parallel hardware styles. The kernel compiler is modularized to perform target-independent parallel region formation separately from the target-specific parallel mapping of the regions, to enable support for various styles of fine-grained parallel resources such as subword SIMD extensions, SIMD datapaths, and static multi-issue. Unlike previous similar techniques that work at the source level, the parallel region formation retains the information about the data parallelism using the LLVM IR and its metadata infrastructure. This data can be exploited by later generic compiler passes for efficient parallelization. The proposed open-source implementation of OpenCL is also platform portable, enabling OpenCL on a wide range of architectures, both already commercialized and still under research. The paper describes how the portability of the implementation is achieved. Our results show that most of the benchmarked applications, when compiled using pocl, were faster than or close to as fast as the best proprietary OpenCL implementation for the platform at hand. (Comment: This article was published in 2015; it is now openly accessible via arXiv.)
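
    The core transformation can be made concrete with a toy example. pocl forms parallel regions between barriers and wraps each region in loops over the work-item ids, which later passes can vectorize or map onto hardware threads. The C++ sketch below mimics that result for a kernel with a single barrier; it is a conceptual illustration only, not pocl's actual LLVM-level implementation.

        #include <cstddef>
        #include <vector>

        // Toy "kernel": each work-item scales its element, all work-items
        // synchronize at a barrier, then each item adds its left neighbour's
        // scaled value. In OpenCL C this would be a single kernel body
        // containing barrier(CLK_LOCAL_MEM_FENCE).
        //
        // Conceptual result of parallel region formation: the body is split at
        // the barrier into two regions, each wrapped in a loop over the local
        // work-item ids ("work-item loops").
        void kernel_after_region_formation(std::vector<float>& data,
                                           std::vector<float>& tmp,
                                           std::size_t local_size) {
            // Region 1: everything before the barrier, for every work-item.
            for (std::size_t lid = 0; lid < local_size; ++lid)
                tmp[lid] = 2.0f * data[lid];

            // The barrier is now implicit: region 1 completes before region 2 starts.

            // Region 2: everything after the barrier, for every work-item.
            for (std::size_t lid = 0; lid < local_size; ++lid)
                data[lid] = tmp[lid] + (lid > 0 ? tmp[lid - 1] : 0.0f);
        }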

    Evolutionary Algorithms for Multiobjective Optimization

    Get PDF
    Many real-world problems involve two types of problem difficulty: i) multiple, conflicting objectives and ii) a highly complex search space. On the one hand, instead of a single optimal solution, competing goals give rise to a set of compromise solutions, generally denoted as Pareto-optimal. In the absence of preference information, none of the corresponding trade-offs can be said to be better than the others. On the other hand, the search space can be too large and too complex to be handled by exact methods. Thus, efficient optimization strategies are required that are able to deal with both difficulties. Evolutionary algorithms possess several characteristics that are desirable for this kind of problem and make them preferable to classical optimization methods. In fact, various evolutionary approaches to multiobjective optimization have been proposed since 1985, capable of searching for multiple Pareto-optimal solutions concurrently in a single simulation run. However, in spite of this variety, there is a lack of extensive comparative studies in the literature. Therefore, it has so far remained open which of these approaches is best suited to which kind of problem.
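
    To make the notion of Pareto-optimality concrete (a definitional sketch, not code from this work): assuming all objectives are to be minimized, a solution dominates another if it is no worse in every objective and strictly better in at least one, and the non-dominated members of a population form its current approximation of the Pareto-optimal set.

        #include <cstddef>
        #include <vector>

        // Objective vectors; every objective is minimized in this sketch.
        using Objectives = std::vector<double>;

        // True if a dominates b: no worse in every objective, strictly better in one.
        bool dominates(const Objectives& a, const Objectives& b) {
            bool strictly_better = false;
            for (std::size_t i = 0; i < a.size(); ++i) {
                if (a[i] > b[i]) return false;
                if (a[i] < b[i]) strictly_better = true;
            }
            return strictly_better;
        }

        // Extract the non-dominated (Pareto-optimal) members of a population.
        std::vector<Objectives> non_dominated(const std::vector<Objectives>& pop) {
            std::vector<Objectives> front;
            for (const auto& cand : pop) {
                bool is_dominated = false;
                for (const auto& other : pop)
                    if (dominates(other, cand)) { is_dominated = true; break; }
                if (!is_dominated) front.push_back(cand);
            }
            return front;
        }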

    The SeaHorn Verification Framework

    Get PDF
    In this paper, we present SeaHorn, a software verification framework. The key distinguishing feature of SeaHorn is its modular design, which separates the concerns of the syntax of the programming language, its operational semantics, and the verification semantics. SeaHorn encompasses several novelties: it (a) encodes verification conditions using an efficient yet precise inter-procedural technique, (b) provides flexibility in the verification semantics to allow different levels of precision, (c) leverages the state of the art in software model checking and abstract interpretation for verification, and (d) uses Horn clauses as an intermediate language to represent verification conditions, which simplifies interfacing with multiple verification tools based on Horn clauses. SeaHorn provides users with a powerful verification tool and researchers with an extensible and customizable framework for experimenting with new software verification techniques. The effectiveness and scalability of SeaHorn are demonstrated by an extensive experimental evaluation using benchmarks from SV-COMP 2015 and real avionics code.
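
    To make the Horn-clause encoding concrete, here is a standard textbook-style example (not taken from the paper): for the loop "x = 0; y = 0; while (*) { x++; y++; } assert(x == y);", the verification condition asks for an unknown loop invariant Inv satisfying three clauses, and a Horn solver can discharge all three with the interpretation Inv(x, y) := (x = y).

        \begin{align*}
        x = 0 \land y = 0                 \;&\rightarrow\; \mathit{Inv}(x, y)            && \text{(initiation)}\\
        \mathit{Inv}(x, y)                \;&\rightarrow\; \mathit{Inv}(x + 1,\, y + 1)  && \text{(consecution)}\\
        \mathit{Inv}(x, y) \land x \neq y \;&\rightarrow\; \mathit{false}                && \text{(safety)}
        \end{align*}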

    Managing Overheads in Asynchronous Many-Task Runtime Systems

    Get PDF
    Asynchronous Many-Task (AMT) runtime systems are based on the idea of dividing an algorithm into small units of work, known as tasks. The runtime system is then responsible for scheduling and executing these tasks in an efficient manner, taking into account the resources provided to it and the data dependencies between the tasks. One of the primary challenges faced by AMTs is managing such fine-grained parallelism and the overheads associated with creating, scheduling, and executing tasks. This work develops methodologies for assessing and managing the overheads associated with fine-grained task execution in HPX, our exemplar Asynchronous Many-Task runtime system. Known optimization techniques, viz. active message coalescing, task inlining, and parallel loop iteration chunking, are applied to HPX. Active message coalescing, where messages bound for the same destination are aggregated into a single message, is presented as a solution to minimize the overheads associated with fine-grained communications. Methodologies and metrics for analyzing fine-grained communication overheads are developed. The metrics identified and implemented in this research aid in evaluating network efficiency by giving an intrinsic view of the underlying network overhead that would be difficult to measure using conventional methods. Task inlining, a method that allows runtime systems to manage the overheads introduced by a large number of tasks by merging tasks into one thread of execution, is presented as a technique for minimizing fine-grained task overheads. A runtime policy that dynamically decides whether to inline a task is developed and evaluated on different processor architectures. A methodology to derive a largely machine-independent constant that allows controlling task granularity is developed. Finally, the machine-independent constant derived in the context of task inlining is applied to the chunking of parallel loop iterations, which confirms its applicability to reducing overheads, in the context of finding the optimal chunk size for the combined loop iterations.
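
    The task-inlining policy can be illustrated independently of HPX's internal API. In the hedged C++ sketch below, a task whose estimated work falls under a granularity threshold is executed on the calling thread instead of paying the cost of creating and scheduling a new task; the threshold constant is an arbitrary placeholder standing in for the machine-independent constant derived in this work, and plain std::async stands in for HPX's task machinery.

        #include <future>
        #include <utility>

        // Granularity threshold in estimated work units (e.g. loop iterations).
        // Placeholder value; not the constant derived in the thesis.
        constexpr long kInlineThreshold = 1000;

        // Run `task` asynchronously unless it is so small that task-creation and
        // scheduling overheads would outweigh the useful work; in that case run
        // it inline on the calling thread and return an already-ready future.
        template <typename F>
        std::future<void> spawn_or_inline(long estimated_work, F&& task) {
            if (estimated_work < kInlineThreshold) {
                std::forward<F>(task)();          // inline execution
                std::promise<void> done;
                done.set_value();
                return done.get_future();
            }
            return std::async(std::launch::async, std::forward<F>(task));
        }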

    Implementation and Optimization of Real-Time H.264 Baseline Encoder on TMS320DM642 DSP

    Get PDF
    Thesis (M.Sc.) -- İstanbul Technical University, Institute of Science and Technology, 2007. Digital video coding has become essential in many applications such as digital surveillance systems, video conferencing, mobile applications, and video broadcasting. H.264/MPEG-4 Part 10, an international video compression standard, was developed to improve coding efficiency compared to previous standards. However, this coding improvement comes with an increase in coding complexity. In this thesis, an H.264 baseline profile encoder is implemented on the Texas Instruments TMS320DM642 digital signal processor. The real-time implementation of the H.264/AVC encoder on the DM642 DSP core offers most of the standard H.264/AVC baseline profile coding tools, except for the error-resiliency tools and quarter-pel motion estimation. Instead of quarter-pel motion compensation, integer- and half-pixel motion estimation and compensation are implemented for all luminance and chrominance components. The target platform, the DM642 DSP core, is designed as a high-performance digital media processor with a two-level memory/cache hierarchy and a VLIW architecture. The subject of the thesis is the realization and optimization of the H.264 baseline encoder system on this target platform. The optimization phases, covering algorithmic, architectural, and memory strategies, are described in detail. The H.264/AVC encoder system is verified to execute both on the development workstation and on the DM642 EVM (Evaluation Module) hardware platform. Briefly, an uncompressed YUV input video sequence at CIF resolution is compressed into the H.264 Annex-B file format and the video output is displayed on screen. Additionally, the encoder output is checked against the H.264 reference software and its compliance is verified.
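
    As a generic illustration of the integer-pixel motion estimation mentioned above (a plain full-search formulation, not the optimized DSP code from the thesis): for each candidate displacement in a search window, the encoder computes the sum of absolute differences (SAD) between the current 16x16 block and the displaced block in the reference frame, and keeps the candidate with the smallest SAD.

        #include <climits>
        #include <cstdint>
        #include <cstdlib>

        // SAD between the 16x16 block of the current luma plane at (bx, by) and
        // the reference plane displaced by (dx, dy). Planes are row-major with
        // the given line width.
        int sad16x16(const std::uint8_t* cur, const std::uint8_t* ref,
                     int width, int bx, int by, int dx, int dy) {
            int sad = 0;
            for (int y = 0; y < 16; ++y)
                for (int x = 0; x < 16; ++x)
                    sad += std::abs(int(cur[(by + y) * width + bx + x]) -
                                    int(ref[(by + y + dy) * width + bx + x + dx]));
            return sad;
        }

        // Full search over a square window of +/- range integer-pel positions;
        // returns the displacement with minimum SAD. Frame-boundary clamping is
        // omitted for brevity.
        void full_search(const std::uint8_t* cur, const std::uint8_t* ref,
                         int width, int bx, int by, int range,
                         int& best_dx, int& best_dy) {
            int best = INT_MAX;
            for (int dy = -range; dy <= range; ++dy)
                for (int dx = -range; dx <= range; ++dx) {
                    int sad = sad16x16(cur, ref, width, bx, by, dx, dy);
                    if (sad < best) { best = sad; best_dx = dx; best_dy = dy; }
                }
        }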

    A Static Analyzer for Large Safety-Critical Software

    Get PDF
    We show that abstract interpretation-based static program analysis can be made efficient and precise enough to formally verify a class of properties for a family of large programs, with few or no false alarms. This is achieved by refinement of a general-purpose static analyzer and later adaptation to particular programs of the family by the end-user through parametrization. This is applied to the proof of soundness of data manipulation operations at the machine level for periodic synchronous safety-critical embedded software. The main novelties are the design principle of static analyzers by refinement and adaptation through parametrization; the symbolic manipulation of expressions to improve the precision of abstract transfer functions; the octagon, ellipsoid, and decision tree abstract domains, all with sound handling of rounding errors in floating-point computations; widening strategies (with thresholds, delayed); and the automatic determination of the parameters (parametrized packing).
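
    One of the listed ingredients, widening with thresholds, can be sketched on the simplest numerical domain, intervals; this is a deliberately simplified C++ illustration, not the analyzer's implementation (whose octagon, ellipsoid, and decision-tree domains and sound treatment of floating-point rounding are far richer). When a bound keeps growing across loop iterations, widening pushes it to the next threshold from a fixed, ascending list rather than directly to infinity, which preserves more precision.

        #include <algorithm>
        #include <limits>
        #include <vector>

        // A (possibly unbounded) interval [lo, hi].
        struct Interval { double lo, hi; };

        // Least upper bound of two intervals.
        Interval join(Interval a, Interval b) {
            return { std::min(a.lo, b.lo), std::max(a.hi, b.hi) };
        }

        // Widening with thresholds: a bound that grew is pushed to the next
        // enclosing threshold (thresholds sorted in ascending order), and to
        // +/- infinity only beyond the last one, instead of jumping straight
        // to infinity as plain widening would.
        Interval widen(Interval old_itv, Interval new_itv,
                       const std::vector<double>& thresholds) {
            Interval r = old_itv;
            if (new_itv.lo < old_itv.lo) {
                r.lo = -std::numeric_limits<double>::infinity();
                for (auto it = thresholds.rbegin(); it != thresholds.rend(); ++it)
                    if (*it <= new_itv.lo) { r.lo = *it; break; }
            }
            if (new_itv.hi > old_itv.hi) {
                r.hi = std::numeric_limits<double>::infinity();
                for (double t : thresholds)
                    if (t >= new_itv.hi) { r.hi = t; break; }
            }
            return r;
        }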

    Petascale computations for Large-scale Atomic and Molecular collisions

    Full text link
    Petaflop architectures are currently being utilized efficiently to perform large-scale computations in Atomic, Molecular and Optical collisions. We solve the Schrödinger or Dirac equation for the appropriate collision problem using the R-matrix or R-matrix with pseudo-states approach. We briefly outline the parallel methodology used and implemented for the current suite of Breit-Pauli and DARC codes. Various examples are shown of our theoretical results compared with those obtained from synchrotron radiation facilities and from satellite observations. We also indicate future directions and the implementation of the R-matrix codes on emerging GPU architectures. (Comment: 14 pages, 5 figures, 3 tables; chapter in: Workshop on Sustained Simulated Performance 2013, published by Springer, 2014, edited by Michael Resch, Yevgeniya Kovalenko, Eric Focht, Wolfgang Bez and Hiroaki Kobayashi.)
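
    For readers unfamiliar with the method, the R-matrix relates the channel functions to their derivatives on the boundary r = a between the inner and outer regions, and is built from the surface amplitudes w_{ik}(a) and eigenenergies E_k of the inner-region Hamiltonian. Quoted here in its standard textbook form as background, not as a result of this paper:

        R_{ij}(E) = \frac{1}{2a} \sum_k \frac{w_{ik}(a)\, w_{jk}(a)}{E_k - E}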