348 research outputs found

    Simultaneous Multithreading Applied to Real Time (Artifact)

    Get PDF
    Existing models used in real-time scheduling are inadequate to take advantage of simultaneous multithreading (SMT), which has been shown to improve performance in many areas of computing, but has seen little application to real-time systems. The SMART task model, which allows for combining SMT and real time by accounting for the variable task execution costs caused by SMT, is introduced, along with methods and conditions for scheduling SMT tasks under global earliest-deadline-first scheduling. The benefits of using SMT are demonstrated through a large-scale schedulability study in which we show that task systems with utilizations 30% larger than what would be schedulable without SMT can be correctly scheduled. This artifact includes benchmark experiments used to compare execution times with and without SMT and code to duplicate the reported schedulability experiments

    Simultaneous Multithreading and Hard Real Time: Can it be Safe? (Artifact)

    Get PDF

    Simultaneous Multithreading Applied to Real Time

    Get PDF
    Existing models used in real-time scheduling are inadequate to take advantage of simultaneous multithreading (SMT), which has been shown to improve performance in many areas of computing, but has seen little application to real-time systems. The SMART task model, which allows for combining SMT and real time by accounting for the variable task execution costs caused by SMT, is introduced, along with methods and conditions for scheduling SMT tasks under global earliest-deadline-first scheduling. The benefits of using SMT are demonstrated through a large-scale schedulability study in which we show that task systems with utilizations 30% larger than what would be schedulable without SMT can be correctly scheduled

    Simultaneous Multithreading and Hard Real Time: Can It Be Safe?

    Get PDF
    The applicability of Simultaneous Multithreading (SMT) to real-time systems has been hampered by the difficulty of obtaining reliable execution costs in an SMT-enabled system. This problem is addressed by introducing a scheduling framework, called CERT-MT, that combines scheduling-aware timing analysis with a cyclic-executive scheduler in a way that minimizes SMT-related timing variations. The proposed scheduling-aware timing analysis is based on maximum observed execution times and accounts for the uncertainty inherent in measurement-based timing analysis. The timing analysis is found to work for tasks with and without SMT, though some adjustments are required in the former case. A large-scale schedulability study is presented that shows CERT-MT can schedule systems with total utilizations approaching 1.4 times the core count, without sacrificing safety

    Spatio-temporal wavelet regularization for parallel MRI reconstruction: application to functional MRI

    Get PDF
    Parallel MRI is a fast imaging technique that enables the acquisition of highly resolved images in space or/and in time. The performance of parallel imaging strongly depends on the reconstruction algorithm, which can proceed either in the original k-space (GRAPPA, SMASH) or in the image domain (SENSE-like methods). To improve the performance of the widely used SENSE algorithm, 2D- or slice-specific regularization in the wavelet domain has been deeply investigated. In this paper, we extend this approach using 3D-wavelet representations in order to handle all slices together and address reconstruction artifacts which propagate across adjacent slices. The gain induced by such extension (3D-Unconstrained Wavelet Regularized -SENSE: 3D-UWR-SENSE) is validated on anatomical image reconstruction where no temporal acquisition is considered. Another important extension accounts for temporal correlations that exist between successive scans in functional MRI (fMRI). In addition to the case of 2D+t acquisition schemes addressed by some other methods like kt-FOCUSS, our approach allows us to deal with 3D+t acquisition schemes which are widely used in neuroimaging. The resulting 3D-UWR-SENSE and 4D-UWR-SENSE reconstruction schemes are fully unsupervised in the sense that all regularization parameters are estimated in the maximum likelihood sense on a reference scan. The gain induced by such extensions is illustrated on both anatomical and functional image reconstruction, and also measured in terms of statistical sensitivity for the 4D-UWR-SENSE approach during a fast event-related fMRI protocol. Our 4D-UWR-SENSE algorithm outperforms the SENSE reconstruction at the subject and group levels (15 subjects) for different contrasts of interest (eg, motor or computation tasks) and using different parallel acceleration factors (R=2 and R=4) on 2x2x3mm3 EPI images.Comment: arXiv admin note: substantial text overlap with arXiv:1103.353

    Automated Experiments for Deriving Performance-relevant Properties of Software Execution Environments

    Get PDF
    The execution environment can play a crucial role when analyzing the performance of a software system. However, detecting execution environment properties and integrating such properties into performance analyses is a manual, error-prone task. In this thesis, a novel approach for detecting performance-relevant properties of the software execution environment is presented. These properties are automatically detected using predefined experiments and integrated into performance prediction tools

    DSPSR: Digital Signal Processing Software for Pulsar Astronomy

    Full text link
    DSPSR is a high-performance, open-source, object-oriented, digital signal processing software library and application suite for use in radio pulsar astronomy. Written primarily in C++, the library implements an extensive range of modular algorithms that can optionally exploit both multiple-core processors and general-purpose graphics processing units. After over a decade of research and development, DSPSR is now stable and in widespread use in the community. This paper presents a detailed description of its functionality, justification of major design decisions, analysis of phase-coherent dispersion removal algorithms, and demonstration of performance on some contemporary microprocessor architectures.Comment: 15 pages, 10 figures, to be published in PAS

    Design and validation of a simultaneous multi-threaded DLX processor

    Get PDF
    technical reportModern day computer systems rely on two forms of parallelism to achieve high performance, parallelism between individual instructions of a program (ILP) and parallelism between individual threads (TLP). Superscalar processors exploit ILP by issuing several instructions per clock, and multiprocessors (MP) exploit TLP by running different threads in parallel on different processors. A fundamental imitation of these approaches to exploit parallelism is that processor resources are statically partitioned. If TLP is low, processors in a MP system will be idle, and if ILP is low, issue slots in a superscalar processor will be wasted. As a consequence, the hardware cannot adapt to changing levels of ILP and TLP and resource utilization tend to be low. Since resource utilization is low there is potential to achieve higher performance if somehow useful instructions could be found to fill up the wasted issue slots. This paper explores a method called simultaneous multithreading (SMT) that addresses the utilization problem by letting multiple threads compete for the resources of a single processor each clock cycle thus increasing the potential ILP available

    An application of parallel computation to Collaborative Optimization

    Get PDF
    Multidisciplinary Design Optimization (MDO) has evolved as a discipline which provides a body of methods and techniques to assist engineers in solving large scale design problems. There are many frameworks for formulating MDO problems. These frameworks can be broadly classified as single-level or bi-level formulations. Collaborative Optimization (CO) is one of the popular bi-level formulations to solve an MDO problem. There are numerous design optimization problems which are highly CPU time intensive and require a long simulation time. With the advent of cheaper and faster available PC’s, distributed parallel computer clusters have become very popular. These clusters provide large computing power and can be used to solve problems faster and more efficiently. This research is an attempt to take advantage of the computational power of parallel computers in the field of design Optimization. The robust design optimization of an Internal Combustion Engine has been formulated using CO and implemented using parallel computers. Considerable savings in Wall Time has been achieved. A generic strategy for solving similar problems has also been devised. A benchmarking program has also been developed to assess theoretical speedup for any problem size. This program uses the Collaborative Optimization framework and simulates a design optimization on distributed memory clusters
    • …
    corecore