Simultaneous Multithreading Applied to Real Time (Artifact)
Existing models used in real-time scheduling are inadequate to take advantage of simultaneous multithreading (SMT), which has been shown to improve performance in many areas of computing, but has seen little application to real-time systems. The SMART task model, which allows for combining SMT and real time by accounting for the variable task execution costs caused by SMT, is introduced, along with methods and conditions for scheduling SMT tasks under global earliest-deadline-first scheduling. The benefits of using SMT are demonstrated through a large-scale schedulability study in which we show that task systems with utilizations 30% larger than what would be schedulable without SMT can be correctly scheduled. This artifact includes benchmark experiments used to compare execution times with and without SMT and code to duplicate the reported schedulability experiments.
Simultaneous Multithreading Applied to Real Time
Existing models used in real-time scheduling are inadequate to take advantage of simultaneous multithreading (SMT), which has been shown to improve performance in many areas of computing, but has seen little application to real-time systems. The SMART task model, which allows for combining SMT and real time by accounting for the variable task execution costs caused by SMT, is introduced, along with methods and conditions for scheduling SMT tasks under global earliest-deadline-first scheduling. The benefits of using SMT are demonstrated through a large-scale schedulability study in which we show that task systems with utilizations 30% larger than what would be schedulable without SMT can be correctly scheduled.
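For context, the classic non-SMT baseline for global EDF can be sketched as a utilization test. The example below implements the well-known sufficient GFB bound (Goossens, Funk, and Baruah) for implicit-deadline sporadic tasks on m identical processors; it is not the SMART model or its SMT-aware conditions, and the function name `gfb_schedulable` and the `(C, T)` tuple layout are illustrative assumptions.

```python
from fractions import Fraction

def gfb_schedulable(tasks, m):
    """Sufficient GFB test for global EDF (not the SMART conditions).

    tasks: non-empty list of (C, T) integer pairs, i.e. worst-case
           execution time and period, with implicit deadlines (D = T).
    m:     number of identical processors.
    """
    if not tasks:
        raise ValueError("task set must be non-empty")
    utils = [Fraction(c, t) for c, t in tasks]  # exact per-task utilizations
    if any(u > 1 for u in utils):
        return False  # a task that cannot fit on one processor
    total = sum(utils)
    u_max = max(utils)
    # GFB bound: U_sum <= m - (m - 1) * U_max
    return total <= m - (m - 1) * u_max
```

Because the test is only sufficient, a task set that fails it is not thereby proven unschedulable under global EDF.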
Simultaneous Multithreading and Hard Real Time: Can It Be Safe?
The applicability of Simultaneous Multithreading (SMT) to real-time systems has been hampered by the difficulty of obtaining reliable execution costs in an SMT-enabled system. This problem is addressed by introducing a scheduling framework, called CERT-MT, that combines scheduling-aware timing analysis with a cyclic-executive scheduler in a way that minimizes SMT-related timing variations. The proposed scheduling-aware timing analysis is based on maximum observed execution times and accounts for the uncertainty inherent in measurement-based timing analysis. The timing analysis is found to work for tasks with and without SMT, though some adjustments are required for tasks with SMT. A large-scale schedulability study is presented that shows CERT-MT can schedule systems with total utilizations approaching 1.4 times the core count, without sacrificing safety.
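The combination of measurement-based timing and a cyclic executive can be illustrated, very loosely, with the textbook frame-size conditions for cyclic executives. The sketch below assumes each task's cost bound is its maximum observed execution time (MOET) scaled by a hypothetical safety margin and rounded up; the margin value, the `(moet, period, deadline)` tuple layout, and the function names are illustrative assumptions, not CERT-MT's actual analysis.

```python
import math
from fractions import Fraction

def inflate(moet, margin):
    # Hypothetical measurement-based bound: scale the MOET by a margin, round up.
    return math.ceil(moet * margin)

def valid_frame_sizes(tasks, margin=Fraction(6, 5)):
    """Frame sizes satisfying the classic cyclic-executive conditions.

    tasks: list of (moet, period, deadline) in integer time units.
    margin: hypothetical inflation factor applied to each MOET.
    """
    costs = [inflate(e, margin) for e, _, _ in tasks]
    periods = [p for _, p, _ in tasks]
    hyper = math.lcm(*periods)  # hyperperiod (Python 3.9+)
    frames = []
    for f in range(1, hyper + 1):
        if f < max(costs):
            continue  # (1) every job must fit in a single frame
        if hyper % f:
            continue  # (2) the frame size must divide the hyperperiod
        # (3) a full frame must lie between each job's release and deadline
        if all(2 * f - math.gcd(p, f) <= d for _, p, d in tasks):
            frames.append(f)
    return frames
```

For the task set `[(1, 4, 4), (1, 5, 5), (1, 20, 20), (2, 20, 20)]`, a margin of 1 yields the single frame size 2, while the default 6/5 margin leaves no valid frame size, showing how pessimism added to measured costs can make a cyclic schedule infeasible.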
Spatio-temporal wavelet regularization for parallel MRI reconstruction: application to functional MRI
Parallel MRI is a fast imaging technique that enables the acquisition of highly resolved images in space and/or time. The performance of parallel imaging strongly depends on the reconstruction algorithm, which can operate either in the original k-space (GRAPPA, SMASH) or in the image domain (SENSE-like methods). To improve the performance of the widely used SENSE algorithm, 2D or slice-specific regularization in the wavelet domain has been extensively investigated. In this paper, we extend this approach using 3D wavelet representations in order to handle all slices together and address reconstruction artifacts that propagate across adjacent slices. The gain induced by this extension (3D-Unconstrained Wavelet Regularized SENSE: 3D-UWR-SENSE) is validated on anatomical image reconstruction, where no temporal acquisition is considered. Another important extension accounts for temporal correlations that exist between successive scans in functional MRI (fMRI). In addition to the 2D+t acquisition schemes addressed by other methods such as kt-FOCUSS, our approach can handle the 3D+t acquisition schemes that are widely used in neuroimaging. The resulting 3D-UWR-SENSE and 4D-UWR-SENSE reconstruction schemes are fully unsupervised in the sense that all regularization parameters are estimated in the maximum likelihood sense on a reference scan. The gain induced by these extensions is illustrated on both anatomical and functional image reconstruction, and is also measured in terms of statistical sensitivity for the 4D-UWR-SENSE approach during a fast event-related fMRI protocol. Our 4D-UWR-SENSE algorithm outperforms SENSE reconstruction at both the subject and group levels (15 subjects) for different contrasts of interest (e.g., motor or computation tasks) and different parallel acceleration factors (R = 2 and R = 4) on 2×2×3 mm³ EPI images.
Comment: arXiv admin note: substantial text overlap with arXiv:1103.353
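The image-domain unfolding that SENSE-like methods perform can be sketched for an acceleration factor R = 2 along one axis. This is plain, unregularized SENSE solved by per-pixel least squares with NumPy; it omits the wavelet-domain regularization that defines UWR-SENSE, and the array shapes and the function name `sense_unfold_r2` are assumptions for illustration.

```python
import numpy as np

def sense_unfold_r2(aliased, sens):
    """Unregularized SENSE unfolding for R = 2 along one image axis.

    aliased: (C, N // 2) complex coil images folded by 2x undersampling.
    sens:    (C, N) complex coil sensitivity profiles along that axis.
    Returns the (N,) unfolded image estimate.
    """
    _, half = aliased.shape
    image = np.zeros(2 * half, dtype=complex)
    for y in range(half):
        # Pixel y of each folded coil image superimposes true pixels y and y + N/2.
        enc = sens[:, [y, y + half]]  # (C, 2) encoding matrix for this pixel pair
        rho, *_ = np.linalg.lstsq(enc, aliased[:, y], rcond=None)
        image[[y, y + half]] = rho
    return image

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n_coils, n = 4, 8
    sens = rng.normal(size=(n_coils, n)) + 1j * rng.normal(size=(n_coils, n))
    truth = rng.normal(size=n)
    half = n // 2
    # Simulate folding: each coil sees the sensitivity-weighted sum of both halves.
    aliased = sens[:, :half] * truth[:half] + sens[:, half:] * truth[half:]
    print(np.allclose(sense_unfold_r2(aliased, sens), truth))
```

With noise-free data and more coils than folded pixels, the least-squares solve recovers the image exactly; with real data, the ill-conditioning of this per-pixel system at higher R is what motivates adding regularization.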
Automated Experiments for Deriving Performance-relevant Properties of Software Execution Environments
The execution environment can play a crucial role when analyzing the performance of a software system. However, detecting execution environment properties and integrating such properties into performance analyses is a manual, error-prone task. In this thesis, a novel approach for detecting performance-relevant properties of the software execution environment is presented. These properties are automatically detected using predefined experiments and integrated into performance prediction tools
DSPSR: Digital Signal Processing Software for Pulsar Astronomy
DSPSR is a high-performance, open-source, object-oriented, digital signal processing software library and application suite for use in radio pulsar astronomy. Written primarily in C++, the library implements an extensive range of modular algorithms that can optionally exploit both multiple-core processors and general-purpose graphics processing units. After more than a decade of research and development, DSPSR is now stable and in widespread use in the community. This paper presents a detailed description of its functionality, justification of major design decisions, analysis of phase-coherent dispersion removal algorithms, and demonstration of performance on some contemporary microprocessor architectures.
Comment: 15 pages, 10 figures, to be published in PAS
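The cost of dispersion removal can be estimated from the standard cold-plasma dispersion delay, Δt ≈ 4.148808 × 10³ s · DM · (f_lo⁻² − f_hi⁻²), with frequencies in MHz and DM in pc cm⁻³. The sketch below computes that delay across a band and the number of critically sampled complex samples it spans; in phase-coherent dedispersion the convolution kernel, and hence the FFT, must be longer than this span. The function names and the example band are hypothetical and are not part of the DSPSR API.

```python
import math

K_DM = 4.148808e3  # dispersion constant, s MHz^2 pc^-1 cm^3

def dispersion_delay(dm, f_lo_mhz, f_hi_mhz):
    """Delay (s) of f_lo relative to f_hi for dispersion measure dm (pc cm^-3)."""
    return K_DM * dm * (f_lo_mhz ** -2 - f_hi_mhz ** -2)

def smearing_samples(dm, center_mhz, bandwidth_mhz):
    """Complex samples spanned by the dispersion sweep across the whole band."""
    delay = dispersion_delay(dm,
                             center_mhz - bandwidth_mhz / 2,
                             center_mhz + bandwidth_mhz / 2)
    # Critically sampled complex baseband: sample rate equals the bandwidth.
    sample_interval = 1.0 / (bandwidth_mhz * 1e6)
    return math.ceil(delay / sample_interval)
```

For DM = 100 pc cm⁻³ and a 64 MHz band centered at 1400 MHz, the sweep lasts roughly 19 ms, or on the order of 1.2 million complex samples, which illustrates why efficient long FFTs and multi-core or GPU execution matter for coherent dedispersion.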
Design and validation of a simultaneous multi-threaded DLX processor
Technical report
Modern-day computer systems rely on two forms of parallelism to achieve high performance: parallelism between individual instructions of a program (instruction-level parallelism, ILP) and parallelism between individual threads (thread-level parallelism, TLP). Superscalar processors exploit ILP by issuing several instructions per clock, and multiprocessors (MPs) exploit TLP by running different threads in parallel on different processors. A fundamental limitation of these approaches is that processor resources are statically partitioned. If TLP is low, processors in an MP system will be idle, and if ILP is low, issue slots in a superscalar processor will be wasted. As a consequence, the hardware cannot adapt to changing levels of ILP and TLP, and resource utilization tends to be low. Because resource utilization is low, there is potential to achieve higher performance if useful instructions can be found to fill the wasted issue slots. This paper explores a method called simultaneous multithreading (SMT) that addresses the utilization problem by letting multiple threads compete for the resources of a single processor each clock cycle, thus increasing the potential ILP available.
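The issue-slot argument can be made concrete with a toy model. The sketch assumes a hypothetical per-cycle trace of ready instructions for each thread and ignores dependencies, fetch limits, and contention for other shared resources; it only counts how many of the `width` issue slots are filled when a single thread issues versus when all threads compete for the slots every cycle. The function name and trace format are illustrative assumptions.

```python
def slot_utilization(trace, width, smt):
    """Fraction of issue slots filled over a trace (toy model).

    trace: list of tuples, one per cycle, giving each thread's count of
           ready instructions that cycle (hypothetical input).
    width: issue width of the processor.
    smt:   if True, all threads compete for the slots every cycle;
           otherwise only thread 0 may issue, as in a plain superscalar.
    """
    used = 0
    for ready in trace:
        if smt:
            used += min(width, sum(ready))  # slots shared by all threads
        else:
            used += min(width, ready[0])    # only one thread's ILP is usable
    return used / (width * len(trace))
```

For the trace `[(1, 2), (0, 3), (2, 2), (4, 1)]` on a 4-wide machine, the single-thread model fills 7 of 16 slots (0.4375), while the SMT model fills 14 of 16 (0.875).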
An application of parallel computation to Collaborative Optimization
Multidisciplinary Design Optimization (MDO) has evolved as a discipline that provides a body of methods and techniques to assist engineers in solving large-scale design problems. There are many frameworks for formulating MDO problems. These frameworks can be broadly classified as single-level or bi-level formulations. Collaborative Optimization (CO) is one of the popular bi-level formulations for solving an MDO problem. Numerous design optimization problems are highly CPU-time-intensive and require long simulation times. With the advent of cheaper and faster PCs, distributed parallel computer clusters have become very popular. These clusters provide large computing power and can be used to solve problems faster and more efficiently. This research is an attempt to take advantage of the computational power of parallel computers in the field of design optimization. The robust design optimization of an internal combustion engine has been formulated using CO and implemented on parallel computers. Considerable savings in wall-clock time have been achieved. A generic strategy for solving similar problems has also been devised. A benchmarking program has also been developed to assess the theoretical speedup for any problem size. This program uses the Collaborative Optimization framework and simulates a design optimization on distributed-memory clusters.
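The kind of theoretical speedup estimate described above can be sketched with a simple model. It assumes, hypothetically, that each CO iteration runs a serial system-level step of duration `t_system` followed by `n_subproblems` equal-cost discipline subproblems of duration `t_sub` distributed over `processors` nodes, with communication overhead ignored; this is an illustration, not the thesis's benchmarking program.

```python
import math

def co_speedup(n_subproblems, processors, t_system, t_sub):
    """Per-iteration speedup of a hypothetical parallel CO iteration.

    Serial:   system step, then every subproblem one after another.
    Parallel: system step, then subproblems in waves of `processors`.
    Communication and load imbalance are ignored (assumption).
    """
    serial = t_system + n_subproblems * t_sub
    waves = math.ceil(n_subproblems / processors)  # rounds of parallel subproblems
    parallel = t_system + waves * t_sub
    return serial / parallel
```

With 4 subproblems, `t_system = 1`, and `t_sub = 10`, the model predicts a speedup of 41/11 ≈ 3.73 on 4 processors but only 41/21 ≈ 1.95 on 2, since the subproblems then run in two waves; the serial system-level step caps the achievable speedup in the same way as Amdahl's law.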