Performance Analysis of a Novel GPU Computation-to-core Mapping Scheme for Robust Facet Image Modeling
Though the GPGPU concept is well known in image processing, much work remains to fully exploit GPUs as an alternative computation engine. This paper investigates computation-to-core mapping strategies to probe the efficiency and scalability of the robust facet image modeling algorithm on GPUs. Our fine-grained computation-to-core mapping scheme shows a significant performance gain over the standard pixel-wise mapping scheme. Through in-depth performance comparisons across the two mapping schemes, we analyze the impact of the level of parallelism on GPU computation and suggest two principles for optimizing future image processing applications on the GPU platform.
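The abstract contrasts the two mapping granularities without spelling out the kernel structure, so the toy Python sketch below is only a hedged illustration of what "level of parallelism" means here, assuming a hypothetical K x K facet window: pixel-wise mapping assigns one thread per output pixel, while a fine-grained scheme assigns one thread per (pixel, window element) and combines partial results with a reduction.

```python
H, W, K = 512, 512, 5  # hypothetical image size and K x K facet window

# Pixel-wise mapping: one "thread" per output pixel; each thread loops
# over its whole K*K window, so parallelism is limited to H*W threads.
pixel_threads = H * W

# Fine-grained mapping: one "thread" per (pixel, window element); the
# per-window partial results are then combined by a reduction step,
# exposing K*K times more parallelism at the cost of that reduction.
fine_threads = H * W * K * K

print(pixel_threads, "coarse threads vs", fine_threads, "fine threads")
```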
Packing Sporadic Real-Time Tasks on Identical Multiprocessor Systems
In real-time systems, in addition to functional correctness, recurrent tasks must fulfill timing constraints to ensure the correct behavior of the
system. Partitioned scheduling is widely used in real-time systems, i.e., the
tasks are statically assigned onto processors while ensuring that all timing
constraints are met. The decision version of the problem, which is to check
whether the deadline constraints of tasks can be satisfied on a given number of
identical processors, is known to be NP-complete in the strong sense.
Several studies on this problem are based on approximations involving resource
augmentation, i.e., speeding up individual processors. This paper studies
another type of resource augmentation by allocating additional processors, a
topic that has not been explored until recently. We provide polynomial-time
algorithms and analysis, in which the approximation factors are dependent upon
the input instances. Specifically, the factors are related to the maximum ratio
of the period to the relative deadline of a task in the given task set. We also
show that these algorithms unfortunately cannot achieve a constant
approximation factor for general cases. Furthermore, we prove that the problem
does not admit any asymptotic polynomial-time approximation scheme (APTAS)
unless P = NP when the task set has constrained deadlines, i.e., the relative deadline of a task is no more than the period of the task.
Comment: Accepted and to appear in ISAAC 2018, Yi-Lan, Taiwan.
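For readers unfamiliar with the setting, the following Python sketch illustrates partitioned scheduling with a first-fit assignment. It is not the paper's algorithm: the schedulability check uses a simple sufficient density test (total C/min(D, T) per processor at most 1) instead of the paper's demand-bound analysis, and all names (Task, first_fit_partition) are illustrative.

```python
from dataclasses import dataclass

@dataclass
class Task:
    wcet: float      # C: worst-case execution time
    deadline: float  # D: relative deadline
    period: float    # T: minimum inter-arrival time

def density(task: Task) -> float:
    # Sufficient uniprocessor EDF test: a task set fits on one
    # processor if the total density C / min(D, T) is at most 1.
    return task.wcet / min(task.deadline, task.period)

def first_fit_partition(tasks: list[Task], m: int) -> list[list[Task]] | None:
    """Statically assign tasks to m identical processors (first-fit).

    Returns the partition, or None if some task fits on no processor
    under the density test -- the point where one would either reject
    the instance or augment resources by allocating more processors.
    """
    processors: list[list[Task]] = [[] for _ in range(m)]
    load = [0.0] * m
    # Sorting by decreasing density is a common packing heuristic.
    for task in sorted(tasks, key=density, reverse=True):
        for p in range(m):
            if load[p] + density(task) <= 1.0:
                processors[p].append(task)
                load[p] += density(task)
                break
        else:
            return None  # infeasible under this (sufficient) test
    return processors

# Example with constrained deadlines (D <= T), as in the APTAS result.
tasks = [Task(2, 5, 10), Task(3, 6, 6), Task(1, 4, 8), Task(4, 9, 12)]
print(first_fit_partition(tasks, m=2))
```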
A bibliography on parallel and vector numerical algorithms
This is a bibliography of numerical methods. It also includes a number of other references on machine architecture, programming languages, and other topics of interest to scientific computing. Certain conference proceedings and anthologies which have been published in book form are also listed.
Parallel processing and expert systems
Whether it be monitoring the thermal subsystem of Space Station Freedom, or controlling the navigation of the autonomous rover on Mars, NASA missions in the 1990s cannot enjoy an increased level of autonomy without the efficient implementation of expert systems. Merely increasing the computational speed of uniprocessors may not be able to guarantee that real-time demands are met for larger systems. Speedup via parallel processing must be pursued alongside the optimization of sequential implementations. Prototypes of parallel expert systems have been built at universities and industrial laboratories in the U.S. and Japan. The state-of-the-art research in progress related to parallel execution of expert systems is surveyed. The survey discusses multiprocessors for expert systems, parallel languages for symbolic computations, and mapping expert systems to multiprocessors. Results to date indicate that the parallelism achieved for these systems is small. The main reasons are (1) the body of knowledge applicable in any given situation and the amount of computation executed by each rule firing are small, (2) dividing the problem-solving process into relatively independent partitions is difficult, and (3) implementation decisions that enable expert systems to be incrementally refined hamper compile-time optimization. In order to obtain greater speedups, data parallelism and application parallelism must be exploited.
A Survey of Techniques For Improving Energy Efficiency in Embedded Computing Systems
Recent technological advances have greatly improved the performance and
features of embedded systems. With the number of mobile devices alone now approaching the population of Earth, embedded systems have become truly ubiquitous. These trends, however, have also made the task of managing their power consumption extremely challenging. In recent years, several techniques have been proposed to address this issue. In this paper, we survey the techniques for managing the power consumption of embedded systems. We discuss the need for power management and provide a classification of the techniques along several important parameters to highlight their similarities and differences. This paper is intended to help researchers and application developers gain insight into the working of power management techniques and design even more efficient high-performance embedded systems of tomorrow.
Techniques for the realization of ultrareliable spaceborne computers: interim scientific report
Error-free ultrareliable spaceborne computer
On the periodic behavior of real-time schedulers on identical multiprocessor platforms
This paper proposes a general periodicity result concerning any deterministic and memoryless scheduling algorithm (including non-work-conserving algorithms), for any context, on identical multiprocessor platforms. By context we mean the hardware architecture (uniprocessor, multicore) as well as task constraints such as critical sections, precedence constraints, self-suspension, etc. Since the result is based only on the releases and deadlines, it is independent of any other parameter. Note that we do not claim that the given interval is minimal, but it is an upper bound on any cycle of any feasible schedule provided by any deterministic and memoryless scheduler.
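As a hedged illustration of the underlying intuition: for the special case of synchronously released periodic tasks, a deterministic and memoryless scheduler sees the same ready-state once the release pattern repeats, so the hyperperiod (the lcm of the periods) bounds the cycle length. The paper's interval for general contexts (offsets, sporadic behavior, extra constraints) is not this simple formula.

```python
from math import lcm

def hyperperiod(periods: list[int]) -> int:
    # lcm of all periods: for synchronously released periodic tasks,
    # the release pattern (and hence the decisions of a deterministic,
    # memoryless scheduler) repeats with this period.
    return lcm(*periods)

print(hyperperiod([4, 6, 10]))  # -> 60, a candidate cycle length
```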
Very Large-Scale Neighborhoods with Performance Guarantees for Minimizing Makespan on Parallel Machines
We study the problem of minimizing the makespan on m parallel machines. We introduce a very large-scale neighborhood of exponential size (in the number of machines) that is based on a matching in a complete graph. The idea is to partition the jobs assigned to the same machine into two sets. This partitioning is done for every machine with some chosen rule, yielding 2m parts. A new assignment is obtained by assigning exactly two parts to every machine. The neighborhood N_split consists of all possible rearrangements of the parts over the machines. The best assignment in N_split can be calculated in time O(m log m) by determining the perfect matching with minimum maximal edge weight in an improvement graph, where the vertices correspond to parts and the edge weights correspond to the sums of the processing times of the jobs belonging to the parts. Additionally, we examine local optima in this neighborhood and in combinations with other neighborhoods. We derive performance guarantees for these local optima.
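The min-max matching step admits a simple realization here because every edge weight is the sum of its two endpoint loads: after sorting the part loads, pairing the largest with the smallest minimizes the maximum pair sum, which matches the O(m log m) claim. The Python sketch below, with an illustrative splitting rule (the paper allows any chosen rule), walks through one N_split best-move evaluation; the function names and example instance are assumptions.

```python
def split_parts(machines: list[list[int]]) -> list[int]:
    """Split each machine's jobs into two parts (an LPT-style rule,
    chosen only for illustration) and return the 2m part loads."""
    parts = []
    for jobs in machines:
        a, b = 0, 0
        for p in sorted(jobs, reverse=True):
            # Always add the next-largest job to the lighter part.
            if a <= b:
                a += p
            else:
                b += p
        parts.extend([a, b])
    return parts

def best_split_makespan(part_loads: list[int]) -> int:
    """Makespan of the best assignment in N_split: pair the 2m parts,
    two per machine. Since an edge's weight is just the sum of its two
    part loads, pairing the largest with the smallest (after sorting)
    gives the perfect matching with minimum maximal edge weight."""
    loads = sorted(part_loads)
    return max(loads[i] + loads[-1 - i] for i in range(len(loads) // 2))

machines = [[8, 7], [2, 1], [6, 3]]  # current loads 15, 3, 9 -> makespan 15
parts = split_parts(machines)
print("parts:", parts, "best makespan in N_split:", best_split_makespan(parts))
# -> parts [8, 7, 2, 1, 6, 3] re-pair to makespan 9
```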