5,509 research outputs found

    Performance Analysis of a Novel GPU Computation-to-core Mapping Scheme for Robust Facet Image Modeling

    Get PDF
    Though the GPGPU concept is well-known in image processing, much more work remains to be done to fully exploit GPUs as an alternative computation engine. This paper investigates the computation-to-core mapping strategies to probe the efficiency and scalability of the robust facet image modeling algorithm on GPUs. Our fine-grained computation-to-core mapping scheme shows a significant performance gain over the standard pixel-wise mapping scheme. With in-depth performance comparisons across the two different mapping schemes, we analyze the impact of the level of parallelism on the GPU computation and suggest two principles for optimizing future image processing applications on the GPU platform

    Packing Sporadic Real-Time Tasks on Identical Multiprocessor Systems

    Get PDF
    In real-time systems, in addition to the functional correctness recurrent tasks must fulfill timing constraints to ensure the correct behavior of the system. Partitioned scheduling is widely used in real-time systems, i.e., the tasks are statically assigned onto processors while ensuring that all timing constraints are met. The decision version of the problem, which is to check whether the deadline constraints of tasks can be satisfied on a given number of identical processors, has been known NP{\cal NP}-complete in the strong sense. Several studies on this problem are based on approximations involving resource augmentation, i.e., speeding up individual processors. This paper studies another type of resource augmentation by allocating additional processors, a topic that has not been explored until recently. We provide polynomial-time algorithms and analysis, in which the approximation factors are dependent upon the input instances. Specifically, the factors are related to the maximum ratio of the period to the relative deadline of a task in the given task set. We also show that these algorithms unfortunately cannot achieve a constant approximation factor for general cases. Furthermore, we prove that the problem does not admit any asymptotic polynomial-time approximation scheme (APTAS) unless P=NP{\cal P}={\cal NP} when the task set has constrained deadlines, i.e., the relative deadline of a task is no more than the period of the task.Comment: Accepted and to appear in ISAAC 2018, Yi-Lan, Taiwa

    A bibliography on parallel and vector numerical algorithms

    Get PDF
    This is a bibliography of numerical methods. It also includes a number of other references on machine architecture, programming language, and other topics of interest to scientific computing. Certain conference proceedings and anthologies which have been published in book form are listed also

    Parallel processing and expert systems

    Get PDF
    Whether it be monitoring the thermal subsystem of Space Station Freedom, or controlling the navigation of the autonomous rover on Mars, NASA missions in the 1990s cannot enjoy an increased level of autonomy without the efficient implementation of expert systems. Merely increasing the computational speed of uniprocessors may not be able to guarantee that real-time demands are met for larger systems. Speedup via parallel processing must be pursued alongside the optimization of sequential implementations. Prototypes of parallel expert systems have been built at universities and industrial laboratories in the U.S. and Japan. The state-of-the-art research in progress related to parallel execution of expert systems is surveyed. The survey discusses multiprocessors for expert systems, parallel languages for symbolic computations, and mapping expert systems to multiprocessors. Results to date indicate that the parallelism achieved for these systems is small. The main reasons are (1) the body of knowledge applicable in any given situation and the amount of computation executed by each rule firing are small, (2) dividing the problem solving process into relatively independent partitions is difficult, and (3) implementation decisions that enable expert systems to be incrementally refined hamper compile-time optimization. In order to obtain greater speedups, data parallelism and application parallelism must be exploited

    A Survey of Techniques For Improving Energy Efficiency in Embedded Computing Systems

    Full text link
    Recent technological advances have greatly improved the performance and features of embedded systems. With the number of just mobile devices now reaching nearly equal to the population of earth, embedded systems have truly become ubiquitous. These trends, however, have also made the task of managing their power consumption extremely challenging. In recent years, several techniques have been proposed to address this issue. In this paper, we survey the techniques for managing power consumption of embedded systems. We discuss the need of power management and provide a classification of the techniques on several important parameters to highlight their similarities and differences. This paper is intended to help the researchers and application-developers in gaining insights into the working of power management techniques and designing even more efficient high-performance embedded systems of tomorrow

    On the periodic behavior of real-time schedulers on identical multiprocessor platforms

    Full text link
    This paper is proposing a general periodicity result concerning any deterministic and memoryless scheduling algorithm (including non-work-conserving algorithms), for any context, on identical multiprocessor platforms. By context we mean the hardware architecture (uniprocessor, multicore), as well as task constraints like critical sections, precedence constraints, self-suspension, etc. Since the result is based only on the releases and deadlines, it is independent from any other parameter. Note that we do not claim that the given interval is minimal, but it is an upper bound for any cycle of any feasible schedule provided by any deterministic and memoryless scheduler

    Very Large-Scale Neighborhoods with Performance Guarantees for Minimizing Makespan on Parallel Machines

    Get PDF
    We study the problem of minimizing the makespan on m parallel machines. We introduce a very large-scale neighborhood of exponential size (in the number of machines) that is based on a matching in a complete graph. The idea is to partition the jobs assigned to the same machine into two sets. This partitioning is done for every machine with some chosen rule to receive 2m parts. A new assignment is received by putting to every machine exactly two parts. The neighborhood Nsplit consists of all possible rearrangements of the parts to the machines. The best assignment of Nsplit can be calculated in time O(mlogm) by determining the perfect matching having minimum maximal edge weight in an improvement graph, where the vertices correspond to parts and the weights on the edges correspond to the sum of the processing times of the jobs belonging to the parts. Additionally, we examine local optima in this neighborhood and in combinations with other neighborhoods. We derive performance guarantees for these local optima

    Parallel processing and expert systems

    Get PDF
    Whether it be monitoring the thermal subsystem of Space Station Freedom, or controlling the navigation of the autonomous rover on Mars, NASA missions in the 90's cannot enjoy an increased level of autonomy without the efficient use of expert systems. Merely increasing the computational speed of uniprocessors may not be able to guarantee that real time demands are met for large expert systems. Speed-up via parallel processing must be pursued alongside the optimization of sequential implementations. Prototypes of parallel expert systems have been built at universities and industrial labs in the U.S. and Japan. The state-of-the-art research in progress related to parallel execution of expert systems was surveyed. The survey is divided into three major sections: (1) multiprocessors for parallel expert systems; (2) parallel languages for symbolic computations; and (3) measurements of parallelism of expert system. Results to date indicate that the parallelism achieved for these systems is small. In order to obtain greater speed-ups, data parallelism and application parallelism must be exploited
    corecore