From a Competition for Self-Driving Miniature Cars to a Standardized Experimental Platform: Concept, Models, Architecture, and Evaluation
Context: Competitions for self-driving cars facilitated the development and
research in the domain of autonomous vehicles towards potential solutions for
the future mobility.
Objective: Miniature vehicles can bridge the gap between simulation-based
evaluations of algorithms relying on simplified models, and those
time-consuming vehicle tests on real-scale proving grounds.
Method: This article combines findings from a systematic literature review,
an in-depth analysis of results and technical concepts from contestants in a
competition for self-driving miniature cars, and experiences of participating
in the 2013 competition for self-driving cars.
Results: A simulation-based development platform for real-scale vehicles has
been adapted to support the development of a self-driving miniature car.
Furthermore, a standardized platform was designed and realized to enable
research and experiments in the context of future mobility solutions.
Conclusion: A clear separation between algorithm conceptualization and
validation in a model-based simulation environment enabled efficient and
riskless experiments and validation. The design of a reusable, low-cost, and
energy-efficient hardware architecture utilizing a standardized
software/hardware interface enables experiments, which would otherwise require
resources like a large real-scale test track.
Comment: 17 pages, 19 figures, 2 tables
Conceptual roles of data in program: analyses and applications
Program comprehension is a prerequisite for many software evolution and maintenance tasks. Current research falls short in addressing how to build tools that use domain-specific knowledge to extract information that facilitates program comprehension. Such capabilities are critical for working with large and complex programs, where comprehension is often not possible without the help of domain-specific knowledge. Our research advances the state of the art in program analysis techniques based on domain-specific knowledge. Program artifacts, including variables and methods, are carriers of domain concepts that provide the key to understanding programs. Our program analysis is directed by domain knowledge stored as domain-specific rules. The analysis is iterative and interactive. It is based on flexible inference rules and on interchangeable and extensible information storage. We designed and developed a comprehensive software environment, SeeCORE, based on our knowledge-centric analysis methodology. The SeeCORE tool provides multiple views and abstractions to assist in understanding complex programs. The case studies demonstrate the effectiveness of our method. We demonstrate the flexibility of our approach by analyzing two legacy programs in distinct domains.
Heterogeneous computing with an algorithmic skeleton framework
The Graphics Processing Unit (GPU) is present in almost every modern-day personal
computer. Despite their special-purpose design, GPUs have been increasingly used for general
computations with very good results. Hence, there is a growing effort from the community
to seamlessly integrate this kind of device in everyday computing. However, to
fully exploit the potential of a system comprising GPUs and CPUs, these devices should
be presented to the programmer as a single platform.
The efficient combination of the power of CPU and GPU devices is highly dependent
on each device's characteristics, resulting in platform-specific applications that cannot
be ported to different systems. Also, the most efficient work balance among devices is
highly dependent on the computations to be performed and the respective data sizes.
In this work, we propose a solution for heterogeneous environments based on the
abstraction level provided by algorithmic skeletons. Our goal is to take full advantage of
the power of all CPU and GPU devices present in a system, without the need for different
kernel implementations or explicit work distribution. To that end, we extended Marrow,
an algorithmic skeleton framework for multi-GPUs, to support CPU computations and
efficiently balance the workload between devices. Our approach is based on an offline
training execution that identifies the ideal work balance and platform configurations for
a given application and input data size.
The evaluation of this work shows that the combination of CPU and GPU devices can
significantly boost the performance of our benchmarks in the tested environments, when
compared to GPU-only executions.
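
The idea of deriving a work balance from an offline training execution can be sketched as follows. This is a minimal illustration, not Marrow's implementation: the function names `measure_throughput` and `split_work`, the proportional-split rule, and the device labels are all assumptions introduced here.

```python
import time


def measure_throughput(run_kernel, sample):
    """Training step (hypothetical): items per second for one device.

    `run_kernel` is any callable that processes the whole sample on
    the device being profiled.
    """
    start = time.perf_counter()
    run_kernel(sample)
    elapsed = time.perf_counter() - start
    # Guard against a zero reading on very fast runs.
    return len(sample) / max(elapsed, 1e-9)


def split_work(n_items, throughputs):
    """Split n_items across devices in proportion to measured throughput.

    Assumption: a static, proportional split. Rounding leftovers are
    handed to the fastest device so that every item is assigned.
    """
    total = sum(throughputs.values())
    shares = {dev: int(n_items * t / total) for dev, t in throughputs.items()}
    fastest = max(throughputs, key=throughputs.get)
    shares[fastest] += n_items - sum(shares.values())
    return shares


if __name__ == "__main__":
    # Made-up throughput figures standing in for training results.
    print(split_work(1000, {"cpu": 1.0, "gpu": 3.0}))
```

Because the measured throughput depends on the input size, such a training run would have to be repeated for each application and data size of interest, which matches the per-application, per-input-size configuration described in the abstract.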
Toward optimised skeletons for heterogeneous parallel architecture with performance cost model
High performance architectures are increasingly heterogeneous with shared and
distributed memory components, and accelerators like GPUs. Programming such
architectures is complicated and performance portability is a major issue as the
architectures evolve. This thesis explores the potential for algorithmic skeletons
integrating a dynamically parametrised static cost model, to deliver portable
performance for mostly regular data parallel programs on heterogeneous
architectures.
The first contribution of this thesis is to address the challenges of
programming heterogeneous architectures by providing two skeleton-based programming
libraries: HWSkel for heterogeneous multicore clusters, and GPU-HWSkel,
which enables GPUs to be exploited as general purpose multi-processor devices.
Both libraries provide heterogeneous data parallel algorithmic skeletons including
hMap, hMapAll, hReduce, hMapReduce, and hMapReduceAll.
The second contribution is the development of cost models for workload
distribution. First, we construct an architectural cost model (CM1) to optimise
overall processing time for HWSkel heterogeneous skeletons on a heterogeneous
system composed of networks of arbitrary numbers of nodes, each with an
arbitrary number of cores sharing arbitrary amounts of memory. The cost model
characterises the components of the architecture by the number of cores, clock
speed, and crucially the size of the L2 cache. Second, we extend the HWSkel cost
model (CM1) to account for GPU performance. The extended cost model (CM2)
is used in the GPU-HWSkel library to automatically find a good distribution
for both a single heterogeneous multicore/GPU node, and clusters of
heterogeneous multicore/GPU nodes. Experiments are carried out on three heterogeneous
multicore clusters, four heterogeneous multicore/GPU clusters, and three single
heterogeneous multicore/GPU nodes. The results of experimental evaluations for
four data parallel benchmarks, i.e. sumEuler, Image matching, Fibonacci, and
Matrix Multiplication, show that our combined heterogeneous skeletons and cost
models can make good use of resources in heterogeneous systems. Moreover, using
cores together with a GPU in the same host can deliver good performance either
on a single node or on multiple-node architectures.
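
To make the combination of a data parallel skeleton and a cost model concrete, here is a small sequential sketch of a map-reduce skeleton whose input is partitioned by per-node weights. It is not the HWSkel API and not the CM1 formula: `node_weight`, `partition`, and `h_map_reduce` are hypothetical names, and the weight (cores times clock speed) is a deliberate simplification that ignores the L2 cache term the abstract mentions.

```python
from functools import reduce
from operator import add


def node_weight(cores, clock_ghz):
    """Assumed relative node power: cores * clock speed.

    The thesis's cost model also uses L2 cache size; that term is
    omitted here because its form is not given in the abstract.
    """
    return cores * clock_ghz


def partition(data, weights):
    """Cut data into contiguous chunks sized in proportion to weights."""
    total = sum(weights)
    chunks, start = [], 0
    for i, w in enumerate(weights):
        if i == len(weights) - 1:
            end = len(data)  # last node takes whatever remains
        else:
            end = min(len(data), start + round(len(data) * w / total))
        chunks.append(data[start:end])
        start = end
    return chunks


def h_map_reduce(map_fn, reduce_fn, data, weights, initial):
    """Map and reduce each chunk locally, then combine the partial results.

    reduce_fn must be associative and `initial` must be its identity,
    otherwise the chunked result differs from a sequential reduction.
    """
    partials = [
        reduce(reduce_fn, map(map_fn, chunk), initial)
        for chunk in partition(data, weights)
    ]
    return reduce(reduce_fn, partials, initial)


if __name__ == "__main__":
    weights = [node_weight(4, 2.0), node_weight(2, 2.0)]
    print(h_map_reduce(lambda x: x * x, add, list(range(12)), weights, 0))
```

In a real heterogeneous setting each chunk would be dispatched to a different node or device; the list comprehension above only stands in for that dispatch.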
Improving Utility of GPU in Accelerating Industrial Applications with User-centred Automatic Code Translation
SMEs (small and medium-sized enterprises), particularly those whose business is focused on developing innovative products, are limited in many applications by a major bottleneck in computation speed. Recent developments in GPUs (graphics processing units) have markedly increased their versatility in many computational areas. However, because specialist GPU programming skills are scarce, this growth in GPU power has not been fully utilized by inexperienced users in general SME applications. Also, existing automatic CPU-to-GPU code translators are mainly designed for research purposes, with poor user interface design, and are hard to use. Little attention has been paid to the applicability, usability and learnability of these tools for normal users. In this paper, we present an online automated CPU-to-GPU source translation system (GPSME) that lets inexperienced users utilize GPU capability to accelerate general SME applications. The system designs and implements a directive programming model with a new kernel generation scheme and memory management hierarchy to optimize its performance. A web-service based interface is designed for inexperienced users to easily and flexibly invoke the automatic resource translator. Our experiments with non-expert GPU users in 4 SMEs indicate that the GPSME system can efficiently accelerate real-world applications by at least 4x and has better applicability, usability and learnability than existing automatic CPU-to-GPU source translators.
GSWO: A Programming Model for GPU-enabled Parallelization of Sliding Window Operations in Image Processing
Sliding Window Operations (SWOs) are widely used in image processing applications. They often have to be performed repeatedly across the target image, which can demand significant computing resources when processing large images with large windows. In applications in which real-time performance is essential, running these filters on a CPU often fails to deliver results within an acceptable timeframe. The emergence of sophisticated graphics processing units (GPUs) presents an opportunity to address this challenge. However, GPU programming has a steep learning curve and is error-prone for novices, so a tool that can produce a GPU implementation automatically from the original CPU source code offers an attractive means of harnessing GPU power effectively. This paper presents a GPU-enabled programming model, called GSWO, which can assist GPU novices by converting their SWO-based image processing applications from the original C/C++ source code to CUDA code in a highly automated manner. This model includes a new set of simple SWO pragmas to generate GPU kernels and to support effective GPU memory management. We have implemented this programming model based on a CPU-to-GPU translator (C2GPU). Evaluations have been performed on a number of typical SWO image filters and applications. The experimental results show that the GSWO model is capable of efficiently accelerating these applications, with improved applicability and a performance speed-up compared to several leading CPU-to-GPU source-to-source translators.
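
For readers unfamiliar with the term, a sliding window operation can be sketched as a plain CPU reference loop. This is only an illustration of the general pattern, not GSWO's pragmas or its generated CUDA code; the function names and the choice to clip the window at image borders are assumptions made here.

```python
def sliding_window(image, radius, op):
    """Apply `op` to the (2*radius+1)-square window around every pixel.

    `image` is a list of equal-length rows. Windows are clipped at the
    borders (an assumption; padding is another common choice).
    """
    height, width = len(image), len(image[0])
    out = [[0] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            window = [
                image[yy][xx]
                for yy in range(max(0, y - radius), min(height, y + radius + 1))
                for xx in range(max(0, x - radius), min(width, x + radius + 1))
            ]
            out[y][x] = op(window)
    return out


def mean(values):
    """Arithmetic mean, used here as a simple box-blur operator."""
    return sum(values) / len(values)


if __name__ == "__main__":
    image = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
    print(sliding_window(image, 1, mean))
```

Each output pixel depends only on the input image, never on other outputs, so the two outer loops can run in any order. That independence is what makes this class of filter a natural candidate for GPU execution, for example with one thread per output pixel.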