Machine learning based mapping of data and streaming parallelism to multi-cores
Multi-core processors are now ubiquitous and are widely seen as the most viable means
of delivering performance with increasing transistor densities. However, this potential
can only be realised if the application programs are suitably parallel. Applications
can either be written in parallel from scratch or converted from existing sequential
programs. Regardless of how applications are parallelised, the code must be efficiently
mapped onto the underlying platform to fully exploit the hardware’s potential.
This thesis addresses the problem of finding the best mappings of data and streaming
parallelism—two types of parallelism that exist in broad and important domains
such as scientific, signal processing and media applications. Despite significant
progress having been made over the past few decades, state-of-the-art mapping approaches
still largely rely upon hand-crafted, architecture-specific heuristics. Developing
a heuristic by hand, however, often requires months of development time. As multi-core
designs become increasingly diverse and complex, manually tuning a heuristic
for a wide range of architectures is no longer feasible. What is needed instead are innovative
techniques that can automatically scale with advances in multi-core technologies.
In this thesis two distinct areas of computer science, namely parallel compiler design
and machine learning, are brought together to develop new compiler-based mapping
techniques. Using machine learning, it is possible to automatically build high-quality
mapping schemes that adapt to evolving architectures with little human
involvement.
First, two techniques are proposed to find the best mapping of data parallelism.
The first technique predicts whether parallel execution of a data parallel candidate is
profitable on the underlying architecture. On a typical multi-core platform, it achieves
almost the same (and sometimes a better) level of performance when compared to the
manually parallelised code developed by independent experts. For a profitable candidate,
the second technique predicts how many threads should be used to execute
the candidate across different program inputs. The second technique achieves, on average,
over 96% of the maximum available performance on two different multi-core
platforms.
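The idea of replacing a hand-crafted heuristic with a learned predictor can be illustrated with a minimal sketch. The features (loop iteration count, bytes touched per iteration) and the profiled training examples below are invented for illustration; the thesis's actual models and feature sets are more elaborate.

```python
# Toy sketch: learn a mapping from simple program features to the best
# thread count, instead of hand-tuning a heuristic per architecture.
# Features and training data are hypothetical.

def predict_threads(features, training_set):
    """1-nearest-neighbour prediction of the best thread count."""
    def dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    best = min(training_set, key=lambda ex: dist(ex[0], features))
    return best[1]

# (iterations, bytes touched per iteration) -> best thread count,
# as might be gathered from profiling runs on the target machine.
training = [
    ((1_000, 8), 1),         # tiny loop: parallel overhead dominates
    ((1_000_000, 8), 4),     # large, cache-friendly loop
    ((1_000_000, 4096), 8),  # large, memory-heavy loop
]

print(predict_threads((900_000, 16), training))  # -> 4
```

Retargeting such a predictor to a new architecture only requires regenerating the training data on that machine, which is the scalability argument the thesis makes against manual heuristics.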
Next, a new approach is developed for partitioning stream applications. This approach
predicts the ideal partitioning structure for a given stream application. Based
on the prediction, a compiler can rapidly search the program space (without executing
any code) to generate a good partition. It achieves, on average, a 1.90x speedup over
the already tuned partitioning scheme of a state-of-the-art streaming compiler.
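To make the "search without executing any code" step concrete, here is a small sketch of one classical formulation: once the number of partitions k is predicted, balance the per-kernel work estimates across k contiguous pipeline stages. The cost values are invented, and this exhaustive search stands in for whatever search strategy the compiler actually uses.

```python
# Hypothetical sketch: given a predicted partition count k, cheaply search
# for a balanced assignment of pipeline kernels to k contiguous stages,
# using static per-kernel cost estimates instead of running the program.

from itertools import combinations

def best_partition(costs, k):
    """Split `costs` into k contiguous stages minimising the max stage load."""
    n = len(costs)
    best, best_load = None, float("inf")
    for cuts in combinations(range(1, n), k - 1):
        bounds = (0,) + cuts + (n,)
        load = max(sum(costs[a:b]) for a, b in zip(bounds, bounds[1:]))
        if load < best_load:
            best, best_load = bounds, load
    return best, best_load

costs = [3, 1, 4, 1, 5, 9, 2]   # invented per-kernel work estimates
print(best_partition(costs, 3))
```

The bottleneck stage bounds throughput, so minimising the maximum stage load is the natural objective for a pipeline partitioner.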
Engineering Automation for Reliable Software Interim Progress Report (10/01/2000 - 09/30/2001)
Prepared for: U.S. Army Research Office
P.O. Box 12211
Research Triangle Park, NC 27709-2211

The objective of our effort is to develop a scientific basis for producing reliable
software that is also flexible and cost effective for the DoD distributed software domain.
This objective addresses the long term goals of increasing the quality of service provided
by complex systems while reducing development risks, costs, and time. Our work focuses on
"wrap and glue" technology based on a domain specific distributed prototype model. The key
to making the proposed approach reliable, flexible, and cost-effective is the automatic
generation of glue and wrappers based on a designer's specification. The "wrap and glue"
approach allows system designers to concentrate on the difficult interoperability problems
and define solutions in terms of deeper, more difficult interoperability issues, while
freeing designers from implementation details. Specific research areas for the proposed
effort include technology enabling rapid prototyping, inference for design checking,
automatic program generation, distributed real-time scheduling, wrapper and glue
technology, and reliability assessment and improvement. The proposed technology will be
integrated with past research results to enable a quantum leap forward in the state of the
art for rapid prototyping.

0473-MA-SP. Approved for public release; distribution is unlimited.
Vision technology/algorithms for space robotics applications
Automation and robotics for space applications have been proposed for increased productivity, improved reliability, increased flexibility, and higher safety, as well as for automating time-consuming tasks, increasing the productivity and performance of crew-accomplished tasks, and performing tasks beyond the capability of the crew. This paper provides a review of efforts currently in progress in the area of robotic vision. Both systems and algorithms are discussed. The evolution of future vision/sensing is projected to include the fusion of multisensors, ranging from microwave to optical, with multimode capability covering position, attitude, recognition, and motion parameters. The key features of the overall system design will be small size and weight, fast signal processing, robust algorithms, and accurate parameter determination. These aspects of vision/sensing are also discussed.
Visual object-oriented development of parallel applications
PhD Thesis. Developing software for parallel architectures is a notoriously difficult task, compounded further by the range of available parallel architectures. There has been little research effort invested in how to engineer parallel applications for more general problem domains than the traditional numerically intensive domain. This thesis addresses these issues. An object-oriented paradigm for the development of general-purpose parallel applications, with full lifecycle support, is proposed and investigated, and a visual programming language to support that paradigm is developed. This thesis presents experiences and results from experiments with this new model for parallel application development. Funded by the Engineering and Physical Sciences Research Council.
Recurrent Scene Parsing with Perspective Understanding in the Loop
Objects may appear at arbitrary scales in perspective images of a scene,
posing a challenge for recognition systems that process images at a fixed
resolution. We propose a depth-aware gating module that adaptively selects the
pooling field size in a convolutional network architecture according to the
object scale (inversely proportional to the depth) so that small details are
preserved for distant objects while larger receptive fields are used for those
nearby. The depth gating signal is provided by stereo disparity or estimated
directly from monocular input. We integrate this depth-aware gating into a
recurrent convolutional neural network to perform semantic segmentation. Our
recurrent module iteratively refines the segmentation results, leveraging the
depth and semantic predictions from the previous iterations.
Through extensive experiments on four popular large-scale RGB-D datasets, we
demonstrate this approach achieves competitive semantic segmentation
performance with a model which is substantially more compact. We carry out
extensive analysis of this architecture including variants that operate on
monocular RGB but use depth as side-information during training, unsupervised
gating as a generic attentional mechanism, and multi-resolution gating. We find
that gated pooling for joint semantic segmentation and depth yields
state-of-the-art results for quantitative monocular depth estimation.
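The core gating idea (near objects get large receptive fields, distant objects keep small ones so fine detail survives) can be sketched in miniature. This toy uses a hard depth threshold and plain average pooling on 2-D lists; the paper's actual module is a learned, differentiable gate inside a convolutional network, and the `cutoff` and radii here are invented.

```python
def depth_gated_pool(feat, depth, near_radius=1, far_radius=0, cutoff=5.0):
    """Toy per-pixel pooling gated by depth: pixels nearer than `cutoff`
    are averaged over a larger window (coarser context), while distant
    pixels keep a small window so small details are preserved."""
    h, w = len(feat), len(feat[0])
    out = [[0.0] * w for _ in range(h)]
    for i in range(h):
        for j in range(w):
            r = near_radius if depth[i][j] < cutoff else far_radius
            vals = [feat[y][x]
                    for y in range(max(0, i - r), min(h, i + r + 1))
                    for x in range(max(0, j - r), min(w, j + r + 1))]
            out[i][j] = sum(vals) / len(vals)
    return out

feat  = [[1, 2], [3, 4]]
depth = [[10, 10], [10, 1]]   # only the bottom-right pixel is "near"
print(depth_gated_pool(feat, depth))  # [[1.0, 2.0], [3.0, 2.5]]
```

Only the near pixel is smoothed over its neighbourhood; the distant pixels pass through untouched, mirroring the scale-adaptive behaviour described above.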
Online Modeling and Tuning of Parallel Stream Processing Systems
Writing performant computer programs is hard. Code for high performance applications is profiled, tweaked, and refactored for months, specifically for the hardware on which it is to run. Consumer application code doesn't get the benefit of the endless massaging that high performance code enjoys, even though heterogeneous processor environments are beginning to resemble those in more performance-oriented arenas. This thesis offers a path to performant, parallel code (through stream processing) which is tuned online and automatically adapts to the environment it is given. This approach has the potential to reduce the tuning costs associated with high performance code and brings the benefit of performance tuning to consumer applications where it would otherwise be cost prohibitive.

This thesis introduces a stream processing library and multiple techniques to enable its online modeling and tuning. Stream processing (also termed data-flow programming) is a compute paradigm that views an application as a set of logical kernels connected via communications links, or streams. Stream processing is increasingly used by computational-x and x-informatics fields (e.g., biology, astrophysics) where the focus is on safe and fast parallelization of specific big-data applications. A major advantage of stream processing is that it enables parallelization without requiring manual end-user management of the non-deterministic behavior often characteristic of more traditional parallel processing methods. Many big-data and high performance applications involve high-throughput processing, necessitating the use of many parallel compute kernels on several compute cores. Optimizing the orchestration of kernels has been the focus of much theoretical and empirical modeling work. Purely theoretical parallel programming models can fail when the assumptions implicit within the model are mismatched with reality (i.e., the model is incorrectly applied).
Often it is unclear whether the assumptions are actually being met, even when verified under controlled conditions. Full empirical optimization solves this problem by extensively searching the range of likely configurations under native operating conditions. This, however, is expensive in both time and energy. For large, massively parallel systems, even deciding which modeling paradigm to use is often prohibitively expensive and, unfortunately, transient (with workload and hardware). In an ideal world, a parallel run-time would re-optimize an application continuously to match its environment, with little additional overhead. This work presents methods aimed at doing just that through low-overhead instrumentation, modeling, and optimization. Online optimization provides a good trade-off between static optimization and online heuristics. To enable online optimization, modeling decisions must be fast and relatively accurate.

Online modeling and optimization of a stream processing system first requires a stream processing framework that is amenable to the intended type of dynamic manipulation. To fill this void, we developed the RaftLib C++ template library, which enables use of the stream processing paradigm for C++ applications (it is the run-time that underlies almost all the work within this dissertation). An application topology is specified by the user; however, almost everything else is optimizable by the run-time. RaftLib takes advantage of the knowledge gained during the design of several prior streaming languages (notably Auto-Pipe). The resultant framework enables online migration of tasks, auto-parallelization, online buffer reallocation, and other useful dynamic behaviors that were not available in many previous stream processing systems. Several benchmark applications have been designed to assess the performance gains of our approaches and to compare performance with other leading stream processing frameworks.
Information is essential to any modeling task; to that end, a low-overhead instrumentation framework has been developed which is both dynamic and adaptive. Discovering a fast and relatively optimal configuration for a stream processing application often necessitates solving for buffer sizes within a finite-capacity queueing network. We show that a generalized gain/loss network flow model can bootstrap the process under certain conditions. Any modeling effort requires that a model be selected, often a highly manual task involving many expensive operations. This dissertation demonstrates that machine learning methods (such as a support vector machine) can successfully select models at run-time for a streaming application. The full set of approaches is incorporated into the open source RaftLib framework.
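RaftLib itself is a C++ template library, so the following is not its API; it is a language-neutral toy sketch of the paradigm described above: independent kernels connected by bounded FIFO streams, running in parallel without the user managing shared state. Kernel names and data are invented.

```python
# Toy sketch of the stream processing (data-flow) paradigm: each kernel
# reads from an input stream and writes to an output stream; bounded
# queues provide the back-pressure a streaming run-time relies on.

from queue import Queue
from threading import Thread

def source(out_q):
    for i in range(5):
        out_q.put(i)
    out_q.put(None)                      # end-of-stream marker

def square(in_q, out_q):
    while (item := in_q.get()) is not None:
        out_q.put(item * item)
    out_q.put(None)

def sink(in_q, results):
    while (item := in_q.get()) is not None:
        results.append(item)

# Wire up the topology: source -> square -> sink, with bounded streams.
a, b, results = Queue(maxsize=4), Queue(maxsize=4), []
threads = [Thread(target=source, args=(a,)),
           Thread(target=square, args=(a, b)),
           Thread(target=sink, args=(b, results))]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(results)   # [0, 1, 4, 9, 16]
```

The bounded queue sizes are exactly the kind of knob the dissertation's online tuner adjusts: too small and kernels stall on back-pressure, too large and memory is wasted.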