Doctor of Philosophy
Dissertation. The embedded system space is characterized by a rapid evolution in the complexity and functionality of applications. In addition, the short time-to-market nature of the business motivates the use of programmable devices capable of meeting the conflicting constraints of low energy, high performance, and short design times. The keys to meeting these conflicting constraints are specialization and maximal extraction of the available application parallelism. General-purpose processors are flexible but are either too power hungry or lack the necessary performance. Application-specific integrated circuits (ASICs) efficiently meet the performance and power needs but are inflexible. Programmable domain-specific architectures (DSAs) are an attractive middle ground, but their design requires significant time, resources, and expertise in a variety of specialties, which range from application algorithms to architecture and, ultimately, circuit design. This dissertation presents CoGenE, a design framework that automates the design of energy-performance-optimal DSAs for embedded systems. For a given application domain and a user-chosen initial architectural specification, CoGenE consists of a Compiler to generate the executable binary, a simulator Generator to collect performance/energy statistics, and an Explorer that modifies the current architecture to improve energy-performance-area characteristics. This process repeats automatically until the user-specified constraints are achieved, which removes or reduces the time needed to understand the application, manually design the DSA, and generate object code for the DSA. Thus, CoGenE is a new design methodology that represents a significant improvement in performance, energy dissipation, design time, and resources. This dissertation employs the face recognition domain to showcase a flexible architectural design methodology that creates "ASIC-like" DSAs.
The DSAs are instruction set architecture (ISA)-independent and achieve good energy-performance characteristics by coscheduling the often conflicting constraints of data access, data movement, and computation through a flexible interconnect. This flexibility comes at the cost of a significant increase in programming complexity and code generation time. To address this problem, the CoGenE compiler employs integer linear programming (ILP)-based "interconnect-aware" scheduling techniques for automatic code generation. The CoGenE explorer employs an iterative technique to search the complete design space and select a set of energy-performance-optimal candidates. When compared to manual designs, results demonstrate that CoGenE produces superior designs for three application domains: face recognition, speech recognition, and wireless telephony. While CoGenE is well suited to applications that exhibit streaming behavior, multithreaded applications like ray tracing present a different but important challenge. To demonstrate its generality, CoGenE is evaluated in designing a novel multicore N-wide SIMD architecture, known as StreamRay, for the ray tracing domain. CoGenE is used to synthesize the SIMD execution cores, the compiler that generates the application binary, and the interconnection subsystem. Further, separating address and data computations in space reduces data movement and contention for resources, thereby significantly improving performance compared to existing ray tracing approaches.
A survey of techniques and technologies for web-based real-time interactive rendering
When exploring a virtual environment, realism depends mainly on two factors: realistic images and
real-time feedback (motions, behaviour etc.). In this context, photo realism and physical validity of
computer generated images required by emerging applications, such as advanced e-commerce, still
impose major challenges in the area of rendering research whereas the complexity of lighting
phenomena further requires powerful and predictable computing if time constraints must be attained.
In this technical report we survey the state of the art in rendering, focusing on
approaches, techniques and technologies that might enable real-time interactive web-based client-server
rendering systems. The focus is on the end-systems and not the networking technologies used
to interconnect client(s) and server(s).
Siemens; Bertelsmann mediaSystems GmbH; Eptron Multimedia; Instituto Politécnico do Porto - ISEP-IPP; Institute Laboratory for Mixed Realities at the Academy of Media Arts Cologne, LMR; Mälardalen Real-Time Research Centre (MRTC) at Mälardalen University in Västerås; Q-Systems
A framework for efficient execution of data parallel irregular applications on heterogeneous systems
Exploiting the computing power of the diversity of resources available on heterogeneous
systems is mandatory but a very challenging task. The diversity of architectures, execution
models and programming tools, together with disjoint address spaces and different
computing capabilities, raise a number of challenges that severely impact on application
performance and programming productivity. This problem is further compounded in the
presence of data parallel irregular applications.
This paper presents a framework that addresses development and execution of data
parallel irregular applications in heterogeneous systems. A unified task-based programming
and execution model is proposed, together with inter and intra-device scheduling,
which, coupled with a data management system, aim to achieve performance scalability
across multiple devices, while maintaining high programming productivity. Intra-device
scheduling on wide SIMD/SIMT architectures resorts to consumer-producer kernels,
which, by allowing dynamic generation and rescheduling of new work units, enable
balancing irregular workloads and increase resource utilization.
Results show that regular and irregular applications scale well with the number of
devices, while requiring minimal programming effort. Consumer-producer kernels are
able to sustain significant performance gains as long as the workload per basic work
unit is enough to compensate overheads associated with intra-device scheduling. This
not being the case, consumer kernels can still be used for the irregular application.
Comparisons with an alternative framework, StarPU, which targets regular workloads,
consistently demonstrate significant speedups. This is, to the best of our knowledge, the
first published integrated approach that successfully handles irregular workloads over
heterogeneous systems.
This work is funded by National Funds through the FCT - Fundação para a Ciência
e a Tecnologia (Portuguese Foundation for Science and Technology) and by ERDF -
European Regional Development Fund through the COMPETE Programme (operational
programme for competitiveness) within projects PEst-OE/EEI/UI0752/2014
and FCOMP-01-0124-FEDER-010067. Also by the School of Engineering, Universidade
do Minho within project P2SHOCS - Performance Portability on Scalable
Heterogeneous Computing Systems
Interactive ray tracing for volume visualization
Journal Article. We present a brute-force ray tracing system for interactive volume visualization. The system runs on a conventional (distributed) shared-memory multiprocessor machine. For each pixel we trace a ray through a volume to compute the color for that pixel. Although this method has high intrinsic computational cost, its simplicity and scalability make it ideal for large datasets on current high-end parallel systems.
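The per-pixel procedure described in this abstract can be illustrated with a minimal Python sketch. It is not the authors' system: the axis-aligned orthographic rays, the grayscale output, the front-to-back compositing rule, and the `transfer` function are all assumptions chosen only to keep the example short.

```python
# Illustrative sketch only: brute-force volume ray casting, one ray per pixel.
# The transfer function and compositing rule are assumptions, not the paper's.

def transfer(sample):
    """Hypothetical transfer function: scalar in [0, 1] -> (gray, opacity)."""
    return sample, 0.5 * sample

def cast_ray(volume, x, y):
    """March one axis-aligned ray through volume[z][y][x], front to back."""
    color, alpha = 0.0, 0.0
    for z in range(len(volume)):
        c, a = transfer(volume[z][y][x])
        color += (1.0 - alpha) * a * c   # add light not yet occluded
        alpha += (1.0 - alpha) * a       # accumulate opacity
    return color

def render(volume):
    """Trace every pixel independently; no acceleration structure is used."""
    height, width = len(volume[0]), len(volume[0][0])
    return [[cast_ray(volume, x, y) for x in range(width)]
            for y in range(height)]
```

Because each call to `cast_ray` is independent of every other, the pixel loop in `render` can be split across processors without coordination, which is the kind of scalability the abstract refers to.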
Exploiting spatial and temporal coherence in GPU-based volume rendering
Efficiency is a key aspect in volume rendering, even if powerful
graphics hardware is employed, since increasing data set sizes and
growing demands on visualization techniques outweigh improvements in
graphics processor performance. This dissertation examines how spatial
and temporal coherence in volume data can be used to optimize volume
rendering. Several new approaches for static as well as for time-varying
data sets are introduced, which exploit different types of coherence in
different stages of the volume rendering pipeline. The presented
acceleration algorithms include empty space skipping using occlusion
frustums, a slab-based cache structure for raycasting, and a lossless
compression scheme for time-varying data. The algorithms were designed
for use with GPU-based volume raycasting and to efficiently exploit the
features of modern graphics processors, especially stream processing
Time-varying volume visualization
Volume rendering is a very active research field in Computer Graphics because of its wide range of applications in various sciences, from medicine to flow mechanics. In this report, we survey the state of the art in time-varying volume rendering. We state several basic concepts and then establish several criteria to classify the studied works: IVR versus DVR, 4D versus 3D+time, compression techniques, involved architectures, use of parallelism, and image-space versus object-space coherence. We also address other related problems, such as transfer functions and the computation of 2D cross-sections of time-varying volume data. All the papers reviewed are classified into several tables based on these criteria and, finally, several conclusions are presented. Preprint
Applying the finite-difference time-domain to the modelling of large-scale radio channels
A thesis submitted to the University of Bedfordshire, in partial fulfilment of the requirements for the degree of Doctor of Philosophy (PhD). Finite-difference models have been used for nearly 40 years to solve electromagnetic problems of heterogeneous nature. Further, these techniques are well known for being computationally expensive, as well as subject to various numerical artifacts. However, little is yet understood about the errors arising in the simulation of wideband sources with the finite-difference time-domain (FDTD) method. Within this context, the focus of this thesis is on two different problems. On the one hand, the speed and accuracy of current FDTD implementations are analysed and increased. On the other hand, the distortion of numerical pulses is characterised and mitigation techniques are proposed.
In addition, recent developments in general-purpose computing on graphics processing units (GPGPU) have unveiled new methods for the efficient implementation of FDTD algorithms. Therefore, this thesis proposes specific GPU-based guidelines for the implementation of the standard FDTD. Then, metaheuristics are used for the calibration of an FDTD-based narrowband simulator. Regarding the simulation of wideband sources, this thesis first uses Lagrange multipliers to characterise the extrema of the numerical group velocity. Then, the spread of numerical Gaussian pulses is characterised analytically in terms of the FDTD grid parameters.
The usefulness of the proposed solutions to the previously described problems is illustrated in this thesis using coverage and wideband predictions in large-scale scenarios. In particular, the indoor-to-outdoor radio channel in residential areas is studied. Furthermore, coverage and wideband measurements have also been used to validate the predictions.
As a result of all the above, this thesis first introduces an efficient and accurate FDTD simulator. Then, it characterises analytically the propagation of numerical pulses. Finally, the narrowband and wideband indoor-to-outdoor channels are modelled using the developed techniques.
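For readers unfamiliar with the method, the standard FDTD update can be sketched in one dimension as follows. This is a textbook-style illustration in normalised units, not the simulator developed in the thesis; the grid size, the Courant number, the source position, and the Gaussian source parameters are all assumed values.

```python
import math

def fdtd_1d(num_cells=200, num_steps=80, courant=1.0, source_cell=100):
    """Leapfrog update of a 1D electric/magnetic field pair (normalised units).

    All parameter values are illustrative assumptions. ez[0] and
    hy[num_cells - 1] are never updated, so both grid ends reflect.
    """
    ez = [0.0] * num_cells
    hy = [0.0] * num_cells
    for n in range(num_steps):
        # Magnetic field update from the spatial difference of ez.
        for i in range(num_cells - 1):
            hy[i] += courant * (ez[i + 1] - ez[i])
        # Electric field update from the spatial difference of hy.
        for i in range(1, num_cells):
            ez[i] += courant * (hy[i] - hy[i - 1])
        # Additive Gaussian source (assumed centre step 30, width 10 steps).
        ez[source_cell] += math.exp(-((n - 30.0) / 10.0) ** 2)
    return ez
```

With `courant=1.0` the pulse advances one cell per time step; lowering the Courant number in this sketch is one way to observe the kind of numerical pulse spreading that the thesis characterises.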
Efficient rendering of large 3-D and 4-D scalar fields
Rendering volumetric data, as a compute/communication intensive and highly parallel application,
represents the characteristics of future workloads for desktop computers.
Interactively rendering volumetric data has been a challenging problem due to its
high computational and communication requirements.
With the consistent trend toward high resolution data, it has remained
a difficult problem despite the continuous increase in processing power,
because of the increasing performance gap between computation and communication.
On the other hand, the new multi-core architecture trend in computational units in PCs, which
can be characterized by parallelism and heterogeneity, provides both
opportunities and challenges. While the new on-chip parallel architectures
offer opportunities for extremely high performance,
widespread use of those parallel processors requires extensive changes in previous
algorithms to take advantage of the new architectures.
In this dissertation, we develop new methods and techniques to support interactive rendering of large volumetric data.
In particular, we present a novel method to lay out data on disk for efficiently
performing an out-of-core axis-aligned slicing of large multidimensional scalar
fields. We also present a new method to efficiently build an out-of-core indexing structure for
n-dimensional volumetric data. Then, we describe a streaming model for efficiently implementing
volume ray casting on a heterogeneous compute resource environment.
We describe how we implement the model on SONY/TOSHIBA/IBM Cell Broadband Engine
and on NVIDIA CUDA architecture.
Our results show that our out-of-core techniques significantly reduce
the communication bandwidth requirements and that our streaming model very effectively makes use of
the strengths of those heterogeneous parallel compute resource environments for volume rendering.
In all cases, we achieve scalability and load balancing, while hiding memory latency.
Fast ray tracing 3D models
In many computer graphics applications such as CAD, realistic displays have very important and positive effects on the users of the system. There are several techniques to generate realistic images with the computer. Ray tracing gives the most effective results by simulating the interaction of light with its environment. However, it may require an excessive amount of time to generate an image. In this article, we present a survey of methods developed to speed up the ray tracing algorithm and introduce a fast ray tracer to process a 3D scene that is defined by interactive 3D modeling software. © 1991