10,155 research outputs found
Evaluation of optimisation techniques for multiscopic rendering
A thesis submitted to the University of Bedfordshire in fulfilment of the requirements for the degree of Master of Science by ResearchThis project evaluates different performance optimisation techniques applied to stereoscopic and multiscopic rendering for interactive applications. The artefact
features a robust plug-in package for the Unity game engine. The thesis provides background information for the performance optimisations, outlines all the findings, evaluates the optimisations and provides suggestions for future work.
Scrum development methodology is used to develop the artefact and quantitative research methodology is used to evaluate the findings by measuring performance.
This project concludes that the use of each performance optimisation has specific use case scenarios in which performance benefits. Foveated rendering provides
greatest performance increase for both stereoscopic and multiscopic rendering but is also more computationally intensive as it requires an eye tracking solution.
Dynamic resolution is very beneficial when overall frame rate smoothness is needed and frame drops are present. Depth optimisation is beneficial for vast open environments but can lead to decreased performance if used inappropriately
Increasing rendering performance of graphics hardware
Graphics Processing Unit (GPU) performance is increasing faster than central processing unit (CPU) performance. This growth is driven by performance improvements that can be divided into the following three categories: algorithmic improvements, architectural improvements, and circuit-level improvements. In this dissertation I present techniques that improve the rendering performance of graphics hardware measured in speed, power consumption or image quality in each of these three areas. At the algorithmic level, I introduce a method for using graphics hardware to rapidly and efficiently generate summed-area tables, which are data structures that hold pre-computed two-dimensional integrals of subsets of a given image, and present several novel rendering techniques that take advantage of summed-area tables to produce dynamic, high-quality images at interactive frame rates. These techniques improve the visual quality of images rendered on current commodity GPUs without requiring modifications to the underlying hardware or architecture. At the architectural level, I propose modifications to the architecture of current GPUs that add conditional streaming capabilities. I describe a novel GPU-based ray-tracing algorithm that takes advantage of conditional output streams to reduce the memory bandwidth requirements by over an order of magnitude times when compared to previous techniques. At the circuit level, I propose a compute-on-demand paradigm for the design of high-speed and energy-efficient graphics components. The goal of the compute-on-demand paradigm is to only perform computation at the bit-level when needed. The compute-on-demand paradigm exploits the data-dependent nature of computation, and thereby obtains speed and energy improvements by optimizing designs for the common case. This approach is illustrated with the design of a high-speed Z-comparator that is implemented using asynchronous logic. Asynchronous or "clockless" circuits were chosen for my implementations since they allow for data-dependent completion times and reduced power consumption by disabling inactive components. The resulting circuit-level implementation runs over 1.5 times faster while on dissipating 25% the energy of a comparable synchronous comparator for the average case. Also at the circuit-level, I introduce a novel implementation of counterflow pipelining, which allows two streams of data to flow in opposite directions within the same pipeline without the need for complex arbitration. The advantages of this implementation are demonstrated by the design of a high-speed asynchronous Booth multiplier. While both the comparator and the multiplier are useful components of a graphics pipeline, the objective of this work was to propose the new design paradigm as a promising alternative to current graphics hardware design practices
Navigation domain representation for interactive multiview imaging
Enabling users to interactively navigate through different viewpoints of a
static scene is a new interesting functionality in 3D streaming systems. While
it opens exciting perspectives towards rich multimedia applications, it
requires the design of novel representations and coding techniques in order to
solve the new challenges imposed by interactive navigation. Interactivity
clearly brings new design constraints: the encoder is unaware of the exact
decoding process, while the decoder has to reconstruct information from
incomplete subsets of data since the server can generally not transmit images
for all possible viewpoints due to resource constrains. In this paper, we
propose a novel multiview data representation that permits to satisfy bandwidth
and storage constraints in an interactive multiview streaming system. In
particular, we partition the multiview navigation domain into segments, each of
which is described by a reference image and some auxiliary information. The
auxiliary information enables the client to recreate any viewpoint in the
navigation segment via view synthesis. The decoder is then able to navigate
freely in the segment without further data request to the server; it requests
additional data only when it moves to a different segment. We discuss the
benefits of this novel representation in interactive navigation systems and
further propose a method to optimize the partitioning of the navigation domain
into independent segments, under bandwidth and storage constraints.
Experimental results confirm the potential of the proposed representation;
namely, our system leads to similar compression performance as classical
inter-view coding, while it provides the high level of flexibility that is
required for interactive streaming. Hence, our new framework represents a
promising solution for 3D data representation in novel interactive multimedia
services
Hydra: An Accelerator for Real-Time Edge-Aware Permeability Filtering in 65nm CMOS
Many modern video processing pipelines rely on edge-aware (EA) filtering
methods. However, recent high-quality methods are challenging to run in
real-time on embedded hardware due to their computational load. To this end, we
propose an area-efficient and real-time capable hardware implementation of a
high quality EA method. In particular, we focus on the recently proposed
permeability filter (PF) that delivers promising quality and performance in the
domains of HDR tone mapping, disparity and optical flow estimation. We present
an efficient hardware accelerator that implements a tiled variant of the PF
with low on-chip memory requirements and a significantly reduced external
memory bandwidth (6.4x w.r.t. the non-tiled PF). The design has been taped out
in 65 nm CMOS technology, is able to filter 720p grayscale video at 24.8 Hz and
achieves a high compute density of 6.7 GFLOPS/mm2 (12x higher than embedded
GPUs when scaled to the same technology node). The low area and bandwidth
requirements make the accelerator highly suitable for integration into SoCs
where silicon area budget is constrained and external memory is typically a
heavily contended resource
Sequential Circuit Design for Embedded Cryptographic Applications Resilient to Adversarial Faults
In the relatively young field of fault-tolerant cryptography, the main research effort has focused exclusively on the protection of the data path of cryptographic circuits. To date, however, we have not found any work that aims at protecting the control logic of these circuits against fault attacks, which thus remains the proverbial Achilles’ heel. Motivated by a hypothetical yet realistic fault analysis attack that, in principle, could be mounted against any modular exponentiation engine, even one with appropriate data path protection, we set out to close this remaining gap. In this paper, we present guidelines for the design of multifault-resilient sequential control logic based on standard Error-Detecting Codes (EDCs) with large minimum distance. We introduce a metric that measures the effectiveness of the error detection technique in terms of the effort the attacker has to make in relation to the area overhead spent in
implementing the EDC. Our comparison shows that the proposed EDC-based technique provides superior performance when compared against regular N-modular redundancy techniques. Furthermore, our technique scales well and does not affect the critical path delay
- …