H2-Stereo: High-Speed, High-Resolution Stereoscopic Video System
High-speed, high-resolution stereoscopic (H2-Stereo) video allows us to
perceive dynamic 3D content at fine granularity. The acquisition of H2-Stereo
video, however, remains challenging with commodity cameras. Existing spatial
super-resolution or temporal frame interpolation methods provide compromised
solutions that lack temporal or spatial details, respectively. To alleviate
this problem, we propose a dual camera system, in which one camera captures
high-spatial-resolution low-frame-rate (HSR-LFR) videos with rich spatial
details, and the other captures low-spatial-resolution high-frame-rate
(LSR-HFR) videos with smooth temporal details. We then devise a Learned
Information Fusion network (LIFnet) that exploits the cross-camera redundancies
to enhance both camera views to high spatiotemporal resolution (HSTR) for
reconstructing the H2-Stereo video effectively. We utilize a disparity network
to transfer spatiotemporal information across views even in large disparity
scenes, based on which, we propose disparity-guided flow-based warping for
LSR-HFR view and complementary warping for HSR-LFR view. A multi-scale fusion
method in feature domain is proposed to minimize occlusion-induced warping
ghosts and holes in HSR-LFR view. The LIFnet is trained in an end-to-end manner
using our collected high-quality Stereo Video dataset from YouTube. Extensive
experiments demonstrate that our model outperforms existing state-of-the-art
methods for both views on synthetic data and camera-captured real data with
large disparity. Ablation studies examine several aspects of our system,
including spatiotemporal resolution, camera baseline, camera
desynchronization, long/short exposures, and applications, to characterize
its capability for potential applications.
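As a rough illustration of how a disparity map can carry pixels from one view to the other, and of why occlusions leave holes that a later fusion step has to handle, the sketch below backward-warps a rectified image one scanline at a time. It is a minimal sketch under stated assumptions, not the LIFnet pipeline: the function names, the integer disparities, and the `hole` marker are all hypothetical.

```python
def warp_row_by_disparity(src_row, disparity_row, hole=None):
    """Backward-warp one scanline: out[x] = src_row[x - disparity_row[x]].

    Assumes rectified views and integer disparities (both assumptions).
    Positions that map outside the source row are marked with `hole`.
    """
    width = len(src_row)
    out = []
    for x in range(width):
        sx = x - disparity_row[x]
        out.append(src_row[sx] if 0 <= sx < width else hole)
    return out


def warp_image_by_disparity(src, disparity, hole=None):
    """Apply the scanline warp to every row of a 2D image (list of rows)."""
    return [warp_row_by_disparity(row, d_row, hole)
            for row, d_row in zip(src, disparity)]


if __name__ == "__main__":
    row = [10, 20, 30, 40]
    # A uniform disparity of 1 shifts content right and leaves a hole at x = 0.
    print(warp_row_by_disparity(row, [1, 1, 1, 1]))
```

The `hole` entries correspond to positions with no source pixel; a learned method would fill or down-weight them rather than leave them empty.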
Cross-Spectral Full and Partial Face Recognition: Preprocessing, Feature Extraction and Matching
Cross-spectral face recognition remains a challenge in the area of biometrics. The problem arises in real-world application scenarios such as surveillance at night or in harsh environments, where traditional face recognition techniques are unsuitable or limited because they rely on imagery obtained in the visible light spectrum. This motivates the study conducted in the dissertation, which focuses on matching infrared facial images against visible light images. The study spans the stages of face recognition from preprocessing to feature extraction and matching.
We address the problem of cross-spectral face recognition by proposing several new operators and algorithms based on advanced concepts such as composite operators, multi-level data fusion, image quality parity, and levels of measurement. Specifically, we experiment with and fuse several popular individual operators to construct a higher-performing compound operator named GWLH, which exhibits the complementary advantages of the individual operators involved. We also combine a Gaussian function with LBP, generalized LBP, WLD and/or HOG and modify them into multi-lobe operators with a smoothed neighborhood, yielding a new type of operator named Composite Multi-Lobe Descriptors. We further design a novel operator termed Gabor Multi-Levels of Measurement, based on the theory of levels of measurement, which benefits from taking into consideration the complementary edge and feature information at different levels of measurement.
The issue of image quality disparity is also studied in the dissertation because it occurs commonly in cross-spectral face recognition tasks. By bringing the quality of heterogeneous imagery closer together, we achieve an improvement in recognition performance. We further study the problem of cross-spectral recognition using partial faces, since it is also a common problem in practical usage. We begin with matching heterogeneous periocular regions and generalize the topic by considering all three facial regions defined in both a characteristic way and a mixture way.
In the experiments we employ datasets that include all the sub-bands within the infrared spectrum: near-infrared, short-wave infrared, mid-wave infrared, and long-wave infrared. Different standoff distances, ranging from short to intermediate and long, are also considered. Our methods are compared with other popular or state-of-the-art methods and are shown to be advantageous.
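For readers unfamiliar with LBP, one of the individual operators named above, the sketch below computes the basic 3x3 local binary pattern code for a single pixel. It shows only the textbook operator, not the dissertation's GWLH or multi-lobe descriptors; the function name and the clockwise bit ordering starting at the top-left neighbour are assumptions made for this example.

```python
def lbp_code(patch):
    """Return the 8-bit basic LBP code of a 3x3 patch (list of three rows).

    Each neighbour contributes a 1 bit when its value is >= the centre value.
    Bit order (an assumption): clockwise, starting at the top-left neighbour.
    """
    centre = patch[1][1]
    neighbours = [
        patch[0][0], patch[0][1], patch[0][2],  # top row, left to right
        patch[1][2],                            # right
        patch[2][2], patch[2][1], patch[2][0],  # bottom row, right to left
        patch[1][0],                            # left
    ]
    code = 0
    for bit, value in enumerate(neighbours):
        if value >= centre:
            code |= 1 << bit
    return code


if __name__ == "__main__":
    flat = [[5, 5, 5], [5, 5, 5], [5, 5, 5]]
    print(lbp_code(flat))  # every neighbour equals the centre, so all bits are set
```

Because the code depends only on the sign of local intensity differences, it is insensitive to monotonic brightness changes, which is one reason such operators are popular building blocks when matching across spectra.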
Real-time GPU-accelerated Out-of-Core Rendering and Light-field Display Visualization for Improved Massive Volume Understanding
Nowadays huge digital models are becoming increasingly available for a number of different applications ranging from CAD, industrial design to medicine and natural sciences. Particularly, in the field of medicine, data acquisition devices such as MRI or CT scanners routinely produce huge volumetric datasets. Currently, these datasets can easily reach dimensions of 1024^3 voxels and datasets larger than that are not uncommon.
This thesis focuses on efficient methods for the interactive exploration of such large volumes using direct volume visualization techniques on commodity platforms. To reach this goal, specialized multi-resolution structures and algorithms that can directly render volumes of potentially unlimited size are introduced. The developed techniques are output sensitive: their rendering costs depend only on the complexity of the generated images and not on the complexity of the input datasets. The advanced characteristics of modern GPGPU architectures are exploited and combined with an out-of-core framework in order to provide a more flexible, scalable and efficient implementation of these algorithms and data structures on single GPUs and GPU clusters.
To improve visual perception and understanding, the use of novel 3D display technology based on a light-field approach is introduced. This kind of device allows multiple naked-eye users to perceive virtual objects floating inside the display workspace, exploiting stereo and horizontal parallax. The thesis reports a set of specialized, interactive illustrative techniques capable of providing different contextual information in different areas of the display, as well as an out-of-core CUDA-based ray-casting engine with a number of improvements over current GPU volume ray-casters. The possibilities of the system are demonstrated by the multi-user interactive exploration of 64-GVoxel datasets on a 35-MPixel light-field display driven by a cluster of PCs.
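A quick back-of-the-envelope calculation helps show why volumes of this size call for out-of-core, multi-resolution handling. The sketch below is only illustrative: the bytes-per-voxel values, the cubic 4096^3 shape used to stand in for a 64-GVoxel dataset, and the brick side of 32 voxels are assumptions, not figures from the thesis.

```python
def volume_bytes(voxels_per_side, bytes_per_voxel=1):
    """Raw size in bytes of a cubic volume (bytes_per_voxel is an assumption)."""
    return voxels_per_side ** 3 * bytes_per_voxel


def brick_count(voxels_per_side, brick_side=32):
    """Number of cubic bricks needed to tile a cubic volume at full resolution."""
    per_axis = -(-voxels_per_side // brick_side)  # ceiling division
    return per_axis ** 3


if __name__ == "__main__":
    gib = 2 ** 30
    # 1024^3 voxels at 1 byte each is exactly 1 GiB.
    print(volume_bytes(1024) / gib)
    # 4096^3 = 64 * 2^30 voxels; at an assumed 2 bytes per voxel this is 128 GiB,
    # far more than the memory of a single commodity GPU.
    print(volume_bytes(4096, bytes_per_voxel=2) / gib)
    print(brick_count(1024), brick_count(4096))
```

Under these assumptions only a small working set of bricks can be resident at once, which is the situation an output-sensitive renderer is designed for.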
Validating Stereoscopic Volume Rendering
The evaluation of stereoscopic displays for surface-based renderings is well established in terms of accurate depth perception and tasks that require an understanding of the spatial layout of the scene. In comparison, direct volume rendering (DVR), which typically produces images with a high number of low-opacity, overlapping features, is only beginning to be critically studied on stereoscopic displays. The properties of the specific images and the choice of parameters for DVR algorithms make assessing the effectiveness of stereoscopic displays for DVR particularly challenging, and as a result the existing literature is sparse, with inconclusive results.
In this thesis stereoscopic volume rendering is analysed for tasks that require depth perception including: stereo-acuity tasks, spatial search tasks and observer preference ratings. The evaluations focus on aspects of the DVR rendering pipeline and assess how the parameters of volume resolution, reconstruction filter and transfer function may alter task performance and the perceived quality of the produced images.
The results of the evaluations suggest that the transfer function and choice of reconstruction filter can have an effect on the performance on tasks with stereoscopic displays when all other parameters are kept consistent. Further, these were found to affect the sensitivity and bias response of the participants. The studies also show that properties of the reconstruction filters such as post-aliasing and smoothing do not correlate well with either task performance or quality ratings.
Included in the contributions are guidelines and recommendations on the choice of parameters for increased task performance and quality scores, as well as image-based methods of analysing stereoscopic DVR images.
Fusing spatial and temporal components for real-time depth data enhancement of dynamic scenes
The depth images from consumer depth cameras (e.g., structured-light/ToF devices) exhibit a substantial number of artifacts (e.g., holes, flickering, ghosting) that need to be removed for real-world applications. Existing methods cannot entirely remove them and are slow. This thesis proposes a new real-time spatio-temporal depth image enhancement filter that completely removes flickering and ghosting, and significantly reduces holes. This thesis also presents a novel depth-data capture setup and two data reduction methods to optimize the performance of the proposed enhancement method.
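To make the idea of combining temporal and spatial information concrete, the sketch below applies a per-pixel temporal median over a few recent frames and then fills remaining holes from valid neighbours. It is a generic illustration under stated assumptions, not the filter proposed in the thesis: the function names, the use of 0 as the invalid-depth marker, and the single-pass 8-neighbour fill are all hypothetical choices.

```python
from statistics import median

HOLE = 0  # assumed marker for an invalid depth reading


def temporal_median(frames):
    """Per-pixel median over a list of depth frames, ignoring hole readings."""
    height, width = len(frames[0]), len(frames[0][0])
    out = [[HOLE] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            valid = [f[y][x] for f in frames if f[y][x] != HOLE]
            if valid:
                out[y][x] = median(valid)
    return out


def fill_holes(depth):
    """Fill each hole with the median of its valid 8-neighbours (one pass)."""
    height, width = len(depth), len(depth[0])
    out = [row[:] for row in depth]
    for y in range(height):
        for x in range(width):
            if depth[y][x] != HOLE:
                continue
            neighbours = [
                depth[ny][nx]
                for ny in range(max(0, y - 1), min(height, y + 2))
                for nx in range(max(0, x - 1), min(width, x + 2))
                if depth[ny][nx] != HOLE
            ]
            if neighbours:
                out[y][x] = median(neighbours)
    return out


if __name__ == "__main__":
    frames = [[[100, 0]], [[102, 0]], [[101, 50]]]
    print(fill_holes(temporal_median(frames)))
```

The temporal step suppresses frame-to-frame flicker, while the spatial step handles pixels that were invalid in every frame; a real-time implementation would typically run both on the GPU.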
Iris Recognition: Robust Processing, Synthesis, Performance Evaluation and Applications
The popularity of iris biometrics has grown considerably over the past few years, resulting in the development of a large number of new iris processing and encoding algorithms. In this dissertation, we discuss the following aspects of the iris recognition problem: iris image acquisition, iris quality, iris segmentation, iris encoding, performance enhancement and two novel applications.
The specific claimed novelties of this dissertation include: (1) a method to generate a large-scale realistic database of iris images; (2) a cross-spectral iris matching method for comparing images in the color range against images in the Near-Infrared (NIR) range; (3) a method to evaluate iris image and video quality; (4) a robust quality-based iris segmentation method; (5) several approaches to enhance the recognition performance and security of traditional iris encoding techniques; (6) a method to increase iris capture volume for acquisition of the iris on the move from a distance; and (7) a method to improve the performance of biometric systems using available soft data in the form of links and connections in a relevant social network.
Novel Aggregated Solutions for Robust Visual Tracking in Traffic Scenarios
This work proposes novel approaches for object tracking in challenging scenarios such as severe occlusion, deteriorated vision and long-range multi-object re-identification. All of these solutions rely only on image sequences captured by a monocular camera and do not require additional sensors. Experiments on standard benchmarks demonstrate that these approaches improve on state-of-the-art performance. The presented approaches are designed to run at real-time speed.
Camera positioning for 3D panoramic image rendering
This thesis was submitted for the degree of Doctor of Philosophy and awarded by Brunel University London. Virtual camera realisation and the proposition of a trapezoidal camera architecture are the two broad contributions of this thesis. Firstly, multiple cameras and their arrangement constitute a critical component that affects the integrity of visual content acquisition for multi-view video. Currently, linear, convergence, and divergence arrays are the prominent camera topologies adopted. However, the large number of cameras required and their synchronisation are two of the prominent challenges usually encountered. The use of virtual cameras can significantly reduce the number of physical cameras used with respect to any of the known camera structures, hence reducing some of the other implementation issues. This thesis explores the use of image-based rendering with and without geometry in the implementations leading to the realisation of virtual cameras. The virtual camera implementation was carried out from the perspective of a depth map (geometry) and the use of multiple image samples (no geometry). Prior to the virtual camera realisation, the generation of depth maps was investigated using region match measures widely known for solving the image point correspondence problem. The constructed depth maps have been compared with the ones generated using the dynamic programming approach. In both the geometry and no-geometry approaches, the virtual cameras lead to the rendering of views from a textured depth map, the construction of a 3D panoramic image of a scene by stitching multiple image samples and performing superposition on them, and the computation of a virtual scene from a stereo pair of panoramic images. The quality of these rendered images was assessed through either objective or subjective analysis in Imatest software. Furthermore, metric reconstruction of a scene was performed by re-projection of the pixel points from multiple image samples with a single centre of projection. This was done using a sparse bundle adjustment algorithm. The statistical summary obtained after the application of this algorithm provides a gauge for the efficiency of the optimisation step. The optimised data was then visualised in the Meshlab software environment, hence providing the reconstructed scene.
Secondly, with any of the well-established camera arrangements, all cameras are usually constrained to the same horizontal plane. Therefore, occlusion becomes an extremely challenging problem, and a robust camera set-up is required in order to resolve the hidden parts of any scene objects. To adequately meet the visibility condition for scene objects, and given that occlusion of the same scene objects can occur, a multi-plane camera structure is highly desirable. Therefore, this thesis also explores a trapezoidal camera structure for image acquisition. The approach here is to assess the feasibility and potential of several physical cameras of the same model being sparsely arranged on the edge of an efficient trapezoid graph. This is implemented in both Matlab and Maya. The quality of the depth maps rendered in Matlab are better in Quality
- …