Full-reference stereoscopic video quality assessment using a motion sensitive HVS model
Stereoscopic video quality assessment has become a major research topic in recent years. Existing stereoscopic video quality metrics are predominantly based on stereoscopic image quality metrics extended to the time domain via, for example, temporal pooling. These approaches do not explicitly consider the motion sensitivity of the Human Visual System (HVS). To address this limitation, this paper introduces a novel HVS model inspired by physiological findings characterising the motion-sensitive response of complex cells in the primary visual cortex (V1 area). The proposed HVS model generalises previous HVS models, which characterised the behaviour of simple and complex cells but ignored motion sensitivity, by estimating optical flow to measure scene velocity at different scales and orientations. The local motion characteristics (direction and amplitude) are used to modulate the output of complex cells. The model is applied to develop a new type of full-reference stereoscopic video quality metrics that uniquely combine non-motion-sensitive and motion-sensitive energy terms to mimic the response of the HVS. A tailored two-stage multivariate stepwise regression algorithm is introduced to determine the optimal contribution of each energy term. The two proposed stereoscopic video quality metrics are evaluated on three stereoscopic video datasets. Results indicate that they achieve average correlations with subjective scores of 0.9257 (PLCC), 0.9338 and 0.9120 (SRCC), and 0.8622 and 0.8306 (KRCC), and outperform previous stereoscopic video quality metrics, including other recent HVS-based metrics.
Multi-Scale 3D Scene Flow from Binocular Stereo Sequences
Scene flow methods estimate the three-dimensional motion field for points in the world using multi-camera video data. Such methods combine multi-view reconstruction with motion estimation. This paper describes an alternative formulation for dense scene flow estimation that provides reliable results using only two cameras by fusing stereo and optical flow estimation into a single coherent framework. Internally, the proposed algorithm generates probability distributions for optical flow and disparity. Taking into account the uncertainty in the intermediate stages allows more reliable estimation of the 3D scene flow than previous methods. To handle the aperture problems inherent in the estimation of optical flow and disparity, a multi-scale method along with a novel region-based technique is used within a regularized solution. This combined approach both preserves discontinuities and prevents over-regularization, addressing two problems commonly associated with the basic multi-scale approaches. Experiments with synthetic and real test data demonstrate the strength of the proposed approach.
National Science Foundation (CNS-0202067, IIS-0208876); Office of Naval Research (N00014-03-1-0108)
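Scene flow ties disparity and optical flow together geometrically. The sketch below shows only that textbook relation for a rectified stereo pair, not the paper's probabilistic, multi-scale, region-based estimator: a pixel is back-projected with its disparity at time t, moved by its optical flow, back-projected again with the disparity measured at t+1, and the difference of the two 3D points is its scene-flow vector. The `StereoRig` parameters (focal length in pixels, baseline, principal point) are hypothetical inputs assumed to come from calibration.

```python
from typing import NamedTuple, Tuple


class StereoRig(NamedTuple):
    """Hypothetical rectified stereo rig (assumed calibrated)."""
    f: float         # focal length in pixels
    baseline: float  # distance between camera centres, in metres
    cx: float        # principal point x, in pixels
    cy: float        # principal point y, in pixels


def back_project(rig: StereoRig, u: float, v: float, d: float) -> Tuple[float, float, float]:
    """Pixel (u, v) with disparity d > 0 -> 3D point (X, Y, Z) in the left-camera frame."""
    if d <= 0:
        raise ValueError("disparity must be positive")
    z = rig.f * rig.baseline / d  # depth from the rectified-stereo relation Z = fB/d
    return ((u - rig.cx) * z / rig.f, (v - rig.cy) * z / rig.f, z)


def scene_flow(rig, u, v, d_t, flow, d_t1):
    """3D displacement of the point seen at (u, v) between frames t and t+1.

    d_t is its disparity at t, flow = (du, dv) its optical flow, and d_t1 the
    disparity measured at the flowed pixel (u + du, v + dv) in frame t+1.
    """
    du, dv = flow
    p0 = back_project(rig, u, v, d_t)
    p1 = back_project(rig, u + du, v + dv, d_t1)
    return tuple(b - a for a, b in zip(p0, p1))
```

For example, with `StereoRig(f=500.0, baseline=0.1, cx=320.0, cy=240.0)`, a pixel at the principal point whose disparity grows from 25 to 50 pixels with zero optical flow has moved from Z = 2 m to Z = 1 m, i.e. a scene flow of roughly (0, 0, -1) m.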
Holoscopic 3D image depth estimation and segmentation techniques
This thesis was submitted for the award of Doctor of Philosophy and was awarded by Brunel University London. Today’s 3D imaging techniques offer significant benefits over conventional 2D imaging techniques. The presence of natural depth information in the scene gives the observer an improved overall sense of reality and naturalness. Many independent research groups have designed systems that attempt to reach this goal, such as stereoscopic and auto-stereoscopic systems. However, the images displayed by such systems tend to cause eye strain, fatigue and headaches after prolonged viewing, because users must focus on the screen plane (accommodation) while converging their eyes on a point in space in a different plane (convergence). Holoscopy is a 3D technology, recently developed at Brunel University, that aims to overcome these limitations of current 3D technology. This work is part W4.1 of the 3D VIVANT project, which is funded by the EU under the ICT program and coordinated by Dr. Aman Aggoun at Brunel University, West London, UK. The objective of the work described in this thesis is to develop estimation and segmentation techniques that are capable of estimating precise 3D depth and are applicable to holoscopic 3D imaging systems. Particular emphasis is given to automatic techniques, i.e. algorithms with broad generalisation abilities, as no constraints are placed on the setting. Such algorithms should be invariant to most appearance-based variations of objects in the scene (e.g. viewpoint changes, deformable objects, presence of noise and changes in lighting). Moreover, they should be able to estimate depth information from both types of holoscopic 3D images, i.e. unidirectional and omnidirectional, which give horizontal parallax and full parallax (vertical and horizontal), respectively. The main aim of this research is to develop 3D depth estimation and 3D image segmentation techniques with high precision.
In particular, emphasis is placed on automating thresholding techniques and cue identification for the development of robust algorithms. A depth-through-disparity feature analysis method has been built on the correlation that exists between pixels one micro-lens pitch apart, which has been exploited to extract the viewpoint images (VPIs). The corresponding displacement among the VPIs has been exploited to estimate the depth information map by setting and extracting reliable sets of local features. Feature-based-point and feature-based-edge are two novel automatic thresholding techniques for detecting and extracting features that have been used in this approach. These techniques offer a solution to the problem of setting and extracting reliable features automatically, improving the generalisation, speed and quality of the depth estimation. Because of the limited resolution of the extracted VPIs, obtaining an accurate 3D depth map is challenging. Therefore, sub-pixel shift and integration, a novel interpolation technique, has been used in this approach to generate super-resolution VPIs. By shifting and integrating a set of up-sampled low-resolution VPIs, the new information contained in each viewpoint is exploited to obtain a super-resolution VPI. This produces a high-resolution perspective VPI with a wide Field Of View (FOV), which means that the holoscopic 3D image system can be converted into a multi-view 3D image pixel format. Both depth accuracy and fast execution have been achieved, improving the 3D depth map. For a 3D object to be recognized, the related foreground regions and depth information map need to be identified. Two novel unsupervised segmentation methods that generate interactive depth maps from single-viewpoint segmentation were developed.
Both techniques improve on existing methods because they are simple to use and fully automatic, producing the interactive 3D depth map without human interaction. The final contribution is a performance evaluation that provides an equitable measure of the success of the proposed techniques for foreground object segmentation, interactive 3D depth map creation and 2D super-resolution viewpoint generation. No-reference image quality assessment metrics, and their correlation with human perception of quality as judged subjectively by human participants, are used.
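The viewpoint-image (VPI) extraction mentioned above relies on the fact that pixels lying exactly one micro-lens pitch apart record the scene from the same direction. A minimal sketch, assuming an ideal, already-aligned lens grid whose pitch is an integer number of pixels (real holoscopic images need calibration for grid rotation and non-integer pitch, which is not shown), gathers every `pitch`-th pixel into one VPI. The helper names are hypothetical, and the thesis's sub-pixel shift-and-integration super-resolution step is not reproduced.

```python
def extract_vpis(image, pitch):
    """Unidirectional case (horizontal parallax only).

    image is a list of pixel rows; every micro-lens is assumed to span exactly
    `pitch` pixel columns, starting at column 0. VPI k collects the k-th column
    under every lens, i.e. pixels exactly one micro-lens pitch apart.
    """
    if len(image[0]) % pitch:
        raise ValueError("image width must be a multiple of the lens pitch")
    return [[row[k::pitch] for row in image] for k in range(pitch)]


def extract_vpis_2d(image, pitch):
    """Omnidirectional case (full parallax): square pitch x pitch lenses give pitch**2 VPIs."""
    if len(image) % pitch or len(image[0]) % pitch:
        raise ValueError("image size must be a multiple of the lens pitch")
    return [[row[k::pitch] for row in image[j::pitch]]
            for j in range(pitch) for k in range(pitch)]
```

Each VPI is a low-resolution perspective view (its width is the number of lenses), which is why the abstract turns to super-resolution before depth estimation.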
A family of stereoscopic image compression algorithms using wavelet transforms
With the standardization of JPEG-2000, wavelet-based image and video compression technologies are gradually replacing the popular DCT-based methods. In parallel, recent developments in autostereoscopic display technology are now threatening to revolutionize the way consumers enjoy traditional 2D display-based electronic media such as television, computers and movies. However, because stereoscopic imaging doubles the bandwidth/storage space requirement, efficient data compression is an essential requirement of a stereo imaging system.
In this thesis, seven wavelet-based stereo image compression algorithms are proposed to take advantage of the higher data compaction capability and greater flexibility of wavelets. In the proposed CODEC I, block-based disparity estimation/compensation (DE/DC) is performed in the pixel domain. However, this results in an inefficiency when the DWT is applied to the whole predictive error image that results from the DE process, because of the artificial block boundaries between error blocks in the predictive error image. To overcome this problem, the remaining proposed CODECs perform DE/DC in the wavelet domain. Owing to the multiresolution nature of the wavelet domain, two methods of disparity estimation and compensation have been proposed. The first method performs DE/DC in each subband of the lowest/coarsest resolution level and then propagates the disparity vectors obtained to the corresponding subbands of higher/finer resolution. Note that DE is not performed in every subband because of the high number of overhead bits that coding the disparity vectors of all subbands could require. This method is used in CODEC II. In the second method, DE/DC is performed in the wavelet-block domain. This enables disparity estimation to be performed in all subbands simultaneously without increasing the overhead bits required for coding the disparity vectors. This method is used by CODEC III. Performing disparity estimation/compensation in all subbands results in a significant improvement for CODEC III. To further improve the performance of CODEC III, a pioneering wavelet-block search technique is implemented in CODEC IV. This technique enables the right/predicted image to be reconstructed at the decoder without transmitting the disparity vectors. In the proposed CODEC V, pioneering block search is performed in all subbands of the DWT decomposition, which improves its performance. In addition, CODECs IV and V are able to operate at very low bit rates (< 0.15 bpp). In CODEC VI and CODEC VII, Overlapped Block Disparity Compensation (OBDC) is used with and without the need to code disparity vectors. Our experimental results showed that these CODECs yield no significant coding gains over CODECs IV and V.
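Block-based disparity estimation/compensation in the pixel domain, as used in CODEC I, can be illustrated with a sum-of-absolute-differences (SAD) search. This is a minimal sketch, assuming rectified images (so disparity is purely horizontal and non-negative) given as lists of pixel rows; the block size, search range and function names are hypothetical, and pixels outside complete blocks are simply left unpredicted. Because each block of the prediction-error image is formed with its own disparity, neighbouring error blocks can differ sharply, which is the source of the artificial block boundaries described above.

```python
def block_disparity(left, right, block=4, max_disp=8):
    """Pixel-domain block matching: predict `right` from the reference `left`.

    For each block x block tile of the right image, horizontal shifts
    0..max_disp are tried in the left image and the one with the smallest sum
    of absolute differences (SAD) is kept. Returns {(row, col): disparity}.
    """
    h, w = len(right), len(right[0])
    vectors = {}
    for by in range(0, h - block + 1, block):
        for bx in range(0, w - block + 1, block):
            best_d, best_sad = 0, float("inf")
            for d in range(max_disp + 1):
                if bx + d + block > w:  # shifted block would fall outside the left image
                    break
                sad = sum(abs(right[by + i][bx + j] - left[by + i][bx + d + j])
                          for i in range(block) for j in range(block))
                if sad < best_sad:
                    best_d, best_sad = d, sad
            vectors[(by, bx)] = best_d
    return vectors


def disparity_compensate(left, right, vectors, block=4):
    """Return the prediction-error (residual) image, right minus its prediction.

    Pixels not covered by any block keep their original values, i.e. they are
    treated as unpredicted.
    """
    residual = [row[:] for row in right]
    for (by, bx), d in vectors.items():
        for i in range(block):
            for j in range(block):
                residual[by + i][bx + j] = right[by + i][bx + j] - left[by + i][bx + d + j]
    return residual
```

In a codec along these lines, the disparity vectors are entropy-coded as side information and the residual image is what the transform coder compresses.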
All CODECs proposed in this thesis are wavelet-based stereo image coding algorithms that maximise the flexibility and benefits offered by wavelet transform technology when applied to stereo imaging. In addition, the use of a baseline-JPEG coding architecture enables the proposed algorithms to be easily adapted within systems originally built for DCT-based coding. This is an important feature in an era in which DCT-based technology is only slowly being phased out in favour of DWT-based compression technology.
In addition, this thesis proposes a stereo image coding algorithm that uses JPEG-2000 technology as the basic compression engine. The proposed CODEC, named RASTER, is a rate-scalable stereo image CODEC with a unique ability to preserve image quality at binocular depth boundaries, an important requirement in the design of stereo image CODECs. The experimental results show that the proposed CODEC achieves PSNR gains of up to 3.7 dB compared with directly transmitting the right frame using JPEG-2000.
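PSNR, the figure of merit behind the reported 3.7 dB gain, can be computed as follows for 8-bit images. This is the standard definition sketched with a hypothetical function name and images given as lists of pixel rows; it is not taken from the thesis.

```python
import math


def psnr(reference, decoded, peak=255):
    """PSNR in dB between two equally sized images given as lists of rows."""
    sq_errors = [(a - b) ** 2
                 for ref_row, dec_row in zip(reference, decoded)
                 for a, b in zip(ref_row, dec_row)]
    mse = sum(sq_errors) / len(sq_errors)
    return math.inf if mse == 0 else 10 * math.log10(peak ** 2 / mse)
```

Because PSNR is logarithmic in the mean squared error, a 3.7 dB gain at the same bit rate corresponds to an MSE roughly 10^0.37 ≈ 2.3 times lower.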
Autonomous vehicle guidance in unknown environments
Benefiting from significant performance advances brought about by technological evolution, autonomous vehicles are rapidly expanding their range of possible and effective applications. From operations in hostile, dangerous environments (military use in removing unexploded projectiles, surveys of nuclear power and chemical plants following accidents) to repetitive 24-hour tasks (border surveillance), and from power multipliers helping in production to less exotic commercial applications in household activities (cleaning robots as consumer electronics products), the combination of autonomy and motion now offers impressive options. In fact, an autonomous vehicle can be equipped with a number of sensors, actuators and devices that enable it to perform a large number of tasks. However, to achieve these results, the vehicle must be able to navigate its path in different, sometimes unknown, environments. This is the goal of this dissertation: to analyze and, mainly, to propose a suitable solution for the guidance of autonomous vehicles. This research was carried out within the activity of the Guidance and Navigation Lab of Sapienza – Università di Roma, hosted at the School of Aerospace Engineering. Indeed, the proposed solution has an intrinsic, though not limiting, bias towards possible space applications, which will become evident in some of the following content. A second bias dictated by the Guidance and Navigation Lab activities is the choice of a sample platform.
In fact, it would be difficult to perform a meaningful study at a very general level, independent of the characteristics of the targeted kind of vehicle: the rough list of applications cited above shows that these characteristics are extremely varied. Even before this thesis activity began, the Lab hosted a simple, home-designed and home-manufactured model of a small yet sufficiently capable autonomous vehicle, called RAGNO (Rover for Autonomous Guidance Navigation and Observation). It was an obvious choice to select that rover as the reference platform for identifying guidance solutions, and to use it, while contributing to its improvement, for the test activities that should be considered mandatory in this kind of thesis work to validate the suggested approaches.
The draft of the thesis includes four main chapters, plus an introduction, final remarks and future perspectives, and the list of references.
The first chapter (“Autonomous Guidance Exploiting Stereoscopic Vision”) investigates in detail the technique deemed most interesting for small vehicles. The current availability of low-cost, high-performance cameras suggests adopting stereoscopic vision as an effective technique, one that can also provide a remote crew with a view of the scenario quite similar to the one humans would have. Several advanced image analysis techniques have been investigated for extracting features from the left- and right-eye images, with the SURF and BRISK algorithms selected as the most promising. In short, SURF is a blob detector with an associated 64-element descriptor, where the generic feature is extracted by applying sequential box filters to the surrounding area. The features are then localized at the points of the image where the determinant of the Hessian matrix H(x, y) is maximal. The descriptor vector is then determined by calculating the Haar wavelet responses in a sampling pattern centered on the feature. BRISK, instead, is a corner detector with an associated 512-bit binary descriptor. The generic feature is identified as the brightest point in a circular sampling area of N pixels, while the descriptor vector is calculated by computing the brightness gradient of each of the N(N-1)/2 pairs of sampling points. Once the left and right features have been extracted, their descriptors are compared to determine the corresponding pairs. The matching criterion seeks the two descriptors whose relative distance (Euclidean norm for SURF, Hamming distance for BRISK) is minimal.
The matching process is computationally expensive: to reduce the required time, the thesis successfully exploited epipolar geometry, i.e. the geometric constraint between the left and right projections of a scene point P, which limits the space to be searched. Overall, the selected techniques require between 200 and 300 ms on a 2.4 GHz CPU for feature extraction and matching in a single (left+right) capture, making them a feasible solution for slow-moving vehicles. Once the matching phase is complete, a disparity map highlighting the positions of the identified objects can be prepared, and the position and distance of the obstacles can be obtained by triangulation (the baseline between the two cameras is known, and the size of the targeted object is measured in pixels in both images).
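The matching and triangulation steps can be illustrated with a small sketch. It assumes a rectified stereo pair, so that each epipolar line is an image row and the search for a left feature's partner can be restricted to right features on (almost) the same row; descriptors are BRISK-style binary strings stored as Python integers and compared with the Hamming distance (SURF descriptors would use the Euclidean norm instead). Depth then follows from the textbook rectified-stereo relation Z = fB/d, with f the focal length in pixels, B the baseline and d the disparity. Feature detection is not shown, and all names, tolerances and numbers are hypothetical; the thesis's own implementation may differ.

```python
def hamming(a: int, b: int) -> int:
    """Hamming distance between two binary descriptors stored as Python ints."""
    return bin(a ^ b).count("1")


def match_rectified(left_feats, right_feats, row_tol=1.0):
    """Nearest-neighbour matching restricted by the epipolar constraint.

    Each feature is (x, y, descriptor). In a rectified pair a left feature can
    only match right features within row_tol pixels of its row and with a
    positive disparity xl - xr. Returns (left_index, right_index, disparity).
    """
    matches = []
    for i, (xl, yl, dl) in enumerate(left_feats):
        candidates = [(hamming(dl, dr), j, xl - xr)
                      for j, (xr, yr, dr) in enumerate(right_feats)
                      if abs(yl - yr) <= row_tol and xl - xr > 0]
        if candidates:
            _, j, disparity = min(candidates)  # smallest Hamming distance wins
            matches.append((i, j, disparity))
    return matches


def depth_from_disparity(disparity, focal_px, baseline_m):
    """Triangulated depth Z = f * B / d for a rectified stereo pair."""
    return focal_px * baseline_m / disparity
```

Restricting candidates to one row is what cuts the matching cost: each left feature is compared with the handful of right features on its epipolar line rather than with every feature in the image.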
The second chapter (“A Vehicle Prototype and its Guidance System”) is devoted to implementing stereoscopic vision onboard a small test vehicle, the previously cited RAGNO rover. A description of the vehicle is included first: the chassis, the propulsion system with four electric motors driving the wheels, the good roadside performance attainable, and the command options (fully autonomous, partly autonomous with remote monitoring, or fully remote-controlled via TCP/IP over mobile networks), with a focus on the different sensors that, depending on the scenario, can complement the stereoscopic vision system. The intelligence side of the guidance subsystem, which exploits the navigation information provided by the camera, is then detailed. Two guidance techniques have been studied and implemented to identify the optimal trajectory in a field with scattered obstacles: artificial potential guidance, based on the Lyapunov approach, and the A-star (A*) algorithm, which looks for the minimum of a cost function built on graphs joining the cells of a mesh superimposed on the scenario. The performance of the two techniques is assessed in two specific test cases, highlighting the possibility of unstable behavior in the artificial potential guidance, which can bounce among local minima. Overall, A-star guidance is the suggested solution in terms of time, cost and reliability. To cope with the noise affecting sensor information, an estimation process based on Kalman filtering has also been included to improve the smoothness of the targeted trajectory.
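A grid-based A* search of the kind described above can be sketched as follows. The sketch assumes a 4-connected occupancy grid (True marks a cell occupied by an obstacle, for example one located by the stereo triangulation), unit move costs and the Manhattan distance as the admissible heuristic; these are illustrative choices, not the cost function used on RAGNO, and the function name is hypothetical.

```python
import heapq


def a_star(grid, start, goal):
    """A* on a 4-connected occupancy grid (True = obstacle).

    Moves cost 1 and the Manhattan distance to the goal is the heuristic, which
    never overestimates on such a grid, so the returned path is shortest.
    Returns the list of (row, col) cells from start to goal, or None.
    """
    rows, cols = len(grid), len(grid[0])

    def h(cell):
        return abs(cell[0] - goal[0]) + abs(cell[1] - goal[1])

    open_heap = [(h(start), 0, start)]  # entries: (f = g + h, g, cell)
    came_from = {start: None}
    g_cost = {start: 0}
    while open_heap:
        _, g, cell = heapq.heappop(open_heap)
        if cell == goal:
            path = []
            while cell is not None:  # walk the parent links back to start
                path.append(cell)
                cell = came_from[cell]
            return path[::-1]
        if g > g_cost[cell]:
            continue  # stale entry: a cheaper route to this cell was found later
        r, c = cell
        for nxt in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
            nr, nc = nxt
            if 0 <= nr < rows and 0 <= nc < cols and not grid[nr][nc]:
                ng = g + 1
                if ng < g_cost.get(nxt, float("inf")):
                    g_cost[nxt] = ng
                    came_from[nxt] = cell
                    heapq.heappush(open_heap, (ng + h(nxt), ng, nxt))
    return None
```

Unlike a potential-field controller, this search cannot get trapped in a local minimum: if any obstacle-free path exists on the grid, it is found.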
The third chapter (“Examples of Possible Missions and Applications”) reports two experimental campaigns using RAGNO to detect dangerous gases. In the first, the rover carries a specific sensor and moves autonomously in open fields, avoiding obstacles, to take measurements at given time intervals. The same RAGNO configuration is also used in the second campaign; this time, however, the rover's path is computed autonomously from waypoints communicated by a drone that flies above the measurement area and identifies possible targets of interest.
The fourth chapter (“Guidance of Fleet of Autonomous Vehicles”) develops this successful idea of a fleet of vehicles further, using purpose-written Matlab algorithms to investigate numerically the performance of a simple swarm of two rovers exploring an unknown scenario, intended, as an example, to represent planetary surface exploration. The awareness of the surrounding environment is dictated by the characteristics of the sensors carried onboard, which have been assumed on the basis of the experience gained in the previous chapters. Moreover, the communication issues likely to affect real-world cases are included by modelling the communication link and by running the simulation in a multi-task configuration in which the two rovers are assigned to two different computer processes, each with its own TCP/IP address and with a behavior that depends on the flow of information received from the other explorer. Although only at simulation level, this final step is deemed to bring together the different aspects investigated during the PhD: feasible sensor characteristics (focusing on stereoscopic vision), guidance technique, coordination among autonomous agents and possible application cases of interest.
Computerised stereoscopic measurement of the human retina
The research described herein investigates the problems of obtaining useful clinical measurements from stereo photographs of the human retina by automating the stereometric procedure with digital stereo matching and image analysis techniques. Clinical research has indicated a correlation between physical changes to the optic disc topography (the region on the retina where the optic nerve enters the eye) and the advance of diseases such as hypertension and glaucoma. Stereoscopic photography of the human retina (or fundus, as it is called) and the subsequent measurement of the topography of the optic disc are of great potential clinical value as an aid in observing the pathogenesis of such diseases, and to this end, accurate measurements of the various parameters that characterise the changing shape of the optic disc topography must be provided. Following a survey of current clinical methods for stereoscopic measurement of the optic disc, fundus image data acquisition, stereo geometry, limitations of resolution and accuracy, and other relevant physical constraints related to fundus imaging are investigated. A survey of digital stereo matching algorithms is presented and their strengths and weaknesses are explored, specifically with respect to their suitability for fundus image data. The selection of an appropriate stereo matching algorithm is discussed, and its application to four test data sets is presented in detail. A mathematical model of two-dimensional image formation is developed together with its corresponding autocorrelation function. In the presence of additive noise, the model is used as a tool for exploring key problems in the stereo matching of fundus images. Specifically, measures for predicting correlation matching error are developed and applied.
Such measures are shown to be useful in applications where the results of image correlation cannot be independently verified and meaningful quantitative error measures are required. Applying these theoretical tools to the fundus image data indicates a systematic way to measure, assess and control cross-correlation error. The conclusions drawn from this research point the way forward for stereo analysis of the optic disc and highlight a number of areas that will require further research. The development of a fully automated system for diagnostic evaluation of the optic disc topography is discussed in the light of the results obtained during this research.
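Area-based stereo matching by cross-correlation can be sketched as below: for a pixel in the left image, a square window is compared with windows shifted along the same row of the right image, and the shift with the highest zero-mean normalised cross-correlation (NCC) is taken as the disparity. This assumes rectified images given as lists of pixel rows and integer disparities, with the window lying inside the image; the window size, search range and names are hypothetical. The peak NCC value is also returned, since a low peak is a common informal warning sign of an unreliable match; the thesis's own measures for predicting correlation error are not reproduced here.

```python
import math


def ncc(a, b):
    """Zero-mean normalised cross-correlation of two equal-length sequences, in [-1, 1]."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    num = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    den = math.sqrt(sum((x - ma) ** 2 for x in a) * sum((y - mb) ** 2 for y in b))
    return num / den if den else 0.0  # flat windows carry no correlation information


def window(img, r, c, half):
    """Flatten the (2*half + 1) x (2*half + 1) window centred on (r, c)."""
    return [img[i][j] for i in range(r - half, r + half + 1)
            for j in range(c - half, c + half + 1)]


def match_ncc(left, right, r, c, half=1, max_disp=5):
    """Disparity at left pixel (r, c): the shift along the same row maximising NCC."""
    ref = window(left, r, c, half)
    best_d, best_score = 0, -2.0
    for d in range(max_disp + 1):
        if c - d - half < 0:  # candidate window would leave the right image
            break
        score = ncc(ref, window(right, r, c - d, half))
        if score > best_score:
            best_d, best_score = d, score
    return best_d, best_score
```

The zero-mean normalisation makes the score insensitive to brightness and contrast differences between the two photographs, which matters for fundus image pairs taken under uneven illumination.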
Change blindness: eradication of gestalt strategies
Arrays of eight texture-defined rectangles were used as stimuli in a one-shot change blindness (CB) task in which there was a 50% chance that one rectangle would change orientation between two successive presentations separated by an interval. CB was eliminated by cueing the target rectangle in the first stimulus, reduced by cueing in the interval and unaffected by cueing in the second presentation. This supports the idea that a representation was formed that persisted through the interval before being 'overwritten' by the second presentation (Landman et al., 2003, Vision Research 43, 149–164). Another possibility is that participants used some kind of grouping or Gestalt strategy. To test this, we changed the spatial positions of the rectangles in the second presentation by shifting them along imaginary spokes (by ±1 degree) emanating from the central fixation point. There was no significant difference in performance between this and the standard task [F(1,4)=2.565, p=0.185]. This may suggest two things: (i) Gestalt grouping is not used as a strategy in these tasks, and (ii) the result gives further weight to the argument that objects may be stored in and retrieved from a pre-attentional store during this task.
Stereoscopic vision in vehicle navigation.
Traffic sign (TS) detection and tracking, a problem addressed in the field of computer vision, is one of the main tasks of an autonomous vehicle. An autonomous vehicle must have vision-based recognition of the road in order to follow the rules like every other vehicle on the road. In addition, TS detection and tracking can be used to give feedback to the driver, which can significantly increase safety when making driving decisions. For successful TS detection and tracking, changes in weather and lighting conditions must be considered. Also, the camera is in motion, which results in image distortion and motion blur. In this work, a fast and robust method is proposed for tracking stop signs in videos taken with stereoscopic cameras mounted on the car. Using the camera parameters and the detected sign, the distance between the stop sign and the vehicle is calculated. This calculated distance can be widely used in building visual driver-assistance systems.