1,084 research outputs found
Predicting Slice-to-Volume Transformation in Presence of Arbitrary Subject Motion
This paper aims to solve a fundamental problem in intensity-based 2D/3D
registration, which concerns the limited capture range and need for very good
initialization of state-of-the-art image registration methods. We propose a
regression approach that learns to predict the rotation and translation of
arbitrary 2D image slices from 3D volumes, with respect to a learned canonical
atlas co-ordinate system. To this end, we utilize Convolutional Neural Networks
(CNNs) to learn the highly complex regression function that maps 2D image
slices into their correct position and orientation in 3D space. Our approach is
attractive in challenging imaging scenarios, where significant subject motion
complicates reconstruction performance of 3D volumes from 2D slice data. We
extensively evaluate the effectiveness of our approach quantitatively on
simulated MRI brain data with extreme random motion. We further demonstrate
qualitative results on fetal MRI where our method is integrated into a full
reconstruction and motion compensation pipeline. With our CNN regression
approach we obtain an average prediction error of 7mm on simulated data, and
convincing reconstruction quality of images of very young fetuses where
previous methods fail. We further discuss applications to Computed Tomography
and X-ray projections. Our approach is a general solution to the 2D/3D
initialization problem. It is computationally efficient, with prediction times
per slice of a few milliseconds, making it suitable for real-time scenarios.
Comment: 8 pages, 4 figures, 6 pages supplemental material, currently under
review for MICCAI 201
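The predicted rotations and translations can be assembled into a rigid slice-to-volume transform that places each 2D slice in the atlas coordinate system. A minimal numpy sketch, assuming a ZYX Euler-angle parameterization (the abstract does not state the paper's exact rotation convention):

```python
import numpy as np

def pose_to_matrix(angles_deg, translation_mm):
    """Compose a 4x4 rigid transform from predicted Euler angles (degrees)
    and a translation (mm). The ZYX Euler convention here is an assumption,
    not necessarily the one used in the paper."""
    ax, ay, az = np.radians(angles_deg)
    Rx = np.array([[1, 0, 0],
                   [0, np.cos(ax), -np.sin(ax)],
                   [0, np.sin(ax),  np.cos(ax)]])
    Ry = np.array([[ np.cos(ay), 0, np.sin(ay)],
                   [0, 1, 0],
                   [-np.sin(ay), 0, np.cos(ay)]])
    Rz = np.array([[np.cos(az), -np.sin(az), 0],
                   [np.sin(az),  np.cos(az), 0],
                   [0, 0, 1]])
    T = np.eye(4)
    T[:3, :3] = Rz @ Ry @ Rx
    T[:3, 3] = translation_mm
    return T

# Map a slice pixel (u, v), treated as a point on the z=0 slice plane,
# into atlas coordinates via the predicted pose.
T = pose_to_matrix([10.0, -5.0, 30.0], [2.0, -1.0, 4.0])
p_atlas = T @ np.array([12.0, 7.0, 0.0, 1.0])
```

A transform of this form initialized per slice is what allows a conventional intensity-based registration to refine from within its capture range.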
Resolving the projection of an occluded stimulus on the human cortical surface
The human visual system is capable of tracking multiple visual targets under a variety of task constraints and configurations. For nearly two decades, the psychophysical literature has shown that moving, occluded visual targets -- targets that are momentarily invisible as they pass behind an occluding bar -- are differentially represented by the visual system compared to their moving, non-occluded counterparts. Here, I sought to examine the neurophysiological basis of this behavioral difference in response to occluded versus non-occluded visual targets. I used brain imaging to conduct modern retinotopic mapping experiments in human participants. Once their early visual cortices were mapped, I was able to characterize the neural representations for both targets and distractors, as well as during moments of occlusion and non-occlusion. The results show that, using our method, we can distinguish visual targets from distractors; furthermore, there appears to be a representation in retinotopically organized early visual cortex for visual targets that have momentarily disappeared from the visual field due to occlusion.
Self-similarity and wavelet forms for the compression of still image and video data
This thesis is concerned with the methods used to reduce the data volume required to represent
still images and video sequences. The number of disparate still image and video
coding methods increases almost daily. Recently, two new strategies have emerged and
have stimulated widespread research. These are the fractal method and the wavelet transform.
In this thesis, it will be argued that the two methods share a common principle: that
of self-similarity. The two will be related concretely via an image coding algorithm which
combines the two, normally disparate, strategies.
The wavelet transform is an orientation selective transform. It will be shown that the
selectivity of the conventional transform is not sufficient to allow exploitation of self-similarity
while keeping computational cost low. To address this, a new wavelet transform
is presented which allows for greater orientation selectivity, while maintaining the
orthogonality and data volume of the conventional wavelet transform. Many designs for
vector quantizers have been published recently and another is added to the gamut by this
work. The tree structured vector quantizer presented here is on-line and self structuring,
requiring no distinct training phase. Combining these into a still image data compression
system produces results which are among the best that have been published to date.
An extension of the two dimensional wavelet transform to encompass the time dimension
is straightforward and this work attempts to extrapolate some of its properties into three
dimensions. The vector quantizer is then applied to three dimensional image data to
produce a video coding system which, while not optimal, produces very encouraging
results
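The conventional separable wavelet transform that the thesis extends can be illustrated with a single Haar decomposition level. This is a generic sketch of the standard transform, not the more orientation-selective one proposed in the thesis:

```python
import numpy as np

def haar2d(img):
    """One level of the separable 2D Haar wavelet transform.
    Returns the four subbands (LL, LH, HL, HH) for an image with
    even height and width."""
    a = np.asarray(img, dtype=float)
    # Filter along rows: low-pass and high-pass pairs of adjacent columns.
    lo = (a[:, 0::2] + a[:, 1::2]) / np.sqrt(2)
    hi = (a[:, 0::2] - a[:, 1::2]) / np.sqrt(2)
    # Filter along columns of each result.
    LL = (lo[0::2, :] + lo[1::2, :]) / np.sqrt(2)
    HL = (lo[0::2, :] - lo[1::2, :]) / np.sqrt(2)
    LH = (hi[0::2, :] + hi[1::2, :]) / np.sqrt(2)
    HH = (hi[0::2, :] - hi[1::2, :]) / np.sqrt(2)
    return LL, LH, HL, HH
```

Because the transform is orthogonal, the subband energies sum to the image energy; the thesis's contribution is a transform with greater orientation selectivity that retains this orthogonality and the same data volume.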
Learning ultrasound plane pose regression: assessing generalized pose coordinates in the fetal brain
In obstetric ultrasound (US) scanning, the learner's ability to mentally
build a three-dimensional (3D) map of the fetus from a two-dimensional (2D) US
image represents a significant challenge in skill acquisition. We aim to build
a US plane localization system for 3D visualization, training, and guidance
without integrating additional sensors. This work builds on top of our previous
work, which predicts the six-dimensional (6D) pose of arbitrarily-oriented US
planes slicing the fetal brain with respect to a normalized reference frame
using a convolutional neural network (CNN) regression network. Here, we analyze
in detail the assumptions of the normalized fetal brain reference frame and
quantify its accuracy with respect to the acquisition of transventricular (TV)
standard plane (SP) for fetal biometry. We investigate the impact of
registration quality in the training and testing data and its subsequent effect
on trained models. Finally, we introduce data augmentations and larger training
sets that improve the results of our previous work, achieving median errors of
3.53 mm and 6.42 degrees for translation and rotation, respectively.
Comment: 12 pages, 9 figures, 2 tables. This work has been submitted to the
IEEE for possible publication (IEEE TMRB). Copyright may be transferred
without notice, after which this version may no longer be accessible
Automatic face recognition using stereo images
Face recognition is an important pattern recognition problem, in the study of both natural and artificial learning problems. Compared to other biometrics, it is non-intrusive, non-invasive and requires no participation from the subjects. As a result, it has many applications varying from human-computer-interaction to access control and law-enforcement to crowd surveillance. In typical optical image based face recognition systems, the systematic variability arising from representing the three-dimensional (3D) shape of a face by a two-dimensional (2D) illumination intensity matrix is treated as random variability. Multiple examples of the face displaying varying pose and expressions are captured in different imaging conditions. The imaging environment, pose and expressions are strictly controlled and the images undergo rigorous normalisation and pre-processing. This may be implemented in a partially or a fully automated system. Although these systems report high classification accuracies (>90%), they lack versatility and tend to fail when deployed outside laboratory conditions. Recently, more sophisticated 3D face recognition systems harnessing the depth information have emerged. These systems usually employ specialist equipment such as laser scanners and structured light projectors. Although more accurate than 2D optical image based recognition, these systems are equally difficult to implement in a non-co-operative environment. Existing face recognition systems, both 2D and 3D, detract from the main advantages of face recognition and fail to fully exploit its non-intrusive capacity. This is either because they rely too much on subject co-operation, which is not always available, or because they cannot cope with noisy data. The main objective of this work was to investigate the role of depth information in face recognition in a noisy environment.
A stereo-based system, inspired by the human binocular vision, was devised using a pair of manually calibrated digital off-the-shelf cameras in a stereo setup to compute depth information. Depth values extracted from 2D intensity images using stereoscopy are extremely noisy, and as a result this approach for face recognition is rare. This was confirmed by the results of our experimental work. Noise in the set of correspondences, camera calibration and triangulation led to inaccurate depth reconstruction, which in turn led to poor classifier accuracy for both 3D surface matching and 2½D depth maps. Recognition experiments are performed on the Sheffield Dataset, consisting of 692 images of 22 individuals with varying pose, illumination and expressions.
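The noise sensitivity described above follows directly from the rectified pinhole stereo model, where depth is inversely proportional to disparity. A minimal sketch with hypothetical camera values (focal length, baseline, and disparity below are illustrative, not from the thesis):

```python
def depth_from_disparity(focal_px, baseline_m, disparity_px):
    """Rectified-stereo depth: Z = f * B / d.
    Because Z varies as 1/d, a small disparity error produces a depth
    error that grows roughly as Z^2 / (f * B), which is why stereo
    depth maps of distant or low-texture surfaces are so noisy."""
    if disparity_px <= 0:
        raise ValueError("disparity must be positive")
    return focal_px * baseline_m / disparity_px

# Hypothetical rig: 700 px focal length, 10 cm baseline, 35 px disparity.
z = depth_from_disparity(700.0, 0.1, 35.0)  # depth in metres
```

A one-pixel matching error at this depth already shifts the estimate by several centimetres, consistent with the poor classifier accuracy reported for the stereo depth maps.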
Line geometry and camera autocalibration
We provide a completely new rigorous matrix formulation of the absolute quadratic complex (AQC), given by the set of lines intersecting the absolute conic. The new results include closed-form expressions for the camera intrinsic parameters in terms of the AQC, an algorithm to obtain the dual absolute quadric from the AQC using straightforward matrix operations, and an equally direct computation of a Euclidean-upgrading homography from the AQC. We also completely characterize the 6×6 matrices acting on lines which are induced by a spatial homography. Several algorithmic possibilities arising from the AQC are systematically explored and analyzed in terms of efficiency and computational cost. Experiments include 3D reconstruction from real images.
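The AQC and the 6×6 matrices above act on the space of 3D lines, which are conventionally written in Plücker coordinates. A minimal sketch of that representation (the coordinate convention below is a standard choice, not specified in the abstract):

```python
import numpy as np

def plucker_from_points(p, q):
    """Plücker coordinates of the 3D line through points p and q,
    as the 6-vector (d, m) with direction d = q - p and moment m = p x q.
    Every valid line satisfies the Plücker constraint d . m = 0, which
    cuts out the 4D line manifold inside 5D projective space."""
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    d = q - p
    m = np.cross(p, q)
    return np.concatenate([d, m])
```

A spatial homography induces a linear map on these 6-vectors, which is the sense in which 6×6 matrices act on lines in the formulation above.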