31 research outputs found

    Video Deinterlacing using Control Grid Interpolation Frameworks

    Video deinterlacing is a key technique in digital video processing, particularly with the widespread use of LCD and plasma TVs. This thesis proposes a novel spatio-temporal, non-linear video deinterlacing technique that adaptively chooses between the results from one-dimensional control grid interpolation (1DCGI), the vertical temporal filter (VTF), and temporal line averaging (LA). The proposed method outperforms several popular benchmark methods in terms of both visual quality and peak signal-to-noise ratio (PSNR). The algorithm performs better than existing approaches such as edge-based line averaging (ELA) and spatio-temporal edge-based median filtering (STELA) on fine moving edges and semi-static regions of video, which are recognized as particularly challenging deinterlacing cases. The proposed approach also outperforms the state-of-the-art content-adaptive vertical temporal filtering (CAVTF) approach. Along with the main approach, several spin-off approaches are also proposed, each with its own characteristics. Dissertation/Thesis, M.S. Electrical Engineering, 201
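
    Since the thesis reports results in terms of PSNR, a minimal sketch of that metric may be useful; the function name and the 8-bit peak value of 255 are illustrative assumptions, not taken from the thesis.

        import numpy as np

        def psnr(reference: np.ndarray, test: np.ndarray, peak: float = 255.0) -> float:
            """Peak signal-to-noise ratio (dB) between a reference frame and a test frame."""
            mse = np.mean((reference.astype(np.float64) - test.astype(np.float64)) ** 2)
            if mse == 0:
                return float("inf")  # identical frames
            return 10.0 * np.log10(peak ** 2 / mse)

        # Toy usage: compare a frame against a noisy copy of itself.
        rng = np.random.default_rng(0)
        frame = rng.integers(0, 256, size=(480, 720), dtype=np.uint8)
        noisy = np.clip(frame + rng.normal(0.0, 5.0, frame.shape), 0, 255).astype(np.uint8)
        print(f"PSNR: {psnr(frame, noisy):.2f} dB")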

    Mobility of Nano-Particles in Rock Based Micro-Models

    A confocal micro-particle image velocimetry (C-μPIV) technique, along with associated post-processing algorithms, is detailed for obtaining three-dimensional distributions of nano-particle velocity and concentration at select locations of 2.5D (pseudo-3D) poly(methyl methacrylate) (PMMA) and ceramic micro-models. The designed and fabricated 2.5D micro-model incorporates microchannel networks with 3D wall structures, with one observation wall, and matches fourteen morphological and flow parameters of fully 3D actual reservoir rock (Boise Sandstone) at resolutions of 5 and 10 μm in depth and 5 and 25 μm in plane. In addition, an in-situ, non-destructive method for measuring the geometry of low- and high-resolution PMMA and ceramic micro-models, including their depth, is described and demonstrated. The flow experiments use 860 nm and 300 nm fluorescence-labeled polystyrene particles, and the data are acquired using confocal laser scanning microscopy. Regular fluorescence microscopy is used for the in-situ geometry measurement, together with Rhodamine dye and a depth-to-fluorescence-intensity calibration, which is linear. Monochromatic excitation at a wavelength of 544 nm (green), produced by a HeNe continuous-wave laser, was used to excite the fluorescence-labeled nanoparticles emitting at 612 nm (red). Confocal images were captured by a highly sensitive photomultiplier-tube fluorescence detector. Results of detailed three-dimensional velocity distributions, particle concentration distributions, and particle deposition rates from experiments conducted at flow rates of 0.5 nL/min, 1 nL/min, 10 nL/min, and 100 nL/min are presented and discussed. The three-dimensional micro-model geometry reconstructed from the fluorescence data is used as the computational domain for numerical simulations of the flow in the as-tested micro-model, based on a dimensionless Navier-Stokes model, for comparison with the experimental results. The flow simulation results are also compared qualitatively with the velocity distributions of the flowing particles at selected locations. The comparison is qualitative because the particles used in these experiments may not accurately follow the flow, given their size relative to the geometry of the micro-models. These larger particles were used for proof-of-concept purposes, and the techniques and algorithms used permit future use of particles as small as 50 nm.
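
    The abstract mentions a linear depth-to-fluorescence-intensity calibration; the sketch below illustrates how such a calibration could be fitted and inverted with a least-squares line. The depth and intensity values are hypothetical, not measurements from this work.

        import numpy as np

        # Hypothetical calibration points: channel depth (um) versus measured Rhodamine
        # fluorescence intensity (arbitrary units) -- not data from the paper.
        depths_um = np.array([5.0, 10.0, 15.0, 20.0, 25.0])
        intensities = np.array([112.0, 221.0, 335.0, 448.0, 559.0])

        # Fit the linear depth-to-intensity relation I = a * d + b described in the abstract.
        a, b = np.polyfit(depths_um, intensities, deg=1)

        # Invert the calibration to estimate an unknown depth from a measured intensity.
        measured_intensity = 390.0
        estimated_depth = (measured_intensity - b) / a
        print(f"slope = {a:.2f}, intercept = {b:.2f}, estimated depth = {estimated_depth:.1f} um")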

    DeepLab: Semantic Image Segmentation with Deep Convolutional Nets, Atrous Convolution, and Fully Connected CRFs

    In this work we address the task of semantic image segmentation with Deep Learning and make three main contributions that are experimentally shown to have substantial practical merit. First, we highlight convolution with upsampled filters, or 'atrous convolution', as a powerful tool in dense prediction tasks. Atrous convolution allows us to explicitly control the resolution at which feature responses are computed within Deep Convolutional Neural Networks. It also allows us to effectively enlarge the field of view of filters to incorporate larger context without increasing the number of parameters or the amount of computation. Second, we propose atrous spatial pyramid pooling (ASPP) to robustly segment objects at multiple scales. ASPP probes an incoming convolutional feature layer with filters at multiple sampling rates and effective fields of view, thus capturing objects as well as image context at multiple scales. Third, we improve the localization of object boundaries by combining methods from DCNNs and probabilistic graphical models. The commonly deployed combination of max-pooling and downsampling in DCNNs achieves invariance but takes a toll on localization accuracy. We overcome this by combining the responses at the final DCNN layer with a fully connected Conditional Random Field (CRF), which is shown both qualitatively and quantitatively to improve localization performance. Our proposed "DeepLab" system sets the new state of the art on the PASCAL VOC-2012 semantic image segmentation task, reaching 79.7% mIOU on the test set, and advances the results on three other datasets: PASCAL-Context, PASCAL-Person-Part, and Cityscapes. All of our code is made publicly available online. (Comment: Accepted by TPAMI.)
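
    As a rough illustration of the atrous convolution idea described above (sampling the input with gaps between kernel taps to enlarge the field of view without adding parameters), here is a minimal single-channel NumPy sketch; it is not the DeepLab implementation, and the function name is an assumption.

        import numpy as np

        def atrous_conv2d(x: np.ndarray, kernel: np.ndarray, rate: int) -> np.ndarray:
            """Single-channel 2D atrous (dilated) convolution with a 'same'-sized output.

            Inserting rate - 1 zeros between kernel taps enlarges the field of view of a
            k x k kernel to k + (k - 1) * (rate - 1) without adding parameters.
            """
            kh, kw = kernel.shape
            pad_h = (kh - 1) * rate // 2
            pad_w = (kw - 1) * rate // 2
            xp = np.pad(x, ((pad_h, pad_h), (pad_w, pad_w)))
            out = np.zeros_like(x, dtype=np.float64)
            for i in range(kh):          # accumulate one shifted copy of the input per kernel tap
                for j in range(kw):
                    out += kernel[i, j] * xp[i * rate:i * rate + x.shape[0],
                                             j * rate:j * rate + x.shape[1]]
            return out

        # A 3x3 kernel with rate 2 covers a 5x5 neighbourhood with only 9 weights.
        image = np.random.default_rng(0).random((8, 8))
        kernel = np.ones((3, 3)) / 9.0
        print(atrous_conv2d(image, kernel, rate=2).shape)  # (8, 8)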

    Image Registration to Map Endoscopic Video to Computed Tomography for Head and Neck Radiotherapy Patients

    The purpose of this work was to explore the feasibility of registering endoscopic video to radiotherapy treatment plans for patients with head and neck cancer without physical tracking of the endoscope during the examination. Endoscopy-CT registration would provide a clinical tool that could be used to enhance the treatment planning process and would allow for new methods to study the incidence of radiation-related toxicity. Endoscopic video frames were registered to CT by optimizing virtual endoscope placement to maximize the similarity between the frame and the virtual image. Virtual endoscopic images were rendered using a polygonal mesh created by segmenting the airways of the head and neck with a density threshold. The optical properties of the virtual endoscope were matched to a calibrated model of the real endoscope. A novel registration algorithm was developed that takes advantage of physical constraints on the endoscope to effectively search the airways of the head and neck for the desired virtual endoscope coordinates. This algorithm was tested on rigid phantoms with embedded point markers and protruding bolus material. In these tests, the median registration accuracy was 3.0 mm for point measurements and 3.5 mm for surface measurements. The algorithm was also tested on four endoscopic examinations of three patients, in which it achieved a median registration accuracy of 9.9 mm. The uncertainties caused by the non-rigid anatomy of the head and neck and by differences in patient positioning between endoscopic examinations and CT scans were examined by taking repeated measurements after placing the virtual endoscope in surface meshes created from different CT scans. Non-rigid anatomy introduced errors on the order of 1-3 mm. Patient positioning had a larger impact, introducing errors on the order of 3.5-4.5 mm. Endoscopy-CT registration in the head and neck is possible, but large registration errors were found in patients. The uncertainty analyses suggest a lower limit of 3-5 mm on achievable accuracy. Further development is required to achieve an accuracy suitable for clinical use.
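
    The core idea of the registration, choosing a virtual endoscope placement that maximizes the similarity between the rendered view and the video frame, can be sketched as a similarity measure plus a search over candidate placements. In the toy example below the renderer is replaced by a stand-in that simply shifts a synthetic texture, and normalized cross-correlation is used as the similarity; both choices are illustrative assumptions, not the thesis' actual renderer or metric.

        import numpy as np

        def ncc(a: np.ndarray, b: np.ndarray) -> float:
            """Normalized cross-correlation between two images of the same size."""
            a = a - a.mean()
            b = b - b.mean()
            denom = np.sqrt((a * a).sum() * (b * b).sum())
            return float((a * b).sum() / denom) if denom > 0 else 0.0

        # Toy stand-in for the virtual-endoscope renderer: the "placement" is a single
        # scalar that shifts a synthetic texture instead of rendering the airway mesh.
        rng = np.random.default_rng(1)
        texture = rng.random((64, 96))

        def render_virtual_view(placement: int) -> np.ndarray:
            return np.roll(texture, shift=placement, axis=1)

        # An "observed" video frame: the view at placement 7 plus measurement noise.
        observed_frame = render_virtual_view(7) + rng.normal(0.0, 0.05, texture.shape)

        # Search the candidate placements for the render that best matches the frame.
        candidates = range(-20, 21)
        best = max(candidates, key=lambda p: ncc(observed_frame, render_virtual_view(p)))
        print("estimated placement:", best)  # expected to be close to 7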

    Enhanced video indirect ophthalmoscopy (VIO) via robust mosaicing

    Indirect ophthalmoscopy (IO) is the standard of care for evaluation of the neonatal retina. When recorded on video from a head-mounted camera, IO images have low quality and a narrow field of view (FOV). We present an image fusion methodology for converting a video IO recording into a single, high-quality, wide-FOV mosaic that seamlessly blends the best frames in the video. To this end, we have developed fast and robust algorithms for automatic evaluation of video quality, artifact detection and removal, vessel mapping, registration, and multi-frame image fusion. Our experiments show the effectiveness of the proposed methods.
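
    One way the automatic video-quality evaluation step could look is a per-frame sharpness score used to keep only the best frames for mosaicing. The variance-of-Laplacian score below is a common focus proxy and an assumption for illustration; it is not necessarily the metric used in this work.

        import numpy as np

        def sharpness(frame: np.ndarray) -> float:
            """Variance of a finite-difference Laplacian: a common focus/quality proxy."""
            f = frame.astype(np.float64)
            lap = (-4.0 * f[1:-1, 1:-1]
                   + f[:-2, 1:-1] + f[2:, 1:-1]
                   + f[1:-1, :-2] + f[1:-1, 2:])
            return float(lap.var())

        def pick_best_frames(frames, keep):
            """Indices of the `keep` sharpest frames, i.e. candidates for the mosaic."""
            scores = [sharpness(f) for f in frames]
            return sorted(np.argsort(scores)[-keep:].tolist())

        # Toy usage: a blurred copy scores lower than the original frame.
        rng = np.random.default_rng(2)
        sharp = rng.random((120, 160))
        blurred = (sharp + np.roll(sharp, 1, 0) + np.roll(sharp, 1, 1)
                   + np.roll(sharp, (1, 1), (0, 1))) / 4.0
        print(pick_best_frames([blurred, sharp, blurred], keep=1))  # [1]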

    Efficient implementation of video processing algorithms on FPGA

    The work contained in this portfolio thesis was carried out as part of an Engineering Doctorate (Eng.D) programme at the Institute for System Level Integration. The work was sponsored by Thales Optronics and focuses on issues surrounding the implementation of video processing algorithms on field programmable gate arrays (FPGAs). A description is given of FPGA technology and the currently dominant methods of designing and verifying firmware. The problems of translating a description of behaviour into one of structure are discussed, and some of the latest methodologies for tackling this problem are introduced. A number of algorithms are then examined, including methods of contrast enhancement, deconvolution, and image fusion. Algorithms are characterised according to the nature of their execution flow, and this is used as justification for some of the design choices that are made. An efficient method of performing large two-dimensional convolutions is also described. The portfolio also contains a discussion of an FPGA implementation of a PID control algorithm, an overview of FPGA dynamic reconfigurability, and the development of a demonstration platform for rapid deployment of video processing algorithms in FPGA hardware.
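
    The portfolio discusses an FPGA implementation of a PID control algorithm; as a hedged illustration of the control law itself (not the firmware), here is a minimal discrete-time PID reference model. The gains and the first-order plant are hypothetical.

        from dataclasses import dataclass

        @dataclass
        class PID:
            """Discrete-time PID controller: a software reference model of the control law."""
            kp: float
            ki: float
            kd: float
            dt: float
            integral: float = 0.0
            prev_error: float = 0.0

            def step(self, setpoint: float, measurement: float) -> float:
                error = setpoint - measurement
                self.integral += error * self.dt
                derivative = (error - self.prev_error) / self.dt
                self.prev_error = error
                return self.kp * error + self.ki * self.integral + self.kd * derivative

        # Toy usage with hypothetical gains: drive a first-order plant towards a setpoint.
        pid = PID(kp=2.0, ki=2.0, kd=0.1, dt=0.01)
        y = 0.0
        for _ in range(500):
            u = pid.step(setpoint=1.0, measurement=y)
            y += (u - y) * 0.01  # simple first-order plant model
        print(f"plant output after 5 s: {y:.3f}")  # approaches the setpoint of 1.0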

    Salient point region covariance descriptor for target tracking

    Features extracted at salient points are used to construct a region covariance descriptor (RCD) for target tracking. In the classical approach, the RCD is computed using the features at every pixel location, which increases the computational cost in many cases. This approach is redundant because image statistics do not change significantly between neighboring image pixels. Furthermore, this redundancy may decrease tracking accuracy when tracking large targets, because the statistics of flat regions dominate the region covariance matrix. In the proposed approach, salient points are extracted via Shi and Tomasi's minimum eigenvalue method over a Hessian matrix, and the RCD features extracted only at these salient points are used in target tracking. Experimental results indicate that the salient point RCD scheme provides comparable and even better tracking results compared to a classical RCD-based approach, scale-invariant feature transform, and speeded-up robust features based trackers, while providing a computationally more efficient structure. © 2013 Society of Photo-Optical Instrumentation Engineers (SPIE) [DOI: 10.1117/1.OE.52.2.027207]
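
    A minimal sketch of building a region covariance descriptor only from features at salient points is given below, assuming a simple (x, y, intensity, |Ix|, |Iy|) feature vector; the exact feature set and salient-point detector used in the paper may differ.

        import numpy as np

        def region_covariance(gray: np.ndarray, points: np.ndarray) -> np.ndarray:
            """Region covariance descriptor built only from features at the given salient points.

            Each point contributes a feature vector (x, y, I, |Ix|, |Iy|); the descriptor is
            the 5x5 covariance matrix of those vectors over the region.
            """
            Iy, Ix = np.gradient(gray.astype(np.float64))  # simple image gradients
            feats = np.array([[x, y, gray[y, x], abs(Ix[y, x]), abs(Iy[y, x])]
                              for y, x in points])
            return np.cov(feats, rowvar=False)             # shape (5, 5)

        # Toy usage with a random patch and a handful of (row, col) salient points.
        rng = np.random.default_rng(3)
        patch = rng.random((32, 32))
        salient = np.array([[4, 5], [10, 20], [15, 8], [22, 27], [28, 14]])
        print(region_covariance(patch, salient).shape)  # (5, 5)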

    Multimodal feature extraction and fusion for audio-visual speech recognition

    Multimodal signal processing analyzes a physical phenomenon through several types of measures, or modalities. This leads to the extraction of higher-quality and more reliable information than that obtained from single-modality signals. The advantage is two-fold. First, as the modalities are usually complementary, the end result of multimodal processing is more informative than that of each modality taken individually. This is true in all application domains: human-machine interaction, multimodal identification, or multimodal image processing. Second, as modalities are not always reliable, when one modality becomes corrupted it is possible to extract the missing information from the other one. There are two essential challenges in multimodal signal processing. First, the features used from each modality need to be as relevant and as few as possible. The fact that multimodal systems have to process more than one modality means that they can run into errors caused by the curse of dimensionality much more easily than mono-modal ones. The curse of dimensionality refers to the fact that the number of equally distributed samples required to cover a region of space grows exponentially with the dimensionality of the space. This has important implications in the classification domain, since accurate models can only be obtained if an adequate number of samples is available, and this required number of samples grows with the dimensionality of the features. Dimensionality reduction is thus a necessary step in any application dealing with complex signals, and it is achieved through selection, transforms, or a combination of the two. The second essential challenge is multimodal integration. Since the signals involved do not necessarily have the same data rate, range, or even dimensionality, combining information coming from such different sources is not straightforward. This can be done at different levels, starting from the basic signal level, by combining the signals themselves if they are compatible, up to the highest decision level, where only the individual decisions taken based on the signals are combined. Ideally, the fusion method should allow temporal variations in the relative importance of the two streams, to account for possible changes in their quality. However, this can only be done with methods operating at a high decision level. The aim of this thesis is to offer solutions to both of these challenges, in the context of audio-visual speech recognition and speaker localization. Both of these applications are from the field of human-machine interaction. Audio-visual speech recognition aims to improve the accuracy of speech recognizers by augmenting the audio with information extracted from the video, more particularly the movement of the speaker's lips. This works especially well when the audio is corrupted, leading in that case to significant gains in accuracy. Speaker localization means detecting who is the active speaker in an audio-video sequence containing several persons, something that is useful for videoconferencing and the automated annotation of meetings. These two applications are the context in which we present our solutions to both feature selection and multimodal integration. First, we show how informative features can be extracted from the visual modality, using an information-theoretic framework which gives us a quantitative measure of the relevance of individual features.
We also prove that reducing redundancy between these features is important for avoiding the curse of dimensionality and improving recognition results. The methods that we present are novel in the field of audio-visual speech recognition, and we found that their use leads to significant improvements compared to the state of the art. Second, we present a method of multimodal fusion at the level of intermediate decisions using a weight for each of the streams. The weights are adaptive, changing according to the estimated reliability of each stream. This makes the system tolerant to changes in the quality of either stream, and even to the temporary interruption of one of the streams. The reliability estimate is based on the entropy of the posterior probability distributions of each stream at the intermediate decision level. Our results are superior to those obtained with a state-of-the-art method based on maximizing the same posteriors. Moreover, we analyze the effect of a constraint typically imposed on stream weights in the literature, namely that they should sum to one. Our results show that removing this constraint can lead to improvements in recognition accuracy. Finally, we develop a method for audio-visual speaker localization, based on the correlation between audio energy and the movement of the speaker's lips. Our method is based on a joint probability model of the audio and video, which is used to build a likelihood map showing the likely positions of the speaker's mouth. We show that our novel method performs better than a similar method from the literature. In conclusion, we analyze two different challenges of multimodal signal processing for two audio-visual problems, and offer innovative approaches for solving them.
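
    The entropy-based adaptive stream weighting described above can be sketched as follows: estimate each stream's reliability from the entropy of its posterior distribution, then fuse the streams with weights favouring the lower-entropy stream. The inverse-entropy weighting rule and the sum-to-one constraint in this sketch are illustrative assumptions (the thesis in fact reports that relaxing the sum-to-one constraint can help).

        import numpy as np

        def entropy(p: np.ndarray) -> float:
            """Shannon entropy (nats) of a posterior distribution over classes."""
            p = np.clip(p, 1e-12, 1.0)
            return float(-(p * np.log(p)).sum())

        def fuse(audio_post: np.ndarray, video_post: np.ndarray) -> np.ndarray:
            """Weighted log-linear fusion of per-stream posteriors.

            The more confident (lower-entropy) stream receives the larger weight; the
            inverse-entropy mapping below is an illustrative choice, not the thesis' rule.
            """
            h_a, h_v = entropy(audio_post), entropy(video_post)
            w_a = (1.0 / h_a) / (1.0 / h_a + 1.0 / h_v)
            w_v = 1.0 - w_a                                  # weights kept summing to one here
            fused = audio_post ** w_a * video_post ** w_v    # log-linear combination
            return fused / fused.sum()

        # Toy usage: a confident video stream dominates a noisy, near-uniform audio stream.
        audio = np.array([0.30, 0.25, 0.25, 0.20])   # corrupted audio: high entropy
        video = np.array([0.05, 0.85, 0.05, 0.05])   # clean video: low entropy
        print(fuse(audio, video))                    # probability mass concentrates on class 1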

    Multiresolution models in image restoration and reconstruction with medical and other applications
