41 research outputs found

    Wavelet based stereo images reconstruction using depth images

    Get PDF
    It is believed by many that three-dimensional (3D) television will be the next logical development toward a more natural and vivid home entertaiment experience. While classical 3D approach requires the transmission of two video streams, one for each view, 3D TV systems based on depth image rendering (DIBR) require a single stream of monoscopic images and a second stream of associated images usually termed depth images or depth maps, that contain per-pixel depth information. Depth map is a two-dimensional function that contains information about distance from camera to a certain point of the object as a function of the image coordinates. By using this depth information and the original image it is possible to reconstruct a virtual image of a nearby viewpoint by projecting the pixels of available image to their locations in 3D space and finding their position in the desired view plane. One of the most significant advantages of the DIBR is that depth maps can be coded more efficiently than two streams corresponding to left and right view of the scene, thereby reducing the bandwidth required for transmission, which makes it possible to reuse existing transmission channels for the transmission of 3D TV. This technique can also be applied for other 3D technologies such as multimedia systems. In this paper we propose an advanced wavelet domain scheme for the reconstruction of stereoscopic images, which solves some of the shortcommings of the existing methods discussed above. We perform the wavelet transform of both the luminance and depth images in order to obtain significant geometric features, which enable more sensible reconstruction of the virtual view. Motion estimation employed in our approach uses Markov random field smoothness prior for regularization of the estimated motion field. The evaluation of the proposed reconstruction method is done on two video sequences which are typically used for comparison of stereo reconstruction algorithms. The results demonstrate advantages of the proposed approach with respect to the state-of-the-art methods, in terms of both objective and subjective performance measures

    Wavelet-based stereo images reconstruction using depth images

    Full text link

    Image Registration to Map Endoscopic Video to Computed Tomography for Head and Neck Radiotherapy Patients

    Get PDF
    The purpose of this work was to explore the feasibility of registering endoscopic video to radiotherapy treatment plans for patients with head and neck cancer without physical tracking of the endoscope during the examination. Endoscopy-CT registration would provide a clinical tool that could be used to enhance the treatment planning process and would allow for new methods to study the incidence of radiation-related toxicity. Endoscopic video frames were registered to CT by optimizing virtual endoscope placement to maximize the similarity between the frame and the virtual image. Virtual endoscopic images were rendered using a polygonal mesh created by segmenting the airways of the head and neck with a density threshold. The optical properties of the virtual endoscope were matched to a calibrated model of the real endoscope. A novel registration algorithm was developed that takes advantage of physical constraints on the endoscope to effectively search the airways of the head and neck for the desired virtual endoscope coordinates. This algorithm was tested on rigid phantoms with embedded point markers and protruding bolus material. In these tests, the median registration accuracy was 3.0 mm for point measurements and 3.5 mm for surface measurements. The algorithm was also tested on four endoscopic examinations of three patients, in which it achieved a median registration accuracy of 9.9 mm. The uncertainties caused by the non-rigid anatomy of the head and neck and differences in patient positioning between endoscopic examinations and CT scans were examined by taking repeated measurements after placing the virtual endoscope in surface meshes created from different CT scans. Non-rigid anatomy introduced errors on the order of 1-3 mm. Patient positioning had a larger impact, introducing errors on the order of 3.5-4.5 mm. Endoscopy-CT registration in the head and neck is possible, but large registration errors were found in patients. The uncertainty analyses suggest a lower limit of 3-5 mm. Further development is required to achieve an accuracy suitable for clinical use

    Enhancing a Neurosurgical Imaging System with a PC-based Video Processing Solution

    Get PDF
    This work presents a PC-based prototype video processing application developed to be used with a specific neurosurgical imaging device, the OPMI® PenteroTM operating microscope, in the Department of Neurosurgery of Helsinki University Central Hospital at Töölö, Helsinki. The motivation for implementing the software was the lack of some clinically important features in the imaging system provided by the microscope. The imaging system is used as an online diagnostic aid during surgery. The microscope has two internal video cameras; one for regular white light imaging and one for near-infrared fluorescence imaging, used for indocyanine green videoangiography. The footage of the microscope’s current imaging mode is accessed via the composite auxiliary output of the device. The microscope also has an external high resolution white light video camera, accessed via a composite output of a separate video hub. The PC was chosen as the video processing platform for its unparalleled combination of prototyping and high-throughput video processing capabilities. A thorough analysis of the platform and efficient video processing methods was conducted in the thesis and the results were used in the design of the imaging station. The features found feasible during the project were incorporated into a video processing application running on a GNU/Linux distribution Ubuntu. The clinical usefulness of the implemented features was ensured beforehand by consulting the neurosurgeons using the original system. The most significant shortcomings of the original imaging system were mended in this work. The key features of the developed application include: live streaming, simultaneous streaming and recording, and playing back of upto two video streams. The playback mode provides full media player controls, with a frame-by-frame precision rewinding, in an intuitive and responsive interface. A single view and a side-by-side comparison mode are provided for the streams. The former gives more detail, while the latter can be used, for example, for before-after and anatomic-angiographic comparisons.fi=Opinnäytetyö kokotekstinä PDF-muodossa.|en=Thesis fulltext in PDF format.|sv=Lärdomsprov tillgängligt som fulltext i PDF-format

    Platforms for handling and development of audiovisual data

    Get PDF
    Estágio realizado na MOG Solutions e orientado por Vítor TeixeiraTese de mestrado integrado. Engenharia Informátca e Computação. Faculdade de Engenharia. Universidade do Porto. 200

    Salient stills

    Get PDF
    Thesis (M.S.)--Massachusetts Institute of Technology, Dept. of Architecture, 1992.Includes bibliographical references (leaves 67-70).by Laura A. Teodosio.M.S

    Flat panel display signal processing

    Get PDF
    Televisions (TVs) have shown considerable technological progress since their introduction almost a century ago. Starting out as small, dim and monochrome screens in wooden cabinets, TVs have evolved to large, bright and colorful displays in plastic boxes. It took until the turn of the century, however, for the TV to become like a ‘picture on the wall’. This happened when the bulky Cathode Ray Tube (CRT) was replaced with thin and light-weight Flat Panel Displays (FPDs), such as Liquid Crystal Displays (LCDs) or Plasma Display Panels (PDPs). However, the TV system and transmission formats are still strongly coupled to the CRT technology, whereas FPDs use very different principles to convert the electronic video signal to visible images. These differences result in image artifacts that the CRT never had, but at the same time provide opportunities to improve FPD image quality beyond that of the CRT. This thesis presents an analysis of the properties of flat panel displays, their relation to image quality, and video signal processing algorithms to improve the quality of the displayed images. To analyze different types of displays, the display signal chain is described using basic principles common to all displays. The main function of a display is to create visible images (light) from an electronic signal (video), requiring display chain functions like opto-electronic effect, spatial and temporal addressing and reconstruction, and color synthesis. The properties of these functions are used to describe CRT, LCDs, and PDPs, showing that these displays perform the same functions, using different implementations. These differences have a number of consequences, that are further investigated in this thesis. Spatial and temporal aspects, corresponding to ‘static’ and ‘dynamic’ resolution respectively, are covered in detail. Moreover, video signal processing is an essential part of the display signal chain for FPDs, because the display format will in general no longer match the source format. In this thesis, it is investigated how specific FPD properties, especially related to spatial and temporal addressing and reconstruction, affect the video signal processing chain. A model of the display signal chain is presented, and applied to analyze FPD spatial properties in relation to static resolution. In particular, the effect of the color subpixels, that enable color image reproduction in FPDs, is analyzed. The perceived display resolution is strongly influenced by the color subpixel arrangement. When taken into account in the signal chain, this improves the perceived resolution on FPDs, which clearly outperform CRTs in this respect. The cause and effect of this improvement, also for alternative subpixel arrangements, is studied using the display signal model. However, the resolution increase cannot be achieved without video processing. This processing is efficiently combined with image scaling, which is always required in the FPD display signal chain, resulting in an algorithm called ‘subpixel image scaling’. A comparison of the effects of subpixel scaling on several subpixel arrangements shows that the largest increase in perceived resolution is found for two-dimensional subpixel arrangements. FPDs outperform CRTs with respect to static resolution, but not with respect to ‘dynamic resolution’, i.e. the perceived resolution of moving images. Life-like reproduction of moving images is an important requirement for a TV display, but the temporal properties of FPDs cause artifacts in moving images (‘motion artifacts’), that are not found in CRTs. A model of the temporal aspects of the display signal chain is used to analyze dynamic resolution and motion artifacts on several display types, in particular LCD and PDP. Furthermore, video signal processing algorithms are developed that can reduce motion artifacts and increase the dynamic resolution. The occurrence of motion artifacts is explained by the fact that the human visual system tracks moving objects. This converts temporal effects on the display into perceived spatial effects, that can appear in very different ways. The analysis shows how addressing mismatches in the chain cause motion-dependent misalignment of image data, e.g. resulting in the ‘dynamic false contour’ artifact in PDPs. Also, non-ideal temporal reconstruction results in ‘motion blur’, i.e. a loss of sharpness of moving images, which is typical for LCDs. The relation between motion blur, dynamic resolution, and temporal properties of LCDs is analyzed using the display signal model in the temporal (frequency) domain. The concepts of temporal aperture, motion aperture and temporal display bandwidth are introduced, which enable characterization of motion blur in a simple and direct way. This is applied to compare several motion blur reduction methods, based on modified display design and driving. This thesis further describes the development of several video processing algorithms that can reduce motion artifacts. It is shown that the motion of objects in the image plays an essential role in these algorithms, i.e. they require motion estimation and compensation techniques. In LCDs, video processing for motion artifact reduction involves a compensation for the temporal reconstruction characteristics of the display, leading to the ‘motion compensated inverse filtering’ algorithm. The display chain model is used to analyze this algorithm, and several methods to increase its performance are presented. In PDPs, motion artifact reduction can be achieved with ‘motion compensated subfield generation’, for which an advanced algorithm is presented

    On the design of multimedia architectures : proceedings of a one-day workshop, Eindhoven, December 18, 2003

    Get PDF

    On the design of multimedia architectures : proceedings of a one-day workshop, Eindhoven, December 18, 2003

    Get PDF

    Learning Representations for Controllable Image Restoration

    Get PDF
    Deep Convolutional Neural Networks have sparked a renaissance in all the sub-fields of computer vision. Tremendous progress has been made in the area of image restoration. The research community has pushed the boundaries of image deblurring, super-resolution, and denoising. However, given a distorted image, most existing methods typically produce a single restored output. The tasks mentioned above are inherently ill-posed, leading to an infinite number of plausible solutions. This thesis focuses on designing image restoration techniques capable of producing multiple restored results and granting users more control over the restoration process. Towards this goal, we demonstrate how one could leverage the power of unsupervised representation learning. Image restoration is vital when applied to distorted images of human faces due to their social significance. Generative Adversarial Networks enable an unprecedented level of generated facial details combined with smooth latent space. We leverage the power of GANs towards the goal of learning controllable neural face representations. We demonstrate how to learn an inverse mapping from image space to these latent representations, tuning these representations towards a specific task, and finally manipulating latent codes in these spaces. For example, we show how GANs and their inverse mappings enable the restoration and editing of faces in the context of extreme face super-resolution and the generation of novel view sharp videos from a single motion-blurred image of a face. This thesis also addresses more general blind super-resolution, denoising, and scratch removal problems, where blur kernels and noise levels are unknown. We resort to contrastive representation learning and first learn the latent space of degradations. We demonstrate that the learned representation allows inference of ground-truth degradation parameters and can guide the restoration process. Moreover, it enables control over the amount of deblurring and denoising in the restoration via manipulation of latent degradation features
    corecore