681 research outputs found

    A novel filter for block-based motion estimation

    Get PDF
    Noises, in the form of false motion vectors, cannot be avoided while capturing block motion vectors using block based motion estimation techniques. Similar noises are further introduced when the technique of global motion compensation is applied to obtain 'true' object motion from video sequences, where both the camera and object motions are present. We observe that the performance of the mean and the median filters in removing false motion vectors, for estimating 'true' object motion, is not satisfactory, especially when the size of the object is significantly smaller than the scene. In this paper we introduce a novel filter, named as the Mean-Accumulated-Thresholded (MAT) filter, in order to capture 'true' object motion vectors from video sequences with or without the camera motion (zoom and/or pan). Experimental results on representative standard video sequences are included to establish the superiority of our filter compared with the traditional median and mean filters

    Toward sparse and geometry adapted video approximations

    Get PDF
    Video signals are sequences of natural images, where images are often modeled as piecewise-smooth signals. Hence, video can be seen as a 3D piecewise-smooth signal made of piecewise-smooth regions that move through time. Based on the piecewise-smooth model and on related theoretical work on rate-distortion performance of wavelet and oracle based coding schemes, one can better analyze the appropriate coding strategies that adaptive video codecs need to implement in order to be efficient. Efficient video representations for coding purposes require the use of adaptive signal decompositions able to capture appropriately the structure and redundancy appearing in video signals. Adaptivity needs to be such that it allows for proper modeling of signals in order to represent these with the lowest possible coding cost. Video is a very structured signal with high geometric content. This includes temporal geometry (normally represented by motion information) as well as spatial geometry. Clearly, most of past and present strategies used to represent video signals do not exploit properly its spatial geometry. Similarly to the case of images, a very interesting approach seems to be the decomposition of video using large over-complete libraries of basis functions able to represent salient geometric features of the signal. In the framework of video, these features should model 2D geometric video components as well as their temporal evolution, forming spatio-temporal 3D geometric primitives. Through this PhD dissertation, different aspects on the use of adaptivity in video representation are studied looking toward exploiting both aspects of video: its piecewise nature and the geometry. The first part of this work studies the use of localized temporal adaptivity in subband video coding. This is done considering two transformation schemes used for video coding: 3D wavelet representations and motion compensated temporal filtering. A theoretical R-D analysis as well as empirical results demonstrate how temporal adaptivity improves coding performance of moving edges in 3D transform (without motion compensation) based video coding. Adaptivity allows, at the same time, to equally exploit redundancy in non-moving video areas. The analogy between motion compensated video and 1D piecewise-smooth signals is studied as well. This motivates the introduction of local length adaptivity within frame-adaptive motion compensated lifted wavelet decompositions. This allows an optimal rate-distortion performance when video motion trajectories are shorter than the transformation "Group Of Pictures", or when efficient motion compensation can not be ensured. After studying temporal adaptivity, the second part of this thesis is dedicated to understand the fundamentals of how can temporal and spatial geometry be jointly exploited. This work builds on some previous results that considered the representation of spatial geometry in video (but not temporal, i.e, without motion). In order to obtain flexible and efficient (sparse) signal representations, using redundant dictionaries, the use of highly non-linear decomposition algorithms, like Matching Pursuit, is required. General signal representation using these techniques is still quite unexplored. For this reason, previous to the study of video representation, some aspects of non-linear decomposition algorithms and the efficient decomposition of images using Matching Pursuits and a geometric dictionary are investigated. A part of this investigation concerns the study on the influence of using a priori models within approximation non-linear algorithms. Dictionaries with a high internal coherence have some problems to obtain optimally sparse signal representations when used with Matching Pursuits. It is proved, theoretically and empirically, that inserting in this algorithm a priori models allows to improve the capacity to obtain sparse signal approximations, mainly when coherent dictionaries are used. Another point discussed in this preliminary study, on the use of Matching Pursuits, concerns the approach used in this work for the decompositions of video frames and images. The technique proposed in this thesis improves a previous work, where authors had to recur to sub-optimal Matching Pursuit strategies (using Genetic Algorithms), given the size of the functions library. In this work the use of full search strategies is made possible, at the same time that approximation efficiency is significantly improved and computational complexity is reduced. Finally, a priori based Matching Pursuit geometric decompositions are investigated for geometric video representations. Regularity constraints are taken into account to recover the temporal evolution of spatial geometric signal components. The results obtained for coding and multi-modal (audio-visual) signal analysis, clarify many unknowns and show to be promising, encouraging to prosecute research on the subject

    Computerized Analysis of Magnetic Resonance Images to Study Cerebral Anatomy in Developing Neonates

    Get PDF
    The study of cerebral anatomy in developing neonates is of great importance for the understanding of brain development during the early period of life. This dissertation therefore focuses on three challenges in the modelling of cerebral anatomy in neonates during brain development. The methods that have been developed all use Magnetic Resonance Images (MRI) as source data. To facilitate study of vascular development in the neonatal period, a set of image analysis algorithms are developed to automatically extract and model cerebral vessel trees. The whole process consists of cerebral vessel tracking from automatically placed seed points, vessel tree generation, and vasculature registration and matching. These algorithms have been tested on clinical Time-of- Flight (TOF) MR angiographic datasets. To facilitate study of the neonatal cortex a complete cerebral cortex segmentation and reconstruction pipeline has been developed. Segmentation of the neonatal cortex is not effectively done by existing algorithms designed for the adult brain because the contrast between grey and white matter is reversed. This causes pixels containing tissue mixtures to be incorrectly labelled by conventional methods. The neonatal cortical segmentation method that has been developed is based on a novel expectation-maximization (EM) method with explicit correction for mislabelled partial volume voxels. Based on the resulting cortical segmentation, an implicit surface evolution technique is adopted for the reconstruction of the cortex in neonates. The performance of the method is investigated by performing a detailed landmark study. To facilitate study of cortical development, a cortical surface registration algorithm for aligning the cortical surface is developed. The method first inflates extracted cortical surfaces and then performs a non-rigid surface registration using free-form deformations (FFDs) to remove residual alignment. Validation experiments using data labelled by an expert observer demonstrate that the method can capture local changes and follow the growth of specific sulcus

    On motion in dynamic magnetic resonance imaging: Applications in cardiac function and abdominal diffusion

    Get PDF
    La imagen por resonancia magnética (MRI), hoy en día, representa una potente herramienta para el diagnóstico clínico debido a su flexibilidad y sensibilidad a un amplio rango de propiedades del tejido. Sus principales ventajas son su sobresaliente versatilidad y su capacidad para proporcionar alto contraste entre tejidos blandos. Gracias a esa versatilidad, la MRI se puede emplear para observar diferentes fenómenos físicos dentro del cuerpo humano combinando distintos tipos de pulsos dentro de la secuencia. Esto ha permitido crear distintas modalidades con múltiples aplicaciones tanto biológicas como clínicas. La adquisición de MR es, sin embargo, un proceso lento, lo que conlleva una solución de compromiso entre resolución y tiempo de adquisición (Lima da Cruz, 2016; Royuela-del Val, 2017). Debido a esto, la presencia de movimiento fisiológico durante la adquisición puede conllevar una grave degradación de la calidad de imagen, así como un incremento del tiempo de adquisición, aumentando así tambien la incomodidad del paciente. Esta limitación práctica representa un gran obstáculo para la viabilidad clínica de la MRI. En esta Tesis Doctoral se abordan dos problemas de interés en el campo de la MRI en los que el movimiento fisiológico tiene un papel protagonista. Éstos son, por un lado, la estimación robusta de parámetros de rotación y esfuerzo miocárdico a partir de imágenes de MR-Tagging dinámica para el diagnóstico y clasificación de cardiomiopatías y, por otro, la reconstrucción de mapas del coeficiente de difusión aparente (ADC) a alta resolución y con alta relación señal a ruido (SNR) a partir de adquisiciones de imagen ponderada en difusión (DWI) multiparamétrica en el hígado.Departamento de Teoría de la Señal y Comunicaciones e Ingeniería TelemáticaDoctorado en Tecnologías de la Información y las Telecomunicacione

    Nouvelles méthodes de prédiction inter-images pour la compression d’images et de vidéos

    Get PDF
    Due to the large availability of video cameras and new social media practices, as well as the emergence of cloud services, images and videosconstitute today a significant amount of the total data that is transmitted over the internet. Video streaming applications account for more than 70% of the world internet bandwidth. Whereas billions of images are already stored in the cloud and millions are uploaded every day. The ever growing streaming and storage requirements of these media require the constant improvements of image and video coding tools. This thesis aims at exploring novel approaches for improving current inter-prediction methods. Such methods leverage redundancies between similar frames, and were originally developed in the context of video compression. In a first approach, novel global and local inter-prediction tools are associated to improve the efficiency of image sets compression schemes based on video codecs. By leveraging a global geometric and photometric compensation with a locally linear prediction, significant improvements can be obtained. A second approach is then proposed which introduces a region-based inter-prediction scheme. The proposed method is able to improve the coding performances compared to existing solutions by estimating and compensating geometric and photometric distortions on a semi-local level. This approach is then adapted and validated in the context of video compression. Bit-rate improvements are obtained, especially for sequences displaying complex real-world motions such as zooms and rotations. The last part of the thesis focuses on deep learning approaches for inter-prediction. Deep neural networks have shown striking results for a large number of computer vision tasks over the last years. Deep learning based methods proposed for frame interpolation applications are studied here in the context of video compression. Coding performance improvements over traditional motion estimation and compensation methods highlight the potential of these deep architectures.En raison de la grande disponibilité des dispositifs de capture vidéo et des nouvelles pratiques liées aux réseaux sociaux, ainsi qu’à l’émergence desservices en ligne, les images et les vidéos constituent aujourd’hui une partie importante de données transmises sur internet. Les applications de streaming vidéo représentent ainsi plus de 70% de la bande passante totale de l’internet. Des milliards d’images sont déjà stockées dans le cloud et des millions y sont téléchargés chaque jour. Les besoins toujours croissants en streaming et stockage nécessitent donc une amélioration constante des outils de compression d’image et de vidéo. Cette thèse vise à explorer des nouvelles approches pour améliorer les méthodes actuelles de prédiction inter-images. De telles méthodes tirent parti des redondances entre images similaires, et ont été développées à l’origine dans le contexte de la vidéo compression. Dans une première partie, de nouveaux outils de prédiction inter globaux et locaux sont associés pour améliorer l’efficacité des schémas de compression de bases de données d’image. En associant une compensation géométrique et photométrique globale avec une prédiction linéaire locale, des améliorations significatives peuvent être obtenues. Une seconde approche est ensuite proposée qui introduit un schéma deprédiction inter par régions. La méthode proposée est en mesure d’améliorer les performances de codage par rapport aux solutions existantes en estimant et en compensant les distorsions géométriques et photométriques à une échelle semi locale. Cette approche est ensuite adaptée et validée dans le cadre de la compression vidéo. Des améliorations en réduction de débit sont obtenues, en particulier pour les séquences présentant des mouvements complexes réels tels que des zooms et des rotations. La dernière partie de la thèse se concentre sur l’étude des méthodes d’apprentissage en profondeur dans le cadre de la prédiction inter. Ces dernières années, les réseaux de neurones profonds ont obtenu des résultats impressionnants pour un grand nombre de tâches de vision par ordinateur. Les méthodes basées sur l’apprentissage en profondeur proposéesà l’origine pour de l’interpolation d’images sont étudiées ici dans le contexte de la compression vidéo. Des améliorations en terme de performances de codage sont obtenues par rapport aux méthodes d’estimation et de compensation de mouvements traditionnelles. Ces résultats mettent en évidence le fort potentiel de ces architectures profondes dans le domaine de la compression vidéo

    Novel Motion Anchoring Strategies for Wavelet-based Highly Scalable Video Compression

    Full text link
    This thesis investigates new motion anchoring strategies that are targeted at wavelet-based highly scalable video compression (WSVC). We depart from two practices that are deeply ingrained in existing video compression systems. Instead of the commonly used block motion, which has poor scalability attributes, we employ piecewise-smooth motion together with a highly scalable motion boundary description. The combination of this more “physical” motion description together with motion discontinuity information allows us to change the conventional strategy of anchoring motion at target frames to anchoring motion at reference frames, which improves motion inference across time. In the proposed reference-based motion anchoring strategies, motion fields are mapped from reference to target frames, where they serve as prediction references; during this mapping process, disoccluded regions are readily discovered. Observing that motion discontinuities displace with foreground objects, we propose motion-discontinuity driven motion mapping operations that handle traditionally challenging regions around moving objects. The reference-based motion anchoring exposes an intricate connection between temporal frame interpolation (TFI) and video compression. When employed in a compression system, all anchoring strategies explored in this thesis perform TFI once all residual information is quantized to zero at a given temporal level. The interpolation performance is evaluated on both natural and synthetic sequences, where we show favourable comparisons with state-of-the-art TFI schemes. We explore three reference-based motion anchoring strategies. In the first one, the motion anchoring is “flipped” with respect to a hierarchical B-frame structure. We develop an analytical model to determine the weights of the different spatio-temporal subbands, and assess the suitability and benefits of this reference-based WSVC for (highly scalable) video compression. Reduced motion coding cost and improved frame prediction, especially around moving objects, result in improved rate-distortion performance compared to a target-based WSVC. As the thesis evolves, the motion anchoring is progressively simplified to one where all motion is anchored at one base frame; this central motion organization facilitates the incorporation of higher-order motion models, which improve the prediction performance in regions following motion with non-constant velocity

    Utilization of improved recursive-shortest-spanning-tree method for video object segmentation

    Get PDF
    Ankara : Department of Electrical and Electronics Engineering and Institute of Engineering and Sciences, Bilkent Univ., 1997.Thesis(Master's) -- Bilkent University, 1997.Includes bibliographical references leaves 77-81Emerging standards MPEG-4 and MPEG-7 do not standardize the video object segmentation tools, although their performance depends on them. There are a lot of still image segmentation algorithms in the literature, like clustering, split-and-merge, region merging, etc. One of these methods, namely the recursive shortest spanning tree (RSST) method, is improved so that a still image is approximated as a piecewise planar function, and well-approximated areas on the image are extracted cis regions. A novel video object segmentation algorithm, which takes the previously estimated 2-D dense motion vector field as input, and uses this improved RSST method to approximate each component of the motion vector field as a piecewise planar function, is proposed. The algorithm is successful in locating 3-D planar objects in the scene correctly, with acceptable accuracy at the boundaries. Unlike the existing algorithms in the literature, the proposed algorithm is fast, parameter-free and requires no initial guess about the segmentation result. Moreover, it is a hierarchical scheme which gives finest to coarsest segmentation results. The proposed algorithm is inserted into the current version of the emerging “Analysis Model (AM)” of the Europan COST21U'’’ project, and it is observed that the current AM is outperformed.Tuncel, ErtemM.S
    • …
    corecore