64 research outputs found

    On Video Restoration and Editing: Scratch Detection and Inpainting of Complex Scenes

    The inevitable degradation of visual content such as images and films leads to the goal of image and video restoration. In this thesis, we look at two specific restoration problems: the detection of line scratches in old films and the automatic completion of videos, also known as video inpainting. Line scratches are caused when the film physically rubs against a mechanical part, typically during projection. This physical origin gives the defect its specific characteristics: scratches appear as more or less vertical lines, white or black (occasionally coloured), that are temporally persistent, i.e. their position is continuous in time. We propose a detection algorithm based on the family of statistical approaches known as a contrario methods, which provides detections that are precise and robust to the noise and textures present in the image. We also propose a temporal filtering step that removes false alarms from the first detection step by analysing the motion of the spatial detections. Comparisons with previous work show greatly improved recall and precision, and robustness to noise and clutter in the film.
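    To make the a contrario idea concrete, here is a minimal sketch, assuming a simplified pixel criterion and background model (an illustration, not the thesis's exact algorithm): a column is kept as a detection only when its count of scratch-like pixels has a low Number of False Alarms (NFA) under an i.i.d. background model.

```python
# Minimal a contrario scratch-detection sketch (illustrative assumptions:
# the pixel criterion, neighbourhood width and background probability p
# are not taken from the thesis).
import numpy as np
from scipy.stats import binom

def detect_scratch_columns(frame, contrast=8.0, p=0.1, eps=1.0):
    """Return indices of columns whose NFA is below eps."""
    f = frame.astype(float)
    h, w = f.shape
    left, centre, right = f[:, :-4], f[:, 2:-2], f[:, 4:]
    # A pixel is "scratch-like" if it is clearly darker or brighter than
    # both of its horizontal neighbours two pixels away.
    dark = (centre < left - contrast) & (centre < right - contrast)
    bright = (centre > left + contrast) & (centre > right + contrast)
    votes = (dark | bright).sum(axis=0)              # per-column counts
    # NFA(col) = (number of tested columns) * P[Binomial(h, p) >= votes].
    nfa = (w - 4) * binom.sf(votes - 1, h, p)
    return np.flatnonzero(nfa < eps) + 2             # +2: neighbourhood offset
```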
    The second part of the thesis concerns video inpainting, where the goal is to fill a region of a video with content that looks visually coherent and convincing. A plethora of methods treat this problem for images; the literature for videos is more restricted, notably because execution time is a real obstacle. We propose a video inpainting algorithm that minimises a patch-based energy functional of the video content, where patches are small cubes of video. In this framework, we address the following problems: extremely high execution times, the correct handling of textures in the video, and the inpainting of videos whose background is moving or which were shot with moving cameras. Finally, we study some convergence questions about the algorithm in very simplified cases.
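    The patch-based functional in such methods is typically of the non-local, self-similarity form popularised by Wexler et al.; a representative energy (an assumed form, not necessarily the thesis's exact functional) is

$$
E(u) \;=\; \sum_{p \in \mathcal{H}} \; \min_{q \in \mathcal{D}} \; \big\| W_p(u) - W_q(u) \big\|^2 ,
$$

    where $\mathcal{H}$ is the region to inpaint, $\mathcal{D}$ the known region, and $W_p(u)$ the spatio-temporal patch (a small cube of video) centred at $p$. Minimisation typically alternates a nearest-neighbour search for the best $q$ with a reconstruction step that blends the matched patches.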

    Digital video stabilization: algorithms and evaluation

    Advisor: Hélio Pedrini. Dissertation (Master's), Universidade Estadual de Campinas, Instituto de Computação.
    Abstract: The development of multimedia equipment has allowed significant growth in the production of videos through professional and amateur cameras, smartphones and other mobile devices. However, videos captured by these devices are subject to unwanted motion caused by camera shake. To overcome this problem, digital stabilization aims to remove the undesired motion from videos through software techniques, without specific hardware, in order to enhance the visual quality of the scenes, either for human perception or for final applications such as object detection and tracking. The two-dimensional digital video stabilization process is usually divided into three main steps: camera motion estimation, removal of the unwanted motion, and generation of the corrected video. In this work, we investigate and evaluate digital video stabilization methods for correcting the disturbances and instabilities that occur during video acquisition. In the motion estimation step, we developed and analyzed a consensual method that combines a set of local feature techniques for global motion estimation. We also introduce and test a novel approach that identifies failures in the estimation of the camera's global motion through optimization techniques and computes a corrected estimate.
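    A common building block behind such global motion estimation, shown here as a hypothetical sketch rather than the thesis's consensual method, is to fit a single inter-frame transform to local feature matches, letting RANSAC discard outliers:

```python
# Hypothetical global-motion building block (not the thesis's consensual
# combination): ORB features + RANSAC homography between two frames.
import cv2
import numpy as np

def global_motion(prev_gray, curr_gray):
    orb = cv2.ORB_create(nfeatures=1000)
    kp1, des1 = orb.detectAndCompute(prev_gray, None)
    kp2, des2 = orb.detectAndCompute(curr_gray, None)
    matches = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True).match(des1, des2)
    src = np.float32([kp1[m.queryIdx].pt for m in matches])
    dst = np.float32([kp2[m.trainIdx].pt for m in matches])
    # RANSAC supplies the consensus: matches inconsistent with the
    # dominant (global) motion are rejected as outliers.
    H, _ = cv2.findHomography(src, dst, cv2.RANSAC, 3.0)
    return H
```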
    In the unwanted-motion removal step, we propose and evaluate a novel approach to video stabilization based on an adaptive Gaussian filter that smooths the camera path. Because the assessment measures available in the literature are inconsistent with human perception, two novel representations are proposed for the qualitative evaluation of video stabilization methods: the first is based on visual rhythms and represents the behavior of the video motion, whereas the second is based on the motion energy image and represents the amount of motion present in the video. Experiments were conducted on three video databases. The first consists of eleven videos available from the GaTech VideoStab database plus three other videos collected separately. The second, proposed by Liu et al., consists of 139 videos divided into different categories. Finally, we propose a database complementary to the others, composed of excerpts extracted from four separately collected videos that contain moving objects and little representative background, resulting in eight final videos. Experimental results demonstrated the effectiveness of the visual representations as a qualitative measure of video stability, as well as of the combination method over individual local feature approaches. The proposed optimization-based method was able to detect and correct motion estimation failures, achieving considerably better results than when no such correction is applied. The adaptive Gaussian filter generated videos with an adequate trade-off between the stabilization rate and the number of frame pixels preserved. The results obtained with our optimization method on the videos of the proposed database were superior to those of YouTube's state-of-the-art method.
    Master's degree in Computer Science. CAPE
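    The effect of smoothing the camera path can be sketched in a few lines; the fixed-sigma filter below is a simplification (the thesis's filter adapts the Gaussian's width to the local motion, a refinement omitted here):

```python
# Camera-path smoothing sketch with a fixed-width Gaussian (the adaptive
# width of the thesis's filter is an omitted refinement).
import numpy as np
from scipy.ndimage import gaussian_filter1d

def stabilizing_corrections(dx, dy, sigma=15.0):
    """dx, dy: per-frame translations between consecutive frames."""
    path_x, path_y = np.cumsum(dx), np.cumsum(dy)    # accumulated camera path
    smooth_x = gaussian_filter1d(path_x, sigma)
    smooth_y = gaussian_filter1d(path_y, sigma)
    # Warp each frame by (smooth - original) so it follows the smooth path.
    return smooth_x - path_x, smooth_y - path_y
```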

    Large Scale Inverse Problems

    This book is the second volume of a three-volume series recording the "Radon Special Semester 2011 on Multiscale Simulation & Analysis in Energy and the Environment" that took place in Linz, Austria, October 3-7, 2011. This volume addresses the common ground in the mathematical and computational procedures required for large-scale inverse problems and data assimilation in forefront applications. The solution of inverse problems is fundamental to a wide variety of applications such as weather forecasting, medical tomography, and oil exploration. Regularisation techniques are needed to ensure that solutions are of sufficient quality to be useful and are soundly based in theory. The book addresses the techniques common to all these applications and is thus truly interdisciplinary. This collection of survey articles focusses on the large inverse problems commonly arising in simulation and forecasting in the earth sciences.
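    As a concrete instance of the regularisation techniques the volume surveys, Tikhonov regularisation turns an ill-posed least-squares problem into a nearby well-posed one; a minimal sketch (the operator A, data b and weight lam are placeholders):

```python
# Tikhonov regularisation sketch: min_x ||Ax - b||^2 + lam * ||x||^2,
# solved by stacking the penalty into an ordinary least-squares system.
import numpy as np

def tikhonov(A, b, lam):
    n = A.shape[1]
    A_aug = np.vstack([A, np.sqrt(lam) * np.eye(n)])
    b_aug = np.concatenate([b, np.zeros(n)])
    x, *_ = np.linalg.lstsq(A_aug, b_aug, rcond=None)
    return x
```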

    Deeply Learned Priors for Geometric Reconstruction

    This thesis comprises a body of work that investigates the use of deeply learned priors for the dense geometric reconstruction of scenes. A typical image captured by a 2D camera sensor is a lossy two-dimensional (2D) projection of our three-dimensional (3D) world. Geometric reconstruction approaches usually recreate the lost structural information by taking multiple images observing a scene from different views and solving a problem known as Structure from Motion (SfM) or Simultaneous Localization and Mapping (SLAM). Remarkably, by establishing correspondences across images and using geometric models, these methods can (under reasonable conditions) reconstruct a scene's 3D structure as well as precisely localise the observed views relative to the scene. The success of dense every-pixel multi-view reconstruction is, however, limited by matching ambiguities that commonly arise due to uniform texture, occlusion, and appearance distortion, among several other factors. The standard approach to dealing with matching ambiguities is to handcraft priors based on assumptions such as piecewise smoothness or planarity of the 3D map, in order to "fill in" map regions supported by little or ambiguous matching evidence. In this thesis we propose learned priors that, in comparison, more closely model the true structure of the scene and are based on geometric information predicted from the images. The motivation stems from recent advances in deep learning algorithms and the availability of massive datasets, which have allowed Convolutional Neural Networks (CNNs) to predict geometric properties of a scene, such as point-wise surface normals and depths, from just a single image, more reliably than was possible with previous machine learning-based or hand-crafted methods. In particular, we first explore how single-image surface normals from a CNN trained on a massive amount of indoor data can benefit the accuracy of dense reconstruction given input images from a moving monocular camera. Here we propose a novel surface-normal-based inverse depth regularizer and compare its performance against the inverse depth smoothness prior that is typically used to regularize textureless regions of the reconstruction. We also propose the first real-time CNN-based framework for live dense monocular reconstruction using our learned normal prior. Next, we look at how deep learning can be used to learn features that improve the pixel matching process itself, which is at the heart of multi-view geometric reconstruction. We propose a self-supervised feature learning scheme using RGB-D data from a 3D sensor (requiring no manual labelling) and a multi-scale CNN architecture for feature extraction that is fast and efficient enough to run inside our proposed real-time monocular reconstruction framework. We extensively analyze the combined benefits of using learned normals and deep features that are good for matching in the context of dense reconstruction, both quantitatively and qualitatively, on large real-world datasets.
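    One way to realise such a normal-based prior, written here in an illustrative form that is not necessarily the thesis's exact term, is to penalise inverse-depth gradients that disagree with the CNN-predicted surface orientation:

$$
E_{\text{reg}}(d) \;=\; \sum_{p} \big\| \nabla d(p) - g\!\big(\mathbf{n}(p)\big) \big\|^2 ,
$$

    where $d$ is the inverse depth map, $\mathbf{n}(p)$ the predicted unit normal at pixel $p$, and $g(\cdot)$ maps a normal to the inverse-depth gradient it induces under the camera model (for a fronto-parallel surface $g = 0$, which recovers the usual smoothness prior).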
    Lastly, we explore how learned depths, also predicted per pixel from a single image by a CNN, can be used to inpaint sparse 3D maps obtained from monocular SLAM or a 3D sensor. We propose a novel model that uses the predicted depths and confidences from CNNs as priors to inpaint maps of arbitrary scale and sparsity. We obtain more reliable reconstructions than traditional depth inpainting methods such as the cross-bilateral filter, which in comparison offer few learnable parameters. Here we advocate the idea of "just-in-time reconstruction", in which a higher level of scene understanding reliably inpaints the corresponding portion of a sparse map on demand and in real time.
    Thesis (Ph.D.) -- University of Adelaide, School of Computer Science, 201
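    A sketch of the kind of objective such a model can minimise (the terms and weights below are assumptions for illustration): the completed depth map $D$ balances fidelity to the sparse measurements $S$ against the CNN prior $\hat{D}$, weighted by its per-pixel confidence $c$:

$$
E(D) \;=\; \sum_{p \in S} \big(D(p) - S(p)\big)^2 \;+\; \lambda \sum_{p} c(p)\,\big(D(p) - \hat{D}(p)\big)^2 \;+\; \mu \sum_{p} \big\| \nabla D(p) \big\|^2 ,
$$

    where $\hat{D}$ may first be rescaled to match the (arbitrary) scale of the sparse map.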

    Generative modeling of dynamic visual scenes

    Thesis (Ph.D.)--Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 2012. Cataloged from PDF version of thesis. Includes bibliographical references (p. 301-312).
    Modeling visual scenes is one of the fundamental tasks of computer vision. Whereas tremendous efforts have been devoted to video analysis in past decades, most prior work focuses on specific tasks, leading to dedicated methods to solve them. This PhD thesis instead aims to derive a probabilistic generative model that coherently integrates different aspects, notably appearance, motion, and the interaction between them. Specifically, this model considers each video as a composite of dynamic layers, each associated with a covering domain, an appearance template, and a flow describing its motion. These layers change dynamically following the associated flows, and are combined into video frames according to a Z-order that specifies their relative depth order. To describe these layers and their dynamic changes, three major components are incorporated: (1) An appearance model describes the generative process of the pixel values of a video layer. This model, via the combination of a probabilistic patch manifold and a conditional Markov random field, is able to express rich local details while maintaining global coherence. (2) A motion model captures the motion pattern of a layer through a new concept called geometric flow that originates from differential geometric analysis. A geometric flow unifies the trajectory-based representation and the notion of geometric transformation to represent the collective dynamic behaviors persisting over time. (3) A partial Z-order specifies the relative depth order between layers. Here, through the unique correspondence between equivalence classes of partial orders and consistent choice functions, a distribution over the space of partial orders is established, and inference can thus be performed thereon. The development of these models leads to significant challenges in probabilistic modeling and inference that require new techniques to address. We studied two important problems: (1) Both the appearance model and the motion model rely on mixture modeling to capture complex distributions. In a dynamic setting, the component parameters and the number of components in a mixture model can change over time. While the use of Dirichlet processes (DPs) as priors allows an indefinite number of components, incorporating temporal dependencies between DPs remains a nontrivial issue, theoretically and practically. Our research on this problem leads to a new construction of dependent DPs, enabling various forms of dynamic variation for nonparametric mixture models by harnessing the connections between Poisson and Dirichlet processes. (2) The inference of a partial Z-order from a video needs a method to sample from the posterior distribution over partial orders. A key challenge here is that the underlying space of partial orders is disconnected, meaning that one may not be able to make local updates without violating the combinatorial constraints of partial orders. We developed a novel sampling method to tackle this problem, which dynamically introduces virtual states as bridges to connect different parts of the space, implicitly resulting in an ergodic Markov chain over an augmented space. With this generative model of visual scenes, many vision problems can be readily solved through inference performed on the model.
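    The layer-composition step can be illustrated with a painter's-algorithm sketch; for simplicity the depth order below is total, whereas the thesis only assumes a partial Z-order over the layers:

```python
# Compose a frame from dynamic layers, farthest first (a simplification:
# the thesis's Z-order is partial, not total).
import numpy as np

def compose_frame(layers):
    """layers: list of (appearance HxWx3, mask HxW in [0,1], depth scalar)."""
    ordered = sorted(layers, key=lambda l: -l[2])     # farthest first
    h, w = ordered[0][1].shape
    frame = np.zeros((h, w, 3))
    for appearance, mask, _ in ordered:
        m = mask[..., None]
        frame = m * appearance + (1.0 - m) * frame    # nearer layers overwrite
    return frame
```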
    Empirical experiments demonstrate that this framework yields promising results on a series of practical tasks, including video denoising and inpainting, collective motion analysis, and semantic scene understanding.
    by Dahua Lin. Ph.D.

    Bridging the gap between reconstruction and synthesis

    Embargo applied from the date of the defense until 15 January 2022.
    3D reconstruction and image synthesis are two of the main pillars of computer vision. Early works focused on simple tasks such as multi-view reconstruction and texture synthesis. With the surge of Deep Learning, the field has progressed rapidly, making it possible to achieve more complex and higher-level tasks. For example, the 3D reconstruction results of traditional multi-view approaches can currently be obtained with single-view methods. Similarly, early pattern-based texture synthesis works have led to techniques that allow generating novel high-resolution images. In this thesis we have developed a hierarchy of tools that covers this whole range of problems, lying at the intersection of computer vision, graphics and machine learning. We tackle the problem of 3D reconstruction and synthesis in the wild. Importantly, we advocate a paradigm in which not everything should be learned. Instead of applying Deep Learning naively, we propose novel representations, layers and architectures that directly embed prior 3D geometric knowledge for the tasks of 3D reconstruction and synthesis. We apply these techniques to problems including scene/person reconstruction and photo-realistic rendering. We first address methods to reconstruct a scene and the clothed people in it while estimating the camera position. Then, we tackle image and video synthesis for clothed people in the wild. Finally, we bridge the gap between reconstruction and synthesis under the umbrella of a single novel formulation. Extensive experiments conducted throughout this thesis show that the proposed techniques improve the performance of Deep Learning models in terms of the quality of the reconstructed 3D shapes / synthesised images, while reducing the amount of supervision and training data required to train them. In summary, we provide a variety of low-, mid- and high-level algorithms that can be used to incorporate prior knowledge into different stages of the Deep Learning pipeline and improve performance on tasks of 3D reconstruction and image synthesis.
    Postprint (published version)
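    As a generic illustration of embedding geometric knowledge directly in an architecture (not one of the thesis's specific layers), a network can hard-code the perspective camera model as a fixed, differentiable layer instead of learning it from data:

```python
# Fixed (non-learned) perspective projection layer: 3D camera-frame
# points -> pixel coordinates via the intrinsics K. Gradients can flow
# through it, but the camera geometry itself is never learned.
import numpy as np

def project(points_3d, K):
    """points_3d: (N, 3) array; K: (3, 3) intrinsics matrix."""
    uvw = points_3d @ K.T                 # apply intrinsics
    return uvw[:, :2] / uvw[:, 2:3]       # perspective divide -> (N, 2)
```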