20 research outputs found

    A Unified Approach to Restoration, Deinterlacing and Resolution Enhancement in Decoding MPEG-2 Video

    Get PDF

    Duality based optical flow algorithms with applications

    Get PDF
    We consider the popular TV-L1 optical flow formulation, and the so-called dual-ity based algorithm for minimizing the TV-L1 energy. The original formulation is extended to allow for vector valued images, and minimization results are given. In addition we consider di↵erent definitions of total variation regulariza-tion, and related formulations of the optical flow problem that may be used with a duality based algorithm. We present a highly optimized algorithmic setup to estimate optical flows, and give five novel applications. The first application is registration of medical images, where X-ray images of di↵erent hands, taken using di↵erent imaging devices are registered using a TV-L1 optical flow algo-rithm. We propose to regularize the input images, using sparsity enhancing regularization of the image gradient to improve registration results. The second application is registration of 2D chromatograms, where registration only have to be done in one of the two dimensions, resulting in a vector valued registration problem with values having several hundred dimensions. We propose a nove

    Efficient implementation of video processing algorithms on FPGA

    Get PDF
    The work contained in this portfolio thesis was carried out as part of an Engineering Doctorate (Eng.D) programme from the Institute for System Level Integration. The work was sponsored by Thales Optronics, and focuses on issues surrounding the implementation of video processing algorithms on field programmable gate arrays (FPGA). A description is given of FPGA technology and the currently dominant methods of designing and verifying firmware. The problems of translating a description of behaviour into one of structure are discussed, and some of the latest methodologies for tackling this problem are introduced. A number of algorithms are then looked at, including methods of contrast enhancement, deconvolution, and image fusion. Algorithms are characterised according to the nature of their execution flow, and this is used as justification for some of the design choices that are made. An efficient method of performing large two-dimensional convolutions is also described. The portfolio also contains a discussion of an FPGA implementation of a PID control algorithm, an overview of FPGA dynamic reconfigurability, and the development of a demonstration platform for rapid deployment of video processing algorithms in FPGA hardware

    Multimodal feature extraction and fusion for audio-visual speech recognition

    Get PDF
    Multimodal signal processing analyzes a physical phenomenon through several types of measures, or modalities. This leads to the extraction of higher-quality and more reliable information than that obtained from single-modality signals. The advantage is two-fold. First, as the modalities are usually complementary, the end-result of multimodal processing is more informative than for each of the modalities individually, which represents the first advantage. This is true in all application domains: human-machine interaction, multimodal identification or multimodal image processing. The second advantage is that, as modalities are not always reliable, it is possible, when one modality becomes corrupted, to extract the missing information from the other one. There are two essential challenges in multimodal signal processing. First, the features used from each modality need to be as relevant and as few as possible. The fact that multimodal systems have to process more than just one modality means that they can run into errors caused by the curse of dimensionality much more easily than mono-modal ones. The curse of dimensionality is a term used essentially to say that the number of equally-distributed samples required to cover a region of space grows exponentially with the dimensionality of the space. This has important implications in the classification domain, since accurate models can only be obtained if an adequate number of samples is available, and obviously this required number of samples grows with the dimensionality of the features. Dimensionality reduction is thus a necessary step in any application dealing with complex signals, and this is achieved through selection, transforms or the combination of the two. The second essential challenge is multimodal integration. Since the signals involved do not necessarily have the same data rate, range or even dimensionality, combining information coming from such different sources is not straightforward. This can be done at different levels, starting from the basic signal level by combining the signals themselves, if they are compatible, up to the highest decision level, where only the individual decisions taken based on the signals are combined. Ideally, the fusion method should allow temporal variations in the relative importance of the two streams, to account for possible changes in their quality. However, this can only be done with methods operating at a high decision level. The aim of this thesis is to offer solutions to both these challenges, in the context of audio-visual speech recognition and speaker localization. Both these applications are from the field of human-machine interaction. Audio-visual speech recognition aims to improve the accuracy of speech recognizers by augmenting the audio with information extracted from the video, more particularly, the movement of the speaker's lips. This works well especially when the audio is corrupted, leading in this case to significant gains in accuracy. Speaker localization means detecting who is the active speaker in a audio-video sequence containing several persons, something that is useful for videoconferencing and the automated annotation of meetings. These two applications are the context in which we present our solutions to both feature selection and multimodal integration. First, we show how informative features can be extracted from the visual modality, using an information-theoretic framework which gives us a quantitative measure of the relevance of individual features. We also prove that reducing redundancy between these features is important for avoiding the curse of dimensionality and improving recognition results. The methods that we present are novel in the field of audio-visual speech recognition and we found that their use leads to significant improvements compared to the state of the art. Second, we present a method of multimodal fusion at the level of intermediate decisions using a weight for each of the streams. The weights are adaptive, changing according to the estimated reliability of each stream. This makes the system tolerant to changes in the quality of either stream, and even to the temporary interruption of one of the streams. The reliability estimate is based on the entropy of the posterior probability distributions of each stream at the intermediate decision level. Our results are superior to those obtained with a state of the art method based on maximizing the same posteriors. Moreover, we analyze the effect of a constraint typically imposed on stream weights in the literature, the constraint that they should sum to one. Our results show that removing this constraint can lead to improvements in recognition accuracy. Finally, we develop a method for audio-visual speaker localization, based on the correlation between audio energy and the movement of the speaker's lips. Our method is based on a joint probability model of the audio and video which is used to build a likelihood map showing the likely positions of the speaker's mouth. We show that our novel method performs better than a similar method from the literature. In conclusion, we analyze two different challenges of multimodal signal processing for two audio-visual problems, and offer innovative approaches for solving them

    Construction de mosaïques de super-résolution à partir de la vidéo de basse résolution. Application au résumé vidéo et la dissimulation d'erreurs de transmission.

    Get PDF
    La numérisation des vidéos existantes ainsi que le développement explosif des services multimédia par des réseaux comme la diffusion de la télévision numérique ou les communications mobiles ont produit une énorme quantité de vidéos compressées. Ceci nécessite des outils d’indexation et de navigation efficaces, mais une indexation avant l’encodage n’est pas habituelle. L’approche courante est le décodage complet des ces vidéos pour ensuite créer des indexes. Ceci est très coûteux et par conséquent non réalisable en temps réel. De plus, des informations importantes comme le mouvement, perdus lors du décodage, sont reestimées bien que déjà présentes dans le flux comprimé. Notre but dans cette thèse est donc la réutilisation des données déjà présents dans le flux comprimé MPEG pour l’indexation et la navigation rapide. Plus précisément, nous extrayons des coefficients DC et des vecteurs de mouvement. Dans le cadre de cette thèse, nous nous sommes en particulier intéressés à la construction de mosaïques à partir des images DC extraites des images I. Une mosaïque est construite par recalage et fusion de toutes les images d’une séquence vidéo dans un seul système de coordonnées. Ce dernier est en général aligné avec une des images de la séquence : l’image de référence. Il en résulte une seule image qui donne une vue globale de la séquence. Ainsi, nous proposons dans cette thèse un système complet pour la construction des mosaïques à partir du flux MPEG-1/2 qui tient compte de différentes problèmes apparaissant dans des séquences vidéo réeles, comme par exemple des objets en mouvment ou des changements d’éclairage. Une tâche essentielle pour la construction d’une mosaïque est l’estimation de mouvement entre chaque image de la séquence et l’image de référence. Notre méthode se base sur une estimation robuste du mouvement global de la caméra à partir des vecteurs de mouvement des images P. Cependant, le mouvement global de la caméra estimé pour une image P peut être incorrect car il dépend fortement de la précision des vecteurs encodés. Nous détectons les images P concernées en tenant compte des coefficients DC de l’erreur encodée associée et proposons deux méthodes pour corriger ces mouvements. Unemosaïque construite à partir des images DC a une résolution très faible et souffre des effets d’aliasing dus à la nature des images DC. Afin d’augmenter sa résolution et d’améliorer sa qualité visuelle, nous appliquons une méthode de super-résolution basée sur des rétro-projections itératives. Les méthodes de super-résolution sont également basées sur le recalage et la fusion des images d’une séquence vidéo, mais sont accompagnées d’une restauration d’image. Dans ce cadre, nous avons développé une nouvelleméthode d’estimation de flou dû au mouvement de la caméra ainsi qu’une méthode correspondante de restauration spectrale. La restauration spectrale permet de traiter le flou globalement, mais, dans le cas des obvi jets ayant un mouvement indépendant du mouvement de la caméra, des flous locaux apparaissent. C’est pourquoi, nous proposons un nouvel algorithme de super-résolution dérivé de la restauration spatiale itérative de Van Cittert et Jansson permettant de restaurer des flous locaux. En nous basant sur une segmentation d’objets en mouvement, nous restaurons séparément lamosaïque d’arrière-plan et les objets de l’avant-plan. Nous avons adapté notre méthode d’estimation de flou en conséquence. Dans une premier temps, nous avons appliqué notre méthode à la construction de résumé vidéo avec pour l’objectif la navigation rapide par mosaïques dans la vidéo compressée. Puis, nous établissions comment la réutilisation des résultats intermédiaires sert à d’autres tâches d’indexation, notamment à la détection de changement de plan pour les images I et à la caractérisation dumouvement de la caméra. Enfin, nous avons exploré le domaine de la récupération des erreurs de transmission. Notre approche consiste en construire une mosaïque lors du décodage d’un plan ; en cas de perte de données, l’information manquante peut être dissimulée grace à cette mosaïque

    On the design of multimedia architectures : proceedings of a one-day workshop, Eindhoven, December 18, 2003

    Get PDF

    On the design of multimedia architectures : proceedings of a one-day workshop, Eindhoven, December 18, 2003

    Get PDF

    Recent Advances in Signal Processing

    Get PDF
    The signal processing task is a very critical issue in the majority of new technological inventions and challenges in a variety of applications in both science and engineering fields. Classical signal processing techniques have largely worked with mathematical models that are linear, local, stationary, and Gaussian. They have always favored closed-form tractability over real-world accuracy. These constraints were imposed by the lack of powerful computing tools. During the last few decades, signal processing theories, developments, and applications have matured rapidly and now include tools from many areas of mathematics, computer science, physics, and engineering. This book is targeted primarily toward both students and researchers who want to be exposed to a wide variety of signal processing techniques and algorithms. It includes 27 chapters that can be categorized into five different areas depending on the application at hand. These five categories are ordered to address image processing, speech processing, communication systems, time-series analysis, and educational packages respectively. The book has the advantage of providing a collection of applications that are completely independent and self-contained; thus, the interested reader can choose any chapter and skip to another without losing continuity

    Elastic image registration using parametric deformation models

    Get PDF
    The main topic of this thesis is elastic image registration for biomedical applications. We start with an overview and classification of existing registration techniques. We revisit the landmark interpolation which appears in the landmark-based registration techniques and add some generalizations. We develop a general elastic image registration algorithm. It uses a grid of uniform B-splines to describe the deformation. It also uses B-splines for image interpolation. Multiresolution in both image and deformation model spaces yields robustness and speed. First we describe a version of this algorithm targeted at finding unidirectional deformation in EPI magnetic resonance images. Then we present the enhanced and generalized version of this algorithm which is significantly faster and capable of treating multidimensional deformations. We apply this algorithm to the registration of SPECT data and to the motion estimation in ultrasound image sequences. A semi-automatic version of the registration algorithm is capable of accepting expert hints in the form of soft landmark constraints. Much fewer landmarks are needed and the results are far superior compared to pure landmark registration. In the second part of this thesis, we deal with the problem of generalized sampling and variational reconstruction. We explain how to reconstruct an object starting from several measurements using arbitrary linear operators. This comprises the case of traditional as well as generalized sampling. Among all possible reconstructions, we choose the one minimizing an a priori given quadratic variational criterion. We give an overview of the method and present several examples of applications. We also provide the mathematical details of the theory and discuss the choice of the variational criterion to be used
    corecore