14 research outputs found

    Joint Visual Denoising and Classification using Deep Learning

    Visual restoration and recognition are traditionally addressed in a pipeline fashion, i.e., denoising followed by classification. Instead, observing that the two tasks are correlated (for example, a clearer image leads to better categorization, and vice versa), we propose a joint framework for visual restoration and recognition of handwritten images, inspired by advances in deep autoencoders and multi-modality learning. Our model is a 3-pathway deep architecture with a hidden-layer representation shared by multiple inputs and outputs, where each branch can be composed of a multi-layer deep model. Visual restoration and classification are thus unified through the shared representation via non-linear mappings, and the model parameters can be learnt via backpropagation. On MNIST and USPS data corrupted with structured noise, the proposed framework performs at least 20% better in classification than separate pipelines, while also producing clearer recovered images. The noise model and the reproducible source code are available at https://github.com/ganggit/jointmodel. Comment: 5 pages, 7 figures, ICIP 201
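As a rough illustration (not the authors' code), the shared-representation idea can be sketched as a toy network whose single hidden code feeds both a reconstruction head and a classification head, trained against a joint loss. All weights, sizes, and names here are made up for the example:

```python
import math

def sigmoid(v):
    return [1.0 / (1.0 + math.exp(-x)) for x in v]

def matvec(W, x):
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]

def softmax(v):
    m = max(v)
    e = [math.exp(x - m) for x in v]
    s = sum(e)
    return [x / s for x in e]

def joint_forward(x_noisy, W_enc, W_dec, W_cls):
    """One shared hidden code feeds both a denoising and a classification branch."""
    h = sigmoid(matvec(W_enc, x_noisy))   # shared representation
    recon = sigmoid(matvec(W_dec, h))     # restoration branch
    probs = softmax(matvec(W_cls, h))     # classification branch
    return h, recon, probs

def joint_loss(recon, x_clean, probs, label, lam=1.0):
    """Joint objective: reconstruction MSE plus weighted cross-entropy."""
    mse = sum((r - c) ** 2 for r, c in zip(recon, x_clean)) / len(x_clean)
    ce = -math.log(probs[label])
    return mse + lam * ce

# Toy 4-pixel image, 3 hidden units, 2 classes (all values illustrative).
x_noisy = [0.9, 0.1, 0.8, 0.2]
x_clean = [1.0, 0.0, 1.0, 0.0]
W_enc = [[0.5, -0.2, 0.1, 0.3], [-0.1, 0.4, 0.2, -0.3], [0.2, 0.2, -0.4, 0.1]]
W_dec = [[0.3, -0.1, 0.2], [0.1, 0.2, -0.3], [-0.2, 0.4, 0.1], [0.2, -0.3, 0.2]]
W_cls = [[0.6, -0.4, 0.2], [-0.5, 0.3, 0.1]]

h, recon, probs = joint_forward(x_noisy, W_enc, W_dec, W_cls)
loss = joint_loss(recon, x_clean, probs, label=0)
```

In a real model the weights would be learnt end-to-end via backpropagation through both branches simultaneously, which is what couples the two tasks.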

    Joint Demosaicing and Denoising with Double Deep Image Priors

    Demosaicing and denoising of RAW images are crucial steps in the processing pipeline of modern digital cameras. As only a third of the color information required to produce a digital image is captured by the camera sensor, the process of demosaicing is inherently ill-posed. The presence of noise further exacerbates this problem. Performing these two steps sequentially may distort the content of the captured RAW images and accumulate errors from one step to the next. Recent deep neural-network-based approaches have shown the effectiveness of joint demosaicing and denoising in mitigating such challenges. However, these methods typically require a large number of training samples and do not generalize well to different types and intensities of noise. In this paper, we propose a novel joint demosaicing and denoising method, dubbed JDD-DoubleDIP, which operates directly on a single RAW image without requiring any training data. We validate the effectiveness of our method on two popular datasets -- Kodak and McMaster -- with various noise types and intensities. The experimental results show that our method consistently outperforms other compared methods in terms of PSNR, SSIM, and qualitative visual perception.
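The PSNR metric used in the evaluation can be computed directly from the mean squared error. A minimal stdlib-only helper (the standard definition with an assumed peak value of 255, not code from the paper):

```python
import math

def psnr(ref, test, peak=255.0):
    """Peak signal-to-noise ratio in dB between two equal-size images
    given as flat lists of pixel values."""
    mse = sum((a - b) ** 2 for a, b in zip(ref, test)) / len(ref)
    if mse == 0:
        return float("inf")   # identical images
    return 10.0 * math.log10(peak ** 2 / mse)

quality = psnr([0, 0], [10, 0])   # MSE = 50, so roughly 31.1 dB
```

SSIM is considerably more involved (local means, variances, and covariances over sliding windows), which is why libraries are usually preferred for it.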

    Role of machine learning in early diagnosis of kidney diseases.

    Machine learning (ML) and deep learning (DL) approaches have been used as indispensable tools in modern artificial intelligence-based computer-aided diagnostic (AI-based CAD) systems that can provide non-invasive, early, and accurate diagnosis of a given medical condition. These AI-based CAD systems have proven to be reproducible and to generalize to new unseen cases across several diseases and medical conditions in different organs (e.g., kidneys, prostate, brain, liver, lung, breast, and bladder). This dissertation focuses on the role of such AI-based CAD systems in the early diagnosis of two kidney diseases: acute rejection (AR) post kidney transplantation and renal cancer (RC). A new renal computer-assisted diagnostic (Renal-CAD) system was developed to precisely diagnose AR post kidney transplantation at an early stage. The developed Renal-CAD system performs the following main steps: (1) auto-segmentation of the renal allograft from surrounding tissues in diffusion-weighted magnetic resonance imaging (DW-MRI) and blood oxygen level-dependent MRI (BOLD-MRI); (2) extraction of image markers: voxel-wise apparent diffusion coefficients (ADCs) are calculated from DW-MRI scans at 11 different low and high b-values and represented as cumulative distribution functions (CDFs), and transverse relaxation rate (R2*) values are extracted from the segmented kidneys using BOLD-MRI scans at different echo times; (3) integration of these multimodal image markers with the associated clinical biomarkers, serum creatinine (SCr) and creatinine clearance (CrCl); and (4) diagnosis of renal allograft status as non-rejection (NR) or AR using the integrated biomarkers and a deep learning classification model built on stacked auto-encoders (SAEs).
Using a leave-one-subject-out cross-validation approach along with SAEs on a total of 30 patients with transplanted kidneys (AR = 10 and NR = 20), the Renal-CAD system demonstrated 93.3% accuracy, 90.0% sensitivity, and 95.0% specificity in differentiating AR from NR. Robustness of the Renal-CAD system was further confirmed by an area under the curve (AUC) value of 0.92. Using a stratified 10-fold cross-validation approach, the Renal-CAD system demonstrated its reproducibility and robustness with a diagnostic accuracy of 86.7%, sensitivity of 80.0%, specificity of 90.0%, and AUC of 0.88. In addition, a new renal cancer CAD (RC-CAD) system for precise diagnosis of RC at an early stage was developed, which incorporates the following main steps: (1) estimating morphological features by applying a new parametric spherical harmonic technique; (2) extracting appearance-based features: first-order textural features are calculated directly, and second-order textural features are extracted after constructing the gray-level co-occurrence matrix (GLCM); (3) estimating functional features by constructing wash-in/wash-out slopes to quantify enhancement variations across different contrast-enhanced computed tomography (CE-CT) phases; and (4) integrating all the aforementioned features and modeling a two-stage multilayer perceptron artificial neural network (MLP-ANN) classifier to classify the renal tumor as benign or malignant and identify the malignancy subtype. On a total of 140 RC patients (malignant = 70 patients (ccRCC = 40 and nccRCC = 30) and benign angiomyolipoma tumors = 70), the developed RC-CAD system was validated using a leave-one-subject-out cross-validation approach. It achieved a sensitivity of 95.3% ± 2.0%, a specificity of 99.9% ± 0.4%, and a Dice similarity coefficient of 0.98 ± 0.01 in differentiating malignant from benign renal tumors, as well as an overall accuracy of 89.6% ± 5.0% in the sub-typing of RCC.
The diagnostic abilities of the developed RC-CAD system were further validated using a randomly stratified 10-fold cross-validation approach. The results obtained using the proposed MLP-ANN classification model outperformed other machine learning classifiers (e.g., support vector machines, random forests, and relational functional gradient boosting) as well as other approaches from the literature. In summary, machine and deep learning approaches have shown strong potential for building AI-based CAD systems, as evidenced by the promising diagnostic performance obtained by both the Renal-CAD and RC-CAD systems. For the Renal-CAD, integrating functional markers extracted from multimodal MRIs with clinical biomarkers using the SAE classification model improved the final diagnostic results, as evidenced by high accuracy, sensitivity, and specificity. The developed Renal-CAD demonstrated high feasibility and efficacy for early, accurate, and non-invasive identification of AR. For the RC-CAD, integrating morphological, textural, and functional features extracted from CE-CT images using an MLP-ANN classification model enhanced the final results in terms of accuracy, sensitivity, and specificity, making the proposed RC-CAD a reliable non-invasive diagnostic tool for RC. Early and accurate diagnosis of AR or RC will help physicians provide early intervention with an appropriate treatment plan to prolong the life span of the diseased kidney, increase the survival chance of the patient, and thus improve healthcare outcomes in the U.S. and worldwide.
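The second-order texture features in step (2) of the RC-CAD pipeline start from a gray-level co-occurrence matrix. A minimal stdlib sketch of a GLCM for one pixel offset, with the standard "contrast" feature (illustrative only, not the dissertation's implementation):

```python
from collections import Counter

def glcm(image, dx=1, dy=0, levels=4):
    """Normalized gray-level co-occurrence matrix for one pixel offset (dx, dy)."""
    h, w = len(image), len(image[0])
    counts = Counter()
    for y in range(h):
        for x in range(w):
            y2, x2 = y + dy, x + dx
            if 0 <= y2 < h and 0 <= x2 < w:
                counts[(image[y][x], image[y2][x2])] += 1
    total = sum(counts.values())
    return [[counts[(i, j)] / total for j in range(levels)] for i in range(levels)]

def glcm_contrast(P):
    """Second-order 'contrast' feature: sum of P[i][j] * (i - j)^2."""
    return sum(P[i][j] * (i - j) ** 2
               for i in range(len(P)) for j in range(len(P)))

# Tiny 4-level image; horizontal offset (dx=1) pairs each pixel with its right neighbor.
img = [[0, 0, 1, 1],
       [0, 0, 1, 1],
       [0, 2, 2, 2],
       [2, 2, 3, 3]]
P = glcm(img)
```

In practice, several offsets and angles are accumulated, and further features (energy, homogeneity, correlation) are read off the same matrix.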

    Modeling of Facial Wrinkles for Applications in Computer Vision

    Analysis and modeling of aging human faces have been extensively studied in the past decade for computer vision applications such as age estimation, age progression, and face recognition across aging. Most of this research is based on facial appearance and facial features such as face shape, geometry, location of landmarks, and patch-based texture features. Despite the recent availability of higher-resolution, high-quality facial images, there is little work on the image analysis of local facial features such as wrinkles specifically. For the most part, modeling of facial skin texture, fine lines, and wrinkles has been a focus of computer graphics research for photo-realistic rendering applications; in computer vision, very few aging-related applications focus on such facial features. Whereas several survey papers can be found on facial aging analysis in computer vision, this chapter focuses specifically on the analysis of facial wrinkles in the context of several applications. Facial wrinkles can be characterized as subtle discontinuities or cracks in surrounding inhomogeneous skin texture, and they pose challenges to being detected or localized in images. First, we review commonly used image features that capture the intensity gradients caused by facial wrinkles, and then present research on modeling and analysis of facial wrinkles as aging texture or curvilinear objects for different applications. The reviewed applications include localization or detection of wrinkles in facial images, incorporation of wrinkles for more realistic age progression, analysis for age estimation, and inpainting/removal of wrinkles for facial retouching.

    Diffusion-weighted magnetic resonance imaging in diagnosing graft dysfunction: a non-invasive alternative to renal biopsy.

    The thesis is divided into three parts. The first part focuses on background information, including how the kidney functions, kidney diseases, and available treatment strategies; it also covers imaging instruments and how they can be used to diagnose renal graft dysfunction. The second part focuses on elucidating the parameters linked with highly accurate diagnosis of rejection. Four parameter categories were tested: clinical biomarkers alone, individual mean apparent diffusion coefficients (ADCs) at 11 different b-values, mean ADCs of certain groups of b-values, and the fusion of clinical biomarkers with all b-values. The most accurate model was obtained when the ADCs at b = 100 s/mm² and b = 700 s/mm² were fused. The third part focuses on a study that uses diffusion-weighted MRI to diagnose and differentiate two types of renal rejection; the system correctly differentiated the two types with 98% accuracy. The thesis concludes by summarizing the work and stating possible trends and future avenues.
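The voxel-wise ADC values above are conventionally derived from the mono-exponential diffusion model S_b = S_0·exp(−b·ADC). A small stdlib sketch of that inversion (illustrative, not the thesis code):

```python
import math

def adc(s0, sb, b):
    """Apparent diffusion coefficient from the mono-exponential model:
    S_b = S_0 * exp(-b * ADC)  =>  ADC = ln(S_0 / S_b) / b   (in mm^2/s,
    with b given in s/mm^2)."""
    return math.log(s0 / sb) / b

# Synthetic signals for a voxel with an assumed true ADC of 2e-3 mm^2/s.
true_adc = 2e-3
s0 = 1000.0
for b in (100, 700):            # the two b-values fused in the thesis
    sb = s0 * math.exp(-b * true_adc)
    est = adc(s0, sb, b)        # recovers 2e-3 at each b-value
```

With multiple b-values, a least-squares fit of ln(S_b) against b is the usual, noise-robust generalization of this single-pair formula.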

    Assessment of the Segmentation of RGB Remote Sensing Images: A Subjective Approach

    The evaluation of remote sensing imagery segmentation results plays an important role in further image analysis and decision-making. The search for the optimal segmentation method for a particular data set, and the suitability of segmentation results for use in satellite image classification, are examples where proper segmentation quality assessment can affect the quality of the final result. There is little research on assessing the segmentation effectiveness of such images. Objective quality assessment metrics designed to assess segmentation results usually take into account subjective features of the human visual system (HVS). This article uses a novel approach to estimate the effectiveness of satellite image segmentation by determining the correlation between subjective and objective segmentation quality metrics. Pearson's and Spearman's correlations were computed for satellite images after applying a k-means++ clustering algorithm based on colour information. In addition, a dataset of satellite images with ground truth (GT), based on the "DeepGlobe Land Cover Classification Challenge" dataset, was constructed for testing three classes of satellite image segmentation quality metrics. This article belongs to the Special Issue "The Quality of Remote Sensing Optical Images from Acquisition to Users". This research received funding from the Research Council of Lithuania (LMTLT), agreement No. S-MIP-19-27.
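Both correlation measures used in the article can be computed without external libraries. A small sketch of each, with Spearman implemented as Pearson on average ranks (not the article's code):

```python
import math

def pearson(x, y):
    """Pearson's linear correlation coefficient."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def ranks(v):
    """1-based average ranks; tied values share the mean of their ranks."""
    order = sorted(range(len(v)), key=lambda i: v[i])
    r = [0.0] * len(v)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and v[order[j + 1]] == v[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            r[order[k]] = avg
        i = j + 1
    return r

def spearman(x, y):
    """Spearman's rank correlation: Pearson applied to the ranks."""
    return pearson(ranks(x), ranks(y))
```

Spearman captures any monotone agreement between subjective scores and an objective metric, while Pearson requires the relationship to be linear, which is why both are typically reported side by side.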

    Computational methods for the analysis of functional 4D-CT chest images.

    Medical imaging is an important technology that has been used intensively in the last few decades for disease diagnosis and monitoring as well as for assessing treatment effectiveness. Medical images provide a very large amount of valuable information, too much to be fully exploited by radiologists and physicians. Therefore, the design of computer-aided diagnostic (CAD) systems, which can serve as assistive tools for the medical community, is of great importance. This dissertation deals with the development of a complete CAD system for patients with lung cancer, which remains the leading cause of cancer-related death in the USA; in 2014, there were approximately 224,210 new cases of lung cancer and 159,260 related deaths. The process begins with the detection of lung cancer through the diagnosis of lung nodules (a manifestation of lung cancer). These nodules are approximately spherical regions of primarily high-density tissue that are visible in computed tomography (CT) images of the lung. Treatment of these lung cancer nodules is complex: nearly 70% of lung cancer patients require radiation therapy as part of their treatment. Radiation-induced lung injury is a limiting toxicity that may decrease cure rates and increase treatment morbidity and mortality. Finding ways to accurately detect, at an early stage, and hence prevent lung injury will have significant positive consequences for lung cancer patients. The ultimate goal of this dissertation is to develop a clinically usable CAD system that can improve the sensitivity and specificity of early detection of radiation-induced lung injury, based on the hypothesis that radiated lung tissues may be affected and suffer a decrease in functionality as a side effect of radiation therapy.
This hypothesis has been validated by demonstrating that automatic segmentation of the lung regions and registration of consecutive respiratory phases can estimate elasticity, ventilation, and texture features that provide discriminatory descriptors for the early detection of radiation-induced lung injury. The proposed methodologies lead to novel indexes for distinguishing normal/healthy and injured lung tissues in clinical decision-making. To achieve this goal, a CAD system for accurate detection of radiation-induced lung injury was developed with three basic components: lung field segmentation, lung registration, and feature extraction with tissue classification. The dissertation starts with an exploration of the available medical imaging modalities to present the importance of medical imaging in today's clinical applications. Secondly, the methodologies, challenges, and limitations of recent CAD systems for lung cancer detection are covered. This is followed by an accurate segmentation methodology for the lung parenchyma, with a focus on pathological lungs, to extract the volume of interest (VOI) to be analyzed for the potential existence of lung injuries stemming from radiation therapy. After segmentation of the VOI, a lung registration framework is introduced to perform the crucial step of co-aligning intra-patient scans. This step eliminates the effects of orientation differences, motion, breathing, heartbeats, and differences in scanning parameters, so that functionality features for the lung fields can be accurately extracted. The developed registration framework also helps in the evaluation and gated control of radiotherapy through motion estimation analysis before and after the therapy dose.
Finally, detection of radiation-induced lung injury is introduced, combining the previous two medical image processing and analysis steps with feature estimation and classification. This framework estimates and combines both texture and functional features. The texture features are modeled using a novel 7th-order Markov-Gibbs random field (MGRF) model that accurately models the texture of healthy and injured lung tissues by simultaneously accounting for both vertical and horizontal relative dependencies between voxel-wise signals. The functionality features are calculated from the deformation fields, obtained from the 4D-CT lung registration, that map lung voxels between successive CT scans in the respiratory cycle. These features describe the ventilation (air flow rate) of the lung tissues using the Jacobian of the deformation field, and the tissues' elasticity using the strain components calculated from the gradient of the deformation field. Finally, these features are combined in the classification model to detect the injured parts of the lung at an early stage, enabling earlier intervention.
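The ventilation measure described above rests on the Jacobian determinant of the deformation field. A minimal 2-D finite-difference sketch (illustrative only, not the dissertation's 4D implementation):

```python
def jacobian_det(ux, uy):
    """Determinant of the Jacobian of the mapping phi(x, y) = (x + ux, y + uy),
    via central differences on interior grid points. det > 1 indicates local
    expansion (inhalation), det < 1 local compression. Boundary entries are
    left at 1.0 for simplicity."""
    h, w = len(ux), len(ux[0])
    det = [[1.0] * w for _ in range(h)]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            dux_dx = (ux[y][x + 1] - ux[y][x - 1]) / 2
            dux_dy = (ux[y + 1][x] - ux[y - 1][x]) / 2
            duy_dx = (uy[y][x + 1] - uy[y][x - 1]) / 2
            duy_dy = (uy[y + 1][x] - uy[y - 1][x]) / 2
            det[y][x] = (1 + dux_dx) * (1 + duy_dy) - dux_dy * duy_dx
    return det

# Synthetic field: uniform 10% expansion along x (ux = 0.1 * x, uy = 0).
ux = [[0.1 * x for x in range(5)] for _ in range(5)]
uy = [[0.0] * 5 for _ in range(5)]
J = jacobian_det(ux, uy)   # interior values come out near 1.1
```

The 3-D version adds the z-derivatives and expands a full 3×3 determinant, but the finite-difference structure is identical.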

    Analysis and Modular Approach for Text Extraction from Scientific Figures on Limited Data

    Scientific figures are widely used as compact, comprehensible representations of important information. The re-usability of these figures is, however, limited, as one can rarely search directly for them: they are mostly indexed by their surrounding text (e.g., the publication or website), which often does not contain the full message of the figure. This thesis focuses on making the content of scientific figures accessible by extracting the text from these figures. Based on a thorough analysis of the literature, a modular pipeline for unsupervised text extraction from scientific figures was built to address the problem. This modular pipeline was used to build several unsupervised approaches and to evaluate different methods from the literature, new methods, and method combinations. Some supervised approaches were built as well for comparison. One challenge while evaluating the approaches was the lack of annotated data, which especially needed to be considered when building the supervised approaches. Three existing datasets were used for evaluation, together with two datasets of 241 scientific figures in total that were manually created and annotated. Additionally, two existing datasets for text extraction from other types of images were used for pretraining the supervised approach. Several experiments showed the superiority of the unsupervised pipeline over common Optical Character Recognition engines and identified the best unsupervised approach. This unsupervised approach was compared with the best supervised approach, which, despite the limited amount of training data available, clearly outperformed the unsupervised approach.
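The modular-pipeline idea can be sketched as a composition of interchangeable stage functions, so that individual methods can be swapped per stage. The stage names and behaviors below are hypothetical stubs, not the thesis pipeline:

```python
from functools import reduce

def pipeline(stages):
    """Compose interchangeable stages; each stage maps one intermediate
    result to the next. Swapping a stage swaps one method in the pipeline."""
    return lambda data: reduce(lambda acc, stage: stage(acc), stages, data)

# Hypothetical stage stubs mirroring a text-extraction pipeline.
def preprocess(img):
    return [[255 - p for p in row] for row in img]        # invert grayscale

def segment(img):
    return [(y, x) for y, row in enumerate(img)           # keep pixels that
            for x, p in enumerate(row) if p > 128]        # were dark (ink)

def group(pixels):
    return [pixels]                                       # one region (stub)

def recognize(regions):
    return ["?" for _ in regions]                         # OCR placeholder

extract = pipeline([preprocess, segment, group, recognize])
result = extract([[0, 255],
                  [255, 0]])
```

The benefit of this structure is exactly what the thesis exploits: each stage can be evaluated in isolation or replaced by a method from the literature without touching the rest of the pipeline.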

    Visualization of cyclical temporal patterns in phenology studies

    Advisors: Ricardo da Silva Torres, Leonor Patrícia Cerdeira Morellato. Doctoral thesis, Universidade Estadual de Campinas, Instituto de Computação.
    In several applications, large volumes of multidimensional data have been generated continuously over time. One suitable approach for handling those collections in a meaningful way consists in the use of information visualization methods, based on which patterns of interest can be identified, triggering the understanding of complex temporal phenomena. In fact, in several domains, the development of appropriate tools for supporting complex analysis based, for example, on the identification of change patterns in temporal data or of correlations existing, over time, among multiple variables is of paramount importance.
In phenology studies, for instance, phenologists observe changes in the development of plants and animals throughout their lives and investigate how these changes relate to environmental changes. Phenologists therefore increasingly need tools for appropriately visualizing long-term series with many variables of different data types, as well as for identifying cyclical temporal patterns. Although several approaches have been proposed to visualize data varying over time, most of them are not appropriate or applicable to phenology data, because they are not able (i) to handle long-term series with many variables of different data types and one or more dimensions, and (ii) to support the identification of cyclical temporal patterns and associated environmental drivers. This work addresses these shortcomings by presenting two new approaches to support the analysis and visualization of multidimensional temporal data. Our first proposal combines radial visual structures with visual rhythms. Radial visual structures are used to provide contextual insights regarding cyclical phenomena, while the visual rhythm encoding is used to summarize long-term time series into compact representations. We developed, evaluated, and validated our proposal with phenology experts using direct observational plant phenology data at both the individual and species levels. We also validated the proposal using image-related temporal data obtained from near-surface vegetation monitoring systems. Our second approach is a novel image-based representation, named Change Frequency Heatmap (CFH), used to encode temporal changes of multivariate numerical data. The method computes histograms of change patterns observed at successive timestamps.
We validated the use of CFHs through the creation of a temporal change characterization tool to support complex plant phenology analysis, concerning the characterization of plant life cycle changes of multiple individuals and species over time. We demonstrated the potential of CFHs to support visual identification of complex temporal change patterns, especially to decipher inter-individual variations in plant phenology. Doctoral thesis in Computer Science. Funding: CNPq, CAPES, and FAPESP (grants 162312/2015-6 and 2013/501550-0).
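The CFH computation described above (histograms of change patterns at successive timestamps) can be sketched as follows. The encoding into just three change directions is a simplifying assumption for illustration, not the exact CFH definition:

```python
def change_frequency_heatmap(series, eps=0.0):
    """CFH-style encoding: for each pair of successive timestamps and each
    change direction (fall, stable, rise), count how many individuals changed
    that way. Rows index transitions t -> t+1; columns are (fall, stable, rise).
    `series` is a list of equal-length per-individual time series."""
    T = len(series[0])
    heat = [[0, 0, 0] for _ in range(T - 1)]
    for s in series:
        for t in range(T - 1):
            d = s[t + 1] - s[t]
            col = 0 if d < -eps else (2 if d > eps else 1)
            heat[t][col] += 1
    return heat

# Three individuals observed at four timestamps (e.g., a phenophase intensity).
series = [[0, 1, 1, 0],
          [0, 2, 2, 1],
          [1, 1, 0, 0]]
heat = change_frequency_heatmap(series)
```

Rendering `heat` as an image, with transitions on one axis and change directions on the other, gives the heatmap view: bright cells mark transitions where many individuals changed in the same direction.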

    New inter-frame prediction methods for image and video compression

    Due to the large availability of video cameras, new social media practices, and the emergence of cloud services, images and videos today constitute a significant share of the total data transmitted over the internet. Video streaming applications account for more than 70% of the world's internet bandwidth, while billions of images are already stored in the cloud and millions are uploaded every day. The ever-growing streaming and storage requirements of these media demand constant improvements of image and video coding tools. This thesis explores novel approaches for improving current inter-prediction methods. Such methods leverage redundancies between similar frames and were originally developed in the context of video compression. In a first approach, novel global and local inter-prediction tools are combined to improve the efficiency of image-set compression schemes based on video codecs. By coupling a global geometric and photometric compensation with a locally linear prediction, significant improvements can be obtained. A second approach is then proposed which introduces a region-based inter-prediction scheme. The proposed method improves coding performance compared to existing solutions by estimating and compensating geometric and photometric distortions at a semi-local level. This approach is then adapted and validated in the context of video compression. Bit-rate improvements are obtained, especially for sequences displaying complex real-world motions such as zooms and rotations. The last part of the thesis focuses on deep learning approaches to inter-prediction. Deep neural networks have shown striking results for a large number of computer vision tasks over the last years. Deep learning based methods originally proposed for frame interpolation are studied here in the context of video compression.
Coding performance improvements over traditional motion estimation and compensation methods highlight the potential of these deep architectures.
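The core operation of traditional inter-prediction, motion estimation by block matching, can be sketched with a full-search SAD matcher. This is a textbook illustration, not the thesis method:

```python
def sad(a, b):
    """Sum of absolute differences between two equal-size blocks."""
    return sum(abs(p - q) for ra, rb in zip(a, b) for p, q in zip(ra, rb))

def block(frame, y, x, n):
    """Extract an n x n block whose top-left corner is at (y, x)."""
    return [row[x:x + n] for row in frame[y:y + n]]

def best_match(ref, cur, y, x, n=2, radius=2):
    """Full-search block matching: find the displacement (dy, dx) into the
    reference frame that minimizes SAD against the current block at (y, x)."""
    target = block(cur, y, x, n)
    best, best_cost = (0, 0), float("inf")
    for dy in range(-radius, radius + 1):
        for dx in range(-radius, radius + 1):
            yy, xx = y + dy, x + dx
            if 0 <= yy <= len(ref) - n and 0 <= xx <= len(ref[0]) - n:
                cost = sad(target, block(ref, yy, xx, n))
                if cost < best_cost:
                    best_cost, best = cost, (dy, dx)
    return best, best_cost

# A 2x2 bright patch shifted one pixel to the right between frames.
ref = [[0, 9, 9, 0],
       [0, 9, 9, 0],
       [0, 0, 0, 0],
       [0, 0, 0, 0]]
cur = [[0, 0, 9, 9],
       [0, 0, 9, 9],
       [0, 0, 0, 0],
       [0, 0, 0, 0]]
mv, cost = best_match(ref, cur, y=0, x=1, n=2)   # points back to the ref block
```

Real codecs refine this with sub-pixel interpolation, rate-distortion-aware costs, and fast search patterns; the deep approaches studied in the thesis replace or augment this matching step with learned predictors.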