Introduction to Facial Micro Expressions Analysis Using Color and Depth Images: A Matlab Coding Approach (Second Edition, 2023)
The book provides a gentle introduction to the field of Facial Micro Expressions Recognition (FMER) using color and depth images, with the aid of the MATLAB programming environment. FMER is a subset of image processing and a multidisciplinary topic to analyze, so it requires familiarity with other topics of Artificial Intelligence (AI) such as machine learning, digital image processing, and psychology. This makes it a great opportunity to write a book that covers all of these topics for readers ranging from beginners to professionals in the field of AI, even for those without an AI background. Our goal is to provide a standalone introduction to FMER analysis, in the form of theoretical descriptions for readers with no background in image processing, accompanied by reproducible MATLAB practical examples. We also describe the basic definitions for FMER analysis and the MATLAB library used in the text, which helps the reader apply the experiments to real-world applications. We believe that this book is suitable for students, researchers, and professionals alike who need to develop practical skills along with a basic understanding of the field. We expect that, after reading this book, the reader will feel comfortable with the different key stages: color and depth image processing, color and depth image representation, classification, machine learning, facial micro-expression recognition, feature extraction, and dimensionality reduction.
Learning to Rig Characters
With the emergence of 3D virtual worlds, 3D social media, and massive online games, the need for diverse, high-quality, animation-ready characters and avatars is greater than ever. To animate characters, artists hand-craft articulation structures, such as animation skeletons and part deformers, which require a significant amount of manual, laborious interaction with 2D/3D modeling interfaces. This thesis presents deep learning methods that significantly automate the process of character rigging.
First, the thesis introduces RigNet, a method capable of predicting an animation skeleton for an input static 3D shape in the form of a polygon mesh. The predicted skeletons match animator expectations in joint placement and topology. RigNet also estimates surface skin weights, which determine how the mesh is animated given different skeletal poses. In contrast to prior work that fits pre-defined skeletal templates with hand-tuned objectives, RigNet is able to automatically rig diverse characters, such as humanoids, quadrupeds, toys, and birds, with varying articulation structure and geometry. RigNet is based on a deep neural architecture that directly operates on the mesh representation. The architecture is trained on a diverse dataset of rigged models that we mined online and curated. The dataset includes 2.7K polygon meshes, along with their associated skeletons and corresponding skin weights.
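A skeleton plus per-vertex skin weights, the outputs a rig like RigNet predicts, are typically consumed by linear blend skinning. The sketch below is a minimal illustrative implementation of that standard deformation model, not RigNet's code:

```python
import numpy as np

def linear_blend_skinning(vertices, weights, bone_transforms):
    """Deform mesh vertices by blending per-bone rigid transforms
    with skin weights.

    vertices:        (V, 3) rest-pose positions
    weights:         (V, B) skin weights, each row sums to 1
    bone_transforms: (B, 4, 4) homogeneous transform per bone
    """
    V = vertices.shape[0]
    homo = np.hstack([vertices, np.ones((V, 1))])          # (V, 4)
    # Position of every vertex under every bone transform: (B, V, 4)
    per_bone = np.einsum('bij,vj->bvi', bone_transforms, homo)
    # Weighted blend across bones: (V, 4)
    blended = np.einsum('vb,bvi->vi', weights, per_bone)
    return blended[:, :3]

# Two vertices, two bones: bone 0 is identity, bone 1 translates +1 in x.
verts = np.array([[0.0, 0.0, 0.0], [1.0, 0.0, 0.0]])
T = np.stack([np.eye(4), np.eye(4)])
T[1, 0, 3] = 1.0
w = np.array([[1.0, 0.0], [0.5, 0.5]])  # vertex 1 half-bound to each bone
deformed = linear_blend_skinning(verts, w, T)
```

A vertex fully bound to the identity bone stays put, while the half-and-half vertex moves halfway with the translated bone.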
Second, the thesis introduces MoRig, a method that automatically rigs character meshes driven by single-view point cloud streams capturing the motion of performing characters. Compared to RigNet, MoRig's rigging is motion-aware: its neural network encodes motion cues from the point clouds into compact feature representations that are informative about the articulated parts of the performing character. These motion-aware features guide the inference of an appropriate skeletal rig for the input mesh. Furthermore, MoRig is able to animate the rig according to the captured point cloud motion. MoRig can handle diverse characters with different morphologies (e.g., humanoids, quadrupeds, toy characters). It also accounts for occluded regions in the point clouds and mismatches in the part proportions between the input mesh and the captured character.
Third, the thesis introduces APES, a method that takes as input 2D raster images depicting a small set of poses of a character shown in a sprite sheet, and identifies articulated parts useful for rigging the character. APES uses a combination of neural network inference and integer linear programming to identify a compact set of articulated body parts, e.g., head, torso, and limbs, that best reconstruct the input poses. Compared to MoRig and RigNet, which require a large collection of training models with associated skeletons and skinning weights, APES' neural architecture relies on less effortful supervision from (i) pixel correspondences readily available in existing large cartoon image datasets (e.g., Creative Flow) and (ii) a relatively small dataset of 57 cartoon characters segmented into moving parts.
Finally, the thesis discusses future research directions related to combining neural rigging with 3D and 4D reconstruction of characters from point cloud data and 2D video, as well as automating the process of motion synthesis for 3D characters.
Spatial Analysis for Landscape Changes
Recent increasing trends in the occurrence of natural and anthropic processes have a strong impact on landscape modification, and there is a growing need for effective instruments, tools, and approaches to understand and manage landscape changes. Great improvements in the availability of high-resolution DEMs, GIS tools, and algorithms for automatic extraction of landform features and change detection have favored an increase in the analysis of landscape changes, which has become an essential instrument for the quantitative evaluation of landscape changes in many research fields. One of the most effective ways of investigating natural landscape changes is the geomorphological one, which benefits from recent advances in the development of digital elevation model (DEM) comparison software and algorithms, image change detection, and landscape evolution models. This Special Issue collects six papers concerning the application of traditional and innovative multidisciplinary methods in several application fields, such as geomorphology, urban and territorial systems, vegetation restoration, and soil science. The papers include multidisciplinary studies that highlight the usefulness of quantitative analyses of satellite images and UAV-based DEMs, and the application of Landscape Evolution Models (LEMs) and automatic landform classification algorithms to solve multidisciplinary issues of landscape changes. A review article is also presented, dealing with a bibliometric analysis of the research topic.
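The DEM-comparison workflow these studies build on can be illustrated with a minimal DEM of Difference (DoD) sketch; the level-of-detection threshold, unit cell area, and toy arrays below are illustrative assumptions, not values from any of the papers:

```python
import numpy as np

def dem_of_difference(dem_new, dem_old, lod=0.5):
    """Subtract two co-registered DEMs and mask cells whose change is
    below a level of detection (LoD) representing elevation uncertainty.

    Returns the masked change map plus net erosion/deposition volumes
    (assuming unit cell area; negative change = erosion).
    """
    dod = dem_new - dem_old
    significant = np.abs(dod) >= lod
    erosion = float(dod[significant & (dod < 0)].sum())
    deposition = float(dod[significant & (dod > 0)].sum())
    return np.where(significant, dod, np.nan), erosion, deposition

# 2x2 toy DEMs: one eroding cell (-1.0 m), one aggrading cell (+1.0 m),
# and two sub-LoD fluctuations that get masked out.
old = np.full((2, 2), 10.0)
new = np.array([[10.2, 9.0], [11.0, 10.1]])
masked, ero, dep = dem_of_difference(new, old)
```

Thresholding by an LoD before summing volumes is what separates real surface change from DEM noise in change-detection studies.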
Data-Efficient Learning of Semantic Segmentation
Semantic segmentation is a fundamental problem in visual perception with a wide range of applications ranging from robotics to autonomous vehicles, and recent approaches based on deep learning have achieved excellent performance. However, to train such systems there is in general a need for very large datasets of annotated images. In this thesis we investigate and propose methods and setups for which it is possible to use unlabelled data to increase performance, or to use limited application-specific data to reduce the need for large datasets when learning semantic segmentation. In the first paper we study semantic video segmentation. We present a deep end-to-end trainable model that uses propagated labelling information in unlabelled frames, in addition to sparsely labelled frames, to predict semantic segmentation. Extensive experiments on the CityScapes and CamVid datasets show that the model can improve accuracy and temporal consistency by using extra unlabelled video frames in training and testing. In the second, third and fourth papers we study active learning for semantic segmentation in an embodied context where navigation is part of the problem. A navigable agent should explore a building and query for the labelling of informative views that increase the visual perception of the agent. In the second paper we introduce the embodied visual active learning problem, and propose and evaluate a range of methods, from heuristic baselines to a fully trainable agent using reinforcement learning (RL), on the Matterport3D dataset. We show that the learned agent outperforms several comparable pre-specified baselines. In the third paper we study the embodied visual active learning problem in a lifelong setup, where the visual learning spans the exploration of multiple buildings, and the learning in one scene should influence the active learning in the next, e.g., by not annotating already accurately segmented object classes.
We introduce new methodology to encourage global exploration of scenes, via an RL formulation that combines local navigation with global exploration by frontier exploration. We show that the RL agent can learn adaptable behaviour, such as annotating less frequently when it has already explored a number of buildings. Finally, in the fourth paper we study the embodied visual active learning problem with region-based active learning. Instead of querying for annotations for a whole image, an agent can query for annotations of just parts of images, and we show that annotating regions in the image instead of full images is significantly more labelling-efficient.
Applications of Markov Random Field Optimization and 3D Neural Network Pruning in Computer Vision
Recent years have witnessed the rapid development of Convolutional Neural Networks (CNNs) in various computer vision applications that were traditionally addressed by Markov Random Field (MRF) optimization methods. Even though CNN-based methods achieve high accuracy in these tasks, highly refined results remain difficult to achieve. For instance, a pairwise MRF optimization method is capable of segmenting objects with auxiliary edge information through its second-order terms, which a deep neural network is unlikely to achieve reliably on its own. MRF optimization methods, moreover, are able to enhance performance with explicit theoretical and experimental support using iterative energy minimization.
Second, such an edge detector can be learned by CNNs, and thus it becomes valuable to transfer the output of one CNN to another task. It is desirable to fuse the superpixel contours from a state-of-the-art CNN with semantic segmentation results from another state-of-the-art CNN, so that the fusion aligns the object contours of the semantic segmentation with the superpixel contours. This kind of fusion is not limited to semantic segmentation; it extends to other tasks with a collective effect of multiple off-the-shelf CNNs.
While fusing multiple CNNs is useful for enhancing performance, each such CNN is usually specifically designed and trained with an empirical configuration of resources. With a large batch size, however, joint CNN training may run out of GPU memory. This problem commonly arises when training CNNs efficiently with limited resources, and it is more obvious and severe in 3D CNNs than in 2D CNNs due to their high training-resource requirements.
To solve the first problem, we propose two fast and differentiable message passing algorithms, namely Iterative Semi-Global Matching Revised (ISGMR) and Parallel Tree-Reweighted Message Passing (TRWP), for both energy minimization problems and deep learning applications. Our experiments on stereo vision and image inpainting datasets validate the effectiveness and efficiency of our methods, with minimum energies comparable to the state-of-the-art algorithm TRWS, while greatly improving forward and backward propagation speed using CUDA programming on massively parallel trees. Applying these two methods to deep learning semantic segmentation on PASCAL VOC 2012 with Canny edges achieves enhanced segmentation results measured by mean Intersection over Union (mIoU).
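As background for this message-passing machinery, exact min-sum dynamic programming on a chain MRF, the 1D sub-problem that scanline methods sweep repeatedly, can be sketched as follows. This is a simplified illustration of the energy-minimization setting, not the ISGMR/TRWP updates themselves:

```python
import numpy as np

def chain_min_sum(unary, pairwise):
    """Exact MAP inference on a chain MRF by min-sum dynamic programming.

    unary:    (N, L) unary cost per node and label
    pairwise: (L, L) cost between neighbouring labels
    Returns the minimising labels and the minimum energy.
    """
    N, L = unary.shape
    cost = unary[0].copy()
    back = np.zeros((N, L), dtype=int)
    for i in range(1, N):
        # total[p, q]: best cost ending at node i-1 with label p, stepping to q
        total = cost[:, None] + pairwise
        back[i] = np.argmin(total, axis=0)
        cost = unary[i] + np.min(total, axis=0)
    labels = np.zeros(N, dtype=int)
    labels[-1] = int(np.argmin(cost))
    for i in range(N - 1, 0, -1):
        labels[i - 1] = back[i, labels[i]]
    return labels, float(np.min(cost))

# 3 nodes, 2 labels, Potts smoothness: the data strongly prefers 0-1-0,
# so the optimum pays two smoothness penalties rather than fight the unaries.
unary = np.array([[0.0, 2.0], [2.0, 0.0], [0.0, 2.0]])
potts = 0.5 * (1 - np.eye(2))
labels, energy = chain_min_sum(unary, potts)
```

On trees and chains this recursion is exact; the thesis algorithms extend such message passing to loopy grids via iterated tree decompositions.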
For the second problem, to effectively fuse and finetune multiple CNNs, we present a transparent initialization module that identically maps the output of a multiple-layer module to its input at the early stage of finetuning. The pretrained model parameters then gradually diverge during training as the loss decreases. This transparent initialization has a higher initialization rate than Net2Net and a higher recovery rate than random initialization and Xavier initialization. Our experiments validate the effectiveness of the proposed transparent initialization and of the sparse encoder with sparse matrix operations. The edges of segmented objects achieve a higher performance ratio and a higher F-measure than with other comparable methods.
For the third problem, to compress a CNN effectively, especially resource-intensive 3D CNNs, we propose a single-shot neuron pruning method with resource constraints. The pruning principle is to remove neurons with low importance, corresponding to small connection sensitivities. A reweighting strategy based on the layerwise consumption of memory or FLOPs improves the pruning ability by avoiding infeasible pruning of entire layers. Our experiments on the point cloud dataset ShapeNet and the medical image dataset BraTS'18 prove the effectiveness of our method. Applying our method to video classification on the UCF101 dataset using MobileNetV2 and I3D further strengthens its benefits.
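The resource-constrained selection idea can be sketched as a greedy importance-per-cost pick that never empties a layer. The scoring and cost model below are simplified assumptions for illustration, not the thesis implementation:

```python
import numpy as np

def select_neurons(saliency, neuron_cost, budget):
    """Keep high-saliency neurons under a total resource budget.

    saliency:    list of 1D arrays, one importance score per neuron
    neuron_cost: per-layer cost (e.g., FLOPs) of keeping one neuron
    budget:      total resource budget
    Returns a set of kept neuron indices per layer.
    """
    # Reserve the top neuron of every layer so no layer is pruned entirely
    # (the infeasible whole-layer pruning a reweighting strategy must avoid).
    keep = [{int(np.argmax(s))} for s in saliency]
    used = float(sum(neuron_cost))
    # Rank the remaining neurons by importance per unit cost.
    candidates = []
    for layer, (sal, cost) in enumerate(zip(saliency, neuron_cost)):
        for idx, s in enumerate(sal):
            if idx not in keep[layer]:
                candidates.append((s / cost, layer, idx, cost))
    candidates.sort(reverse=True)
    for ratio, layer, idx, cost in candidates:
        if used + cost <= budget:
            keep[layer].add(idx)
            used += cost
    return keep

# Layer 0 has cheap neurons (cost 1), layer 1 expensive ones (cost 2).
kept = select_neurons([np.array([5.0, 1.0, 3.0]), np.array([2.0, 4.0])],
                      neuron_cost=[1.0, 2.0], budget=6.0)
```

Dividing saliency by per-neuron cost is a crude stand-in for the layerwise reweighting: expensive layers must earn their keep per FLOP.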
Learning to Enhance RGB and Depth Images with Guidance
Image enhancement improves the visual quality of the input image to better identify key features and make it more suitable for other vision applications. Structure degradation remains a challenging problem in image enhancement, which refers to blurry edges or discontinuous structures due to unbalanced or inconsistent intensity transitions on structural regions. To overcome this issue, it is popular to make use of a guidance image to provide additional structural cues. In this thesis, we focus on two image enhancement tasks, i.e., RGB image smoothing and depth image completion. Through the two research problems, we aim to have a better understanding of what constitutes suitable guidance and how its proper use can benefit the reduction of structure degradation in image enhancement.
Image smoothing retains salient structures and removes insignificant textures in an image. Structure degradation results from the difficulty of distinguishing structures from textures using low-level cues. Structures may be inevitably blurred if the filter tries to remove strong textures that have high contrast. Moreover, these strong textures may also be mistakenly retained as structures. We address this issue by applying two forms of guidance, for structures and textures respectively. We first design a kernel-based double-guided filter (DGF), where we adopt semantic edge detection as structure guidance and texture decomposition as texture guidance. The DGF is the first kernel filter that simultaneously leverages structure guidance and texture guidance to be both "structure-aware" and "texture-aware". Considering that textures present high randomness and variations in spatial distribution and intensity, localizing and identifying textures with hand-crafted features is not robust. Hence, we take advantage of deep learning for richer feature extraction and better generalization. Specifically, we generate synthetic data by blending natural textures with clean structure-only images. With this data, we build a texture prediction network (TPN) that estimates the location and magnitude of textures. We then combine the texture prediction results from the TPN with a semantic structure prediction network, so that the final texture and structure aware filtering network (TSAFN) is able to distinguish structures and textures more effectively. Our model achieves superior smoothing results compared to existing filters.
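The core guidance idea behind kernel filters like the DGF can be illustrated with a plain joint bilateral filter, where the range weights come from the guidance signal instead of the noisy input, so smoothing stops at guidance edges. This 1D sketch with assumed parameters is illustrative only, not the thesis filter:

```python
import numpy as np

def joint_bilateral_1d(signal, guide, sigma_s=2.0, sigma_r=0.1, radius=3):
    """1D joint bilateral filter: spatial weights from pixel distance,
    range weights from the *guidance* signal."""
    out = np.empty_like(signal, dtype=float)
    n = len(signal)
    for i in range(n):
        lo, hi = max(0, i - radius), min(n, i + radius + 1)
        j = np.arange(lo, hi)
        # Spatial Gaussian times range Gaussian measured on the guide.
        w = np.exp(-((j - i) ** 2) / (2 * sigma_s ** 2)
                   - ((guide[j] - guide[i]) ** 2) / (2 * sigma_r ** 2))
        out[i] = np.sum(w * signal[j]) / np.sum(w)
    return out

# A noisy step whose true edge location is known from a clean guidance signal:
# texture-like wiggles are averaged away, but the step itself is preserved.
guide = np.array([0.0] * 8 + [1.0] * 8)
noisy = guide + 0.05 * np.sin(np.arange(16))
smoothed = joint_bilateral_1d(noisy, guide)
```

Because the range term is evaluated on the guide, pixels across the guidance edge get near-zero weight, which is exactly the structure-preserving behaviour the guided filters above generalise with two guidance signals.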
Depth completion recovers dense depth from sparse measurements, e.g., LiDAR. Existing depth-only methods use sparse depth as the only input and suffer from structure degradation, i.e., failing to recover semantically consistent boundaries or small/thin objects due to (1) the sparse nature of depth points and (2) the lack of images to provide structural cues. In the thesis, we deal with the structure degradation issue by using RGB image guidance in both supervised and unsupervised depth-only settings. For the supervised model, the unique design is that it simultaneously outputs a reconstructed image and a dense depth map. Specifically, we treat image reconstruction from sparse depth as an auxiliary task during training that is supervised by the image. For the unsupervised model, we regard dense depth as a reconstructed result of the sparse input, and formulate our model as an auto-encoder. To reduce structure degradation, we employ the image to guide latent features by penalizing their difference during training. The image guidance loss in both models enables them to acquire more dense and structural cues that are beneficial for producing more accurate and consistent depth values. At inference time, the two models only take sparse depth as input and no image is required. On the KITTI Depth Completion Benchmark, we validate the effectiveness of the proposed image guidance through extensive experiments and achieve competitive performance over state-of-the-art supervised and unsupervised methods. Our approach is also applicable to indoor scenes.
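The training objective described above can be sketched as a masked reconstruction term on the valid sparse-depth points plus a latent-feature guidance term. The shapes, the weighting `lam`, and the flat feature vectors here are assumptions for illustration; the thesis models are deep networks:

```python
import numpy as np

def training_loss(pred_depth, sparse_depth, valid_mask, latent, image_feat,
                  lam=0.1):
    """Masked depth reconstruction + image-guidance penalty on latent features
    (a schematic stand-in for the auto-encoder formulation)."""
    # Supervise depth only where sparse measurements exist.
    recon = (np.sum(valid_mask * (pred_depth - sparse_depth) ** 2)
             / np.sum(valid_mask))
    # Pull the auto-encoder's latent code toward image-derived features.
    guide = np.mean((latent - image_feat) ** 2)
    return recon + lam * guide

pred = np.array([[1.0, 2.0], [3.0, 4.0]])
sparse = np.array([[1.0, 0.0], [3.0, 0.0]])   # zeros = no measurement
mask = np.array([[1.0, 0.0], [1.0, 0.0]])
loss = training_loss(pred, sparse, mask,
                     latent=np.ones(4), image_feat=np.zeros(4))
```

Since the guidance term only touches latent features, the image branch can be dropped at inference, matching the sparse-depth-only test-time setting described above.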
A review of image and video colorization: From analogies to deep learning
Image colorization is a classic and important topic in computer graphics, where the aim is to add color to a monochromatic input image to produce a colorful result. In this survey, we present the history of colorization research in chronological order and summarize popular algorithms in the field. Early works on colorization mostly focused on developing techniques to improve colorization quality. In the last few years, researchers have considered more possibilities, such as combining colorization with NLP (natural language processing), and have focused more on industrial applications. To better control the color, various types of color control have been designed, such as providing reference images or color scribbles. We have created a taxonomy of colorization methods according to the input type, divided into grayscale, sketch-based and hybrid. The pros and cons are discussed for each algorithm, and the algorithms are compared according to their main characteristics. Finally, we discuss how deep learning, and in particular Generative Adversarial Networks (GANs), has changed this field.
Computational Analysis of Fundus Images: Rule-Based and Scale-Space Models
Fundus images are one of the most important imaging examinations in modern ophthalmology because they are simple, inexpensive and, above all, noninvasive. Nowadays, the acquisition and storage of high-resolution fundus images is relatively easy and fast. Therefore, fundus imaging has become a fundamental investigation in retinal lesion detection, ocular health monitoring and screening programmes. Given the large volume and clinical complexity associated with these images, their analysis and interpretation by trained clinicians becomes a time-consuming task and is prone to human error. Therefore, there is a growing interest in developing automated approaches that are affordable and have high sensitivity and specificity. These automated approaches need to be robust if they are to be used in the general population to diagnose and track retinal diseases. To be effective, the automated systems must be able to recognize normal structures and distinguish them from pathological clinical manifestations.
The main objective of the research leading to this thesis was to develop automated systems capable of recognizing and segmenting retinal anatomical structures and the retinal pathological clinical manifestations associated with the most common retinal diseases. In particular, these automated algorithms were developed on the premise of robustness and efficiency to deal with the difficulties and complexity inherent in these images. Four objectives were considered in the analysis of fundus images: segmentation of exudates; localization of the optic disc; detection of the midline of blood vessels and segmentation of the vascular network; and detection of microaneurysms. In addition, we also evaluated the detection of diabetic retinopathy on fundus images using the microaneurysm detection method. An overview of the state of the art is presented to compare the performance of the developed approaches with the main methods described in the literature for each of the previously described objectives. To facilitate the comparison of methods, the state of the art has been divided into rule-based methods and machine learning-based methods.
In the research reported in this thesis, rule-based methods built on image processing were preferred over machine learning-based methods. In particular, scale-space methods proved to be effective in achieving the set goals.
Two different approaches to exudate segmentation were developed. The first approach is based on scale-space curvature in combination with the local maximum of a scale-space blob detector and dynamic thresholds. The second approach is based on the analysis of the distribution function of the maximum values of the noise map in combination with morphological operators and adaptive thresholds. Both approaches perform a correct segmentation of the exudates and cope well with the uneven illumination and contrast variations in fundus images.
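The scale-space blob machinery used in the first exudate approach can be illustrated with a generic scale-normalised Laplacian-of-Gaussian detector: a bright blob gives its strongest response at the scale matching its size. The scales, threshold, and synthetic image below are illustrative assumptions, not the thesis pipeline:

```python
import numpy as np
from scipy.ndimage import gaussian_laplace

def scale_space_blobs(image, sigmas, threshold):
    """Detect bright blobs via the scale-normalised LoG.

    Negating -s^2 * LoG turns bright blobs into positive peaks; each
    above-threshold pixel is reported with its best-responding scale.
    """
    # (S, H, W) stack of scale-normalised responses
    stack = np.stack([-(s ** 2) * gaussian_laplace(image, s) for s in sigmas])
    s_best = stack.argmax(axis=0)
    resp = stack.max(axis=0)
    blobs = []
    ys, xs = np.where(resp > threshold)
    for y, x in zip(ys, xs):
        blobs.append((y, x, sigmas[s_best[y, x]], resp[y, x]))
    return blobs

# A single Gaussian bright spot of std 3 px at the image centre.
yy, xx = np.mgrid[0:33, 0:33]
img = np.exp(-((yy - 16) ** 2 + (xx - 16) ** 2) / (2 * 3.0 ** 2))
blobs = scale_space_blobs(img, [1.0, 2.0, 3.0, 4.0], 0.3)
best = max(blobs, key=lambda b: b[3])
```

The strongest response lands at the blob centre with the matching scale, which is how scale selection recovers lesion size as well as position.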
Optic disc localization was achieved using a new technique called cumulative sum fields, combined with a vascular enhancement method. The algorithm proved to be reliable and efficient, especially for pathological images. The robustness of the method was tested on 8 datasets.
The detection of the midline of the blood vessels was achieved using a modified corner detector in combination with binary filters and dynamic thresholding. Segmentation of the vascular network was achieved using a new scale-space blood vessel enhancement method. The developed methods have proven effective in detecting the midline of blood vessels and segmenting vascular networks.
The microaneurysm detection method relies on a scale-space microaneurysm detection and labelling system. A new approach based on the neighbourhood of the microaneurysms was used for labelling. Microaneurysm detection enabled the assessment of diabetic retinopathy detection. The microaneurysm detection method proved to be competitive with other methods, especially on high-resolution images. Diabetic retinopathy detection with the developed microaneurysm detection method showed similar performance to other methods and to human experts.
The results of this work show that it is possible to develop reliable and robust scale-space methods that can detect various anatomical structures and pathological features of the retina. Furthermore, the results obtained in this work show that although recent research has focused on machine learning methods, scale-space methods can achieve very competitive results and typically have greater independence from image acquisition. The methods developed in this work may also be relevant for the future definition of new descriptors and features that can significantly improve the results of automated methods.