Your fellows matter: Affect analysis across subjects in group videos
Automatic affect analysis has become a well-established
research area in the last two decades. Recent works have
started moving from individual to group scenarios. However,
little attention has been paid to investigating how individuals in
a group influence the affective states of each other. In this paper,
we propose a novel framework for cross-subjects affect analysis
in group videos. Specifically, we analyze the correlation of the
affect among group members and investigate the automatic
recognition of the affect of one subject using the behaviours
expressed by another subject in the same group. A set of
experiments is conducted using a recently collected database
aimed at affect analysis in group settings. Our results show
that (1) people in the same group do share more information
in terms of behaviours and emotions than people in different
groups; and (2) the affect of one subject in a group can be
better predicted using the expressive behaviours of another
subject within the same group than using that of a subject
from a different group. This work is of great importance for
affect recognition in group settings: when the information of
one subject is unavailable due to occlusion, head/body pose,
etc., we can predict his/her affect by employing the expressive
behaviours of the other subject(s).
European Union's Horizon 202
Medical Image Registration Using Artificial Neural Network
Image registration is the transformation of different sets of images into one coordinate system in order to align and overlay multiple images. It is used in many fields, such as medical imaging, remote sensing, and computer vision, and is especially important in medical research, where multiple images are acquired from different sensors at various points in time. This allows doctors to monitor the effects of treatments on patients in a certain region of interest over time. In this thesis, artificial neural networks with curvelet keypoints are used to estimate the registration parameters. Simulations show that the curvelet keypoints provide more accurate results for rotation and scale parameter estimation than Discrete Cosine Transform (DCT) coefficients or Scale Invariant Feature Transform (SIFT) keypoints.
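The thesis's network and curvelet features are not reproduced here, but the underlying estimation task can be illustrated with a hypothetical least-squares baseline: given matched keypoints related by a rotation and a scale (the two parameters the simulations above evaluate), recover both in closed form. The function name and the no-translation assumption are illustrative, not the author's method.

```python
import numpy as np

def estimate_rotation_scale(src, dst):
    """Estimate the rotation (radians) and scale relating matched 2-D
    keypoints via a least-squares similarity fit (no translation).

    Model: dst = s * R(theta) @ src, linear in a = s*cos(theta),
    b = s*sin(theta), so it reduces to one lstsq solve.
    """
    x, y = src[:, 0], src[:, 1]
    u, v = dst[:, 0], dst[:, 1]
    # u = a*x - b*y  and  v = b*x + a*y, stacked into one system.
    A = np.concatenate([np.stack([x, -y], axis=1),
                        np.stack([y, x], axis=1)])
    rhs = np.concatenate([u, v])
    (a, b), *_ = np.linalg.lstsq(A, rhs, rcond=None)
    return np.arctan2(b, a), np.hypot(a, b)
```

With noise-free correspondences the recovery is exact; with noisy keypoints the least-squares fit averages the error over all matches.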
Roadmap on holography
From its inception, holography has proven an extremely productive and attractive area of research. While specific technical applications give rise to 'hot topics', and three-dimensional (3D) visualisation comes in and out of fashion, the core principles involved continue to lead to exciting innovations in a wide range of areas. We humbly submit that it is impossible, in any journal document of this type, to fully reflect current and potential activity; however, our valiant contributors have produced a series of documents that go no small way to neatly capture progress across a wide range of core activities. As editors we have attempted to spread our net wide in order to illustrate the breadth of international activity. In relation to this we believe we have been at least partially successful. This work was supported by Ministerio de Economía, Industria y Competitividad (Spain) under projects FIS2017-82919-R (MINECO/AEI/FEDER, UE) and FIS2015-66570-P (MINECO/FEDER), and by Generalitat Valenciana (Spain) under project PROMETEO II/2015/015.
Geometric and photometric affine invariant image registration
This thesis aims to present a solution to the correspondence problem for the registration
of wide-baseline images taken from uncalibrated cameras. We propose an affine
invariant descriptor that combines the geometry and photometry of the scene to find
correspondences between both views. The geometric affine invariant component of the
descriptor is based on the affine arc-length metric, whereas the photometry is analysed
by invariant colour moments. A graph structure represents the spatial distribution of the
primitive features; i.e. nodes correspond to detected high-curvature points, whereas arcs
represent connectivities by extracted contours. After matching, we refine the search for
correspondences by using a maximum likelihood robust algorithm. We have evaluated
the system over synthetic and real data. The method is, however, susceptible to the
propagation of errors introduced by approximations in the system.
BAE Systems; Selex Sensors and Airborne System
Robust visual speech recognition using optical flow analysis and rotation invariant features
The focus of this thesis is to develop computer vision algorithms for a visual speech recognition system that identifies visemes. The majority of existing speech recognition systems are based on audio-visual signals, have been developed for speech enhancement, and are prone to acoustic noise. Considering this problem, the aim of this research is to investigate and develop a visual-only speech recognition system suitable for noisy environments. Potential applications of such a system include lip-reading mobile phones, human computer interfaces (HCI) for mobility-impaired users, robotics, surveillance, improvement of speech-based computer control in noisy environments, and rehabilitation of persons who have undergone laryngectomy surgery. In the literature, several models and algorithms are available for visual feature extraction. These features are extracted from static mouth images and characterized as appearance- and shape-based features. However, these methods rarely incorporate the time-dependent information of mouth dynamics. This dissertation presents two optical flow based approaches to visual feature extraction, which capture the mouth motions in an image sequence. The motivation for using motion features is that human perception of lip-reading relies on the temporal dynamics of mouth motion. The first approach is based on extracting features from the vertical component of the optical flow. The vertical component is decomposed into multiple non-overlapping fixed-scale blocks, and statistical features of each block are computed for successive video frames of an utterance. To overcome the issue of large variation in the speed of speech, each utterance is normalized using a simple linear interpolation method.
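As a rough illustration of the first approach, the block decomposition of the vertical flow component might be sketched as follows; the block size, the choice of mean and variance as the per-block statistics, and the function name are assumptions for illustration, not the thesis's exact feature set.

```python
import numpy as np

def block_flow_features(v_flow, block=8):
    """Per-block mean and variance of the vertical optical-flow
    component over non-overlapping fixed-size blocks (hypothetical
    feature layout, one flattened vector per frame pair)."""
    h, w = v_flow.shape
    h, w = h - h % block, w - w % block          # crop to a block multiple
    blocks = (v_flow[:h, :w]
              .reshape(h // block, block, w // block, block)
              .swapaxes(1, 2))                   # (by, bx, block, block)
    feats = np.stack([blocks.mean(axis=(2, 3)),  # one (mean, var) pair
                      blocks.var(axis=(2, 3))],  # per spatial block
                     axis=-1)
    return feats.reshape(-1)
```

Concatenating these vectors over the frames of a (length-normalized) utterance would yield the kind of fixed-dimensional representation a classifier can consume.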
In the second approach, four directional motion history images (DMHIs) based on optical flow are developed, each representing the consolidated motion information of an utterance in one of four directions (i.e., up, down, left and right). This approach is an evolution of a view-based approach known as the motion history image (MHI). One of the main issues with the MHI method is its motion overwriting problem caused by self-occlusion; DMHIs appear to solve this overwriting issue. Two types of image descriptors, Zernike moments and Hu moments, are used to represent each image of the DMHIs. A support vector machine (SVM) classifier was used to classify the features obtained from the optical flow vertical component, and from the Zernike and Hu moments, separately. For identification of visemes, a multiclass SVM approach was employed. A video speech corpus of seven subjects was used to evaluate the efficiency of the proposed methods for lip-reading. The experimental results demonstrate the promising performance of the optical flow based mouth movement representations. A performance comparison between DMHI and MHI based on Zernike moments shows that the DMHI technique outperforms the MHI technique. A video-based ad hoc temporal segmentation method is also proposed in the thesis for isolated utterances. It is used to detect the start and end frames of an utterance in an image sequence, based on a pair-wise pixel comparison method. The efficiency of the proposed technique was tested on the available data set, with short pauses between each utterance.
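The Hu moments used to describe each DMHI are standard rotation-invariant statistics computable from normalised central moments; a minimal NumPy sketch of the first two (of the seven), rather than whatever toolchain the thesis used:

```python
import numpy as np

def hu_moments(img):
    """First two Hu invariant moments of a grayscale image (sketch)."""
    h, w = img.shape
    y, x = np.mgrid[:h, :w].astype(float)
    m00 = img.sum()
    xc, yc = (x * img).sum() / m00, (y * img).sum() / m00

    def mu(p, q):      # central moment of order (p, q)
        return (((x - xc) ** p) * ((y - yc) ** q) * img).sum()

    def eta(p, q):     # scale-normalised central moment
        return mu(p, q) / m00 ** (1 + (p + q) / 2)

    h1 = eta(2, 0) + eta(0, 2)
    h2 = (eta(2, 0) - eta(0, 2)) ** 2 + 4 * eta(1, 1) ** 2
    return np.array([h1, h2])
```

Because the invariants are built from normalised central moments, they are unchanged by translation, scaling, and rotation of the image, which is exactly why they suit a representation of consolidated motion templates.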
Pattern mining approaches used in sensor-based biometric recognition: a review
Sensing technologies have generated significant interest in the use of biometrics for the recognition and assessment of individuals. Pattern mining techniques have become a critical step in the progress of sensor-based biometric systems that are capable of perceiving, recognizing and computing sensor data: they search for high-level pattern information in low-level sensor readings in order to construct an artificial substitute for human recognition. The design of a successful sensor-based biometric recognition system needs to pay attention to the different issues involved in processing variable data: acquisition of biometric data from a sensor, data pre-processing, feature extraction, recognition and/or classification, clustering, and validation. A significant number of approaches from image processing, pattern recognition and machine learning have been used to process sensor data. This paper aims to deliver a state-of-the-art summary and present strategies for utilizing the broadly used pattern mining methods, in order to identify the challenges as well as future research directions of sensor-based biometric systems.
3D Deep Learning on Medical Images: A Review
The rapid advancements in machine learning and graphics processing technologies,
and the availability of medical imaging data, have led to a rapid increase in the
use of deep learning models in the medical domain. This was accelerated by the
rapid advancements in convolutional neural network (CNN) based architectures,
which were adopted by the medical imaging community to assist clinicians in
disease diagnosis. Since the grand success of AlexNet in 2012, CNNs have been
increasingly used in medical image analysis to improve the efficiency of human
clinicians. In recent years, three-dimensional (3D) CNNs have been employed for
the analysis of medical images. In this paper, we trace the history of how the 3D
CNN was developed from its machine learning roots, give a brief mathematical
description of 3D CNNs, and describe the preprocessing steps required for medical
images before feeding them to 3D CNNs. We review the significant research in the
field of 3D medical imaging analysis using 3D CNNs (and their variants) on
different tasks such as classification, segmentation, detection, and
localization. We conclude by discussing the challenges associated with the use
of 3D CNNs in the medical imaging domain (and of deep learning models in
general) and possible future trends in the field.
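For readers new to the topic, the core operation inside a 3D CNN is a three-dimensional sliding-window weighted sum; a naive single-channel, valid-padding NumPy sketch (no framework, purely illustrative):

```python
import numpy as np

def conv3d(volume, kernel):
    """Valid-mode 3-D convolution (really cross-correlation, as in most
    deep-learning frameworks) of a volume with a single kernel."""
    D, H, W = volume.shape
    d, h, w = kernel.shape
    out = np.empty((D - d + 1, H - h + 1, W - w + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            for k in range(out.shape[2]):
                # Weighted sum over one d*h*w window of the volume.
                out[i, j, k] = (volume[i:i+d, j:j+h, k:k+w] * kernel).sum()
    return out
```

Real 3D CNN layers add input/output channels, strides, padding and learned kernels, but the memory cost visible here (a full window per output voxel) is what makes 3D architectures so much heavier than their 2D counterparts on medical volumes.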
Signal processing algorithms for enhanced image fusion performance and assessment
The dissertation presents several signal processing algorithms for image fusion in noisy multimodal
conditions. It introduces a novel image fusion method which performs well for image
sets heavily corrupted by noise. As opposed to current image fusion schemes, the method
requires no a priori knowledge of the noise component. The image is decomposed with
Chebyshev polynomials (CP) used as basis functions to perform fusion at the feature level. The
properties of CP, namely fast convergence and smooth approximation, render it ideal for heuristic
and indiscriminate denoising fusion tasks. Quantitative evaluation using objective fusion assessment
methods shows favourable performance of the proposed scheme compared to previous efforts
on image fusion, notably in heavily corrupted images.
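The smoothing behaviour that makes Chebyshev approximation attractive for denoising can be illustrated in one dimension; the degree and signal below are arbitrary illustrative choices, not the dissertation's settings:

```python
import numpy as np

def chebyshev_denoise(signal, degree=8):
    """Approximate a noisy 1-D signal with a low-degree Chebyshev
    series; the smooth reconstruction discards high-frequency noise."""
    x = np.linspace(-1.0, 1.0, signal.size)
    series = np.polynomial.chebyshev.Chebyshev.fit(x, signal, degree)
    return series(x)
```

Because a degree-8 series has only nine coefficients, projecting 200 noisy samples onto it averages the noise away while the smooth underlying structure survives; the 2-D analogue of this projection is what the fusion scheme exploits at feature level.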
The approach is further improved by combining the advantages of CP with a state-of-the-art
fusion technique named independent component analysis (ICA), for joint-fusion processing
based on region saliency. Whilst CP fusion is robust under severe noise conditions, it is prone to
eliminating high-frequency information from the images involved, thereby limiting image sharpness.
Fusion using ICA, on the other hand, performs well in transferring edges and other salient features
of the input images into the composite output. The combination of both methods, coupled with
several mathematical morphological operations in an algorithm fusion framework, is considered a
viable solution. Again, according to the quantitative metrics, the results of our proposed approach
are very encouraging as far as joint fusion and denoising are concerned.
Another focus of this dissertation is a novel metric for image fusion evaluation that is based
on texture. The conservation of background textural details is considered important in many fusion
applications, as such details help define the image depth and structure, which may prove crucial in
surveillance and remote sensing applications. Our work aims to evaluate the performance of image fusion algorithms based on their ability to retain textural details through the fusion process.
This is done by utilising the gray-level co-occurrence matrix (GLCM) model to extract second-order
statistical features for the derivation of an image textural measure, which is then used to
replace the edge-based calculations in an objective fusion metric. Performance evaluation
on established fusion methods verifies that the proposed metric is viable, especially for multimodal
scenarios.
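As an illustration of the kind of second-order GLCM statistic such a metric builds on, here is a minimal sketch computing the classic 'contrast' feature for a single horizontal pixel offset; the quantisation level and offset are illustrative choices, and this is not the proposed metric itself:

```python
import numpy as np

def glcm_contrast(img, levels=8):
    """GLCM for a horizontal offset of one pixel, and the 'contrast'
    second-order texture statistic computed from it (img in [0, 1])."""
    q = np.clip((img * levels).astype(int), 0, levels - 1)  # quantise
    # Co-occurrence counts of horizontally adjacent gray-level pairs.
    pairs = np.stack([q[:, :-1].ravel(), q[:, 1:].ravel()])
    glcm = np.zeros((levels, levels))
    np.add.at(glcm, (pairs[0], pairs[1]), 1)
    glcm /= glcm.sum()                                      # to probabilities
    i, j = np.indices(glcm.shape)
    return ((i - j) ** 2 * glcm).sum()
```

A flat region scores zero while strongly alternating texture scores high, which is why statistics of this family can stand in for edge-based terms when judging how much textural detail a fusion result retains.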