373 research outputs found
Quaternion Orthogonal Transformer for Facial Expression Recognition in the Wild
Facial expression recognition (FER) is a challenging topic in artificial
intelligence. Recently, many researchers have attempted to introduce Vision
Transformer (ViT) to the FER task. However, ViT cannot fully utilize emotional
features extracted from raw images and requires a lot of computing resources.
To overcome these problems, we propose a quaternion orthogonal transformer
(QOT) for FER. Firstly, to reduce redundancy among features extracted from
pre-trained ResNet-50, we use the orthogonal loss to decompose and compact
these features into three sets of orthogonal sub-features. Secondly, three
orthogonal sub-features are integrated into a quaternion matrix, which
maintains the correlations between different orthogonal components. Finally, we
develop a quaternion vision transformer (Q-ViT) for feature classification. The
Q-ViT adopts quaternion operations instead of the original operations in ViT,
which improves the final accuracies with fewer parameters. Experimental results
on three in-the-wild FER datasets show that the proposed QOT outperforms
several state-of-the-art models and reduces the computations.Comment: This paper has been accepted to ICASSP202
HOG active appearance models
We propose the combination of dense Histogram of Oriented Gradients (HOG) features with Active Appearance Models (AAMs). We employ the efficient Inverse Compositional optimization technique and show results for the task of face fitting. By taking advantage of the descriptive characteristics of HOG features, we build robust and accurate AAMs that generalize well to unseen faces with illumination, identity, pose and occlusion variations. Our experiments on challenging in-the-wild databases show that HOG AAMs significantly outperform current state-of-the-art results of discriminative methods trained on larger databases
Introduction to Facial Micro Expressions Analysis Using Color and Depth Images: A Matlab Coding Approach (Second Edition, 2023)
The book attempts to introduce a gentle introduction to the field of Facial
Micro Expressions Recognition (FMER) using Color and Depth images, with the aid
of MATLAB programming environment. FMER is a subset of image processing and it
is a multidisciplinary topic to analysis. So, it requires familiarity with
other topics of Artifactual Intelligence (AI) such as machine learning, digital
image processing, psychology and more. So, it is a great opportunity to write a
book which covers all of these topics for beginner to professional readers in
the field of AI and even without having background of AI. Our goal is to
provide a standalone introduction in the field of MFER analysis in the form of
theorical descriptions for readers with no background in image processing with
reproducible Matlab practical examples. Also, we describe any basic definitions
for FMER analysis and MATLAB library which is used in the text, that helps
final reader to apply the experiments in the real-world applications. We
believe that this book is suitable for students, researchers, and professionals
alike, who need to develop practical skills, along with a basic understanding
of the field. We expect that, after reading this book, the reader feels
comfortable with different key stages such as color and depth image processing,
color and depth image representation, classification, machine learning, facial
micro-expressions recognition, feature extraction and dimensionality reduction.
The book attempts to introduce a gentle introduction to the field of Facial
Micro Expressions Recognition (FMER) using Color and Depth images, with the aid
of MATLAB programming environment.Comment: This is the second edition of the boo
Enhanced Augmented Reality Framework for Sports Entertainment Applications
Augmented Reality (AR) superimposes virtual information on real-world data, such as displaying useful information on videos/images of a scene. This dissertation presents an Enhanced AR (EAR) framework for displaying useful information on images of a sports game. The challenge in such applications is robust object detection and recognition. This is even more challenging when there is strong sunlight. We address the phenomenon where a captured image is degraded by strong sunlight.
The developed framework consists of an image enhancement technique to improve the accuracy of subsequent player and face detection. The image enhancement is followed by player detection, face detection, recognition of players, and display of personal information of players. First, an algorithm based on Multi-Scale Retinex (MSR) is proposed for image enhancement. For the tasks of player and face detection, we use adaptive boosting algorithm with Haar-like features for both feature selection and classification. The player face recognition algorithm uses adaptive boosting with the LDA for feature selection and nearest neighbor classifier for classification. The framework can be deployed in any sports where a viewer captures images. Display of players-specific information enhances the end-user experience. Detailed experiments are performed on 2096 diverse images captured using a digital camera and smartphone. The images contain players in different poses, expressions, and illuminations. Player face recognition module requires players faces to be frontal or up to ?350 of pose variation. The work demonstrates the great potential of computer vision based approaches for future development of AR applications.COMSATS Institute of Information Technolog
Human Attention Detection Using AM-FM Representations
Human activity detection from digital videos presents many challenges to the computer vision and image processing communities. Recently, many methods have been developed to detect human activities with varying degree of success. Yet, the general human activity detection problem remains very challenging, especially when the methods need to work “in the wild” (e.g., without having precise control over the imaging geometry). The thesis explores phase-based solutions for (i) detecting faces, (ii) back of the heads, (iii) joint detection of faces and back of the heads, and (iv) whether the head is looking to the left or the right, using standard video cameras without any control on the imaging geometry. The proposed phase-based approach is based on the development of simple and robust methods that relie on the use of Amplitude Modulation - Frequency Modulation (AM-FM) models. The approach is validated using video frames extracted from the Advancing Outof- school Learning in Mathematics and Engineering (AOLME) project. The dataset consisted of 13,265 images from ten students looking at the camera, and 6,122 images from five students looking away from the camera. For the students facing the camera, the method was able to correctly classify 97.1% of them looking to the left and 95.9% of them looking to the right. For the students facing the back of the camera, the method was able to correctly classify 87.6% of them looking to the left and 93.3% of them looking to the right. The results indicate that AM-FM based methods hold great promise for analyzing human activity videos
- …