Hierarchical binary CNNs for landmark localization with limited resources
Our goal is to design architectures that retain the groundbreaking performance of Convolutional Neural Networks (CNNs) for landmark localization and at the same time are lightweight, compact and suitable for applications with limited computational resources. To this end, we make the following contributions: (a) we are the first to study the effect of neural network binarization on localization tasks, namely human pose estimation and face alignment. We exhaustively evaluate various design choices, identify performance bottlenecks, and, more importantly, propose multiple orthogonal ways to boost performance. (b) Based on our analysis, we propose a novel hierarchical, parallel and multi-scale residual architecture that yields a large performance improvement over the standard bottleneck block while having the same number of parameters, thus bridging the gap between the original network and its binarized counterpart. (c) We perform a large number of ablation studies that shed light on the properties and the performance of the proposed block. (d) We present results for experiments on the most challenging datasets for human pose estimation and face alignment, reporting in many cases state-of-the-art performance. (e) We further provide additional results for the problem of facial part segmentation. Code can be downloaded from https://www.adrianbulat.com/binary-cnn-landmark
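To make "binarization" concrete, the minimal Python/NumPy sketch below approximates a real-valued dot product with one computed on signs only. It assumes an XNOR-Net-style scheme with a single scaling factor per tensor (its mean absolute value); the function names are hypothetical, and the exact binarization used in this work may differ (XNOR-Net itself, for example, scales activations per spatial location).

```python
import numpy as np


def binarize(x):
    """Map every entry to +1 or -1 (zeros are sent to +1 by convention here)."""
    return np.where(x >= 0, 1.0, -1.0)


def binary_approx_dot(w, a):
    """Approximate np.dot(w, a) using only the signs of w and a.

    Assumption: one scaling factor per tensor, equal to its mean absolute value.
    """
    alpha = np.mean(np.abs(w))  # weight scaling factor
    beta = np.mean(np.abs(a))   # activation scaling factor
    return alpha * beta * float(np.dot(binarize(w), binarize(a)))


rng = np.random.default_rng(0)
w = rng.standard_normal(512)  # stand-in for one filter's weights
a = rng.standard_normal(512)  # stand-in for one input patch
print("real-valued:", float(np.dot(w, a)))
print("binarized approximation:", binary_approx_dot(w, a))
```

For a single random pair the approximation is crude; the point is that once the signs are fixed, the inner product of the sign vectors needs no floating-point multiplications, which is where the savings in speed and memory come from.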
Binarized convolutional landmark localizers for human pose estimation and face alignment with limited resources
Our goal is to design architectures that retain the groundbreaking performance of CNNs for landmark localization and at the same time are lightweight, compact and suitable for applications with limited computational resources. To this end, we make the following contributions: (a) we are the first to study the effect of neural network binarization on localization tasks, namely human pose estimation and face alignment. We exhaustively evaluate various design choices, identify performance bottlenecks, and, more importantly, propose multiple orthogonal ways to boost performance. (b) Based on our analysis, we propose a novel hierarchical, parallel and multi-scale residual architecture that yields a large performance improvement over the standard bottleneck block while having the same number of parameters, thus bridging the gap between the original network and its binarized counterpart. (c) We perform a large number of ablation studies that shed light on the properties and the performance of the proposed block. (d) We present results for experiments on the most challenging datasets for human pose estimation and face alignment, reporting in many cases state-of-the-art performance. Code can be downloaded from https://www.adrianbulat.com/binary-cnn-landmark
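The phrase "hierarchical, parallel and multi-scale" can be illustrated by bookkeeping alone. The sketch below assumes a block that chains stride-1 3x3 convolutions, each consuming the previous convolution's output, and concatenates all intermediate outputs; the channel widths (128, 64, 64 for a 256-channel input) are hypothetical and not taken from the paper.

```python
def hierarchical_block_layout(in_channels, widths, kernel=3):
    """Describe a chain of stride-1 convolutions whose outputs are concatenated.

    Each convolution is applied to the previous one's output, so its
    receptive field grows by (kernel - 1) per step; concatenating all of
    them mixes several scales in one output tensor.
    """
    branches = []
    receptive_field = 1
    for out_channels in widths:
        receptive_field += kernel - 1
        branches.append({"out_channels": out_channels,
                         "receptive_field": receptive_field})
    out_total = sum(widths)
    return {
        "branches": branches,
        "out_channels": out_total,
        # An identity skip connection needs matching channel counts.
        "residual_ok": out_total == in_channels,
    }


layout = hierarchical_block_layout(256, [128, 64, 64])  # hypothetical widths
for b in layout["branches"]:
    print(b)
print("concatenated channels:", layout["out_channels"],
      "identity residual possible:", layout["residual_ok"])
```

With three chained 3x3 convolutions, the concatenated output mixes receptive fields of 3x3, 5x5 and 7x7 pixels, whereas in a standard bottleneck block only the final convolution's output is passed on.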
The intersection of video capsule endoscopy and artificial intelligence: addressing unique challenges using machine learning
Introduction: Technical burdens and time-intensive review processes limit the practical utility of video capsule endoscopy (VCE). Artificial intelligence (AI) is poised to address these limitations, but the intersection of AI and VCE reveals challenges that must first be overcome. We identified five challenges to address. Challenge #1: VCE data are stochastic and contain significant artifacts. Challenge #2: VCE interpretation is cost-intensive. Challenge #3: VCE data are inherently imbalanced. Challenge #4: Existing VCE AIMLT are computationally cumbersome. Challenge #5: Clinicians are hesitant to accept AIMLT that cannot explain their process.
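Challenge #3 (class imbalance) is often mitigated by weighting the training loss. As one common option, not necessarily what the authors used, the sketch below computes "balanced" inverse-frequency weights, n_samples / (n_classes * count), which is the formula scikit-learn applies for `class_weight="balanced"`. The frame labels are hypothetical.

```python
from collections import Counter


def inverse_frequency_weights(labels):
    """Return {class: n_samples / (n_classes * class_count)}.

    Rare classes receive large weights, so that every class contributes
    the same total weight (n_samples / n_classes) to the loss.
    """
    counts = Counter(labels)
    n_samples, n_classes = len(labels), len(counts)
    return {c: n_samples / (n_classes * k) for c, k in counts.items()}


# Hypothetical frame labels: abnormal frames are rare.
labels = ["normal"] * 8 + ["abnormal"] * 2
print(inverse_frequency_weights(labels))  # → {'normal': 0.625, 'abnormal': 2.5}
```

Multiplying each sample's loss by its class weight makes the two classes contribute equally (5.0 each here) even though one is four times as frequent.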
Methods: An anatomic landmark detection model was used to test the application of convolutional neural networks (CNNs) to the task of classifying VCE data. We also created a tool that assists in expert annotation of VCE data. We then created more elaborate models using different approaches, including a multi-frame approach, a CNN based on graph representation, and a few-shot approach based on meta-learning.
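The abstract does not say which meta-learning method the few-shot model uses. As an illustration only, the sketch below shows the classification step of prototypical networks, a widely used few-shot approach: each class is represented by the mean embedding of its few labeled "support" examples, and a query is assigned to the nearest prototype. The 2-D embeddings and class names are hypothetical stand-ins for CNN features of VCE frames.

```python
import numpy as np


def class_prototypes(support, support_labels):
    """Average the support embeddings of each class."""
    labels = np.asarray(support_labels)
    classes = sorted(set(support_labels))
    protos = np.stack([support[labels == c].mean(axis=0) for c in classes])
    return classes, protos


def nearest_prototype(queries, classes, protos):
    """Assign each query to the class with the closest prototype (squared L2)."""
    dists = ((queries[:, None, :] - protos[None, :, :]) ** 2).sum(axis=-1)
    return [classes[i] for i in dists.argmin(axis=1)]


support = np.array([[0.0, 0.0], [0.0, 1.0], [5.0, 5.0], [5.0, 6.0]])
classes, protos = class_prototypes(support, ["A", "A", "B", "B"])
print(nearest_prototype(np.array([[0.2, 0.1], [4.8, 5.9]]), classes, protos))  # → ['A', 'B']
```

In the full method the embedding network is trained episodically so that this nearest-prototype rule generalizes to new classes from a few examples; that training loop is omitted here.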
Results: When used on full-length VCE footage, CNNs accurately identified anatomic landmarks (99.1%), with gradient-weighted class activation mapping (Grad-CAM) showing the parts of each frame that the CNN used to make its decision. The graph CNN with weakly supervised learning (accuracy 89.9%, sensitivity 91.1%), the few-shot model (accuracy 90.8%, precision 91.4%, sensitivity 90.9%), and the multi-frame model (accuracy 97.5%, precision 91.5%, sensitivity 94.8%) performed well.
Discussion: Each of these five challenges is addressed, in part, by one of our AI-based models. We achieved our goal of producing high performance with lightweight models that aim to improve clinician confidence.
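For reference, the accuracy, precision and sensitivity figures reported above are derived from confusion-matrix counts as sketched below (binary case; multi-class results are often averaged over one-vs-rest splits). The counts in the example are hypothetical and are not the study's data.

```python
def binary_metrics(tp, fp, tn, fn):
    """Accuracy, precision and sensitivity (recall) from confusion-matrix counts."""
    total = tp + fp + tn + fn
    return {
        "accuracy": (tp + tn) / total,   # fraction of all predictions that are correct
        "precision": tp / (tp + fp),     # fraction of predicted positives that are real
        "sensitivity": tp / (tp + fn),   # fraction of real positives that are found
    }


# Hypothetical counts for illustration only.
print(binary_metrics(tp=90, fp=10, tn=85, fn=15))
```

Because precision and sensitivity divide by different totals, a model can score well on one and noticeably worse on the other, which is why both are reported alongside accuracy.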
3D Shape Descriptor-Based Facial Landmark Detection: A Machine Learning Approach
Facial landmark detection on 3D human faces has numerous applications in the literature, such as establishing point-to-point correspondence between 3D face models, which is itself a key step for a wide range of applications like 3D face detection and authentication, matching, reconstruction, and retrieval, to name a few.
Two groups of approaches, namely knowledge-driven and data-driven approaches, have been employed for facial landmarking in the literature. Knowledge-driven techniques are the traditional approaches that have been widely used to locate landmarks on human faces. In these approaches, a user with sufficient knowledge and experience usually defines the features to be extracted as landmarks. Data-driven techniques, on the other hand, take advantage of machine learning algorithms to detect prominent features on 3D face models. Despite their key advantages, each category of techniques has limitations that prevent it from generating the most reliable results.
In this work we propose to combine the strengths of the two approaches to detect facial landmarks more efficiently and precisely. The suggested approach consists of two phases. First, some salient features of the faces are extracted using expert systems. These points are then used as the initial control points in the well-known Thin Plate Spline (TPS) technique to deform the input face towards a reference face model. Second, by exploring and utilizing multiple machine learning algorithms, another group of landmarks is extracted. This data-driven landmark detection step is supervised: a set of local descriptors is computed on an information-rich training set and used to train the algorithm. We then use the detected landmarks to establish point-to-point correspondence between the 3D human faces, mainly using an improved version of the Iterative Closest Point (ICP) algorithm. Furthermore, we propose to use the detected landmarks for 3D face matching applications.
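The TPS deformation step can be sketched as follows. This is a generic 3D thin plate spline, not the authors' implementation: it assumes the 3D biharmonic kernel U(r) = r, an affine term, and exact interpolation of the control points (no regularization). The control points are random stand-ins for the expert-system landmarks, and the reference face model is not reproduced.

```python
import numpy as np


def tps_fit(src, dst):
    """Fit a 3D thin plate spline f with f(src[i]) = dst[i].

    src, dst: (n, 3) arrays of control points, n >= 4 and not all coplanar.
    Returns the kernel weights w (n, 3) and affine coefficients a (4, 3).
    """
    n = src.shape[0]
    K = np.linalg.norm(src[:, None, :] - src[None, :, :], axis=-1)  # U(r) = r
    P = np.hstack([np.ones((n, 1)), src])  # affine basis [1, x, y, z]
    A = np.zeros((n + 4, n + 4))
    A[:n, :n] = K
    A[:n, n:] = P
    A[n:, :n] = P.T
    b = np.zeros((n + 4, 3))
    b[:n] = dst
    sol = np.linalg.solve(A, b)
    return sol[:n], sol[n:]


def tps_apply(points, src, w, a):
    """Warp arbitrary (m, 3) points with a fitted spline."""
    U = np.linalg.norm(points[:, None, :] - src[None, :, :], axis=-1)
    P = np.hstack([np.ones((points.shape[0], 1)), points])
    return U @ w + P @ a


rng = np.random.default_rng(0)
src = rng.random((6, 3))                        # hypothetical input-face landmarks
dst = src + 0.05 * rng.standard_normal((6, 3))  # hypothetical reference-face landmarks
w, a = tps_fit(src, dst)
face_vertices = rng.random((100, 3))            # stand-in for the rest of the mesh
warped = tps_apply(face_vertices, src, w, a)
print(np.abs(tps_apply(src, src, w, a) - dst).max())  # ~0: control points map exactly
```

Every mesh vertex is moved by the same smooth function, so the whole input face deforms towards the reference while the control points land exactly on their targets; a purely affine target yields zero kernel weights.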
Deep learning for real world face alignment
Face alignment is one of the fundamental steps in a vast number of tasks of high economic and social value, ranging from security to health and entertainment. Despite more than two decades of attention from the community and the success of cascaded regression-based approaches, many challenges remained unsolved, such as near-profile poses and low-resolution faces.
In this thesis, we successfully address a series of such challenges in the area of face alignment and super-resolution, significantly pushing the state-of-the-art by proposing novel deep learning-based architectures specifically tailored for fine-grained recognition tasks. In summary, we address the following problems: (I) fitting faces found in large poses (Chapter 3), (II) in both 2D and 3D space (Chapter 4), creating in the process (III) the largest in-the-wild large-pose 3D face alignment dataset (Chapter 4). While the case of high-resolution faces was actively explored in the past, in this thesis we systematically study and address a new challenge: (IV) fitting landmarks in very low-resolution faces (Chapter 6). While deep learning-based approaches have achieved remarkable results on a wide variety of tasks, they are usually slow and have high computational requirements. As such, in Chapter 5, we propose (V) a novel residual block carefully crafted for binarized neural networks that significantly improves speed, due to the use of binary operations for both the weights and the activations, while maintaining similar or competitive accuracy.
The results presented throughout this thesis set the new state-of-the-art on both 2D & 3D face alignment and face super-resolution.
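The speed-up from using "binary operations for both the weights and the activations" comes from replacing multiply-accumulate with bitwise logic. A minimal sketch, assuming +1/-1 vectors packed into integer bitmasks (bit set means +1): matching signs contribute +1 and mismatches -1, so the dot product equals 2 * popcount(XNOR(a, b)) - n. The helper names are hypothetical, and Python integers stand in for the machine words a real kernel would use.

```python
def pack_signs(signs):
    """Pack a sequence of +1/-1 values into an int; bit i is set when signs[i] is +1."""
    mask = 0
    for i, s in enumerate(signs):
        if s > 0:
            mask |= 1 << i
    return mask


def xnor_popcount_dot(a_bits, b_bits, n):
    """Dot product of two length-n +1/-1 vectors given as packed bitmasks."""
    all_ones = (1 << n) - 1
    agree = ~(a_bits ^ b_bits) & all_ones  # XNOR: 1 where the signs match
    matches = bin(agree).count("1")         # popcount
    return 2 * matches - n                  # matches minus mismatches


a = [1, -1, 1, 1]
b = [1, 1, -1, 1]
print(xnor_popcount_dot(pack_signs(a), pack_signs(b), len(a)))  # → 0
print(sum(x * y for x, y in zip(a, b)))                          # → 0
```

On hardware, one XNOR and one popcount instruction process a whole word (for example 64 signs) at once, which is why binarized layers can be much faster than their floating-point counterparts.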