26 research outputs found
Efficient 3D Face Recognition with Gabor Patched Spectral Regression
In this paper, we utilize a novel framework for 3D face recognition, called 3D Gabor Patched Spectral Regression (3D GPSR), which can overcome some of the continuing challenges encountered with 2D or 3D facial images. In this active field, some obstacles, like expression variations, pose correction and data noise deteriorate the performance significantly. Our proposed system addresses these problems by first extracting the main facial area to remove irrelevant information corresponding to shoulders and necks. Pose correction is used to minimize the influence of large pose variations and then the normalized depth and gray images can be obtained. Due to better time-frequency characteristics and a distinctive biological background, the Gabor feature is extracted on depth images, known as 3D Gabor faces. Data noise is mainly caused by distorted meshes, varieties of subordinates and misalignment. To solve these problems, we introduce a Patched Spectral Regression strategy, which can make good use of the robustness and efficiency of accurate patched discriminant low-dimension features and minimize the effect of noise term. Computational analysis shows that spectral regression is much faster than the traditional approaches. Our experiments are based on the CASIA and FRGC 3D face databases which contain a huge number of challenging data. Experimental results show that our framework consistently outperforms the other existing methods with the distinctive characteristics of efficiency, robustness and generality
Evaluation of face recognition algorithms under noise
One of the major applications of computer vision and image processing is face recognition,
where a computerized algorithm automatically identifies a person’s face from
a large image dataset or even from a live video. This thesis addresses facial recognition,
a topic that has been widely studied due to its importance in many applications
in both civilian and military domains. The application of face recognition systems
has expanded from security purposes to social networking sites, managing fraud, and
improving user experience. Numerous algorithms have been designed to perform face
recognition with good accuracy. This problem is challenging due to the dynamic nature
of the human face and the different poses that it can take. Regardless of the
algorithm, facial recognition accuracy can be heavily affected by the presence of noise.
This thesis presents a comparison of traditional and deep learning face recognition
algorithms under the presence of noise. For this purpose, Gaussian and salt-andpepper
noises are applied to the face images drawn from the ORL Dataset. The
image recognition is performed using each of the following eight algorithms: principal
component analysis (PCA), two-dimensional PCA (2D-PCA), linear discriminant
analysis (LDA), independent component analysis (ICA), discrete cosine transform
(DCT), support vector machine (SVM), convolution neural network (CNN) and Alex
Net. The ORL dataset was used in the experiments to calculate the evaluation accuracy
for each of the investigated algorithms. Each algorithm is evaluated with two
experiments; in the first experiment only one image per person is used for training,
whereas in the second experiment, five images per person are used for training. The investigated traditional algorithms are implemented with MATLAB and the deep
learning algorithms approaches are implemented with Python. The results show that
the best performance was obtained using the DCT algorithm with 92% dominant
eigenvalues and 95.25 % accuracy, whereas for deep learning, the best performance
was using a CNN with accuracy of 97.95%, which makes it the best choice under noisy
conditions
Groupwise non-rigid registration for automatic construction of appearance models of the human craniofacial complex for analysis, synthesis and simulation
Finally, a novel application of 3D appearance modelling is proposed: a faster than real-time algorithm for statistically constrained quasi-mechanical simulation. Experiments demonstrate superior realism, achieved in the proposed method by employing statistical appearance models to drive the simulation, in comparison with the comparable state-of-the-art quasi-mechanical approaches.EThOS - Electronic Theses Online ServiceGBUnited Kingdo
Groupwise non-rigid registration for automatic construction of appearance models of the human craniofacial complex for analysis, synthesis and simulation
Finally, a novel application of 3D appearance modelling is proposed: a faster than real-time algorithm for statistically constrained quasi-mechanical simulation. Experiments demonstrate superior realism, achieved in the proposed method by employing statistical appearance models to drive the simulation, in comparison with the comparable state-of-the-art quasi-mechanical approaches
Coordinate Independent Convolutional Networks -- Isometry and Gauge Equivariant Convolutions on Riemannian Manifolds
Motivated by the vast success of deep convolutional networks, there is a
great interest in generalizing convolutions to non-Euclidean manifolds. A major
complication in comparison to flat spaces is that it is unclear in which
alignment a convolution kernel should be applied on a manifold. The underlying
reason for this ambiguity is that general manifolds do not come with a
canonical choice of reference frames (gauge). Kernels and features therefore
have to be expressed relative to arbitrary coordinates. We argue that the
particular choice of coordinatization should not affect a network's inference
-- it should be coordinate independent. A simultaneous demand for coordinate
independence and weight sharing is shown to result in a requirement on the
network to be equivariant under local gauge transformations (changes of local
reference frames). The ambiguity of reference frames depends thereby on the
G-structure of the manifold, such that the necessary level of gauge
equivariance is prescribed by the corresponding structure group G. Coordinate
independent convolutions are proven to be equivariant w.r.t. those isometries
that are symmetries of the G-structure. The resulting theory is formulated in a
coordinate free fashion in terms of fiber bundles. To exemplify the design of
coordinate independent convolutions, we implement a convolutional network on
the M\"obius strip. The generality of our differential geometric formulation of
convolutional networks is demonstrated by an extensive literature review which
explains a large number of Euclidean CNNs, spherical CNNs and CNNs on general
surfaces as specific instances of coordinate independent convolutions.Comment: The implementation of orientation independent M\"obius convolutions
is publicly available at https://github.com/mauriceweiler/MobiusCNN
A global-to-local model for the representation of human faces
In the context of face modeling and face recognition, statistical models are
widely used for the representation and modeling of surfaces. Most of these
models are obtained by computing Principal Components Analysis (PCA) on a set
of representative examples. These models represent novel faces poorly due to
their holistic nature (i.e.\ each component has global support), and they
suffer from overfitting when used for generalization from partial information.
In this work, we present a novel analysis method that breaks the objects up
into modes based on spatial frequency. The high-frequency modes are segmented
into regions with respect to specific features of the object. After computing
PCA on these segments individually, a hierarchy of global and local components
gradually decreasing in size of their support is combined into a linear
statistical model, hence the name, Global-to-Local model (G2L). We apply our
methodology to build a novel G2L model of 3D shapes of human heads. Both the
representation and the generalization capabilities of the models are evaluated
and compared in a standardized test, and it is demonstrated that the G2L model
performs better compared to traditional holistic PCA models. Furthermore, both
models are used to reconstruct the 3D shape of faces from a single photograph.
A novel adaptive fitting method is presented that estimates the model parameters
using a multi-resolution approach. The model is first fitted to contours
extracted from the image. In a second stage, the contours are kept fixed and
the remaining flexibility of the model is fitted to the input image. This makes
the method fast (30 sec on a standard PC), efficient, and accurate
Real-time Visual Flow Algorithms for Robotic Applications
Vision offers important sensor cues to modern robotic platforms.
Applications such as control of aerial vehicles, visual servoing,
simultaneous localization and mapping, navigation and more
recently, learning, are examples where visual information is
fundamental to accomplish tasks. However, the use of computer
vision algorithms carries the computational cost of extracting
useful information from the stream of raw pixel data. The most
sophisticated algorithms use complex mathematical formulations
leading typically to computationally expensive, and consequently,
slow implementations. Even with modern computing resources,
high-speed and high-resolution video feed can only be used for
basic image processing operations. For a vision algorithm to be
integrated on a robotic system, the output of the algorithm
should be provided in real time, that is, at least at the same
frequency as the control logic of the robot. With robotic
vehicles becoming more dynamic and ubiquitous, this places higher
requirements to the vision processing pipeline.
This thesis addresses the problem of estimating dense visual flow
information in real time. The contributions of this work are
threefold. First, it introduces a new filtering algorithm for the
estimation of dense optical flow at frame rates as fast as 800 Hz
for 640x480 image resolution. The algorithm follows a
update-prediction architecture to estimate dense optical flow
fields incrementally over time. A fundamental component of the
algorithm is the modeling of the spatio-temporal evolution of the
optical flow field by means of partial differential equations.
Numerical predictors can implement such PDEs to propagate current
estimation of flow forward in time. Experimental validation of
the algorithm is provided using high-speed ground truth image
dataset as well as real-life video data at 300 Hz.
The second contribution is a new type of visual flow named
structure flow. Mathematically, structure flow is the
three-dimensional scene flow scaled by the inverse depth at each
pixel in the image. Intuitively, it is the complete velocity
field associated with image motion, including both optical flow
and scale-change or apparent divergence of the image. Analogously
to optic flow, structure flow provides a robotic vehicle with
perception of the motion of the environment as seen by the
camera. However, structure flow encodes the full 3D image motion
of the scene whereas optic flow only encodes the component on the
image plane. An algorithm to estimate structure flow from image
and depth measurements is proposed based on the same filtering
idea used to estimate optical flow.
The final contribution is the spherepix data structure for
processing spherical images. This data structure is the numerical
back-end used for the real-time implementation of the structure
flow filter. It consists of a set of overlapping patches covering
the surface of the sphere. Each individual patch approximately
holds properties such as orthogonality and equidistance of
points, thus allowing efficient implementations of low-level
classical 2D convolution based image processing routines such as
Gaussian filters and numerical derivatives.
These algorithms are implemented on GPU hardware and can be
integrated to future Robotic Embedded Vision systems to provide
fast visual information to robotic vehicles
Recommended from our members
Three-Dimensional Object Search, Understanding, and Pose Estimation with Low-Cost Sensors
With the recent development of low-cost depth sensors, an entirely new type of 3D data is being generated rapidly by regular consumers. Traditionally, 3D data is produced by a small number of professional designers (i.e., the Computer Aided Design (CAD) model); however, 3D data from massive consumer-level sensors has the potential of introducing many new applications, such as user-captured 3D warehouse and search engines, robots with 3D sensing capability, and customized 3D printing. Nevertheless, the low-cost sensors used by general consumers also pose new technological challenges. First, they have relatively high levels of sensor noise. Second, the use of such consumer devices is often in uncontrolled settings, resulting in challenging conditions, such as poor lighting, cluttered scenes, and object occlusion. To address such emerging opportunities and associated challenges, this dissertation is dedicated to the development of novel algorithms and systems for 3D data understanding and processing, using input from a consumer-level 3D sensor.
In particular, the key problems of 3D shape retrieval, scene understanding, and pose recognition are explored in order to present a comprehensive coverage of the key aspects of content-based 3D shape analysis. To resolve the aforementioned challenges, we propose a flexible Markov Random Field (MRF) framework that uses local information to allow partial matching, and thus address the model incompleteness problem; the framework also uses higher-order correlation to provide additional robustness against sensor noise. With the MRF framework, these 3D analysis problems can be transformed into a unified potential energy minimization problem, while preserving the flexibility to adapt to different settings and resolve the unique challenges of each problem. The contributions of the dissertation include:
a. Cross-Domain 3D Retrieval: First we tackle the problem of searching 3D noise- free models using noisy data captured by low-cost 3D sensors – a unique cross-domain setting. To manage the challenges of sensor noise and model incompleteness from consumer-level sensors, we propose a novel MRF formulation for the retrieval problem. The potential function of the random field is designed to capture both the local shape and global spatial consistency in order to preserve the local matching capability, while offering robustness against the sensor noise. The specific form of the potential functions is determined efficiently by a series of weak classifiers, thus forming a variant of the Regression Tree Field (RTF). We achieve better retrieval precision and recall in the cross-domain settings with a consumer-level depth sensor compared with state-of-the-art approaches.
b. 3D Scene Understanding: We develop a scene understanding system based on input from consumer-level depth sensors. To resolve the key challenge of the lack of annotated 3D training data, we construct an MRF that connects the input 3D point cloud and the associated 2D reference images, based on which the 3D point cloud is stitched. A series of weak classifiers are trained to obtain an approximate semantic segmentation result from the reference images. The potential function of the field is designed to integrate the results from the classifiers, while taking advantage of the 3D spatial consistency in order to output a comprehensive scene understanding result. We achieve comparable accuracy and much faster speed compared with state-of-the-art 3D scene understanding systems, with the difference that we do not require annotated 3D training data.
c. Pose Recognition of Deformable Objects: We develop a method for supporting a robotics system to recognize pose and manipulate deformable objects. More specifically, garment pose is recognized with the help of an offline simulated database and the proposed retrieval approach. We use a novel binary feature representation extracted from the reconstructed 3D surfaces in order to allow efficient matching, thus achieving real-time performance. A spatial weight is further learned in order to integrate the local matching result. The system shows superior recognition accuracy and faster speed than the state-of-the-art approaches.
d. Application with 2D Data: In addition to the traditional 3D applications, we explore the possibility of extending MRF formulation to 2D data, especially those used in classical low-level 2D vision problems, such as image deblurring and denoising. One well-known technique that uses image prior, the probabilistic patched-based prior, is known to have bottlenecks in finding the most similar model from a model set, which can be posed as a retrieval problem. Therefore, we apply the MRF formulation originally developed for 3D shape retrieval, and extend it to this 2D problem by introducing a grid-like random field structure. We can achieve 40x acceleration compared with the state-of-the-art algorithm, while preserving quality.
We organize the dissertation as follows. First, the core problems of 3D shape retrieval, scene understanding, and pose recognition, and with the proposed solutions that use MRF and RTF are explored in Part I. In Part II, the extension to 2D data is discussed. Extensive evaluation is performed in each specific task in order to compare the proposed approaches with state-of-the-art algorithms and systems, and also to justify the components of the proposed methods. Finally, in Part III, we include the conclusion remarks and discussion of open issues and future work
Deliverable D1.1 State of the art and requirements analysis for hypervideo
This deliverable presents a state-of-art and requirements analysis report for hypervideo authored as part of the WP1 of the LinkedTV project. Initially, we present some use-case (viewers) scenarios in the LinkedTV project and through the analysis of the distinctive needs and demands of each scenario we point out the technical requirements from a user-side perspective. Subsequently we study methods for the automatic and semi-automatic decomposition of the audiovisual content in order to effectively support the annotation process. Considering that the multimedia content comprises of different types of information, i.e., visual, textual and audio, we report various methods for the analysis of these three different streams. Finally we present various annotation tools which could integrate the developed analysis results so as to effectively support users (video producers) in the semi-automatic linking of hypervideo content, and based on them we report on the initial progress in building the LinkedTV annotation tool. For each one of the different classes of techniques being discussed in the deliverable we present the evaluation results from the application of one such method of the literature to a dataset well-suited to the needs of the LinkedTV project, and we indicate the future technical requirements that should be addressed in order to achieve higher levels of performance (e.g., in terms of accuracy and time-efficiency), as necessary
Tensor Representations for Object Classification and Detection
A key problem in object recognition is finding a suitable object representation.
For historical and computational reasons, vector descriptions that encode particular
statistical properties of the data have been broadly applied. However, employing
tensor representation can describe the interactions of multiple factors
inherent to image formation. One of the most convenient uses for tensors is to represent
complex objects in order to build a discriminative description.
Thus thesis has several main contributions, focusing on visual data detection (e.g. of heads or pedestrians) and classification (e.g. of head or human body orientation) in still images and on machine learning techniques to analyse tensor data. These applications are among the most studied in computer vision and are typically formulated as binary or multi-class classification problems.
The applicative context of this thesis is the video surveillance, where classification and detection tasks
can be very hard, due to the scarce resolution and the noise characterising
sensor data. Therefore, the main goal in that context is to design algorithms that can
characterise different objects of interest, especially when immersed in a cluttered
background and captured at low resolution.
In the different amount of machine learning approaches, the ensemble-of-classifiers demonstrated to reach
excellent classification accuracy, good generalisation ability, and robustness of noisy data. For these
reasons, some approaches in that class have been adopted as basic machine classification
frameworks to build robust classifiers and detectors. Moreover, also
kernel machines has been exploited for classification purposes,
since they represent a natural learning framework for tensors