Array of Multilayer Perceptrons with No-class Resampling Training for Face Recognition
A face recognition (FR) problem involves face detection, representation, and classification steps. Once a face is located in an image, it has to be represented through a feature extraction process in order to later perform a proper face classification task. The most widely used approach for feature extraction is the eigenfaces method, where an eigenspace is established from the image training samples using principal component analysis. In the classification phase, an input face is projected onto the obtained eigenspace and classified by an appropriate classifier. Neural network classifiers based on multilayer perceptron models have proven to be well suited to this task. This paper presents an array of multilayer perceptron neural networks trained with a novel no-class resampling strategy, which takes into account the balance problem between class and no-class examples and increases the generalization capabilities. The proposed model is compared against a classical multilayer perceptron classifier for face recognition over the AT&T database of faces, obtaining results that show an improvement over the classification rates of a classical classifier.
Fil: Capello, D.. Universidad Tecnológica Nacional. Facultad Regional Santa Fe; Argentina
Fil: Martínez, César Ernesto. Universidad Nacional del Litoral. Facultad de Ingeniería y Ciencias Hídricas; Argentina
Fil: Milone, Diego Humberto. Universidad Nacional de Entre Ríos; Argentina
Fil: Stegmayer, Georgina. Universidad Nacional del Litoral. Facultad de Ingeniería y Ciencias Hídricas; Argentina
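The eigenfaces pipeline described above can be sketched in a few lines of numpy. This is an illustrative toy (random "images", nearest-neighbour matching standing in for the paper's MLP classifier), not the authors' implementation:

```python
import numpy as np

# Toy eigenfaces sketch: training images flattened into rows of X.
# Sizes and data are illustrative assumptions, not a real face dataset.
rng = np.random.default_rng(0)
X = rng.random((40, 64))           # 40 training faces, 64 pixels each

mean_face = X.mean(axis=0)
A = X - mean_face                  # centre the data

# PCA via SVD: the rows of Vt are the principal components (eigenfaces).
U, S, Vt = np.linalg.svd(A, full_matrices=False)
k = 10
eigenfaces = Vt[:k]                # keep the k leading components

# Project a probe face onto the eigenspace, then classify by nearest
# training projection (a stand-in for the classifier stage).
train_proj = A @ eigenfaces.T
probe_proj = (X[7] - mean_face) @ eigenfaces.T
nearest = np.argmin(np.linalg.norm(train_proj - probe_proj, axis=1))
print(nearest)                     # 7: the probe matches itself exactly
```

In a real system the projections, not the raw pixels, are fed to the classifier, which is what makes the eigenspace step worthwhile.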
Registration of Face Image Using Modified BRISK Feature Descriptor
Automatic face recognition is an active area of research in the field of computer vision. Even though a lot of research has been done in this field, researchers are still unable to develop an algorithm that can detect face images under all possible real-time conditions. Automatic face recognition algorithms are used in a variety of applications such as surveillance, automatic tagging, and human-robot interaction. The main problem faced by researchers working with the above real-time problems is uncertainty about the pose of the detected face: if the pose of the sensed image differs from that of the images in the training database, most existing algorithms will fail. Researchers have therefore suggested, and shown, that detection accuracy under pose variation can be improved if image registration is applied as a preprocessing step prior to face recognition. In this work, scale- and rotation-invariant features are used for image registration. The important steps in feature-based image registration are preprocessing, feature detection, feature matching, transformation estimation, and resampling. In this work, feature detectors and descriptors such as SIFT, SURF, FAST, DAISY, and BRISK are used; among these, the BRISK descriptor performs best. To avoid mismatches, a modified BRISK descriptor using threshold values is proposed in this work. The modified BRISK descriptor performs best in terms of maximum matching compared with other state-of-the-art descriptors. The next step is to calculate the transformation model capable of transforming the coordinates of the sensed image to the coordinates of the reference image. Radial basis functions are used in this step to design the proper transformation function. In the resampling step, bilinear interpolation is used to compute pixel values in the output image.
A new algorithm is also proposed to find the image pairs in the training database corresponding to the input image for image registration. Image registration algorithms are simulated in MATLAB with different detector-descriptor combinations and affine transformation matrices. To measure the similarity between the registered output image and the reference image, the SSIM index and mutual information are used.
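Two of the registration steps named above, transformation estimation and resampling, can be sketched concretely. The following is a minimal numpy illustration (illustrative names and toy data; it uses a least-squares affine fit rather than the radial-basis functions of the paper):

```python
import numpy as np

def estimate_affine(src, dst):
    """Least-squares affine fit: solve dst ~= [src 1] @ params."""
    A = np.hstack([src, np.ones((len(src), 1))])   # rows [x, y, 1]
    params, *_ = np.linalg.lstsq(A, dst, rcond=None)
    return params                                   # 3x2: 2x2 matrix rows, then offset

def bilinear(img, x, y):
    """Sample img at a real-valued (x, y) by bilinear interpolation."""
    x0, y0 = int(np.floor(x)), int(np.floor(y))
    dx, dy = x - x0, y - y0
    return ((1 - dx) * (1 - dy) * img[y0, x0]
            + dx * (1 - dy) * img[y0, x0 + 1]
            + (1 - dx) * dy * img[y0 + 1, x0]
            + dx * dy * img[y0 + 1, x0 + 1])

# Recover a known affine map from four point correspondences.
src = np.array([[0., 0.], [1., 0.], [0., 1.], [1., 1.]])
M_true = np.array([[1.0, 0.2], [-0.1, 0.9]])
t_true = np.array([3.0, 4.0])
dst = src @ M_true + t_true
est = estimate_affine(src, dst)                    # recovers M_true and t_true

# Resample a toy 4x4 image at a non-integer location.
img = np.arange(16, dtype=float).reshape(4, 4)
v = bilinear(img, 1.5, 1.5)                        # midpoint of pixels 5, 6, 9, 10
```

In a full pipeline, matched feature locations from the descriptor stage (BRISK here) would supply `src` and `dst`, and `bilinear` would be evaluated at every output pixel of the registered image.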
Single camera pose estimation using Bayesian filtering and Kinect motion priors
Traditional approaches to upper body pose estimation using monocular vision
rely on complex body models and a large variety of geometric constraints. We
argue that this is not ideal and somewhat inelegant as it results in large
processing burdens, and instead attempt to incorporate these constraints
through priors obtained directly from training data. A prior distribution
covering the probability of a human pose occurring is used to incorporate
likely human poses. This distribution is obtained offline, by fitting a
Gaussian mixture model to a large dataset of recorded human body poses, tracked
using a Kinect sensor. We combine this prior information with a random walk
transition model to obtain an upper body model, suitable for use within a
recursive Bayesian filtering framework. Our model can be viewed as a mixture of
discrete Ornstein-Uhlenbeck processes, in that states behave as random walks,
but drift towards a set of typically observed poses. This model is combined
with measurements of the human head and hand positions, using recursive
Bayesian estimation to incorporate temporal information. Measurements are
obtained using face detection and a simple skin colour hand detector, trained
using the detected face. The suggested model is designed with analytical
tractability in mind and we show that the pose tracking can be
Rao-Blackwellised using the mixture Kalman filter, allowing for computational
efficiency while still incorporating bio-mechanical properties of the upper
body. In addition, the use of the proposed upper body model allows reliable
three-dimensional pose estimates to be obtained indirectly for a number of
joints that are often difficult to detect using traditional object recognition
strategies. Comparisons with Kinect sensor results and the state of the art in
2D pose estimation highlight the efficacy of the proposed approach.
Comment: 25 pages, technical report, related to Burke and Lasenby, AMDO 2014 conference paper. Code sample: https://github.com/mgb45/SignerBodyPose Video: https://www.youtube.com/watch?v=dJMTSo7-uF
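The transition model described above, a random walk that drifts toward typically observed poses, can be illustrated in one dimension with a single Gaussian prior mode and scalar Kalman updates. This is a hedged sketch under assumed parameter values (`mu`, `alpha`, `q`, `r` are illustrative, not the paper's), collapsing the Gaussian mixture and the mixture Kalman filter down to one component:

```python
import numpy as np

mu, alpha = 0.0, 0.1    # typical pose (prior mode) and drift rate toward it
q, r = 0.05, 0.2        # process and measurement noise variances (assumed)

def kalman_step(x, P, z):
    """One predict/update cycle of a discrete Ornstein-Uhlenbeck tracker."""
    # Predict: random walk drifting toward mu; linear part is (1 - alpha).
    x_pred = x + alpha * (mu - x)
    P_pred = (1 - alpha) ** 2 * P + q
    # Update with a direct noisy pose measurement z.
    K = P_pred / (P_pred + r)
    x_new = x_pred + K * (z - x_pred)
    P_new = (1 - K) * P_pred
    return x_new, P_new

rng = np.random.default_rng(1)
x, P = 2.0, 1.0                          # start far from the typical pose
for _ in range(50):
    z = rng.normal(mu, np.sqrt(r))       # measurements near the true pose
    x, P = kalman_step(x, P, z)
```

The full model replaces the single mode `mu` with a Gaussian mixture fitted to Kinect-tracked poses, and Rao-Blackwellisation runs one such Kalman filter per mixture component.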
Hybrid LSTM and Encoder-Decoder Architecture for Detection of Image Forgeries
With advanced image journaling tools, one can easily alter the semantic
meaning of an image by exploiting certain manipulation techniques such as
copy-clone, object splicing, and removal, which mislead the viewers. In
contrast, the identification of these manipulations becomes a very challenging
task as manipulated regions are not visually apparent. This paper proposes a
high-confidence manipulation localization architecture which utilizes
resampling features, Long Short-Term Memory (LSTM) cells, and an
encoder-decoder network to segment out manipulated regions from
non-manipulated ones.
Resampling features are used to capture artifacts like JPEG quality loss,
upsampling, downsampling, rotation, and shearing. The proposed network exploits
larger receptive fields (spatial maps) and frequency domain correlation to
analyze the discriminative characteristics between manipulated and
non-manipulated regions by incorporating an encoder and an LSTM network.
Finally, the decoder network learns the mapping from low-resolution feature
maps to pixel-wise predictions for image tamper localization. With the
predicted mask provided by the final (softmax) layer of the proposed
architecture, end-to-end training is performed to learn the network
parameters through back-propagation
using ground-truth masks. Furthermore, a large image splicing dataset is
introduced to guide the training process. The proposed method is capable of
localizing image manipulations at pixel level with high precision, which is
demonstrated through rigorous experimentation on three diverse datasets
- …
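The decoder's final stage described above, mapping low-resolution feature maps to a pixel-wise softmax mask, can be sketched in numpy. Shapes, values, and the nearest-neighbour upsampling are illustrative assumptions, not the paper's trained network:

```python
import numpy as np

def upsample2x(fmap):
    """Nearest-neighbour 2x upsampling of an (H, W, C) feature map."""
    return fmap.repeat(2, axis=0).repeat(2, axis=1)

def pixel_softmax(logits):
    """Per-pixel softmax over the channel axis of (H, W, C) logits."""
    e = np.exp(logits - logits.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

# Toy 2x2 feature map: channel 0 = pristine logit, channel 1 = tampered logit.
logits = np.zeros((2, 2, 2))
logits[0, 0, 1] = 4.0                       # top-left region scores as manipulated

probs = pixel_softmax(upsample2x(logits))   # (4, 4, 2) probability map
mask = probs[..., 1] > 0.5                  # predicted tamper mask
```

During training, the probability map `probs` would be compared against the ground-truth binary mask with a cross-entropy loss and the error back-propagated through the decoder, LSTM, and encoder.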