    Pattern Recognition

    Pattern recognition is a very wide research field. It involves elements as diverse as sensors, feature extraction, pattern classification, decision fusion, and applications. The signals processed are commonly one-, two- or three-dimensional; the processing may run in real time or take hours or days; some systems look for a single narrow object class, while others search huge databases for entries with even a small degree of similarity. No single person can claim expertise across the whole field, which develops rapidly, updates its paradigms and encompasses several philosophical approaches. This book reflects this diversity by presenting a selection of recent developments in pattern recognition and related fields. It covers theoretical advances in classification and feature extraction as well as application-oriented work. The authors of these 25 works present and advocate recent achievements of their research in pattern recognition.

    Convolutional Neural Network for Face Recognition with Pose and Illumination Variation

    Face recognition remains a challenging problem. The main challenge is improving recognition performance in the presence of non-linear variations such as illumination changes, pose, facial expressions, and occlusions. In this paper, a robust 4-layer Convolutional Neural Network (CNN) architecture is proposed for face recognition, capable of handling facial images that contain occlusions, pose variation, facial expressions and varying illumination. Experimental results show that the proposed CNN solution outperforms existing works, achieving 99.5% recognition accuracy on the AR database. A test on the 35-subject subset of the FERET database achieves an accuracy of 85.13%, which is in a similar performance range to the best results of previous works. More significantly, the proposed system completes the facial recognition process in less than 0.01 seconds.
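
    The abstract does not spell out the layer configuration, so the following is only a minimal sketch of a small 4-layer face-classification CNN in PyTorch; the 32x32 grayscale input size, channel counts, number of identities, and the dummy batch are assumptions for illustration, not the paper's architecture.

```python
# A minimal 4-layer CNN face classifier sketch (illustrative, not the paper's).
import torch
import torch.nn as nn

class FaceCNN(nn.Module):
    def __init__(self, num_classes: int):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 16, kernel_size=5, padding=2),   # layer 1: conv
            nn.ReLU(),
            nn.MaxPool2d(2),                              # 32x32 -> 16x16
            nn.Conv2d(16, 32, kernel_size=5, padding=2),  # layer 2: conv
            nn.ReLU(),
            nn.MaxPool2d(2),                              # 16x16 -> 8x8
        )
        self.classifier = nn.Sequential(
            nn.Flatten(),
            nn.Linear(32 * 8 * 8, 128),                   # layer 3: fully connected
            nn.ReLU(),
            nn.Linear(128, num_classes),                  # layer 4: output
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.classifier(self.features(x))

model = FaceCNN(num_classes=100)            # e.g. 100 identities (assumed)
logits = model(torch.randn(8, 1, 32, 32))   # dummy batch of 8 face crops
print(logits.shape)                         # torch.Size([8, 100])
```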

    Joint optimization of manifold learning and sparse representations for face and gesture analysis

    Face and gesture understanding algorithms are powerful enablers in intelligent vision systems for surveillance, security, entertainment, and smart spaces. In the future, complex networks of sensors and cameras may disperse directions to lost tourists, perform directory lookups in the office lobby, or contact the proper authorities in case of an emergency. To be effective, these systems will need to embrace human subtleties while interacting with people in their natural conditions. Computer vision and machine learning techniques have recently become adept at solving face and gesture tasks using posed datasets in controlled conditions. However, spontaneous human behavior under unconstrained conditions, or in the wild, is more complex and is subject to considerable variability from one person to the next. Uncontrolled conditions such as lighting, resolution, noise, occlusions, pose, and temporal variations complicate the matter further. This thesis advances the field of face and gesture analysis by introducing a new machine learning framework based upon dimensionality reduction and sparse representations that is shown to be robust in posed as well as natural conditions.
    Dimensionality reduction methods take complex objects, such as facial images, and attempt to learn lower-dimensional representations embedded in the higher-dimensional data. These alternate feature spaces are computationally more efficient and often more discriminative. The performance of various dimensionality reduction methods on geometric and appearance-based facial attributes is studied, leading to robust facial pose and expression recognition models. The parsimonious nature of sparse representations (SR) has been successfully exploited to develop highly accurate classifiers for various applications. Despite these successes, large dictionaries and high-dimensional data can make SR classifiers computationally demanding. Further, sparse classifiers are subject to the adverse effects of a phenomenon known as coefficient contamination, where, for example, variations in pose may affect identity and expression recognition. This thesis analyzes the interaction between dimensionality reduction and sparse representations to present a unified sparse representation classification framework that addresses both computational complexity and coefficient contamination. Semi-supervised dimensionality reduction is shown to mitigate the coefficient contamination problems associated with SR classifiers, and its combination with SR systems forms the cornerstone of a new face and gesture framework called Manifold-based Sparse Representations (MSR). MSR is shown to deliver state-of-the-art facial understanding capabilities, and it is expanded to include temporal dynamics to demonstrate its applicability to new domains.
    The joint optimization of dimensionality reduction and SRs for classification purposes is a relatively new field. Combining both concepts into a single objective function produces a problem that is neither convex nor directly solvable. This thesis studies this problem and introduces a new jointly optimized framework, termed LGE-KSVD, which utilizes variants of Linear extension of Graph Embedding (LGE) along with modified K-SVD dictionary learning to jointly learn the dimensionality reduction matrix, sparse representation dictionary, sparse coefficients, and sparsity-based classifier. By injecting LGE concepts directly into the K-SVD learning procedure, this research removes the support constraints K-SVD imposes on dictionary element discovery. Results are shown for facial recognition, facial expression recognition, and human activity analysis; with the addition of a concept called active difference signatures, the framework delivers robust gesture recognition from Kinect or similar depth cameras.
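
    As an illustration of the sparse representation classification (SRC) idea at the heart of the MSR framework, here is a minimal Python sketch: project data to a lower-dimensional space, sparsely code a test sample over the training dictionary, and classify by minimal per-class reconstruction residual. PCA stands in for the LGE-style projection and orthogonal matching pursuit for the sparse solver; the toy data, sizes, and sparsity level are assumptions.

```python
# Sketch: dimensionality reduction + sparse representation classification.
import numpy as np
from sklearn.decomposition import PCA
from sklearn.linear_model import OrthogonalMatchingPursuit

rng = np.random.default_rng(0)
X_train = rng.normal(size=(60, 1024))         # 60 training faces, 32x32 flattened
y_train = np.repeat(np.arange(6), 10)         # 6 subjects, 10 images each
x_test = X_train[3] + 0.1 * rng.normal(size=1024)

pca = PCA(n_components=20).fit(X_train)       # stand-in for the learned projection
D = pca.transform(X_train).T                  # dictionary: one column per sample
D /= np.linalg.norm(D, axis=0)                # unit-norm atoms
z = pca.transform(x_test[None, :]).ravel()

omp = OrthogonalMatchingPursuit(n_nonzero_coefs=5).fit(D, z)
alpha = omp.coef_                             # sparse code over training samples

# Classify by the class whose atoms best reconstruct the test sample.
residuals = []
for c in np.unique(y_train):
    a_c = np.where(y_train == c, alpha, 0.0)  # keep only class-c coefficients
    residuals.append(np.linalg.norm(z - D @ a_c))
print("predicted subject:", int(np.argmin(residuals)))
```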

    Geometric Expression Invariant 3D Face Recognition using Statistical Discriminant Models

    Currently there is no complete face recognition system that is invariant to all facial expressions. Although humans find it easy to identify and recognise faces regardless of changes in illumination, pose and expression, producing a computer system with a similar capability has proved particularly difficult. Three-dimensional face models are geometric in nature and therefore have the advantage of being invariant to head pose and lighting; however, they are still susceptible to facial expressions. This can be seen in the decrease in recognition results obtained with principal component analysis (PCA) when expressions are added to a data set. In order to achieve an expression-invariant face recognition system, we employ a tensor algebra framework to represent 3D face data with facial expressions in a parsimonious space. Face variation factors are organised into separate subject and facial expression modes, and the representation is manipulated using singular value decomposition on sub-tensors, each representing one variation mode. This framework addresses the shortcomings of PCA in less constrained environments while preserving the integrity of the 3D data. The results show improved recognition rates for faces and facial expressions, even recognising high-intensity expressions that are not in the training datasets. We have determined, experimentally, a set of anatomical landmarks that describe facial expressions most effectively, and found that the best placement of landmarks for distinguishing different facial expressions is in areas around the prominent features, such as the cheeks and eyebrows. Recognition results using landmark-based face recognition could be improved with better placement. We also looked into the possibility of achieving expression-invariant face recognition by reconstructing and manipulating realistic facial expressions, proposing a tensor-based statistical discriminant analysis method to reconstruct and, in particular, to neutralise facial expressions. The synthesised facial expressions are visually more realistic than those generated using conventional active shape modelling (ASM). Using the reconstructed neutral faces in the sub-tensor framework for recognition showed a slight improvement in recognition results. Besides biometric recognition, this novel tensor-based synthesis approach could be used in computer games and real-time animation applications.
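
    The following is a minimal numpy sketch of the tensor idea above: face data organized in a (subject x expression x feature) tensor, with a singular value decomposition of each mode unfolding yielding per-mode subspaces, HOSVD-style. The shapes and random data are assumptions for illustration, not the thesis's datasets.

```python
# Sketch: per-mode SVD of a (subject x expression x feature) face tensor.
import numpy as np

rng = np.random.default_rng(0)
n_subjects, n_expressions, n_features = 10, 6, 300
T = rng.normal(size=(n_subjects, n_expressions, n_features))

def unfold(tensor: np.ndarray, mode: int) -> np.ndarray:
    """Matricize the tensor along one mode (mode fibres become rows)."""
    return np.moveaxis(tensor, mode, 0).reshape(tensor.shape[mode], -1)

# One SVD per variation mode yields subject and expression basis matrices.
U_subject, _, _ = np.linalg.svd(unfold(T, 0), full_matrices=False)
U_expression, _, _ = np.linalg.svd(unfold(T, 1), full_matrices=False)
print(U_subject.shape, U_expression.shape)   # (10, 10) and (6, 6)
```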

    Hybrid component-based face recognition.

    Masters Degree. University of KwaZulu-Natal, Pietermaritzburg. Facial recognition (FR) is a trusted biometric method for authentication. Compared to other biometrics such as the signature, which can be compromised, facial recognition is non-intrusive and can be captured at a distance in a concealed manner. It has a significant role in conveying the identity of a person in social interaction, and its performance largely depends on factors such as illumination, facial pose, expression, age span, hair, facial wear, and motion. In light of these considerations, this dissertation proposes a hybrid component-based approach that seeks to utilise any successfully detected components, recognising faces at the component level. It employs the texture descriptors Grey-Level Co-occurrence Matrix (GLCM), Gabor Filters, Speeded-Up Robust Features (SURF) and Scale-Invariant Feature Transform (SIFT), and the shape descriptor Zernike Moments. The advantage of the texture attributes is their simplicity; however, they cannot completely characterise a face on their own, hence the Zernike Moments descriptor is used to compute the shape properties of the selected facial components. Together these descriptors provide effective feature representations of the facial components and are robust to illumination and pose changes. Experiments were performed on four state-of-the-art facial databases (FERET, FEI, SCface and CMU), with Error-Correcting Output Codes (ECOC) used for classification. The results show that component-based facial recognition is more effective than whole-face recognition, with the proposed methods achieving a recognition accuracy of 98.75%. This approach performs well compared to other component-based facial recognition approaches.
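
    A minimal sketch of the component-level feature extraction and ECOC classification described above, assuming scikit-image for the GLCM descriptor, mahotas for Zernike moments, and scikit-learn's error-correcting output codes; the random patches, sizes, and base classifier are illustrative assumptions (Gabor, SURF and SIFT features are omitted for brevity).

```python
# Sketch: GLCM texture + Zernike shape features per component, ECOC classifier.
import numpy as np
import mahotas
from skimage.feature import graycomatrix, graycoprops
from sklearn.multiclass import OutputCodeClassifier
from sklearn.svm import LinearSVC

rng = np.random.default_rng(0)

def component_features(patch: np.ndarray) -> np.ndarray:
    """Concatenate GLCM texture statistics with Zernike shape moments."""
    glcm = graycomatrix(patch, distances=[1], angles=[0], levels=256)
    texture = [graycoprops(glcm, p)[0, 0]
               for p in ("contrast", "homogeneity", "energy")]
    shape = mahotas.features.zernike_moments(patch, radius=16)
    return np.concatenate([texture, shape])

# Toy dataset: 40 component patches (e.g. eye crops) from 4 subjects.
patches = rng.integers(0, 256, size=(40, 32, 32), dtype=np.uint8)
labels = np.repeat(np.arange(4), 10)
X = np.array([component_features(p) for p in patches])

# Error-Correcting Output Codes over a linear SVM base classifier.
ecoc = OutputCodeClassifier(LinearSVC(), code_size=2, random_state=0)
ecoc.fit(X, labels)
print(ecoc.predict(X[:5]))
```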

    Development of human-robot interaction based on multimodal emotion recognition

    The electronic version of this thesis does not include the publications. Automatic multimodal emotion recognition is a fundamental subject of interest in affective computing, with its main applications in human-computer interaction. The systems developed for this purpose combine different modalities based on vocal and visual cues. This thesis takes both modalities into account in order to develop an automatic multimodal emotion recognition system; more specifically, it exploits information extracted from speech and face signals. From the speech signal, Mel-frequency cepstral coefficients, filter-bank energies and prosodic features are extracted. Two different strategies are considered for analyzing the facial data. First, geometric relations between facial landmarks, i.e. distances and angles, are computed. Second, each emotional video is summarized into a reduced set of key-frames, and a convolutional neural network applied to these key-frames learns to visually discriminate between the emotions. The output confidence values of all the classifiers (one acoustic, two visual) are then used to define a new feature space, and a final classifier is trained on these values to predict the emotion label in a late fusion. Experiments are conducted on the SAVEE, Polish, Serbian, eNTERFACE'05 and RML datasets. The results show significant performance improvements by the proposed system in comparison to existing alternatives, defining the current state of the art on all the datasets. Additionally, we provide a review of the emotional body gesture recognition systems proposed in the literature, with the aim of identifying future research directions for enhancing the performance of the proposed system. Specifically, incorporating data representing gestures, which constitute another major component of the visual modality, could result in a more effective framework.
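
    A minimal sketch of the late-fusion step described above: the per-class confidence outputs of one acoustic and two visual classifiers are stacked into a new feature vector, on which a final classifier is trained. The random features, dimensionalities, and classifier choices are assumptions for illustration, not the thesis's exact pipeline.

```python
# Sketch: late fusion of per-modality classifier confidences.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.svm import SVC

rng = np.random.default_rng(0)
n, n_emotions = 120, 6
y = rng.integers(0, n_emotions, size=n)
X_audio = rng.normal(size=(n, 40))   # e.g. MFCC + prosodic features (assumed dims)
X_geom = rng.normal(size=(n, 30))    # facial-landmark geometry
X_cnn = rng.normal(size=(n, 64))     # key-frame CNN embedding

# Train one base classifier per modality/stream.
bases = [SVC(probability=True).fit(X, y)
         for X in (X_audio, X_geom, X_cnn)]

# Late fusion: concatenate the three confidence vectors per sample
# to define the new feature space, then train the final classifier.
fused = np.hstack([clf.predict_proba(X)
                   for clf, X in zip(bases, (X_audio, X_geom, X_cnn))])
final = LogisticRegression(max_iter=1000).fit(fused, y)
print(final.predict(fused[:5]))
```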