
    Resolution-Aware 3D Morphable Model

    The 3D Morphable Model (3DMM) is currently receiving considerable attention for human face analysis. Most existing work focuses on fitting a 3DMM to high-resolution images. However, in many applications, fitting a 3DMM to low-resolution images is also important. In this paper, we propose a Resolution-Aware 3DMM (RA-3DMM), which consists of three different resolution 3DMMs: a High-Resolution 3DMM (HR-3DMM), a Medium-Resolution 3DMM (MR-3DMM) and a Low-Resolution 3DMM (LR-3DMM). The RA-3DMM can automatically select the best model to fit input images of different resolutions. The multi-resolution model was evaluated in experiments conducted on the PIE and XM2VTS databases. The experimental results verified that HR-3DMM achieves the best performance for high-resolution input images, while MR-3DMM and LR-3DMM work best for medium- and low-resolution input images, respectively. A model selection strategy incorporated in the RA-3DMM is proposed based on these results. The RA-3DMM has been applied to pose correction of face images ranging from high to low resolution. The face verification results obtained with the pose-corrected images show considerable performance improvement over the results without pose correction at all resolutions.
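A resolution-driven model selection of this kind can be sketched as a simple dispatch on the measured face size in the input image; the function and pixel thresholds below are hypothetical illustrations, not the strategy from the paper:

```python
# Hypothetical sketch of a resolution-aware model-selection strategy:
# choose which 3DMM variant to fit based on the face's inter-ocular
# distance in pixels. The threshold values are illustrative assumptions.

def select_3dmm(inter_ocular_px, thresholds=(40, 15)):
    """Return which 3DMM variant to fit, given the face size in pixels."""
    high, medium = thresholds
    if inter_ocular_px >= high:
        return "HR-3DMM"   # enough detail for the high-resolution model
    if inter_ocular_px >= medium:
        return "MR-3DMM"   # mid-range faces fit the medium model best
    return "LR-3DMM"       # small faces are matched by the low-res model
```

For example, `select_3dmm(60)` would dispatch to the high-resolution model, while `select_3dmm(8)` would fall through to the low-resolution one.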

    Mobile Biometry (MOBIO) Face and Speaker Verification Evaluation

    This paper evaluates the performance of face and speaker verification techniques in the context of a mobile environment. The mobile environment was chosen as it provides a realistic and challenging test-bed for biometric person verification techniques to operate in. For instance, the audio environment is quite noisy and there is limited control over the illumination conditions and the pose of the subject in the video. To conduct this evaluation, a part of a database captured during the "Mobile Biometry" (MOBIO) European Project was used. In total there were nine participants in the evaluation who submitted a face verification system and five participants who submitted speaker verification systems. The nine face verification systems all varied significantly in terms of both verification algorithms and face detection algorithms. Several systems used the OpenCV face detector, while the better systems used proprietary software for the task of face detection; this made the evaluation of the verification algorithms challenging. The five speaker verification systems were based on one of two paradigms: a Gaussian Mixture Model (GMM) or a Support Vector Machine (SVM) paradigm. In general, the systems based on the SVM paradigm performed better than those based on the GMM paradigm.
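The GMM paradigm mentioned above typically scores a test utterance as a log-likelihood ratio between a speaker model and a universal background model (UBM). A minimal sketch with 1-D features and hypothetical mixture parameters, not any participant's actual system:

```python
# Minimal GMM log-likelihood-ratio scoring sketch for speaker
# verification. Features are 1-D scalars for illustration; real systems
# use multi-dimensional cepstral features and MAP-adapted speaker models.
import math

def gauss_logpdf(x, mean, var):
    """Log density of a univariate Gaussian."""
    return -0.5 * (math.log(2 * math.pi * var) + (x - mean) ** 2 / var)

def gmm_loglik(x, weights, means, vars_):
    """log sum_k w_k * N(x; mu_k, var_k), computed stably (log-sum-exp)."""
    logs = [math.log(w) + gauss_logpdf(x, m, v)
            for w, m, v in zip(weights, means, vars_)]
    mx = max(logs)
    return mx + math.log(sum(math.exp(l - mx) for l in logs))

def llr_score(features, speaker_gmm, ubm):
    """Average per-frame log-likelihood ratio; accept if above a threshold."""
    return sum(gmm_loglik(x, *speaker_gmm) - gmm_loglik(x, *ubm)
               for x in features) / len(features)
```

A positive average ratio indicates the frames are better explained by the claimed speaker's model than by the background model.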

    Enhancing person annotation for personal photo management using content and context based technologies

    Rapid technological growth and the decreasing cost of photo capture mean that we are all taking more digital photographs than ever before. However, the lack of technology for automatically organising personal photo archives has left many users with poorly annotated photos, causing great frustration when such photo collections are browsed or searched at a later time. As a result, there has recently been significant research interest in technologies for supporting effective annotation. This thesis addresses an important sub-problem of the broad annotation problem, namely "person annotation" associated with personal digital photo management. Solutions to this problem are provided using content analysis tools in combination with context data within the experimental photo management framework, called “MediAssist”. Readily available image metadata, such as location and date/time, are captured from digital cameras with in-built GPS functionality, and thus provide knowledge about when and where the photos were taken. Such information is then used to identify the "real-world" events corresponding to certain activities in the photo capture process. The problem of enabling effective person annotation is formulated in such a way that both "within-event" and "cross-event" relationships of persons' appearances are captured. The research reported in the thesis is built upon a firm foundation of content-based analysis technologies, namely face detection, face recognition, and body-patch matching, together with data fusion. Two annotation models are investigated in this thesis, namely progressive and non-progressive. The effectiveness of each model is evaluated against varying proportions of initial annotation, and the type of initial annotation based on individual and combined face, body-patch and person-context information sources. The results reported in the thesis strongly validate the use of multiple information sources for person annotation whilst emphasising the advantage of event-based photo analysis in real-life photo management systems.
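Combining face, body-patch and person-context evidence is a late-fusion problem; a minimal weighted-sum sketch, with function names and weights that are illustrative assumptions rather than details taken from MediAssist:

```python
# Hypothetical late-fusion sketch: combine per-source similarity scores
# (face, body-patch, person-context, each assumed to lie in [0, 1]) into
# one confidence per candidate identity. Weights are illustrative only.

def fuse_scores(face, body, context, weights=(0.5, 0.3, 0.2)):
    """Linear combination of the three information-source scores."""
    wf, wb, wc = weights
    return wf * face + wb * body + wc * context

def best_candidate(candidates):
    """candidates: {name: (face, body, context)} -> highest fused score."""
    return max(candidates, key=lambda name: fuse_scores(*candidates[name]))
```

Usage: `best_candidate({"alice": (0.9, 0.8, 0.7), "bob": (0.2, 0.3, 0.4)})` returns the identity whose fused evidence is strongest.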

    Face Recognition from Face Signatures

    This thesis presents techniques for detecting and recognizing faces under various imaging conditions. In particular, it presents a system that combines several methods for face detection and recognition. Initially, the faces in the images are located using the Viola-Jones method and each detected face is represented by a subimage. Then, an eye and mouth detection method is used to identify the coordinates of the eyes and mouth, which are then used to update the subimages so that they contain only the face area. After that, a method based on Bayesian estimation and a fuzzy membership function is used to identify the actual faces in both subimages (obtained from the first and second steps). Then, a face similarity measure is used to locate the oval shape of a face in both subimages. The similarity measures between the two faces are compared and the one with the highest value is selected. In the recognition task, the Trace transform method is used to extract the face signatures from the oval-shaped face. These signatures are evaluated using the BANCA and FERET databases in authentication tasks. Here, the signatures with discriminating ability were selected and used to construct a classifier. However, this classifier was shown to be weak, a problem tackled by constructing a boosted ensemble of classifiers with the Gentle AdaBoost algorithm. The proposed methodologies are evaluated using a family album database.
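Gentle AdaBoost, used above to strengthen the weak signature-based classifier, fits a regression stump to the labels under the current sample weights at each round and re-weights by exp(-y·f(x)). A self-contained sketch on 1-D features; the thesis's actual feature space and number of rounds are not given here:

```python
# Gentle AdaBoost sketch on 1-D features with regression stumps as weak
# learners. Labels are in {-1, +1}; the final score is the sum of the
# stump outputs, and its sign gives the predicted class.
import math

def fit_stump(xs, ys, ws):
    """Weighted least-squares regression stump on 1-D data."""
    def wmean(pairs):
        sw = sum(w for _, w in pairs)
        return sum(y * w for y, w in pairs) / sw if sw else 0.0
    best = None
    for t in sorted(set(xs)):
        a = wmean([(y, w) for x, y, w in zip(xs, ys, ws) if x <= t])
        b = wmean([(y, w) for x, y, w in zip(xs, ys, ws) if x > t])
        err = sum(w * (y - (a if x <= t else b)) ** 2
                  for x, y, w in zip(xs, ys, ws))
        if best is None or err < best[0]:
            best = (err, t, a, b)
    _, t, a, b = best
    return lambda x: a if x <= t else b

def gentle_adaboost(xs, ys, rounds=5):
    """Return the boosted scoring function F(x) = sum_m f_m(x)."""
    n = len(xs)
    ws = [1.0 / n] * n
    stumps = []
    for _ in range(rounds):
        f = fit_stump(xs, ys, ws)
        stumps.append(f)
        # Gentle AdaBoost weight update, then renormalisation.
        ws = [w * math.exp(-y * f(x)) for x, y, w in zip(xs, ys, ws)]
        s = sum(ws)
        ws = [w / s for w in ws]
    return lambda x: sum(f(x) for f in stumps)
```

On a toy separable problem such as `xs = [0, 1, 2, 3]`, `ys = [-1, -1, 1, 1]`, the boosted score is negative for the first group and positive for the second.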

    Image texture analysis for inferential sensing in the process industries

    Thesis (MScEng) -- Stellenbosch University, 2013. ENGLISH ABSTRACT: The measurement of key process quality variables is important for the efficient and economical operation of many chemical and mineral processing systems, as these variables can be used in process monitoring and control systems to identify and maintain optimal process conditions. However, in many engineering processes the key quality variables cannot be measured directly with standard sensors. Inferential sensing is the real-time prediction of such variables from other, measurable process variables through some form of model. In vision-based inferential sensing, visual process data in the form of images or video frames are used as input variables to the inferential sensor. This is a suitable approach when the desired process quality variable is correlated with the visual appearance of the process. The inferential sensor model is then based on analysis of the image data. Texture feature extraction is an image analysis approach by which the texture or spatial organisation of pixels in an image can be described. Two texture feature extraction methods, namely the use of grey-level co-occurrence matrices (GLCMs) and wavelet analysis, have predominated in applications of texture analysis to engineering processes. While these two baseline methods are still widely considered to be the best available texture analysis methods, several newer and more advanced methods have since been developed, which have properties that should theoretically provide these methods with some advantages over the baseline methods. Specifically, three advanced texture analysis methods have received much attention in recent machine vision literature, but have not yet been applied extensively to process engineering applications: steerable pyramids, textons and local binary patterns (LBPs). 
The purpose of this study was to compare the use of advanced image texture analysis methods to baseline texture analysis methods for the prediction of key process quality variables in specific process engineering applications. Three case studies, in which texture is thought to play an important role, were considered: (i) the prediction of platinum grade classes from images of platinum flotation froths, (ii) the prediction of fines fraction classes from images of coal particles on a conveyor belt, and (iii) the prediction of mean particle size classes from images of hydrocyclone underflows. Each of the five texture feature sets was used as input to two different classifiers (K-nearest neighbours and discriminant analysis) to predict the output variable classes for each of the three case studies mentioned above. The quality of the features extracted with each method was assessed in a structured manner, based on their classification performances after the optimisation of the hyperparameters associated with each method. In the platinum froth flotation case study, steerable pyramids and LBPs significantly outperformed the GLCM, wavelet and texton methods. In the case study of coal fines fractions, the GLCM method was significantly outperformed by all four other methods. Finally, in the hydrocyclone underflow case study, steerable pyramids and LBPs significantly outperformed the GLCM and wavelet methods, while the result for textons was inconclusive. Considering all of these results together, the overall conclusion was drawn that two of the three advanced texture feature extraction methods, namely steerable pyramids and LBPs, can extract feature sets of superior quality, when compared to the baseline GLCM and wavelet methods in these three case studies. 
The application of steerable pyramids and LBPs to further image analysis data sets is therefore recommended as a viable alternative to the traditional GLCM and wavelet texture analysis methods.
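Local binary patterns, one of the advanced methods compared above, encode each pixel by thresholding its 8 neighbours against the centre value. A minimal pure-Python sketch of the basic 8-neighbour code; the thesis would use an optimised implementation with uniform/rotation-invariant mappings:

```python
# Minimal 3x3-neighbourhood LBP sketch (basic 8-bit codes, no uniform
# mapping). Each interior pixel gets a code whose bits record which
# neighbours are >= the centre grey level.

def lbp_codes(img):
    """img: 2-D list of grey levels -> list of LBP codes, row-major,
    one per interior pixel."""
    offs = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
            (1, 1), (1, 0), (1, -1), (0, -1)]  # clockwise neighbours
    h, w = len(img), len(img[0])
    codes = []
    for r in range(1, h - 1):
        for c in range(1, w - 1):
            centre = img[r][c]
            code = 0
            for bit, (dr, dc) in enumerate(offs):
                if img[r + dr][c + dc] >= centre:
                    code |= 1 << bit
            codes.append(code)
    return codes
```

A histogram of these codes over an image (or image region) then serves as the texture feature vector fed to a classifier.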

    TEXTURAL CLASSIFICATION OF MULTIPLE SCLEROSIS LESIONS IN MULTIMODAL MRI VOLUMES

    Background and objectives: Multiple Sclerosis is a common relapsing demyelinating disease causing the significant degradation of cognitive and motor skills and contributes towards a reduced life expectancy of 5 to 10 years. The identification of Multiple Sclerosis lesions at early stages of a patient’s life can play a significant role in the diagnosis, treatment and prognosis for that individual. In recent years the process of disease detection has been aided through the implementation of radiomic pipelines for texture extraction and classification utilising Computer Vision and Machine Learning techniques. Eight Multiple Sclerosis patient datasets have been supplied, each containing one standard clinical T2 MRI sequence and four diffusion-weighted sequences (T2, FA, ADC, AD, RD). This work proposes a multimodal Multiple Sclerosis lesion segmentation methodology utilising supervised texture analysis, feature selection and classification. Three Machine Learning models were applied to multimodal MRI data and tested using unseen patient datasets to evaluate the classification performance of various extracted features, feature selection algorithms and classifiers on MRI volumes uncommonly applied to MS lesion detection. Method: First Order Statistics, Haralick Texture Features, Gray-Level Run-Lengths, Histogram of Oriented Gradients and Local Binary Patterns were extracted from MRI volumes which were minimally pre-processed using a skull stripping and background removal algorithm. mRMR and LASSO feature selection algorithms were applied to identify a subset of rankings for use in Machine Learning using Support Vector Machine, Random Forests and Extreme Learning Machine classification. Results: ELM achieved a top slice classification accuracy of 85% while SVM achieved 79% and RF 78%. It was found that combining information from all MRI sequences increased the classification performance when analysing unseen T2 scans in almost all cases. LASSO and mRMR feature selection methods failed to increase accuracy, and the highest-scoring group of features were Haralick Texture Features, derived from Grey-Level Co-occurrence Matrices.
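The Haralick features named in the results are statistics of a grey-level co-occurrence matrix. A minimal sketch of one GLCM offset and the contrast feature; the exact offsets, distances and grey-level quantisation used in the thesis are not given here:

```python
# Sketch of a grey-level co-occurrence matrix (GLCM) for a single pixel
# offset, plus the Haralick "contrast" statistic derived from it.
# img is a 2-D list of already-quantised grey levels in [0, levels).

def glcm(img, levels, dr=0, dc=1):
    """Normalised co-occurrence matrix for the offset (dr, dc)."""
    m = [[0.0] * levels for _ in range(levels)]
    h, w = len(img), len(img[0])
    n = 0
    for r in range(h):
        for c in range(w):
            r2, c2 = r + dr, c + dc
            if 0 <= r2 < h and 0 <= c2 < w:
                m[img[r][c]][img[r2][c2]] += 1
                n += 1
    return [[v / n for v in row] for row in m]

def contrast(p):
    """Haralick contrast: sum over i, j of (i - j)^2 * p(i, j)."""
    return sum((i - j) ** 2 * p[i][j]
               for i in range(len(p)) for j in range(len(p)))
```

Other Haralick features (energy, homogeneity, correlation) are computed from the same matrix with different weighting functions; high contrast indicates frequent large grey-level jumps between neighbouring pixels.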

    Re-identifying people in the crowd

    Developing an automated surveillance system is of great interest for various reasons including forensic and security applications. In the case of a network of surveillance cameras with non-overlapping fields of view, person detection and tracking alone are insufficient to track a subject of interest across the network. In this case, instances of a person captured in one camera view need to be retrieved among a gallery of different people, in other camera views. This vision problem is commonly known as person re-identification (re-id). Cross-view instances of pedestrians exhibit varied levels of illumination, viewpoint, and pose variations which makes the problem very challenging. Despite recent progress towards improving accuracy, existing systems suffer from low applicability to real-world scenarios. This is mainly caused by the need for large amounts of annotated data from pairwise camera views to be available for training. Given the difficulty of obtaining such data and annotating it, this thesis aims to bring the person re-id problem a step closer to real-world deployment. In the first contribution, the single-shot protocol, where each individual is represented by a pair of images that need to be matched, is considered. Following the extensive annotation of four datasets for six attributes, an evaluation of the most widely used feature extraction schemes is conducted. The results reveal two high-performing descriptors among those evaluated, and show illumination variation to have the most impact on re-id accuracy. Motivated by the wide availability of videos from surveillance cameras and the additional visual and temporal information they provide, video-based person re-id is then investigated, and a supervised system is developed. This is achieved by improving and extending the best performing image-based person descriptor into three dimensions and combining it with distance metric learning. 
The resulting system achieves state-of-the-art results on two widely used datasets. Given the cost and difficulty of obtaining labelled data from pairwise cameras in a network to train the model, an unsupervised video-based person re-id method is also developed. It is based on a set-based distance measure that leverages rank vectors to estimate the similarity scores between person tracklets. The proposed system outperforms other unsupervised methods by a large margin on two datasets while competing with deep learning methods on another large-scale dataset.
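The set-based measure itself is not detailed in this abstract, but the rank vectors it builds on can be illustrated: rank every gallery item by its distance to a probe, so that two probes of the same person should produce similar rank vectors. A hypothetical sketch using squared Euclidean distance on generic feature vectors:

```python
# Illustrative sketch only: computes the rank vector of a probe feature
# against a gallery of features. The thesis's actual set-based measure
# between tracklets is not reproduced here; distances and features are
# generic placeholders.

def rank_vector(probe, gallery):
    """Ranks (1 = closest) of each gallery item by distance to the probe."""
    d = [sum((a - b) ** 2 for a, b in zip(probe, g)) for g in gallery]
    order = sorted(range(len(gallery)), key=d.__getitem__)
    ranks = [0] * len(gallery)
    for r, idx in enumerate(order, start=1):
        ranks[idx] = r
    return ranks
```

Comparing the rank vectors of two tracklets (rather than their raw distances) makes the similarity score depend on relative ordering, which is more robust to cross-camera illumination and viewpoint shifts.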