333 research outputs found

    Analysis of facial ultrasonography images based on deep learning

    Get PDF
    Transfer learning with a model pre-trained on the ImageNet database is frequently used when obtaining large datasets in the medical imaging field is challenging. We sought to estimate the value of deep learning for facial ultrasound (US) images by assessing the classification performance achieved through transfer learning with current representative deep learning models and by analyzing the classification criteria. For this clinical study, we recruited 86 individuals from whom we acquired ultrasound images of nine facial regions. To classify these facial regions, 15 deep learning models were trained on augmented and non-augmented datasets and their performance was evaluated. The average F-measure score across all models was about 93% regardless of dataset augmentation, and the best-performing model was the classic VGG. The models regarded the contours of skin and bones, rather than muscles and blood vessels, as distinct features for distinguishing regions in the facial US images. The results of this study can serve as reference data for future deep learning research on facial US images and for content development.
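    The transfer-learning setup the abstract describes (an ImageNet-pretrained backbone adapted to classify nine facial regions) can be sketched as below. This is a minimal illustration, assuming a PyTorch/torchvision workflow with a VGG-16 backbone, frozen convolutional features, and illustrative hyperparameters; it is not the authors' exact configuration.

    ```python
    # Transfer-learning sketch: ImageNet-pretrained VGG-16, frozen convolutional
    # layers, and a new 9-way head for the nine facial ultrasound regions.
    import torch
    import torch.nn as nn
    from torchvision import models

    NUM_REGIONS = 9  # nine facial regions, as in the study

    model = models.vgg16(weights=models.VGG16_Weights.IMAGENET1K_V1)

    # Freeze the pretrained convolutional feature extractor.
    for param in model.features.parameters():
        param.requires_grad = False

    # Replace the final fully connected layer with a 9-class head.
    model.classifier[6] = nn.Linear(model.classifier[6].in_features, NUM_REGIONS)

    criterion = nn.CrossEntropyLoss()
    optimizer = torch.optim.Adam(model.classifier.parameters(), lr=1e-4)

    def train_step(images, labels):
        """One optimisation step on a batch of ultrasound images."""
        optimizer.zero_grad()
        logits = model(images)           # (batch, 9) region scores
        loss = criterion(logits, labels)
        loss.backward()
        optimizer.step()
        return loss.item()
    ```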

    Age Estimation using Deep Learning on 3D Facial Features

    Get PDF
    Intelligent systems are designed to substitute for the human component; they therefore need to emulate a human's ability to quickly estimate the biological traits of others, which is an integral part of social interaction. Age is one of the key characteristics used by marketing, entertainment and security tools. Existing age estimation systems can be easily fooled because they rely on appearance-based features, which can be easily manipulated. Over the years, while the complexity of models has increased, the data fed to these systems has stayed the same: a single 2D RGB image. This thesis addresses the current lack of studies on the use of 3D facial information in the context of age estimation. It presents a comprehensive study of how different 3D facial features can be used to improve current state-of-the-art age estimation approaches using deep learning. Along with extensions to a baseline Convolutional Neural Network (CNN) model with a 2D image input, a novel Multi-View CNN model is introduced which combines face descriptors from multiple perspectives within the model's architecture. Due to the lack of 3D facial datasets aimed at age estimation, 2D age estimation datasets were synthetically augmented with landmark localization, 3DMM parametrization and 3D facial point cloud reconstruction. The latter was subsequently used to create a new synthetic dataset composed of renderings of each point cloud from different camera positions. A fully customizable data processing tool was introduced which supports image pre-processing, dataset splitting, image augmentation and synthetic feature extraction. Quantitative results show that the 3D methods improve on traditional 2D approaches, although the gains are somewhat constrained by data quality.
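    A minimal sketch of the multi-view idea described above: one shared backbone encodes each rendered view of the 3D facial point cloud, the per-view descriptors are fused, and a regression head predicts age. The ResNet-18 backbone, three views, and mean-pooling fusion are illustrative assumptions, not the thesis's exact architecture.

    ```python
    # Multi-view CNN sketch for age estimation from multiple renderings.
    import torch
    import torch.nn as nn
    from torchvision import models

    class MultiViewAgeNet(nn.Module):
        def __init__(self, num_views: int = 3):
            super().__init__()
            backbone = models.resnet18(weights=models.ResNet18_Weights.IMAGENET1K_V1)
            feat_dim = backbone.fc.in_features
            backbone.fc = nn.Identity()          # keep the 512-d descriptor
            self.backbone = backbone
            self.num_views = num_views
            self.head = nn.Linear(feat_dim, 1)   # scalar age estimate

        def forward(self, views: torch.Tensor) -> torch.Tensor:
            # views: (batch, num_views, 3, H, W) renderings of each point cloud
            b, v, c, h, w = views.shape
            feats = self.backbone(views.view(b * v, c, h, w))  # (b*v, 512)
            feats = feats.view(b, v, -1).mean(dim=1)           # fuse by mean pooling
            return self.head(feats).squeeze(-1)                # (batch,) predicted ages

    model = MultiViewAgeNet()
    ages = model(torch.randn(2, 3, 3, 224, 224))  # two faces, three views each
    ```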

    Learning from small and imbalanced dataset of images using generative adversarial neural networks.

    Get PDF
    The performance of deep learning models is unmatched by any other approach in supervised computer vision tasks such as image classification. However, training these models requires a lot of labeled data, which is not always available. Labeling a massive dataset is largely a manual and very demanding process. This problem has led to the development of techniques that bypass the need for labeling at scale. Despite this, existing techniques such as transfer learning, data augmentation and semi-supervised learning have not lived up to expectations. Some of these techniques do not account for other classification challenges, such as class imbalance, and so they mostly underperform compared with fully supervised approaches. In this thesis, we propose new methods to train a deep model for image classification with a limited number of labeled examples. This was achieved by extending state-of-the-art generative adversarial networks with multiple fake classes and network switchers. These new features enabled us to train a classifier on large unlabeled data while generating class-specific samples. The proposed model is label-agnostic and is suitable for different classification scenarios, ranging from weakly supervised to fully supervised settings. It was used to address classification challenges with limited labeled data and a class-imbalance problem. Extensive experiments were carried out on different benchmark datasets. Firstly, the proposed approach was used to train a classification model, and our findings indicate that it achieved better classification accuracies, especially when the number of labeled samples is small. Secondly, the proposed approach was able to generate high-quality samples from class-imbalanced datasets. The quality of these samples is evident in the improved classification performance obtained when they were used to counteract class imbalance. The results are thoroughly analyzed and, overall, our method showed superior performance over popular resampling techniques and the AC-GAN model. Finally, we successfully applied the proposed approach as a new augmentation technique to two challenging real-world problems: faces with attributes and legacy engineering drawings. The results obtained demonstrate that the proposed approach is effective even in extreme cases.
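    A heavily hedged sketch of the "multiple fake classes" idea: real samples are classified into the known classes while generated samples are pushed into additional fake classes, so a classifier can be trained alongside the generator on largely unlabeled data. The thesis's network switchers are not modelled here, and all names, shapes and the one-fake-class-per-real-class choice are illustrative assumptions.

    ```python
    # Sketch of a GAN-style classifier with extra "fake" classes (assumption:
    # one fake class per real class; images are 28x28 single-channel).
    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    NUM_REAL = 10                    # real classes
    NUM_FAKE = NUM_REAL              # one fake class per real class
    NUM_OUT = NUM_REAL + NUM_FAKE    # classifier output size

    classifier = nn.Sequential(
        nn.Flatten(),
        nn.Linear(28 * 28, 256), nn.ReLU(),
        nn.Linear(256, NUM_OUT),
    )

    def discriminator_loss(real_x, real_y, fake_x, fake_y):
        """Real images map to their true class; generated images to the matching fake class."""
        loss_real = F.cross_entropy(classifier(real_x), real_y)
        loss_fake = F.cross_entropy(classifier(fake_x), fake_y + NUM_REAL)  # shift to fake ids
        return loss_real + loss_fake

    def generator_loss(fake_x, target_y):
        """The generator tries to make its samples land in the *real* target class."""
        return F.cross_entropy(classifier(fake_x), target_y)
    ```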

    Computer Vision for Multimedia Geolocation in Human Trafficking Investigation: A Systematic Literature Review

    Full text link
    The task of multimedia geolocation is becoming an increasingly essential component of the digital forensics toolkit for effectively combating human trafficking, child sexual exploitation, and other illegal acts. Typically, metadata-based geolocation information is stripped when multimedia content is shared via instant messaging and social media. The intricacy of geolocating, geotagging, or finding geographical clues in this content is often overly burdensome for investigators. Recent research has shown that contemporary advancements in artificial intelligence, specifically computer vision and deep learning, show significant promise towards expediting the multimedia geolocation task. This systematic literature review examines the state of the art in computer vision techniques for multimedia geolocation and assesses their potential to expedite human trafficking investigations. It provides a comprehensive overview of computer vision-based approaches to multimedia geolocation, identifies their applicability to combating human trafficking, and highlights the potential implications of enhanced multimedia geolocation for prosecuting human trafficking. 123 articles inform this systematic literature review. The findings suggest numerous potential paths for future impactful research on the subject.

    Wavelet-based Multi-level GANs for Facial Attributes Editing

    Get PDF
    Recently, both face aging and expression translation have received increasing attention from the computer vision community due to their wide applications in the real world. For face aging, age accuracy and identity preservation are two important indicators. Previous works usually rely on an extra pre-trained module for identity preservation and multi-level discriminators for fine-grained feature extraction. In this work, we propose a cycle-consistent-loss-based method for face aging with wavelet-based multi-level facial attribute extraction in both the generator and the discriminators. The proposed model consists of one generator with three-level encoders and three levels of discriminators, with an age and a gender classifier on top of each discriminator. Experimental results on both MORPH and CACD show that the multi-level generator improves identity preservation in face aging and significantly reduces training time by eliminating the reliance on an identity-preserving module. Our model outperforms most existing approaches, including the state-of-the-art techniques, on two benchmark aging databases in terms of both aging accuracy and identity verification confidence, demonstrating the effectiveness and superiority of our method. In the real world, expression synthesis is hard due to the non-linear properties of facial skin and muscle under different expressions. A recent study showed that using the same generator for both forward prediction and backward reconstruction, as in current conditional GANs, forces the generator to leave potential "noise" in the generated images, hindering their use in further tasks. To eliminate this interference and break the unwanted link between the first and second translation, we design a parallel training mechanism with two generators that perform the same first translation but act as a reconstruction model for each other, as sketched below. Additionally, inspired by the successful application of wavelet-based multi-level Generative Adversarial Networks (GANs) in face aging and of progressive training in geometric conversion, we further design a novel wavelet-based multi-level GAN (WP2-GAN) for expression translation across a large gap, based on a progressive and parallel training strategy. Extensive experiments show the effectiveness of our approach for expression translation compared with state-of-the-art models, synthesizing photo-realistic images with high fidelity and vivid expression effects.
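    The parallel, cycle-consistent training scheme can be sketched as follows, under simplifying assumptions: the toy convolutional generators, the loss weight, and the omission of target-expression conditioning, the adversarial terms, and the wavelet-based multi-level discriminators are all illustrative choices, not the paper's actual WP2-GAN.

    ```python
    # Parallel training sketch: two generators G1 and G2 both perform the same
    # forward translation, and each reconstructs the other's output back towards
    # the source, breaking the forward/backward link a single generator creates.
    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    def tiny_generator():
        return nn.Sequential(
            nn.Conv2d(3, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 3, 3, padding=1), nn.Tanh(),
        )

    G1, G2 = tiny_generator(), tiny_generator()

    def parallel_cycle_loss(x, lambda_cyc: float = 10.0):
        """x: a batch of source-expression images scaled to [-1, 1]."""
        y1, y2 = G1(x), G2(x)   # both generators perform the same first translation
        rec1 = G2(y1)           # G2 acts as the reconstruction model for G1
        rec2 = G1(y2)           # G1 acts as the reconstruction model for G2
        cyc = F.l1_loss(rec1, x) + F.l1_loss(rec2, x)
        # Adversarial and expression-classification terms of the full method are omitted.
        return lambda_cyc * cyc

    loss = parallel_cycle_loss(torch.rand(2, 3, 64, 64) * 2 - 1)
    ```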