138 research outputs found

    Hybrid learning-based model for exaggeration style of facial caricature

    Predicting a facial caricature in the exaggeration style of a particular artist is a significant task in computer-generated caricature: the goal is to produce an artistic facial caricature that closely resembles the real artist’s work without requiring input from a skilled user (artist). An artist’s exaggeration style is difficult to encode algorithmically. Fortunately, artificial neural networks, which possess self-learning and generalization ability, have shown great promise in capturing and learning an artist’s style to predict a facial caricature. However, the main issues faced by this study are inconsistency in the artist’s style due to human factors and the limited availability of image-caricature pair data. This study therefore proposes a facial caricature dataset preparation process that yields a good-quality dataset capturing the artist’s exaggeration style, and a hybrid model that generalizes the inconsistent style so that a better, more accurate prediction can be obtained even from a small dataset. The proposed data preparation process involves facial feature parameter extraction based on landmark-based geometric morphometrics and a modified data normalization method based on Procrustes superimposition. The proposed hybrid model (BP-GANN) combines a Backpropagation Neural Network (BPNN) and a Genetic Algorithm Neural Network (GANN). The experimental results show that the proposed hybrid BP-GANN model outperforms the traditional hybrid GA-BPNN model, the individual BPNN model and the individual GANN model. The modified Procrustes superimposition method also produces a better-quality dataset than the original method.
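
As a rough illustration of the normalization step named in this abstract, the sketch below applies standard (unmodified) Procrustes superimposition to 2D landmark sets: translate to a common centroid, scale to unit centroid size, then rotate to minimise the squared distance to a reference. It is not the modified method the study proposes, which the abstract does not describe. The function names and the tuple-based landmark format are assumptions made for this example.

```python
import math

def center(points):
    """Translate landmarks so their centroid sits at the origin."""
    n = len(points)
    cx = sum(x for x, _ in points) / n
    cy = sum(y for _, y in points) / n
    return [(x - cx, y - cy) for x, y in points]

def scale_to_unit_size(points):
    """Divide centred landmarks by their centroid size.

    Assumes the landmarks are not all identical (size > 0).
    """
    size = math.sqrt(sum(x * x + y * y for x, y in points))
    return [(x / size, y / size) for x, y in points]

def rotate_onto(source, target):
    """Rotate `source` by the angle that minimises squared distance to `target`.

    In 2D the optimal angle has a closed form, so no SVD is needed.
    Reflections are not considered.
    """
    num = sum(x * v - y * u for (x, y), (u, v) in zip(source, target))
    den = sum(x * u + y * v for (x, y), (u, v) in zip(source, target))
    theta = math.atan2(num, den)
    c, s = math.cos(theta), math.sin(theta)
    return [(c * x - s * y, s * x + c * y) for x, y in source]

def procrustes_align(source, target):
    """Return (aligned source, normalised target) for two landmark sets."""
    ref = scale_to_unit_size(center(target))
    moved = scale_to_unit_size(center(source))
    return rotate_onto(moved, ref), ref

def procrustes_distance(a, b):
    """Root of the summed squared distances between corresponding landmarks."""
    return math.sqrt(sum((x - u) ** 2 + (y - v) ** 2
                         for (x, y), (u, v) in zip(a, b)))
```

After alignment, what remains between a photograph's landmarks and those of the artist's caricature is shape difference alone, which is the kind of quantity a learning model could be trained on.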

    To Affinity and Beyond: Interactive Digital Humans as a Human Computer Interface

    The field of human-computer interaction is increasingly exploring the use of more natural, human-like user interfaces to build intelligent agents that aid in everyday life. This is coupled with a move towards people using ever more realistic avatars to represent themselves in their digital lives. As producing emotionally engaging digital human representations is only just now becoming technically possible, there is little research into how to approach such tasks. This is due to both technical complexity and operational implementation cost. This is now changing, as we are at a nexus point with new approaches, faster graphics processing and new enabling technologies in machine learning and computer vision becoming available. I articulate the issues required for such digital humans to be considered successfully located on the other side of the phenomenon known as the Uncanny Valley. My results show that a complex mix of perceived and contextual aspects affects sense-making about digital humans, and they highlight previously undocumented effects of interactivity on affinity. Users are willing to accept digital humans as a new form of user interface and they react to them emotionally in previously unanticipated ways. My research shows that it is possible to build an effective interactive digital human that crosses the Uncanny Valley. I directly explore what is required to build a visually realistic digital human as a primary research question, and I explore whether such a realistic face provides sufficient benefit to justify the challenges involved in building it. I conducted a Delphi study to inform the research approaches and then produced a complex digital human character based on these insights. This interactive and realistic digital human avatar represents a major technical undertaking involving multiple teams around the world. Finally, I explored a framework for examining the ethical implications and signposted future research areas.

    Discriminative and Generative Learning with Style Information

    Conventional machine learning approaches usually assume that patterns are independent and identically distributed (i.i.d.). However, in many empirical cases this condition is violated because the data carry diverse and inconsistent style information, and the effectiveness of traditional predictors may be limited as a result. In this thesis, we investigate how style information can be used to improve the performance of machine learning models. This is achieved not only by introducing style information into some state-of-the-art models, but also by designing and implementing new architectures and frameworks specifically to make proper use of it. The main work is summarized as follows. First, the idea of style averaging is introduced through an image-processing-based sunglasses recovery algorithm for robust one-shot facial expression recognition, named the Style Elimination Transformation (SET). By recovering the pixels corrupted by the dark colors of the sunglasses, the proposed algorithm improves the classification performance of several state-of-the-art machine learning classifiers, even in a one-shot training setting. Then, style normalization and style neutralization are investigated with discriminative and generative machine learning approaches, respectively. In discriminative learning with style information, the style normalization transformation (SNT) is integrated into support vector machines (SVM) for both classification and regression, yielding field support vector classification (F-SVC) and field support vector regression (F-SVR), respectively. The SNT can be made nonlinear by mapping sufficiently complicated style information into a high-dimensional reproducing kernel Hilbert space. The learned SNT normalizes the inconsistent style information, producing i.i.d. examples to which the SVM is applied. Furthermore, a self-training-based transductive framework is introduced to handle styles unseen during training: the transductive SNT (T-SNT) is learned by transferring the trained styles to the unknown ones. In generative learning with style information, the style neutralization generative adversarial classifier (SN-GAC) is investigated to incorporate style information when performing classification. As a neural-network-based framework, the SN-GAC provides a nonlinear mapping, owing to the nonlinearity of the neural network transformation, in a generative manner. As a generalized and novel classification framework, it is capable of synthesizing high-quality, style-neutralized, human-understandable patterns from any style-inconsistent ones. After the adversarial training in the first step, the final classification performance is further improved by fine-tuning the classifier once the style-neutralized examples can be generated well. Finally, the reverse of the style neutralization task in the SN-GAC model, namely the generation of arbitrary-style patterns, is also investigated. By introducing the W-Net, a deep architecture upgraded from the well-known U-Net model for image-to-image translation, the few-shot (even one-shot) arbitrary-style Chinese character generation task is accomplished. Like the SN-GAC model, the W-Net is trained with the adversarial training strategy of the generative adversarial network. The W-Net architecture is capable of generating any Chinese character in a style similar to that of a few, or even a single, stylized example. All the proposed algorithms, frameworks, and models, for both prediction and generation tasks, take inconsistent style information into account. In the SET, inconsistent sunglasses information is eliminated by an image-processing-based sunglasses recovery algorithm, producing style-consistent patterns, and facial expression recognition is performed on those transformed i.i.d. examples. The SNT is integrated into the SVM model, normalizing the inconsistent style information nonlinearly through the kernelized mapping. The T-SNT further enables field prediction on styles unseen during training. In the SN-GAC model, style neutralization is performed by the neural-network-based upgraded U-Net architecture. Trained in separate steps that include the adversarial optimization strategy, it produces high-quality style-neutralized i.i.d. patterns. The subsequent classifier is learned to produce superior performance with no additional computation involved. The W-Net architecture enables free manipulation of style in data generation with only a few, or even a single, style reference available, making few-shot, or even one-shot, arbitrary-style Chinese character generation possible. Such an appealing property is rarely seen in the literature.

    Perception and recognition of computer-enhanced facial attributes and abstracted prototypes

    The influence of the human facial image was surveyed and the nature of its many interpretations was examined. The role of distinctiveness was considered particularly relevant, as it accounted for many of the impressions of character and identity ascribed to individuals. The notion of structural differences with respect to some selective essence of normality is especially important, as it allows a wide range of complex facial types to be considered and understood in an objective manner. A software tool was developed which permitted the manipulation of facial images. Quantitative distortions of digital images were examined using perceptual and recognition memory paradigms. Seven experiments investigated the role of distinctiveness in memory for faces using synthesised caricatures. The results showed that caricatures, both photographic and line-drawing, improved recognition speed and accuracy, indicating that both veridical and distinctiveness information are coded for familiar faces in long-term memory. The impact of feature metrics on perceptual estimates of facial age was examined using 'age-caricatured' images, and the estimates were found to be in relative accordance with the 'intended' computed age. Further modifying the semantics permitted the differences between individual faces to be visualised in terms of facial structure and skin texture patterns. Transformations of identity between two or more faces established the necessary matrices which can offer an understanding of facial expression in a categorical manner and the inherent interactions. A procedural extension allowed generation of composite images in which all features are perfectly aligned. Prototypical facial types specified in this manner enabled high-level manipulations to be made of gender and attractiveness; two experiments corroborated previously speculative material and thus gave credence to the prototype model. In summary, psychological assessment of computer-manipulated facial images demonstrated the validity of the objective techniques and highlighted particular parameters which contribute to our perception and recognition of the individual and of underlying facial types.
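
The idea of a caricature as an exaggeration of a face's differences from a norm, and of a prototype as an aligned composite, can be sketched on landmark coordinates as below. This is only a minimal illustration under stated assumptions: the abstract does not give the tool's actual procedure, so the linear exaggeration rule, the function names `prototype` and `exaggerate`, and the factor `k` are all hypothetical, and the faces are assumed to be already aligned with corresponding landmarks.

```python
def prototype(faces):
    """Average corresponding landmarks across faces to form a composite 'norm'.

    Assumes every face lists the same landmarks in the same order.
    """
    n = len(faces)
    return [
        (sum(f[i][0] for f in faces) / n, sum(f[i][1] for f in faces) / n)
        for i in range(len(faces[0]))
    ]

def exaggerate(face, norm, k):
    """Move each landmark away from the norm by factor k (assumed linear rule).

    k = 1 reproduces the face, k > 1 gives a caricature,
    and 0 <= k < 1 pulls the face towards the norm.
    """
    return [
        (nx + k * (fx - nx), ny + k * (fy - ny))
        for (fx, fy), (nx, ny) in zip(face, norm)
    ]
```

Replacing the all-faces average with the average of one group (for example one age band) would give a group prototype in the same way, though whether the study's 'age-caricatured' images were built like this is not stated.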

    Embodied Cognitive Science of Music. Modeling Experience and Behavior in Musical Contexts

    Recently, the role of corporeal interaction has gained wide recognition within cognitive musicology. This thesis reviews evidence from different directions in music research supporting the importance of body-based processes for the understanding of music-related experience and behaviour. Stressing the synthetic focus of cognitive science, cognitive science of music is discussed as a modeling approach that takes these processes into account and may theoretically be embedded within the theory of dynamic systems. In particular, arguments are presented for the use of robotic devices as tools for the investigation of processes underlying human music-related capabilities (musical robotics)

    A right hemisphere advantage for processing blurred faces

    No description supplied