240 research outputs found

    Example Based Caricature Synthesis

    The likeness of a caricature to the original face image is an essential and often overlooked part of caricature production. In this paper we present an example-based caricature synthesis technique consisting of shape exaggeration, relationship exaggeration, and optimization for likeness. Rather than relying on a large training set of caricature face pairs, our shape exaggeration step is based on only one or a small number of examples of facial features. The relationship exaggeration step introduces two definitions which facilitate global facial feature synthesis. The first is the T-Shape rule, which describes the relative relationships between the facial elements in an intuitive manner. The second is the so-called proportions, which characterize the facial features in proportional form. Finally, we introduce a likeness metric based on the Modified Hausdorff Distance (MHD), which allows us to optimize the configuration of facial elements, maximizing likeness while satisfying a number of constraints. The effectiveness of our algorithm is demonstrated with experimental results.
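
    As a point of reference for the likeness term, the Modified Hausdorff Distance of Dubuisson and Jain compares two point sets by averaging nearest-neighbour distances in each direction and taking the larger of the two. The following NumPy sketch is a minimal illustration of that standard definition, not the authors' implementation; the landmark arrays and their sizes are assumed for the example.

    import numpy as np

    def modified_hausdorff(A, B):
        """Modified Hausdorff Distance between point sets A (m, 2) and B (n, 2):
        MHD(A, B) = max(d(A, B), d(B, A)), where d(A, B) is the mean distance
        from each point in A to its nearest neighbour in B."""
        dists = np.linalg.norm(A[:, None, :] - B[None, :, :], axis=-1)  # (m, n) pairwise distances
        d_ab = dists.min(axis=1).mean()  # mean nearest-neighbour distance, A to B
        d_ba = dists.min(axis=0).mean()  # mean nearest-neighbour distance, B to A
        return max(d_ab, d_ba)

    # Hypothetical usage: compare original and exaggerated landmark sets.
    face = np.random.rand(68, 2)                        # assumed 68-point landmark set
    caricature = face + 0.05 * np.random.randn(68, 2)
    print(modified_hausdorff(face, caricature))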

    CASA 2009: International Conference on Computer Animation and Social Agents


    RenAIssance: A Survey into AI Text-to-Image Generation in the Era of Large Model

    Text-to-image generation (TTI) refers to the use of models that process text input and generate high-fidelity images from text descriptions. Text-to-image generation using neural networks can be traced back to the emergence of the Generative Adversarial Network (GAN), followed by the autoregressive Transformer. Diffusion models are one prominent type of generative model that produces images through the systematic introduction of noise over repeated steps. Owing to their impressive results on image synthesis, diffusion models have been cemented as the major image decoder used by text-to-image models, bringing text-to-image generation to the forefront of machine-learning (ML) research. In the era of large models, scaling up model size and integrating with large language models have further improved the performance of TTI models, yielding generated images that are nearly indistinguishable from real-world images and revolutionizing the way we retrieve images. Our explorative study leads us to believe that text-to-image models can be scaled further through a combination of innovative model architectures and prediction enhancement techniques. We have divided this survey into five main sections, in which we detail the frameworks of the major literature in order to examine the different types of text-to-image generation methods. Following this, we provide a detailed comparison and critique of these methods and offer possible pathways of improvement for future work. Looking ahead, we argue that TTI development could yield impressive productivity improvements for content creation, particularly in the context of the AIGC era, and could be extended to more complex tasks such as video generation and 3D generation.
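
    To make the repeated-noising idea concrete, the standard DDPM forward process has a closed form that gives a noised sample directly from the clean input: x_t = sqrt(a_bar_t) * x_0 + sqrt(1 - a_bar_t) * eps, with eps drawn from a standard Gaussian. The NumPy sketch below is an illustrative toy, not code from the survey; the schedule length and beta range are assumed typical values.

    import numpy as np

    # Assumed linear noise schedule (values typical for DDPM-style models).
    T = 1000
    betas = np.linspace(1e-4, 0.02, T)
    alphas_cumprod = np.cumprod(1.0 - betas)  # a_bar_t for t = 0 .. T-1

    def forward_diffuse(x0, t, rng=np.random):
        """Sample x_t ~ q(x_t | x_0) in one step using the closed form."""
        a_bar = alphas_cumprod[t]
        eps = rng.standard_normal(x0.shape)           # Gaussian noise
        return np.sqrt(a_bar) * x0 + np.sqrt(1.0 - a_bar) * eps

    # Hypothetical usage: progressively noise a toy "image".
    x0 = np.random.rand(32, 32)
    for t in (0, 250, 999):
        xt = forward_diffuse(x0, t)
        print(t, float(xt.std()))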

    Statistical modelling for facial expression dynamics

    One of the most powerful and fastest means of relaying emotions between humans is the facial expression. The ability to capture, understand and mimic those emotions and their underlying dynamics in a synthetic counterpart is a challenging task because of the complexity of human emotions, the different ways of conveying them, the non-linearities caused by facial feature and head motion, and the ever-critical eye of the viewer. This thesis sets out to address some of the limitations of existing techniques by investigating three components of an expression modelling and parameterisation framework: (1) feature and expression manifold representation, (2) pose estimation, and (3) expression dynamics modelling and its parameterisation for the purpose of driving a synthetic head avatar.

    First, we introduce a hierarchical representation based on the Point Distribution Model (PDM). Holistic representations imply that non-linearities caused by the motion of facial features, and intra-feature correlations, are implicitly embedded and hence have to be accounted for in the resulting expression space; such representations also require large training datasets to cover all possible variations. To address those shortcomings, and to provide a basis for learning more subtle, localised variations, our representation consists of a tree-like structure in which a holistic root component is decomposed into leaves containing the jaw outline, each of the eyes and eyebrows, and the mouth. Each hierarchical component is modelled according to its intrinsic functionality rather than the final, holistic expression label.

    Secondly, we introduce a statistical approach for capturing an underlying low-dimensional expression manifold by utilising components of the previously defined hierarchical representation. Because Principal Component Analysis (PCA) based approaches cannot reliably capture variations caused by large facial feature changes, owing to their linear nature, the underlying dynamics manifold for each hierarchical component is modelled using a Hierarchical Latent Variable Model (HLVM) approach. Whilst retaining PCA properties, such a model introduces a probability density model which can deal with missing or incomplete data and allows discovery of internal within-cluster structures. All of the model parameters and the underlying density model are estimated automatically during the training stage. We investigate the usefulness of such a model on larger and unseen datasets.

    Thirdly, we extend the HLVM concept to pose estimation, to address the non-linear shape deformations and the definition of a plausible pose space caused by large head motion. Since our heads rarely stay still, and their movements are intrinsically connected with the way we perceive and understand expressions, pose information is an integral part of expression dynamics. The proposed approach integrates into our existing hierarchical representation model; it is learned from a sparse, discretely sampled training dataset and generalises to a larger, continuous view-sphere.

    Finally, we introduce a framework that models and extracts expression dynamics. Existing frameworks often overlook the explicit definition of expression intensity and pose information, although these are usually implicitly embedded in the underlying representation. We investigate modelling expression dynamics from static information only, and focus on its sufficiency for the task at hand. We compare a rule-based method, which utilises the existing latent structure and provides a fusion of different components, with holistic and Bayesian Network (BN) approaches. An Active Appearance Model (AAM) based tracker is used to extract relevant information from input sequences; this information is subsequently used to define the parametric structure of the underlying expression dynamics. We demonstrate that such information can be utilised to animate a synthetic head avatar.
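
    As background for the PDM component, a point distribution model is essentially PCA over aligned landmark shapes: each shape is flattened to a vector, and any shape is approximated as the mean plus a weighted sum of the leading eigenvectors, x = mean + P @ b. The NumPy sketch below illustrates this standard construction in general; it is not the thesis code, and the dataset shapes are assumed for the example.

    import numpy as np

    def build_pdm(shapes, n_modes=5):
        """Build a Point Distribution Model from pre-aligned landmark shapes.

        shapes: array (n_samples, n_points, 2) of aligned 2D landmarks.
        Returns the mean shape vector and the leading PCA deformation modes."""
        X = shapes.reshape(len(shapes), -1)           # flatten to (n_samples, 2 * n_points)
        mean = X.mean(axis=0)
        U, S, Vt = np.linalg.svd(X - mean, full_matrices=False)
        P = Vt[:n_modes].T                            # columns are deformation modes
        return mean, P

    def synthesize(mean, P, b):
        """Reconstruct a shape from mode weights b: x = mean + P @ b."""
        return (mean + P @ b).reshape(-1, 2)

    # Hypothetical usage on random stand-in data.
    shapes = np.random.rand(100, 68, 2)               # assumed 68-landmark faces
    mean, P = build_pdm(shapes)
    new_shape = synthesize(mean, P, b=np.zeros(5))    # b = 0 recovers the mean shape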

    Computational approaches to Explainable Artificial Intelligence: Advances in theory, applications and trends

    Deep Learning (DL), a groundbreaking branch of Machine Learning (ML), has emerged as a driving force in both theoretical and applied Artificial Intelligence (AI). DL algorithms, rooted in complex and non-linear artificial neural systems, excel at extracting high-level features from data. DL has demonstrated human-level performance in real-world tasks, including clinical diagnostics, and has unlocked solutions to previously intractable problems in virtual agent design, robotics, genomics, neuroimaging, computer vision, and industrial automation. In this paper, the most relevant advances from the last few years in Artificial Intelligence (AI) and several applications to neuroscience, neuroimaging, computer vision, and robotics are presented, reviewed and discussed. In this way, we summarize the state of the art in AI methods, models and applications within a collection of works presented at the 9th International Conference on the Interplay between Natural and Artificial Computation (IWINAC). The works presented in this paper are excellent examples of new scientific discoveries made in laboratories that have successfully transitioned to real-life applications.

    Realtime Face Tracking and Animation

    Capturing and processing human geometry, appearance, and motion is at the core of computer graphics, computer vision, and human-computer interaction. The high complexity of human geometry and motion dynamics, and the high sensitivity of the human visual system to variations and subtleties in faces and bodies, make the 3D acquisition and reconstruction of humans in motion a challenging task. Digital humans are often created through a combination of 3D scanning, appearance acquisition, and motion capture, leading to stunning results in recent feature films. However, these methods typically require complex acquisition systems and substantial manual post-processing. As a result, creating and animating high-quality digital avatars entails long turn-around times and substantial production costs. Recent technological advances in RGB-D devices, such as the Microsoft Kinect, have brought new hope for realtime, portable, and affordable systems that can capture facial expressions as well as hand and body motions. RGB-D devices typically capture an image and a depth map, which makes it possible to formulate the motion tracking problem as a 2D/3D non-rigid registration of a deformable model to the input data.

    We introduce a novel face tracking algorithm that combines geometry and texture registration with pre-recorded animation priors in a single optimization, leading to unprecedented face tracking quality on a low-cost, consumer-level device. The main drawback of this approach in the context of consumer applications is the need for offline, user-specific training: robust and efficient tracking is achieved by building an accurate 3D expression model of the user's face, scanned in a predefined set of facial expressions. We extended this approach to remove the need for user-specific training or calibration, or any other form of manual assistance, by building a user-specific dynamic 3D face model online.

    To complement the realtime face tracking and modeling algorithm, we developed a novel system for animation retargeting that learns a high-quality mapping between motion capture data and arbitrary target characters. We addressed one of the main challenges of existing example-based retargeting methods: the need for a large number of accurate training examples to define the correspondence between source and target expression spaces. We showed that this number can be significantly reduced by leveraging the information contained in unlabeled data, i.e., facial expressions in the source or target space without corresponding poses.

    Finally, we present a novel realtime physics-based animation technique that can simulate a large range of deformable materials such as fat, flesh, hair, or muscles. This approach could be used to produce more lifelike animations by enhancing animated avatars with secondary effects. We believe that the realtime face tracking and animation pipeline presented in this thesis has the potential to inspire future research in the area of computer-generated animation. Several ideas presented in this thesis have already been successfully used in industry, and this work gave birth to the startup company faceshift AG.
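
    For intuition about the registration step, face trackers of this kind commonly represent the face as a linear blendshape model and solve, at each frame, for the expression weights that best explain the observed 3D data. The sketch below is a simplified, hypothetical stand-in for that idea (regularized linear least squares on corresponded points); the thesis' actual energy also includes texture terms and animation priors, which are omitted here.

    import numpy as np

    def fit_blendshape_weights(neutral, blendshapes, observed, lam=0.1):
        """Solve for weights w minimizing
        || (neutral + B @ w) - observed ||^2 + lam * ||w||^2,
        a crude stand-in for one geometry-registration step.

        neutral:     (3n,) neutral face vertices, flattened
        blendshapes: (3n, k) delta blendshapes (expression minus neutral)
        observed:    (3n,) tracked 3D points for the current frame"""
        B = blendshapes
        A = B.T @ B + lam * np.eye(B.shape[1])        # regularized normal equations
        w = np.linalg.solve(A, B.T @ (observed - neutral))
        return np.clip(w, 0.0, 1.0)                   # blendshape weights live in [0, 1]

    # Hypothetical usage with random stand-in data.
    n, k = 500, 12                                    # assumed vertex and blendshape counts
    neutral = np.random.rand(3 * n)
    B = 0.1 * np.random.randn(3 * n, k)
    observed = neutral + B @ np.full(k, 0.3)
    print(fit_blendshape_weights(neutral, B, observed))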

    Change blindness: eradication of gestalt strategies

    Arrays of eight texture-defined rectangles were used as stimuli in a one-shot change blindness (CB) task in which there was a 50% chance that one rectangle would change orientation between two successive presentations separated by an interval. CB was eliminated by cueing the target rectangle in the first stimulus, reduced by cueing in the interval, and unaffected by cueing in the second presentation. This supports the idea that a representation was formed that persisted through the interval before being 'overwritten' by the second presentation [Landman et al., 2003, Vision Research 43, 149–164]. Another possibility is that participants used some kind of grouping or Gestalt strategy. To test this we changed the spatial positions of the rectangles in the second presentation by shifting them along imaginary spokes (by ±1 degree) emanating from the central fixation point. There was no significant difference in performance between this and the standard task [F(1,4)=2.565, p=0.185]. This may suggest two things: (i) Gestalt grouping is not used as a strategy in these tasks, and (ii) it gives further weight to the argument that objects may be stored in and retrieved from a pre-attentional store during this task.