
    Handbook of Digital Face Manipulation and Detection

    This open access book provides the first comprehensive collection of studies on digital face manipulation, including DeepFakes, Face Morphing, and Reenactment. It combines the research fields of biometrics and media forensics, with contributions from academia and industry. Appealing to a broad readership, introductory chapters provide a comprehensive overview of the topic for readers wishing to gain a brief overview of the state of the art. Subsequent chapters, which delve deeper into various research challenges, are oriented towards advanced readers. The book also provides a good starting point for young researchers as well as a reference guide to further literature. Hence, the primary readership is academic institutions and industry currently involved in digital face manipulation and detection. The book could easily be used as a recommended text for courses in image processing, machine learning, media forensics, biometrics, and the general security area.

    Computing Education in a Hybrid World


    The dawn of the human-machine era: a forecast of new and emerging language technologies

    New language technologies are coming, thanks to the huge and competing private investment fuelling rapid progress; we can either understand and foresee their effects, or be taken by surprise and spend our time trying to catch up. This report sketches out some transformative new technologies that are likely to fundamentally change our use of language. Some of these may feel unrealistically futuristic or far-fetched, but a central purpose of this report - and the wider LITHME network - is to illustrate that these are mostly just the logical development and maturation of technologies currently in prototype. But will everyone benefit from all these shiny new gadgets? Throughout this report we emphasise a range of groups who will be disadvantaged, and issues of inequality. Important issues of security and privacy will accompany new language technologies. A further caution is to re-emphasise the current limitations of AI. Looking ahead, we see many intriguing opportunities and new capabilities, but also a range of uncertainties and inequalities. New devices will enable new ways to talk, to translate, to remember, and to learn. But advances in technology will reproduce existing inequalities: among those who cannot afford these devices, among the world's smaller languages, and especially for sign languages. Debates over privacy and security will flare and crackle with every new immersive gadget. We will move together into this curious new world with a mix of excitement and apprehension - reacting, debating, sharing and disagreeing as we always do. Plug in, as the human-machine era dawns.

    Student Union Handbook 2016-17

    Annual handbook publication of the OCAD Student Union.

    Robot mediated communication: Enhancing tele-presence using an avatar

    Recent years have seen considerable development in the field of tele-presence, making tele-presence technologies easily accessible and enhancing the experience they deliver. Since tele-presence is used not only for tele-presence-assisted group meetings but also in some forms of Computer Supported Cooperative Work (CSCW), these activities have benefited as well. One lingering issue is how to properly transmit the presence of non-co-located members to the rest of the group. Current commercially available tele-presence technology can exhibit a limited level of social presence but no physical presence. To address this lack of presence, a system using tele-operated robots as avatars for remote team members is implemented here and its efficacy tested. The testing covers both the level of presence that robot avatars can exhibit and how their efficacy at this task changes with the morphology of the robot. Using two types of robot as tele-presence avatars, a humanoid robot and an industrial robot arm, it is found that the humanoid robot, with an appropriate control system, is better at exhibiting social presence. Furthermore, compared to a voice-only scenario, both robots proved significantly better in terms of both cooperative task solving and social presence. These results indicate that, given an appropriate control system, a humanoid robot can outperform an industrial robot in these types of tasks, and they support the validity of aiming for a humanoid design that behaves in a human-like way in order to emulate social interactions closer to human norms. This has implications for the design of autonomous socially interactive robot systems.

    Image and Video Forensics

    Nowadays, images and videos have become the main modalities of information exchanged in everyday life, and their pervasiveness has led the image forensics community to question their reliability, integrity, confidentiality, and security. Multimedia content is generated in many different ways through the use of consumer electronics and high-quality digital imaging devices, such as smartphones, digital cameras, tablets, and wearable and IoT devices. The ever-increasing convenience of image acquisition has facilitated instant distribution and sharing of digital images on social platforms, generating a great amount of exchanged data. Moreover, the pervasiveness of powerful image editing tools has allowed the manipulation of digital images for malicious or criminal ends, up to the creation of fully synthesized images and videos using deep learning techniques. In response to these threats, the multimedia forensics community has produced major research efforts on identifying the source of content and detecting manipulation. In all cases where images and videos serve as critical evidence (e.g., forensic investigations, fake-news debunking, information warfare, and cyberattacks), forensic technologies that help determine the origin, authenticity, and integrity of multimedia content can become essential tools. This book collects a diverse and complementary set of articles that demonstrate new developments and applications in image and video forensics, tackling new and serious challenges to ensure media authenticity.

    Whole-Body Motion Capture and Beyond: From Model-Based Inference to Learning-Based Regression

    Though effective and successful, traditional marker-less Motion Capture (MoCap) methods suffer from several limitations: 1) they presume a character-specific body model, and thus permit neither a fully automatic pipeline nor generalization over diverse body shapes; 2) objects humans interact with are not tracked, while in reality interaction between humans and objects is ubiquitous; 3) they rely heavily on a sophisticated optimization process, which needs a good initialization and strong priors and can be slow. We address all of these issues in this thesis, as described below. First, we propose a fully automatic method to accurately reconstruct a 3D human body from multi-view RGB videos, the typical setup for MoCap systems. We pre-process all RGB videos to obtain 2D keypoints and silhouettes, then fit the SMPL body model to the 2D measurements in two successive stages. In the first stage, the shape and pose parameters of SMPL are estimated frame-wise and sequentially. In the second stage, a batch of frames is refined jointly with an extra DCT (Discrete Cosine Transform) prior. Our method can naturally handle different body shapes and challenging poses without human intervention. We then extend this system to support tracking of rigid objects the subjects interact with. Our setup consists of 6 Azure Kinect cameras. We first pre-process all the videos by segmenting humans and objects and detecting 2D body joints, and adopt the SMPL-X model here to capture body and hand pose. The model is fitted to 2D keypoints and accumulated point clouds, after which the body poses and object poses are jointly updated with contact and interpenetration constraints. With this approach, we capture a novel human-object interaction dataset with natural RGB images and plausible body and object motion information. Lastly, we present the first practical and lightweight MoCap system that needs only 6 inertial measurement units (IMUs). Our approach is based on bi-directional recurrent neural networks (Bi-RNNs); the network makes use of temporal information by jointly reasoning about past and future IMU measurements. To handle the data scarcity issue, we create synthetic data from archival MoCap data. Overall, our system runs ten times faster than traditional optimization-based methods and is numerically more accurate. We also show that it is feasible to estimate which activity the subject is performing by observing only the IMU measurements from a smartwatch worn by the subject; this can be useful for a high-level semantic understanding of human behavior, but also alerts the public to potential privacy concerns. In summary, we advance marker-less MoCap by contributing the first automatic yet accurate system, extending MoCap methods to support rigid object tracking, and proposing a practical and lightweight algorithm based on 6 IMUs. We believe our work makes marker-less and IMU-based MoCap cheaper and more practical, and thus closer to end-users for daily usage.
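
    To make the sparse-IMU idea above concrete, the following minimal PyTorch sketch shows the kind of bidirectional recurrent network the abstract describes. The tensor shapes are illustrative assumptions only (6 IMUs, each contributing a 3x3 rotation matrix and a 3D acceleration per frame, regressed to 72 SMPL pose parameters); this is not the thesis implementation.

    ```python
    # Illustrative sketch, not the thesis code: a Bi-RNN mapping per-frame
    # measurements from 6 IMUs to SMPL pose parameters.
    import torch
    import torch.nn as nn

    class IMUPoseNet(nn.Module):
        # Assumed input: each IMU gives a 3x3 rotation (9 values) plus a 3D
        # acceleration, so 6 IMUs -> 6 * 12 = 72 features per frame.
        # Assumed output: SMPL pose as 24 joints x 3 axis-angle values = 72.
        def __init__(self, n_imus=6, hidden=256, n_pose=72):
            super().__init__()
            self.rnn = nn.LSTM(n_imus * 12, hidden, num_layers=2,
                               batch_first=True, bidirectional=True)
            self.head = nn.Linear(2 * hidden, n_pose)

        def forward(self, x):
            # x: (batch, time, 72). Because the LSTM is bidirectional, each
            # frame's prediction draws on both past and future measurements,
            # which is the temporal reasoning the abstract highlights.
            h, _ = self.rnn(x)
            return self.head(h)

    model = IMUPoseNet()
    clip = torch.randn(1, 120, 72)   # a 120-frame synthetic sequence
    print(model(clip).shape)         # torch.Size([1, 120, 72])
    ```

    Training such a network on synthetic IMU data generated from archival MoCap sequences, as the abstract mentions, sidesteps the scarcity of paired real IMU/pose recordings.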

    Pre-conference proceedings of the 3rd IFIP TC 13.6 HWID working conference

    The committees under IFIP include the Technical Committee TC13 on Human-Computer Interaction, within which the work of this volume has been conducted. TC13 aims to encourage theoretical and empirical human science research to promote the design and evaluation of human-oriented ICT. Within TC13 there are different Working Groups concerned with different aspects of Human-Computer Interaction. The flagship event of TC13 is the biennial international conference called INTERACT, at which both invited and contributed papers are presented. Contributed papers are rigorously refereed and the rejection rate is high. Publications arising from these TC13 events appear as conference proceedings, such as the INTERACT proceedings, or as collections of selected and edited papers from working conferences and workshops. See http://www.ifip.org/ for the aims and scopes of TC13 and its associated Working Groups.

    Synthesization and reconstruction of 3D faces by deep neural networks

    The past few decades have witnessed substantial progress in 3D facial modelling and reconstruction, as it is of high importance for many computer vision and graphics applications including Augmented/Virtual Reality (AR/VR), computer games, movie post-production, image/video editing, medical applications, etc. In traditional approaches, facial texture and shape are represented as a triangle mesh that can cover identity and expression variation with non-rigid deformation. A dataset of 3D face scans is then densely registered into a common topology in order to construct a linear statistical model. Such models are called 3D Morphable Models (3DMMs) and can be used to synthesize a 3D face or to reconstruct one from a single or a few 2D face images. The works presented in this thesis focus on modernizing these traditional techniques in light of recent advances in deep learning and the availability of large-scale datasets. Since the introduction of 3DMMs over two decades ago, there has been a lot of progress, and they are still considered one of the best methodologies for modelling 3D faces. Nevertheless, several aspects of them still need to be upgraded to the "deep era". Firstly, conventional 3DMMs are built by linear statistical approaches such as Principal Component Analysis (PCA), which by nature omits high-frequency information. While this does not greatly curtail shape, which is often smooth in the original data, texture models are heavily afflicted, losing high-frequency details and photorealism. Secondly, existing 3DMM fitting approaches rely on very primitive features (e.g., RGB values, sparse landmarks) or hand-crafted features (e.g., HOG, SIFT) as supervision, which are sensitive to "in-the-wild" conditions (e.g., lighting, pose, occlusion) or fail to preserve identity/expression resemblance with the target image. Finally, the shape, texture, and expression modalities are modelled separately, ignoring the correlations among them and placing a fundamental limit on the synthesis of semantically meaningful 3D faces. Moreover, photorealistic 3D face synthesis has not been studied thoroughly in the literature. This thesis attempts to address the above-mentioned issues by harnessing the power of deep neural networks and generative adversarial networks, as explained below. Due to their linear texture models, many state-of-the-art methods are still not capable of reconstructing facial textures with high-frequency details. We take a radically different approach and build a high-quality, detail-preserving texture model with Generative Adversarial Networks (GANs): we use GANs to train a very powerful generator of facial texture in UV space, and then show that this generator network can be employed as a statistical texture prior in 3DMM fitting. The resulting texture reconstructions are plausible and photorealistic, as GANs are faithful to the real-data distribution in both the low- and high-frequency domains. We then revisit conventional 3DMM fitting approaches, which use non-linear optimization to find the latent parameters that best reconstruct the test image, from a new perspective: we propose to optimize the parameters with the supervision of pretrained deep identity features through our end-to-end differentiable framework. To be robust to initialization and to expedite the fitting process, we also propose a novel self-supervised regression-based approach. We demonstrate excellent 3D face reconstructions that are photorealistic and identity-preserving, and achieve, for the first time to the best of our knowledge, facial texture reconstruction with high-frequency details. To extend the non-linear texture model to photorealistic 3D face synthesis, we present a methodology that generates high-quality texture, shape, and normals jointly. To do so, we propose a novel GAN that can generate data from different modalities while exploiting their correlations. Furthermore, we demonstrate how the generation can be conditioned on expression to create faces with various facial expressions. Additionally, we study another approach to photorealistic face synthesis guided by 3D: generating 3D faces with a linear 3DMM and then translating their 2D renderings to the photorealistic face domain with an image-to-image translation network. Both works demonstrate excellent photorealistic face synthesis and show that the generated faces improve face recognition benchmarks when used as synthetic training data. Finally, we study expression reconstruction for personalized 3D face models, improving the generalization and robustness of expression encoding. First, we propose a 3D augmentation approach on 2D head-mounted camera images to increase robustness to perspective changes. Second, we propose to train a generic expression encoder network by scaling up the number of identities, using a novel multi-id personalized model training architecture in a self-supervised manner. Both approaches show promising results in both qualitative and quantitative experiments.
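
    As a rough illustration of the fitting idea described above (optimizing latent parameters under the supervision of pretrained deep identity features), the toy PyTorch example below descends a cosine identity loss into a latent code. Every module here is a hypothetical stand-in: a real system would use a GAN generator trained on UV textures, a differentiable renderer, and a pretrained face-recognition network, none of which are reproduced here.

    ```python
    # Illustrative sketch, not the thesis implementation: fitting a latent
    # code z so that a generated image matches a target's identity embedding.
    import torch
    import torch.nn as nn

    torch.manual_seed(0)

    # Stand-in "texture generator" and stand-in "identity encoder".
    generator = nn.Sequential(nn.Linear(128, 3 * 32 * 32), nn.Tanh())
    identity_net = nn.Sequential(nn.Linear(3 * 32 * 32, 64))
    for p in list(generator.parameters()) + list(identity_net.parameters()):
        p.requires_grad_(False)  # both treated as pretrained and frozen; only z is optimized

    target_img = torch.rand(1, 3 * 32 * 32) * 2 - 1  # placeholder target image
    target_id = identity_net(target_img).detach()    # its identity embedding

    z = torch.zeros(1, 128, requires_grad=True)      # latent code to fit
    opt = torch.optim.Adam([z], lr=0.05)

    for step in range(200):
        opt.zero_grad()
        rendered = generator(z)
        # Identity loss: cosine distance between the embedding of the current
        # rendering and the target embedding, backpropagated end-to-end into z.
        loss = 1 - torch.cosine_similarity(identity_net(rendered), target_id).mean()
        loss.backward()
        opt.step()

    print(float(loss))  # close to 0 after fitting
    ```

    The point of the sketch is only the supervision signal: because the generator and the identity encoder are differentiable, a perceptual identity loss can replace the primitive RGB or landmark losses the abstract criticizes.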