2,430 research outputs found

    Advanced Content and Interface Personalization through Conversational Behavior and Affective Embodied Conversational Agents

    Get PDF
    Conversation is becoming one of the key interaction modes in HMI. As a result, the conversational agents (CAs) have become an important tool in various everyday scenarios. From Apple and Microsoft to Amazon, Google, and Facebook, all have adapted their own variations of CAs. The CAs range from chatbots and 2D, carton-like implementations of talking heads to fully articulated embodied conversational agents performing interaction in various concepts. Recent studies in the field of face-to-face conversation show that the most natural way to implement interaction is through synchronized verbal and co-verbal signals (gestures and expressions). Namely, co-verbal behavior represents a major source of discourse cohesion. It regulates communicative relationships and may support or even replace verbal counterparts. It effectively retains semantics of the information and gives a certain degree of clarity in the discourse. In this chapter, we will represent a model of generation and realization of more natural machine-generated output

    Facial Expression Retargeting from Human to Avatar Made Easy

    Full text link
    Facial expression retargeting from humans to virtual characters is a useful technique in computer graphics and animation. Traditional methods use markers or blendshapes to construct a mapping between the human and avatar faces. However, these approaches require a tedious 3D modeling process, and the performance relies on the modelers' experience. In this paper, we propose a brand-new solution to this cross-domain expression transfer problem via nonlinear expression embedding and expression domain translation. We first build low-dimensional latent spaces for the human and avatar facial expressions with variational autoencoder. Then we construct correspondences between the two latent spaces guided by geometric and perceptual constraints. Specifically, we design geometric correspondences to reflect geometric matching and utilize a triplet data structure to express users' perceptual preference of avatar expressions. A user-friendly method is proposed to automatically generate triplets for a system allowing users to easily and efficiently annotate the correspondences. Using both geometric and perceptual correspondences, we trained a network for expression domain translation from human to avatar. Extensive experimental results and user studies demonstrate that even nonprofessional users can apply our method to generate high-quality facial expression retargeting results with less time and effort.Comment: IEEE Transactions on Visualization and Computer Graphics (TVCG), to appea

    A Framework for the Semantics-aware Modelling of Objects

    Get PDF
    The evolution of 3D visual content calls for innovative methods for modelling shapes based on their intended usage, function and role in a complex scenario. Even if different attempts have been done in this direction, shape modelling still mainly focuses on geometry. However, 3D models have a structure, given by the arrangement of salient parts, and shape and structure are deeply related to semantics and functionality. Changing geometry without semantic clues may invalidate such functionalities or the meaning of objects or their parts. We approach the problem by considering semantics as the formalised knowledge related to a category of objects; the geometry can vary provided that the semantics is preserved. We represent the semantics and the variable geometry of a class of shapes through the parametric template: an annotated 3D model whose geometry can be deformed provided that some semantic constraints remain satisfied. In this work, we design and develop a framework for the semantics-aware modelling of shapes, offering the user a single application environment where the whole workflow of defining the parametric template and applying semantics-aware deformations can take place. In particular, the system provides tools for the selection and annotation of geometry based on a formalised contextual knowledge; shape analysis methods to derive new knowledge implicitly encoded in the geometry, and possibly enrich the given semantics; a set of constraints that the user can apply to salient parts and a deformation operation that takes into account the semantic constraints and provides an optimal solution. The framework is modular so that new tools can be continuously added. While producing some innovative results in specific areas, the goal of this work is the development of a comprehensive framework combining state of the art techniques and new algorithms, thus enabling the user to conceptualise her/his knowledge and model geometric shapes. The original contributions regard the formalisation of the concept of annotation, with attached properties, and of the relations between significant parts of objects; a new technique for guaranteeing the persistence of annotations after significant changes in shape's resolution; the exploitation of shape descriptors for the extraction of quantitative information and the assessment of shape variability within a class; and the extension of the popular cage-based deformation techniques to include constraints on the allowed displacement of vertices. In this thesis, we report the design and development of the framework as well as results in two application scenarios, namely product design and archaeological reconstruction

    Video Data Visualization System: Semantic Classification And Personalization

    Full text link
    We present in this paper an intelligent video data visualization tool, based on semantic classification, for retrieving and exploring a large scale corpus of videos. Our work is based on semantic classification resulting from semantic analysis of video. The obtained classes will be projected in the visualization space. The graph is represented by nodes and edges, the nodes are the keyframes of video documents and the edges are the relation between documents and the classes of documents. Finally, we construct the user's profile, based on the interaction with the system, to render the system more adequate to its references.Comment: graphic

    A common ontology for multi-dimensional shapes

    Get PDF
    In recent years, digital shapes have become more and more widespread and have been made available in a plethora of online repositories. A systematic and formal approach for capturing and representing shape-related information is needed to facilitate its reuse and enable the demonstration of useful cross-domain usage scenarios. In this paper we present an ontology for digital shapes, called the Common Shape Ontology (CSO). We discuss the rationale, the requirements and the scope of this ontology, we present in detail its structure and describe the most relevant choices related to its development. Finally, we show how the CSO conceptualization is used in domain-specific application scenarios

    Learning to Reconstruct People in Clothing from a Single RGB Camera

    No full text
    We present a learning-based model to infer the personalized 3D shape of people from a few frames (1-8) of a monocular video in which the person is moving, in less than 10 seconds with a reconstruction accuracy of 5mm. Our model learns to predict the parameters of a statistical body model and instance displacements that add clothing and hair to the shape. The model achieves fast and accurate predictions based on two key design choices. First, by predicting shape in a canonical T-pose space, the network learns to encode the images of the person into pose-invariant latent codes, where the information is fused. Second, based on the observation that feed-forward predictions are fast but do not always align with the input images, we predict using both, bottom-up and top-down streams (one per view) allowing information to flow in both directions. Learning relies only on synthetic 3D data. Once learned, the model can take a variable number of frames as input, and is able to reconstruct shapes even from a single image with an accuracy of 6mm. Results on 3 different datasets demonstrate the efficacy and accuracy of our approach

    Towards high-throughput 3D insect capture for species discovery and diagnostics

    Full text link
    Digitisation of natural history collections not only preserves precious information about biological diversity, it also enables us to share, analyse, annotate and compare specimens to gain new insights. High-resolution, full-colour 3D capture of biological specimens yields color and geometry information complementary to other techniques (e.g., 2D capture, electron scanning and micro computed tomography). However 3D colour capture of small specimens is slow for reasons including specimen handling, the narrow depth of field of high magnification optics, and the large number of images required to resolve complex shapes of specimens. In this paper, we outline techniques to accelerate 3D image capture, including using a desktop robotic arm to automate the insect handling process; using a calibrated pan-tilt rig to avoid attaching calibration targets to specimens; using light field cameras to capture images at an extended depth of field in one shot; and using 3D Web and mixed reality tools to facilitate the annotation, distribution and visualisation of 3D digital models.Comment: 2 pages, 1 figure, for BigDig workshop at 2017 eScience conferenc

    TREE-D-SEEK: A Framework for Retrieving Three-Dimensional Scenes

    Get PDF
    In this dissertation, a strategy and framework for retrieving 3D scenes is proposed. The strategy is to retrieve 3D scenes based on a unified approach for indexing content from disparate information sources and information levels. The TREE-D-SEEK framework implements the proposed strategy for retrieving 3D scenes and is capable of indexing content from a variety of corpora at distinct information levels. A semantic annotation model for indexing 3D scenes in the TREE-D-SEEK framework is also proposed. The semantic annotation model is based on an ontology for rapid prototyping of 3D virtual worlds. With ongoing improvements in computer hardware and 3D technology, the cost associated with the acquisition, production and deployment of 3D scenes is decreasing. As a consequence, there is a need for efficient 3D retrieval systems for the increasing number of 3D scenes in corpora. An efficient 3D retrieval system provides several benefits such as enhanced sharing and reuse of 3D scenes and 3D content. Existing 3D retrieval systems are closed systems and provide search solutions based on a predefined set of indexing and matching algorithms Existing 3D search systems and search solutions cannot be customized for specific requirements, type of information source and information level. In this research, TREE-D-SEEK—an open, extensible framework for retrieving 3D scenes—is proposed. The TREE-D-SEEK framework is capable of retrieving 3D scenes based on indexing low level content to high-level semantic metadata. The TREE-D-SEEK framework is discussed from a software architecture perspective. The architecture is based on a common process flow derived from indexing disparate information sources. Several indexing and matching algorithms are implemented. Experiments are conducted to evaluate the usability and performance of the framework. Retrieval performance of the framework is evaluated using benchmarks and manually collected corpora. A generic, semantic annotation model is proposed for indexing a 3D scene. The primary objective of using the semantic annotation model in the TREE-D-SEEK framework is to improve retrieval relevance and to support richer queries within a 3D scene. The semantic annotation model is driven by an ontology. The ontology is derived from a 3D rapid prototyping framework. The TREE-D-SEEK framework supports querying by example, keyword based and semantic annotation based query types for retrieving 3D scenes

    DAP3D-Net: Where, What and How Actions Occur in Videos?

    Full text link
    Action parsing in videos with complex scenes is an interesting but challenging task in computer vision. In this paper, we propose a generic 3D convolutional neural network in a multi-task learning manner for effective Deep Action Parsing (DAP3D-Net) in videos. Particularly, in the training phase, action localization, classification and attributes learning can be jointly optimized on our appearancemotion data via DAP3D-Net. For an upcoming test video, we can describe each individual action in the video simultaneously as: Where the action occurs, What the action is and How the action is performed. To well demonstrate the effectiveness of the proposed DAP3D-Net, we also contribute a new Numerous-category Aligned Synthetic Action dataset, i.e., NASA, which consists of 200; 000 action clips of more than 300 categories and with 33 pre-defined action attributes in two hierarchical levels (i.e., low-level attributes of basic body part movements and high-level attributes related to action motion). We learn DAP3D-Net using the NASA dataset and then evaluate it on our collected Human Action Understanding (HAU) dataset. Experimental results show that our approach can accurately localize, categorize and describe multiple actions in realistic videos
    • …
    corecore