21 research outputs found

    Virtual Friend: Tracking and Generating Natural Interactive Behaviours in Real Video

    The aim of our research is to create a "virtual friend", i.e., a virtual character capable of responding in a realistic and sensible manner to the actions of a real person observed in video. In this paper, we present a novel, model-based approach for generating a variety of complex behavioural responses for a fully articulated "virtual friend" in three-dimensional (3D) space. First, we train a collection of dual hidden Markov models (HMMs) on 3D motion capture (MoCap) data representing a number of interactions between two people. Second, we track the 3D articulated motion of a single person in ordinary 2D video. Finally, using the dual HMMs, we generate a moving "virtual friend" that reacts to the motion of the tracked person and place it in the original video footage. We describe our approach in depth and present experimental results showing that the generated behaviours are very close to those of real people.
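    To make the pipeline concrete, the sketch below illustrates the dual-HMM idea in Python: one Gaussian HMM is fitted over the concatenated pose features of both people, a "half" model restricted to the tracked person's part of each state decodes their motion into hidden states, and the virtual friend's pose is emitted from the other half of each state. The array shapes, the use of hmmlearn, and all names here are illustrative assumptions, not the authors' implementation.

```python
# Illustrative dual-HMM sketch (hypothetical shapes; not the paper's code).
import numpy as np
from hmmlearn.hmm import GaussianHMM

D = 20  # assumed per-person pose feature dimension
# Hypothetical paired MoCap data: (person A, person B) sequences of shape (T, D).
pairs = [(np.random.randn(200, D), np.random.randn(200, D)) for _ in range(5)]

# "Dual" HMM: fit one HMM on the concatenated features, so each hidden state
# models a joint pose configuration of the interacting pair.
joint = np.vstack([np.hstack([a, b]) for a, b in pairs])
dual = GaussianHMM(n_components=8, covariance_type="diag", n_iter=30)
dual.fit(joint, lengths=[len(a) for a, _ in pairs])

def generate_response(tracked_a):
    """Map a tracked person-A sequence (T, D) to virtual-friend poses (T, D)."""
    # Decode A's motion with a model restricted to A's half of each state
    # (hmmlearn exposes covars_ as full matrices, so we slice A's block).
    half = GaussianHMM(n_components=dual.n_components, covariance_type="full")
    half.startprob_, half.transmat_ = dual.startprob_, dual.transmat_
    half.means_ = dual.means_[:, :D]
    half.covars_ = dual.covars_[:, :D, :D]
    states = half.predict(tracked_a)
    # Emit B's mean pose per decoded state as the reactive behaviour.
    return dual.means_[states, D:]
```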

    XAI & I: Self-explanatory AI facilitating mutual understanding between AI and human experts

    Traditionally, explainable artificial intelligence seeks to provide explanations and interpretability for high-performing black-box models such as deep neural networks. Interpreting such models remains difficult because of their high complexity. An alternative is to force a deep neural network to use human-intelligible features as the basis for its decisions. We tested this approach in the natural category domain of rock types. We compared the performance of a black-box transfer-learning implementation using ResNet50 to that of a network first trained to predict expert-identified features and then forced to use those features to categorise rock images. The performance of this feature-constrained network was virtually identical to that of the unconstrained network. Further, in a partially constrained network forced to condense its representation to a small number of features without being trained on expert features, the resulting abstract features were not intelligible; nevertheless, an affine transformation of these features could be found that aligned well with expert-intelligible features. These findings show that making an AI intrinsically intelligible need not come at the cost of performance.
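    The core idea admits a compact sketch: route the classifier through a low-dimensional layer supervised to predict expert-identified features, so that the rock-type decision can only use those features. The Keras sketch below is a hedged illustration; the feature count, class count, loss weights, and layer names are assumptions, not the paper's exact configuration.

```python
# Hedged sketch of a feature-constrained classifier; sizes are assumptions.
import tensorflow as tf
from tensorflow.keras import layers, Model
from tensorflow.keras.applications import ResNet50

N_EXPERT_FEATURES = 8  # hypothetical count of expert-rated properties
N_CLASSES = 3          # hypothetical rock-type count

backbone = ResNet50(include_top=False, pooling="avg",
                    input_shape=(224, 224, 3), weights="imagenet")
backbone.trainable = False  # plain transfer learning

x = backbone.output
# Bottleneck: the network must summarise the image as expert features.
expert = layers.Dense(N_EXPERT_FEATURES, name="expert_features")(x)
# The class decision sees *only* the expert-feature vector.
logits = layers.Dense(N_CLASSES, name="rock_type")(expert)

model = Model(backbone.input, [expert, logits])
model.compile(
    optimizer="adam",
    loss={"expert_features": "mse",
          "rock_type": tf.keras.losses.SparseCategoricalCrossentropy(
              from_logits=True)},
    loss_weights={"expert_features": 1.0, "rock_type": 1.0},
)
```

    For the partially constrained variant, the reported affine alignment could be probed by dropping the expert-feature loss and afterwards fitting an ordinary least-squares map from the bottleneck activations to expert ratings.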

    Automatic carotid ultrasound segmentation using deep convolutional neural networks and phase congruency maps

    The segmentation of the media-adventitia and lumen-intima boundaries of the carotid artery is an essential step in assessing plaque morphology in ultrasound imaging. Manual methods are tedious and prone to variability; automated segmentation algorithms are therefore preferable. In this paper, we propose deep convolutional networks for automated segmentation of the media-adventitia boundary in transverse and longitudinal sections of carotid ultrasound images. Deep networks have recently been employed with good success on image segmentation tasks, and we therefore apply them to ultrasound data, using an encoder-decoder convolutional structure that allows the network to be trained end-to-end for pixel-wise classification. We also evaluate performance across various configurations, depths, and filter sizes within the network. In addition, we propose a novel fusion of envelope and phase congruency data as input to the network, the latter providing an intensity-invariant data source. We show that this data fusion and the proposed network structure yield higher segmentation performance than state-of-the-art techniques.
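    A minimal sketch of such an encoder-decoder is shown below, assuming a two-channel input that stacks the B-mode envelope image with its precomputed phase congruency map. The depth, filter counts, and input size are placeholders, since the paper evaluates several such configurations rather than a single fixed one.

```python
# Minimal encoder-decoder sketch for pixel-wise boundary segmentation.
# Input channel 0: envelope image; channel 1: phase congruency map.
import tensorflow as tf
from tensorflow.keras import layers, Model

inp = layers.Input((256, 256, 2))  # assumed input size

# Encoder: convolutions with downsampling.
x = layers.Conv2D(32, 3, padding="same", activation="relu")(inp)
x = layers.MaxPooling2D()(x)
x = layers.Conv2D(64, 3, padding="same", activation="relu")(x)
x = layers.MaxPooling2D()(x)

# Decoder: upsample back to full resolution for per-pixel classification.
x = layers.Conv2DTranspose(64, 3, strides=2, padding="same",
                           activation="relu")(x)
x = layers.Conv2DTranspose(32, 3, strides=2, padding="same",
                           activation="relu")(x)
out = layers.Conv2D(1, 1, activation="sigmoid")(x)  # boundary mask

model = Model(inp, out)
model.compile(optimizer="adam", loss="binary_crossentropy")
```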

    Video assisted speech source separation

    We investigate the problem of integrating the complementary audio and visual modalities for speech separation. Rather than relying on the independence criteria used in most blind source separation (BSS) systems, we use visual features from a video signal as additional information to optimize the unmixing matrix. We achieve this with a statistical model characterizing the nonlinear coherence between audio and visual features, which serves as a separation criterion for both instantaneous and convolutive mixtures. The model is learned by applying a Bayesian framework to the fused feature observations from a training corpus. We also identify several key challenges to the success of such a system. Experimental results verify the proposed approach, which outperforms an audio-only separation system in a noisy environment and also provides a solution to the permutation problem.
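    As a rough illustration of the idea for an instantaneous mixture, the sketch below separates with standard ICA and then uses correlation between each separated source's amplitude envelope and the target speaker's lip-motion feature to pick out the target source, which also resolves the output permutation. The correlation score is a simple stand-in for the paper's Bayesian audio-visual coherence model, and all helper names are hypothetical.

```python
# Hypothetical helpers; the correlation score stands in for the paper's
# Bayesian audio-visual coherence model.
import numpy as np
from sklearn.decomposition import FastICA

def envelope(x, win=400):
    """Crude amplitude envelope: RMS over non-overlapping windows."""
    x = x[:(len(x) // win) * win].reshape(-1, win)
    return np.sqrt((x ** 2).mean(axis=1))

def separate_with_video(mixtures, lip_motion, win=400):
    """mixtures: (T, n_mics) audio; lip_motion: per-window visual feature
    of the target speaker, e.g. lip-height variation."""
    sources = FastICA(n_components=mixtures.shape[1],
                      random_state=0).fit_transform(mixtures)
    # Score each separated source by the coherence of its envelope with
    # the target's lip motion; the best match is the target source.
    scores = []
    for s in sources.T:
        env = envelope(s, win)
        n = min(len(env), len(lip_motion))
        scores.append(abs(np.corrcoef(env[:n], lip_motion[:n])[0, 1]))
    return sources[:, int(np.argmax(scores))]
```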