Search CORE

49,191 research outputs found

A Convolutional Neural Network for the Automatic Diagnosis of Collagen VI related Muscular Dystrophies

Author: Badosa Carmen
Bazaga Adrián
Jiménez-Mallebrera Cecilia
Porta Josep M.
Roldán Mònica
Publication venue
Publication date: 30/01/2019
Field of study

The development of machine learning systems for the diagnosis of rare diseases is challenging mainly due the lack of data to study them. Despite this challenge, this paper proposes a system for the Computer Aided Diagnosis (CAD) of low-prevalence, congenital muscular dystrophies from confocal microscopy images. The proposed CAD system relies on a Convolutional Neural Network (CNN) which performs an independent classification for non-overlapping patches tiling the input image, and generates an overall decision summarizing the individual decisions for the patches on the query image. This decision scheme points to the possibly problematic areas in the input images and provides a global quantitative evaluation of the state of the patients, which is fundamental for diagnosis and to monitor the efficiency of therapies.Comment: Submitted for review to Expert Systems With Application

arXiv.org e-Print Archive

Digital.CSIC

MonoPerfCap: Human Performance Capture from Monocular Video

Author: Chatterjee Avishek
Mehta Dushyant
Rhodin Helge
Seidel Hans-Peter
Theobalt Christian
Xu Weipeng
Zollhöfer Michael
Publication venue
Publication date: 01/01/2018
Field of study

We present the first marker-less approach for temporally coherent 3D performance capture of a human with general clothing from monocular video. Our approach reconstructs articulated human skeleton motion as well as medium-scale non-rigid surface deformations in general scenes. Human performance capture is a challenging problem due to the large range of articulation, potentially fast motion, and considerable non-rigid deformations, even from multi-view data. Reconstruction from monocular video alone is drastically more challenging, since strong occlusions and the inherent depth ambiguity lead to a highly ill-posed reconstruction problem. We tackle these challenges by a novel approach that employs sparse 2D and 3D human pose detections from a convolutional neural network using a batch-based pose estimation strategy. Joint recovery of per-batch motion allows to resolve the ambiguities of the monocular reconstruction problem based on a low dimensional trajectory subspace. In addition, we propose refinement of the surface geometry based on fully automatically extracted silhouettes to enable medium-scale non-rigid alignment. We demonstrate state-of-the-art performance capture results that enable exciting applications such as video editing and free viewpoint video, previously infeasible from monocular video. Our qualitative and quantitative evaluation demonstrates that our approach significantly outperforms previous monocular methods in terms of accuracy, robustness and scene complexity that can be handled.Comment: Accepted to ACM TOG 2018, to be presented on SIGGRAPH 201

arXiv.org e-Print Archive

Infoscience - École polytechnique fédérale de Lausanne

MPG.PuRe

Text-based Editing of Talking-head Video

Author: Agrawala M.
Finkelstein A.
Fried O.
Genova K.
Goldman D.
Jin Z.
Shechtman E.
Tewari A.
Theobalt C.
Zollhöfer M.
Publication venue
Publication date: 01/01/2019
Field of study

Editing talking-head video to change the speech content or to remove filler words is challenging. We propose a novel method to edit talking-head video based on its transcript to produce a realistic output video in which the dialogue of the speaker has been modified, while maintaining a seamless audio-visual flow (i.e. no jump cuts). Our method automatically annotates an input talking-head video with phonemes, visemes, 3D face pose and geometry, reflectance, expression and scene illumination per frame. To edit a video, the user has to only edit the transcript, and an optimization strategy then chooses segments of the input corpus as base material. The annotated parameters corresponding to the selected segments are seamlessly stitched together and used to produce an intermediate video representation in which the lower half of the face is rendered with a parametric face model. Finally, a recurrent video generation network transforms this representation to a photorealistic video that matches the edited transcript. We demonstrate a large variety of edits, such as the addition, removal, and alteration of words, as well as convincing language translation and full sentence synthesis

MPG.PuRe