
    Real-time deep hair matting on mobile devices

    Augmented reality is an emerging technology in many application domains. Among them is the beauty industry, where live virtual try-on of beauty products is of great importance. In this paper, we address the problem of live hair color augmentation. To achieve this goal, hair needs to be segmented quickly and accurately. We show how a modified MobileNet CNN architecture can be used to segment hair in real time. Instead of training this network on large amounts of accurate segmentation data, which is difficult to obtain, we use crowd-sourced hair segmentation data. While such data is much simpler to obtain, its segmentations are noisy and coarse. Despite this, we show that our system can produce accurate, fine-detailed hair mattes while running at over 30 fps on an iPad Pro tablet. Comment: 7 pages, 7 figures, submitted to CRV 2018.
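    A minimal PyTorch sketch of the kind of MobileNet-style segmentation network the abstract describes is given below. The depthwise-separable block is the standard MobileNet building block, but the layer sizes, the simple decoder, and all class names are illustrative assumptions rather than the authors' architecture.

```python
# Sketch of a MobileNet-style hair segmentation network (illustrative;
# layer sizes and decoder are assumptions, not the paper's architecture).
import torch
import torch.nn as nn

class DepthwiseSeparableConv(nn.Module):
    """Depthwise 3x3 conv followed by a 1x1 pointwise conv (MobileNet block)."""
    def __init__(self, in_ch, out_ch, stride=1):
        super().__init__()
        self.block = nn.Sequential(
            nn.Conv2d(in_ch, in_ch, 3, stride, 1, groups=in_ch, bias=False),
            nn.BatchNorm2d(in_ch), nn.ReLU(inplace=True),
            nn.Conv2d(in_ch, out_ch, 1, bias=False),
            nn.BatchNorm2d(out_ch), nn.ReLU(inplace=True),
        )

    def forward(self, x):
        return self.block(x)

class HairMattingNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(3, 32, 3, stride=2, padding=1),   # 224 -> 112
            DepthwiseSeparableConv(32, 64),
            DepthwiseSeparableConv(64, 128, stride=2),  # 112 -> 56
            DepthwiseSeparableConv(128, 128),
        )
        # Lightweight decoder back to input resolution, ending in a
        # single-channel hair probability map (the matte).
        self.decoder = nn.Sequential(
            nn.Upsample(scale_factor=4, mode='bilinear', align_corners=False),
            nn.Conv2d(128, 1, 1),
            nn.Sigmoid(),
        )

    def forward(self, x):
        return self.decoder(self.encoder(x))

net = HairMattingNet()
matte = net(torch.randn(1, 3, 224, 224))  # -> (1, 1, 224, 224)
```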

    Fast Deep Matting for Portrait Animation on Mobile Phone

    Image matting plays an important role in image and video editing. However, the image matting problem is inherently ill-posed. Traditional methods usually rely on user interaction, such as trimaps and strokes, to constrain the problem, and cannot run on a mobile phone in real time. In this paper, we propose a real-time automatic deep matting approach for mobile devices. By leveraging densely connected blocks and dilated convolution, a lightweight fully convolutional network is designed to predict a coarse binary mask for portrait images. A feathering block, which is edge-preserving and matting-adaptive, is further developed to learn the guided filter and transform the binary mask into an alpha matte. Finally, an automatic portrait animation system based on fast deep matting is built on mobile devices; it requires no interaction and achieves real-time matting at 15 fps. Experiments show that the proposed approach achieves results comparable to state-of-the-art matting solvers. Comment: ACM Multimedia Conference (MM) 2017 camera-ready.
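    The feathering block can be read as a learned local linear model in the spirit of the guided filter: a small network predicts per-pixel coefficients that map image intensity to alpha. Below is a hypothetical PyTorch sketch of that idea; the layer sizes and the luminance guidance are assumptions, not the paper's exact block.

```python
# Hypothetical sketch of a feathering-style refinement: predict per-pixel
# linear coefficients (a, b) so that alpha = a * gray(I) + b, turning a
# coarse binary mask into a soft, edge-preserving alpha matte.
import torch
import torch.nn as nn

class FeatheringBlock(nn.Module):
    def __init__(self):
        super().__init__()
        # Input: RGB image (3 ch) + coarse mask (1 ch) -> coefficients a, b.
        self.coeffs = nn.Sequential(
            nn.Conv2d(4, 16, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(16, 2, 3, padding=1),
        )

    def forward(self, image, coarse_mask):
        a, b = self.coeffs(torch.cat([image, coarse_mask], dim=1)).chunk(2, dim=1)
        gray = image.mean(dim=1, keepdim=True)  # luminance as the guidance signal
        alpha = a * gray + b                    # local linear model, guided-filter style
        return alpha.clamp(0.0, 1.0)

block = FeatheringBlock()
alpha = block(torch.randn(1, 3, 128, 128), torch.rand(1, 1, 128, 128))
```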

    Virtual Occlusions Through Implicit Depth

    For augmented reality (AR), it is important that virtual assets appear to 'sit among' real-world objects. The virtual element should variously occlude and be occluded by real matter, based on a plausible depth ordering. This occlusion should be consistent over time as the viewer's camera moves. Unfortunately, small mistakes in the estimated scene depth can ruin the downstream occlusion mask, and thereby the AR illusion. Especially in real-time settings, depths inferred near object boundaries or across time can be inconsistent. In this paper, we challenge the need for depth regression as an intermediate step. We instead propose an implicit model for depth and use it to predict the occlusion mask directly. The inputs to our network are one or more color images, plus the known depths of any virtual geometry. We show that our occlusion predictions are more accurate and more temporally stable than predictions derived from traditional depth-estimation models. We obtain state-of-the-art occlusion results on the challenging ScanNetv2 dataset and superior qualitative results on real scenes.
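    At composite time, a directly predicted occlusion mask replaces the usual per-pixel comparison of real and virtual depth. The NumPy sketch below shows one plausible way such a mask would be consumed; the function and its inputs are assumptions for illustration, with the mask standing in for the network's output.

```python
# Sketch of compositing with a predicted occlusion mask (illustrative).
import numpy as np

def composite(real_rgb, virtual_rgb, virtual_alpha, occlusion_mask):
    """Blend a rendered virtual layer over a camera frame.

    occlusion_mask: (H, W) in [0, 1]; 1 means the virtual pixel is visible
    in front of real geometry, 0 means it is hidden behind it.
    """
    visibility = virtual_alpha * occlusion_mask  # hide occluded virtual pixels
    return visibility[..., None] * virtual_rgb + (1.0 - visibility[..., None]) * real_rgb

H, W = 480, 640
frame = composite(
    real_rgb=np.random.rand(H, W, 3),      # camera frame
    virtual_rgb=np.random.rand(H, W, 3),   # rendered virtual asset
    virtual_alpha=np.ones((H, W)),         # asset coverage
    occlusion_mask=np.random.rand(H, W),   # would come from the occlusion network
)
```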

    Workflow for reducing semantic segmentation annotation time

    Abstract. Semantic segmentation is a challenging pattern-recognition task on digital images. Current semantic segmentation methods based on neural networks show great promise in accurate pixel-level classification, but they are limited, at least to some extent, by the availability of accurate training data. Semantic segmentation training data is typically curated by humans, yet the task is slow and tedious. While humans are fast at checking whether a segmentation is accurate, creating segmentations is slow, because the human visual system is bottlenecked by physical interfaces such as hand-eye coordination when drawing segmentations by hand. This thesis evaluates a workflow that aims to reduce the need for hand-drawn segmentations when creating an accurate set of training data. A publicly available dataset is used as the starting point for the annotation process, and four different evaluation sets are used to evaluate the introduced annotation workflow in terms of labour efficiency and annotation accuracy. The results indicate that the workflow can produce annotations that are comparable in accuracy to manually corrected annotations while requiring significantly less manual labour.
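    The verify-rather-than-draw loop the thesis evaluates can be summarized in a few lines of Python; every callable in this sketch is a hypothetical placeholder, not code from the thesis.

```python
# Hypothetical sketch of a verification-based annotation workflow: a model
# proposes segmentations, humans only accept or flag them, and only the
# flagged items take the slow manual-correction path.
def annotate(images, model, human_accepts, manual_fix):
    accepted, corrected = [], []
    for img in images:
        proposal = model(img)               # machine-proposed segmentation
        if human_accepts(img, proposal):    # fast visual check, no drawing
            accepted.append((img, proposal))
        else:
            corrected.append((img, manual_fix(img, proposal)))  # slow path
    return accepted + corrected
```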