337 research outputs found

    Real-time deep hair matting on mobile devices

    Full text link
    Augmented reality is an emerging technology in many application domains. Among them is the beauty industry, where live virtual try-on of beauty products is of great importance. In this paper, we address the problem of live hair color augmentation. To achieve this goal, hair needs to be segmented quickly and accurately. We show how a modified MobileNet CNN architecture can be used to segment the hair in real-time. Instead of training this network using large amounts of accurate segmentation data, which is difficult to obtain, we use crowd sourced hair segmentation data. While such data is much simpler to obtain, the segmentations there are noisy and coarse. Despite this, we show how our system can produce accurate and fine-detailed hair mattes, while running at over 30 fps on an iPad Pro tablet.Comment: 7 pages, 7 figures, submitted to CRV 201

    Classification of Humans into Ayurvedic Prakruti Types using Computer Vision

    Get PDF
    Ayurveda, a 5000 years old Indian medical science, believes that the universe and hence humans are made up of five elements namely ether, fire, water, earth, and air. The three Doshas (Tridosha) Vata, Pitta, and Kapha originated from the combinations of these elements. Every person has a unique combination of Tridosha elements contributing to a person’s ‘Prakruti’. Prakruti governs the physiological and psychological tendencies in all living beings as well as the way they interact with the environment. This balance influences their physiological features like the texture and colour of skin, hair, eyes, length of fingers, the shape of the palm, body frame, strength of digestion and many more as well as the psychological features like their nature (introverted, extroverted, calm, excitable, intense, laidback), and their reaction to stress and diseases. All these features are coded in the constituents at the time of a person’s creation and do not change throughout their lifetime. Ayurvedic doctors analyze the Prakruti of a person either by assessing the physical features manually and/or by examining the nature of their heartbeat (pulse). Based on this analysis, they diagnose, prevent and cure the disease in patients by prescribing precision medicine. This project focuses on identifying Prakruti of a person by analysing his facial features like hair, eyes, nose, lips and skin colour using facial recognition techniques in computer vision. This is the first of its kind research in this problem area that attempts to bring image processing into the domain of Ayurveda

    2kenize: Tying Subword Sequences for Chinese Script Conversion

    Full text link
    Simplified Chinese to Traditional Chinese character conversion is a common preprocessing step in Chinese NLP. Despite this, current approaches have poor performance because they do not take into account that a simplified Chinese character can correspond to multiple traditional characters. Here, we propose a model that can disambiguate between mappings and convert between the two scripts. The model is based on subword segmentation, two language models, as well as a method for mapping between subword sequences. We further construct benchmark datasets for topic classification and script conversion. Our proposed method outperforms previous Chinese Character conversion approaches by 6 points in accuracy. These results are further confirmed in a downstream application, where 2kenize is used to convert pretraining dataset for topic classification. An error analysis reveals that our method's particular strengths are in dealing with code-mixing and named entities.Comment: Accepted to ACL 202

    Dynamic Face Video Segmentation via Reinforcement Learning

    Full text link
    For real-time semantic video segmentation, most recent works utilised a dynamic framework with a key scheduler to make online key/non-key decisions. Some works used a fixed key scheduling policy, while others proposed adaptive key scheduling methods based on heuristic strategies, both of which may lead to suboptimal global performance. To overcome this limitation, we model the online key decision process in dynamic video segmentation as a deep reinforcement learning problem and learn an efficient and effective scheduling policy from expert information about decision history and from the process of maximising global return. Moreover, we study the application of dynamic video segmentation on face videos, a field that has not been investigated before. By evaluating on the 300VW dataset, we show that the performance of our reinforcement key scheduler outperforms that of various baselines in terms of both effective key selections and running speed. Further results on the Cityscapes dataset demonstrate that our proposed method can also generalise to other scenarios. To the best of our knowledge, this is the first work to use reinforcement learning for online key-frame decision in dynamic video segmentation, and also the first work on its application on face videos.Comment: CVPR 2020. 300VW with segmentation labels is available at: https://github.com/mapleandfire/300VW-Mas

    Convolutional Neural Fabrics

    Get PDF
    Despite the success of CNNs, selecting the optimal architecture for a given task remains an open problem. Instead of aiming to select a single optimal architecture, we propose a "fabric" that embeds an exponentially large number of architectures. The fabric consists of a 3D trellis that connects response maps at different layers, scales, and channels with a sparse homogeneous local connectivity pattern. The only hyper-parameters of a fabric are the number of channels and layers. While individual architectures can be recovered as paths, the fabric can in addition ensemble all embedded architectures together, sharing their weights where their paths overlap. Parameters can be learned using standard methods based on back-propagation, at a cost that scales linearly in the fabric size. We present benchmark results competitive with the state of the art for image classification on MNIST and CIFAR10, and for semantic segmentation on the Part Labels dataset.Comment: Corrected typos (In proceedings of NIPS16
    • …
    corecore