
    Fast Deep Matting for Portrait Animation on Mobile Phone

    Image matting plays an important role in image and video editing. However, the formulation of image matting is inherently ill-posed. Traditional methods usually rely on user interaction, such as trimaps and strokes, to deal with the matting problem, and cannot run on mobile phones in real time. In this paper, we propose a real-time automatic deep matting approach for mobile devices. By leveraging densely connected blocks and dilated convolution, a light fully convolutional network is designed to predict a coarse binary mask for portrait images. A feathering block, which is edge-preserving and matting-adaptive, is further developed to learn a guided filter and transform the binary mask into an alpha matte. Finally, an automatic portrait animation system based on fast deep matting is built on mobile devices; it requires no interaction and achieves real-time matting at 15 fps. Experiments show that the proposed approach achieves results comparable to state-of-the-art matting solvers. Comment: ACM Multimedia Conference (MM) 2017, camera-ready.
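
    Below is a minimal PyTorch-style sketch of the two-stage design this abstract describes: a light, dilated fully convolutional network that predicts a coarse foreground mask, followed by a feathering block that learns guided-filter-like coefficients to turn the mask into an alpha matte. The module names, layer sizes, and the exact feathering formulation are illustrative assumptions, not the authors' implementation.

```python
# Hypothetical sketch of the coarse-mask + feathering pipeline (PyTorch).
# Block structure, layer sizes, and the feathering formulation are assumptions
# for illustration; they do not reproduce the paper's exact architecture.
import torch
import torch.nn as nn
import torch.nn.functional as F

class CoarseMaskNet(nn.Module):
    """Light fully convolutional network: dense-style skip plus dilated convolutions."""
    def __init__(self, ch=32):
        super().__init__()
        self.stem = nn.Conv2d(3, ch, 3, padding=1)
        # Dilated convolutions enlarge the receptive field without extra downsampling.
        self.block = nn.Sequential(
            nn.Conv2d(ch, ch, 3, padding=2, dilation=2), nn.ReLU(inplace=True),
            nn.Conv2d(ch, ch, 3, padding=4, dilation=4), nn.ReLU(inplace=True),
        )
        self.head = nn.Conv2d(2 * ch, 2, 1)  # foreground / background logits

    def forward(self, img):
        x = F.relu(self.stem(img))
        y = self.block(x)
        return self.head(torch.cat([x, y], dim=1))  # dense-style feature reuse

class FeatheringBlock(nn.Module):
    """Predicts per-pixel linear coefficients (a, b) so that
    alpha = a * coarse_fg + b, an edge-preserving, guided-filter-like refinement."""
    def __init__(self, ch=16):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(3 + 2, ch, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(ch, 2, 3, padding=1),
        )

    def forward(self, img, coarse_logits):
        coarse = torch.softmax(coarse_logits, dim=1)   # B x 2 x H x W
        a, b = self.net(torch.cat([img, coarse], dim=1)).chunk(2, dim=1)
        alpha = a * coarse[:, 1:2] + b                 # refine the foreground probability
        return alpha.clamp(0, 1)
```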

    ST-GAN: Spatial Transformer Generative Adversarial Networks for Image Compositing

    We address the problem of finding realistic geometric corrections to a foreground object such that it appears natural when composited into a background image. To achieve this, we propose a novel Generative Adversarial Network (GAN) architecture that utilizes Spatial Transformer Networks (STNs) as the generator, which we call Spatial Transformer GANs (ST-GANs). ST-GANs seek image realism by operating in the geometric warp parameter space. In particular, we exploit an iterative STN warping scheme and propose a sequential training strategy that achieves better results than naive training of a single generator. A key advantage of ST-GAN is that it indirectly extends to high-resolution images, since the predicted warp parameters are transferable between reference frames. We demonstrate our approach in two applications: (1) visualizing how indoor furniture (e.g. from product images) might be perceived in a room, and (2) hallucinating how accessories such as glasses would look when matched with real portraits. Comment: Accepted to CVPR 2018 (website & code: https://chenhsuanlin.bitbucket.io/spatial-transformer-GAN/).
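
    The following is a rough PyTorch sketch of the iterative warping idea: a generator repeatedly predicts a small correction to the current affine warp of the foreground given the current composite, and the corrected warp is used to re-composite the foreground onto the background. The affine parameterization, the additive warp update, and the network sizes are assumptions for illustration; they are not the paper's exact formulation.

```python
# Hypothetical sketch of ST-GAN-style iterative warp updates (PyTorch).
# Warp parameterization, update rule, and layer sizes are illustrative assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F

class WarpGenerator(nn.Module):
    """Predicts a small correction (delta) to the current 2x3 affine warp."""
    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(4, 32, 4, stride=2, padding=1), nn.LeakyReLU(0.2),
            nn.Conv2d(32, 64, 4, stride=2, padding=1), nn.LeakyReLU(0.2),
            nn.AdaptiveAvgPool2d(1),
        )
        self.fc = nn.Linear(64, 6)

    def forward(self, composite_and_alpha):
        h = self.features(composite_and_alpha).flatten(1)
        return self.fc(h).view(-1, 2, 3)  # delta of the warp parameters

def warp_and_composite(fg_rgba, bg_rgb, theta):
    """Warp the RGBA foreground by affine params theta and alpha-blend onto the background."""
    grid = F.affine_grid(theta, fg_rgba.shape, align_corners=False)
    warped = F.grid_sample(fg_rgba, grid, align_corners=False)
    rgb, alpha = warped[:, :3], warped[:, 3:4]
    return alpha * rgb + (1 - alpha) * bg_rgb, alpha

def refine(generator, fg_rgba, bg_rgb, theta, steps=4):
    """Iterative refinement: each step predicts a residual warp from the current composite."""
    for _ in range(steps):
        comp, alpha = warp_and_composite(fg_rgba, bg_rgb, theta)
        theta = theta + generator(torch.cat([comp, alpha], dim=1))
    return theta

# Usage: start from the identity warp for a batch of size B.
# theta0 = torch.tensor([[[1., 0., 0.], [0., 1., 0.]]]).repeat(B, 1, 1)
```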

    Integrating Segmentation and Similarity in Melodic Analysis

    The recognition of melodic structure depends both on the segmentation into structural units, the melodic motifs, and on the relations between motifs, which are mainly determined by similarity. Existing models and studies of segmentation and motivic similarity cover only certain aspects and do not provide a comprehensive or coherent theory. In this paper, an Integrated Segmentation and Similarity Model (ISSM) for melodic analysis is introduced. For a given melody, the ISSM yields an interpretation similar to a paradigmatic analysis. An interpretation comprises a segmentation, assignments of related motifs and notes, and detailed information on the differences between assigned motifs and notes. The ISSM is based on generating and rating interpretations to find the most adequate one. For this rating, a neuro-fuzzy system is used, which combines expert knowledge with learning from data. The ISSM is an extension of a system for rhythm analysis. This paper covers the model structure and the features relevant for melodic and motivic analysis. Melodic segmentation and similarity ratings are described, and results of a small experiment are presented which show that the ISSM can learn structural interpretations from data and that integrating similarity improves the segmentation performance of the model.
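
    A toy generate-and-rate sketch of the ISSM idea is shown below: candidate segmentations of a melody are enumerated, motifs are compared by interval similarity, and the best-rated interpretation is kept. The boundary enumeration and the simple weighted score (standing in for the paper's neuro-fuzzy rating) are illustrative assumptions only.

```python
# Hypothetical generate-and-rate sketch of the ISSM idea in plain Python.
# A simple weighted score stands in for the paper's neuro-fuzzy rating.
from itertools import combinations
from difflib import SequenceMatcher

def intervals(pitches):
    """Represent a motif by its pitch-interval sequence (transposition invariant)."""
    return tuple(b - a for a, b in zip(pitches, pitches[1:]))

def motif_similarity(m1, m2):
    """Similarity of two motifs via sequence matching on their interval sequences."""
    return SequenceMatcher(None, intervals(m1), intervals(m2)).ratio()

def rate(segments, w_sim=1.0, w_len=0.2):
    """Rate an interpretation: reward pairwise motif similarity, penalize very short motifs."""
    sims = [motif_similarity(a, b) for a, b in combinations(segments, 2)]
    sim_score = sum(sims) / len(sims) if sims else 0.0
    len_penalty = sum(1 for s in segments if len(s) < 3)
    return w_sim * sim_score - w_len * len_penalty

def best_interpretation(melody, max_segments=4):
    """Enumerate segmentations (boundary subsets) and keep the best-rated one."""
    n = len(melody)
    best, best_score = None, float("-inf")
    for k in range(1, max_segments):
        for bounds in combinations(range(1, n), k):
            cuts = (0,) + bounds + (n,)
            segs = [melody[a:b] for a, b in zip(cuts, cuts[1:])]
            score = rate(segs)
            if score > best_score:
                best, best_score = segs, score
    return best, best_score

# Example: a melody (MIDI pitches) containing a repeated motif contour.
print(best_interpretation([60, 62, 64, 60, 62, 64, 67, 65]))
```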

    SEAN: Image Synthesis with Semantic Region-Adaptive Normalization

    We propose semantic region-adaptive normalization (SEAN), a simple but effective building block for Generative Adversarial Networks conditioned on segmentation masks that describe the semantic regions in the desired output image. Using SEAN normalization, we can build a network architecture that can control the style of each semantic region individually, e.g., we can specify one style reference image per region. SEAN is better suited to encode, transfer, and synthesize style than the best previous method in terms of reconstruction quality, variability, and visual quality. We evaluate SEAN on multiple datasets and report better quantitative metrics (e.g., FID, PSNR) than the current state of the art. SEAN also pushes the frontier of interactive image editing. We can interactively edit images by changing segmentation masks or the style for any given region. We can also interpolate styles from two reference images per region. Comment: Accepted as a CVPR 2020 oral paper. The interactive demo is available at https://youtu.be/0Vbj9xFgoU
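
    A minimal sketch of region-adaptive normalization is given below: activations are normalized, then modulated with per-channel scale and shift parameters derived from one style code per semantic region and scattered over that region's pixels. Layer names and sizes are assumptions, and the real SEAN block is richer (it also blends in modulation computed from the segmentation mask alone).

```python
# Hypothetical minimal sketch of semantic region-adaptive normalization (PyTorch).
# Layer sizes and names are assumptions; this simplifies the actual SEAN block.
import torch
import torch.nn as nn
import torch.nn.functional as F

class RegionAdaptiveNorm(nn.Module):
    def __init__(self, channels, num_regions, style_dim=64):
        super().__init__()
        self.norm = nn.InstanceNorm2d(channels, affine=False)
        self.num_regions = num_regions
        # Map a per-region style vector to per-channel scale and shift.
        self.to_gamma = nn.Linear(style_dim, channels)
        self.to_beta = nn.Linear(style_dim, channels)

    def forward(self, x, seg, styles):
        """
        x:      B x C x H x W activations
        seg:    B x H x W integer (long) segmentation map with values in [0, num_regions)
        styles: B x num_regions x style_dim, one style code per semantic region
        """
        x = self.norm(x)
        onehot = F.one_hot(seg, self.num_regions).permute(0, 3, 1, 2).float()  # B x R x H x W
        gamma = self.to_gamma(styles)  # B x R x C
        beta = self.to_beta(styles)    # B x R x C
        # Scatter each region's (gamma, beta) over that region's pixels.
        gamma_map = torch.einsum('brc,brhw->bchw', gamma, onehot)
        beta_map = torch.einsum('brc,brhw->bchw', beta, onehot)
        return x * (1 + gamma_map) + beta_map
```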

    From Sparse to Precise: A Practical Editing Approach for Intracardiac Echocardiography Segmentation

    Accurate and safe catheter ablation procedures for patients with atrial fibrillation require precise segmentation of cardiac structures in Intracardiac Echocardiography (ICE) imaging. Prior studies have suggested methods that employ 3D geometry information from the ICE transducer to create a sparse ICE volume by placing 2D frames in a 3D grid, enabling training of 3D segmentation models. However, the resulting 3D masks from these models can be inaccurate and may lead to serious clinical complications due to the sparse sampling in ICE data, frame misalignment, and cardiac motion. To address this issue, we propose an interactive editing framework that allows users to edit the segmentation output by drawing scribbles on a 2D frame. The user interaction is mapped to the 3D grid and used to execute an editing step that modifies the segmentation in the vicinity of the interaction while preserving the previous segmentation away from it. Furthermore, our framework accommodates multiple sequential edits to the segmentation output without compromising previous edits. This paper presents a novel loss function and a novel evaluation metric specifically designed for editing. Results from cross-validation and testing indicate that our proposed loss function outperforms standard losses and training strategies in terms of segmentation quality and adherence to user input. Additionally, we show quantitatively and qualitatively that subsequent edits do not compromise previous edits when using our method, as opposed to standard segmentation losses. Overall, our approach enhances segmentation accuracy while avoiding undesired changes away from user interactions and without compromising the quality of previously edited regions, leading to better patient outcomes. Comment: Accepted to MICCAI 202
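
    Since the abstract does not spell out the editing loss, the sketch below only illustrates the stated intent: push the prediction toward the user's scribble near the interaction while keeping it close to the previous segmentation far away, using a distance-based weight. All function names, inputs, and the weighting scheme are hypothetical.

```python
# Hypothetical sketch of a scribble-aware editing loss (PyTorch).
# The distance-based weighting and the specific terms are illustrative assumptions;
# the paper's actual loss function is not given in the abstract.
import torch
import torch.nn.functional as F

def editing_loss(pred_logits, scribble_mask, scribble_label, prev_seg, dist_to_scribble, radius=10.0):
    """
    pred_logits:      B x 1 x D x H x W new segmentation logits
    scribble_mask:    B x 1 x D x H x W, 1.0 where the user drew a scribble, else 0.0
    scribble_label:   B x 1 x D x H x W, desired label (0.0 / 1.0) at scribble voxels
    prev_seg:         B x 1 x D x H x W previous (pre-edit) segmentation, values in [0, 1]
    dist_to_scribble: B x 1 x D x H x W distance (in voxels) to the nearest scribble
    """
    pred = torch.sigmoid(pred_logits)
    # 1) Follow the user input at the scribble voxels.
    edit_term = F.binary_cross_entropy(pred, scribble_label, weight=scribble_mask)
    # 2) Preserve the previous segmentation away from the interaction,
    #    with a weight that grows with distance from the scribble.
    preserve_w = torch.clamp(dist_to_scribble / radius, 0.0, 1.0)
    preserve_term = (preserve_w * (pred - prev_seg).abs()).mean()
    return edit_term + preserve_term
```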