180 research outputs found

Fast Deep Matting for Portrait Animation on Mobile Phone

Author: Cho Donghyeon
Gastal Eduardo SL
He Kaiming
Huang Gao
Jégou Simon
Paszke Adam
Qin Hongwei
Redmon Joseph
Shen Xiaoyong
Szegedy Christian
Publication date: 26/07/2017

Image matting plays an important role in image and video editing. However, the formulation of image matting is inherently ill-posed. Traditional methods usually rely on user interaction, in the form of trimaps and strokes, to deal with the image matting problem, and cannot run on a mobile phone in real time. In this paper, we propose a real-time automatic deep matting approach for mobile devices. By leveraging densely connected blocks and dilated convolution, a light fully convolutional network is designed to predict a coarse binary mask for portrait images. A feathering block, which is edge-preserving and matting adaptive, is further developed to learn the guided filter and transform the binary mask into an alpha matte. Finally, an automatic portrait animation system based on fast deep matting is built on mobile devices; it does not need any interaction and achieves real-time matting at 15 fps. The experiments show that the proposed approach achieves results comparable with state-of-the-art matting solvers.

Comment: ACM Multimedia Conference (MM) 2017 camera-ready
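
As a rough illustration of the dilated convolution mentioned in the abstract above, the following is a minimal one-dimensional sketch in plain Python. The function names `dilated_conv1d` and `receptive_field` and the example values are assumptions made for illustration; the paper applies the operation inside a 2D network, which this sketch does not attempt to reproduce.

```python
def dilated_conv1d(signal, kernel, dilation=1):
    """Valid-mode 1D cross-correlation with `dilation` steps between kernel taps."""
    if dilation < 1:
        raise ValueError("dilation must be >= 1")
    # Number of input samples covered by the kernel once the gaps are included.
    span = (len(kernel) - 1) * dilation + 1
    out = []
    for start in range(len(signal) - span + 1):
        acc = 0.0
        for k, weight in enumerate(kernel):
            acc += weight * signal[start + k * dilation]
        out.append(acc)
    return out


def receptive_field(kernel_size, dilation):
    """Number of input samples that one output value depends on."""
    return (kernel_size - 1) * dilation + 1
```

With a three-tap kernel, raising the dilation from 1 to 2 widens the receptive field from 3 to 5 samples without adding weights, which is the usual reason for using dilation in a lightweight network.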

arXiv.org e-Print Archive

Crossref

Video Logo Retrieval based on local Features

Author: Guan Bochen
Liu Hong
Sethares William A.
Ye Hanrong
Publication date: 18/05/2020

Estimating the frequency and duration of logos in videos is important and challenging in the advertising industry as a way of measuring the impact of ad purchases. Since logos occupy only a small area in the videos, popular image retrieval methods can fail. This paper develops an algorithm called Video Logo Retrieval (VLR), an image-to-video retrieval algorithm based on the spatial distribution of local image descriptors that measure the distance between the query image (the logo) and a collection of video images. VLR uses local features to overcome the weakness of global feature-based models such as convolutional neural networks (CNNs). Moreover, VLR is flexible and does not require training after setting some hyper-parameters. The performance of VLR is evaluated on two challenging open benchmark tasks (SoccerNet and Stanford I2V) and compared with other state-of-the-art logo retrieval or detection algorithms. Overall, VLR shows significantly higher accuracy than the existing methods.

Comment: Accepted by ICIP 20. Contact author: Bochen Guan

arXiv.org e-Print Archive

Crossref

Operational Neural Networks

Author: Gabbouj Moncef
Ince Turker
Iosifidis Alexandros
Kiranyaz Serkan
Publication date: 18/10/2019

Feed-forward, fully-connected Artificial Neural Networks (ANNs), or the so-called Multi-Layer Perceptrons (MLPs), are well-known universal approximators. However, their learning performance varies significantly depending on the function or the solution space that they attempt to approximate. This is mainly because of their homogeneous configuration based solely on the linear neuron model. Therefore, while they learn very well those problems with a monotonous, relatively simple and linearly separable solution space, they may entirely fail to do so when the solution space is highly nonlinear and complex. Since conventional Convolutional Neural Networks (CNNs) share the same linear neuron model with two additional constraints (local connections and weight sharing), this is also true for them, and it is therefore not surprising that in many challenging problems only deep CNNs with massive complexity and depth can achieve the required diversity and learning performance. In order to address this drawback and also to accomplish a more generalized model over the convolutional neurons, this study proposes a novel network model, called Operational Neural Networks (ONNs), which can be heterogeneous and encapsulate neurons with any set of operators to boost diversity and to learn highly complex and multi-modal functions or spaces with minimal network complexity and training data. Finally, a novel training method is formulated to back-propagate the error through the operational layers of ONNs. Experimental results over highly challenging problems demonstrate the superior learning capabilities of ONNs even with few neurons and hidden layers.

Comment: 21 pages
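
The contrast between a linear neuron and a neuron built from an arbitrary set of operators can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions: the decomposition into a per-connection nodal operator, a pooling operator, and an activation is how this sketch chooses to structure a neuron, and every function name here (`generalized_neuron`, `multiply`, `sinusoid`, `median`, `identity`) is hypothetical rather than taken from the paper or any library.

```python
import math


def generalized_neuron(inputs, weights, bias, nodal, pool, activation):
    """Apply a nodal operator per connection, pool the results, add bias, activate."""
    nodal_outputs = [nodal(w, x) for w, x in zip(weights, inputs)]
    return activation(pool(nodal_outputs) + bias)


def multiply(w, x):
    """Nodal operator of the ordinary linear neuron."""
    return w * x


def sinusoid(w, x):
    """An example non-linear nodal operator."""
    return math.sin(w * x)


def median(values):
    """An example pooling operator that replaces summation."""
    ordered = sorted(values)
    mid = len(ordered) // 2
    if len(ordered) % 2:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2


def identity(z):
    return z


# The linear neuron is the special case: multiplication + summation.
linear_out = generalized_neuron([1.0, 2.0], [0.5, -1.0], 0.1, multiply, sum, identity)

# A heterogeneous choice of operators for the same inputs.
other_out = generalized_neuron([1.0, 2.0], [0.5, -1.0], 0.1, sinusoid, median, math.tanh)
```

Choosing `multiply` and `sum` recovers the weighted sum used by MLPs and CNNs, so the generalized form strictly contains the linear neuron model the abstract describes.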

arXiv.org e-Print Archive

Crossref

Trepo - Institutional Repository of Tampere University

Fine-tuning U-net for medical image segmentation based on activation function, optimizer and pooling layer

Author: Al Saraireh Jaafer
Ghnemat Rawan
Younisse Remah
Publication venue: 'Institute of Advanced Engineering and Science'
Publication date: 01/10/2023

The U-net convolutional neural network (CNN) is a well-known architecture developed for medical images. Fine-tuning CNNs is a common technique for improving their performance by selecting the building blocks that give the best results. This paper introduces a method for tuning the U-net architecture to improve its performance in medical image segmentation. The experiment is conducted using an x-ray image segmentation approach. The performance of the U-net CNN in lung x-ray image segmentation is studied with different activation functions, optimizers, and pooling-bottleneck layers. The analysis focuses on creating a method that can be applied for tuning U-net-like CNNs. It also identifies the best activation function, optimizer, and pooling layer for enhancing U-net CNN performance on x-ray image segmentation. The findings show that the U-net architecture performed best with the LeakyReLU activation function, an average pooling layer, and the RMSProp optimizer. With this configuration, the U-net model accuracy rose from 89.59% to 93.81% when trained and tested on lung x-ray images. The fine-tuned model also improved accuracy on three other datasets.
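
For readers unfamiliar with the three building blocks named in the abstract above, here is a minimal plain-Python sketch of each. It shows only their standard textbook definitions, not the paper's tuned U-net; the function names and the default hyper-parameters (`negative_slope`, `lr`, `rho`, `eps`) are assumptions chosen for illustration.

```python
import math


def leaky_relu(x, negative_slope=0.01):
    """Pass non-negative values through; scale negative values by a small slope."""
    return x if x >= 0 else negative_slope * x


def average_pool_2x2(image):
    """Downsample a 2D list by averaging non-overlapping 2x2 windows."""
    rows, cols = len(image), len(image[0])
    pooled = []
    for r in range(0, rows - 1, 2):
        row = []
        for c in range(0, cols - 1, 2):
            window = (
                image[r][c], image[r][c + 1],
                image[r + 1][c], image[r + 1][c + 1],
            )
            row.append(sum(window) / 4.0)
        pooled.append(row)
    return pooled


def rmsprop_step(weight, grad, avg_sq, lr=0.001, rho=0.9, eps=1e-8):
    """One RMSProp update for a single scalar weight; returns (weight, avg_sq)."""
    # Running average of squared gradients.
    avg_sq = rho * avg_sq + (1.0 - rho) * grad * grad
    # Step size is scaled by the root of that running average.
    weight = weight - lr * grad / (math.sqrt(avg_sq) + eps)
    return weight, avg_sq
```

Unlike max pooling, the averaging window keeps a contribution from every pixel, and unlike plain ReLU, the leaky variant keeps a small gradient for negative inputs.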

Institute of Advanced Engineering and Science

Visual Saliency Detection Based on Multiscale Deep CNN Features

Author: Li G
Yu Y
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/01/2016

postprint

arXiv.org e-Print Archive

Crossref

HKU Scholars Hub

Artificial Intelligence for Multimedia Signal Processing

Publication venue: 'MDPI AG'
Publication date: 16/09/2022

Artificial intelligence technologies are actively applied to broadcasting and multimedia processing. A lot of research has been conducted in a wide variety of fields, such as content creation, transmission, and security, and in the past two to three years attempts have been made to improve image, video, speech, and other data compression efficiency in areas related to MPEG media processing technology. Additionally, technologies such as media creation, processing, editing, and scenario creation are very important areas of research in multimedia processing and engineering. This book collects topics spanning advanced computational intelligence algorithms and technologies for emerging multimedia signal processing, such as computer vision, speech/sound/text processing, and content analysis/information mining.

Directory of Open Access Books (DOAB)

Graph convolutional neural network for multi-scale feature learning

Author: Gary Tam
Michael Edwards
Robert Palmer
Xianghua Xie
Publication venue: 'Elsevier BV'
Publication date: 01/01/2019

Cronfa at Swansea University