Fast Deep Matting for Portrait Animation on Mobile Phone
Image matting plays an important role in image and video editing. However,
the formulation of image matting is inherently ill-posed. Traditional methods
usually rely on user interaction, in the form of trimaps and strokes, and
cannot run on mobile phones in real time. In this paper, we
propose a real-time automatic deep matting approach for mobile devices. By
leveraging densely connected blocks and dilated convolution, a lightweight
fully convolutional network is designed to predict a coarse binary mask for
portrait images. A feathering block, which is edge-preserving and matting
adaptive, is further developed to learn the guided filter and transform the
binary mask into an alpha matte. Finally, an automatic portrait animation system
based on fast deep matting is built on mobile devices, which does not need any
interaction and can realize real-time matting at 15 fps. The experiments show
that the proposed approach achieves results comparable to those of
state-of-the-art matting solvers.
Comment: ACM Multimedia Conference (MM) 2017 camera-read
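The ill-posedness mentioned in this abstract can be made concrete with the standard compositing model used in the matting literature. The abstract does not state it, so the equation below is background rather than the authors' own formulation. For pixel $i$, $I_i$ is the observed color, $F_i$ and $B_i$ are the unknown foreground and background colors, and $\alpha_i$ is the alpha matte value:

```latex
I_i = \alpha_i F_i + (1 - \alpha_i) B_i, \qquad \alpha_i \in [0, 1]
```

For an RGB image this gives three equations per pixel but seven unknowns (three for $F_i$, three for $B_i$, one for $\alpha_i$), which is why extra constraints such as trimaps, strokes, or a learned prior are needed. Under this model a coarse binary mask corresponds to restricting $\alpha_i$ to $\{0, 1\}$, and a feathering step relaxes it to the full interval.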
Video Logo Retrieval based on local Features
Estimation of the frequency and duration of logos in videos is important and
challenging in the advertisement industry as a way of estimating the impact of
ad purchases. Since logos occupy only a small area in the videos, the popular
methods of image retrieval could fail. This paper develops an algorithm called
Video Logo Retrieval (VLR), which is an image-to-video retrieval algorithm
based on the spatial distribution of local image descriptors that measure the
distance between the query image (the logo) and a collection of video images.
VLR uses local features to overcome the weakness of global feature-based models
such as convolutional neural networks (CNN). Meanwhile, VLR is flexible and
does not require training after setting some hyper-parameters. The performance
of VLR is evaluated on two challenging open benchmark tasks (SoccerNet and
Stanford I2V), and compared with other state-of-the-art logo retrieval or
detection algorithms. Overall, VLR shows significantly higher accuracy compared
with the existing methods.
Comment: Accepted by ICIP 20. Contact author: Bochen Guan ([email protected]
Operational Neural Networks
Feed-forward, fully-connected Artificial Neural Networks (ANNs) or the
so-called Multi-Layer Perceptrons (MLPs) are well-known universal
approximators. However, their learning performance varies significantly
depending on the function or the solution space that they attempt to
approximate. This is mainly because of their homogeneous configuration based
solely on the linear neuron model. Therefore, while they learn very well those
problems with a monotonous, relatively simple and linearly separable solution
space, they may entirely fail to do so when the solution space is highly
nonlinear and complex. The same holds for conventional Convolutional Neural
Networks (CNNs), which share the same linear neuron model with two additional
constraints (local connections and weight sharing). It is therefore not
surprising that in many challenging problems only deep CNNs with massive
complexity and depth can achieve the required diversity and learning
performance. In order to address this drawback and also to accomplish a more
generalized model over the convolutional neurons, this study proposes a novel
network model, called Operational Neural Networks (ONNs), which can be
heterogeneous and encapsulate neurons with any set of operators to boost
diversity and to learn highly complex and multi-modal functions or spaces with
minimal network complexity and training data. Finally, a novel training method
is formulated to back-propagate the error through the operational layers of
ONNs. Experimental results over highly challenging problems demonstrate the
superior learning capabilities of ONNs even with few neurons and hidden layers.
Comment: 21 page
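The idea of a neuron that can hold "any set of operators" can be illustrated with a minimal sketch. It assumes a neuron built from three pluggable parts: a nodal operator applied to each weight/input pair, a pool operator that aggregates the results, and an activation. These names, the function `operational_neuron`, and the sample operators are illustrative assumptions, not taken from the abstract, and this is not the authors' implementation or training method.

```python
import math
import statistics
from typing import Callable, Sequence


def operational_neuron(
    inputs: Sequence[float],
    weights: Sequence[float],
    bias: float,
    nodal: Callable[[float, float], float],
    pool: Callable[[Sequence[float]], float],
    activation: Callable[[float], float],
) -> float:
    """Hypothetical generalized neuron: activation(pool(nodal(w, x)) + bias)."""
    if len(inputs) != len(weights):
        raise ValueError("inputs and weights must have the same length")
    # Apply the nodal operator to every (weight, input) pair.
    nodal_outputs = [nodal(w, x) for w, x in zip(weights, inputs)]
    # Aggregate with the pool operator, add the bias, then activate.
    return activation(pool(nodal_outputs) + bias)


def linear_nodal(w: float, x: float) -> float:
    """Multiplication: with sum pooling this gives the usual linear neuron."""
    return w * x


def sinusoid_nodal(w: float, x: float) -> float:
    """An example nonlinear nodal operator (assumed for illustration)."""
    return math.sin(w * x)


inputs = [0.5, -1.0, 2.0]
weights = [1.0, 0.5, -0.25]

# Special case: multiplication + summation reproduces the linear neuron model.
linear_out = operational_neuron(inputs, weights, 0.1, linear_nodal, sum, math.tanh)

# Different operator set: sinusoid nodal operator with median pooling.
hetero_out = operational_neuron(
    inputs, weights, 0.1, sinusoid_nodal, statistics.median, math.tanh
)
```

Choosing multiplication and summation recovers the linear neuron that the abstract attributes to MLPs and CNNs, so the conventional model appears as one configuration among many.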
Fine-tuning U-net for medical image segmentation based on activation function, optimizer and pooling layer
The U-net convolutional neural network (CNN) is a well-known architecture developed for medical images. Fine-tuning CNNs is a common technique for improving their performance by selecting the building blocks that give the best results. This paper introduces a method for tuning the U-net architecture to improve its performance in medical image segmentation. The experiment is conducted on x-ray image segmentation: the performance of the U-net CNN in lung x-ray image segmentation is studied with different activation functions, optimizers, and pooling-bottleneck layers. The analysis focuses on creating a method that can be applied to tuning U-net-like CNNs. It also identifies the best activation function, optimizer, and pooling layer for improving U-net performance on x-ray image segmentation. The findings show that the U-net architecture performed best with the LeakyReLU activation function, an average pooling layer, and the RMSProp optimizer. With this configuration, the U-net model accuracy rose from 89.59% to 93.81% when trained and tested on lung x-ray images. The fine-tuned model also improved accuracy on three other datasets.
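For readers unfamiliar with the three building blocks this abstract reports as best, here is a minimal pure-Python sketch of each using its standard textbook definition. The function names, the 2x2 window, and the values for the negative slope, learning rate, decay, and epsilon are illustrative assumptions. The abstract does not give the paper's settings, and this is not the authors' code.

```python
import math
from typing import List, Tuple


def leaky_relu(x: float, negative_slope: float = 0.1) -> float:
    """LeakyReLU: identity for x >= 0, a small linear slope otherwise."""
    return x if x >= 0 else negative_slope * x


def average_pool_2x2(feature_map: List[List[float]]) -> List[List[float]]:
    """Non-overlapping 2x2 average pooling over a 2D feature map."""
    rows, cols = len(feature_map), len(feature_map[0])
    if rows % 2 or cols % 2:
        raise ValueError("feature map height and width must be even")
    pooled = []
    for r in range(0, rows, 2):
        pooled_row = []
        for c in range(0, cols, 2):
            window_sum = (
                feature_map[r][c] + feature_map[r][c + 1]
                + feature_map[r + 1][c] + feature_map[r + 1][c + 1]
            )
            pooled_row.append(window_sum / 4.0)
        pooled.append(pooled_row)
    return pooled


def rmsprop_step(
    param: float,
    grad: float,
    sq_avg: float,
    lr: float = 0.001,
    rho: float = 0.9,
    eps: float = 1e-8,
) -> Tuple[float, float]:
    """One RMSProp update for a scalar parameter.

    Keeps a running average of squared gradients and scales the step by it.
    """
    sq_avg = rho * sq_avg + (1.0 - rho) * grad * grad
    param = param - lr * grad / (math.sqrt(sq_avg) + eps)
    return param, sq_avg


feature_map = [
    [1.0, -2.0, 3.0, 0.0],
    [-4.0, 5.0, -1.0, 2.0],
    [0.0, 0.0, 8.0, -8.0],
    [2.0, -2.0, 4.0, 4.0],
]

# Activation followed by pooling, as in a typical encoder stage.
activated = [[leaky_relu(v) for v in row] for row in feature_map]
pooled = average_pool_2x2(activated)

# A single optimizer step on a toy scalar parameter.
new_param, new_sq_avg = rmsprop_step(param=1.0, grad=2.0, sq_avg=0.0)
```

Unlike max pooling, average pooling lets every value in the window contribute to the output, and LeakyReLU keeps a nonzero gradient for negative inputs.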
Artificial Intelligence for Multimedia Signal Processing
Artificial intelligence technologies are actively applied to broadcasting and multimedia processing. Much research has been conducted in a wide variety of fields, such as content creation, transmission, and security, and over the past two to three years there have been attempts to improve the compression efficiency of image, video, speech, and other data in areas related to MPEG media processing technology. Additionally, technologies such as media creation, processing, editing, and scenario creation are very important areas of research in multimedia processing and engineering. This book collects topics spanning advanced computational intelligence algorithms and technologies for emerging multimedia signal processing, including computer vision, speech/sound/text processing, and content analysis/information mining.