Artificial Intelligence in the Creative Industries: A Review
This paper reviews the current state of the art in Artificial Intelligence
(AI) technologies and applications in the context of the creative industries. A
brief background of AI, and specifically Machine Learning (ML) algorithms, is
provided, including Convolutional Neural Networks (CNNs), Generative Adversarial
Networks (GANs), Recurrent Neural Networks (RNNs) and Deep Reinforcement
Learning (DRL). We categorise creative applications into five groups related to
how AI technologies are used: i) content creation, ii) information analysis,
iii) content enhancement and post-production workflows, iv) information
extraction and enhancement, and v) data compression. We critically examine the
successes and limitations of this rapidly advancing technology in each of these
areas. We further differentiate between the use of AI as a creative tool and
its potential as a creator in its own right. We foresee that, in the near
future, machine learning-based AI will be adopted widely as a tool or
collaborative assistant for creativity. In contrast, we observe that the
successes of machine learning in domains with fewer constraints, where AI is
the 'creator', remain modest. The potential of AI (or its developers) to win
awards for its original creations in competition with human creatives is also
limited, based on contemporary technologies. We therefore conclude that, in the
context of creative industries, maximum benefit from AI will be derived where
its focus is human centric -- where it is designed to augment, rather than
replace, human creativity.
Flow prediction meets flow learning : combining different learning strategies for computing the optical flow
Optical flow estimation is an important topic in computer vision. The goal is to compute the inter-frame displacement field between two consecutive frames of an image sequence. In practice, optical flow estimation plays a significant role in multiple application domains, including autonomous driving and medical imaging. Different categories of methods exist for solving the optical flow problem. The most common technique is based on a variational framework, where an energy functional is designed and minimized in order to calculate the optical flow. Recently, other approaches, such as pipeline-based and learning-based methods, have also attracted much attention. Despite the great advances achieved by these algorithms, it is still difficult to find one that performs well under all challenges, e.g. lighting changes, large displacements, and occlusions. Hence, it is worthwhile to combine different algorithms into a new approach that unites their advantages. Inspired by this idea, in this thesis we select two top-performing algorithms, PWC-Net and ProFlow, as candidate approaches and combine them. While PWC-Net generally performs well in non-occluded areas, ProFlow provides a particularly accurate estimation for occluded areas. We therefore expect that combining these two algorithms yields an algorithm that performs well in both occluded and non-occluded areas. Since ProFlow is a pipeline approach, we first integrate PWC-Net into the ProFlow pipeline, then evaluate the newly created pipeline, PWC-ProFlow, on the MPI Sintel and KITTI 2015 benchmarks. Contrary to our expectations, the newly created algorithm does not exceed the candidate methods PWC-Net and ProFlow on either benchmark. Through an analysis of the evaluation results, we explore the problems hidden in the PWC-ProFlow pipeline that can lead to its underperformance, and derive several ideas for modification.
Based on these ideas, we propose six new pipelines with the aim of improving the estimation accuracy of PWC-ProFlow. All the newly generated pipelines are also evaluated on the Sintel and KITTI benchmarks. The experimental results demonstrate that all of the modifications achieve substantial improvements on both datasets compared to PWC-ProFlow. Furthermore, all of them also outperform the ProFlow pipeline on both benchmarks. Compared to PWC-Net, only one modification exceeds it on the KITTI dataset; however, all our modifications achieve a better performance on the Sintel dataset. In particular, one modification presents a significant improvement, with a more than 10% lower average endpoint error on the Sintel dataset.
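The core fusion idea above, taking the occlusion-robust estimate at occluded pixels and the other estimate elsewhere, together with the average endpoint error used for evaluation, can be sketched as follows. This is a minimal illustration, not the actual PWC-ProFlow implementation: the function names and the simple binary occlusion mask are assumptions for the sake of the example.

```python
import numpy as np

def fuse_flows(flow_a, flow_b, occlusion_mask):
    """Combine two optical flow fields pixel-wise.

    flow_a, flow_b : (H, W, 2) displacement fields (e.g. a PWC-Net-style
                     estimate and an occlusion-robust ProFlow-style estimate).
    occlusion_mask : (H, W) boolean array, True where a pixel is occluded.
    Returns flow_b at occluded pixels and flow_a everywhere else.
    """
    return np.where(occlusion_mask[..., None], flow_b, flow_a)

def average_endpoint_error(flow_est, flow_gt):
    """Mean Euclidean distance between estimated and ground-truth displacements."""
    return np.linalg.norm(flow_est - flow_gt, axis=-1).mean()
```

The average endpoint error computed here is the standard metric reported on the Sintel and KITTI benchmarks; a per-pixel fusion like this is only a simplification of a full pipeline, which would also have to estimate the occlusion mask itself.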
Sparse Cost Volume for Efficient Stereo Matching
Stereo matching has been addressed as a supervised learning task with convolutional neural networks (CNNs). However, CNN-based approaches typically require a large amount of memory. In addition, it is still challenging to find correct correspondences between images in ill-posed regions, such as dim areas or those affected by sensor noise. To solve these problems, we propose Sparse Cost Volume Net (SCV-Net), which achieves high accuracy, low memory cost and fast computation. The idea of a cost volume for stereo matching was initially proposed in GC-Net. In our work, by making the cost volume compact and proposing an efficient similarity evaluation for the volume, we achieve faster stereo matching while improving accuracy. Moreover, we propose to use weight normalization instead of the commonly used batch normalization for stereo matching tasks. This improves robustness not only to sensor noise in images but also to the batch size used during training. We evaluated our proposed network on the Scene Flow and KITTI 2015 datasets; its performance overall surpasses that of GC-Net. Compared with GC-Net, our SCV-Net: (1) reduces GPU memory cost by 73.08%; (2) reduces processing time by 61.11%; (3) improves the 3-pixel error (3PE) from 2.87% to 2.61% on the KITTI 2015 dataset.
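GC-Net's cost volume concatenates left-image features with right-image features shifted by each candidate disparity; one way to make such a volume compact is to sample only a subset of the disparity levels. The sketch below illustrates that construction in NumPy. The stride-based sampling and the function name are illustrative assumptions, not the exact SCV-Net design:

```python
import numpy as np

def build_cost_volume(feat_left, feat_right, max_disp, stride=2):
    """Concatenation-style cost volume over a sparse set of disparities.

    feat_left, feat_right : (C, H, W) feature maps from a shared CNN.
    Returns a (D, 2C, H, W) volume, where D = len(range(0, max_disp, stride)).
    """
    C, H, W = feat_left.shape
    disps = list(range(0, max_disp, stride))
    volume = np.zeros((len(disps), 2 * C, H, W), dtype=feat_left.dtype)
    for i, d in enumerate(disps):
        volume[i, :C] = feat_left                         # left features, unshifted
        if d == 0:
            volume[i, C:] = feat_right
        else:
            volume[i, C:, :, d:] = feat_right[:, :, :-d]  # right features shifted by d
    return volume
```

With stride 2, the volume's depth, and hence its memory footprint, is halved relative to a dense volume over the same disparity range; a network using such a sparse volume would then need some mechanism (e.g. interpolation of the regressed output) to recover the skipped disparity levels.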