282,679 research outputs found
What value do explicit high level concepts have in vision to language problems?
Much of the recent progress in Vision-to-Language (V2L) problems has been
achieved through a combination of Convolutional Neural Networks (CNNs) and
Recurrent Neural Networks (RNNs). This approach does not explicitly represent
high-level semantic concepts, but rather seeks to progress directly from image
features to text. We propose here a method of incorporating high-level concepts
into the very successful CNN-RNN approach, and show that it achieves a
significant improvement on the state-of-the-art performance in both image
captioning and visual question answering. We also show that the same mechanism
can be used to introduce external semantic information and that doing so
further improves performance. In doing so we provide an analysis of the value
of high level semantic information in V2L problems.Comment: Accepted to IEEE Conf. Computer Vision and Pattern Recognition 2016.
Fixed titl
Generative Adversarial Networks in Computer Vision: A Survey and Taxonomy
Generative adversarial networks (GANs) have been extensively studied in the
past few years. Arguably their most significant impact has been in the area of
computer vision where great advances have been made in challenges such as
plausible image generation, image-to-image translation, facial attribute
manipulation and similar domains. Despite the significant successes achieved to
date, applying GANs to real-world problems still poses significant challenges,
three of which we focus on here. These are: (1) the generation of high quality
images, (2) diversity of image generation, and (3) stable training. Focusing on
the degree to which popular GAN technologies have made progress against these
challenges, we provide a detailed review of the state of the art in GAN-related
research in the published scientific literature. We further structure this
review through a convenient taxonomy we have adopted based on variations in GAN
architectures and loss functions. While several reviews for GANs have been
presented to date, none have considered the status of this field based on their
progress towards addressing practical challenges relevant to computer vision.
Accordingly, we review and critically discuss the most popular
architecture-variant, and loss-variant GANs, for tackling these challenges. Our
objective is to provide an overview as well as a critical analysis of the
status of GAN research in terms of relevant progress towards important computer
vision application requirements. As we do this we also discuss the most
compelling applications in computer vision in which GANs have demonstrated
considerable success along with some suggestions for future research
directions. Code related to GAN-variants studied in this work is summarized on
https://github.com/sheqi/GAN_Review.Comment: Accepted by ACM Computing Surveys, 23 November 202
Discrete Visual Perception
International audienceComputational vision and biomedical image have made tremendous progress of the past decade. This is mostly due the development of efficient learning and inference algorithms which allow better, faster and richer modeling of visual perception tasks. Graph-based representations are among the most prominent tools to address such perception through the casting of perception as a graph optimization problem. In this paper, we briefly introduce the interest of such representations, discuss their strength and limitations and present their application to address a variety of problems in computer vision and biomedical image analysis
Deep Learning based 3D Segmentation: A Survey
3D object segmentation is a fundamental and challenging problem in computer
vision with applications in autonomous driving, robotics, augmented reality and
medical image analysis. It has received significant attention from the computer
vision, graphics and machine learning communities. Traditionally, 3D
segmentation was performed with hand-crafted features and engineered methods
which failed to achieve acceptable accuracy and could not generalize to
large-scale data. Driven by their great success in 2D computer vision, deep
learning techniques have recently become the tool of choice for 3D segmentation
tasks as well. This has led to an influx of a large number of methods in the
literature that have been evaluated on different benchmark datasets. This paper
provides a comprehensive survey of recent progress in deep learning based 3D
segmentation covering over 150 papers. It summarizes the most commonly used
pipelines, discusses their highlights and shortcomings, and analyzes the
competitive results of these segmentation methods. Based on the analysis, it
also provides promising research directions for the future.Comment: Under review of ACM Computing Surveys, 36 pages, 10 tables, 9 figure
Probing Convolutional Neural Networks for Event Reconstruction in {\gamma}-Ray Astronomy with Cherenkov Telescopes
A dramatic progress in the field of computer vision has been made in recent
years by applying deep learning techniques. State-of-the-art performance in
image recognition is thereby reached with Convolutional Neural Networks (CNNs).
CNNs are a powerful class of artificial neural networks, characterized by
requiring fewer connections and free parameters than traditional neural
networks and exploiting spatial symmetries in the input data. Moreover, CNNs
have the ability to automatically extract general characteristic features from
data sets and create abstract data representations which can perform very
robust predictions. This suggests that experiments using Cherenkov telescopes
could harness these powerful machine learning algorithms to improve the
analysis of particle-induced air-showers, where the properties of primary
shower particles are reconstructed from shower images recorded by the
telescopes. In this work, we present initial results of a CNN-based analysis
for background rejection and shower reconstruction, utilizing simulation data
from the H.E.S.S. experiment. We concentrate on supervised training methods and
outline the influence of image sampling on the performance of the CNN-model
predictions.Comment: 8 pages, 4 figures, Proceedings of the 35th International Cosmic Ray
Conference (ICRC 2017), Busan, Kore
Semantic-Aware Image Analysis
Extracting and utilizing high-level semantic information from images is one of the important goals of computer vision. The ultimate objective of image analysis is to be able to understand each pixel of an image with regard to high-level semantics, e.g. the objects, the stuff, and their spatial, functional and semantic relations. In recent years, thanks to large labeled datasets and deep learning, great progress has been made to solve image analysis problems, such as image classification, object detection, and object pose estimation. In this work, we explore several aspects of semantic-aware image analysis. First, we explore semantic segmentation of man-made scenes using fully connected conditional random fields which can model long-range connections within the image of man-made scenes and make use of contextual information of scene structures. Second, we introduce a semantic smoothing method by exploiting the semantic information to accomplish semantic structure-preserving image smoothing. Semantic segmentation has achieved significant progress recently and has been widely used in many computer vision tasks. We observe that high-level semantic image labeling information can provide a meaningful structure prior to image smoothing naturally. Third, we present a deep object co-segmentation approach for segmenting common objects of the same class within a pair of images. To address this task, we propose a CNN-based Siamese encoder-decoder architecture. The encoder extracts high-level semantic features of the foreground objects, a mutual correlation layer detects the common objects, and finally, the decoder generates the output foreground masks for each image. Finally, we propose an approach to localize common objects from novel object categories in a set of images. We solve this problem using a new common component activation map in which we treat the class-specific activation maps as components to discover the common components in the image set. We show that our approach can generalize on novel object categories in our experiments
- …