297 research outputs found
Language-Based Image Editing with Recurrent Attentive Models
We investigate the problem of Language-Based Image Editing (LBIE). Given a
source image and a natural language description, we want to generate a target
image by editing the source image based on the description. We propose a
generic modeling framework for two sub-tasks of LBIE: language-based image
segmentation and image colorization. The framework uses recurrent attentive
models to fuse image and language features. Instead of using a fixed step size,
we introduce for each region of the image a termination gate to dynamically
determine after each inference step whether to continue extrapolating
additional information from the textual description. The effectiveness of the
framework is validated on three datasets. First, we introduce a synthetic
dataset, called CoSaL, to evaluate the end-to-end performance of our LBIE
system. Second, we show that the framework leads to state-of-the-art
performance on image segmentation on the ReferIt dataset. Third, we present the
first language-based colorization result on the Oxford-102 Flowers dataset.Comment: Accepted to CVPR 2018 as a Spotligh
Multi-modal Machine Learning in Engineering Design: A Review and Future Directions
In the rapidly advancing field of multi-modal machine learning (MMML), the
convergence of multiple data modalities has the potential to reshape various
applications. This paper presents a comprehensive overview of the current
state, advancements, and challenges of MMML within the sphere of engineering
design. The review begins with a deep dive into five fundamental concepts of
MMML:multi-modal information representation, fusion, alignment, translation,
and co-learning. Following this, we explore the cutting-edge applications of
MMML, placing a particular emphasis on tasks pertinent to engineering design,
such as cross-modal synthesis, multi-modal prediction, and cross-modal
information retrieval. Through this comprehensive overview, we highlight the
inherent challenges in adopting MMML in engineering design, and proffer
potential directions for future research. To spur on the continued evolution of
MMML in engineering design, we advocate for concentrated efforts to construct
extensive multi-modal design datasets, develop effective data-driven MMML
techniques tailored to design applications, and enhance the scalability and
interpretability of MMML models. MMML models, as the next generation of
intelligent design tools, hold a promising future to impact how products are
designed
Generative Adversarial Networks (GANs): Challenges, Solutions, and Future Directions
Generative Adversarial Networks (GANs) is a novel class of deep generative
models which has recently gained significant attention. GANs learns complex and
high-dimensional distributions implicitly over images, audio, and data.
However, there exists major challenges in training of GANs, i.e., mode
collapse, non-convergence and instability, due to inappropriate design of
network architecture, use of objective function and selection of optimization
algorithm. Recently, to address these challenges, several solutions for better
design and optimization of GANs have been investigated based on techniques of
re-engineered network architectures, new objective functions and alternative
optimization algorithms. To the best of our knowledge, there is no existing
survey that has particularly focused on broad and systematic developments of
these solutions. In this study, we perform a comprehensive survey of the
advancements in GANs design and optimization solutions proposed to handle GANs
challenges. We first identify key research issues within each design and
optimization technique and then propose a new taxonomy to structure solutions
by key research issues. In accordance with the taxonomy, we provide a detailed
discussion on different GANs variants proposed within each solution and their
relationships. Finally, based on the insights gained, we present the promising
research directions in this rapidly growing field.Comment: 42 pages, Figure 13, Table
- …