SketchBodyNet: A Sketch-Driven Multi-faceted Decoder Network for 3D Human Reconstruction
Reconstructing 3D human shapes from 2D images has received increasing
attention recently due to its fundamental support for many high-level 3D
applications. Compared with natural images, freehand sketches are a much more
flexible way to depict various shapes, making them a promising and valuable
medium for 3D human reconstruction. However, this task is highly challenging.
The sparse, abstract characteristics of sketches, such as arbitrariness,
inaccuracy, and a lack of image details, add severe difficulties to the already
badly ill-posed problem of 2D-to-3D reconstruction. Although current methods have
achieved great success in reconstructing 3D human bodies from a single-view
image, they do not work well on freehand sketches. In this paper, we propose a
novel sketch-driven multi-faceted decoder network termed SketchBodyNet to
address this task. Specifically, the network consists of a backbone and three
separate attention decoder branches, where a multi-head self-attention module
is exploited in each decoder to obtain enhanced features, followed by a
multi-layer perceptron. The multi-faceted decoders aim to predict the camera,
shape, and pose parameters, respectively, which are then associated with the
SMPL model to reconstruct the corresponding 3D human mesh. In learning,
existing 3D meshes are projected via the camera parameters into 2D synthetic
sketches with joints, which are combined with the freehand sketches to optimize
the model. To verify our method, we collect a large-scale dataset of about 26k
freehand sketches and their corresponding 3D meshes containing various poses of
human bodies from 14 different angles. Extensive experimental results
demonstrate our SketchBodyNet achieves superior performance in reconstructing
3D human meshes from freehand sketches.
Comment: 9 pages, to appear in Pacific Graphics 202
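The multi-faceted decoder described above can be sketched in a few lines. This is an illustrative NumPy toy, not the authors' code: each branch applies multi-head self-attention to backbone feature tokens and then an MLP that regresses one parameter group (camera, SMPL shape, or SMPL pose). All dimensions, weight initializations, and the pooling step are assumptions for illustration.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def multi_head_self_attention(x, num_heads, rng):
    # x: (seq_len, d_model) backbone feature tokens
    seq_len, d_model = x.shape
    d_head = d_model // num_heads
    heads = []
    for _ in range(num_heads):
        # Random projections stand in for learned weights in this sketch.
        Wq, Wk, Wv = (rng.standard_normal((d_model, d_head)) * 0.02
                      for _ in range(3))
        q, k, v = x @ Wq, x @ Wk, x @ Wv
        attn = softmax(q @ k.T / np.sqrt(d_head))
        heads.append(attn @ v)
    return np.concatenate(heads, axis=-1)       # (seq_len, d_model)

def decoder_branch(x, out_dim, num_heads=4, rng=None):
    # One attention decoder branch: MHSA -> pool -> 2-layer MLP.
    rng = rng or np.random.default_rng(0)
    h = multi_head_self_attention(x, num_heads, rng)
    h = h.mean(axis=0)                          # pool over tokens
    W1 = rng.standard_normal((h.size, 128)) * 0.02
    W2 = rng.standard_normal((128, out_dim)) * 0.02
    return np.maximum(h @ W1, 0.0) @ W2

features = np.random.default_rng(1).standard_normal((49, 64))
camera = decoder_branch(features, out_dim=3)    # e.g. weak-perspective camera
shape  = decoder_branch(features, out_dim=10)   # SMPL shape betas
pose   = decoder_branch(features, out_dim=72)   # SMPL pose (24 joints x 3)
print(camera.shape, shape.shape, pose.shape)    # → (3,) (10,) (72,)
```

The three predicted vectors would then be fed to the SMPL model to produce the 3D mesh; in a trained network the random projections above would of course be learned weights.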
A survey of comics research in computer science
Graphic novels such as comics and manga are well known all over the world.
The digital transition has started to change the way people read comics:
more and more on smartphones and tablets, and less and less on paper. In
recent years, a wide variety of research about comics has been proposed and
may change the way comics are created, distributed, and read in the coming
years. Early work focused on low-level document image analysis: comic books
are indeed complex documents, containing text, drawings, balloons, panels,
onomatopoeia, etc. Different fields of computer science, such as multimedia,
artificial intelligence, and human-computer interaction, have covered research
on user interaction and content generation, each with a different set of
values. In this paper, we propose to review previous research about comics in
computer science, to state what has been done, and to give some insights about
the main outlooks.
Artificial Intelligence in the Creative Industries: A Review
This paper reviews the current state of the art in Artificial Intelligence
(AI) technologies and applications in the context of the creative industries. A
brief background of AI, and specifically Machine Learning (ML) algorithms, is
provided, including Convolutional Neural Networks (CNNs), Generative Adversarial
Networks (GANs), Recurrent Neural Networks (RNNs) and Deep Reinforcement
Learning (DRL). We categorise creative applications into five groups related to
how AI technologies are used: i) content creation, ii) information analysis,
iii) content enhancement and post production workflows, iv) information
extraction and enhancement, and v) data compression. We critically examine the
successes and limitations of this rapidly advancing technology in each of these
areas. We further differentiate between the use of AI as a creative tool and
its potential as a creator in its own right. We foresee that, in the near
future, machine learning-based AI will be adopted widely as a tool or
collaborative assistant for creativity. In contrast, we observe that the
successes of machine learning in domains with fewer constraints, where AI is
the `creator', remain modest. The potential of AI (or its developers) to win
awards for its original creations in competition with human creatives is also
limited, based on contemporary technologies. We therefore conclude that, in the
context of creative industries, maximum benefit from AI will be derived where
its focus is human centric -- where it is designed to augment, rather than
replace, human creativity.
A Survey for Graphic Design Intelligence
Graphic design is an effective language for visual communication. Using
complex compositions of visual elements (e.g., shape, color, font) guided by
design principles and aesthetics, it helps produce more visually appealing
content. The creation of a harmonious design requires carefully selecting and
combining different visual elements, which can be challenging and
time-consuming. To expedite the design process, emerging AI techniques have
been proposed to automate tedious tasks and facilitate human creativity.
However, most current works focus only on specific tasks targeting different
scenarios, without a high-level abstraction. This paper aims to provide a
systematic overview of graphic design intelligence and summarize literature in
the taxonomy of representation, understanding, and generation. Specifically, we
consider related works for individual visual elements as well as the overall
design composition. Furthermore, we highlight some of the potential directions
for future explorations.
Comment: 10 pages, 2 figures
CHORUS Deliverable 2.2: Second report - identification of multi-disciplinary key issues for gap analysis toward EU multimedia search engines roadmap
After addressing the state-of-the-art during the first year of Chorus and establishing the existing landscape in
multimedia search engines, we have identified and analyzed gaps within European research effort during our second year.
In this period we focused on three directions, notably technological issues, user-centred issues and use-cases, and
socio-economic and legal aspects. These were assessed by two central studies: firstly, a concerted vision of the
functional breakdown of a generic multimedia search engine, and secondly, representative use-case descriptions with a
related discussion on requirements for technological challenges. Both studies have been carried out in cooperation and consultation with the
community at large through EC concertation meetings (multimedia search engines cluster), several meetings with our
Think-Tank, presentations in international conferences, and surveys addressed to EU projects coordinators as well as
National initiatives coordinators. Based on the obtained feedback we identified two types of gaps, namely core
technological gaps that involve research challenges, and “enablers”, which are not necessarily technical research
challenges, but have impact on innovation progress. New socio-economic trends are presented as well as emerging legal
challenges.
An Image is Worth One Word: Personalizing Text-to-Image Generation using Textual Inversion
Text-to-image models offer unprecedented freedom to guide creation through
natural language. Yet, it is unclear how such freedom can be exercised to
generate images of specific unique concepts, modify their appearance, or
compose them in new roles and novel scenes. In other words, we ask: how can we
use language-guided models to turn our cat into a painting, or imagine a new
product based on our favorite toy? Here we present a simple approach that
allows such creative freedom. Using only 3-5 images of a user-provided concept,
like an object or a style, we learn to represent it through new "words" in the
embedding space of a frozen text-to-image model. These "words" can be composed
into natural language sentences, guiding personalized creation in an intuitive
way. Notably, we find evidence that a single word embedding is sufficient for
capturing unique and varied concepts. We compare our approach to a wide range
of baselines, and demonstrate that it can more faithfully portray the concepts
across a range of applications and tasks.
Our code, data and new words will be available at:
https://textual-inversion.github.io
Comment: Project page: https://textual-inversion.github.io
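The core idea above, optimizing a single new embedding vector while the generative model stays frozen, can be illustrated with a toy example. This is not the paper's implementation: here the "frozen model" is just a fixed linear map, and the loss pulls the rendered output toward the average features of a handful of concept images. All names and dimensions are assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
d_embed, d_image = 16, 32

# Frozen generator stand-in: its weights are never updated.
frozen_model = rng.standard_normal((d_embed, d_image))

# The user's 3-5 example images of the concept, as feature vectors.
concept_images = rng.standard_normal((4, d_image))
target = concept_images.mean(axis=0)

# The new "word": the ONLY trainable parameter, as in textual inversion.
v_star = rng.standard_normal(d_embed) * 0.01

def loss(v):
    residual = v @ frozen_model - target
    return float(residual @ residual)

initial = loss(v_star)
lr = 0.005
for _ in range(500):
    residual = v_star @ frozen_model - target
    grad = 2.0 * residual @ frozen_model.T   # gradient w.r.t. v_star only
    v_star -= lr * grad                      # frozen_model is untouched

final = loss(v_star)
print(f"loss: {initial:.3f} -> {final:.3f}")
```

After optimization, v_star plays the role of a new token embedding that can be dropped into prompts; in the real method the loss is the frozen diffusion model's denoising objective rather than this quadratic stand-in.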