2,571 research outputs found
A survey of comics research in computer science
Graphic novels such as comics and manga are well known all over the world.
The digital transition has started to change the way people read comics,
more and more on smartphones and tablets and less and less on paper. In
recent years, a wide variety of research about comics has been proposed and
might change the way comics are created, distributed and read in future years.
Early work focuses on low-level document image analysis: indeed, comic books are
complex; they contain text, drawings, balloons, panels, onomatopoeia, etc.
Different fields of computer science have covered research about user interaction
and content generation, such as multimedia, artificial intelligence, and
human-computer interaction, each with different sets of values. We propose in
this paper to review the previous research about comics in computer science, to
state what has been done and to give some insights about the main outlooks.
SketchBodyNet: A Sketch-Driven Multi-faceted Decoder Network for 3D Human Reconstruction
Reconstructing 3D human shapes from 2D images has received increasing
attention recently due to its fundamental support for many high-level 3D
applications. Compared with natural images, freehand sketches are much more
flexible in depicting various shapes, offering a promising and valuable route
to 3D human reconstruction. However, such a task is highly challenging. The
sparse, abstract nature of sketches adds severe difficulties, such as
arbitrariness, inaccuracy, and a lack of image detail, to the already
ill-posed problem of 2D-to-3D reconstruction. Although current methods have
achieved great success in reconstructing 3D human bodies from a single-view
image, they do not work well on freehand sketches. In this paper, we propose a
novel sketch-driven multi-faceted decoder network termed SketchBodyNet to
address this task. Specifically, the network consists of a backbone and three
separate attention decoder branches, where a multi-head self-attention module
is exploited in each decoder to obtain enhanced features, followed by a
multi-layer perceptron. The multi-faceted decoders aim to predict the camera,
shape, and pose parameters, respectively, which are then associated with the
SMPL model to reconstruct the corresponding 3D human mesh. In learning,
existing 3D meshes are projected via the camera parameters into 2D synthetic
sketches with joints, which are combined with the freehand sketches to optimize
the model. To verify our method, we collect a large-scale dataset of about 26k
freehand sketches and their corresponding 3D meshes containing various poses of
human bodies from 14 different angles. Extensive experimental results
demonstrate our SketchBodyNet achieves superior performance in reconstructing
3D human meshes from freehand sketches. Comment: 9 pages, to appear in Pacific Graphics 202
CHORUS Deliverable 2.2: Second report - identification of multi-disciplinary key issues for gap analysis toward EU multimedia search engines roadmap
After addressing the state of the art during the first year of Chorus and establishing the existing landscape in
multimedia search engines, we identified and analyzed gaps within the European research effort during our second year.
In this period we focused on three directions: technological issues, user-centred issues and use-cases, and
socio-economic and legal aspects. These were assessed by two central studies: first, a concerted vision of the functional
breakdown of a generic multimedia search engine, and second, representative use-case descriptions with a related discussion
of the requirements for technological challenges. Both studies have been carried out in cooperation and consultation with the
community at large through EC concertation meetings (multimedia search engines cluster), several meetings with our
Think-Tank, presentations at international conferences, and surveys addressed to EU project coordinators as well as
national initiative coordinators. Based on the feedback obtained, we identified two types of gaps, namely core
technological gaps that involve research challenges, and “enablers”, which are not necessarily technical research
challenges but have an impact on innovation progress. New socio-economic trends are presented, as well as emerging legal
challenges.
A Survey for Graphic Design Intelligence
Graphic design is an effective language for visual communication. Using
complex composition of visual elements (e.g., shape, color, font) guided by
design principles and aesthetics, design helps produce more visually-appealing
content. The creation of a harmonious design requires carefully selecting and
combining different visual elements, which can be challenging and
time-consuming. To expedite the design process, emerging AI techniques have
been proposed to automate tedious tasks and facilitate human creativity.
However, most current works focus only on specific tasks targeting different
scenarios, without a high-level abstraction. This paper aims to provide a
systematic overview of graphic design intelligence and summarize the literature
under the taxonomy of representation, understanding and generation. Specifically, we
consider related works for individual visual elements as well as the overall
design composition. Furthermore, we highlight some potential directions
for future exploration. Comment: 10 pages, 2 figures
Browse-to-search
This demonstration presents a novel interactive online shopping application based on visual search technologies. When users want to buy something on a shopping site, they usually need to look for related information on other websites. Users therefore have to switch between the web page being browsed and other websites that provide search results. The proposed application enables users to naturally search for products of interest while they browse a web page, so that even casual purchase intent is easily satisfied. The interactive shopping experience is characterized by: 1) in session - it allows users to specify the purchase intent in the browsing session, instead of leaving the current page and navigating to other websites; 2) in context - the browsed web page provides implicit context information which helps infer user purchase preferences; 3) in focus - users easily specify their search interest using gestures on touch devices and do not need to formulate queries in a search box; 4) natural - gesture inputs and visual-based search provide users a natural shopping experience. The system is evaluated against a data set consisting of several million commercial product images. © 2012 Authors
Artificial Intelligence in the Creative Industries: A Review
This paper reviews the current state of the art in Artificial Intelligence
(AI) technologies and applications in the context of the creative industries. A
brief background of AI, and specifically Machine Learning (ML) algorithms, is
provided, including Convolutional Neural Networks (CNNs), Generative Adversarial
Networks (GANs), Recurrent Neural Networks (RNNs) and Deep Reinforcement
Learning (DRL). We categorise creative applications into five groups related to
how AI technologies are used: i) content creation, ii) information analysis,
iii) content enhancement and post production workflows, iv) information
extraction and enhancement, and v) data compression. We critically examine the
successes and limitations of this rapidly advancing technology in each of these
areas. We further differentiate between the use of AI as a creative tool and
its potential as a creator in its own right. We foresee that, in the near
future, machine learning-based AI will be adopted widely as a tool or
collaborative assistant for creativity. In contrast, we observe that the
successes of machine learning in domains with fewer constraints, where AI is
the 'creator', remain modest. The potential of AI (or its developers) to win
awards for its original creations in competition with human creatives is also
limited, based on contemporary technologies. We therefore conclude that, in the
context of creative industries, maximum benefit from AI will be derived where
its focus is human centric -- where it is designed to augment, rather than
replace, human creativity.
AI-generated Content for Various Data Modalities: A Survey
AI-generated content (AIGC) methods aim to produce text, images, videos, 3D
assets, and other media using AI algorithms. Due to its wide range of
applications and the demonstrated potential of recent works, AIGC developments
have been attracting lots of attention recently, and AIGC methods have been
developed for various data modalities, such as image, video, text, 3D shape (as
voxels, point clouds, meshes, and neural implicit fields), 3D scene, 3D human
avatar (body and head), 3D motion, and audio -- each presenting different
characteristics and challenges. Furthermore, there have also been many
significant developments in cross-modality AIGC methods, where generative
methods can receive conditioning input in one modality and produce outputs in
another. Examples include going from various modalities to image, video, 3D
shape, 3D scene, 3D avatar (body and head), 3D motion (skeleton and avatar),
and audio modalities. In this paper, we provide a comprehensive review of AIGC
methods across different data modalities, including both single-modality and
cross-modality methods, highlighting the various challenges, representative
works, and recent technical directions in each setting. We also survey the
representative datasets throughout the modalities, and present comparative
results for various modalities. Moreover, we discuss the challenges and
potential future research directions.