Edge Guided GANs with Semantic Preserving for Semantic Image Synthesis
We propose a novel Edge guided Generative Adversarial Network (EdgeGAN) for
photo-realistic image synthesis from semantic layouts. Although considerable
improvement has been achieved, the quality of synthesized images is far from
satisfactory due to two largely unresolved challenges. First, the semantic
labels do not provide detailed structural information, making it difficult to
synthesize local details and structures. Second, the widely adopted CNN
operations such as convolution, down-sampling and normalization usually cause
spatial resolution loss and thus are unable to fully preserve the original
semantic information, leading to semantically inconsistent results (e.g.,
missing small objects). To tackle the first challenge, we propose to use the
edge as an intermediate representation, which is further adopted to guide image
generation via a proposed attention-guided edge transfer module. The edge
information is produced by a convolutional generator and contributes detailed
structural information. Further, to preserve the semantic information, we design
an effective module to selectively highlight class-dependent feature maps
according to the original semantic layout. Extensive experiments on two
challenging datasets show that the proposed EdgeGAN can generate significantly
better results than state-of-the-art methods. The source code and trained
models are available at https://github.com/Ha0Tang/EdgeGAN.
Comment: 40 pages, 29 figures
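To illustrate the semantic-preserving idea described above, here is a minimal PyTorch sketch of a module that re-weights feature channels according to the semantic layout. This is not the authors' released implementation (see the repository above); the module name, the 1x1-convolution design, and the residual formulation are assumptions:

    import torch
    import torch.nn as nn

    class SemanticPreservingModule(nn.Module):
        """Selectively highlight class-dependent feature maps using the
        original semantic layout (illustrative sketch, not the paper's code)."""
        def __init__(self, num_classes: int, feat_channels: int):
            super().__init__()
            # Map the one-hot layout to per-channel attention weights.
            self.to_attn = nn.Sequential(
                nn.Conv2d(num_classes, feat_channels, kernel_size=1),
                nn.Sigmoid(),
            )

        def forward(self, feat: torch.Tensor, layout: torch.Tensor) -> torch.Tensor:
            # feat:   (B, C, H, W) generator features
            # layout: (B, num_classes, H, W) one-hot layout, resized to (H, W)
            attn = self.to_attn(layout)   # class-dependent channel weights
            return feat + feat * attn     # highlight while keeping a residual path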
Local Class-Specific and Global Image-Level Generative Adversarial Networks for Semantic-Guided Scene Generation
In this paper, we address the task of semantic-guided scene generation. One
open challenge in scene generation is the difficulty of generating small
objects and detailed local textures, which has been widely observed in global
image-level generation methods. To tackle this issue, in this work we learn
scene generation in a local context and correspondingly design a local
class-specific generative network guided by semantic maps, which constructs and
learns separate sub-generators concentrating on the generation of different
classes and is thus able to provide more scene detail. To learn more
discriminative class-specific feature representations for the local generation,
a novel classification module is also proposed. To combine the advantages of
both the global image-level and the local class-specific generation, a joint
generation network is designed with an attention fusion module and a
dual-discriminator structure embedded. Extensive experiments on two scene image
generation tasks show the superior generation performance of the proposed model,
which establishes new state-of-the-art results by large margins on both tasks on
challenging public benchmarks. The source code and trained models are
available at https://github.com/Ha0Tang/LGGAN.
Comment: Accepted to CVPR 2020, camera ready (10 pages) + supplementary (18 pages)
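A minimal sketch of the attention-fusion idea, i.e., combining the global image-level and local class-specific outputs with learned pixel-wise weights. This is an illustration only (the released code is in the repository above); the two-branch softmax weighting and all names are assumptions:

    import torch
    import torch.nn as nn

    class AttentionFusion(nn.Module):
        """Fuse a global image and a local class-specific image with
        learned pixel-wise attention (illustrative sketch)."""
        def __init__(self, feat_channels: int):
            super().__init__()
            # One weight map per branch, normalized by softmax below.
            self.to_weights = nn.Conv2d(feat_channels, 2, kernel_size=1)

        def forward(self, feat, global_img, local_img):
            # feat: (B, C, H, W) shared features; *_img: (B, 3, H, W)
            w = torch.softmax(self.to_weights(feat), dim=1)  # (B, 2, H, W)
            return w[:, :1] * global_img + w[:, 1:] * local_img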
ELVIS: Entertainment-led video summaries
© ACM, 2010. This is the author's version of the work, posted here by permission of ACM for your personal use; not for redistribution. The definitive version was published in ACM Transactions on Multimedia Computing, Communications, and Applications, 6(3): Article no. 17 (2010), http://doi.acm.org/10.1145/1823746.1823751
Video summaries present the user with a condensed and succinct representation of the content of a video stream. Usually this is achieved by attaching degrees of importance to low-level image, audio, and text features. However, video content elicits strong and measurable physiological responses in the user, which are potentially rich indicators of what video content is memorable to or emotionally engaging for an individual user. This article proposes a technique that exploits such physiological responses to a given video stream by a given user to produce Entertainment-Led VIdeo Summaries (ELVIS). ELVIS comprises five analysis phases corresponding to the analyses of five physiological response measures: electro-dermal response (EDR), heart rate (HR), blood volume pulse (BVP), respiration rate (RR), and respiration amplitude (RA). Through these analyses, the temporal locations of the most entertaining video subsegments, as they occur within the video stream as a whole, are automatically identified. The effectiveness of the ELVIS technique is verified through a statistical analysis of data collected during a set of user trials. Our results show that ELVIS is more consistent than RANDOM, EDR, HR, BVP, RR, and RA selections in identifying the most entertaining video subsegments for content in the comedy, horror/comedy, and horror genres. Subjective user reports also reveal that ELVIS video summaries are comparatively easy to understand, enjoyable, and informative.
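The core selection step can be illustrated with a short sketch: z-score each physiological measure, average them, and rank fixed-length subsegments by their mean combined response. The actual ELVIS technique runs five separate analysis phases; the pooling into a single score, the non-overlapping segment scheme, and all names here are simplifying assumptions:

    import numpy as np

    def rank_subsegments(signals: dict, seg_len: int, top_k: int) -> np.ndarray:
        # signals: per-measure response series ('EDR', 'HR', 'BVP', 'RR', 'RA'),
        # all resampled to one sample per video frame.
        # z-score each measure so no single signal dominates the combination
        zs = [(s - s.mean()) / (s.std() + 1e-8) for s in signals.values()]
        combined = np.mean(zs, axis=0)
        # mean combined response within non-overlapping subsegments
        n_segs = len(combined) // seg_len
        scores = combined[:n_segs * seg_len].reshape(n_segs, seg_len).mean(axis=1)
        # return the start frames of the top_k highest-scoring subsegments
        return np.argsort(scores)[::-1][:top_k] * seg_len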
Attribute-Preserving Face Dataset Anonymization via Latent Code Optimization
This work addresses the problem of anonymizing the identity of faces in a dataset of images, such that the privacy of those depicted is not violated, while at the same time the dataset remains useful for downstream tasks such as training machine learning models. To the best of our knowledge, we are the first to explicitly address this issue and to deal with two major drawbacks of the existing state-of-the-art approaches, namely that they (i) require the costly training of additional, purpose-trained neural networks, and/or (ii) fail to retain the facial attributes of the original images in the anonymized counterparts, the preservation of which is of paramount importance for their use in downstream tasks. We accordingly present a task-agnostic anonymization procedure that directly optimizes the images' latent representation in the latent space of a pretrained GAN. By optimizing the latent codes directly, we ensure that the identity is a desired distance away from the original (with an identity obfuscation loss) whilst preserving the facial attributes (using a novel feature-matching loss in FaRL's [48] deep feature space). We demonstrate through a series of both qualitative and quantitative experiments that our method is capable of anonymizing the identity of the images whilst, crucially, better preserving the facial attributes. We make the code and the pretrained models publicly available at: https://github.com/chi0tzp/FALCO.
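A minimal sketch of the described latent-code optimization, with an identity obfuscation loss pushing the identity embedding past a margin and a feature-matching loss preserving attributes. Here G, id_net, and feat_net stand in for a pretrained GAN generator, a face-identity embedder, and a FaRL-like feature extractor; all names, the margin formulation, and the loss weights are assumptions (the released code is at the URL above):

    import torch

    def anonymize_latent(w0, G, id_net, feat_net, steps=200, lr=0.01,
                         id_margin=0.5, lambda_feat=10.0):
        # w0: latent code(s) of the original image in the GAN's latent space
        with torch.no_grad():
            x0 = G(w0)
            id0, feat0 = id_net(x0), feat_net(x0)
        w = w0.clone().requires_grad_(True)
        opt = torch.optim.Adam([w], lr=lr)
        for _ in range(steps):
            x = G(w)
            # identity obfuscation: push cosine distance past a margin
            dist = 1.0 - torch.cosine_similarity(id_net(x), id0).mean()
            id_loss = torch.relu(id_margin - dist)
            # feature matching: keep attribute-bearing deep features close
            feat_loss = torch.nn.functional.mse_loss(feat_net(x), feat0)
            loss = id_loss + lambda_feat * feat_loss
            opt.zero_grad()
            loss.backward()
            opt.step()
        return w.detach()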
Microscopic shell-model description of the exotic nucleus ^{16}C
The structure of the neutron-rich carbon nucleus ^{16}C is described by
introducing a new microscopic shell model of no-core type. The model space is
composed of the 0s, 0p, 1s0d, and 1p0f shells. The effective interaction is
microscopically derived from the CD-Bonn potential and the Coulomb force
through a unitary transformation theory. Calculated low-lying energy levels of
^{16}C agree well with the experiment. The B(E2;2_{1}^{+} \to 0_{1}^{+}) value
is calculated with the bare charges. The anomalously hindered B(E2) value for
^{16}C, measured recently, is well reproduced.
Comment: 14 pages, 4 figures; considerable results and discussion are added,
but the main conclusion is unchanged; accepted for publication in Phys. Lett.
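For reference, the quoted reduced transition probability follows the standard definition, evaluated here with bare charges (e_p = e, e_n = 0) in the E2 operator:

    B(E2;\, 2_1^+ \to 0_1^+) \;=\; \frac{1}{2J_i + 1}
        \bigl| \langle 0_1^+ \,\|\, \hat{T}(E2) \,\|\, 2_1^+ \rangle \bigr|^2,
    \qquad J_i = 2 .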
Human-centered Computing: Toward a Human Revolution
Human-centered computing (HCC) studies the design, development, and deployment of mixed-initiative human-computer systems. HCC is emerging from the convergence of multiple disciplines that are concerned both with understanding human beings and with the design of computational artifacts.
What does touch tell us about emotions in touchscreen-based gameplay?
This is the post-print version of the article. The official published version can be accessed from the link below. Copyright © 2012 ACM. It is posted here by permission of ACM for your personal use; not for redistribution.
Nowadays, more and more people play games on touch-screen mobile phones. This phenomenon raises a very interesting question: does touch behaviour reflect the player's emotional state? If so, this would not only be a valuable evaluation indicator for game designers, but would also enable real-time personalization of the game experience. Psychology studies on acted touch behaviour show the existence of discriminative affective profiles. In this paper, finger-stroke features during gameplay on an iPod were extracted and their discriminative power analysed. Based on touch behaviour, machine learning algorithms were used to build systems for automatically discriminating between four emotional states (Excited, Relaxed, Frustrated, Bored), two levels of arousal, and two levels of valence. The results are promising, reaching between 69% and 77% correct discrimination between the four emotional states. Higher results (~89%) were obtained for discriminating between two levels of arousal and two levels of valence.
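The abstract does not specify which classifiers were used; as a hedged illustration only, the discrimination task could be set up as follows, with one finger-stroke feature vector per gameplay window and a generic SVM pipeline (feature choices and the model are assumptions):

    import numpy as np
    from sklearn.model_selection import cross_val_score
    from sklearn.pipeline import make_pipeline
    from sklearn.preprocessing import StandardScaler
    from sklearn.svm import SVC

    def evaluate(X: np.ndarray, y: np.ndarray) -> float:
        # X: one row of finger-stroke features per gameplay window
        #    (e.g. stroke length, speed, pressure, contact-area statistics)
        # y: labels in {Excited, Relaxed, Frustrated, Bored}
        clf = make_pipeline(StandardScaler(), SVC(kernel="rbf"))
        return cross_val_score(clf, X, y, cv=5).mean()  # mean accuracy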
AttentionGAN: Unpaired Image-to-Image Translation using Attention-Guided Generative Adversarial Networks
State-of-the-art methods in unpaired image-to-image translation are
capable of learning a mapping from a source domain to a target domain with
unpaired image data. Though existing methods have achieved promising
results, they still produce undesired artifacts: they are able to convert
low-level information but are limited in transforming the high-level semantics
of input images. One possible reason is that the generators lack the ability to
perceive the most discriminative semantic parts between the source and target
domains, which makes the generated images low in quality. In this paper, we
propose novel Attention-Guided Generative Adversarial Networks (AttentionGAN)
for the unpaired image-to-image translation task. AttentionGAN can identify the
most discriminative semantic objects and minimize changes of unwanted parts for
semantic manipulation problems without using extra data and models. The
attention-guided generators in AttentionGAN are able to produce attention masks
via a built-in attention mechanism, and then fuse the generation output with
the attention masks to obtain high-quality target images. Accordingly, we also
design a novel attention-guided discriminator which only considers attended
regions. Extensive experiments are conducted on several generative tasks,
demonstrating that the proposed model is effective in generating sharper and more
realistic images compared with existing competitive models. The source code for
the proposed AttentionGAN is available at
https://github.com/Ha0Tang/AttentionGAN.
Comment: An extended version of a paper published in IJCNN 2019. arXiv admin
note: substantial text overlap with arXiv:1903.1229
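A minimal sketch of the mask-fusion step described above: the generator's content outputs are blended through attention masks, with a final mask copying unattended regions from the input image. The tensor layout and function name are assumptions; the released code is at the repository above:

    import torch

    def fuse_with_attention(content, masks, x):
        # content: (B, K, 3, H, W) candidate foreground images from the generator
        # masks:   (B, K+1, H, W) attention masks (softmax over dim=1);
        #          the last mask keeps unattended regions of the input
        # x:       (B, 3, H, W) source image
        fg = (masks[:, :-1].unsqueeze(2) * content).sum(dim=1)  # attended content
        bg = masks[:, -1:] * x                                  # preserved background
        return fg + bg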