Laplacian-Steered Neural Style Transfer
Neural Style Transfer based on Convolutional Neural Networks (CNN) aims to
synthesize a new image that retains the high-level structure of a content
image, rendered in the low-level texture of a style image. This is achieved by
constraining the new image to have high-level CNN features similar to the
content image, and lower-level CNN features similar to the style image.
However, in the traditional optimization objective, low-level features of the
content image are absent, and the low-level features of the style image
dominate the low-level detail structures of the new image. Hence, in the
synthesized image, many details of the content image are lost, and many
inconsistent and unpleasant artifacts appear. As a remedy, we propose to steer
image synthesis
with a novel loss function: the Laplacian loss. The Laplacian matrix
("Laplacian" for short), produced by a Laplacian operator, is widely used in
computer vision to detect edges and contours. The Laplacian loss measures the
difference of the Laplacians, and correspondingly the difference of the detail
structures, between the content image and a new image. It is flexible and
compatible with the traditional style transfer constraints. By incorporating
the Laplacian loss, we obtain a new optimization objective for neural style
transfer named Lapstyle. Minimizing this objective will produce a stylized
image that better preserves the detail structures of the content image and
eliminates the artifacts. Experiments show that Lapstyle produces more
appealing stylized images with fewer artifacts, without compromising their
"stylishness".
Comment: Accepted by the ACM Multimedia Conference (MM) 2017. 9 pages, 65 figures.
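A minimal numerical sketch of the Laplacian loss described in the abstract above: apply a discrete Laplacian filter to both the content image and the synthesized image, then penalize the squared difference. The specific 3x3 kernel and the mean-squared form are illustrative assumptions, not necessarily the paper's exact formulation.

```python
import numpy as np

# 3x3 discrete Laplacian kernel, a common choice for edge/contour detection;
# the paper's exact operator may differ.
LAPLACIAN_KERNEL = np.array([[0.0,  1.0, 0.0],
                             [1.0, -4.0, 1.0],
                             [0.0,  1.0, 0.0]])

def laplacian(img):
    """Apply the Laplacian kernel to a 2-D grayscale image (valid region only)."""
    h, w = img.shape
    out = np.zeros((h - 2, w - 2))
    for i in range(3):
        for j in range(3):
            out += LAPLACIAN_KERNEL[i, j] * img[i:i + h - 2, j:j + w - 2]
    return out

def laplacian_loss(content, synthesized):
    """Mean squared difference between the two images' Laplacians."""
    d = laplacian(content) - laplacian(synthesized)
    return float(np.mean(d ** 2))
```

Because the kernel sums to zero, a constant brightness shift leaves the loss unchanged, which is why the loss constrains detail structure rather than absolute intensity.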
Variational Deep Semantic Hashing for Text Documents
As the amount of textual data has been rapidly increasing over the past
decade, efficient similarity search methods have become a crucial component of
large-scale information retrieval systems. A popular strategy is to represent
original data samples by compact binary codes through hashing. A spectrum of
machine learning methods have been utilized, but they often lack expressiveness
and flexibility in modeling to learn effective representations. The recent
advances of deep learning in a wide range of applications have demonstrated its
capability to learn robust and powerful feature representations for complex
data. In particular, deep generative models naturally combine the expressiveness
of probabilistic generative models with the high capacity of deep neural
networks, which is very suitable for text modeling. However, little work has
leveraged the recent progress in deep learning for text hashing.
In this paper, we propose a series of novel deep document generative models
for text hashing. The first proposed model is unsupervised while the second one
is supervised by utilizing document labels/tags for hashing. The third model
further considers document-specific factors that affect the generation of
words. The probabilistic generative formulation of the proposed models provides
a principled framework for model extension, uncertainty estimation, simulation,
and interpretability. Based on variational inference and reparameterization,
the proposed models can be interpreted as encoder-decoder deep neural networks
and thus they are capable of learning complex nonlinear distributed
representations of the original documents. We conduct a comprehensive set of
experiments on four public testbeds. The experimental results have demonstrated
the effectiveness of the proposed supervised learning models for text hashing.
Comment: 11 pages, 4 figures.
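As a rough sketch of the hashing pipeline the abstract above describes, one can stand in a linear projection for the deep variational encoder, binarize the latent representation into compact codes, and rank documents by Hamming distance. All names here are illustrative assumptions; the paper's actual models are deep generative networks, not a linear map.

```python
import numpy as np

def encode(doc_vecs, proj):
    """Stand-in 'encoder': a linear projection in place of the paper's deep
    variational encoder (purely illustrative)."""
    return doc_vecs @ proj

def to_binary_codes(latents):
    """Binarize each latent dimension at its median -- a common post-hoc
    step for turning continuous representations into hash codes."""
    return (latents > np.median(latents, axis=0)).astype(np.uint8)

def hamming_search(query_code, codes):
    """Rank database codes by Hamming distance to the query code."""
    dists = np.count_nonzero(codes != query_code, axis=1)
    return np.argsort(dists, kind="stable")
```

Binary codes make similarity search cheap: Hamming distance over short codes is far faster than comparing dense document vectors, which is the motivation for hashing in large-scale retrieval.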
Predicting Audio Advertisement Quality
Online audio advertising is a particular form of advertising used abundantly
in online music streaming services. In these platforms, which tend to host tens
of thousands of unique audio advertisements (ads), providing high quality ads
ensures a better user experience and results in longer user engagement.
Therefore, the automatic assessment of these ads is an important step toward
audio ad ranking and better audio ad creation. In this paper, we propose a
way to measure the quality of audio ads using a proxy metric called Long
Click Rate (LCR), defined as the amount of time a user engages with the
follow-up display ad (shown while the audio ad is playing) divided by the
number of impressions. We then focus on predicting audio ad quality using
only acoustic features such as harmony, rhythm, and timbre of the audio,
extracted from the raw waveform. We discuss how the characteristics of the
sound can be connected to concepts such as the clarity of the audio ad message,
its trustworthiness, etc. Finally, we propose a new deep learning model for
audio ad quality prediction, which outperforms the other discussed models
trained on hand-crafted features. To the best of our knowledge, this is the
first large-scale audio ad quality prediction study.
Comment: WSDM '18 Proceedings of the Eleventh ACM International Conference on
Web Search and Data Mining, 9 pages.
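The Long Click Rate proxy defined in the abstract above is straightforward to compute. A minimal sketch, assuming engagement is measured in seconds per impression; the exact normalization is an assumption based on the definition given:

```python
def long_click_rate(engagement_seconds, impressions):
    """Long Click Rate (LCR): total time users engaged with the follow-up
    display ad, divided by the number of impressions of the audio ad.
    The per-impression normalization here is inferred from the abstract."""
    if impressions <= 0:
        raise ValueError("impressions must be positive")
    return sum(engagement_seconds) / impressions
```

Normalizing by impressions makes ads with different reach comparable, so LCR can serve as a ranking signal rather than a raw engagement total.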
Learning the Structure of Auto-Encoding Recommenders
Autoencoder recommenders have recently shown state-of-the-art performance in
the recommendation task due to their ability to model non-linear item
relationships effectively. However, existing autoencoder recommenders use
fully-connected neural network layers and do not employ structure learning.
This can lead to inefficient training, especially when the data is sparse as
commonly found in collaborative filtering. The aforementioned results in lower
generalization ability and reduced performance. In this paper, we introduce
structure learning for autoencoder recommenders by taking advantage of the
inherent item groups present in the collaborative filtering domain. Items
naturally form groups: some items are more closely related to one another than
to other items. Based on this, we propose a method that first learns
groups of related items and then uses this information to determine the
connectivity structure of an auto-encoding neural network. This results in a
network that is sparsely connected. This sparse structure can be viewed as a
prior that guides the network training. Empirically we demonstrate that the
proposed structure learning enables the autoencoder to converge to a local
optimum with a much smaller spectral norm and generalization error bound than
the fully-connected network. The resultant sparse network considerably
outperforms state-of-the-art methods like \textsc{Mult-vae/Mult-dae} on
multiple benchmark datasets, even when the same number of parameters and FLOPs
are used. It also has better cold-start performance.
Comment: Proceedings of The Web Conference 2020.
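The core idea above, using item groups to decide which connections an auto-encoding network keeps, can be sketched as a binary connectivity mask applied to a layer's weights. This is an illustration of the structure, not the paper's group-learning procedure: here the groups are taken as given.

```python
import numpy as np

def group_mask(item_groups, hidden_per_group):
    """Connectivity mask: each hidden unit is assigned to one item group and
    connects only to that group's items. A sketch of the idea; the paper
    learns the groups rather than assuming them."""
    groups = sorted(set(item_groups))
    n_items = len(item_groups)
    n_hidden = hidden_per_group * len(groups)
    mask = np.zeros((n_items, n_hidden))
    for h in range(n_hidden):
        g = groups[h // hidden_per_group]
        for i, gi in enumerate(item_groups):
            if gi == g:
                mask[i, h] = 1.0
    return mask

def sparse_encode(x, weights, mask):
    """One masked encoder layer: absent connections contribute nothing."""
    return np.maximum(x @ (weights * mask), 0.0)  # ReLU
```

Multiplying the weights by a fixed 0/1 mask is a simple way to realize a sparsely connected layer with dense-tensor tooling: masked entries never carry signal, acting as the structural prior the abstract describes.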
Real-time feedback to reduce low-back load in lifting and lowering
Low-back pain (LBP) is a common health problem, and the literature indicates an
exposure-response relation between work-related lifting and LBP. We therefore
investigated the effects of three kinds of real-time feedback on low-back load,
quantified as lumbar moments, during lifting. We recruited 97 healthy male and
female participants without a recent history of LBP and without prior
biomechanical knowledge of lifting. Participants were assigned to groups based
on the time of enrollment, filling the four groups in the following order:
moment feedback, trunk inclination angle feedback, lumbar flexion feedback, and
a control group receiving no feedback. Feedback was given by a sound whenever a
threshold level of the input variable was exceeded. Participants were unaware
of the input variable driving the feedback but were instructed to try to avoid
the audio cue by changing their lifting strategy. The feedback groups were able
to reduce the audio feedback and thus changed the input variable towards a more
desirable level. Lumbar moments decreased significantly over trials in the
inclination and moment feedback groups, remained similar in the lumbar flexion
group, and increased in the control group. Between-group comparisons revealed
that low-back load was significantly lower in the moment and inclination groups
than in the control group. Additionally, moments were lower in the inclination
group than in the lumbar flexion group. Real-time feedback on moments or trunk
inclination is thus a promising tool to reduce low-back load during lifting and
lowering.
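The feedback rule in the study above, a sound whenever the monitored variable exceeds a threshold, reduces to a simple per-sample comparison. A minimal sketch with illustrative names (the study's actual thresholds and signal processing are not specified in the abstract):

```python
def feedback_cues(samples, threshold):
    """Return, per time sample, whether the monitored lifting variable
    (e.g., lumbar moment, trunk inclination, or lumbar flexion) exceeds
    the threshold -- i.e., whether the audio cue would sound."""
    return [s > threshold for s in samples]
```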
SchNet - a deep learning architecture for molecules and materials
Deep learning has led to a paradigm shift in artificial intelligence,
including web, text and image search, speech recognition, as well as
bioinformatics, with growing impact in chemical physics. Machine learning in
general and deep learning in particular is ideally suited for representing
quantum-mechanical interactions, making it possible to model nonlinear
potential-energy surfaces or to enhance the exploration of chemical compound
space. Here we
present the deep learning architecture SchNet that is specifically designed to
model atomistic systems by making use of continuous-filter convolutional
layers. We demonstrate the capabilities of SchNet by accurately predicting a
range of properties across chemical space for \emph{molecules and materials}
where our model learns chemically plausible embeddings of atom types across the
periodic table. Finally, we employ SchNet to predict potential-energy surfaces
and energy-conserving force fields for molecular dynamics simulations of small
molecules and perform an exemplary study of the quantum-mechanical properties
of C-fullerene that would have been infeasible with regular ab initio
molecular dynamics.
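The continuous-filter convolution central to SchNet maps each interatomic distance through a filter-generating network to a filter that modulates the neighboring atom's features, which are then summed. A simplified numpy sketch, with a single linear map over a radial-basis expansion standing in for SchNet's small filter-generating MLP (an illustrative assumption):

```python
import numpy as np

def rbf_expand(distances, centers, gamma=10.0):
    """Expand scalar distances on a grid of Gaussian radial basis functions."""
    return np.exp(-gamma * (distances[..., None] - centers) ** 2)

def cfconv(features, distances, filter_w, centers):
    """Simplified continuous-filter convolution: a filter-generating network
    (here one linear map over RBF features, standing in for SchNet's MLP)
    turns each interatomic distance into a per-pair filter that modulates
    the neighbor's features, which are summed over neighbors.
    features: (n_atoms, n_feat); distances: (n_atoms, n_atoms)."""
    filters = rbf_expand(distances, centers) @ filter_w  # (n, n, n_feat)
    return np.einsum("jf,ijf->if", features, filters)
```

Because the filters depend only on interatomic distances, the layer is invariant to translation and rotation of the atomic positions, and permuting the atoms simply permutes the output rows.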