9,353 research outputs found
The Unreasonable Effectiveness of Deep Features as a Perceptual Metric
While it is nearly effortless for humans to quickly assess the perceptual
similarity between two images, the underlying processes are thought to be quite
complex. Despite this, the most widely used perceptual metrics today, such as
PSNR and SSIM, are simple, shallow functions, and fail to account for many
nuances of human perception. Recently, the deep learning community has found
that features of the VGG network trained on ImageNet classification has been
remarkably useful as a training loss for image synthesis. But how perceptual
are these so-called "perceptual losses"? What elements are critical for their
success? To answer these questions, we introduce a new dataset of human
perceptual similarity judgments. We systematically evaluate deep features
across different architectures and tasks and compare them with classic metrics.
We find that deep features outperform all previous metrics by large margins on
our dataset. More surprisingly, this result is not restricted to
ImageNet-trained VGG features, but holds across different deep architectures
and levels of supervision (supervised, self-supervised, or even unsupervised).
Our results suggest that perceptual similarity is an emergent property shared
across deep visual representations.Comment: Accepted to CVPR 2018; Code and data available at
https://www.github.com/richzhang/PerceptualSimilarit
Nonlinear Switched-Capacitor Networks: Basic Principles and Piecewise-Linear Design
The applicability of switched-capacitor (SC) components to the design of nonlinear networks is extensively discussed in this paper. The main objective is to show that SC's can be efficiently used for designing nonlinear networks. Moreover, the design methods to be proposed here are fully compatible with general synthesis methods for nonlinear n -ports. Different circuit alternatives are given and their potentials are evaluated.Office of Naval Research (USA) N00014-76-C-0572Comisión Interministerial de Ciencia y Tecnología 0235/81Semiconductor Research Corporation (USA) 82-11-00
Tacotron: Towards End-to-End Speech Synthesis
A text-to-speech synthesis system typically consists of multiple stages, such
as a text analysis frontend, an acoustic model and an audio synthesis module.
Building these components often requires extensive domain expertise and may
contain brittle design choices. In this paper, we present Tacotron, an
end-to-end generative text-to-speech model that synthesizes speech directly
from characters. Given pairs, the model can be trained completely
from scratch with random initialization. We present several key techniques to
make the sequence-to-sequence framework perform well for this challenging task.
Tacotron achieves a 3.82 subjective 5-scale mean opinion score on US English,
outperforming a production parametric system in terms of naturalness. In
addition, since Tacotron generates speech at the frame level, it's
substantially faster than sample-level autoregressive methods.Comment: Submitted to Interspeech 2017. v2 changed paper title to be
consistent with our conference submission (no content change other than typo
fixes
- …