The Unreasonable Effectiveness of Deep Features as a Perceptual Metric

Author: Efros Alexei A.
Isola Phillip
Shechtman Eli
Wang Oliver
Zhang Richard
Publication date: 10/04/2018

While it is nearly effortless for humans to quickly assess the perceptual similarity between two images, the underlying processes are thought to be quite complex. Despite this, the most widely used perceptual metrics today, such as PSNR and SSIM, are simple, shallow functions, and fail to account for many nuances of human perception. Recently, the deep learning community has found that features of the VGG network trained on ImageNet classification have been remarkably useful as a training loss for image synthesis. But how perceptual are these so-called "perceptual losses"? What elements are critical for their success? To answer these questions, we introduce a new dataset of human perceptual similarity judgments. We systematically evaluate deep features across different architectures and tasks and compare them with classic metrics. We find that deep features outperform all previous metrics by large margins on our dataset. More surprisingly, this result is not restricted to ImageNet-trained VGG features, but holds across different deep architectures and levels of supervision (supervised, self-supervised, or even unsupervised). Our results suggest that perceptual similarity is an emergent property shared across deep visual representations.

Comment: Accepted to CVPR 2018; Code and data available at https://www.github.com/richzhang/PerceptualSimilarit

arXiv.org e-Print Archive

Crossref

Nonlinear Switched-Capacitor Networks: Basic Principles and Piecewise-Linear Design

Author: Chua Leon O.
Huertas Díaz José Luis
Rodríguez Vázquez Ángel Benito
Rueda Rueda Adoración
Publication venue: Institute of Electrical and Electronics Engineers (IEEE)
Publication date: 01/01/1985

The applicability of switched-capacitor (SC) components to the design of nonlinear networks is extensively discussed in this paper. The main objective is to show that SCs can be efficiently used for designing nonlinear networks. Moreover, the design methods to be proposed here are fully compatible with general synthesis methods for nonlinear n-ports. Different circuit alternatives are given and their potentials are evaluated.

Funding: Office of Naval Research (USA) N00014-76-C-0572; Comisión Interministerial de Ciencia y Tecnología 0235/81; Semiconductor Research Corporation (USA) 82-11-00

idUS. Depósito de Investigación Universidad de Sevilla

Tacotron: Towards End-to-End Speech Synthesis

Author: Agiomyrgiannakis Yannis
Bengio Samy
Chen Zhifeng
Clark Rob
Jaitly Navdeep
Le Quoc
Saurous Rif A.
Skerry-Ryan RJ
Stanton Daisy
Wang Yuxuan
Weiss Ron J.
Wu Yonghui
Xiao Ying
Yang Zongheng
Publication date: 06/04/2017

A text-to-speech synthesis system typically consists of multiple stages, such as a text analysis frontend, an acoustic model and an audio synthesis module. Building these components often requires extensive domain expertise and may contain brittle design choices. In this paper, we present Tacotron, an end-to-end generative text-to-speech model that synthesizes speech directly from characters. Given <text, audio> pairs, the model can be trained completely from scratch with random initialization. We present several key techniques to make the sequence-to-sequence framework perform well for this challenging task. Tacotron achieves a 3.82 subjective 5-scale mean opinion score on US English, outperforming a production parametric system in terms of naturalness. In addition, since Tacotron generates speech at the frame level, it's substantially faster than sample-level autoregressive methods.

Comment: Submitted to Interspeech 2017. v2 changed paper title to be consistent with our conference submission (no content change other than typo fixes

arXiv.org e-Print Archive

Crossref