The Perception-Distortion Tradeoff
Image restoration algorithms are typically evaluated by some distortion
measure (e.g., PSNR, SSIM, IFC, VIF) or by human opinion scores that quantify
perceived quality. In this paper, we prove mathematically that
distortion and perceptual quality are at odds with each other. Specifically, we
study the optimal probability for correctly discriminating the outputs of an
image restoration algorithm from real images. We show that as the mean
distortion decreases, this probability must increase (indicating worse
perceptual quality). Contrary to common belief, this result holds
for any distortion measure and is not a problem specific to the PSNR or SSIM
criteria. We also show that generative adversarial networks (GANs) provide a
principled way to approach the perception-distortion bound. This constitutes
theoretical support for their observed success in low-level vision tasks. Based
on our analysis, we propose a new methodology for evaluating image restoration
methods, and use it to perform an extensive comparison between recent
super-resolution algorithms.
Comment: CVPR 2018 (long oral presentation), see talk at:
https://youtu.be/_aXbGqdEkjk?t=39m43
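The abstract's two quantities can be made concrete with a small sketch. This is not the authors' code. It computes PSNR, one of the distortion measures named above, and the success probability of an optimal discriminator between the distribution of real images and the distribution of restored outputs. For equal priors that probability equals 1/2 + TV/2, where TV is the total-variation distance. As a toy assumption, images are flat lists of pixel values and both distributions are given as small discrete histograms over the same bins. The function names `psnr` and `optimal_discrimination_probability` are illustrative only.

```python
import math
from typing import Sequence


def psnr(reference: Sequence[float], restored: Sequence[float],
         max_value: float = 255.0) -> float:
    """Peak signal-to-noise ratio in dB: 10 * log10(MAX^2 / MSE).

    Toy assumption: images are flattened to equal-length pixel lists.
    """
    if not reference or len(reference) != len(restored):
        raise ValueError("inputs must be non-empty and the same length")
    mse = sum((a - b) ** 2 for a, b in zip(reference, restored)) / len(reference)
    if mse == 0:
        return math.inf  # identical images: zero distortion
    return 10.0 * math.log10(max_value ** 2 / mse)


def optimal_discrimination_probability(p_real: Sequence[float],
                                       p_restored: Sequence[float]) -> float:
    """Success probability of the best real-vs-restored discriminator.

    Assumes equal priors and discrete distributions over the same bins,
    so the Bayes-optimal success rate is 1/2 + TV/2.
    """
    if len(p_real) != len(p_restored):
        raise ValueError("distributions must have the same support")
    tv = 0.5 * sum(abs(p - q) for p, q in zip(p_real, p_restored))
    return 0.5 + 0.5 * tv


if __name__ == "__main__":
    print(psnr([10, 20], [12, 18]))
    # Same distribution: the discriminator can only guess.
    print(optimal_discrimination_probability([0.5, 0.5], [0.5, 0.5]))
    # Shifted distribution: outputs become easier to tell apart.
    print(optimal_discrimination_probability([0.5, 0.5], [0.75, 0.25]))
```

A value of 0.5 means the restored outputs cannot be told apart from real images (perfect perceptual quality in the paper's sense). Values closer to 1 indicate worse perceptual quality. The abstract's result is that pushing mean distortion down eventually forces this probability up.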
Supplement 1: Nanoscale shaping and focusing of visible light in planar metal–oxide–silicon waveguides
Supplemental document. Originally published in Optica on 20 December 2015 (optica-2-12-1045).
Using Text Injection to Improve Recognition of Personal Identifiers in Speech
Accurate recognition of specific categories, such as persons' names, dates or
other identifiers is critical in many Automatic Speech Recognition (ASR)
applications. As these categories represent personal information, ethical use
of this data, including collection, transcription, training and evaluation,
demands special care. One way of ensuring the security and privacy of
individuals is to redact or eliminate Personally Identifiable Information (PII)
from collection altogether. However, this results in ASR models that tend to
have lower recognition accuracy on these categories. We use text injection to
improve the recognition of PII categories by including fake textual substitutes
of PII categories in the training data. We
demonstrate substantial improvements in recall of names and dates in medical
notes while also improving overall WER. For alphanumeric digit sequences, we show
improvements in character error rate and sentence accuracy.
Comment: Accepted to Interspeech 202
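The character error rate and sentence accuracy mentioned above can be computed under their usual definitions. This minimal sketch is not the paper's evaluation code. It assumes CER is the total character-level Levenshtein edit count divided by the total number of reference characters, and that sentence accuracy is the fraction of utterances transcribed exactly. The function names are hypothetical, and the digit-sequence strings in the usage example are made up.

```python
from typing import Sequence


def levenshtein(ref: str, hyp: str) -> int:
    """Minimum number of character substitutions, insertions and deletions."""
    prev = list(range(len(hyp) + 1))  # distances from an empty ref prefix
    for i, r in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            curr[j] = min(prev[j] + 1,             # deletion
                          curr[j - 1] + 1,         # insertion
                          prev[j - 1] + (r != h))  # substitution or match
        prev = curr
    return prev[-1]


def character_error_rate(refs: Sequence[str], hyps: Sequence[str]) -> float:
    """Total character edits divided by total reference characters."""
    if len(refs) != len(hyps) or not refs:
        raise ValueError("need the same non-zero number of refs and hyps")
    errors = sum(levenshtein(r, h) for r, h in zip(refs, hyps))
    total = sum(len(r) for r in refs)
    return errors / total


def sentence_accuracy(refs: Sequence[str], hyps: Sequence[str]) -> float:
    """Fraction of utterances whose hypothesis matches the reference exactly."""
    if len(refs) != len(hyps) or not refs:
        raise ValueError("need the same non-zero number of refs and hyps")
    return sum(r == h for r, h in zip(refs, hyps)) / len(refs)


if __name__ == "__main__":
    refs = ["4021", "A7B9"]  # made-up alphanumeric sequences
    hyps = ["4012", "A7B9"]  # first one has two swapped digits
    print(character_error_rate(refs, hyps))  # 2 edits / 8 chars -> 0.25
    print(sentence_accuracy(refs, hyps))     # 1 of 2 exact -> 0.5
```

Because a single wrong digit makes a whole sequence unusable as an identifier, sentence accuracy is the stricter of the two metrics for alphanumeric sequences, while CER credits partially correct transcriptions.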