OverFeat: Integrated Recognition, Localization and Detection using Convolutional Networks
We present an integrated framework for using Convolutional Networks for
classification, localization and detection. We show how a multiscale and
sliding window approach can be efficiently implemented within a ConvNet. We
also introduce a novel deep learning approach to localization by learning to
predict object boundaries. Bounding boxes are then accumulated rather than
suppressed in order to increase detection confidence. We show that different
tasks can be learned simultaneously using a single shared network. This
integrated framework is the winner of the localization task of the ImageNet
Large Scale Visual Recognition Challenge 2013 (ILSVRC2013) and obtained very
competitive results for the detection and classification tasks. In
post-competition work, we establish a new state of the art for the detection
task. Finally, we release a feature extractor from our best model, called
OverFeat.
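The efficiency of the sliding-window approach comes from the fact that a convolutional network applied to a larger input produces a whole spatial grid of window scores in a single pass, rather than re-running the network once per window. A minimal numpy sketch of this idea (the filter and input sizes here are hypothetical, purely for illustration; this is not the authors' code):

```python
import numpy as np

def conv2d_valid(x, k):
    """Dense 'valid' cross-correlation: one output per window position."""
    H, W = x.shape
    kh, kw = k.shape
    out = np.empty((H - kh + 1, W - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(x[i:i + kh, j:j + kw] * k)
    return out

# Stand-in for a classifier trained on 4x4 windows (a single filter here).
filt = np.ones((4, 4))

# On a 4x4 input it produces exactly one score ...
small = np.arange(16, dtype=float).reshape(4, 4)
assert conv2d_valid(small, filt).shape == (1, 1)

# ... but on a larger 8x8 image, the same filter yields a 5x5 grid of
# window scores in one pass -- the sliding window "for free".
large = np.zeros((8, 8))
print(conv2d_valid(large, filt).shape)  # (5, 5)
```

In a real ConvNet the same effect is obtained by treating fully connected layers as 1x1 convolutions, so the entire classification head slides over the feature map at essentially no extra cost.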
Disentangled Speech Embeddings using Cross-modal Self-supervision
The objective of this paper is to learn representations of speaker identity
without access to manually annotated data. To do so, we develop a
self-supervised learning objective that exploits the natural cross-modal
synchrony between faces and audio in video. The key idea behind our approach is
to tease apart--without annotation--the representations of linguistic content
and speaker identity. We construct a two-stream architecture which: (1) shares
low-level features common to both representations; and (2) provides a natural
mechanism for explicitly disentangling these factors, offering the potential
for greater generalisation to novel combinations of content and identity and
ultimately producing speaker identity representations that are more robust. We
train our method on a large-scale audio-visual dataset of talking heads `in the
wild', and demonstrate its efficacy by evaluating the learned speaker
representations for standard speaker recognition performance.
Comment: ICASSP 2020. The first three authors contributed equally to this work.
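The two-stream design described above — a shared low-level trunk feeding two separate heads, one per factor — can be sketched as follows. This is a minimal numpy illustration under assumed layer sizes, not the paper's architecture; the names `W_shared`, `W_content`, and `W_identity` are hypothetical:

```python
import numpy as np

rng = np.random.default_rng(0)

# Shared low-level trunk: maps a raw input feature (dim 64, assumed)
# to a representation common to both factors.
W_shared = rng.standard_normal((64, 128))

# Two separate heads: one for linguistic content, one for speaker identity.
W_content = rng.standard_normal((128, 32))
W_identity = rng.standard_normal((128, 32))

def relu(x):
    return np.maximum(x, 0.0)

def forward(x):
    h = relu(x @ W_shared)                 # features shared by both streams
    return h @ W_content, h @ W_identity   # the two disentangled embeddings

content_emb, identity_emb = forward(rng.standard_normal((1, 64)))
print(content_emb.shape, identity_emb.shape)  # (1, 32) (1, 32)
```

The training signal that actually separates the two factors comes from the cross-modal synchrony objective described in the abstract; the sketch only shows the structural split between the shared trunk and the per-factor heads.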
The Incremental Multiresolution Matrix Factorization Algorithm
Multiresolution analysis and matrix factorization are foundational tools in
computer vision. In this work, we study the interface between these two
distinct topics and obtain techniques to uncover hierarchical block structure
in symmetric matrices -- an important aspect in the success of many vision
problems. Our new algorithm, the incremental multiresolution matrix
factorization, uncovers such structure one feature at a time, and hence scales
well to large matrices. We describe how this multiscale analysis goes much
farther than what a direct global factorization of the data can identify. We
evaluate the efficacy of the resulting factorizations for relative leveraging
within regression tasks using medical imaging data. We also use the
factorization on representations learned by popular deep networks, providing
evidence of their ability to infer semantic relationships even when they are
not explicitly trained to do so. We show that this algorithm can be used as an
exploratory tool to improve the network architecture, and within numerous other
settings in vision.
Comment: Computer Vision and Pattern Recognition (CVPR) 2017, 10 pages.
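Multiresolution matrix factorizations of this family are built from sequences of sparse orthogonal rotations applied to a symmetric matrix. One elementary step — a Givens (Jacobi) rotation chosen to zero a single off-diagonal entry — can be sketched in numpy as follows (an illustration of the generic rotation step, not the authors' incremental algorithm; the example matrix is made up):

```python
import numpy as np

def givens_eliminate(A, i, j):
    """One elementary rotation step: pick a Givens rotation G so that
    the transformed matrix G A G^T has a zero at position (i, j)."""
    theta = 0.5 * np.arctan2(2.0 * A[i, j], A[j, j] - A[i, i])
    c, s = np.cos(theta), np.sin(theta)
    G = np.eye(A.shape[0])
    G[i, i], G[j, j] = c, c
    G[i, j], G[j, i] = -s, s
    return G, G @ A @ G.T

# A small symmetric matrix (made up for illustration).
A = np.array([[4.0, 1.0, 0.0],
              [1.0, 3.0, 0.5],
              [0.0, 0.5, 2.0]])

G, A1 = givens_eliminate(A, 0, 1)
# The rotation is orthogonal, so symmetry and the spectrum are preserved,
# while the targeted off-diagonal entry is eliminated.
```

Chaining many such sparse rotations, each touching only a few coordinates, is what lets the factorization expose block structure level by level and scale to large matrices.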