Detecting cover songs with pitch class key-invariant networks
Deep Learning (DL) has recently been applied successfully to the task of Cover Song Identification (CSI). Meanwhile, neural networks that consider music signal data structure in their design have been developed. In this paper, we propose a Pitch Class Key-Invariant Network, PiCKINet, for CSI. Like some other CSI networks, PiCKINet takes a Constant-Q Transform (CQT) pitch feature as input. Unlike other such networks, large multi-octave kernels produce a latent representation with pitch class dimensions that are maintained throughout PiCKINet by key-invariant convolutions. PiCKINet is seen to be more effective, and efficient, than other CQT-based networks. We also propose an extended variant, PiCKINet+, that employs a centre loss penalty, squeeze and excite units, and octave swapping data augmentation. PiCKINet+ shows an improvement of ~17% MAP relative to the well-known CQTNet when tested on a set of ~16K tracks.
Deep learning in remote sensing: a review
Standing at the paradigm shift towards data-intensive science, machine
learning techniques are becoming increasingly important. In particular, as a
major breakthrough in the field, deep learning has proven to be an extremely
powerful tool in many fields. Shall we embrace deep learning as the key to all?
Or, should we resist a 'black-box' solution? There are controversial opinions
in the remote sensing community. In this article, we analyze the challenges of
using deep learning for remote sensing data analysis, review the recent
advances, and provide resources to make deep learning in remote sensing
ridiculously simple to start with. More importantly, we encourage remote sensing
scientists to bring their expertise into deep learning, and use it as an
implicit general model to tackle unprecedented large-scale influential
challenges, such as climate change and urbanization.
Comment: Accepted for publication in IEEE Geoscience and Remote Sensing Magazine
Deep Learning for Audio Signal Processing
Given the recent surge in developments of deep learning, this article
provides a review of the state-of-the-art deep learning techniques for audio
signal processing. Speech, music, and environmental sound processing are
considered side-by-side, in order to point out similarities and differences
between the domains, highlighting general methods, problems, key references,
and potential for cross-fertilization between areas. The dominant feature
representations (in particular, log-mel spectra and raw waveform) and deep
learning models are reviewed, including convolutional neural networks, variants
of the long short-term memory architecture, as well as more audio-specific
neural network models. Subsequently, prominent deep learning application areas
are covered, i.e. audio recognition (automatic speech recognition, music
information retrieval, environmental sound detection, localization and
tracking) and synthesis and transformation (source separation, audio
enhancement, generative models for speech, sound, and music synthesis).
Finally, key issues and future questions regarding deep learning applied to
audio signal processing are identified.
Comment: 15 pages, 2 pdf figures
Domain Generalization in Vision: A Survey
Generalization to out-of-distribution (OOD) data is a capability natural to
humans yet challenging for machines to reproduce. This is because most learning
algorithms strongly rely on the i.i.d. assumption on source/target data, which
is often violated in practice due to domain shift. Domain generalization (DG)
aims to achieve OOD generalization by using only source data for model
learning. Since first introduced in 2011, research in DG has made great
progress. In particular, intensive research on this topic has led to a broad
spectrum of methodologies, e.g., those based on domain alignment,
meta-learning, data augmentation, or ensemble learning, just to name a few; and
has covered various vision applications such as object recognition,
segmentation, action recognition, and person re-identification. In this paper,
for the first time a comprehensive literature review is provided to summarize
the developments in DG for computer vision over the past decade. Specifically,
we first cover the background by formally defining DG and relating it to other
research fields like domain adaptation and transfer learning. Second, we
conduct a thorough review of existing methods and present a categorization
based on their methodologies and motivations. Finally, we conclude this survey
with insights and discussions on future research directions.
Comment: v4: includes the word "vision" in the title; improves the
organization and clarity in Sections 2-3; adds future directions; and more
A Survey on Deep Learning in Medical Image Analysis
Deep learning algorithms, in particular convolutional networks, have rapidly
become a methodology of choice for analyzing medical images. This paper reviews
the major deep learning concepts pertinent to medical image analysis and
summarizes over 300 contributions to the field, most of which appeared in the
last year. We survey the use of deep learning for image classification, object
detection, segmentation, registration, and other tasks and provide concise
overviews of studies per application area. Open challenges and directions for
future research are discussed.
Comment: Revised survey includes expanded discussion section and reworked
introductory section on common deep architectures. Added missed papers from
before Feb 1st 201
Object Detection in 20 Years: A Survey
Object detection, as one of the most fundamental and challenging problems in
computer vision, has received great attention in recent years. Its development
in the past two decades can be regarded as an epitome of computer vision
history. If we think of today's object detection as a technical aesthetic
under the power of deep learning, then turning back the clock 20 years we would
witness the wisdom of the cold weapon era. This paper extensively reviews 400+
papers on object detection in the light of its technical evolution, spanning
over a quarter-century's time (from the 1990s to 2019). A number of topics have
been covered in this paper, including the milestone detectors in history,
detection datasets, metrics, fundamental building blocks of the detection
system, speed-up techniques, and the recent state-of-the-art detection methods.
This paper also reviews some important detection applications, such as
pedestrian detection, face detection, text detection, etc., and makes an in-depth
analysis of their challenges as well as technical improvements in recent years.
Comment: This work has been submitted to the IEEE TPAMI for possible
publication