180 research outputs found
Sound Transformation: Applying Image Neural Style Transfer Networks to Audio Spectrograms
Image style transfer networks are used to blend images, producing images that are a mix of source images. The process is based on controlled extraction of style and content aspects of images, using pre-trained Convolutional Neural Networks (CNNs). Our interest lies in adopting these image style transfer networks for the purpose of transforming sounds. Audio signals can be represented as grey-scale images of audio spectrograms. The purpose of our work is to investigate whether audio spectrogram inputs can be used with image neural style transfer networks to produce new sounds. Using musical instrument sounds as source sounds, we apply and compare three existing image neural style transfer networks for the task of sound mixing. Our evaluation shows that all three networks are successful in producing consistent, new sounds based on the two source sounds. We use classification models to demonstrate that the new audio signals are consistent and distinguishable from the source instrument sounds. We further apply t-SNE cluster visualisation to the feature maps of the new sounds and original source sounds, confirming that the new sounds form groups distinct from the source sounds. Our work paves the way to using CNNs for creative and targeted production of new sounds from source sounds, with specified source qualities, including pitch and timbre.
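The abstract's idea of treating an audio signal as a grey-scale spectrogram image can be illustrated with a minimal sketch. The frame length, hop size, Hann window, log scaling, and the function name `spectrogram_image` are assumptions chosen for illustration; the abstract does not state the preprocessing the authors used.

```python
import numpy as np

def spectrogram_image(signal, frame_len=256, hop=128):
    """Return a log-magnitude spectrogram scaled to 8-bit grey levels.

    Hypothetical helper: the parameters are illustrative assumptions,
    not the settings used in the paper.
    """
    window = np.hanning(frame_len)
    n_frames = 1 + (len(signal) - frame_len) // hop
    # Slice the signal into overlapping, windowed frames.
    frames = np.stack([
        signal[i * hop : i * hop + frame_len] * window
        for i in range(n_frames)
    ])
    # Magnitude of the one-sided FFT: shape (n_frames, frame_len // 2 + 1).
    magnitude = np.abs(np.fft.rfft(frames, axis=1))
    # Compress the dynamic range, then rescale to [0, 1].
    log_mag = np.log1p(magnitude)
    span = log_mag.max() - log_mag.min()
    scaled = (log_mag - log_mag.min()) / (span if span > 0 else 1.0)
    # Rows are frequency bins, columns are time frames.
    return (255 * scaled).astype(np.uint8).T

# One second of a 440 Hz sine tone sampled at 8 kHz.
sample_rate = 8000
t = np.arange(sample_rate) / sample_rate
tone = np.sin(2 * np.pi * 440 * t)
image = spectrogram_image(tone)
```

The resulting array can be handed to any image pipeline as a single-channel picture. Inverting a transformed image back into audio also requires phase information, which this sketch discards.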
Online Meta-Learning for Multi-Source and Semi-Supervised Domain Adaptation
Domain adaptation (DA) is the topical problem of adapting models from
labelled source datasets so that they perform well on target datasets where
only unlabelled or partially labelled data is available. Many methods have been
proposed to address this problem through different ways to minimise the domain
shift between source and target datasets. In this paper we take an orthogonal
perspective and propose a framework to further enhance performance by
meta-learning the initial conditions of existing DA algorithms. This is
challenging compared to the more widely considered setting of few-shot
meta-learning, due to the length of the computation graph involved. Therefore
we propose an online shortest-path meta-learning framework that is both
computationally tractable and practically effective for improving DA
performance. We present variants for both multi-source unsupervised domain
adaptation (MSDA), and semi-supervised domain adaptation (SSDA). Importantly,
our approach is agnostic to the base adaptation algorithm, and can be applied
to improve many techniques. Experimentally, we demonstrate improvements on
classic (DANN) and recent (MCD and MME) techniques for MSDA and SSDA, and
ultimately achieve state-of-the-art results on several DA benchmarks including
the largest scale DomainNet. Comment: ECCV 2020 CR version
Open Source Dataset and Machine Learning Techniques for Automatic Recognition of Historical Graffiti
Machine learning techniques are presented for automatic recognition of the
historical letters (XI-XVIII centuries) carved on the stone walls of St. Sophia
cathedral in Kyiv (Ukraine). A new image dataset of these carved Glagolitic and
Cyrillic letters (CGCL) was assembled and pre-processed for recognition and
prediction by machine learning methods. The dataset consists of more than 4000
images for 34 types of letters. Exploratory data analysis of the CGCL and
notMNIST datasets showed that the carved letters can hardly be differentiated by
dimensionality reduction methods, for example, by t-distributed stochastic
neighbor embedding (tSNE), because letters are rendered less clearly in stone
carving than in handwriting. Multinomial logistic regression
(MLR) and 2D convolutional neural network (CNN) models were applied. The MLR
model achieved area under the curve (AUC) values for the receiver operating
characteristic (ROC) of no less than 0.92 and 0.60 for notMNIST and CGCL,
respectively. The CNN model gave AUC values close to 0.99 for both notMNIST and
CGCL (despite the much smaller size and lower quality of CGCL in comparison to
notMNIST) under heavy lossy data augmentation. The CGCL dataset has been
published as an open source resource for the data science community.
Comment: 11 pages, 9 figures, accepted for 25th International Conference on
Neural Information Processing (ICONIP 2018), 14-16 December, 2018 (Siem Reap,
Cambodia)
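The AUC figures quoted above are easier to read with the definition in hand: for a binary task, the ROC AUC equals the probability that a randomly chosen positive example receives a higher score than a randomly chosen negative one. A minimal sketch of that rank-based definition follows. The function name `roc_auc` and the toy scores are illustrative assumptions, and the abstract does not say how the authors computed AUC across the 34 letter classes.

```python
import numpy as np

def roc_auc(labels, scores):
    """Binary ROC AUC via pairwise comparison of positives and negatives.

    Hypothetical helper for illustration; ties count as half a win.
    """
    labels = np.asarray(labels, dtype=bool)
    scores = np.asarray(scores, dtype=float)
    pos = scores[labels]
    neg = scores[~labels]
    # Compare every positive score with every negative score.
    wins = (pos[:, None] > neg[None, :]).sum()
    ties = (pos[:, None] == neg[None, :]).sum()
    return float((wins + 0.5 * ties) / (len(pos) * len(neg)))

# Toy data: three of the four positive/negative pairs are ranked correctly.
toy_auc = roc_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])
```

An AUC of 0.5 corresponds to random ranking, which puts the reported 0.60 for MLR on CGCL only slightly above chance.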
SNE: Signed Network Embedding
Several network embedding models have been developed for unsigned networks.
However, these models based on skip-gram cannot be applied to signed networks
because they can only deal with one type of link. In this paper, we present our
signed network embedding model called SNE. Our SNE adopts the log-bilinear
model, uses node representations of all nodes along a given path, and further
incorporates two signed-type vectors to capture the positive or negative
relationship of each edge along the path. We conduct two experiments, node
classification and link prediction, on both directed and undirected signed
networks and compare with four baselines including a matrix factorization
method and three state-of-the-art unsigned network embedding models. The
experimental results demonstrate the effectiveness of our signed network
embedding. Comment: To appear in PAKDD 201
Cross-Task Transfer for Geotagged Audiovisual Aerial Scene Recognition
Aerial scene recognition is a fundamental task in remote sensing and has
recently received increased interest. Although visual information from
overhead images, combined with powerful models and efficient algorithms, yields
considerable performance on scene recognition, it still suffers from
variation in ground objects, lighting conditions, etc. Inspired by
multi-channel perception theory in cognitive science, in this paper we explore
a novel audiovisual aerial scene recognition task, using both images and sounds as
input. Based on an observation that some specific sound events are more likely
to be heard at a given geographic location, we propose to exploit the knowledge
from the sound events to improve the performance on the aerial scene
recognition. For this purpose, we have constructed a new dataset named AuDio
Visual Aerial sceNe reCognition datasEt (ADVANCE). With the help of this
dataset, we evaluate three proposed approaches for transferring the sound event
knowledge to the aerial scene recognition task in a multimodal learning
framework, and show the benefit of exploiting the audio information for the
aerial scene recognition. The source code is publicly available for
reproducibility purposes. Comment: ECCV 202
Learning to Generate Novel Domains for Domain Generalization
This paper focuses on domain generalization (DG), the task of learning from
multiple source domains a model that generalizes well to unseen domains. A main
challenge for DG is that the available source domains often exhibit limited
diversity, hampering the model's ability to learn to generalize. We therefore
employ a data generator to synthesize data from pseudo-novel domains to augment
the source domains. This explicitly increases the diversity of available
training domains and leads to a more generalizable model. To train the
generator, we model the distribution divergence between source and synthesized
pseudo-novel domains using optimal transport, and maximize the divergence. To
ensure that semantics are preserved in the synthesized data, we further impose
cycle-consistency and classification losses on the generator. Our method,
L2A-OT (Learning to Augment by Optimal Transport), outperforms current
state-of-the-art DG methods on four benchmark datasets. Comment: To appear in ECCV'2
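The abstract says the divergence between source and synthesized domains is modelled with optimal transport but does not give the formulation. As a toy illustration only, and not the paper's method, the sketch below computes the optimal transport (Wasserstein-1) distance in the simplest setting: two equal-size one-dimensional samples, where the optimal plan matches sorted values. The function name `wasserstein_1d` and the sample values are assumptions.

```python
import numpy as np

def wasserstein_1d(x, y):
    """Wasserstein-1 distance between two equal-size 1-D empirical samples.

    Illustrative helper: in one dimension the optimal transport plan pairs
    the i-th smallest value of x with the i-th smallest value of y.
    """
    x = np.sort(np.asarray(x, dtype=float))
    y = np.sort(np.asarray(y, dtype=float))
    if x.shape != y.shape:
        raise ValueError("this sketch assumes equal sample sizes")
    return float(np.mean(np.abs(x - y)))

source = [0.0, 1.0, 2.0]
shifted = [3.0, 1.0, 2.0]  # the source samples shifted by one unit, unordered
distance = wasserstein_1d(source, shifted)
```

A generator trained to maximize such a distance is pushed to produce samples far from the source distribution, which is why the abstract pairs the objective with cycle-consistency and classification losses that preserve semantics.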
Relationship between conservation biology and ecology shown through machine reading of 32,000 articles
Conservation biology was founded on the idea that efforts to save nature depend on a scientific understanding of how it works. It sought to apply ecological principles to conservation problems. We investigated whether the relationship between these fields has changed over time through machine reading the full texts of 32,000 research articles published in 16 ecology and conservation biology journals. We examined changes in research topics in both fields and how the fields have evolved from 2000 to 2014. As conservation biology matured, its focus shifted from ecology to social and political aspects of conservation. The two fields diverged and now occupy distinct niches in modern science. We hypothesize this pattern resulted from increasing recognition that social, economic, and political factors are critical for successful conservation and possibly from rising skepticism about the relevance of contemporary ecological theory to practical conservation.
Deep Shape Matching
We cast shape matching as metric learning with convolutional networks. We
break the end-to-end process of image representation into two parts. First,
well-established, efficient methods are chosen to turn the images into edge
maps. Second, the network is trained with edge maps of landmark images, which
are automatically obtained by a structure-from-motion pipeline. The learned
representation is evaluated on a range of different tasks, providing
improvements on challenging cases of domain generalization, generic
sketch-based image retrieval, and its fine-grained counterpart. In contrast to
other methods that learn a different model per task, object category, or
domain, we use the same network throughout all our experiments, achieving
state-of-the-art results in multiple benchmarks. Comment: ECCV 201
Region Graph Embedding Network for Zero-Shot Learning
© 2020, Springer Nature Switzerland AG. Most existing Zero-Shot Learning (ZSL) approaches learn direct embeddings from global features or image parts (regions) to the semantic space, and therefore fail to capture the appearance relationships between different local regions within a single image. In this paper, to model the relations among local image regions, we incorporate region-based relation reasoning into ZSL. Our method, termed Region Graph Embedding Network (RGEN), is trained end-to-end from raw image data. Specifically, RGEN consists of two branches: the Constrained Part Attention (CPA) branch and the Parts Relation Reasoning (PRR) branch. The CPA branch is built upon attention and produces the image regions. To exploit the progressive interactions among these regions, we represent them as a region graph, on which the parts relation reasoning is performed with graph convolutions, thus leading to our PRR branch. To train our model, we introduce both a transfer loss and a balance loss to contrast class similarities and pursue the maximum response consistency among seen and unseen outputs, respectively. Extensive experiments on four datasets validate the effectiveness of the proposed method under both ZSL and generalized ZSL settings.
Classifying Candidate Axioms via Dimensionality Reduction Techniques
We assess the role of similarity measures and learning methods in classifying candidate axioms for automated schema induction through kernel-based learning algorithms. The evaluation is based on (i) three different similarity measures between axioms, and (ii) two alternative dimensionality reduction techniques to check the extent to which the considered similarities allow true axioms to be separated from false ones. The result of the dimensionality reduction process is subsequently fed to several learning algorithms, comparing the accuracy of all combinations of similarity, dimensionality reduction technique, and classification method. We observe that sophisticated semantics-based similarity measures are not necessary to obtain accurate predictions, and furthermore that classification performance depends only marginally on the choice of the learning method. Our results open the way to implementing efficient surrogate models for axiom scoring to speed up ontology learning and schema induction methods.