106 research outputs found
Manifold-Aware Deep Clustering: Maximizing Angles between Embedding Vectors Based on Regular Simplex
This paper presents a new deep clustering (DC) method called manifold-aware
DC (M-DC) that can enhance hyperspace utilization more effectively than the
original DC. The original DC has a limitation in that any pair of speakers
must be embedded orthogonally because it uses a one-hot vector-based loss
function, whereas our method derives a unique loss
function aimed at maximizing the target angle in the hyperspace based on the
nature of a regular simplex. Our proposed loss imposes a higher penalty than
the original DC when the speaker is assigned incorrectly. The change from DC to
M-DC can be easily achieved by rewriting just one term in the loss function of
DC, without any other modifications to the network architecture or model
parameters. As such, our method has high practicability because it does not
affect the original inference part. The experimental results show that the
proposed method improves the performance of the original DC and its expansion
method.
Comment: Accepted by Interspeech 202
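The geometric idea in this abstract can be illustrated with a minimal sketch. It assumes target vectors are built by centring the one-hot vectors and rescaling them to unit length, which yields the vertices of a regular simplex; this construction and the names `simplex_targets` and `cosine` are illustrative assumptions, not the paper's loss function. For C speakers, one-hot targets are pairwise orthogonal (cosine 0), while regular-simplex targets have pairwise cosine -1/(C-1), the largest angle that C vectors can share equally.

```python
import math


def simplex_targets(num_speakers):
    """Unit vectors at the vertices of a regular simplex centred at the origin.

    Illustrative construction (an assumption, not taken from the paper):
    subtract the mean from each one-hot vector, then normalise.
    """
    c = num_speakers
    if c < 2:
        raise ValueError("need at least two speakers")
    vertices = []
    for i in range(c):
        centred = [(1.0 if j == i else 0.0) - 1.0 / c for j in range(c)]
        norm = math.sqrt(sum(x * x for x in centred))
        vertices.append([x / norm for x in centred])
    return vertices


def cosine(u, v):
    """Cosine between two unit vectors, clamped against rounding error."""
    dot = sum(a * b for a, b in zip(u, v))
    return max(-1.0, min(1.0, dot))


targets = simplex_targets(3)
cos_simplex = cosine(targets[0], targets[1])
angle_simplex = math.degrees(math.acos(cos_simplex))

# One-hot targets for comparison: always orthogonal.
cos_one_hot = cosine([1.0, 0.0, 0.0], [0.0, 1.0, 0.0])
angle_one_hot = math.degrees(math.acos(cos_one_hot))
```

With three speakers the simplex targets are 120 degrees apart, compared with 90 degrees for one-hot targets.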
An Attention-based Approach to Hierarchical Multi-label Music Instrument Classification
Although music is typically multi-label, many works have studied hierarchical
music tagging with simplified settings such as single-label data. Moreover,
a framework for describing the various joint training methods under the
multi-label setting is lacking. To address these topics, we introduce the
hierarchical multi-label music instrument classification task. The task
provides a realistic setting where multi-instrument real music data is assumed.
Various hierarchical methods that jointly train a DNN are summarized and
explored in the context of the fusion of deep learning and conventional
techniques. For effective joint training in the multi-label setting, we
propose two methods to model the connection between fine- and coarse-level
tags: one uses rule-based grouped max-pooling, and the other uses an
attention mechanism learned in a data-driven manner. Our evaluation reveals
that the proposed methods have advantages over the method without joint
training. In addition, the decision procedure within the proposed methods can
be interpreted by visualizing attention maps or referring to fixed rules.
Comment: To appear at ICASSP 202
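The rule-based variant mentioned in this abstract can be sketched as follows, assuming that a coarse-level tag probability is taken as the maximum over the fine-level tags assigned to it by a fixed rule. The instrument names, the `GROUPS` mapping, and the function `grouped_max_pool` are hypothetical and chosen only for illustration.

```python
# Hypothetical fine-to-coarse hierarchy (not taken from the paper).
GROUPS = {
    "strings": ["violin", "cello"],
    "brass": ["trumpet", "trombone"],
}


def grouped_max_pool(fine_probs, groups):
    """Coarse probability = max of the fine probabilities in each group."""
    return {
        coarse: max(fine_probs[name] for name in members)
        for coarse, members in groups.items()
    }


# Multi-label fine-level predictions for one audio clip.
fine_probs = {"violin": 0.9, "cello": 0.2, "trumpet": 0.1, "trombone": 0.4}
coarse_probs = grouped_max_pool(fine_probs, GROUPS)
```

Because the grouping is fixed, each coarse decision can be traced back to the single fine-level tag that produced the maximum, which is one way to read the interpretability claim for the rule-based method.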
Heterogeneous Impacts of Grazing Animals and Vegetational Change in Japanese Native Pastures
Defoliation, defecation and trampling are the major ways in which grazing animals affect vegetation. Due to its uneven distribution, such grazing behavior can have profound effects on vegetation. For extensive grazing systems in native pastures, an understanding of plant-animal interactions is vital for adequate control of vegetation and animal conditions and for sustainable use of natural resources. This paper reviews recent studies of grazing impacts on vegetation in Japanese native pastures. Most of the studies were carried out at the Kawatabi Field Science Center (Kawatabi FSC), Tohoku University.
1. Native pastures in the Kawatabi FSC are composed of 61-155 plant species, of which cattle graze upon 26-76 species. Among these species, Miscanthus sinensis (Japanese plume-grass) was the most frequently grazed by cattle. The spatial distribution of available forage is a major factor affecting diet selection and consumption by cattle. Such selective grazing results in a significant reduction of M. sinensis in native pastures.
2. Seed dispersal through defecation by grazing animals can also result in significant vegetational change. Recent studies have shown that Carex spp. are the major plants whose seeds are dispersed through defecation by animals rotationally grazed in a native and a sown pasture. The mechanisms of this seed dispersal and its possible effects on vegetational succession are discussed.
3. Heavy trampling is known to degrade vegetative ground cover. Our research has shown that trampling by cattle promotes the invasion of a shrub, Weigela hortensis, into Miscanthus-dominant pastures. Because the seeds of W. hortensis are light-sensitive germinators, trampling promotes their germination by removing ground cover.
These findings provide new perspectives on plant-animal interactions in Japanese native pastures and help estimate the impact of animals on plant succession. They also contribute to efforts to ensure sustainable grazing use of pastures.
Zero- and Few-shot Sound Event Localization and Detection
Sound event localization and detection (SELD) systems estimate
direction-of-arrival (DOA) and temporal activation for sets of target classes.
Neural network (NN)-based SELD systems have performed well in various sets of
target classes, but they only output the DOA and temporal activation of preset
classes that are trained before inference. To customize target classes after
training, we tackle zero- and few-shot SELD tasks, in which we set new classes
with a text sample or a few audio samples. While zero-shot sound classification
tasks are achievable by embedding from contrastive language-audio pretraining
(CLAP), zero-shot SELD tasks require assigning an activity and a DOA to each
embedding, especially in overlapping cases. To tackle the assignment problem in
overlapping cases, we propose an embed-ACCDOA model, which is trained to output
track-wise CLAP embedding and corresponding activity-coupled Cartesian
direction-of-arrival (ACCDOA). In our experimental evaluations on zero- and
few-shot SELD tasks, the embed-ACCDOA model showed better location-dependent
scores than a straightforward combination of the CLAP audio encoder and a DOA
estimation model. Moreover, the proposed combination of the embed-ACCDOA model
and CLAP audio encoder with zero- or few-shot samples performed comparably to
an official baseline system trained with complete training data on an evaluation
dataset.
Comment: 5 pages, 4 figures
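A minimal sketch of how a track-wise output of this kind might be decoded, assuming the usual ACCDOA convention that the length of the Cartesian vector encodes activity and its direction encodes the DOA. The threshold of 0.5, the toy two-dimensional embeddings, and the helper names `decode_accdoa` and `label_track` are assumptions made for illustration; real CLAP embeddings are much higher-dimensional, and this is not the authors' implementation.

```python
import math


def decode_accdoa(vec, threshold=0.5):
    """Return (activity, azimuth_deg, elevation_deg), or None if inactive."""
    x, y, z = vec
    norm = math.sqrt(x * x + y * y + z * z)
    if norm <= threshold:
        return None  # vector too short: track treated as inactive
    azimuth = math.degrees(math.atan2(y, x))
    elevation = math.degrees(math.asin(max(-1.0, min(1.0, z / norm))))
    return norm, azimuth, elevation


def cosine_similarity(a, b):
    dot = sum(p * q for p, q in zip(a, b))
    na = math.sqrt(sum(p * p for p in a))
    nb = math.sqrt(sum(q * q for q in b))
    return dot / (na * nb)


def label_track(track_embedding, class_embeddings):
    """Pick the class whose reference embedding is closest to the track's."""
    return max(
        class_embeddings,
        key=lambda name: cosine_similarity(track_embedding, class_embeddings[name]),
    )


# Toy reference embeddings, standing in for text or few-shot audio samples.
class_embeddings = {"dog bark": [1.0, 0.1], "door knock": [0.1, 1.0]}

active = decode_accdoa((0.0, 0.9, 0.0))
inactive = decode_accdoa((0.1, 0.1, 0.1))
label = label_track([0.8, 0.3], class_embeddings)
```

Pairing each embedding with its own ACCDOA vector on the same track is what ties a class label to a direction when events overlap.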
STARSS22: A dataset of spatial recordings of real scenes with spatiotemporal annotations of sound events
This report presents the Sony-TAu Realistic Spatial Soundscapes 2022
(STARSS22) dataset for sound event localization and detection, comprised of
spatial recordings of real scenes collected in various interiors of two
different sites. The dataset is captured with a high-resolution spherical
microphone array and delivered in two 4-channel formats, first-order Ambisonics
and tetrahedral microphone array. Sound events in the dataset belonging to 13
target sound classes are annotated both temporally and spatially through a
combination of human annotation and optical tracking. The dataset serves as the
development and evaluation dataset for Task 3 of the DCASE2022 Challenge on
Sound Event Localization and Detection and introduces significant new
challenges for the task compared to the previous iterations, which were based
on synthetic spatialized sound scene recordings. Dataset specifications are
detailed including recording and annotation process, target classes and their
presence, and details on the development and evaluation splits. Additionally,
the report presents the baseline system that accompanies the dataset in the
challenge with emphasis on the differences from the baseline of the previous
iterations; namely, introduction of the multi-ACCDOA representation to handle
multiple simultaneous occurrences of events of the same class, and support for
additional improved input features for the microphone array format. Results of
the baseline indicate that with a suitable training strategy a reasonable
detection and localization performance can be achieved on real sound scene
recordings. The dataset is available at https://zenodo.org/record/6387880
A self-consistent first-principles calculation scheme for correlated electron systems
A self-consistent calculation scheme for correlated electron systems is
developed based on density-functional theory (DFT). Our scheme is a
multi-reference DFT (MR-DFT) calculation in which the electron charge density
is reproduced by an auxiliary interacting Fermion system. A short-range
Hubbard-type interaction is introduced in a rigorous manner with a residual
term for the exchange-correlation energy. The Hubbard term is determined
uniquely by referencing the density fluctuation at a selected localized
orbital. This strategy for extending the Kohn-Sham scheme provides
a self-consistent electronic structure calculation for materials design.
By introducing an approximation for the residual exchange-correlation energy
functional, we obtain the LDA+U energy functional. Practical self-consistent
calculations are exemplified by simulations of hydrogen systems, i.e. a
molecule and a periodic one-dimensional array, which serve as a proof of the existence of
the interaction strength U as a continuous function of the local fluctuation
and structural parameters of the system.
Comment: 23 pages, 8 figures, to appear in J. Phys. Condens. Matter
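The link between an interaction strength U and a local density fluctuation can be illustrated with a textbook stand-in, the two-site Hubbard model at half filling, which is often used as a toy picture of a hydrogen molecule. This is not the MR-DFT scheme of the abstract; the hopping parameter `t`, the function names, and the bisection bounds are assumptions made for this sketch. In this toy model the on-site fluctuation decreases monotonically from 0.5 towards 0 as U grows, so U can be recovered as a continuous function of the fluctuation.

```python
import math


def ground_state_energy(u, t=1.0):
    """Singlet ground-state energy of the half-filled two-site Hubbard model."""
    return 0.5 * (u - math.sqrt(u * u + 16.0 * t * t))


def local_fluctuation(u, t=1.0):
    """On-site density fluctuation <(n - 1)^2>, equal to twice the double occupancy."""
    double_occupancy = 0.25 * (1.0 - u / math.sqrt(u * u + 16.0 * t * t))
    return 2.0 * double_occupancy


def u_from_fluctuation(target, t=1.0, lo=0.0, hi=1.0e3, iterations=200):
    """Invert local_fluctuation by bisection (it decreases monotonically in u)."""
    for _ in range(iterations):
        mid = 0.5 * (lo + hi)
        if local_fluctuation(mid, t) > target:
            lo = mid  # fluctuation still too large: U must be bigger
        else:
            hi = mid
    return 0.5 * (lo + hi)


fluct = local_fluctuation(4.0)
recovered_u = u_from_fluctuation(fluct)
```

In the toy model the dependence on structure would enter only through `t`, which shrinks as the two sites move apart.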