PIANO: Proximity-based User Authentication on Voice-Powered Internet-of-Things Devices
Voice is envisioned to be a popular way for humans to interact with
Internet-of-Things (IoT) devices. We propose a proximity-based user
authentication method (called PIANO) for access control on such voice-powered
IoT devices. PIANO leverages the built-in speaker, microphone, and Bluetooth
that voice-powered IoT devices often already have. Specifically, we assume that
a user carries a personal voice-powered device (e.g., smartphone, smartwatch,
or smartglass), which serves as the user's identity. When another voice-powered
IoT device of the user requires authentication, PIANO estimates the distance
between the two devices by playing and detecting certain acoustic signals;
PIANO grants access if the estimated distance is no larger than a user-selected
threshold. We implemented a proof-of-concept prototype of PIANO. Through
theoretical and empirical evaluations, we find that PIANO is secure, reliable,
personalizable, and efficient. Comment: To appear in ICDCS'1
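The distance-then-threshold decision described above can be sketched as follows. This is a minimal illustration, assuming the acoustic round-trip time between the two devices has already been measured; the function names and constants are hypothetical, not taken from PIANO's implementation:

```python
SPEED_OF_SOUND_M_S = 343.0  # approximate speed of sound in air at 20 °C

def estimate_distance(round_trip_time_s: float) -> float:
    """Estimate device-to-device distance from an acoustic round-trip time.

    The signal travels to the other device and back, so halve the total path.
    """
    return SPEED_OF_SOUND_M_S * round_trip_time_s / 2.0

def grant_access(round_trip_time_s: float, threshold_m: float) -> bool:
    """Grant access only if the estimated distance is within the user-selected threshold."""
    return estimate_distance(round_trip_time_s) <= threshold_m
```

For example, a round trip of about 5.8 ms corresponds to roughly one metre of separation, so with a 1 m threshold access would be granted, while a 20 ms round trip (about 3.4 m) would be rejected.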
Convolution channel separation and frequency sub-bands aggregation for music genre classification
In music, short-term features such as pitch and tempo constitute long-term
semantic features such as melody and narrative. A music genre classification
(MGC) system should be able to analyze these features. In this research, we
propose a novel framework that can extract and aggregate both short- and
long-term features hierarchically. Our framework is based on ECAPA-TDNN, where
all the layers that extract short-term features are affected by the layers that
extract long-term features through back-propagation during training. To prevent
the distortion of short-term features, we devised the convolution channel
separation technique that separates short-term features from long-term feature
extraction paths. To extract more diverse features from our framework, we
incorporated the frequency sub-bands aggregation method, which divides the
input spectrogram along frequency bandwidths and processes each segment. We
evaluated our framework on the Melon Playlist dataset, a large-scale dataset
containing 600 times more data than GTZAN, a dataset widely used in MGC
studies. As a result, our framework achieved 70.4% accuracy, an improvement of
16.9% over a conventional framework.
Integrated Parameter-Efficient Tuning for General-Purpose Audio Models
The advent of hyper-scale and general-purpose pre-trained models is shifting
the paradigm of building task-specific models for target tasks. In the field of
audio research, task-agnostic pre-trained models with high transferability and
adaptability have achieved state-of-the-art performances through fine-tuning
for downstream tasks. Nevertheless, re-training all the parameters of these
massive models entails an enormous amount of time and cost, along with a huge
carbon footprint. To overcome these limitations, the present study explores and
applies efficient transfer learning methods in the audio domain. We also
propose an integrated parameter-efficient tuning (IPET) framework by
aggregating the embedding prompt (a prompt-based learning approach), and the
adapter (an effective transfer learning method). We demonstrate the efficacy of
the proposed framework using two backbone pre-trained audio models with
different characteristics: the audio spectrogram transformer and wav2vec 2.0.
The proposed IPET framework exhibits remarkable performance compared to the
fine-tuning method, with fewer trainable parameters, across four downstream tasks:
sound event classification, music genre classification, keyword spotting, and
speaker verification. Furthermore, the authors identify and analyze the
shortcomings of the IPET framework, providing lessons and research directions
for parameter-efficient tuning in the audio domain. Comment: 5 pages, 3 figures, submitted to ICASSP202
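The two components IPET aggregates can be sketched generically: a bottleneck adapter inserted after a frozen backbone layer, and an embedding prompt prepended to the token sequence. This is a minimal sketch of the generic forms of these techniques, not IPET's exact design; the class name, dimensions, and zero initializations are assumptions for illustration:

```python
import numpy as np

class Adapter:
    """Bottleneck adapter: down-project, non-linearity, up-project, residual add.
    Only these small matrices are trained; the backbone stays frozen."""
    def __init__(self, dim: int, bottleneck: int, seed: int = 0):
        rng = np.random.default_rng(seed)
        self.down = rng.normal(scale=0.02, size=(dim, bottleneck))
        self.up = np.zeros((bottleneck, dim))  # zero-init: adapter starts as identity

    def __call__(self, x: np.ndarray) -> np.ndarray:
        hidden = np.maximum(x @ self.down, 0.0)  # ReLU bottleneck
        return x + hidden @ self.up              # residual connection

x = np.ones((2, 768))               # (tokens, hidden) from a frozen backbone layer
adapter = Adapter(dim=768, bottleneck=16)
out = adapter(x)                    # identity at initialization

prompt = np.zeros((4, 768))         # learnable embedding prompt tokens (zeros here)
x_prompted = np.concatenate([prompt, x], axis=0)  # prepend prompts to the sequence
```

With a 768-dimensional hidden size and a bottleneck of 16, the adapter trains roughly 2 × 768 × 16 parameters per layer instead of the backbone's millions.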
One-Step Knowledge Distillation and Fine-Tuning in Using Large Pre-Trained Self-Supervised Learning Models for Speaker Verification
The application of speech self-supervised learning (SSL) models has achieved
remarkable performance in speaker verification (SV). However, there is a
computational cost hurdle in employing them, which makes development and
deployment difficult. Several studies have simply compressed SSL models through
knowledge distillation (KD) without considering the target task. Consequently,
these methods could not extract SV-tailored features. This paper suggests
One-Step Knowledge Distillation and Fine-Tuning (OS-KDFT), which incorporates
KD and fine-tuning (FT). We optimize a student model for SV during KD training
to avoid distilling information inappropriate for SV. OS-KDFT reduces the size
of a Wav2Vec 2.0-based ECAPA-TDNN by approximately 76.2% and the SSL model's
inference time by 79% while achieving an EER of 0.98%. The proposed OS-KDFT is
validated on the VoxCeleb1 and VoxCeleb2 datasets with the W2V2 and HuBERT SSL
models. Experiments are available on our GitHub.
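The one-step idea of optimizing for distillation and the target task jointly can be sketched as a single combined objective. The abstract does not specify the exact loss composition, so the weighted sum, the mean-squared distillation term, and the cosine-based SV term below are all assumptions for illustration:

```python
import numpy as np

def kd_loss(student_emb: np.ndarray, teacher_emb: np.ndarray) -> float:
    """Distillation term: mean squared distance between student and teacher embeddings."""
    return float(np.mean((student_emb - teacher_emb) ** 2))

def sv_loss(student_emb: np.ndarray, same_speaker_emb: np.ndarray) -> float:
    """Task term (stand-in): pull same-speaker embeddings together via cosine distance."""
    cos = np.dot(student_emb, same_speaker_emb) / (
        np.linalg.norm(student_emb) * np.linalg.norm(same_speaker_emb))
    return float(1.0 - cos)

def os_kdft_loss(student_emb, teacher_emb, same_speaker_emb, alpha: float = 0.5) -> float:
    """Single objective combining distillation and SV fine-tuning in one training step."""
    return (alpha * kd_loss(student_emb, teacher_emb)
            + (1 - alpha) * sv_loss(student_emb, same_speaker_emb))
```

Training against one combined objective is what lets the student absorb the teacher's representations while still shaping them for SV, instead of compressing first and adapting afterwards.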
PAS: Partial Additive Speech Data Augmentation Method for Noise Robust Speaker Verification
Background noise reduces speech intelligibility and quality, making speaker
verification (SV) in noisy environments a challenging task. To improve the
noise robustness of SV systems, additive noise data augmentation has commonly
been used. In this paper, we propose a new additive noise method,
partial additive speech (PAS), which aims to train SV systems to be less
affected by noisy environments. The experimental results demonstrate that PAS
outperforms traditional additive noise in terms of equal error rates (EER),
with relative improvements of 4.64% and 5.01% observed for SE-ResNet34 and
ECAPA-TDNN, respectively. We also show the effectiveness of the proposed method
by analyzing attention modules and visualizing speaker embeddings. Comment: 5 pages, 2 figures, 1 table, accepted to CKAIA2023 as a conference paper
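The core PAS operation, adding noise to only part of the utterance rather than all of it, can be sketched as follows. This is a simplified stand-in: how the segment position, length, and noise scaling are chosen in the actual method is not specified in the abstract, so the fixed parameters here are illustrative:

```python
import numpy as np

def partial_additive_speech(speech: np.ndarray, noise: np.ndarray,
                            start: int, length: int) -> np.ndarray:
    """Add noise to only a chosen segment of the utterance, leaving the rest clean."""
    augmented = speech.copy()
    end = min(start + length, len(speech))
    augmented[start:end] += noise[: end - start]
    return augmented

speech = np.zeros(16000)            # 1 s of "speech" at 16 kHz (silence, for illustration)
noise = np.full(16000, 0.1)         # constant-amplitude noise
aug = partial_additive_speech(speech, noise, start=4000, length=8000)
```

In contrast, traditional additive augmentation would corrupt the entire waveform; leaving clean regions in each training utterance is what the partial scheme changes.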
Aerobic training with rhythmic functional movement: Influence on cardiopulmonary function, functional movement and Quality of life in the elderly women
The purpose of this study was to investigate the effects of rhythmic aerobic exercise in elderly women. Thirty subjects were recruited and randomly divided into two groups: aerobic exercise with rhythm (experimental group, n=9) and aerobic exercise without rhythm (control group, n=10). All subjects performed aerobic exercise composed of functional movements. During the exercise, control-group subjects performed the functional movements only to a beat, without music or rhythm, while experimental-group subjects performed the same movements to the rhythm of music. All subjects exercised for 50 minutes, twice a week, for a total of 8 weeks. Forced vital capacity (FVC), forced expiratory volume in one second (FEV1), and maximal voluntary ventilation (MVV) were measured. Functional movement was assessed using the Functional Movement Screen (FMS), and quality of life (QOL) was assessed using the SF-36. Evaluation was performed before and after the 8 weeks of exercise, and again one month later at follow-up. Two-way repeated-measures analysis showed a significant effect of time on FVC, FEV1, MVV, FMS, and SF-36 scores. The post-test mean changes in FEV1, MVV, FMS, and SF-36 differed significantly between groups. In this study, aerobic exercise composed of rhythmic functional movements helped improve functional movement and QOL in elderly women; compared with the control group performing the same functional movements, the improvement in the experimental group exercising with music and rhythm was greater. This research was supported by the Daejeon University fund (2017).
Dibenzoatobis[3-(pyrrol-1-ylmethyl)pyridine]zinc(II)
In the title compound, [Zn(C7H5O2)2(C10H10N2)2], the ZnII ion, located on a twofold axis, is coordinated by two N atoms from two 3-(pyrrol-1-ylmethyl)pyridine ligands and two O atoms from two benzoate ligands in a distorted tetrahedral geometry. The pyridine and the pyrrole rings are nearly perpendicular to each other, making a dihedral angle of 84.83 (7)°.
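A dihedral angle between two ring planes, such as the one reported above, is the angle between the planes' normal vectors. A minimal sketch, using made-up normals for two nearly perpendicular planes rather than the actual atomic coordinates of the title compound:

```python
import numpy as np

def dihedral_angle_deg(normal_a: np.ndarray, normal_b: np.ndarray) -> float:
    """Angle in degrees between two planes, given their (not necessarily unit) normals.

    The absolute value of the cosine folds the result into [0°, 90°], the
    convention used for interplanar angles.
    """
    cos = abs(np.dot(normal_a, normal_b)
              / (np.linalg.norm(normal_a) * np.linalg.norm(normal_b)))
    return float(np.degrees(np.arccos(np.clip(cos, -1.0, 1.0))))

# Two planes whose normals are nearly orthogonal:
angle = dihedral_angle_deg(np.array([1.0, 0.0, 0.0]), np.array([0.09, 1.0, 0.0]))
```

In practice each ring's normal would first be obtained by least-squares fitting a plane through its atoms.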