User-Level Differential Privacy against Attribute Inference Attack of Speech Emotion Recognition in Federated Learning
Many existing privacy-enhanced speech emotion recognition (SER) frameworks
focus on perturbing the original speech data through adversarial training
within a centralized machine learning setup. However, this privacy protection
scheme can fail since the adversary can still access the perturbed data. In
recent years, distributed learning algorithms, especially federated learning
(FL), have gained popularity to protect privacy in machine learning
applications. While FL provides good intuition to safeguard privacy by keeping
the data on local devices, prior work has shown that privacy attacks, such as
attribute inference attacks, are achievable for SER systems trained using FL.
In this work, we propose to evaluate the user-level differential privacy (UDP)
in mitigating the privacy leaks of the SER system in FL. UDP provides
theoretical privacy guarantees with privacy parameters ε and δ.
Our results show that the UDP can effectively decrease attribute information
leakage while keeping the utility of the SER system with the adversary
accessing one model update. However, the efficacy of the UDP suffers when the
FL system leaks more model updates to the adversary. We make the code publicly
available to reproduce the results in
https://github.com/usc-sail/fed-ser-leakage
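A minimal sketch of the clip-and-noise step commonly used to obtain user-level differential privacy in FL, assuming the standard Gaussian mechanism; this is not taken from the linked repository. The names `clip_update`, `privatize_update`, `clip_norm`, and `noise_multiplier` are hypothetical, and mapping the noise level to a concrete (ε, δ) requires a separate privacy accountant that is not shown.

```python
import math
import random

def clip_update(update, clip_norm):
    """Scale a client's model update so its L2 norm is at most clip_norm."""
    norm = math.sqrt(sum(x * x for x in update))
    scale = min(1.0, clip_norm / norm) if norm > 0 else 1.0
    return [x * scale for x in update]

def privatize_update(update, clip_norm, noise_multiplier, rng):
    """Clip the update, then add Gaussian noise with std = noise_multiplier * clip_norm."""
    clipped = clip_update(update, clip_norm)
    sigma = noise_multiplier * clip_norm
    return [x + rng.gauss(0.0, sigma) for x in clipped]

rng = random.Random(0)            # seeded only to make the sketch repeatable
raw_update = [3.0, 4.0]           # toy update with L2 norm 5
clipped = clip_update(raw_update, clip_norm=1.0)
noisy = privatize_update(raw_update, clip_norm=1.0, noise_multiplier=0.5, rng=rng)
```

Clipping bounds how much any single user can move the shared model, and the noise is scaled to that bound; a larger `noise_multiplier` gives stronger privacy at the cost of utility.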
Learning Behavioral Representations of Routines From Large-scale Unlabeled Wearable Time-series Data Streams using Hawkes Point Process
Continuously-worn wearable sensors enable researchers to collect copious
amounts of rich bio-behavioral time series recordings of real-life activities
of daily living, offering unprecedented opportunities to infer novel human
behavior patterns during daily routines. Existing approaches to routine
discovery through bio-behavioral data rely either on pre-defined notions of
activities or use additional non-behavioral measurements as contexts, such as
GPS location or localization within the home, presenting risks to user privacy.
In this work, we propose a novel wearable time-series mining framework, Hawkes
point process On Time series clusters for ROutine Discovery (HOT-ROD), for
uncovering behavioral routines from completely unlabeled wearable recordings.
We utilize a covariance-based method to generate time-series clusters and
discover routines via the Hawkes point process learning algorithm. We
empirically validate our approach for extracting routine behaviors using a
completely unlabeled time-series collected continuously from over 100
individuals both in and outside of the workplace during a period of ten weeks.
Furthermore, we demonstrate this approach intuitively captures daily
transitional relationships between physical activity states without using prior
knowledge. We also show that the learned behavioral patterns can assist in
illuminating an individual's personality and affect.
Comment: 2023 9th ACM SIGKDD International Workshop on Mining and Learning
From Time Series (MiLeTS 2023)
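For readers unfamiliar with Hawkes point processes, the sketch below evaluates the conditional intensity of a univariate process with an exponential kernel, λ(t) = μ + Σ α·exp(−β(t − t_i)) over past events t_i < t. The exponential kernel, the univariate simplification, and the function name `hawkes_intensity` are assumptions for illustration; the abstract does not specify the kernel HOT-ROD uses.

```python
import math

def hawkes_intensity(t, event_times, mu, alpha, beta):
    """Conditional intensity at time t given strictly earlier events.

    mu    : baseline rate
    alpha : jump in intensity caused by each past event
    beta  : exponential decay rate of that excitation
    """
    excitation = sum(
        alpha * math.exp(-beta * (t - ti)) for ti in event_times if ti < t
    )
    return mu + excitation

baseline_only = hawkes_intensity(5.0, [], mu=0.2, alpha=1.0, beta=1.0)
after_one_event = hawkes_intensity(1.0, [0.0], mu=0.2, alpha=1.0, beta=1.0)
```

In a multivariate setting, each pair of event types (for example, two time-series clusters) would get its own excitation term, which is what lets the model express transitions between states.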
Emotion-Aligned Contrastive Learning Between Images and Music
Traditional music search engines rely on retrieval methods that match natural
language queries with music metadata. There have been increasing efforts to
expand retrieval methods to consider the audio characteristics of music itself,
using queries of various modalities including text, video, and speech. Most
approaches aim to match general music semantics to the input queries, while
only a few focus on affective qualities. We address the task of retrieving
emotionally-relevant music from image queries by proposing a framework for
learning an affective alignment between images and music audio. Our approach
focuses on learning an emotion-aligned joint embedding space between images and
music. This joint embedding space is learned via emotion-supervised contrastive
learning, using an adapted cross-modal version of the SupCon loss. We directly
evaluate the joint embeddings with cross-modal retrieval tasks (image-to-music
and music-to-image) based on emotion labels. In addition, we investigate the
generalizability of the learned music embeddings with automatic music tagging
as a downstream task. Our experiments show that our approach successfully
aligns images and music, and that the learned embedding space is effective for
cross-modal retrieval applications.
Comment: Under review
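The following is a rough sketch of a cross-modal supervised contrastive loss in the image-to-music direction, assuming each image anchor treats every music clip with the same emotion label as a positive. The function name `cross_modal_supcon`, the temperature value, and the single-direction formulation are assumptions; the authors' adapted SupCon loss may differ in its details.

```python
import math

def normalize(v):
    """Return v scaled to unit L2 norm."""
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v]

def cross_modal_supcon(image_emb, image_labels, music_emb, music_labels,
                       temperature=0.1):
    """Average image-to-music supervised contrastive loss."""
    imgs = [normalize(v) for v in image_emb]
    mus = [normalize(v) for v in music_emb]
    total, counted = 0.0, 0
    for zi, yi in zip(imgs, image_labels):
        # Temperature-scaled cosine similarity to every music embedding.
        logits = [sum(a * b for a, b in zip(zi, zm)) / temperature for zm in mus]
        # Numerically stable log-sum-exp over all music candidates.
        m = max(logits)
        log_denom = m + math.log(sum(math.exp(l - m) for l in logits))
        positives = [l for l, ym in zip(logits, music_labels) if ym == yi]
        if not positives:
            continue  # anchors without a same-emotion music clip are skipped
        total += -sum(l - log_denom for l in positives) / len(positives)
        counted += 1
    return total / counted if counted else 0.0

images = [[1.0, 0.0], [0.0, 1.0]]
music = [[1.0, 0.0], [0.0, 1.0]]
aligned = cross_modal_supcon(images, ["happy", "sad"], music, ["happy", "sad"])
misaligned = cross_modal_supcon(images, ["happy", "sad"], music, ["sad", "happy"])
```

The loss is small when same-emotion image and music embeddings point in the same direction and large when they do not, which is the property an emotion-aligned joint space needs for retrieval.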
Unlocking Foundation Models for Privacy-Enhancing Speech Understanding: An Early Study on Low Resource Speech Training Leveraging Label-guided Synthetic Speech Content
Automatic Speech Understanding (ASU) leverages the power of deep learning
models for accurate interpretation of human speech, leading to a wide range of
speech applications that enrich the human experience. However, training a
robust ASU model requires the curation of a large number of speech samples,
creating risks for privacy breaches. In this work, we investigate using
foundation models to assist privacy-enhancing speech computing. Unlike
conventional works focusing primarily on data perturbation or distributed
algorithms, our work studies the possibilities of using pre-trained generative
models to synthesize speech content as training data with just label guidance.
We show that zero-shot learning with training label-guided synthetic speech
content remains a challenging task. On the other hand, our results demonstrate
that the model trained with synthetic speech samples provides an effective
initialization point for low-resource ASU training. This result reveals the
potential to enhance privacy by reducing user data collection but using
label-guided synthetic speech content.
Electromagnetic Scattering of Electrically Large Ship above Sea Surface with SBR-SDFM Method
A hybrid scheme combining the shooting and bouncing ray (SBR) method with the semi-deterministic facet model (SDFM) is proposed to analyze composite scattering from a ship-ocean scene. The model can handle the complex electromagnetic interaction between the ship and the sea surface, so the scattering properties of composite ship-ocean scenes can be studied efficiently under various parameters, such as incident angle and wind speed. These properties matter for target detection and high-resolution radar imaging in sea environments. The accuracy and performance of the method are validated by comparison with the multilevel fast multipole method in FEKO for electrically small objects. All simulation results indicate that the proposed method is suitable for preliminary radar cross section prediction of an electrically large composite model.
Dense-Localizing Audio-Visual Events in Untrimmed Videos: A Large-Scale Benchmark and Baseline
Existing audio-visual event localization (AVE) handles manually trimmed
videos with only a single instance in each of them. However, this setting is
unrealistic as natural videos often contain numerous audio-visual events with
different categories. To better adapt to real-life applications, in this paper
we focus on the task of dense-localizing audio-visual events, which aims to
jointly localize and recognize all audio-visual events occurring in an
untrimmed video. The problem is challenging as it requires fine-grained
audio-visual scene and context understanding. To tackle this problem, we
introduce the first Untrimmed Audio-Visual (UnAV-100) dataset, which contains
10K untrimmed videos with over 30K audio-visual events. Each video has 2.8
audio-visual events on average, and the events are usually related to each
other and might co-occur as in real-life scenes. Next, we formulate the task
using a new learning-based framework, which is capable of fully integrating
audio and visual modalities to localize audio-visual events with various
lengths and capture dependencies between them in a single pass. Extensive
experiments demonstrate the effectiveness of our method as well as the
significance of multi-scale cross-modal perception and dependency modeling for
this task.
Comment: Accepted by CVPR202
LOG-LIO: A LiDAR-Inertial Odometry with Efficient Local Geometric Information Estimation
Local geometric information, i.e. normal and distribution of points, is
crucial for LiDAR-based simultaneous localization and mapping (SLAM) because it
provides constraints for data association, which further determines the
direction of optimization and ultimately affects the accuracy of localization.
However, estimating normal and distribution of points are time-consuming tasks
even with the assistance of kdtree or volumetric maps. To achieve fast normal
estimation, we look into the structure of LiDAR scan and propose a ring-based
fast approximate least squares (Ring FALS) method. With the Ring structural
information, estimating the normal requires only the range information of the
points when a new scan arrives. To efficiently estimate the distribution of
points, we extend the ikd-tree to manage the map in voxels and update the
distribution of points in each voxel incrementally while maintaining its
consistency with the normal estimation. We further fix the distribution after
its convergence to balance the time consumption and the correctness of
representation. Based on the extracted and maintained local geometric
information, we devise a robust and accurate hierarchical data association
scheme where point-to-surfel association is prioritized over point-to-plane.
Extensive experiments on diverse public datasets demonstrate the advantages of
our system compared to other state-of-the-art methods. Our open source
implementation is available at https://github.com/tiev-tongji/LOG-LIO.
Comment: 8 pages, 4 figures
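To illustrate what updating the distribution of points in a voxel incrementally can look like, here is a minimal sketch using a Welford-style running mean and covariance. The class name `VoxelStats` and its methods are hypothetical, and this is not taken from the LOG-LIO implementation, which may maintain its statistics differently.

```python
class VoxelStats:
    """Running mean and covariance of the 3-D points that fall in one voxel."""

    def __init__(self):
        self.n = 0
        self.mean = [0.0, 0.0, 0.0]
        # Accumulated sum of outer products of deviations from the mean.
        self.m2 = [[0.0] * 3 for _ in range(3)]

    def add(self, p):
        """Fold one new point into the statistics in O(1) time."""
        self.n += 1
        delta_old = [p[i] - self.mean[i] for i in range(3)]
        self.mean = [self.mean[i] + delta_old[i] / self.n for i in range(3)]
        delta_new = [p[i] - self.mean[i] for i in range(3)]
        for i in range(3):
            for j in range(3):
                self.m2[i][j] += delta_old[i] * delta_new[j]

    def covariance(self):
        """Sample covariance; needs at least two points."""
        if self.n < 2:
            raise ValueError("covariance needs at least two points")
        return [[v / (self.n - 1) for v in row] for row in self.m2]

voxel = VoxelStats()
for point in [(0.0, 0.0, 0.0), (2.0, 0.0, 0.0), (0.0, 2.0, 0.0), (2.0, 2.0, 0.0)]:
    voxel.add(point)
cov = voxel.covariance()
```

Because each update touches only the stored mean and the 3x3 accumulator, no point history has to be kept, and freezing the statistics after convergence simply means no longer calling `add` for that voxel.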
Scale Estimation with Dual Quadrics for Monocular Object SLAM
The scale ambiguity problem is inherently unsolvable for monocular SLAM
without the metric baseline between moving cameras. In this paper, we present a
novel scale estimation approach based on an object-level SLAM system. To obtain
the absolute scale of the reconstructed map, we derive a nonlinear optimization
method that makes the scaled dimensions of objects conform to the distribution
of their sizes in the physical world, without relying on any prior information
of gravity direction. We adopt the dual quadric to represent objects for its
ability to fit objects compactly and accurately. In the proposed monocular
object-level SLAM system, dual quadrics are quickly initialized based on
constraints of 2-D detections and fitted oriented bounding boxes and are further
optimized to provide reliable dimensions for scale estimation.
Comment: 8 pages, 6 figures, accepted by IROS202
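As a simplified stand-in for the idea of recovering absolute scale from object sizes, the sketch below solves a linear least-squares problem in closed form: it finds the scale s that minimizes Σ (s·d_i − p_i)², where d_i is an object dimension in the unscaled map and p_i is a typical real-world size. The paper describes a nonlinear optimization over size distributions, so this is only an illustration; `estimate_scale` and the sample values are hypothetical.

```python
def estimate_scale(map_sizes, prior_sizes):
    """Closed-form least-squares scale: s = sum(d * p) / sum(d * d)."""
    if len(map_sizes) != len(prior_sizes):
        raise ValueError("size lists must have equal length")
    numerator = sum(d * p for d, p in zip(map_sizes, prior_sizes))
    denominator = sum(d * d for d in map_sizes)
    if denominator == 0:
        raise ValueError("map sizes must not all be zero")
    return numerator / denominator

# Toy values: every object appears at half its assumed real-world size.
map_sizes = [0.5, 1.0, 0.25]     # dimensions measured in the unscaled map
prior_sizes = [1.0, 2.0, 0.5]    # assumed typical sizes in metres
scale = estimate_scale(map_sizes, prior_sizes)
```

Multiplying the map by `scale` brings the reconstructed objects to their expected metric sizes; a distribution-aware objective would additionally weight each object by how variable its real-world size is.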