Search CORE

824 research outputs found

User-Level Differential Privacy against Attribute Inference Attack of Speech Emotion Recognition in Federated Learning

Authors: Feng Tiantian; Narayanan Shrikanth; Peri Raghuveer
Publication date: 05/04/2022

Many existing privacy-enhanced speech emotion recognition (SER) frameworks focus on perturbing the original speech data through adversarial training within a centralized machine learning setup. However, this privacy protection scheme can fail since the adversary can still access the perturbed data. In recent years, distributed learning algorithms, especially federated learning (FL), have gained popularity to protect privacy in machine learning applications. While FL provides good intuition to safeguard privacy by keeping the data on local devices, prior work has shown that privacy attacks, such as attribute inference attacks, are achievable for SER systems trained using FL. In this work, we propose to evaluate the user-level differential privacy (UDP) in mitigating the privacy leaks of the SER system in FL. UDP provides theoretical privacy guarantees with privacy parameters $\epsilon$ and $\delta$. Our results show that the UDP can effectively decrease attribute information leakage while keeping the utility of the SER system with the adversary accessing one model update. However, the efficacy of the UDP suffers when the FL system leaks more model updates to the adversary. We make the code publicly available to reproduce the results in https://github.com/usc-sail/fed-ser-leakage
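As a rough illustration of how an ($\epsilon$, $\delta$) guarantee is commonly attached to a single user's model update, the sketch below clips the update to a fixed L2 norm and adds Gaussian noise. It is a generic clip-and-noise example, not the code from the linked repository. The function names, the use of the classic Gaussian-mechanism noise scale (valid for $0 < \epsilon < 1$), and the choice of the clipping bound as the sensitivity are all assumptions made for illustration.

```python
import math
import random


def clip_update(update, clip_norm):
    """Scale the update so that its L2 norm is at most clip_norm."""
    norm = math.sqrt(sum(x * x for x in update))
    scale = min(1.0, clip_norm / norm) if norm > 0 else 1.0
    return [x * scale for x in update]


def gaussian_sigma(epsilon, delta, sensitivity):
    """Noise scale of the classic Gaussian mechanism (for 0 < epsilon < 1)."""
    return sensitivity * math.sqrt(2.0 * math.log(1.25 / delta)) / epsilon


def privatize_update(update, clip_norm, epsilon, delta, rng):
    """Clip one user's model update, then add i.i.d. Gaussian noise."""
    clipped = clip_update(update, clip_norm)
    # Assumption: the sensitivity is taken to be the clipping bound.
    sigma = gaussian_sigma(epsilon, delta, sensitivity=clip_norm)
    return [x + rng.gauss(0.0, sigma) for x in clipped]


rng = random.Random(0)
raw_update = [3.0, 4.0]  # toy two-parameter update with L2 norm 5
noisy_update = privatize_update(
    raw_update, clip_norm=1.0, epsilon=0.5, delta=1e-5, rng=rng
)
```

Smaller $\epsilon$ or $\delta$ values increase the noise scale, which is the usual trade-off between privacy and model utility.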

arXiv.org e-Print Archive

Learning Behavioral Representations of Routines From Large-scale Unlabeled Wearable Time-series Data Streams using Hawkes Point Process

Authors: Booth Brandon M; Feng Tiantian; Narayanan Shrikanth
Publication date: 10/07/2023

Continuously-worn wearable sensors enable researchers to collect copious amounts of rich bio-behavioral time series recordings of real-life activities of daily living, offering unprecedented opportunities to infer novel human behavior patterns during daily routines. Existing approaches to routine discovery through bio-behavioral data rely either on pre-defined notions of activities or use additional non-behavioral measurements as contexts, such as GPS location or localization within the home, presenting risks to user privacy. In this work, we propose a novel wearable time-series mining framework, Hawkes point process On Time series clusters for ROutine Discovery (HOT-ROD), for uncovering behavioral routines from completely unlabeled wearable recordings. We utilize a covariance-based method to generate time-series clusters and discover routines via the Hawkes point process learning algorithm. We empirically validate our approach for extracting routine behaviors using a completely unlabeled time-series collected continuously from over 100 individuals both in and outside of the workplace during a period of ten weeks. Furthermore, we demonstrate this approach intuitively captures daily transitional relationships between physical activity states without using prior knowledge. We also show that the learned behavioral patterns can assist in illuminating an individual's personality and affect.

Comment: 2023 9th ACM SIGKDD International Workshop on Mining and Learning From Time Series (MiLeTS 2023)
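For readers unfamiliar with Hawkes point processes, the sketch below evaluates the conditional intensity of a multivariate Hawkes process, in which each past event temporarily raises the rate of later events. The exponential decay kernel, the function name, and the toy parameter values are assumptions chosen for illustration. The abstract does not specify the kernel or the parameterization used in HOT-ROD.

```python
import math


def hawkes_intensity(t, target, events, baseline, alpha, beta):
    """Conditional intensity of event type `target` at time t.

    events:      list of (time, event_type) pairs
    baseline[i]: background rate of event type i
    alpha[i][j]: excitation that a past type-j event adds to type i
    beta:        decay rate of the (assumed) exponential kernel
    """
    rate = baseline[target]
    for t_k, source in events:
        if t_k < t:  # only events strictly in the past excite the process
            rate += alpha[target][source] * math.exp(-beta * (t - t_k))
    return rate


# Toy example: two event types, e.g. transitions into two activity clusters.
events = [(0.0, 0), (1.0, 1)]
baseline = [0.1, 0.2]
alpha = [[0.0, 0.5], [0.8, 0.0]]
rate = hawkes_intensity(2.0, 0, events, baseline, alpha, beta=1.0)
```

A large `alpha[i][j]` means that events of type `j` tend to be followed by events of type `i`, which is the kind of transitional relationship such a model can expose.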

arXiv.org e-Print Archive

Emotion-Aligned Contrastive Learning Between Images and Music

Authors: Avramidis Kleanthis; Feng Tiantian; Narayanan Shrikanth; Stewart Shanti
Publication date: 24/08/2023

Traditional music search engines rely on retrieval methods that match natural language queries with music metadata. There have been increasing efforts to expand retrieval methods to consider the audio characteristics of music itself, using queries of various modalities including text, video, and speech. Most approaches aim to match general music semantics to the input queries, while only a few focus on affective qualities. We address the task of retrieving emotionally-relevant music from image queries by proposing a framework for learning an affective alignment between images and music audio. Our approach focuses on learning an emotion-aligned joint embedding space between images and music. This joint embedding space is learned via emotion-supervised contrastive learning, using an adapted cross-modal version of the SupCon loss. We directly evaluate the joint embeddings with cross-modal retrieval tasks (image-to-music and music-to-image) based on emotion labels. In addition, we investigate the generalizability of the learned music embeddings with automatic music tagging as a downstream task. Our experiments show that our approach successfully aligns images and music, and that the learned embedding space is effective for cross-modal retrieval applications.

Comment: Under review
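The idea of an emotion-supervised, cross-modal contrastive loss can be sketched as follows: each image embedding is an anchor, music clips with the same emotion label are positives, and all music clips in the batch form the contrast set. This is only one plausible reading of a cross-modal SupCon loss. The function names, the image-to-music direction, the temperature value, and the assumption of nonzero embeddings are illustrative, and the paper's adaptation may differ.

```python
import math


def normalize(v):
    """Return v scaled to unit L2 norm (v is assumed to be nonzero)."""
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v]


def cross_modal_supcon(img_emb, img_labels, mus_emb, mus_labels, temperature=0.1):
    """Image-to-music supervised contrastive loss, averaged over image anchors."""
    imgs = [normalize(v) for v in img_emb]
    muss = [normalize(v) for v in mus_emb]
    total, n_anchors = 0.0, 0
    for z, label in zip(imgs, img_labels):
        # Cosine similarity to every music clip, scaled by the temperature.
        logits = [sum(a * b for a, b in zip(z, m)) / temperature for m in muss]
        # Numerically stable log-sum-exp over the whole contrast set.
        max_logit = max(logits)
        log_denom = max_logit + math.log(sum(math.exp(l - max_logit) for l in logits))
        # Positives are the music clips that share the anchor's emotion label.
        positives = [l for l, y in zip(logits, mus_labels) if y == label]
        if not positives:
            continue  # anchors without any positive are skipped
        total += -sum(l - log_denom for l in positives) / len(positives)
        n_anchors += 1
    return total / n_anchors if n_anchors else 0.0


images = [[1.0, 0.0], [0.0, 1.0]]
music = [[1.0, 0.0], [0.0, 1.0]]
aligned = cross_modal_supcon(images, ["happy", "sad"], music, ["happy", "sad"])
misaligned = cross_modal_supcon(images, ["happy", "sad"], music, ["sad", "happy"])
```

In this toy batch the loss is close to zero when same-emotion pairs point in the same direction and large when they do not. A symmetric music-to-image term could be added in the same way.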

arXiv.org e-Print Archive

Unlocking Foundation Models for Privacy-Enhancing Speech Understanding: An Early Study on Low Resource Speech Training Leveraging Label-guided Synthetic Speech Content

Authors: Bose Digbalay; Feng Tiantian; Narayanan Shrikanth; Shi Xuan
Publication date: 13/06/2023

Automatic Speech Understanding (ASU) leverages the power of deep learning models for accurate interpretation of human speech, leading to a wide range of speech applications that enrich the human experience. However, training a robust ASU model requires the curation of a large number of speech samples, creating risks for privacy breaches. In this work, we investigate using foundation models to assist privacy-enhancing speech computing. Unlike conventional works focusing primarily on data perturbation or distributed algorithms, our work studies the possibilities of using pre-trained generative models to synthesize speech content as training data with just label guidance. We show that zero-shot learning with training label-guided synthetic speech content remains a challenging task. On the other hand, our results demonstrate that the model trained with synthetic speech samples provides an effective initialization point for low-resource ASU training. This result reveals the potential to enhance privacy by reducing user data collection but using label-guided synthetic speech content.

arXiv.org e-Print Archive

Electromagnetic Scattering of Electrically Large Ship above Sea Surface with SBR-SDFM Method

Authors: Lixin Guo; Tiantian Feng
Publication venue: Hindawi Limited
Publication date: 01/01/2017

A hybrid scheme combining the shooting and bouncing ray (SBR) method with a semi-deterministic facet model (SDFM) is proposed in this study to analyze composite scattering from a ship-ocean scene. The model can handle the complex electromagnetic interaction between the ship and the sea surface. Thus, the scattering properties of composite ship-ocean scenes under the influence of various parameters (such as incident angle and wind speed) can be studied and analyzed efficiently. Studying such properties is significant for target detection and high-resolution radar imaging in sea environments. The accuracy and performance of the method are validated and evaluated by comparison with the multilevel fast multipole method of FEKO for electrically small objects. All simulation results indicate that the proposed method is suitable for providing a preliminary radar cross section prediction of an electrically large composite model.

Crossref

Directory of Open Access Journals

Dense-Localizing Audio-Visual Events in Untrimmed Videos: A Large-Scale Benchmark and Baseline

Authors: Cong Runmin; Duan Jinming; Geng Tiantian; Wang Teng; Zheng Feng
Publication date: 24/03/2023

Existing audio-visual event localization (AVE) handles manually trimmed videos with only a single instance in each of them. However, this setting is unrealistic as natural videos often contain numerous audio-visual events with different categories. To better adapt to real-life applications, in this paper we focus on the task of dense-localizing audio-visual events, which aims to jointly localize and recognize all audio-visual events occurring in an untrimmed video. The problem is challenging as it requires fine-grained audio-visual scene and context understanding. To tackle this problem, we introduce the first Untrimmed Audio-Visual (UnAV-100) dataset, which contains 10K untrimmed videos with over 30K audio-visual events. Each video has 2.8 audio-visual events on average, and the events are usually related to each other and might co-occur as in real-life scenes. Next, we formulate the task using a new learning-based framework, which is capable of fully integrating audio and visual modalities to localize audio-visual events with various lengths and capture dependencies between them in a single pass. Extensive experiments demonstrate the effectiveness of our method as well as the significance of multi-scale cross-modal perception and dependency modeling for this task.

Comment: Accepted by CVPR202

arXiv.org e-Print Archive

LOG-LIO: A LiDAR-Inertial Odometry with Efficient Local Geometric Information Estimation

Authors: Feng Tiantian; Huang Kai; Ye Chen; Zhao Junqiao; Zhu Zhongyang
Publication date: 13/08/2023

Local geometric information, i.e. normal and distribution of points, is crucial for LiDAR-based simultaneous localization and mapping (SLAM) because it provides constraints for data association, which further determines the direction of optimization and ultimately affects the accuracy of localization. However, estimating normal and distribution of points are time-consuming tasks even with the assistance of kdtree or volumetric maps. To achieve fast normal estimation, we look into the structure of LiDAR scan and propose a ring-based fast approximate least squares (Ring FALS) method. With the Ring structural information, estimating the normal requires only the range information of the points when a new scan arrives. To efficiently estimate the distribution of points, we extend the ikd-tree to manage the map in voxels and update the distribution of points in each voxel incrementally while maintaining its consistency with the normal estimation. We further fix the distribution after its convergence to balance the time consumption and the correctness of representation. Based on the extracted and maintained local geometric information, we devise a robust and accurate hierarchical data association scheme where point-to-surfel association is prioritized over point-to-plane. Extensive experiments on diverse public datasets demonstrate the advantages of our system compared to other state-of-the-art methods. Our open source implementation is available at https://github.com/tiev-tongji/LOG-LIO.

Comment: 8 pages, 4 figures

arXiv.org e-Print Archive

Scale Estimation with Dual Quadrics for Monocular Object SLAM

Authors: Feng Tiantian; Song Shuangfu; Xiong Lu; Ye Chen; Zhao Junqiao
Publication date: 02/11/2022

The scale ambiguity problem is inherently unsolvable for monocular SLAM without a metric baseline between moving cameras. In this paper, we present a novel scale estimation approach based on an object-level SLAM system. To obtain the absolute scale of the reconstructed map, we derive a nonlinear optimization method that makes the scaled dimensions of objects conform to the distribution of their sizes in the physical world, without relying on any prior information about the gravity direction. We adopt the dual quadric to represent objects for its ability to fit objects compactly and accurately. In the proposed monocular object-level SLAM system, dual quadrics are quickly initialized based on constraints from 2-D detections and fitted oriented bounding boxes, and are further optimized to provide reliable dimensions for scale estimation.

Comment: 8 pages, 6 figures, accepted by IROS202

arXiv.org e-Print Archive