Analysis and Forecasting of Trending Topics in Online Media Streams
Among the vast information available on the web, social media streams capture
what people currently pay attention to and how they feel about certain topics.
Awareness of such trending topics plays a crucial role in multimedia systems
such as trend aware recommendation and automatic vocabulary selection for video
concept detection systems.
Correctly utilizing trending topics requires a better understanding of their
various characteristics in different social media streams. To this end, we
present the first comprehensive study across three major online and social
media streams, Twitter, Google, and Wikipedia, covering thousands of trending
topics during an observation period of an entire year. Our results indicate
that, depending on one's requirements, one does not necessarily have to turn to
Twitter for information about current events, and that some media streams
strongly emphasize content of specific categories. As our second key
contribution, we further present a novel approach for the challenging task of
forecasting the life cycle of trending topics in the very moment they emerge.
Our fully automated approach is based on a nearest neighbor forecasting
technique exploiting our assumption that semantically similar topics exhibit
similar behavior.
We demonstrate on a large-scale dataset of Wikipedia page view statistics
that forecasts by the proposed approach are about 9-48k views closer to the
actual viewing statistics compared to baseline methods and achieve a mean
average percentage error of 45-19% for time periods of up to 14 days.
Comment: ACM Multimedia 201
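The abstract does not spell out the forecasting procedure, but a minimal sketch of
the nearest-neighbor idea it describes might look as follows, assuming each topic
comes with a semantic embedding vector and a historical daily view-count curve
(all names and shapes are illustrative assumptions, not the authors' code):

```python
import numpy as np

def knn_forecast(query_emb, history, topic_embs, topic_series, k=5):
    """Forecast an emerging topic's view-count curve from its k semantically
    nearest neighbours (illustrative sketch, not the authors' implementation).

    query_emb    : (d,) semantic vector of the emerging topic
    history      : (t,) views observed so far for the emerging topic
    topic_embs   : (n, d) semantic vectors of previously observed topics
    topic_series : (n, T) complete view-count curves of those topics
    """
    # cosine similarity between the emerging topic and all known topics
    sims = topic_embs @ query_emb / (
        np.linalg.norm(topic_embs, axis=1) * np.linalg.norm(query_emb) + 1e-9
    )
    neighbours = np.argsort(-sims)[:k]

    # rescale each neighbour's curve to the views observed so far, then
    # average the rescaled curves as the forecast for the full life cycle
    t = len(history)
    scales = history.sum() / (topic_series[neighbours, :t].sum(axis=1) + 1e-9)
    return (topic_series[neighbours] * scales[:, None]).mean(axis=0)
```

Rescaling each neighbour's curve to the views observed so far is one simple way to
anchor the forecast to the emerging topic; the paper's actual similarity measure and
weighting may differ.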
FedTabDiff: Federated Learning of Diffusion Probabilistic Models for Synthetic Mixed-Type Tabular Data Generation
Realistic synthetic tabular data generation encounters significant challenges
in preserving privacy, especially when dealing with sensitive information in
domains like finance and healthcare. In this paper, we introduce
\textit{Federated Tabular Diffusion} (FedTabDiff) for generating high-fidelity
mixed-type tabular data without centralized access to the original tabular
datasets. Leveraging the strengths of \textit{Denoising Diffusion Probabilistic
Models} (DDPMs), our approach addresses the inherent complexities in tabular
data, such as mixed attribute types and implicit relationships. More
critically, FedTabDiff realizes a decentralized learning scheme that permits
multiple entities to collaboratively train a generative model while respecting
data privacy and locality. We extend DDPMs into the federated setting for
tabular data generation, which includes a synchronous update scheme and
weighted averaging for effective model aggregation. Experimental evaluations on
real-world financial and medical datasets attest to the framework's capability
to produce synthetic data that maintains high fidelity, utility, privacy, and
coverage.
Comment: 9 pages, 2 figures, 2 tables, preprint version, currently under review
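The abstract mentions a synchronous update scheme with weighted averaging for model
aggregation. A minimal FedAvg-style sketch of such aggregation over PyTorch state
dicts could look like this (names are illustrative; the actual FedTabDiff
aggregation may differ in detail):

```python
import copy

def weighted_average(client_states, client_sizes):
    """FedAvg-style aggregation: average client model weights, weighted by
    each client's dataset size (sketch only; not the paper's exact scheme)."""
    total = float(sum(client_sizes))
    global_state = copy.deepcopy(client_states[0])
    for key in global_state:
        global_state[key] = sum(
            state[key].float() * (size / total)
            for state, size in zip(client_states, client_sizes)
        )
    return global_state

# One synchronous round (illustrative): every entity trains its local copy of
# the DDPM on private tabular data, then the server aggregates the weights:
#   global_state = weighted_average(
#       [model.state_dict() for model in local_models], client_sizes)
#   global_model.load_state_dict(global_state)
```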
Sparsified Model Zoo Twins: Investigating Populations of Sparsified Neural Network Models
With the growing size of Neural Networks (NNs), model sparsification to reduce
the computational cost and memory demand for model inference has become of
vital interest for both research and production. While many sparsification
methods have been proposed and successfully applied to individual models, to
the best of our knowledge their behavior and robustness have not yet been
studied on large populations of models. With this paper, we address that gap by
applying two popular sparsification methods to populations of models (so-called
model zoos) to create sparsified versions of the original zoos. We investigate
the performance of these two methods for each zoo, compare sparsification
layer-wise, and analyse agreement between original and sparsified populations.
We find both methods to be very robust with magnitude pruning able outperform
variational dropout with the exception of high sparsification ratios above 80%.
Further, we find that sparsified models agree to a high degree with their original
non-sparsified counterparts, and that the performance of original and sparsified
models is highly correlated. Finally, all models of the model zoos and their
sparsified model twins are publicly available: modelzoos.cc.
Comment: Accepted at ICLR 2023 Workshop on Sparsity in Neural Network
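For readers unfamiliar with the first of the two methods, a minimal sketch of global
magnitude pruning, which zeroes the smallest-magnitude weights up to a target
sparsity, is shown below; variational dropout instead learns per-weight dropout
rates during training and prunes weights whose rates are high. The sketch is
illustrative only and does not reproduce the paper's exact configuration:

```python
import torch

def magnitude_prune(model, sparsity=0.8):
    """Global magnitude pruning sketch: zero the smallest-magnitude weights
    across all weight matrices until the target sparsity is reached
    (illustrative; not the zoo paper's exact setup)."""
    all_weights = torch.cat([p.detach().abs().flatten()
                             for p in model.parameters() if p.dim() > 1])
    k = int(sparsity * all_weights.numel())
    if k == 0:
        return model
    threshold = all_weights.kthvalue(k).values
    with torch.no_grad():
        for p in model.parameters():
            if p.dim() > 1:
                # keep only weights whose magnitude exceeds the global threshold
                p.mul_((p.abs() > threshold).to(p.dtype))
    return model
```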
Fine-grained Emotional Control of Text-To-Speech: Learning To Rank Inter- And Intra-Class Emotion Intensities
State-of-the-art Text-To-Speech (TTS) models are capable of producing
high-quality speech. The generated speech, however, is usually neutral in
emotional expression, whereas very often one would want fine-grained emotional
control of words or phonemes. Although still challenging, the first TTS models
have been recently proposed that are able to control voice by manually
assigning emotion intensity. Unfortunately, due to the neglect of intra-class
distance, the intensity differences are often unrecognizable. In this paper, we
propose a fine-grained controllable emotional TTS model that considers both inter-
and intra-class distances and is able to synthesize speech with recognizable
intensity differences. Our subjective and objective experiments demonstrate that
our model exceeds two state-of-the-art controllable TTS models in
controllability, emotion expressiveness, and naturalness.
Comment: Accepted by ICASSP202
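The title refers to learning to rank inter- and intra-class emotion intensities; one
common way such ordering constraints are encoded is a pairwise margin ranking loss,
sketched below purely as an illustration (the paper's actual formulation may
differ):

```python
import torch
import torch.nn.functional as F

def intensity_ranking_loss(score_strong, score_weak, margin=0.1):
    """Encourage the predicted intensity of the 'stronger' sample to exceed
    that of the 'weaker' sample by at least `margin` (illustrative sketch).

    score_strong, score_weak : (batch,) predicted intensity scores for sample
    pairs ordered by their known relative emotion strength (e.g. emotional
    vs. neutral renderings of the same utterance).
    """
    target = torch.ones_like(score_strong)  # score_strong should rank higher
    return F.margin_ranking_loss(score_strong, score_weak, target, margin=margin)
```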
FinDiff: Diffusion Models for Financial Tabular Data Generation
The sharing of microdata, such as fund holdings and derivative instruments,
by regulatory institutions presents a unique challenge due to strict data
confidentiality and privacy regulations. These challenges often hinder the
ability of both academics and practitioners to conduct collaborative research
effectively. The emergence of generative models, particularly diffusion models,
capable of synthesizing data mimicking the underlying distributions of
real-world data presents a compelling solution. This work introduces 'FinDiff',
a diffusion model designed to generate real-world financial tabular data for a
variety of regulatory downstream tasks, such as economic scenario modeling,
stress tests, and fraud detection. The model uses embedding encodings to model
mixed-modality financial data, comprising both categorical and numeric
attributes. The performance of FinDiff in generating synthetic tabular
financial data is evaluated against state-of-the-art baseline models using
three real-world financial datasets (including two publicly available datasets
and one proprietary dataset). Empirical results demonstrate that FinDiff excels
in generating synthetic tabular financial data with high fidelity, privacy, and
utility.
Comment: 9 pages, 5 figures, 3 tables, preprint version, currently under review
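The abstract states that FinDiff uses embedding encodings for mixed-modality data; a
minimal sketch of such an encoding step, embedding categorical attributes and
concatenating them with standardized numeric attributes, might look like this
(class and parameter names are assumptions, not the paper's code):

```python
import torch
import torch.nn as nn

class MixedTypeEncoder(nn.Module):
    """Embed categorical columns and concatenate them with (already
    standardized) numeric columns into one continuous vector that a
    diffusion model can denoise (illustrative sketch)."""

    def __init__(self, cat_cardinalities, emb_dim=8):
        super().__init__()
        self.embeddings = nn.ModuleList(
            nn.Embedding(cardinality, emb_dim) for cardinality in cat_cardinalities
        )

    def forward(self, cat_codes, num_values):
        # cat_codes: (batch, n_cat) integer category codes
        # num_values: (batch, n_num) standardized numeric attributes
        embedded = [emb(cat_codes[:, i]) for i, emb in enumerate(self.embeddings)]
        return torch.cat(embedded + [num_values], dim=-1)
```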
Learning Emotional Representations from Imbalanced Speech Data for Speech Emotion Recognition and Emotional Text-to-Speech
Effective speech emotional representations play a key role in Speech Emotion
Recognition (SER) and Emotional Text-To-Speech (TTS) tasks. However, emotional
speech samples are more difficult and expensive to acquire compared with
Neutral style speech, which causes one issue that most related works
unfortunately neglect: imbalanced datasets. Models might overfit to the
majority Neutral class and fail to produce robust and effective emotional
representations. In this paper, we propose an Emotion Extractor to address this
issue. We use augmentation approaches to train the model and enable it to
extract effective and generalizable emotional representations from imbalanced
datasets. Our empirical results show that (1) for the SER task, the proposed
Emotion Extractor surpasses the state-of-the-art baseline on three imbalanced
datasets; (2) the produced representations from our Emotion Extractor benefit
the TTS model and enable it to synthesize more expressive speech.
Comment: Accepted by INTERSPEECH202
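The abstract attributes the improvement to augmentation approaches without detailing
them; as a point of reference, one common remedy for class imbalance is to
oversample minority emotion classes during training, sketched below (this is not
the authors' method, merely an illustrative alternative):

```python
import torch
from torch.utils.data import WeightedRandomSampler

def balanced_sampler(labels):
    """Oversample minority emotion classes so each batch is roughly balanced
    (a common imbalance remedy; the paper's augmentation approach differs)."""
    labels = torch.as_tensor(labels)
    class_counts = torch.bincount(labels)
    weights = 1.0 / class_counts[labels].float()  # rarer class -> higher weight
    return WeightedRandomSampler(weights, num_samples=len(labels), replacement=True)
```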
AudioPairBank: Towards A Large-Scale Tag-Pair-Based Audio Content Analysis
Recently, sound recognition has been used to identify sounds, such as car and
river. However, sounds have nuances that may be better described by
adjective-noun pairs such as slow car, and verb-noun pairs such as flying
insects, which are underexplored. Therefore, in this work we investigate the
relation between audio content and both adjective-noun pairs and verb-noun
pairs. Due to the lack of datasets with these kinds of annotations, we
collected and processed the AudioPairBank corpus consisting of a combined total
of 1,123 pairs and over 33,000 audio files. One contribution is the previously
unavailable documentation of the challenges and implications of collecting
audio recordings with these types of labels. A second contribution is to show
the degree of correlation between the audio content and the labels through
sound recognition experiments, which yielded results of 70% accuracy, hence
also providing a performance benchmark. The results and study in this paper
encourage further exploration of the nuances in audio and are meant to
complement similar research performed on images and text in multimedia
analysis.
Comment: This paper is a revised version of "AudioSentibank: Large-scale
Semantic Ontology of Acoustic Concepts for Audio Content Analysis