Coulomb interaction on pion production in Au+Au collisions at relativistic energies
Coulomb effects on charged pion transverse momentum spectra produced in Au+Au
collisions at RHIC-BES energies are investigated. From these spectra the
π−/π+ ratios as a function of transverse momentum are obtained and used
to extract the Coulomb kick (a momentum change due to Coulomb interaction) and
initial pion ratio for three different collision energies and various
centrality classes. The Coulomb kick shows a decrease with the increase of beam
energy and a clear centrality dependence, with larger values for the most
central collisions. The results are connected with the kinetic freeze-out
dynamics and discussed
Cascaded Cross-Modal Transformer for Request and Complaint Detection
We propose a novel cascaded cross-modal transformer (CCMT) that combines
speech and text transcripts to detect customer requests and complaints in phone
conversations. Our approach leverages a multimodal paradigm by transcribing the
speech using automatic speech recognition (ASR) models and translating the
transcripts into different languages. Subsequently, we combine
language-specific BERT-based models with Wav2Vec2.0 audio features in a novel
cascaded cross-attention transformer model. We apply our system to the Requests
Sub-Challenge of the ACM Multimedia 2023 Computational Paralinguistics
Challenge, reaching unweighted average recalls (UAR) of 65.41% and 85.87% for
the complaint and request classes, respectively. Comment: Accepted at ACMMM 2023.
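The cascaded cross-attention step can be sketched as scaled dot-product attention in which one modality's embeddings query the other's. The sketch below is a minimal single-head illustration with hypothetical shapes (text tokens standing in for BERT embeddings, frames for Wav2Vec2.0 features); the actual CCMT architecture is more elaborate.

```python
import numpy as np

def cross_attention(queries, keys, values):
    """Scaled dot-product cross-attention: one modality attends to another."""
    d_k = queries.shape[-1]
    scores = queries @ keys.T / np.sqrt(d_k)        # (n_text, n_audio)
    scores -= scores.max(axis=-1, keepdims=True)    # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)  # softmax over audio frames
    return weights @ values                         # (n_text, d_v)

rng = np.random.default_rng(0)
text_emb = rng.normal(size=(10, 64))   # hypothetical BERT token embeddings
audio_emb = rng.normal(size=(50, 64))  # hypothetical Wav2Vec2.0 frame features

fused = cross_attention(text_emb, audio_emb, audio_emb)
print(fused.shape)  # (10, 64): one audio-informed vector per text token
```

Stacking such blocks, with text attending to audio and vice versa, yields the cascaded cross-modal fusion the title refers to.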
Sea Ice Segmentation From SAR Data by Convolutional Transformer Networks
Sea ice is a crucial component of the Earth's climate system and is highly
sensitive to changes in temperature and atmospheric conditions. Accurate and
timely measurement of sea ice parameters is important for understanding and
predicting the impacts of climate change. Nevertheless, the amount of satellite
data acquired over ice areas is huge, making subjective (manual) analysis
impractical. Therefore, automated algorithms must be used to fully
exploit the continuous data feeds coming from satellites. In this paper, we
present a novel approach for sea ice segmentation based on SAR satellite
imagery using hybrid convolutional transformer (ConvTr) networks. We show that
our approach outperforms classical convolutional networks, while being
considerably more efficient than pure transformer models. ConvTr obtained a
mean intersection over union (mIoU) of 63.68% on the AI4Arctic data set,
with an inference time of 120 ms for a 400 x 400 km product.
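Mean intersection over union, the metric reported above, averages per-class IoU over the classes present in either map. A minimal sketch on toy label maps (the class count and labels here are hypothetical, not the AI4Arctic scheme):

```python
import numpy as np

def mean_iou(pred, target, num_classes):
    """Mean intersection-over-union across classes present in pred or target."""
    ious = []
    for c in range(num_classes):
        inter = np.logical_and(pred == c, target == c).sum()
        union = np.logical_or(pred == c, target == c).sum()
        if union > 0:              # skip classes absent from both maps
            ious.append(inter / union)
    return float(np.mean(ious))

pred   = np.array([[0, 0, 1], [1, 2, 2]])
target = np.array([[0, 1, 1], [1, 2, 2]])
print(round(mean_iou(pred, target, 3), 4))  # 0.7222
```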
Multi-dimensional Speech Quality Assessment in Crowdsourcing
Subjective speech quality assessment is the gold standard for evaluating
speech enhancement processing and telecommunication systems. The commonly used
standard ITU-T Rec. P.800 defines how to measure speech quality in lab
environments, and ITU-T Rec. P.808 extended it for crowdsourcing. ITU-T Rec.
P.835 extends P.800 to measure the quality of speech in the presence of noise.
ITU-T Rec. P.804 targets the conversation test and introduces perceptual speech
quality dimensions which are measured during the listening phase of the
conversation. The perceptual dimensions are noisiness, coloration,
discontinuity, and loudness. We create a crowdsourcing implementation of a
multi-dimensional subjective test following the scales from P.804 and extend it
to include reverberation, the speech signal, and overall quality. We show the
tool is both accurate and reproducible. The tool has been used in the ICASSP
2023 Speech Signal Improvement challenge and we show the utility of these
speech quality dimensions in this challenge. The tool will be publicly
available as open source at https://github.com/microsoft/P.808.
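A crowdsourced multi-dimensional test of this kind ultimately aggregates per-listener ratings into a mean opinion score (MOS) per dimension. The sketch below shows that aggregation only, with invented ratings; the real P.808-style pipeline additionally includes listener qualification, trapping questions, and rating validation, all omitted here.

```python
import statistics

# Hypothetical crowdsourced ratings (1-5) for one clip, per perceptual dimension.
ratings = {
    "noisiness":     [4, 5, 4, 3, 4],
    "coloration":    [3, 3, 4, 3, 3],
    "discontinuity": [5, 5, 4, 5, 5],
    "loudness":      [4, 4, 4, 5, 4],
    "reverberation": [2, 3, 2, 2, 3],
    "signal":        [4, 4, 3, 4, 4],
    "overall":       [3, 4, 3, 3, 4],
}

mos = {dim: statistics.mean(r) for dim, r in ratings.items()}
ci95 = {dim: 1.96 * statistics.stdev(r) / len(r) ** 0.5
        for dim, r in ratings.items()}
for dim in ratings:
    print(f"{dim:>13}: MOS={mos[dim]:.2f} +/-{ci95[dim]:.2f}")
```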
Emotion Recognition System from Speech and Visual Information based on Convolutional Neural Networks
Emotion recognition has become an important field of research in the
human-computer interaction domain. The latest advances in the field show
that combining visual and audio information leads to better results than
using a single source of information alone. From a visual
point of view, a human emotion can be recognized by analyzing the facial
expression of the person. More precisely, the human emotion can be described
through a combination of several Facial Action Units. In this paper, we propose
a system that is able to recognize emotions with a high accuracy rate and in
real time, based on deep Convolutional Neural Networks. In order to increase
the accuracy of the recognition system, we also analyze the speech data and
fuse the information coming from both sources, i.e., visual and audio.
Experimental results show the effectiveness of the proposed scheme for emotion
recognition and the importance of combining visual with audio data.
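One common way to fuse the two sources is late (decision-level) fusion: a weighted average of the per-class probabilities produced by the visual and audio models. This is a minimal sketch; the fusion weight, the emotion label set, and the probability values are all hypothetical, and the paper's exact fusion scheme may differ.

```python
import numpy as np

def late_fusion(p_visual, p_audio, w_visual=0.6):
    """Weighted average of per-class probabilities from the two modalities."""
    fused = w_visual * p_visual + (1.0 - w_visual) * p_audio
    return fused / fused.sum()  # renormalize to a valid distribution

# Hypothetical softmax outputs over (angry, happy, neutral, sad).
p_visual = np.array([0.10, 0.70, 0.15, 0.05])
p_audio  = np.array([0.20, 0.50, 0.20, 0.10])

fused = late_fusion(p_visual, p_audio)
labels = ["angry", "happy", "neutral", "sad"]
print(labels[int(np.argmax(fused))])  # happy
```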
Guided deep learning by subaperture decomposition: ocean patterns from SAR imagery
Spaceborne synthetic aperture radar (SAR) can provide meter-scale images of
the ocean surface roughness, day or night, in nearly all weather conditions.
This makes it a unique asset for many geophysical applications. Sentinel-1 SAR
wave mode vignettes have made it possible to capture many important oceanic and
atmospheric phenomena since 2014. However, considering the amount of data
provided, expanding applications requires a strategy to automatically process
and extract geophysical parameters. In this study, we propose to apply
subaperture decomposition as a preprocessing stage for SAR deep learning
models. Our data centring approach surpassed the baseline by 0.7, obtaining
state of the art on the TenGeoPSARwv data set. In addition, we empirically
showed that subaperture decomposition could bring additional information over
the original vignette, by rising the number of clusters for an unsupervised
segmentation method. Overall, we encourage the development of data centring
approaches, showing that, data preprocessing could bring significant
performance improvements over existing deep learning models
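Subaperture decomposition splits a single-look complex image's azimuth spectrum into sub-bands, each corresponding to a slightly different observation geometry, and transforms each back to the image domain as a sub-look. The 2-D toy below sketches only that spectral-splitting idea (real SAR products need spectral de-weighting and windowing, omitted here, and the input is random data, not an actual vignette):

```python
import numpy as np

def subaperture_decompose(slc, n_sub=3, axis=0):
    """Split a complex SAR image into sub-looks along the azimuth spectrum."""
    spectrum = np.fft.fftshift(np.fft.fft(slc, axis=axis), axes=axis)
    edges = np.linspace(0, slc.shape[axis], n_sub + 1, dtype=int)
    looks = []
    for lo, hi in zip(edges[:-1], edges[1:]):
        banded = np.zeros_like(spectrum)
        sl = [slice(None)] * spectrum.ndim
        sl[axis] = slice(lo, hi)            # keep only this azimuth sub-band
        banded[tuple(sl)] = spectrum[tuple(sl)]
        looks.append(np.fft.ifft(np.fft.ifftshift(banded, axes=axis), axis=axis))
    return looks

rng = np.random.default_rng(1)
slc = rng.normal(size=(64, 32)) + 1j * rng.normal(size=(64, 32))
looks = subaperture_decompose(slc, n_sub=3)
print(len(looks), looks[0].shape)    # 3 (64, 32)
print(np.allclose(sum(looks), slc))  # True: the sub-looks sum back to the image
```

Stacking the sub-look intensities as extra input channels is one way such a decomposition can feed a deep learning model.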
CL-MAE: Curriculum-Learned Masked Autoencoders
Masked image modeling has been demonstrated as a powerful pretext task for
generating robust representations that can be effectively generalized across
multiple downstream tasks. Typically, this approach involves randomly masking
patches (tokens) in input images, with the masking strategy remaining unchanged
during training. In this paper, we propose a curriculum learning approach that
updates the masking strategy to continually increase the complexity of the
self-supervised reconstruction task. We conjecture that, by gradually
increasing the task complexity, the model can learn more sophisticated and
transferable representations. To facilitate this, we introduce a novel
learnable masking module that possesses the capability to generate masks of
different complexities, and integrate the proposed module into masked
autoencoders (MAE). Our module is jointly trained with the MAE, adjusting its
behavior during training: it transitions from a partner of the MAE (optimizing
the same reconstruction loss) to an adversary (optimizing the
opposite loss), passing through a neutral state. The transition between
these behaviors is smooth, being regulated by a factor that is multiplied with
the reconstruction loss of the masking module. The resulting training procedure
generates an easy-to-hard curriculum. We train our Curriculum-Learned Masked
Autoencoder (CL-MAE) on ImageNet and show that it exhibits superior
representation learning capabilities compared to MAE. The empirical results on
five downstream tasks confirm our conjecture, demonstrating that curriculum
learning can be successfully used to self-supervise masked autoencoders.
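The partner-to-adversary transition described above can be sketched as a factor lambda(t) multiplying the masking module's copy of the reconstruction loss, moving smoothly from +1 (same objective) through 0 (neutral) to -1 (opposite objective). The cosine schedule below is an illustrative choice, not necessarily the paper's exact regulating factor.

```python
import math

def curriculum_factor(step, total_steps):
    """Smooth transition from +1 (partner) via 0 (neutral) to -1 (adversary)."""
    progress = step / total_steps        # in [0, 1] over training
    return math.cos(math.pi * progress)  # cos(0)=+1, cos(pi/2)=0, cos(pi)=-1

total = 100
for step in (0, 50, 100):
    lam = curriculum_factor(step, total)
    # masking-module loss at this step would be: lam * reconstruction_loss
    print(step, round(lam, 3))
```

With lambda positive the module helps minimize the MAE's loss (easy masks); as lambda goes negative it maximizes it (hard masks), producing the easy-to-hard curriculum.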