20 research outputs found
Graph neural network for audio representation learning
Learning audio representations is an important task with many potential applications. Whether it takes the shape of speech, music, or ambient sounds, audio is a common form of data that can communicate rich information, and audio representation learning is a fundamental ingredient of deep learning on audio. Learning a good representation is nevertheless challenging: the representation should contain the information needed to understand the input sound and expose discriminative patterns, which typically requires a sizable volume of carefully annotated data and, in turn, a considerable amount of labour. Good audio representations also enable more accurate downstream tasks in both audio and video, such as emotion recognition. In this thesis, we propose a set of models for audio representation learning. We capture discriminative patterns by proposing a graph structure for audio and a graph neural network to process it; our work is the first to consider a graph structure for audio data. In contrast to existing methods that rely on approximation, our first model uses a manually defined graph structure together with a graph convolution layer that performs an accurate graph convolution operation. In the second model, we integrate a graph inception network, expanding the manually created graph structure and learning it jointly with the primary objective. In the third model, we address the dearth of annotated data with a semi-supervised graph technique that represents audio samples as nodes in a graph and connects them according to label information within smaller subgraphs. Going beyond earlier work, we also raise the issue of leveraging multimodal data to improve audio representation learning: to accommodate multimodal input, our fourth model incorporates heterogeneous graph data, and we design a new graph architecture to handle it.
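A minimal sketch of the graph view of audio that underpins these models, assuming frame-level acoustic features (the feature type, dimensions, and function name below are illustrative, not the thesis code): each frame becomes a node, consecutive frames are joined by edges, and closing the loop turns the line graph into a cycle graph.

```python
import torch

def cycle_graph_adjacency(num_frames: int, cycle: bool = True) -> torch.Tensor:
    """Adjacency of a line graph over audio frames; closing the loop gives a cycle graph."""
    A = torch.zeros(num_frames, num_frames)
    idx = torch.arange(num_frames - 1)
    A[idx, idx + 1] = 1.0          # edge between frame i and frame i+1
    A[idx + 1, idx] = 1.0          # undirected graph, so add the reverse edge
    if cycle:
        A[0, -1] = A[-1, 0] = 1.0  # connect the last frame back to the first
    return A

# Hypothetical example: a 100-frame clip with 40-dimensional frame features.
X = torch.randn(100, 40)           # node features, one row per frame
A = cycle_graph_adjacency(100)     # fixed, manually defined graph structure
```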
Compact graph architecture for speech emotion recognition
We propose a deep graph approach to address the task of speech emotion recognition. A compact, efficient and scalable way to represent data is in the form of graphs. Following the theory of graph signal processing, we propose to model the speech signal as a cycle graph or a line graph. Such a graph structure enables us to construct a Graph Convolution Network (GCN)-based architecture that can perform an accurate graph convolution, in contrast to the approximate convolution used in standard GCNs. We evaluated the performance of our model for speech emotion recognition on the popular IEMOCAP and MSP-IMPROV databases. Our model outperforms standard GCN and other relevant deep graph architectures, indicating the effectiveness of our approach. When compared with existing speech emotion recognition methods, our model achieves comparable performance to the state-of-the-art with significantly fewer learnable parameters (~30K), indicating its applicability in resource-constrained devices. Our code is available at github.com/AmirSh15/Compact_SER
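One way to picture the accurate graph convolution mentioned above: because the cycle or line graph is small and fixed, its Laplacian can be eigendecomposed once and the filtering applied exactly in the spectral domain, rather than through the first-order approximation used in standard GCN layers. The layer below is a hedged sketch of that idea, not the released Compact_SER code; the class name, filter parameterisation, and dimensions are assumptions.

```python
import torch

class ExactGraphConv(torch.nn.Module):
    """Spectral graph convolution using the exact eigenbasis of the graph Laplacian."""

    def __init__(self, A: torch.Tensor, in_dim: int, out_dim: int):
        super().__init__()
        D = torch.diag(A.sum(dim=1))
        L = D - A                                   # combinatorial graph Laplacian
        _, U = torch.linalg.eigh(L)                 # exact eigenbasis (L is symmetric)
        self.register_buffer("U", U)
        self.spectral_filter = torch.nn.Parameter(torch.ones(A.shape[0]))
        self.lin = torch.nn.Linear(in_dim, out_dim)

    def forward(self, X: torch.Tensor) -> torch.Tensor:
        X_hat = self.U.T @ X                        # graph Fourier transform
        X_filt = self.U @ (self.spectral_filter.unsqueeze(1) * X_hat)  # filter and invert
        return torch.relu(self.lin(X_filt))

# e.g. layer = ExactGraphConv(A, in_dim=40, out_dim=64); H = layer(X)
```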
Heterogeneous Graph Learning for Acoustic Event Classification
Heterogeneous graphs provide a compact, efficient, and scalable way to model data involving multiple disparate modalities. This makes modeling audiovisual data using heterogeneous graphs an attractive option. However, graph structure does not appear naturally in audiovisual data. Graphs for audiovisual data are constructed manually, which is both difficult and sub-optimal. In this work, we address this problem by (i) proposing a parametric graph construction strategy for the intra-modal edges, and (ii) learning the crossmodal edges. To this end, we develop a new model, the heterogeneous graph crossmodal network (HGCN), that learns the crossmodal edges. Our proposed model can adapt to various spatial and temporal scales owing to its parametric construction, while the learnable crossmodal edges effectively connect the relevant nodes across modalities. Experiments on a large benchmark dataset (AudioSet) show that our model is state-of-the-art (0.53 mean average precision), outperforming transformer-based models and other graph-based models.
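A hedged sketch of how learnable crossmodal edges might look in practice is shown below; it is not the HGCN implementation, and the class name, scoring rule, and feature dimensions are assumptions. A bilinear score between every audio/visual node pair is squashed into (0, 1) and used as a soft crossmodal adjacency for passing visual information to audio nodes.

```python
import torch

class CrossmodalEdgeLearner(torch.nn.Module):
    """Learn soft edges between audio and visual nodes of a heterogeneous graph."""

    def __init__(self, audio_dim: int, visual_dim: int):
        super().__init__()
        # Bilinear form scoring every (audio node, visual node) pair.
        self.bilinear = torch.nn.Parameter(0.01 * torch.randn(audio_dim, visual_dim))

    def forward(self, H_audio: torch.Tensor, H_visual: torch.Tensor) -> torch.Tensor:
        scores = H_audio @ self.bilinear @ H_visual.T        # [num_audio, num_visual]
        A_cross = torch.sigmoid(scores)                      # soft, learnable crossmodal edges
        # Aggregate visual node features into each audio node along the learned edges.
        return (A_cross @ H_visual) / (A_cross.sum(dim=1, keepdim=True) + 1e-6)

# Hypothetical usage with 128-d audio nodes and 256-d visual nodes:
# audio_msgs = CrossmodalEdgeLearner(128, 256)(H_audio, H_visual)
```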
SAFE: Saliency-Aware Counterfactual Explanations for DNN-based automated driving systems
The explainability of Deep Neural Networks (DNNs) has recently gained significant importance, especially in safety-critical applications such as automated/autonomous vehicles, a.k.a. automated driving systems. CounterFactual (CF) explanations have emerged as a promising approach for interpreting the behaviour of black-box DNNs. A CF explainer identifies the minimum modifications of the input that would alter the model's output to its complement; in other words, it computes the minimum modifications required to cross the model's decision boundary. Current deep generative CF models often work with user-selected features rather than focusing on the discriminative features of the black-box model. Consequently, such CF examples may not necessarily lie near the decision boundary, thereby contradicting the definition of CFs. To address this issue, we propose in this paper a novel approach that leverages saliency maps to generate more informative CF explanations. Our approach guides a Generative Adversarial Network based on the most influential features of the input of the black-box model to produce CFs near the decision boundary. We evaluate the performance of this approach on a real-world dataset of driving scenes, BDD100k, and demonstrate its superiority over several baseline methods in terms of well-known CF metrics, including proximity, sparsity and validity. Our work contributes to the ongoing efforts to improve the interpretability of DNNs and provides a promising direction for generating more accurate and informative CF explanations.
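The saliency-guidance idea can be illustrated with a short sketch (not the SAFE implementation; `black_box`, `generator`, and the masking rule below are assumptions): a gradient-based saliency map of the black-box prediction is used to keep only the generator's edits that fall on influential regions of the input.

```python
import torch

def saliency_mask(black_box: torch.nn.Module, x: torch.Tensor, target: int) -> torch.Tensor:
    """Gradient-based saliency of the black-box prediction w.r.t. the input image batch."""
    x = x.detach().clone().requires_grad_(True)
    score = black_box(x)[0, target]                # logit of the class to explain
    score.backward()
    sal = x.grad.abs().sum(dim=1, keepdim=True)    # aggregate gradient magnitude over channels
    return sal / (sal.max() + 1e-8)                # normalise to [0, 1]

def saliency_guided_counterfactual(generator, black_box, x, target):
    """Keep only the generator's edits that land on salient regions (illustrative)."""
    mask = saliency_mask(black_box, x, target)
    delta = generator(x) - x                       # raw perturbation proposed by the GAN generator
    return x + mask * delta                        # edits concentrated on influential pixels
```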
Enhancing the Therapeutic Efficacy of Daunorubicin and Mitoxantrone with Bavachinin, Candidone, and Tephrosin
The capability of flavonoids to sensitize cancer cells to chemotherapy and reverse multidrug resistance by modulating efflux pumps and apoptosis mechanisms has been demonstrated in numerous works. Three flavonoids, namely bavachinin, tephrosin, and candidone, have recently been introduced to cancer treatment research, presenting various activities such as antibacterial, immunomodulatory, cell-death-inducing, and anticancer effects. Little information exists regarding the therapeutic significance of these flavonoids in cancer treatment, especially in overcoming multidrug resistance (MDR). Here, we investigated the potency of these agents in reversing MDR by analyzing their effects as chemosensitizers on cell cytotoxicity, on P-gp and ABCG2 protein expression levels, and on their function in two multidrug-resistant cell lines: the P-gp-overexpressing human gastric adenocarcinoma cell line (EPG85.257RDB) and the ABCG2-overexpressing human epithelial breast cancer cell line (MCF7/MX). The inhibitory concentration of 10% (IC10) of bavachinin, tephrosin, and candidone in EPG85.257RDB cells was 1588.7 ± 202.2, 264.8 ± 86.15, and 1338.6 ± 114.11 nM, respectively; the corresponding values in MCF7/MX cells were 2406.4 ± 257.63, 38.8 ± 4.28, and 27.9 ± 5.59 nM. Expression levels of ABCG2 and P-gp were not significantly downregulated by these flavonoids. Maximum daunorubicin and mitoxantrone accumulation and minimum drug efflux in both cell lines were detected 48 hrs post-treatment with tephrosin and bavachinin, respectively. Chemosensitization to mitoxantrone and daunorubicin was achieved in MCF7/MX and EPG85.257RDB cells, respectively, in response to the IC10 of bavachinin and tephrosin, independently. These effects did not follow a time-dependent manner, and each flavonoid had its own cell-dependent pattern. Overall, bavachinin, tephrosin, and candidone showed potency to sensitize MDR cells to daunorubicin and mitoxantrone and could be considered attractive MDR modulators for cancer treatment. However, their action was time- and cell-specific.
Clinical Significance and Different Expression of Dipeptidyl Peptidase IV and Procalcitonin in Mild and Severe COVID-19
Background: Coronavirus became a global concern in 2019-20. The virus belongs to the coronavirus family and has been able to infect many patients around the world. It originated in the Chinese city of Wuhan and eventually spread worldwide, becoming a pandemic.
Materials and Methods: A total of 60 patients with severe (n=30) and mild (n=30) symptoms of COVID-19 were included in this study. Peripheral blood samples were collected from the patients. Real-time PCR was used to compare the relative expression levels of procalcitonin and dipeptidyl peptidase IV (DPPIV) in patients with severe and mild COVID-19 infection.
Results: Procalcitonin and dipeptidyl peptidase IV markers in the peripheral blood of patients with severe symptoms were positive in 29 (96.60%) and 26 (86.60%) of 30 cases, respectively, whereas the positive rates in the mild-symptom group were 27 (90%) and 25 (83.30%), respectively. There was a statistically significant difference between the two groups in terms of DPPIV and procalcitonin (p<0.001).
Conclusion: Procalcitonin and DPPIV increase in patients with COVID-19 infection and are significantly higher in patients with more severe clinical symptoms than in those with milder ones. More studies will be needed to verify the reliability of the current findings.
Keywords: Procalcitonin, DPPIV, Severe symptoms, Mild symptoms, COVID-19
Dynamic emotion modeling with learnable graphs and graph inception network
Human emotion is expressed, perceived and captured using a variety of dynamic data modalities, such as speech (verbal), videos (facial expressions) and motion sensors (body gestures). We propose a generalized approach to emotion recognition that can adapt across modalities by modeling dynamic data as structured graphs. The motivation behind the graph approach is to build compact models without compromising on performance. To alleviate the problem of optimal graph construction, we cast this as a joint graph learning and classification task. To this end, we present the learnable graph inception network (L-GrIN) that jointly learns to recognize emotion and to identify the underlying graph structure in the dynamic data. Our architecture comprises multiple novel components: a new graph convolution operation, a graph inception layer, learnable adjacency, and a learnable pooling function that yields a graph-level embedding. We evaluate the proposed architecture on five benchmark emotion recognition databases spanning three different modalities (video, audio, motion capture), where each database captures one of the following emotional cues: facial expressions, speech and body gestures. We achieve state-of-the-art performance on all five databases, outperforming several competitive baselines and relevant existing methods. Our graph architecture shows superior performance with significantly fewer parameters (compared to convolutional or recurrent neural networks), promising its applicability to resource-constrained devices. Our code is available at github.com/AmirSh15/graph_emotion_recognition
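A minimal sketch of the two central ingredients, assuming frame-level node features (the class name, branch count, and dimensions are illustrative rather than the released L-GrIN code): the adjacency is a free parameter optimised jointly with the classifier, an inception-style layer aggregates 1-hop and 2-hop neighbourhoods in parallel, and a learnable attention pooling produces the graph-level embedding.

```python
import torch

class LearnableGraphInception(torch.nn.Module):
    """Learnable adjacency + two-branch graph inception layer + attention pooling."""

    def __init__(self, num_nodes: int, in_dim: int, hid_dim: int, num_classes: int):
        super().__init__()
        self.adj_logits = torch.nn.Parameter(torch.zeros(num_nodes, num_nodes))
        self.branch1 = torch.nn.Linear(in_dim, hid_dim)      # 1-hop branch
        self.branch2 = torch.nn.Linear(in_dim, hid_dim)      # 2-hop branch
        self.pool_att = torch.nn.Linear(2 * hid_dim, 1)      # learnable pooling weights
        self.cls = torch.nn.Linear(2 * hid_dim, num_classes)

    def forward(self, X: torch.Tensor) -> torch.Tensor:
        A = torch.sigmoid(self.adj_logits)
        A = 0.5 * (A + A.T)                        # keep the learned graph undirected
        H1 = torch.relu(self.branch1(A @ X))       # aggregate 1-hop neighbourhood
        H2 = torch.relu(self.branch2(A @ A @ X))   # aggregate 2-hop neighbourhood
        H = torch.cat([H1, H2], dim=1)
        w = torch.softmax(self.pool_att(H), dim=0) # attention over nodes
        g = (w * H).sum(dim=0)                     # graph-level embedding
        return self.cls(g)                         # emotion logits
```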
Self-supervised graphs for audio representation learning with limited labeled data
Large-scale databases with high-quality manual labels are scarce in the audio domain. We thus explore a self-supervised graph approach to learning audio representations from highly limited labeled data. Considering each audio sample as a graph node, we propose a subgraph-based framework with novel self-supervision tasks to learn effective audio representations. During training, subgraphs are constructed by sampling the entire pool of available training data to exploit the relationship between labeled and unlabeled audio samples. During inference, we use random edges to alleviate the overhead of graph construction. We evaluate our model on three benchmark audio datasets spanning two tasks: acoustic event classification and speech emotion recognition. We show that our semi-supervised model performs better than or on par with fully supervised models and outperforms several competitive existing models. Our model is compact and can produce generalized audio representations that are robust to different types of signal noise. Our code is available at github.com/AmirSh15/SSL_graph_audio
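The subgraph construction can be pictured with a toy sketch (not the released SSL_graph_audio code; the label convention and edge rules below are assumptions): within a sampled subgraph, labeled samples sharing a class are connected, while unlabeled samples receive a few random edges so they can still exchange information with the rest of the subgraph.

```python
import torch

def build_subgraph(labels: torch.Tensor, k_random: int = 2) -> torch.Tensor:
    """Toy subgraph adjacency; label -1 marks an unlabeled audio sample."""
    n = labels.shape[0]
    A = torch.zeros(n, n)
    for i in range(n):
        for j in range(i + 1, n):
            if labels[i] >= 0 and labels[i] == labels[j]:
                A[i, j] = A[j, i] = 1.0              # connect labeled samples of the same class
    for i in (labels < 0).nonzero(as_tuple=True)[0]:
        nbrs = torch.randperm(n)[:k_random]
        A[i, nbrs] = A[nbrs, i] = 1.0                # a few random edges for unlabeled samples
    A.fill_diagonal_(0)                              # no self-loops in this sketch
    return A

# Hypothetical subgraph with four labeled and two unlabeled samples.
A = build_subgraph(torch.tensor([0, 0, 1, 1, -1, -1]))
```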