Cascaded Cross-Module Residual Learning towards Lightweight End-to-End Speech Coding
Speech codecs learn compact representations of speech signals to facilitate
data transmission. Many recent deep neural network (DNN) based end-to-end
speech codecs achieve low bitrates and high perceptual quality at the cost of
model complexity. We propose a cross-module residual learning (CMRL) pipeline
as a module carrier with each module reconstructing the residual from its
preceding modules. CMRL differs from other DNN-based speech codecs in that,
rather than modeling the speech compression problem in a single large neural
network, it optimizes a series of less-complicated modules in a two-phase
training scheme. The proposed method shows better objective performance than
AMR-WB and the state-of-the-art DNN-based speech codec with a similar network
architecture. As an end-to-end model, it takes raw PCM signals as an input, but
is also compatible with linear predictive coding (LPC), showing better
subjective quality at high bitrates than AMR-WB and OPUS. The gain is achieved
by using only 0.9 million trainable parameters, a significantly less complex
architecture than the other DNN-based codecs in the literature. Comment: Accepted for publication in INTERSPEECH 201
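The cascading idea in this abstract, where each module reconstructs the residual left by its preceding modules, can be illustrated with a toy sketch. This is not the CMRL codec itself: the neural modules are replaced here by simple uniform quantizers, and the names `uniform_quantize`, `cascade_encode_decode`, and the step sizes are assumptions made purely for illustration.

```python
import numpy as np

def uniform_quantize(x, step):
    """Toy stand-in for one coding module: round values to a grid of width `step`."""
    return np.round(x / step) * step

def cascade_encode_decode(signal, steps):
    """Run a cascade in which each module codes the residual of the preceding ones.

    Returns the summed reconstruction and the residual left after the last module.
    """
    residual = np.asarray(signal, dtype=float)
    reconstruction = np.zeros_like(residual)
    for step in steps:
        decoded = uniform_quantize(residual, step)  # this module's contribution
        reconstruction += decoded                   # decoder sums all module outputs
        residual = residual - decoded               # what the next module must code
    return reconstruction, residual

# Hypothetical input frame and step sizes (finer steps for later modules).
rng = np.random.default_rng(0)
frame = rng.uniform(-1.0, 1.0, size=512)
all_steps = [0.5, 0.1, 0.02]
for n_modules in (1, 2, 3):
    recon, _ = cascade_encode_decode(frame, all_steps[:n_modules])
    print(n_modules, float(np.mean((frame - recon) ** 2)))
```

In this sketch, adding a module can only shrink the remaining error, because each stage works on whatever the earlier stages failed to capture. The two-phase training scheme mentioned in the abstract is not modeled here.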
Graph Representation Learning-Based Early Depression Detection Framework in Smart Home Environments
Although the diagnosis and treatment of depression belong to the medical field, ICT and AI technologies are widely used to detect depression earlier in the elderly. These technologies identify behavioral changes in the physical world or sentiment changes in cyberspace, both known symptoms of depression. However, although sentiment and physical changes, which are signs of depression in the elderly, usually appear simultaneously, no existing research considers both at the same time. To solve this problem, this paper proposes knowledge graph-based cyber–physical view (CPV)-based activity pattern recognition for the early detection of depression, also known as KARE. In the KARE framework, the knowledge graph (KG) plays key roles in providing cross-domain knowledge and in resolving the grammatical and semantic heterogeneity that must be addressed to integrate cyberspace and the physical world. In addition, it can flexibly express the patterns of different activities for each elderly person. To achieve this, the KARE framework implements a set of new machine learning techniques. The first is a 1D-CNN for attribute representation learning to connect the attributes of the physical and cyber worlds and the KG. The second is entity alignment with embedding vectors extracted by the CNN and GNN. The third is a graph extraction method that constructs the CPV from the KG using graph representation learning and wrapper-based feature selection in an unsupervised manner. The last is a method of activity-pattern graph representation based on a Gaussian Mixture Model and KL divergence for training the GAT model to detect depression early. To demonstrate the superiority of the KARE framework, we performed experiments using real-world datasets with five state-of-the-art models in knowledge graph entity alignment
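The abstract does not say how the KL divergence is applied to the Gaussian Mixture Model components. As a minimal sketch of the quantity itself, the closed-form KL divergence between two univariate Gaussians is shown below. The function name `kl_gaussian` and the idea of comparing two activity distributions this way are assumptions for illustration, not the KARE method.

```python
import math

def kl_gaussian(mu_p, sigma_p, mu_q, sigma_q):
    """KL(P || Q) for univariate Gaussians P = N(mu_p, sigma_p^2), Q = N(mu_q, sigma_q^2)."""
    return (math.log(sigma_q / sigma_p)
            + (sigma_p ** 2 + (mu_p - mu_q) ** 2) / (2.0 * sigma_q ** 2)
            - 0.5)

# Hypothetical example: two activity-duration distributions with equal spread.
print(kl_gaussian(0.0, 1.0, 1.0, 1.0))  # 0.5
```

The divergence is zero for identical distributions and is not symmetric in its arguments, which matters if it is used to compare a person's current activity pattern against a baseline.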
Deep Model-Based Security-Aware Entity Alignment Method for Edge-Specific Knowledge Graphs
This paper proposes a deep model-based entity alignment method for edge-specific knowledge graphs (KGs) to resolve the semantic heterogeneity between the edge systems’ data. To do so, this paper first analyzes edge-specific KGs to find their unique characteristics, and the deep model-based entity alignment method is developed based on those characteristics. The proposed method performs entity alignment using a graph that is data-centric rather than topological, reflecting the fact that edge-specific KGs are mainly composed of instance entities rather than conceptual entities. In addition, two deep models, namely BERT (bidirectional encoder representations from transformers) for the concept entities and a GAN (generative adversarial network) for the instance entities, are applied to model learning. Because deep models are neural network models that humans cannot interpret, using them makes it possible to secure data on the edge systems. The two separately trained models are integrated using a GCN (graph convolutional network), a graph-based deep learning model. Finally, the integrated deep model is used to align the entities in the edge-specific KGs. To demonstrate the superiority of the proposed method, we perform experiments and an evaluation against state-of-the-art entity alignment methods with the two experimental datasets from DBpedia, YAGO, and Wikidata. On the evaluation metrics Hits@k, mean rank (MR), and mean reciprocal rank (MRR), the proposed method shows the best predictive and generalization performance for KG entity alignment
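The three evaluation metrics named in this abstract have standard definitions, sketched below. The input is assumed to be the 1-based rank at which the correct counterpart entity appears for each test entity; the function name `ranking_metrics` and the example ranks are hypothetical.

```python
def ranking_metrics(ranks, ks=(1, 10)):
    """Compute Hits@k, mean rank (MR), and mean reciprocal rank (MRR).

    `ranks` holds the 1-based rank of the correct entity for each query.
    """
    n = len(ranks)
    # Hits@k: fraction of queries whose correct entity is ranked within the top k.
    metrics = {f"hits@{k}": sum(r <= k for r in ranks) / n for k in ks}
    metrics["mr"] = sum(ranks) / n                    # lower is better
    metrics["mrr"] = sum(1.0 / r for r in ranks) / n  # higher is better
    return metrics

# Hypothetical ranks for four test entities.
example = ranking_metrics([1, 2, 4, 10])
print(example)
```

MR is sensitive to a few very poor ranks, while MRR emphasizes how often the correct entity lands near the top, which is one reason the two are usually reported together.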
On Psychoacoustically Weighted Cost Functions Towards Resource-Efficient Deep Neural Networks for Speech Denoising
We present a psychoacoustically enhanced cost function to balance network complexity and perceptual performance of deep neural networks for speech denoising. While training the network, we utilize perceptual weights added to the ordinary mean-squared error to emphasize the contribution from frequency bins that are most audible while ignoring error from inaudible bins. To generate the weights, we employ psychoacoustic models to compute the global masking threshold from the clean speech spectra. We then evaluate the speech denoising performance of our perceptually guided neural network using both objective and perceptual sound quality metrics, testing on various network structures ranging from shallow and narrow ones to deep and wide ones. The experimental results showcase our method as a valid approach for infusing perceptual significance into deep neural network operations. In particular, the more perceptually sensible enhancement in performance seen with simple neural network topologies proves that the proposed method can lead to resource-efficient speech denoising implementations on small devices without degrading the perceived signal fidelity
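A per-bin weighted mean-squared error of the kind described in this abstract can be sketched as follows. The abstract does not give the exact weighting formula, so the binary rule in `audibility_weights` (weight 1 where the clean spectrum exceeds the masking threshold, 0 otherwise) is an assumption for illustration only, as are the function names and the example values.

```python
import numpy as np

def audibility_weights(clean_power_db, mask_threshold_db):
    """Assumed toy rule: a bin counts only if the clean speech exceeds its masking threshold."""
    return (np.asarray(clean_power_db) > np.asarray(mask_threshold_db)).astype(float)

def weighted_mse(clean_spec, estimate_spec, weights):
    """Mean over frequency bins of the squared error scaled by a per-bin weight."""
    err = (np.asarray(clean_spec, dtype=float) - np.asarray(estimate_spec, dtype=float)) ** 2
    return float(np.mean(np.asarray(weights, dtype=float) * err))

# Hypothetical three-bin example; the third bin lies below the masking threshold.
clean = [1.0, 0.5, 0.1]
estimate = [0.8, 0.5, 0.4]
weights = audibility_weights([60.0, 50.0, 20.0], [40.0, 40.0, 40.0])
print(weighted_mse(clean, estimate, weights))        # error in the masked bin is ignored
print(weighted_mse(clean, estimate, np.ones(3)))     # ordinary MSE for comparison
```

With such a cost, a small network is not penalized for errors a listener could not hear, which is the intuition behind the resource-efficiency claim in the abstract.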