
    MM DialogueGAT- A Fusion Graph Attention Network for Emotion Recognition using Multi-model System

    Emotion recognition is an important part of human-computer interaction, and the information in human communication is multimodal. Despite advancements in emotion recognition models, certain challenges persist. The first problem is that existing research focuses predominantly on mining the interaction information between modalities and the context information of the dialogue, but neglects the role information between the multimodal states and the context information in the dialogue process. The second problem is that, within the context information of the dialogue, information is not completely transmitted in a temporal structure. To address these two problems, we propose a multimodal fusion dialogue graph attention network (MM DialogueGAT). For the first problem, a bidirectional GRU is used to extract information from each modality. For multimodal information fusion, different modality configurations and combinations use a cross-modal multi-head attention mechanism to establish a multi-head attention layer, with text, video, and audio serving as the main and auxiliary modalities. For the second problem, the extraction of temporal context information, a GAT graph structure is used to capture the context information within each modality. The results show that our model achieves good results on the IEMOCAP dataset.
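    To make the cross-modal fusion step concrete, here is a minimal sketch of one main modality (e.g., text) attending to an auxiliary modality (e.g., audio) with multi-head attention. The dimensions, layer choices, and residual normalisation are assumptions for illustration, not details taken from the MM DialogueGAT paper.

```python
import torch
import torch.nn as nn

class CrossModalAttention(nn.Module):
    """One modality queries another via multi-head attention (illustrative only)."""

    def __init__(self, dim=256, num_heads=4):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)

    def forward(self, main, aux):
        # main: (batch, utterances, dim), e.g. text features from a bidirectional GRU
        # aux:  (batch, utterances, dim), e.g. audio or video features
        fused, _ = self.attn(query=main, key=aux, value=aux)
        return self.norm(main + fused)  # residual fusion of main and attended auxiliary features

# Toy usage: text as the main modality, audio as the auxiliary modality.
text, audio = torch.randn(2, 10, 256), torch.randn(2, 10, 256)
print(CrossModalAttention()(text, audio).shape)  # torch.Size([2, 10, 256])
```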

    Common and Unique Feature Learning for Data Fusion

    University of Technology Sydney, Faculty of Engineering and Information Technology. In today's era of big data, information about a phenomenon of interest is available from multiple acquisitions. Data captured from each of these acquisition frameworks is commonly known as a modality, where each modality provides information in a complementary manner. Despite the evident benefits and the plethora of works on data fusion, two challenging issues persist: 1) feature representation: how to exploit the data diversity that multiple modalities offer, and 2) feature fusion: how to combine the heterogeneous information for better decision making. To address these challenges, this thesis presents significantly improved models of two widely utilised fusion techniques: a) early fusion, combining features from multiple modalities for joint prediction, and b) late fusion, combining modality-specific predictions at the decision level. I illustrate how both techniques have their own specific limitations, with late fusion unable to harness inter-modality benefits, and early fusion's reliance on a single model causing failure when the information from any modality is futile. To overcome these drawbacks, I developed novel multimodal systems that perform feature extraction and feature fusion in a consolidated framework. Technically, I designed feature extraction schemes to capture both unique information from individual modalities and common information from multimodal representations. I then combine these two kinds of information for supervised prediction by designing efficient fusion schemes that enable the framework to perform information discovery and feature fusion simultaneously. In this thesis, I also demonstrate the benefits of fusing both the common and unique information in supervised learning and validate the significance of the developed techniques on multimodal, multiview, and multisource datasets. The designed methods leverage the multimodal benefits by creating additional diversity and obtain a more unified view of the underlying phenomenon for better decision making.
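    As a rough illustration of the two baselines contrasted above, the sketch below shows early fusion (concatenating modality features before a single joint prediction) versus late fusion (averaging modality-specific predictions). Encoder and head sizes are placeholders, not the architecture developed in the thesis.

```python
import torch
import torch.nn as nn

# Hypothetical per-modality encoders and prediction heads; sizes are placeholders.
enc_a, enc_b = nn.Linear(32, 16), nn.Linear(64, 16)
head_joint = nn.Linear(32, 2)                          # used by early fusion
head_a, head_b = nn.Linear(16, 2), nn.Linear(16, 2)    # used by late fusion

def early_fusion(x_a, x_b):
    # Combine features first, then make a single joint prediction.
    joint = torch.cat([enc_a(x_a), enc_b(x_b)], dim=-1)
    return head_joint(joint)

def late_fusion(x_a, x_b):
    # Predict per modality, then average at the decision level.
    return 0.5 * (head_a(enc_a(x_a)) + head_b(enc_b(x_b)))

x_a, x_b = torch.randn(4, 32), torch.randn(4, 64)
print(early_fusion(x_a, x_b).shape, late_fusion(x_a, x_b).shape)  # both (4, 2)
```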

    Neural data search for table augmentation

    Tabular data is widely available on the web and in private data lakes run by commercial companies or research institutes. However, the data that is essential for a specific task at hand is often scattered across numerous tables in these data lakes, and accessing it requires retrieving the information relevant to the task. One approach to retrieving this data is table augmentation. Table augmentation adds an additional attribute to a query table and populates the values of that attribute with data from the data lake. My research focuses on evaluating methods for augmenting a table with an additional attribute. Table augmentation presents a variety of challenges due to the heterogeneity of data sources and the multitude of possible combinations of methods. To successfully augment a query table based on tabular data from a data lake, several tasks such as data normalization, data search, schema matching, information extraction, and data fusion must be performed. In my work, I empirically compare methods for data search, information extraction, and data fusion, as well as complete table augmentation pipelines, using different datasets containing tabular data found in real-world data lakes. Methodologically, I plan to introduce new neural techniques for data search, information extraction, and data fusion in the context of table augmentation. These new methods, as well as existing symbolic data search methods for table augmentation, will be empirically evaluated on two sets of benchmark query tables. The aim is to identify task- and dataset-specific challenges for data search, information extraction, and data fusion methods. By profiling the datasets and analysing the errors made by the evaluated methods on the test query tables, the strengths and weaknesses of the methods can be systematically identified. Data search and information extraction methods should maximize recall, while data fusion methods should achieve high accuracy. Pipelines built on the basis of the new methods should deliver their results quickly without compromising the highest possible accuracy of the augmented attribute values.
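    As a simple illustration of the data-fusion step at the end of such a pipeline, the sketch below resolves conflicting candidate attribute values by majority vote, a symbolic baseline of the kind the neural methods would be compared against. The function name and the toy data are hypothetical.

```python
from collections import Counter

def fuse_candidate_values(candidates):
    """Pick one value per query row from conflicting candidates (majority vote baseline)."""
    fused = {}
    for row_key, values in candidates.items():
        if values:
            # Keep the most frequently extracted value for this row.
            fused[row_key] = Counter(values).most_common(1)[0][0]
    return fused

# Toy example: candidate "population" values extracted from different data-lake tables.
candidates = {
    "Berlin": ["3.6M", "3.6M", "3.7M"],
    "Paris": ["2.1M"],
}
print(fuse_candidate_values(candidates))  # {'Berlin': '3.6M', 'Paris': '2.1M'}
```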

    Multisource and Multitemporal Data Fusion in Remote Sensing

    The sharp and recent increase in the availability of data captured by different sensors, combined with their considerably heterogeneous natures, poses a serious challenge for the effective and efficient processing of remotely sensed data. Such an increase in remote sensing and ancillary datasets, however, opens up the possibility of utilizing multimodal datasets jointly to further improve the performance of processing approaches for the application at hand. Multisource data fusion has therefore received enormous attention from researchers worldwide for a wide variety of applications. Moreover, thanks to the revisit capability of several spaceborne sensors, the temporal information can be integrated with the spatial and/or spectral/backscattering information of the remotely sensed data, moving from a representation of 2D/3D data to 4D data structures, where the time variable adds new information as well as new challenges for information extraction algorithms. A huge number of research works are dedicated to multisource and multitemporal data fusion, but the methods for fusing different modalities have expanded along different paths within each research community. This paper brings together the advances of multisource and multitemporal data fusion approaches across different research communities and provides a thorough, discipline-specific starting point for researchers at different levels (i.e., students, researchers, and senior researchers) who wish to conduct novel investigations of this challenging topic, by supplying sufficient detail and references.

    FECFusion: Infrared and visible image fusion network based on fast edge convolution

    The purpose of infrared and visible image fusion is to integrate the complementary information from heterogeneous images in order to enhance the detailed scene information. However, existing deep learning fusion methods suffer from an imbalance between fusion performance and computational resource consumption, and their fusion layers or fusion rules fail to effectively combine heteromodal feature information. To address these challenges, this paper presents a novel algorithm, an infrared and visible image fusion network based on fast edge convolution (FECFusion). During the training phase, the proposed algorithm enhances the extraction of texture features in the source images through a structural re-parameterization edge convolution block (RECB) with embedded edge operators. Subsequently, an attention fusion module (AFM) is employed to sufficiently fuse both the unique and the common information of the heteromodal features. In the inference stage, the trained network is further optimized with the structural re-parameterization technique, resulting in a VGG-like architecture that improves fusion speed while maintaining fusion performance. To evaluate the proposed FECFusion algorithm, qualitative and quantitative experiments are conducted against seven advanced fusion algorithms on the MSRS, TNO, and M3FD datasets. The results demonstrate that the proposed fusion algorithm achieves superior performance on multiple evaluation metrics while consuming fewer computational resources, yielding better visual results and richer scene detail information.
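    The following is a minimal sketch of the structural re-parameterization idea behind an edge-convolution block: at training time a learned 3x3 convolution runs in parallel with a fixed edge operator (a Laplacian kernel is assumed here), and at inference time the two branches are folded into a single convolution. The channel counts and the absence of batch normalisation are simplifying assumptions, not the paper's exact RECB.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class EdgeConvBranch(nn.Module):
    """Two-branch training-time block that folds into one conv at inference (illustrative)."""

    def __init__(self, channels):
        super().__init__()
        self.conv = nn.Conv2d(channels, channels, 3, padding=1, bias=False)
        lap = torch.tensor([[0., 1., 0.], [1., -4., 1.], [0., 1., 0.]])  # assumed edge operator
        edge = torch.zeros(channels, channels, 3, 3)
        for c in range(channels):
            edge[c, c] = lap  # apply the edge operator channel-wise
        self.register_buffer("edge", edge)

    def forward(self, x):
        # Training-time form: learned convolution plus fixed edge convolution.
        return self.conv(x) + F.conv2d(x, self.edge, padding=1)

    def reparameterize(self):
        # Inference-time form: both branches folded into a single convolution.
        merged = nn.Conv2d(self.conv.in_channels, self.conv.out_channels, 3, padding=1, bias=False)
        merged.weight.data = self.conv.weight.data + self.edge
        return merged

x = torch.randn(1, 8, 32, 32)
block = EdgeConvBranch(8)
assert torch.allclose(block(x), block.reparameterize()(x), atol=1e-5)  # identical outputs
```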

    Bearing fault diagnosis using multidomain fusion-based vibration imaging and multitask learning.

    Extracting statistical features from bearing fault signals requires a substantial level of knowledge and domain expertise. Furthermore, existing feature extraction techniques are mostly confined to selective methods, namely time-domain, frequency-domain, or time-frequency-domain statistical parameters. Vibration signals of bearing faults are highly non-linear and non-stationary, making it cumbersome to extract relevant information with existing methodologies, and the process becomes even more complicated when the bearing operates at variable speeds and load conditions. To address these challenges, this study develops an autonomous diagnostic system that combines signal-to-image transformation techniques for multi-domain information with convolutional neural network (CNN)-aided multitask learning (MTL). To handle variable operating conditions, a composite color image is created by fusing information from multiple domains: the raw time-domain signal, the spectrum of the time-domain signal, and the envelope spectrum from the time-frequency analysis. This 2-D composite image, named multi-domain fusion-based vibration imaging (MDFVI), is highly effective in generating a unique pattern even under variable speeds and loads. These MDFVI images are then fed to the proposed MTL-based CNN architecture to identify faults under variable speed and health conditions concurrently. The proposed method is tested on two benchmark bearing datasets, and the experimental results suggest that it outperforms the state of the art on both datasets.
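    A rough sketch of the multi-domain imaging idea is given below: three signal representations (raw time domain, spectrum, and envelope spectrum) are normalised and stacked as the channels of one composite image. The image size, the normalisation, and the use of the Hilbert transform for the envelope are assumptions for illustration, not the paper's exact MDFVI procedure.

```python
import numpy as np
from scipy.signal import hilbert

def composite_vibration_image(signal, size=64):
    """Stack time-domain, spectrum, and envelope-spectrum channels (illustrative only)."""

    def to_channel(x):
        x = np.abs(x[: size * size])             # crop to one channel's worth of points
        x = (x - x.min()) / (np.ptp(x) + 1e-12)  # scale to [0, 1]
        return x.reshape(size, size)

    time_ch = to_channel(signal)                                       # raw time-domain signal
    spec_ch = to_channel(np.abs(np.fft.rfft(signal)))                  # spectrum
    env_ch = to_channel(np.abs(np.fft.rfft(np.abs(hilbert(signal)))))  # envelope spectrum
    return np.stack([time_ch, spec_ch, env_ch], axis=-1)               # (size, size, 3) composite

vibration = np.random.randn(100_000)                # stand-in for a measured vibration signal
print(composite_vibration_image(vibration).shape)   # (64, 64, 3)
```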

    SDF2Net: Shallow to Deep Feature Fusion Network for PolSAR Image Classification

    Polarimetric synthetic aperture radar (PolSAR) images encompass valuable information that can facilitate extensive land cover interpretation and generate diverse output products. Extracting meaningful features from PolSAR data poses challenges distinct from those encountered in optical imagery. Deep learning (DL) methods offer effective solutions for overcoming these challenges in PolSAR feature extraction. Convolutional neural networks (CNNs) play a crucial role in capturing PolSAR image characteristics by leveraging kernel capabilities to consider local information and the complex-valued nature of PolSAR data. In this study, a novel three-branch fusion of complex-valued CNNs, named the Shallow to Deep Feature Fusion Network (SDF2Net), is proposed for PolSAR image classification. To validate the performance of the proposed method, classification results are compared against multiple state-of-the-art approaches on the airborne synthetic aperture radar (AIRSAR) datasets of Flevoland and San Francisco, as well as the ESAR Oberpfaffenhofen dataset. The results indicate that the proposed approach improves overall accuracy, with 1.3% and 0.8% enhancements on the AIRSAR datasets and a 0.5% improvement on the ESAR dataset. Analyses conducted on the Flevoland data underscore the effectiveness of the SDF2Net model, revealing a promising overall accuracy of 96.01% even with only a 1% sampling ratio.
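    As a small illustration of complex-valued convolution for PolSAR-style inputs, the layer below implements the complex product using two real convolutions applied to the real and imaginary parts. The channel counts are arbitrary and the layer is not SDF2Net's actual block.

```python
import torch
import torch.nn as nn

class ComplexConv2d(nn.Module):
    """Complex-valued convolution built from two real convolutions (illustrative only)."""

    def __init__(self, in_ch, out_ch, kernel=3):
        super().__init__()
        self.real = nn.Conv2d(in_ch, out_ch, kernel, padding=kernel // 2)
        self.imag = nn.Conv2d(in_ch, out_ch, kernel, padding=kernel // 2)

    def forward(self, x_re, x_im):
        # (a + ib) * (w_r + i w_i) = (a*w_r - b*w_i) + i(a*w_i + b*w_r)
        out_re = self.real(x_re) - self.imag(x_im)
        out_im = self.imag(x_re) + self.real(x_im)
        return out_re, out_im

# Toy usage with 9 input channels standing in for complex PolSAR features.
x_re, x_im = torch.randn(1, 9, 32, 32), torch.randn(1, 9, 32, 32)
y_re, y_im = ComplexConv2d(9, 16)(x_re, x_im)
print(y_re.shape, y_im.shape)  # (1, 16, 32, 32) each
```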

    TFormer: A throughout fusion transformer for multi-modal skin lesion diagnosis

    Multi-modal skin lesion diagnosis (MSLD) has achieved remarkable success through modern computer-aided diagnosis technology based on deep convolutions. However, information aggregation across modalities in MSLD remains challenging due to severely unaligned spatial resolutions (dermoscopic versus clinical images) and heterogeneous data (dermoscopic images versus patients' meta-data). Limited by their intrinsically local attention, most recent MSLD pipelines built on pure convolutions struggle to capture representative features in shallow layers, so fusion across modalities is usually performed at the end of the pipeline, or even at the last layer, leading to insufficient information aggregation. To tackle this issue, we introduce a pure transformer-based method, the Throughout Fusion Transformer (TFormer), for sufficient information integration in MSLD. Different from existing approaches based on convolutions, the proposed network uses a transformer as the feature-extraction backbone, bringing more representative shallow features. We then carefully design a stack of dual-branch hierarchical multi-modal transformer (HMT) blocks to fuse information across the image modalities in a stage-by-stage way. With the aggregated image-modality information, a multi-modal transformer post-fusion (MTP) block is designed to integrate features across image and non-image data. This strategy of first fusing the image modalities and then the heterogeneous non-image data allows us to divide and conquer the two major challenges while ensuring that inter-modality dynamics are effectively modeled. Experiments conducted on the public Derm7pt dataset validate the superiority of the proposed method: our TFormer outperforms other state-of-the-art methods, and ablation experiments further confirm the effectiveness of our designs.
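    To illustrate the stage-wise image-modality fusion described above, the sketch below shows a dual-branch block in which dermoscopic and clinical token streams cross-attend to each other. Token counts, dimensions, and normalisation are assumptions; the block is in the spirit of an HMT block rather than a reproduction of it.

```python
import torch
import torch.nn as nn

class DualBranchFusionBlock(nn.Module):
    """Two image-token streams attend to each other (illustrative HMT-style block)."""

    def __init__(self, dim=192, heads=3):
        super().__init__()
        self.derm_to_clin = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.clin_to_derm = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.norm_d = nn.LayerNorm(dim)
        self.norm_c = nn.LayerNorm(dim)

    def forward(self, derm_tokens, clin_tokens):
        # Each branch queries the other branch's tokens, then adds a residual connection.
        d, _ = self.derm_to_clin(derm_tokens, clin_tokens, clin_tokens)
        c, _ = self.clin_to_derm(clin_tokens, derm_tokens, derm_tokens)
        return self.norm_d(derm_tokens + d), self.norm_c(clin_tokens + c)

derm = torch.randn(2, 196, 192)   # dermoscopic image tokens
clin = torch.randn(2, 196, 192)   # clinical image tokens
d, c = DualBranchFusionBlock()(derm, clin)
print(d.shape, c.shape)  # (2, 196, 192) each
```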