MM DialogueGAT: A Fusion Graph Attention Network for Emotion Recognition using Multi-model System
Emotion recognition is an important part of human-computer interaction, and human communication is inherently multimodal. Despite advances in emotion recognition models, two challenges persist. First, existing research focuses on mining the interactions between modalities and the contextual information of the dialogue, but neglects the role each modality plays relative to that dialogue context. Second, contextual information in a dialogue is not transmitted in a purely temporal structure. To address these two problems, we propose a multimodal fusion dialogue graph attention network (MM DialogueGAT). For the first problem, a bidirectional GRU extracts features from each modality, and a cross-modal multi-head attention layer fuses them under different modality configurations and combinations, with text, video, and audio information serving as main and auxiliary modalities. For the second problem, a GAT graph structure captures the contextual information within each modality. The results show that our model achieves good performance on the IEMOCAP dataset
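As a rough illustration of the cross-modal attention idea this abstract describes, the following single-head NumPy sketch lets a main modality (text) query an auxiliary modality (audio). All weights, dimensions, and modality shapes are illustrative placeholders, not the authors' implementation.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def cross_modal_attention(query_feats, kv_feats, d_k=16, seed=0):
    """One attention head where the main modality supplies the
    queries (Q) and the auxiliary modality supplies keys and
    values (K, V). Projection weights are random placeholders."""
    rng = np.random.default_rng(seed)
    d_q, d_kv = query_feats.shape[-1], kv_feats.shape[-1]
    W_q = rng.standard_normal((d_q, d_k)) / np.sqrt(d_q)
    W_k = rng.standard_normal((d_kv, d_k)) / np.sqrt(d_kv)
    W_v = rng.standard_normal((d_kv, d_k)) / np.sqrt(d_kv)
    Q, K, V = query_feats @ W_q, kv_feats @ W_k, kv_feats @ W_v
    attn = softmax(Q @ K.T / np.sqrt(d_k))   # (T_query, T_kv)
    return attn @ V                          # (T_query, d_k)

text = np.random.default_rng(1).standard_normal((5, 32))    # 5 utterances
audio = np.random.default_rng(2).standard_normal((7, 24))   # 7 audio frames
fused = cross_modal_attention(text, audio)
print(fused.shape)  # (5, 16)
```

A multi-head version would run several such heads with separate projections and concatenate their outputs, as the abstract's multi-head attention layer suggests.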
Common and Unique Feature Learning for Data Fusion
University of Technology Sydney, Faculty of Engineering and Information Technology. In today's era of big data, information about a phenomenon of interest is available from multiple acquisitions. Data captured from each of these acquisition frameworks is commonly known as a modality, where each modality provides information in a complementary manner. Despite the evident benefits and a plethora of works on data fusion, two challenging issues persist: 1) feature representation: how to exploit the data diversity that multiple modalities offer, and 2) feature fusion: how to combine the heterogeneous information for better decision making.
To address these challenges, this thesis presents significantly improved models of two widely utilised fusion techniques: a) early fusion, which combines features from multiple modalities for joint prediction, and b) late fusion, which combines modality-specific predictions at the decision level. I illustrate how both techniques have their own specific limitations, with late fusion unable to harness inter-modality benefits, and early fusion's reliance on a single model causing failure when information from any modality is futile. To overcome these drawbacks, I developed novel multimodal systems that perform feature extraction and feature fusion in a consolidated framework. Technically, I designed feature extraction schemes to capture both unique information from individual modalities and common information shared across modalities. I then combine these two kinds of information for supervised prediction, designing efficient fusion schemes that enable the framework to perform information discovery and feature fusion simultaneously.
In this thesis, I also demonstrate the benefits of fusing both the common and unique information in supervised learning, and validate the significance of the developed techniques on multimodal, multiview, and multisource datasets. The designed methods leverage the multimodal benefits by creating additional diversity, and obtain a more unified view of the underlying phenomenon for better decision making
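The common-versus-unique decomposition this thesis describes can be caricatured in a few lines of NumPy: split two same-dimensional modality features into a shared component and per-modality residuals, then concatenate all three for downstream prediction. Real models learn these projections; the mean-based split below is purely illustrative.

```python
import numpy as np

def common_unique_fusion(mod_a, mod_b):
    """Toy split of two modality feature matrices into a 'common'
    part (shared across modalities) and 'unique' parts (per-modality
    residuals), concatenated for a downstream predictor. A learned
    model would replace the mean with trained projections."""
    common = (mod_a + mod_b) / 2.0    # shared information
    unique_a = mod_a - common         # what only modality A carries
    unique_b = mod_b - common         # what only modality B carries
    return np.concatenate([common, unique_a, unique_b], axis=-1)

a = np.array([[1.0, 2.0], [3.0, 4.0]])   # modality A features
b = np.array([[2.0, 0.0], [1.0, 6.0]])   # modality B features
fused = common_unique_fusion(a, b)
print(fused.shape)  # (2, 6)
```

Note that the decomposition is lossless: adding a unique part back to the common part recovers the original modality features, which is the intuition behind fusing both kinds of information rather than either alone.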
Neural data search for table augmentation
Tabular data is widely available on the web and in private data lakes run by commercial companies or research institutes. However, data that is essential for a specific task at hand is often scattered throughout numerous tables in these data lakes. Accessing this data requires retrieving the relevant information for the task. One approach to retrieve this data is through table augmentation. Table augmentation adds an additional attribute to a query table and populates the values of that attribute with data from the data lake. My research focuses on evaluating methods for augmenting a table with an additional attribute. Table augmentation presents a variety of challenges due to the heterogeneity of data sources and the multitude of possible combinations of methods. To successfully augment a query table based on tabular data from a data lake, several tasks such as data normalization, data search, schema matching, information extraction and data fusion must be performed. In my work, I empirically compare methods for data search, information extraction and data fusion as well as complete table augmentation pipelines using different datasets containing tabular data found in real-world data lakes. Methodologically, I plan to introduce new neural techniques for data search, information extraction and data fusion in the context of table augmentation. These new methods, as well as existing symbolic data search methods for table augmentation, will be empirically evaluated on two sets of benchmark query tables. The aim is to identify task- and dataset-specific challenges for data search, information extraction and data fusion methods. By profiling the datasets and analysing the errors made by the evaluated methods on the test query tables, the strengths and weaknesses of the methods can be systematically identified. Data search and information extraction methods should maximize recall while data fusion methods should achieve high accuracy. 
Pipelines built on the basis of the new methods should deliver their results quickly without compromising the highest possible accuracy of the augmented attribute values
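The table augmentation pipeline sketched in this abstract (data search, then information extraction, then data fusion) can be illustrated with a toy Python example. The data lake, entity names, values, and majority-vote fusion rule below are all hypothetical stand-ins for the neural and symbolic methods being evaluated.

```python
from collections import Counter

# Hypothetical mini data lake: each "table" maps an entity key to
# attribute values; all names and numbers are invented for illustration.
data_lake = [
    {"Berlin": {"population": "3.7M"}, "Paris": {"population": "2.1M"}},
    {"Berlin": {"population": "3.6M"}},
    {"Berlin": {"population": "3.7M"}, "Paris": {"population": "2.1M"},
     "Rome": {"population": "2.8M"}},
]

def augment(query_entities, attribute):
    """Sketch of a table augmentation pipeline: (1) data search finds
    tables mentioning the entity, (2) information extraction collects
    candidate values, (3) data fusion resolves conflicts by majority
    vote. Entities absent from the lake stay unfilled (None)."""
    augmented = {}
    for entity in query_entities:
        candidates = [table[entity][attribute] for table in data_lake
                      if entity in table and attribute in table[entity]]
        augmented[entity] = (Counter(candidates).most_common(1)[0][0]
                             if candidates else None)
    return augmented

print(augment(["Berlin", "Paris", "Oslo"], "population"))
# {'Berlin': '3.7M', 'Paris': '2.1M', 'Oslo': None}
```

The search step above maximizes recall (it keeps every candidate), while the fusion step is responsible for accuracy, mirroring the division of quality goals stated in the abstract.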
Multisource and Multitemporal Data Fusion in Remote Sensing
The sharp and recent increase in the availability of data captured by
different sensors combined with their considerably heterogeneous natures poses
a serious challenge for the effective and efficient processing of remotely
sensed data. Such an increase in remote sensing and ancillary datasets,
however, opens up the possibility of utilizing multimodal datasets in a joint
manner to further improve the performance of the processing approaches with
respect to the application at hand. Multisource data fusion has, therefore,
received enormous attention from researchers worldwide for a wide variety of
applications. Moreover, thanks to the revisit capability of several spaceborne
sensors, the integration of the temporal information with the spatial and/or
spectral/backscattering information of the remotely sensed data is possible and
helps to move from a representation of 2D/3D data to 4D data structures, where
the time variable adds new information as well as challenges for the
information extraction algorithms. There are a huge number of research works
dedicated to multisource and multitemporal data fusion, but the methods for the
fusion of different modalities have expanded in different paths according to
each research community. This paper brings together the advances of multisource
and multitemporal data fusion approaches with respect to different research
communities and provides a thorough and discipline-specific starting point for
researchers at different levels (i.e., students, researchers, and senior
researchers) willing to conduct novel investigations on this challenging topic
by supplying sufficient detail and references
FECFusion: Infrared and visible image fusion network based on fast edge convolution
The purpose of infrared and visible image fusion is to integrate the complementary information from heterogeneous images in order to enhance their detailed scene information. However, existing deep learning fusion methods suffer from an imbalance between fusion performance and computational resource consumption. Additionally, fusion layers or fusion rules fail to effectively combine heteromodal feature information. To address these challenges, this paper presents a novel algorithm, an infrared and visible image fusion network based on fast edge convolution (FECFusion). During the training phase, the proposed algorithm enhances the extraction of texture features in the source images through structural re-parameterization edge convolution blocks (RECB) with embedded edge operators. Subsequently, an attention fusion module (AFM) is employed to sufficiently fuse both the unique and the common information in the heteromodal features. In the inference stage, we further optimize the trained network using the structural re-parameterization technique, resulting in a VGG-like network architecture. This optimization improves the fusion speed while maintaining fusion performance. To evaluate the proposed FECFusion algorithm, qualitative and quantitative experiments are conducted against seven advanced fusion algorithms on the MSRS, TNO, and M3FD datasets. The results demonstrate that the proposed fusion algorithm achieves superior performance on multiple evaluation metrics while consuming fewer computational resources. Consequently, the proposed algorithm yields better visual results and provides richer scene detail information
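The structural re-parameterization trick that FECFusion relies on can be demonstrated concretely: because convolution is linear in its kernel, a training-time branch with an extra edge operator collapses into a single inference-time kernel with identical outputs. The sketch below uses a fixed Sobel operator as a stand-in for the paper's embedded edge operators; it is not the RECB itself.

```python
import numpy as np

def conv2d(img, kernel):
    """Valid-mode 2-D cross-correlation, single channel."""
    kh, kw = kernel.shape
    H, W = img.shape
    out = np.empty((H - kh + 1, W - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(img[i:i + kh, j:j + kw] * kernel)
    return out

# Training-time branches: a learned 3x3 kernel plus a fixed Sobel
# edge operator (a stand-in for the embedded edge operators).
rng = np.random.default_rng(0)
learned = rng.standard_normal((3, 3))
sobel_x = np.array([[-1., 0., 1.], [-2., 0., 2.], [-1., 0., 1.]])

img = rng.standard_normal((8, 8))
two_branch = conv2d(img, learned) + conv2d(img, sobel_x)

# Inference-time re-parameterization: convolution is linear in the
# kernel, so the parallel branches collapse into a single kernel.
merged = learned + sobel_x
one_branch = conv2d(img, merged)

print(np.allclose(two_branch, one_branch))  # True
```

The merged network does one convolution where training did two, which is the source of the inference speed-up the abstract claims.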
Bearing fault diagnosis using multidomain fusion-based vibration imaging and multitask learning.
Statistical feature extraction from bearing fault signals requires a substantial level of knowledge and domain expertise. Furthermore, existing feature extraction techniques are mostly confined to selective methods, namely time-domain, frequency-domain, or time-frequency-domain statistical parameters. Vibration signals of bearing faults are highly non-linear and non-stationary, making it cumbersome for existing methodologies to extract relevant information. The process becomes even more complicated when the bearing operates at variable speeds and load conditions. To address these challenges, this study develops an autonomous diagnostic system that combines signal-to-image transformation techniques for multi-domain information with convolutional neural network (CNN)-aided multitask learning (MTL). To handle variable operating conditions, a composite color image is created by fusing information from multiple domains, such as the raw time-domain signal, the spectrum of the time-domain signal, and the envelope spectrum of the time-frequency analysis. This 2-D composite image, named multi-domain fusion-based vibration imaging (MDFVI), is highly effective in generating a unique pattern even with variable speeds and loads. These MDFVI images are then fed to the proposed MTL-based CNN architecture to identify fault and speed conditions concurrently. The proposed method is tested on two benchmark bearing datasets. The experimental results suggest that the proposed method outperforms the state-of-the-art on both datasets
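A toy analogue of the MDFVI construction above can be sketched in NumPy: map the raw signal, its FFT magnitude spectrum, and a crude envelope into the three channels of one composite image. The envelope here is a moving average of |x|, a deliberate simplification of the paper's time-frequency envelope spectrum, and the image size is arbitrary.

```python
import numpy as np

def to_channel(vec, size=16):
    """Min-max normalise a 1-D vector and reshape it into a
    size x size image channel (np.resize truncates or repeats)."""
    v = np.resize(vec, size * size).astype(float)
    v = (v - v.min()) / (v.max() - v.min() + 1e-12)
    return v.reshape(size, size)

def mdfvi_image(signal, size=16):
    """Toy multi-domain composite image: channel 1 is the raw
    time-domain signal, channel 2 its FFT magnitude spectrum,
    channel 3 a crude envelope (moving average of |x|), standing
    in for the envelope spectrum of a time-frequency analysis."""
    spectrum = np.abs(np.fft.rfft(signal))
    envelope = np.convolve(np.abs(signal), np.ones(8) / 8, mode="same")
    return np.stack([to_channel(signal, size),
                     to_channel(spectrum, size),
                     to_channel(envelope, size)], axis=-1)

t = np.linspace(0.0, 1.0, 1024)
sig = (np.sin(2 * np.pi * 50 * t)
       + 0.3 * np.random.default_rng(0).standard_normal(1024))
img = mdfvi_image(sig)
print(img.shape)  # (16, 16, 3)
```

Stacking the three domains as color channels is what lets a single CNN input carry complementary views of the same vibration signal, which is the core idea behind the composite image.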
SDF2Net: Shallow to Deep Feature Fusion Network for PolSAR Image Classification
Polarimetric synthetic aperture radar (PolSAR) images encompass valuable
information that can facilitate extensive land cover interpretation and
generate diverse output products. Extracting meaningful features from PolSAR
data poses challenges distinct from those encountered in optical imagery. Deep
learning (DL) methods offer effective solutions for overcoming these challenges
in PolSAR feature extraction. Convolutional neural networks (CNNs) play a
crucial role in capturing PolSAR image characteristics by leveraging kernel
capabilities to consider local information and the complex-valued nature of
PolSAR data. In this study, a novel three-branch fusion of complex-valued CNN,
named the Shallow to Deep Feature Fusion Network (SDF2Net), is proposed for
PolSAR image classification. To validate the performance of the proposed
method, classification results are compared against multiple state-of-the-art
approaches using the airborne synthetic aperture radar (AIRSAR) datasets of
Flevoland and San Francisco, as well as the ESAR Oberpfaffenhofen dataset. The
results indicate that the proposed approach demonstrates improvements in
overall accuracy, with a 1.3% and 0.8% enhancement for the AIRSAR datasets and a
0.5% improvement for the ESAR dataset. Analyses conducted on the Flevoland data
underscore the effectiveness of the SDF2Net model, revealing a promising
overall accuracy of 96.01% even with only a 1% sampling ratio
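The complex-valued convolution at the heart of SDF2Net's design can be illustrated with a minimal NumPy sketch; NumPy's complex dtype carries the phase information of PolSAR pixels through the arithmetic. The patch and kernel below are random placeholders, not the proposed three-branch network.

```python
import numpy as np

def complex_conv2d(img, kernel):
    """Valid-mode 2-D cross-correlation on complex-valued inputs,
    as used on PolSAR data where pixels carry phase as well as
    amplitude. NumPy performs the complex arithmetic directly."""
    kh, kw = kernel.shape
    H, W = img.shape
    out = np.empty((H - kh + 1, W - kw + 1), dtype=complex)
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(img[i:i + kh, j:j + kw] * kernel)
    return out

rng = np.random.default_rng(0)
polsar_patch = rng.standard_normal((6, 6)) + 1j * rng.standard_normal((6, 6))
kernel = rng.standard_normal((3, 3)) + 1j * rng.standard_normal((3, 3))
feat = complex_conv2d(polsar_patch, kernel)
print(feat.shape, feat.dtype)  # (4, 4) complex128
```

Keeping the features complex-valued until a late magnitude or real/imaginary split is what preserves the phase information that real-valued CNNs discard.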
TFormer: A throughout fusion transformer for multi-modal skin lesion diagnosis
Multi-modal skin lesion diagnosis (MSLD) has achieved remarkable success by
modern computer-aided diagnosis technology based on deep convolutions. However,
the information aggregation across modalities in MSLD remains challenging due
to severely unaligned spatial resolutions (dermoscopic image and clinical image)
and heterogeneous data (dermoscopic image and patients' meta-data). Limited by
the intrinsic local attention, most recent MSLD pipelines using pure
convolutions struggle to capture representative features in shallow layers,
thus the fusion across different modalities is usually done at the end of the
pipelines, even at the last layer, leading to an insufficient information
aggregation. To tackle the issue, we introduce a pure transformer-based method,
which we refer to as ``Throughout Fusion Transformer (TFormer)", for sufficient
information intergration in MSLD. Different from the existing approaches with
convolutions, the proposed network leverages transformer as feature extraction
backbone, bringing more representative shallow features. We then carefully
design a stack of dual-branch hierarchical multi-modal transformer (HMT) blocks
to fuse information across different image modalities in a stage-by-stage way.
With the aggregated information of image modalities, a multi-modal transformer
post-fusion (MTP) block is designed to integrate features across image and
non-image data. This strategy of first fusing the image modalities and then
the heterogeneous meta-data enables us to better divide and conquer the two
major challenges while ensuring that inter-modality dynamics are
effectively modeled. Experiments conducted on the public Derm7pt dataset
validate the superiority of the proposed method. Our TFormer outperforms other
state-of-the-art methods. Ablation experiments also suggest the effectiveness
of our designs
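The image-first, meta-data-second fusion order this abstract describes can be sketched in NumPy: a cross-attention step stands in for the HMT blocks, and a plain concatenation stands in for the MTP block. Token counts, dimensions, and pooling choices are illustrative only.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def fuse_images(dermoscopic, clinical):
    """Stage 1 (analogue of the HMT blocks): dermoscopic tokens
    cross-attend to clinical tokens despite different token counts,
    then the fused tokens are average-pooled to one vector."""
    scale = np.sqrt(dermoscopic.shape[-1])
    attn = softmax(dermoscopic @ clinical.T / scale)   # (9, 4)
    fused_tokens = dermoscopic + attn @ clinical       # residual fusion
    return fused_tokens.mean(axis=0)

def fuse_with_metadata(image_vec, meta_vec):
    """Stage 2 (analogue of the MTP block): only after the image
    modalities are merged is the heterogeneous meta-data attached."""
    return np.concatenate([image_vec, meta_vec])

rng = np.random.default_rng(0)
dermoscopic = rng.standard_normal((9, 16))   # 9 tokens, dim 16
clinical = rng.standard_normal((4, 16))      # coarser resolution
meta = rng.standard_normal(5)                # age, sex, site, ...
out = fuse_with_metadata(fuse_images(dermoscopic, clinical), meta)
print(out.shape)  # (21,)
```

Separating the two stages mirrors the abstract's divide-and-conquer argument: spatial misalignment is resolved among images first, and data heterogeneity is handled only afterwards.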