Multi-source Semantic Graph-based Multimodal Sarcasm Explanation Generation
Multimodal Sarcasm Explanation (MuSE) is a new yet challenging task, which
aims to generate a natural language sentence for a multimodal social post (an
image as well as its caption) to explain why it contains sarcasm. Although the
existing pioneering study has achieved great success with the BART backbone, it
overlooks the gap between the visual feature space and the decoder semantic
space, the object-level metadata of the image, as well as the potential
external knowledge. To address these limitations, in this work, we propose a
novel mulTi-source sEmantic grAph-based Multimodal sarcasm explanation scheme,
named TEAM. In particular, TEAM extracts object-level semantic metadata
instead of the traditional global visual features from the input image.
Meanwhile, TEAM resorts to ConceptNet to obtain external related knowledge
concepts for the input text and the extracted object metadata. Thereafter,
TEAM introduces a multi-source semantic graph that comprehensively characterizes
the multi-source (i.e., caption, object metadata, external knowledge) semantic
relations to facilitate sarcasm reasoning. Extensive experiments on the
publicly released dataset MORE verify the superiority of our model over
cutting-edge methods. Comment: Accepted by ACL 2023 main conference
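To make the multi-source semantic graph idea concrete, here is a minimal sketch of how caption tokens, detected object labels, and retrieved knowledge concepts could be joined into one undirected graph. The edge rules (adjacent caption tokens, identical caption/object words, node-to-concept links) and the plain `knowledge` dictionary standing in for a ConceptNet lookup are assumptions for illustration, not TEAM's published construction; `build_multisource_graph` is a hypothetical helper.

```python
from collections import defaultdict


def build_multisource_graph(caption_tokens, object_labels, knowledge):
    """Build an adjacency map over caption, object, and concept nodes.

    Node ids are (source, text) tuples. The three edge rules below are
    illustrative assumptions, not the exact rules used by TEAM.
    """
    adj = defaultdict(set)

    def connect(a, b):
        # Undirected edge; self-loops are skipped.
        if a != b:
            adj[a].add(b)
            adj[b].add(a)

    caption_nodes = [("caption", t) for t in caption_tokens]
    object_nodes = [("object", o) for o in object_labels]
    for node in caption_nodes + object_nodes:
        adj.setdefault(node, set())  # keep isolated nodes in the graph

    # Assumed rule 1: neighbouring caption tokens are linked (sequential context).
    for a, b in zip(caption_nodes, caption_nodes[1:]):
        connect(a, b)
    # Assumed rule 2: a caption token and an object label with the same text are linked.
    for c in caption_nodes:
        for o in object_nodes:
            if c[1] == o[1]:
                connect(c, o)
    # Assumed rule 3: every caption/object node links to its retrieved concepts
    # (a dict lookup stands in for querying ConceptNet).
    for node in caption_nodes + object_nodes:
        for concept in knowledge.get(node[1], []):
            connect(node, ("concept", concept))
    return dict(adj)
```

For example, `build_multisource_graph(["love", "this", "rain"], ["umbrella", "rain"], {"rain": ["weather", "wet"], "umbrella": ["rain"]})` links the caption word "rain" to its neighbour "this", to the detected object "rain", and to the concepts "weather" and "wet", so textual and visual evidence meet at shared concept nodes.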
Target-Guided Composed Image Retrieval
Composed image retrieval (CIR) is a new and flexible image retrieval
paradigm, which can retrieve the target image for a multimodal query, including
a reference image and its corresponding modification text. Although existing
efforts have achieved compelling success, they overlook two aspects: modeling
the conflict relationship between the reference image and the modification text
to improve multimodal query composition, and adaptively modeling the matching
degree to improve the ranking of candidate images, which may match the given
query to different degrees. To address these two
limitations, in this work, we propose a Target-Guided Composed Image Retrieval
network (TG-CIR). In particular, TG-CIR first extracts the unified global and
local attribute features for the reference/target image and the modification
text with the contrastive language-image pre-training model (CLIP) as the
backbone, where an orthogonal regularization is introduced to promote the
independence among the attribute features. Then TG-CIR designs a target-query
relationship-guided multimodal query composition module, comprising a
target-free student composition branch and a target-based teacher composition
branch, where the target-query relationship is injected into the teacher branch
for guiding the conflict relationship modeling of the student branch. Finally,
apart from the conventional batch-based classification loss, TG-CIR
additionally introduces a batch-based target similarity-guided matching degree
regularization to promote the metric learning process. Extensive experiments on
three benchmark datasets demonstrate the superiority of our proposed method.
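The orthogonal regularization on attribute features can be illustrated with a common orthogonality penalty, ||F F^T - I||_F^2 computed over L2-normalized attribute feature vectors, which is zero exactly when the features are mutually orthogonal and grows with their pairwise cosine similarity. This is a minimal pure-Python sketch assuming that form; the exact regularizer in TG-CIR may differ, and the function names are hypothetical.

```python
import math


def l2_normalize(v):
    """Scale a vector to unit length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else list(v)


def orthogonal_regularization(attribute_features):
    """Assumed penalty ||F F^T - I||_F^2 over L2-normalized attribute features.

    Each row of `attribute_features` is one attribute feature vector. After
    normalization the diagonal terms are ~0, so the loss is the sum of squared
    cosine similarities between distinct attribute features.
    """
    feats = [l2_normalize(f) for f in attribute_features]
    k = len(feats)
    loss = 0.0
    for i in range(k):
        for j in range(k):
            dot = sum(a * b for a, b in zip(feats[i], feats[j]))
            target = 1.0 if i == j else 0.0  # identity matrix entry
            loss += (dot - target) ** 2
    return loss
```

Two orthogonal features such as `[[1, 0], [0, 1]]` give a penalty of 0, while two parallel features such as `[[1, 0], [2, 0]]` give 2, so minimizing this term pushes the attribute features toward independence.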
OFAR: A Multimodal Evidence Retrieval Framework for Illegal Live-streaming Identification
Illegal live-streaming identification, which aims to help live-streaming
platforms promptly recognize illegal behaviors in live streams,
such as selling precious and endangered animals, plays a crucial role in
purifying the network environment. Traditionally, live-streaming platforms
need to employ professionals to manually identify potentially illegal
live streams. Specifically, a professional needs to search a large-scale
knowledge database for related evidence to evaluate whether a given
live-streaming clip contains illegal behavior, which is time-consuming and
laborious. To address this issue, in this work, we propose a multimodal
evidence retrieval system, named OFAR, to facilitate the illegal live-streaming
identification. OFAR consists of three modules: Query Encoder, Document
Encoder, and MaxSim-based Contrastive Late Intersection. Both the query encoder
and the document encoder are implemented with the advanced OFA encoder, which is
pretrained on a large-scale multimodal dataset. In the last module, we
introduce contrastive learning on top of the MaxSim-based late
intersection to enhance the model's query-document matching ability. The
proposed framework achieves significant improvements on our industrial dataset
TaoLive, demonstrating the advantages of our scheme.
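MaxSim-based late interaction, as introduced by ColBERT, scores a query against a document by taking, for each query token embedding, its highest dot product with any document token embedding and summing these maxima. The sketch below pairs that score with an in-batch softmax (InfoNCE-style) contrastive loss in which `docs[i]` is the positive for `queries[i]` and every other document in the batch is a negative. The loss form, the `temperature` parameter, and all function names are assumptions; OFAR's exact training objective may differ, and a real system would apply this to OFA encoder outputs with a tensor library.

```python
import math


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def maxsim(query_tokens, doc_tokens):
    """Late-interaction score: sum over query tokens of the best-matching doc token."""
    return sum(max(dot(q, d) for d in doc_tokens) for q in query_tokens)


def contrastive_maxsim_loss(queries, docs, temperature=1.0):
    """Assumed in-batch contrastive loss over MaxSim scores.

    queries[i] and docs[i] are lists of token embeddings; docs[i] is the
    positive for queries[i], the rest of the batch serves as negatives.
    Returns the mean cross-entropy over the batch.
    """
    total = 0.0
    for i, q in enumerate(queries):
        logits = [maxsim(q, d) / temperature for d in docs]
        m = max(logits)  # subtract the max for numerical stability
        log_denominator = m + math.log(sum(math.exp(x - m) for x in logits))
        total += log_denominator - logits[i]
    return total / len(queries)
```

With a query of two token embeddings `[[1, 0], [0, 1]]` and a document `[[1, 0], [0, 0.5]]`, `maxsim` returns 1.5: each query token keeps only its best match. Scoring tokens independently is what lets document token embeddings be precomputed and indexed, which suits retrieval over a large evidence database.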
Multi-objective Optimization of Space-Air-Ground Integrated Network Slicing Relying on a Pair of Central and Distributed Learning Algorithms
As an attractive enabling technology for next-generation wireless
communications, network slicing supports diverse customized services in the
global space-air-ground integrated network (SAGIN) with diverse resource
constraints. In this paper, we consider, in a dynamic setting, three typical classes of
radio access network (RAN) slices, namely high-throughput slices, low-delay
slices and wide-coverage slices, under the same underlying physical SAGIN. The
throughput, the service delay and the coverage area of these three classes of
RAN slices are jointly optimized in a non-scalar form by considering the
distinct channel features and service advantages of the terrestrial, aerial and
satellite components of SAGINs. A joint central and distributed multi-agent
deep deterministic policy gradient (CDMADDPG) algorithm is proposed to solve
the above problem and obtain Pareto-optimal solutions. The algorithm first
determines the optimal virtual unmanned aerial vehicle (vUAV) positions and the
inter-slice sub-channel and power sharing by relying on a centralized unit.
Then it optimizes the intra-slice sub-channel and power allocation, and the
virtual base station (vBS)/vUAV/virtual low earth orbit (vLEO) satellite
deployment in support of three classes of slices by three separate distributed
units. Simulation results verify that the proposed method approaches the
Pareto-optimal exploitation of multiple RAN slices, and outperforms the
benchmark schemes. Comment: 19 pages, 14 figures, journal
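Because throughput, delay and coverage are optimized jointly in a non-scalar form, candidate solutions are compared by Pareto dominance rather than by a single score. The sketch below shows only that comparison: it keeps the outcomes that no other outcome matches or beats on all of throughput (maximize), delay (minimize) and coverage (maximize) while being strictly better on at least one. It does not implement CDMADDPG; the `SliceOutcome` type, the helper names and the numbers in the usage note are hypothetical.

```python
from typing import List, NamedTuple


class SliceOutcome(NamedTuple):
    throughput: float  # high-throughput slices: higher is better
    delay: float       # low-delay slices: lower is better
    coverage: float    # wide-coverage slices: higher is better


def dominates(a: SliceOutcome, b: SliceOutcome) -> bool:
    """True if `a` is no worse than `b` on every objective and strictly better on one."""
    no_worse = (a.throughput >= b.throughput
                and a.delay <= b.delay
                and a.coverage >= b.coverage)
    strictly_better = (a.throughput > b.throughput
                       or a.delay < b.delay
                       or a.coverage > b.coverage)
    return no_worse and strictly_better


def pareto_front(candidates: List[SliceOutcome]) -> List[SliceOutcome]:
    """Keep the non-dominated outcomes, preserving input order."""
    return [c for c in candidates
            if not any(dominates(other, c) for other in candidates)]
```

For hypothetical outcomes (10, 5, 3), (8, 5, 3), (6, 2, 3) and (10, 5, 4), the front is [(6, 2, 3), (10, 5, 4)]: the low-delay point survives despite its lower throughput, because no candidate is at least as good on all three objectives.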
Novel approach of electroshock treatment for defect repair in near-β titanium alloy manufactured via directed energy deposition
© 2021, The Minerals, Metals & Materials Society and ASM International. A novel, subsecond electroshock treatment (EST) approach is used in this study to repair defects in directed-energy-deposited Ti-5Al-5Mo-5V-3Cr-1Zr near-β titanium alloy. After EST, the porosity of the specimen decreased significantly from 0.81 to 0.1 pct. Large cracks observed at the bottom of the near-β titanium alloy specimen became intermittent small cracks, and the number of voids decreased. The defects in the top and middle regions of the specimens were repaired. The defect repair is potentially attributable to energy concentration, which promoted the coalescence of defect tips, and to thermal stresses, which compressed the defects inward and closed them.
- …