142 research outputs found
Hierarchical and Incremental Structural Entropy Minimization for Unsupervised Social Event Detection
As a trending approach for social event detection, graph neural network
(GNN)-based methods enable a fusion of natural language semantics and the
complex social network structural information, thus showing SOTA performance.
However, GNN-based methods can miss useful message correlations. Moreover, they
require manual labeling for training and predetermining the number of events
for prediction. In this work, we address social event detection via graph
structural entropy (SE) minimization. While keeping the merits of the GNN-based
methods, the proposed framework, HISEvent, constructs more informative message
graphs, is unsupervised, and does not require the number of events to be
specified a priori. Specifically, we incrementally explore the graph neighborhoods using
1-dimensional (1D) SE minimization to supplement the existing message graph
with edges between semantically related messages. We then detect events from
the message graph by hierarchically minimizing 2-dimensional (2D) SE. Our
proposed 1D and 2D SE minimization algorithms are customized for social event
detection and effectively tackle the efficiency problem of the existing SE
minimization algorithms. Extensive experiments show that HISEvent consistently
outperforms GNN-based methods and achieves the new SOTA for social event
detection under both closed- and open-set settings while being efficient and
robust.
Comment: Accepted to AAAI 202
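As a concrete anchor for the entropy objective, here is a minimal sketch of one-dimensional structural entropy (following the standard definition from structural information theory; the incremental minimization actually used in HISEvent is more involved):

```python
import math

def one_dim_se(degrees):
    """One-dimensional structural entropy of a graph, given its degree sequence:

        H1(G) = -sum_i (d_i / vol) * log2(d_i / vol),  vol = sum of degrees.
    """
    vol = sum(degrees)
    return -sum((d / vol) * math.log2(d / vol) for d in degrees if d > 0)

# Example: a 4-cycle, where every node has degree 2.
print(one_dim_se([2, 2, 2, 2]))  # → 2.0 bits, i.e. log2(4)
```

For a regular graph the degree distribution is uniform, so the 1D SE reduces to log2(n); minimizing it while adding candidate edges is what drives the neighborhood-exploration step described above.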
CHITNet: A Complementary to Harmonious Information Transfer Network for Infrared and Visible Image Fusion
Current infrared and visible image fusion (IVIF) methods go to great lengths
to excavate complementary features and design complex fusion strategies, which
is extremely challenging. To this end, we rethink the IVIF outside the box,
proposing a complementary to harmonious information transfer network (CHITNet).
It transfers complementary information into a harmonious representation that
integrates both the shared and complementary features of the two modalities.
Specifically, to skillfully sidestep aggregating complementary information in
IVIF, we design a mutual information transfer (MIT) module to mutually
represent features from the two modalities, roughly transferring complementary
information into a harmonious form. Then, a harmonious information acquisition
supervised by source image (HIASSI) module is devised to further ensure the
complementary to harmonious information transfer after MIT. Meanwhile, we also
propose a structure information preservation (SIP) module to guarantee that the
edge structure information of the source images can be transferred to the
fusion results. Moreover, a mutual promotion training paradigm (MPTP) with
interaction loss is adopted to facilitate better collaboration among MIT,
HIASSI and SIP. In this way, the proposed method is able to generate fused
images of higher quality. Extensive experimental results demonstrate the
superiority of our CHITNet over state-of-the-art algorithms in terms of visual
quality and quantitative evaluations.
Adaptive multimodal continuous ant colony optimization
Seeking multiple optima simultaneously, the goal of multimodal optimization, has attracted increasing attention but remains challenging. Taking advantage of the strength of ant colony optimization algorithms in preserving high diversity, this paper extends ant colony optimization to multimodal optimization. First, combined with current niching methods, an adaptive multimodal continuous ant colony optimization algorithm is introduced. In this algorithm, an adaptive parameter adjustment is developed that takes the differences among niches into consideration. Second, to accelerate convergence, a differential evolution mutation operator is alternatively utilized to build base vectors for ants to construct new solutions. Then, to enhance exploitation, a local search scheme based on the Gaussian distribution is self-adaptively performed around the seeds of niches. Together, these components afford a good balance between exploration and exploitation. Extensive experiments on 20 widely used benchmark multimodal functions are conducted to investigate the influence of each algorithmic component, and the results are compared with several state-of-the-art multimodal algorithms and winners of competitions on multimodal optimization. These comparisons demonstrate the competitive efficiency and effectiveness of the proposed algorithm, especially on complex problems with many local optima.
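The differential evolution mutation operator mentioned above can be sketched as follows. This is the classic DE/rand/1 form; the paper's adaptive, niche-aware choice of the three donor solutions is an assumption not reproduced here:

```python
import random

def de_mutation(population, F=0.5):
    """DE/rand/1 mutation: v = x_r1 + F * (x_r2 - x_r3).

    Builds a base vector from three distinct, randomly chosen population
    members; F is the differential weight. Illustrative sketch only.
    """
    r1, r2, r3 = random.sample(range(len(population)), 3)
    x1, x2, x3 = population[r1], population[r2], population[r3]
    return [a + F * (b - c) for a, b, c in zip(x1, x2, x3)]

pop = [[0.0, 0.0], [1.0, 1.0], [2.0, -1.0], [0.5, 0.5]]
mutant = de_mutation(pop, F=0.5)  # a new base vector in the same 2-D space
```

Ants would then construct new solutions around such base vectors instead of sampling purely from the pheromone-derived Gaussian kernels, which is what accelerates convergence.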
A novel decomposed-ensemble time series forecasting framework: capturing underlying volatility information
Time series forecasting represents a significant and challenging task across
various fields. Recently, methods based on mode decomposition have dominated
the forecasting of complex time series because of the advantages of capturing
local characteristics and extracting intrinsic modes from data. Unfortunately,
most models fail to capture the implied volatilities that contain significant
information. To enhance the prediction of contemporary diverse and complex time
series, we propose a novel time series forecasting paradigm that integrates
decomposition with the capability to capture the underlying fluctuation
information of the series. In our methodology, we implement the Variational
Mode Decomposition algorithm to decompose the time series into K distinct
sub-modes. Following this decomposition, we apply the Generalized
Autoregressive Conditional Heteroskedasticity (GARCH) model to extract the
volatility information in these sub-modes. Subsequently, both the numerical
data and the volatility information for each sub-mode are harnessed to train a
neural network. This network is adept at predicting the information of the
sub-modes, and we aggregate the predictions of all sub-modes to generate the
final output. By integrating econometric and artificial intelligence methods,
and taking into account both the numerical and volatility information of the
time series, our proposed framework demonstrates superior performance in time
series forecasting, as evidenced by the significant decrease in MSE, RMSE, and
MAPE in our comparative experimental results.
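A minimal sketch of the GARCH(1,1) variance recursion that would supply the volatility series for each sub-mode (the parameter values here are illustrative; in the framework they would be fitted per sub-mode, e.g. by maximum likelihood, before being fed to the neural network alongside the raw values):

```python
def garch11_variance(returns, omega=0.1, alpha=0.1, beta=0.8):
    """Conditional variance path of a GARCH(1,1) model:

        sigma2_t = omega + alpha * eps_{t-1}^2 + beta * sigma2_{t-1}

    `returns` are the residuals (eps) of one decomposed sub-mode.
    """
    # Start from the unconditional variance omega / (1 - alpha - beta).
    sigma2 = [omega / (1.0 - alpha - beta)]
    for eps in returns[:-1]:
        sigma2.append(omega + alpha * eps ** 2 + beta * sigma2[-1])
    return sigma2

print(garch11_variance([0.0, 0.0]))  # → [1.0, 0.9]
```

Each sub-mode's numerical series and its sigma2 path together form the two-channel input described above.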
Generation and Recombination for Multifocus Image Fusion with Free Number of Inputs
Multifocus image fusion is an effective way to overcome the limitation of
optical lenses. Many existing methods obtain fused results by generating
decision maps. However, such methods often assume that the focused areas of the
two source images are complementary, making it impossible to achieve
simultaneous fusion of multiple images. Additionally, the existing methods
ignore the impact of hard pixels on fusion performance, limiting the visual
quality of the fused image. To address these issues, a combined
generation and recombination model, termed GRFusion, is proposed. In
GRFusion, focus property detection of each source image can be implemented
GRFusion, focus property detection of each source image can be implemented
independently, enabling simultaneous fusion of multiple source images and
avoiding information loss caused by alternating fusion. This makes GRFusion
independent of the number of inputs. To distinguish the hard pixels from the source
images, we achieve the determination of hard pixels by considering the
inconsistency among the detection results of focus areas in source images.
Furthermore, a multi-directional gradient embedding method for generating full
focus images is proposed. Subsequently, a hard-pixel-guided recombination
mechanism for constructing fused result is devised, effectively integrating the
complementary advantages of feature reconstruction-based method and focused
pixel recombination-based method. Extensive experimental results demonstrate
the effectiveness and the superiority of the proposed method. The source code
will be released at https://github.com/xxx/xxx
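One plausible reading of the hard-pixel criterion, sketched with binary focus maps; this is an assumed interpretation of the inconsistency test in the abstract, not the paper's exact rule:

```python
import numpy as np

def hard_pixel_mask(focus_maps):
    """Flag 'hard' pixels as those where the per-image focus decisions are
    inconsistent: either no source image, or more than one, is judged
    in-focus at that location. `focus_maps` is a list of boolean arrays,
    one per source image.
    """
    votes = np.sum(np.stack(focus_maps), axis=0)  # in-focus votes per pixel
    return votes != 1  # a consistent pixel gets exactly one vote

# Two 2x2 decision maps that overlap in the bottom-right corner.
a = np.array([[True, False], [True, True]])
b = np.array([[False, True], [False, True]])
mask = hard_pixel_mask([a, b])  # only [1, 1] is flagged as hard
```

Pixels flagged by such a mask would then be handled by the feature-reconstruction branch, while consistent pixels can be recombined directly from the focused source.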
Adversarial Self-Attack Defense and Spatial-Temporal Relation Mining for Visible-Infrared Video Person Re-Identification
In visible-infrared video person re-identification (re-ID), extracting
features that are robust to complex scene changes (such as modality, camera
view, pedestrian pose, and background) and mining and utilizing motion
information are the keys to solving cross-modal pedestrian identity matching.
To this end, the paper proposes a new visible-infrared video person re-ID
method from a novel perspective, i.e., adversarial self-attack defense and
spatial-temporal relation mining. In this work, the changes of views, posture,
background and modal discrepancy are considered as the main factors that cause
the perturbations of person identity features. Such interference information
contained in the training samples is used as an adversarial perturbation. It
performs adversarial attacks on the re-ID model during the training to make the
model more robust to these unfavorable factors. The attack from the adversarial
perturbation is introduced by activating the interference information contained
in the input samples without generating adversarial samples, and it can be thus
called adversarial self-attack. This design allows adversarial attack and
defense to be integrated into one framework. This paper further proposes a
spatial-temporal information-guided feature representation network to use the
information in video sequences. The network can not only extract the information
contained in the video-frame sequences but also use the spatial relations of
local information to guide the extraction of more robust features. The
proposed method exhibits compelling performance on large-scale cross-modality
video datasets. The source code of the proposed method will be released at
https://github.com/lhf12278/xxx.
Comment: 11 pages, 8 figures
Single-Image HDR Reconstruction Assisted Ghost Suppression and Detail Preservation Network for Multi-Exposure HDR Imaging
The reconstruction of high dynamic range (HDR) images from multi-exposure low
dynamic range (LDR) images in dynamic scenes presents significant challenges,
especially in preserving and restoring information in oversaturated regions and
avoiding ghosting artifacts. While current methods often struggle to address
these challenges, our work aims to bridge this gap by developing a
multi-exposure HDR image reconstruction network for dynamic scenes,
complemented by single-frame HDR image reconstruction. This network, comprising
single-frame HDR reconstruction with enhanced stop image (SHDR-ESI) and
SHDR-ESI-assisted multi-exposure HDR reconstruction (SHDRA-MHDR), effectively
leverages the ghost-free characteristic of single-frame HDR reconstruction and
the detail-enhancing capability of ESI in oversaturated areas. Specifically,
SHDR-ESI innovatively integrates single-frame HDR reconstruction with the
utilization of ESI. This integration not only optimizes the single image HDR
reconstruction process but also effectively guides the synthesis of
multi-exposure HDR images in SHDRA-MHDR. In this method, the single-frame HDR
reconstruction is specifically applied to reduce potential ghosting effects in
multi-exposure HDR synthesis, while the use of ESI images assists in enhancing
the detail information in the HDR synthesis process. Technically, SHDR-ESI
incorporates a detail enhancement mechanism, which includes a
self-representation module and a mutual-representation module, designed to
aggregate crucial information from both reference image and ESI. To fully
leverage the complementary information from non-reference images, a feature
interaction fusion module is integrated within SHDRA-MHDR. Additionally, a
ghost suppression module, guided by the ghost-free results of SHDR-ESI, is
employed to suppress the ghosting artifacts.
Comment: IEEE Transactions on Computational Imaging