Deep Graph Embedding for IoT Botnet Traffic Detection
In the past, botnet attacks mainly targeted computers, posing a fundamental cybersecurity problem. With the boom of Internet of Things (IoT) devices, an increasing number of botnet attacks now target IoT devices. Researchers have proposed several mechanisms to defend against botnet attacks, such as identification by communication patterns or network topology and defence by DNS blacklisting. A popular current direction for botnet detection relies on the specific topological characteristics of botnets and uses machine learning models; however, it depends on network experts' domain knowledge for feature engineering. Recently, neural networks have shown strong representation-learning capabilities. This paper proposes a new approach to extracting graph features via graph neural networks. To capture the particular topology of the botnet, we transform the network traffic into graphs and train a graph neural network to extract features. In our evaluations, we use graph embedding features to train six machine learning models and compare them with traditional graph features in identifying botnet nodes. The experimental results show that botnet traffic detection remains challenging even with neural networks; an accurate and robust solution must consider the impact of data, features, and algorithms.
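A minimal sketch of the kind of message passing a graph neural network performs when extracting topological features: each node's embedding is updated by averaging its neighbours' features. Real GNNs learn the aggregation weights; the toy traffic graph, the features, and the single mean-aggregation step below are illustrative assumptions, not the paper's actual architecture.

```python
def mean_aggregate(adjacency, features):
    """One propagation step: concatenate a node's own feature vector with
    the mean of its neighbours' feature vectors."""
    embeddings = {}
    for node, neighbours in adjacency.items():
        if neighbours:
            mean = [sum(features[n][i] for n in neighbours) / len(neighbours)
                    for i in range(len(features[node]))]
        else:
            mean = [0.0] * len(features[node])
        embeddings[node] = features[node] + mean  # list concatenation
    return embeddings

# Toy traffic graph: node -> neighbours; feature = [degree, bytes_sent]
adj = {"a": ["b", "c"], "b": ["a"], "c": ["a"]}
feats = {"a": [2.0, 10.0], "b": [1.0, 3.0], "c": [1.0, 5.0]}
emb = mean_aggregate(adj, feats)
```

Stacking several such rounds (with learned transformations between them) is what lets a GNN encode multi-hop botnet topology without hand-engineered features.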
Is Model Attention Aligned with Human Attention? An Empirical Study on Large Language Models for Code Generation
Large Language Models (LLMs) have been demonstrated effective for code
generation. Due to the complexity and opacity of LLMs, little is known about
how these models generate code. To deepen our understanding, we investigate
whether LLMs attend to the same parts of a natural language description as
human programmers during code generation. An analysis of five LLMs on a popular
benchmark, HumanEval, revealed a consistent misalignment between LLMs' and
programmers' attention. Furthermore, we found that there is no correlation
between the code generation accuracy of LLMs and their alignment with human
programmers. Through a quantitative experiment and a user study, we confirmed
that, among twelve different attention computation methods, attention computed
by the perturbation-based method is most aligned with human attention and is
consistently favored by human programmers. Our findings highlight the need for
human-aligned LLMs for better interpretability and programmer trust.
Comment: 13 pages, 8 figures, 7 tables
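A hedged sketch of the perturbation-based attention idea the study found best aligned with human attention: each word in the prompt is occluded in turn, and its importance is the drop in the model's output score. The toy `score` function below stands in for an LLM's likelihood of the generated code and is purely an illustrative assumption.

```python
def score(prompt_tokens):
    """Stand-in for a model score; rewards task-relevant keywords."""
    keywords = {"sort": 2.0, "list": 1.0, "ascending": 1.5}
    return sum(keywords.get(tok, 0.1) for tok in prompt_tokens)

def perturbation_attention(tokens):
    """Importance of each token = score drop when that token is removed."""
    base = score(tokens)
    importance = {}
    for i, tok in enumerate(tokens):
        occluded = tokens[:i] + tokens[i + 1:]   # drop one token
        importance[tok] = base - score(occluded)
    return importance

attn = perturbation_attention(["sort", "the", "list", "ascending"])
```

With a real LLM, `score` would be replaced by the model's log-likelihood of the generated code given the (perturbed) description, which is why this method is far costlier than reading off self-attention weights.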
Exploring the axion potential and axion walls in dense quark matter
We study the potential of the Quantum Chromodynamics axion in hot and/or
dense quark matter, within a Nambu-Jona-Lasinio-like model that includes the
coupling of the axion to quarks. Unlike previous studies, we implement local
electrical neutrality and equilibrium, which are relevant for describing the
quark matter in the core of compact stellar objects. First, we compute the
effects of the chiral crossover on the axion mass and self-coupling. We find
that the low-energy properties of the axion are very sensitive to the phase
transition of Quantum Chromodynamics, in particular when the bulk quark matter
is close to criticality. Then, for the first time in the literature, we compute
the axion potential at finite quark chemical potential and study the axion
domain walls in bulk quark matter. We find that the energy barrier between two
adjacent vacuum states decreases in the chirally restored phase; this results
in a lower surface tension of the walls. Finally, we comment on the
possibility of producing walls in dense quark matter.
Comment: 10 pages, 7 figures
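For orientation, the vacuum QCD axion potential at leading order in chiral perturbation theory, which in-medium calculations such as this NJL study generalize to finite temperature and density, is commonly written as (here $m_{u,d}$ are the light quark masses, $m_\pi$ and $f_\pi$ the pion mass and decay constant, and $f_a$ the axion decay constant; the in-medium potential of the paper is not this formula):

```latex
V(a) = -\, m_\pi^2 f_\pi^2
\sqrt{\,1 - \frac{4\, m_u m_d}{(m_u + m_d)^2}\,
\sin^2\!\left(\frac{a}{2 f_a}\right)}
```

The degenerate minima of this periodic potential are the adjacent vacuum states between which the domain walls discussed in the abstract interpolate; lowering the barrier between them, as found in the chirally restored phase, lowers the wall surface tension.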
Towards Consistent Video Editing with Text-to-Image Diffusion Models
Existing works have advanced Text-to-Image (TTI) diffusion models for video
editing in a one-shot learning manner. Despite their low data and computation
requirements, these methods may produce results with unsatisfactory consistency
with the text prompt as well as across the temporal sequence, limiting their
real-world applications. In this paper, we propose to address the above issues
with a novel EI model towards \textbf{E}nhancing v\textbf{I}deo \textbf{E}diting
cons\textbf{I}stency of TTI-based frameworks. Specifically, we find through
analysis that the inconsistency problem is caused by modules newly added into
TTI models for learning temporal information. These modules lead to covariate
shift in the feature space, which harms the editing capability. Thus, we design
EI to tackle the above drawbacks with two classical modules: the
Shift-restricted Temporal Attention Module (STAM) and the Fine-coarse Frame
Attention Module (FFAM). First, through theoretical analysis, we demonstrate
that covariate shift is highly related to Layer Normalization, so STAM employs
an \textit{Instance Centering} layer in its place to preserve the distribution
of temporal features. In addition, STAM employs an attention layer with
normalized mapping to transform temporal features while constraining the
variance shift. As the second part, we incorporate STAM with a novel FFAM,
which efficiently leverages fine-coarse spatial information of overall frames
to further enhance temporal consistency. Extensive experiments demonstrate the
superiority of the proposed EI model for text-driven video editing.
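A minimal sketch of the contrast the abstract draws: Layer Normalization both centres and rescales each feature vector, whereas an instance-centering layer of the kind STAM is described as using only shifts it to zero mean, preserving the per-instance scale. The feature values below are illustrative assumptions, not the model's actual activations.

```python
def layer_norm(x, eps=1e-5):
    """Standard Layer Normalization: centre and rescale to unit variance."""
    mean = sum(x) / len(x)
    var = sum((v - mean) ** 2 for v in x) / len(x)
    return [(v - mean) / (var + eps) ** 0.5 for v in x]

def instance_centering(x):
    """Centre only; the per-instance spread (variance) is preserved."""
    mean = sum(x) / len(x)
    return [v - mean for v in x]

features = [1.0, 3.0, 5.0, 7.0]
normed = layer_norm(features)            # zero mean, unit variance
centred = instance_centering(features)   # zero mean, original spread kept
```

Keeping the variance untouched is the point: per the paper's analysis, the rescaling step of Layer Normalization contributes to the covariate shift that harms temporal consistency.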
DiffBFR: Bootstrapping Diffusion Model Towards Blind Face Restoration
Blind face restoration (BFR) is important yet challenging. Prior works prefer
GAN-based frameworks for this task due to their balance of quality and
efficiency. However, these methods suffer from poor stability and poor
adaptability to long-tailed distributions, failing to simultaneously retain
source identity and restore detail. We propose DiffBFR, which introduces the
Diffusion Probabilistic Model (DPM) to BFR to tackle the above problems, given
its superiority over GANs in avoiding training collapse and modeling
long-tailed distributions. DiffBFR uses a two-step design that first restores
identity information from low-quality (LQ) images and then enhances texture
details according to the distribution of real faces. This design is implemented
with two key components: 1) an Identity Restoration Module (IRM) for preserving
face details in the results. Instead of denoising from a pure Gaussian random
distribution with LQ images as the condition during the reverse process, we
propose a novel truncated sampling method that starts from LQ images with
partial noise added. We theoretically prove that this change shrinks the
evidence lower bound of the DPM and thus restores more original details.
Alongside this proof, two cascaded conditional DPMs with different input sizes
are introduced to strengthen this sampling effect and reduce the difficulty of
directly generating high-resolution images. 2) a Texture Enhancement Module
(TEM) for polishing the texture of the image. Here an unconditional DPM, an
LQ-free model, is introduced to further push the restorations toward realism.
We theoretically prove that this unconditional DPM, trained on pure HQ images,
helps enforce the correct distribution of the inference images output from the
IRM in pixel-level space. Truncated sampling with fractional time steps is used
to polish pixel-level textures while preserving identity information.
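A hedged sketch of the truncated-sampling idea: rather than starting the reverse diffusion from pure Gaussian noise at t = T, sampling starts at an intermediate step tau from the low-quality image with only partial noise added, x_tau = sqrt(abar_tau) * x_lq + sqrt(1 - abar_tau) * eps. The linear beta schedule and toy pixel values are illustrative assumptions, not DiffBFR's actual settings.

```python
import math
import random

def alpha_bar(t, T, beta_start=1e-4, beta_end=0.02):
    """Cumulative product of (1 - beta_s) over a linear beta schedule."""
    prod = 1.0
    for s in range(1, t + 1):
        beta = beta_start + (beta_end - beta_start) * (s - 1) / (T - 1)
        prod *= 1.0 - beta
    return prod

def truncated_start(x_lq, tau, T, rng):
    """Noise the LQ image only up to step tau < T instead of to pure noise,
    so identity information from x_lq survives into the reverse process."""
    abar = alpha_bar(tau, T)
    return [math.sqrt(abar) * px + math.sqrt(1.0 - abar) * rng.gauss(0, 1)
            for px in x_lq]

rng = random.Random(0)
x_lq = [0.2, 0.5, 0.8]   # toy "pixels" of a low-quality face image
x_tau = truncated_start(x_lq, tau=100, T=1000, rng=rng)
```

At small tau, abar is close to 1, so x_tau is dominated by the LQ image; the reverse process then only has to denoise the remaining fraction, which is what preserves identity while restoring detail.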
BlazeBVD: Make Scale-Time Equalization Great Again for Blind Video Deflickering
Developing blind video deflickering (BVD) algorithms to enhance video
temporal consistency is gaining importance amid the flourishing of image
processing and video generation. However, the intricate nature of video data
complicates the training of deep learning methods, leading to high resource
consumption and instability, notably under severe lighting flicker. This
underscores the critical need for a compact representation beyond pixel values
to advance BVD research and applications. Inspired by the classic scale-time
equalization (STE), our work introduces a histogram-assisted solution, called
BlazeBVD, for high-fidelity and rapid BVD. Compared with STE, which directly
corrects pixel values by temporally smoothing color histograms, BlazeBVD
leverages smoothed illumination histograms within STE filtering to ease the
challenge of learning temporal data with neural networks. Technically,
BlazeBVD begins by condensing pixel values into illumination histograms that
precisely capture flickering and local exposure variations. These histograms
are then smoothed to produce a set of singular frames, filtered illumination
maps, and exposure maps. Resorting to these deflickering priors, BlazeBVD uses
a 2D network to restore faithful and consistent textures impacted by lighting
changes or localized exposure issues. BlazeBVD also incorporates a lightweight
3D network to amend slight temporal inconsistencies, avoiding the resource
consumption issue. Comprehensive experiments on synthetic, real-world, and
generated videos showcase the superior qualitative and quantitative results of
BlazeBVD, achieving inference speeds up to 10x faster than state-of-the-art
methods.
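A minimal sketch of the histogram side of the STE idea BlazeBVD builds on: per-frame illumination histograms are computed, then smoothed along time with a moving average so that flicker (a sudden shift of one frame's histogram) is evened out. The bin count, window size, and frame values are illustrative assumptions.

```python
def illumination_histogram(frame, bins=4):
    """Histogram of illumination values in [0, 1) for one frame."""
    hist = [0] * bins
    for v in frame:
        hist[min(int(v * bins), bins - 1)] += 1
    return hist

def temporal_smooth(hists, window=1):
    """Average each frame's histogram with its temporal neighbours."""
    smoothed = []
    for t in range(len(hists)):
        lo, hi = max(0, t - window), min(len(hists), t + window + 1)
        smoothed.append([sum(h[b] for h in hists[lo:hi]) / (hi - lo)
                         for b in range(len(hists[t]))])
    return smoothed

# Toy clip: the middle frame flickers bright relative to its neighbours.
frames = [[0.1, 0.2, 0.8], [0.7, 0.8, 0.9], [0.1, 0.2, 0.8]]
hists = [illumination_histogram(f) for f in frames]
smoothed = temporal_smooth(hists)
```

The smoothed histograms act as the compact, pixel-free representation the abstract argues for: correcting each frame toward its smoothed histogram removes flicker without a network ever having to model raw video pixels over time.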
Quarterly GDP forecast based on coupled economic and energy feature WA-LSTM model
Existing macroeconomic forecasting methods primarily focus on the characteristics of economic data but overlook the energy-related features concealed behind these economic characteristics, which may lead to inaccurate GDP predictions. Therefore, this paper meticulously analyzes the relationship between energy big data and economic data indicators, explores the coupled feature mining of energy big data and economic data, and constructs features coupling economic and energy data. Targeting the nonlinear, coupled variation features in China's quarterly GDP data, we use the long short-term memory (LSTM) neural network model based on deep learning and employ wavelet analysis (WA) to decompose selected macroeconomic variables, constructing a prediction model combining LSTM and WA, which is further compared with multiple benchmark models. The findings show that, for quarterly GDP prediction, the combination of deep learning and wavelet analysis significantly outperforms other methods. When processing structurally complex, nonlinear, multi-variable data, the combined LSTM and WA prediction model demonstrates better generalization capability, with its prediction accuracy generally surpassing the benchmark models.
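A hedged sketch of the wavelet-analysis (WA) preprocessing step: a one-level Haar transform splits a quarterly series into a smooth approximation (low-frequency trend) and detail coefficients (high-frequency fluctuations), each of which could then feed an LSTM. The series values are toy numbers, and the Haar basis with a single decomposition level is an illustrative assumption; the paper's actual wavelet and level count are not specified here.

```python
import math

def haar_decompose(series):
    """One-level Haar DWT: scaled pairwise sums (trend) and differences
    (fluctuations)."""
    assert len(series) % 2 == 0, "need an even-length series"
    s = math.sqrt(2.0)
    approx = [(series[i] + series[i + 1]) / s for i in range(0, len(series), 2)]
    detail = [(series[i] - series[i + 1]) / s for i in range(0, len(series), 2)]
    return approx, detail

def haar_reconstruct(approx, detail):
    """Inverse of the one-level Haar transform."""
    s = math.sqrt(2.0)
    out = []
    for a, d in zip(approx, detail):
        out.extend([(a + d) / s, (a - d) / s])
    return out

gdp = [4.0, 6.0, 5.0, 9.0]   # toy quarterly values
approx, detail = haar_decompose(gdp)
restored = haar_reconstruct(approx, detail)
```

Forecasting the smooth and detail components separately, then recombining them, is the usual way a WA + LSTM hybrid exploits this decomposition.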
Catalytic Isomerization of Olefins and Their Derivatives: A Brief Overview
Carbon–carbon double bond (CCDB) isomerization is a method for synthesizing new organic compounds from olefins and their derivatives; it is based on C=C migration along the carbon chain and cis/trans transformation, and it plays a vital role in fields such as organic synthesis, the synthesis of daily chemicals, crude oil development, and the synthesis of natural products. In this paper, advances in five types of catalytic methods for the CCDB isomerization of olefins and their derivatives since the 1960s are discussed in detail. Based on his recent work, the author mainly introduces the application and development of photocatalysis in the CCDB isomerization of olefins and their derivatives.
Learning Dynamic Tetrahedra for High-Quality Talking Head Synthesis
Recent works in implicit representations, such as Neural Radiance Fields
(NeRF), have advanced the generation of realistic and animatable head avatars
from video sequences. These implicit methods are still confronted by visual
artifacts and jitters, since the lack of explicit geometric constraints poses a
fundamental challenge in accurately modeling complex facial deformations. In
this paper, we introduce Dynamic Tetrahedra (DynTet), a novel hybrid
representation that encodes explicit dynamic meshes by neural networks to
ensure geometric consistency across various motions and viewpoints. DynTet is
parameterized by coordinate-based networks that learn signed distance,
deformation, and material texture, anchoring the training data into a
predefined tetrahedral grid. Leveraging Marching Tetrahedra, DynTet efficiently
decodes textured meshes with a consistent topology, enabling fast rendering
through a differentiable rasterizer and supervision via a pixel loss. To
enhance training efficiency, we incorporate classical 3D Morphable Models to
facilitate geometry learning and define a canonical space to simplify texture
learning. These advantages are readily achievable owing to the effective
geometric representation employed in DynTet. Compared with prior works, DynTet
demonstrates significant improvements in fidelity, lip synchronization, and
real-time performance according to various metrics. Beyond producing stable and
visually appealing synthesis videos, our method also outputs dynamic meshes,
which are promising for enabling many emerging applications.
Comment: CVPR 202
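A minimal sketch of the core operation inside Marching Tetrahedra, which DynTet uses to decode meshes from a learned signed distance field (SDF): wherever an edge of a tetrahedron connects vertices with opposite SDF signs, the surface crosses that edge, and the crossing point is found by linear interpolation. The vertex positions and SDF values below are illustrative assumptions, not DynTet's learned values.

```python
def surface_crossings(vertices, sdf, edges):
    """Return interpolated surface points on edges whose endpoint SDF values
    have opposite signs (i.e., edges the zero level set crosses)."""
    points = []
    for i, j in edges:
        si, sj = sdf[i], sdf[j]
        if si * sj < 0:                # opposite signs: surface crosses edge
            t = si / (si - sj)         # fraction along edge from i toward j
            points.append(tuple(
                vertices[i][k] + t * (vertices[j][k] - vertices[i][k])
                for k in range(3)))
    return points

# One tetrahedron: vertex 0 is inside the surface (negative SDF), rest outside.
verts = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 0.0, 1.0)]
sdf = [-0.5, 0.5, 0.5, 0.5]
edges = [(0, 1), (0, 2), (0, 3), (1, 2), (1, 3), (2, 3)]
pts = surface_crossings(verts, sdf, edges)
```

Because the sign pattern within each tetrahedron determines the triangle connectivity deterministically, meshes extracted this way keep a consistent topology across frames, which is what enables the differentiable rasterization and pixel-loss supervision described above.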