14 research outputs found
Robust Sensor Fusion for Indoor Wireless Localization
Location awareness in indoor environments, provided by Indoor Positioning
Systems (IPS), has become very useful and popular in recent years. Indoor
wireless localization, however, suffers from severe multi-path fading and
non-line-of-sight conditions. This paper presents a novel indoor localization
framework based on sensor fusion in Zigbee Wireless Sensor Networks (WSN) using
Received Signal Strength (RSS). The target at the unknown position is equipped
with two or more mobile nodes, and the range between the two mobile nodes is
fixed a priori. The attitude (roll, pitch, and yaw) of each mobile node is
measured by inertial sensors (ISs). The angle and the range between any two
nodes can then be obtained, so the path between the two nodes can be modeled as
a curve. Through efficient cooperation between two or more mobile nodes, this
framework effectively exploits RSS techniques, and the fixed-range constraint
helps improve the positioning accuracy. Theoretical analysis of localization
distortion and Monte Carlo simulations show that the proposed cooperative
strategy of multiple nodes with an extended Kalman filter (EKF) achieves
significantly higher positioning accuracy than existing systems, especially in
heavily obstructed scenarios.
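
As a rough illustration of the fusion step, the sketch below implements a
generic EKF measurement update for RSS-based 2-D positioning under a
log-distance path-loss model. The anchor layout, the path-loss constants P0 and
ETA, and all function names are illustrative assumptions, not the paper's
cooperative multi-node design.

```python
import numpy as np

P0, ETA = -40.0, 2.5  # assumed RSS (dBm) at 1 m and path-loss exponent

def rss_model(pos, anchor):
    """Predicted RSS (dBm) under the log-distance path-loss model."""
    d = np.linalg.norm(pos - anchor)
    return P0 - 10.0 * ETA * np.log10(max(d, 1e-6))

def rss_jacobian(pos, anchor):
    """Row of the measurement Jacobian: d(RSS)/d(pos)."""
    diff = pos - anchor
    d2 = max(float(diff @ diff), 1e-12)
    return (-10.0 * ETA / np.log(10.0)) * diff / d2

def ekf_update(x, P, z, anchors, R):
    """One EKF measurement update from RSS readings of all anchors."""
    h = np.array([rss_model(x, a) for a in anchors])
    H = np.vstack([rss_jacobian(x, a) for a in anchors])
    S = H @ P @ H.T + R                  # innovation covariance
    K = P @ H.T @ np.linalg.inv(S)       # Kalman gain
    x = x + K @ (z - h)
    P = (np.eye(len(x)) - K @ H) @ P
    return x, P
```
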
Quaternion MLP Neural Networks Based on the Maximum Correntropy Criterion
We propose a gradient ascent algorithm for quaternion multilayer perceptron
(MLP) networks based on the cost function of the maximum correntropy criterion
(MCC). The algorithm uses a split quaternion activation function together with
the generalized Hamilton-real (GHR) quaternion gradient. By introducing a new
quaternion operator, we first rewrite the earlier quaternion single-layer
perceptron algorithm. Second, we propose a gradient descent algorithm for the
quaternion multilayer perceptron based on the mean square error (MSE) cost
function. Finally, the MSE algorithm is extended to the MCC algorithm.
Simulations show the feasibility of the proposed method.
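
For intuition, the sketch below shows the MCC cost on real-valued errors and a
single gradient-ascent step for a linear unit. The kernel width sigma, the
learning rate, and the plain real-valued gradient are illustrative assumptions;
the paper's GHR quaternion calculus is not reproduced here.

```python
import numpy as np

def correntropy(err, sigma=1.0):
    """Empirical correntropy: mean Gaussian kernel of the errors."""
    return np.mean(np.exp(-err ** 2 / (2.0 * sigma ** 2)))

def correntropy_grad(err, sigma=1.0):
    """Gradient of the correntropy with respect to each error term."""
    k = np.exp(-err ** 2 / (2.0 * sigma ** 2))
    return -k * err / (sigma ** 2 * len(err))

def ascent_step(w, X, d, lr=0.1, sigma=1.0):
    """One gradient-ascent step on a linear unit y = X @ w.

    err = d - X @ w, so d(err)/d(w) = -X and the chain rule gives
    dV/dw = -X.T @ dV/d(err); ascent moves w along +dV/dw.
    """
    err = d - X @ w
    return w + lr * (-X.T @ correntropy_grad(err, sigma))
```
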
Variational Bayesian Approximations Kalman Filter Based on Threshold Judgment
The estimation of non-Gaussian measurement noise models is a significant
challenge across various fields, and in practical applications it is often
hampered by the large number of parameters and the high computational
complexity involved. This paper proposes a threshold-based Kalman filtering
approach for online estimation of noise parameters in non-Gaussian measurement
noise models. The method uses a certain amount of sample data to infer a
variance threshold for the observation parameters and employs variational
Bayesian estimation to obtain the corresponding noise variance estimates,
enabling subsequent iterations of the Kalman filtering algorithm. Finally, we
evaluate the performance of this algorithm through simulation experiments,
demonstrating its accurate and effective estimation of state and noise
parameters.
Comment: 5 pages, conference
DiffusionAD: Norm-guided One-step Denoising Diffusion for Anomaly Detection
Anomaly detection has garnered extensive applications in real industrial
manufacturing due to its remarkable effectiveness and efficiency. However,
previous generative models have been limited by suboptimal reconstruction
quality, hampering their overall performance. Our fundamental enhancement is to
reformulate the reconstruction process with a diffusion model into a
noise-to-norm paradigm: anomalous regions are perturbed with Gaussian noise and
reconstructed as normal, overcoming the limitations of previous
models by facilitating anomaly-free restoration. Additionally, we propose a
rapid one-step denoising paradigm, significantly faster than the traditional
iterative denoising in diffusion models. Furthermore, the introduction of the
norm-guided paradigm elevates the accuracy and fidelity of reconstructions. The
segmentation sub-network predicts pixel-level anomaly scores using the input
image and its anomaly-free restoration. Comprehensive evaluations on four
standard and challenging benchmarks reveal that DiffusionAD outperforms current
state-of-the-art approaches, demonstrating the effectiveness and broad
applicability of the proposed pipeline.
Comment: 14 pages, 12 figures
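
To make the noise-to-norm idea concrete, the sketch below perturbs an input
with Gaussian noise at timestep t and recovers an anomaly-free estimate in a
single step from the predicted noise, scoring pixels by the reconstruction gap.
The `denoiser` network, the DDPM schedule `alphas_cumprod`, and the simple
absolute-difference score (the paper instead uses a segmentation sub-network)
are illustrative assumptions.

```python
import torch

def one_step_noise_to_norm(x, denoiser, t, alphas_cumprod):
    """x: (B, C, H, W) images; returns reconstruction and pixel scores."""
    a_bar = alphas_cumprod[t]
    eps = torch.randn_like(x)
    # Forward perturbation: x_t = sqrt(a_bar) * x0 + sqrt(1 - a_bar) * eps.
    x_t = a_bar.sqrt() * x + (1.0 - a_bar).sqrt() * eps
    eps_hat = denoiser(x_t, t)               # predicted noise
    # One-step estimate of the anomaly-free image x0.
    x0_hat = (x_t - (1.0 - a_bar).sqrt() * eps_hat) / a_bar.sqrt()
    score = (x - x0_hat).abs().mean(dim=1, keepdim=True)  # anomaly map
    return x0_hat, score
```
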
Prototypical Residual Networks for Anomaly Detection and Localization
Anomaly detection and localization are widely used in industrial manufacturing
for their efficiency and effectiveness. Anomalies are rare and hard to collect,
and with only a handful of abnormal samples, supervised models easily overfit
to the seen anomalies, producing unsatisfactory performance. On the other hand,
anomalies are typically subtle, hard to discern, and varied in appearance,
making it difficult to detect anomalies, let alone locate anomalous regions. To
address these issues, we propose a framework called
Prototypical Residual Network (PRN), which learns feature residuals of varying
scales and sizes between anomalous and normal patterns to accurately
reconstruct the segmentation maps of anomalous regions. PRN mainly consists of
two parts: multi-scale prototypes that explicitly represent the residual
features of anomalies relative to normal patterns, and a multi-size
self-attention mechanism that enables variable-sized anomalous feature
learning. In addition, we present a
variety of anomaly generation strategies that consider both seen and unseen
appearance variance to enlarge and diversify anomalies. Extensive experiments
on the challenging and widely used MVTec AD benchmark show that PRN outperforms
current state-of-the-art unsupervised and supervised methods. We further report
SOTA results on three additional datasets to demonstrate the effectiveness and
generalizability of PRN.
Comment: Accepted by CVPR 2023
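
As a rough sketch of the residual idea at a single scale, the code below
subtracts the nearest learned "normal" prototype from each feature vector so
that anomalous patterns show up as large residuals. The shapes, the
nearest-prototype rule, and all names are illustrative assumptions rather than
PRN's exact design.

```python
import torch

def prototype_residual(feat, prototypes):
    """feat: (B, C, H, W) features; prototypes: (K, C) normal patterns."""
    B, C, H, W = feat.shape
    f = feat.permute(0, 2, 3, 1).reshape(-1, C)   # (B*H*W, C) vectors
    d = torch.cdist(f, prototypes)                # distances to prototypes
    nearest = prototypes[d.argmin(dim=1)]         # closest normal pattern
    res = (f - nearest).reshape(B, H, W, C).permute(0, 3, 1, 2)
    return res                                    # residual feature map
```
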
VIDiff: Translating Videos via Multi-Modal Instructions with Diffusion Models
Diffusion models have achieved significant success in image and video
generation. This motivates a growing interest in video editing tasks, where
videos are edited according to provided text descriptions. However, most
existing approaches only focus on video editing for short clips and rely on
time-consuming tuning or inference. We are the first to propose Video
Instruction Diffusion (VIDiff), a unified foundation model designed for a wide
range of video tasks. These tasks encompass both understanding tasks (such as
language-guided video object segmentation) and generative tasks (video editing
and enhancement). Our model can edit a video and translate it into the desired
result within seconds, based on user instructions. Moreover, we design an
iterative
auto-regressive method to ensure consistency in editing and enhancing long
videos. We provide convincing generative results for diverse input videos and
written instructions, both qualitatively and quantitatively. More examples can
be found at our website https://ChenHsing.github.io/VIDiff
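
A minimal sketch of the iterative auto-regressive idea for long videos: edit
the video clip by clip, conditioning each clip on the tail of the previous
edited clip so that long-range consistency is preserved. The `edit_clip`
callable, the clip length, and the overlap are illustrative assumptions.

```python
def edit_long_video(frames, instruction, edit_clip, clip_len=16, overlap=2):
    """Edit a long frame list clip by clip, auto-regressively."""
    edited, prev_tail = [], None
    for start in range(0, len(frames), clip_len - overlap):
        clip = frames[start:start + clip_len]
        if not clip:
            break
        out = edit_clip(clip, instruction, condition=prev_tail)
        # Drop frames re-edited in the overlap with the previous clip.
        edited.extend(out if prev_tail is None else out[overlap:])
        prev_tail = out[-overlap:]
    return edited
```
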
AdaDiff: Adaptive Step Selection for Fast Diffusion
Diffusion models, as a type of generative model, have achieved impressive
results in generating images and videos conditioned on text.
However, the generation process of diffusion models involves denoising for
dozens of steps to produce photorealistic images/videos, which is
computationally expensive. Unlike previous methods that design
"one-size-fits-all" approaches to speed this up, we argue that the number of
denoising steps should be sample-specific, conditioned on the richness of the
input text. To this end, we
introduce AdaDiff, a lightweight framework designed to learn instance-specific
step usage policies, which are then used by the diffusion model for generation.
AdaDiff is optimized using a policy gradient method to maximize a carefully
designed reward function, balancing inference time and generation quality. We
conduct experiments on three image generation and two video generation
benchmarks and demonstrate that our approach achieves visual quality similar to
a baseline that uses a fixed 50 denoising steps, while reducing inference time
by at least 33% and by as much as 40%.
Furthermore, our qualitative analysis shows that our method allocates more
steps to more informative text conditions and fewer steps to simpler text
conditions.
Comment: 10 pages, 5 figures
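
To illustrate the step-selection idea, the sketch below trains a tiny
categorical policy over candidate step counts with REINFORCE, using a reward
that trades generation quality against the number of steps. The candidate set,
the reward weight `lam`, and the external `quality` scorer are illustrative
assumptions, not AdaDiff's actual reward design.

```python
import torch
import torch.nn as nn

CANDIDATE_STEPS = [10, 20, 30, 40, 50]   # assumed menu of step budgets

class StepPolicy(nn.Module):
    """Scores candidate denoising-step counts from a text embedding."""
    def __init__(self, text_dim):
        super().__init__()
        self.head = nn.Linear(text_dim, len(CANDIDATE_STEPS))

    def forward(self, text_emb):
        return torch.distributions.Categorical(logits=self.head(text_emb))

def reinforce_loss(policy, text_emb, quality, lam=0.01):
    """quality: callable mapping a step count to a scalar quality score."""
    dist = policy(text_emb)
    action = dist.sample()
    steps = CANDIDATE_STEPS[action]
    reward = quality(steps) - lam * steps    # quality minus time penalty
    return -dist.log_prob(action) * reward   # minimizing this ascends reward
```
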
Fuse Your Latents: Video Editing with Multi-source Latent Diffusion Models
Latent Diffusion Models (LDMs) are renowned for their powerful capabilities
in image and video synthesis. Yet, video editing methods suffer from
insufficient pre-training data or high video-by-video re-training costs. To
address this gap, we propose FLDM (Fused Latent Diffusion Model), a
training-free framework that achieves text-guided video editing by applying
off-the-shelf image editing methods in video LDMs. Specifically, FLDM fuses
latents from an image LDM and a video LDM during the denoising process. In this
way, the temporal consistency of the video LDM is preserved while the high
fidelity of the image LDM is also exploited. Meanwhile, FLDM is highly
flexible, since both the image LDM and the video LDM can be replaced, so
advanced image editing methods such as InstructPix2Pix and ControlNet can be
exploited.
To the best of our knowledge, FLDM is the first method to adapt off-the-shelf
image editing methods into video LDMs for video editing. Extensive quantitative
and qualitative experiments demonstrate that FLDM can improve the textual
alignment and temporal consistency of edited videos.
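
A minimal sketch of the fusion step, assuming two pre-built denoising
callables: at every timestep, the latents proposed by the image LDM and the
video LDM are mixed with a weight `alpha`, trading per-frame fidelity against
temporal consistency. The callables, the fixed mixing weight, and the loop
structure are illustrative assumptions.

```python
def fused_denoising(z, timesteps, image_step, video_step, alpha=0.5):
    """z: initial noisy video latent of shape (frames, C, H, W)."""
    for t in timesteps:                # e.g. a reversed DDIM schedule
        z_img = image_step(z, t)       # frame-wise image-LDM update
        z_vid = video_step(z, t)       # temporally-aware video-LDM update
        z = alpha * z_img + (1.0 - alpha) * z_vid   # fuse the latents
    return z
```
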