44 research outputs found
Learning Gradient Fields for Scalable and Generalizable Irregular Packing
The packing problem, also known as cutting or nesting, has diverse
applications in logistics, manufacturing, layout design, and atlas generation.
It involves arranging irregularly shaped pieces to minimize waste while
avoiding overlap. Recent advances in machine learning, particularly
reinforcement learning, have shown promise in addressing the packing problem.
In this work, we delve deeper into a novel machine learning-based approach that
formulates the packing problem as conditional generative modeling. To tackle
the challenges of irregular packing, including object validity constraints and
collision avoidance, our method employs the score-based diffusion model to
learn a series of gradient fields. These gradient fields encode the
correlations between constraint satisfaction and the spatial relationships of
polygons, learned from teacher examples. During the testing phase, packing
solutions are generated using a coarse-to-fine refinement mechanism guided by
the learned gradient fields. To enhance packing feasibility and optimality, we
introduce two key architectural designs: multi-scale feature extraction and
coarse-to-fine relation extraction. We conduct experiments on two typical
industrial packing domains, considering translations only. Empirically, our
approach demonstrates spatial utilization rates comparable to, or even
surpassing, those achieved by the teacher algorithm responsible for training
data generation. Additionally, it exhibits some level of generalization to
shape variations. We are hopeful that this method could pave the way for new
possibilities in solving the packing problem.
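The coarse-to-fine refinement described in this abstract can be illustrated with a minimal sketch. The abstract does not publish the learned model, so everything here is an assumption: discs stand in for irregular polygons, `toy_gradient_field` is a hand-written substitute for the learned gradient fields, and the linear noise schedule and all parameter values are illustrative only.

```python
import math
import random


def overlap_penalty(pos, radii):
    """Sum of squared pairwise penetration depths (0.0 when nothing overlaps)."""
    total = 0.0
    for i in range(len(pos)):
        for j in range(i + 1, len(pos)):
            d = math.hypot(pos[i][0] - pos[j][0], pos[i][1] - pos[j][1])
            total += max(0.0, radii[i] + radii[j] - d) ** 2
    return total


def toy_gradient_field(pos, radii, pull=0.01):
    """Hand-written stand-in for a learned gradient field (an assumption).

    Returns the negative gradient of overlap_penalty + pull * sum(|p|^2):
    overlapping discs are pushed apart and every disc is pulled gently
    toward the origin to keep the layout compact.
    """
    field = [[-2.0 * pull * x, -2.0 * pull * y] for x, y in pos]
    for i in range(len(pos)):
        for j in range(i + 1, len(pos)):
            dx = pos[i][0] - pos[j][0]
            dy = pos[i][1] - pos[j][1]
            d = math.hypot(dx, dy)
            depth = radii[i] + radii[j] - d
            if depth <= 0.0:
                continue  # this pair does not overlap, so no repulsion
            ux, uy = (dx / d, dy / d) if d > 1e-9 else (1.0, 0.0)
            field[i][0] += 2.0 * depth * ux
            field[i][1] += 2.0 * depth * uy
            field[j][0] -= 2.0 * depth * ux
            field[j][1] -= 2.0 * depth * uy
    return field


def refine(pos, radii, steps=300, step_size=0.05, sigma0=0.1, seed=0):
    """Follow the field while the injected noise shrinks linearly toward zero."""
    rng = random.Random(seed)
    pos = [tuple(p) for p in pos]
    for t in range(steps):
        sigma = sigma0 * (1.0 - t / steps)  # coarse (noisy) to fine (quiet)
        field = toy_gradient_field(pos, radii)
        pos = [
            (x + step_size * gx + sigma * rng.gauss(0.0, 1.0),
             y + step_size * gy + sigma * rng.gauss(0.0, 1.0))
            for (x, y), (gx, gy) in zip(pos, field)
        ]
    return pos


def demo(seed=0):
    """Return the overlap penalty before and after refinement."""
    rng = random.Random(seed)
    radii = [1.0] * 5
    start = [(rng.uniform(-0.1, 0.1), rng.uniform(-0.1, 0.1)) for _ in radii]
    end = refine(start, radii, seed=seed)
    return overlap_penalty(start, radii), overlap_penalty(end, radii)
```

Calling `demo()` starts five unit discs almost on top of each other and returns the overlap penalty before and after refinement. In the method described above, a trained network would replace `toy_gradient_field`.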
The Surprising Effectiveness of Skip-Tuning in Diffusion Sampling
With the incorporation of the UNet architecture, diffusion probabilistic
models have become a dominant force in image generation tasks. One key design
in UNet is the skip connections between the encoder and decoder blocks.
Although skip connections have been shown to improve training stability and
model performance, we reveal that such shortcuts can be a limiting factor for
the complexity of the transformation. As the sampling steps decrease, the
generation process and the role of the UNet get closer to the push-forward
transformations from Gaussian distribution to the target, posing a challenge
for the network's complexity. To address this challenge, we propose
Skip-Tuning, a simple yet surprisingly effective training-free tuning method on
the skip connections. Our method can achieve 100% FID improvement for
pretrained EDM on ImageNet 64 with only 19 NFEs (1.75), breaking the limit of
ODE samplers regardless of sampling steps. Surprisingly, the improvement
persists when we increase the number of sampling steps and can even surpass the
best result from EDM-2 (1.58) with only 39 NFEs (1.57). Comprehensive
exploratory experiments are conducted to shed light on the surprising
effectiveness. We observe that while Skip-Tuning increases the score-matching
losses in the pixel space, the losses in the feature space are reduced,
particularly at intermediate noise levels, which coincide with the most
effective range accounting for image quality improvement.
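The abstract does not state what form the tuning of the skip connections takes. As a minimal sketch under the assumption that it amounts to rescaling the skip features by a scalar before they are concatenated with the decoder features, the idea looks like this. The function name and the parameter `rho` are hypothetical, and plain lists stand in for feature tensors.

```python
def decoder_input(decoder_feat, skip_feat, rho=1.0):
    """Concatenate decoder features with skip features scaled by rho.

    rho = 1.0 reproduces an ordinary UNet skip connection; rho < 1.0 damps
    the shortcut. The scalar form and the name `rho` are assumptions, not
    details taken from the abstract.
    """
    return list(decoder_feat) + [rho * s for s in skip_feat]


# Baseline skip connection versus a damped one, with no retraining involved.
baseline = decoder_input([1.0], [2.0, 4.0])
damped = decoder_input([1.0], [2.0, 4.0], rho=0.5)
```

Because only a multiplier on existing activations changes, a modification of this kind can be applied to a pretrained network at sampling time, which matches the training-free setting the abstract describes.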
Evidential Detection and Tracking Collaboration: New Problem, Benchmark and Algorithm for Robust Anti-UAV System
Unmanned Aerial Vehicles (UAVs) have been widely used in many areas,
including transportation, surveillance, and military. However, their potential
for safety and privacy violations is an increasing issue and highly limits
their broader applications, underscoring the critical importance of UAV
perception and defense (anti-UAV). Still, previous works have simplified such
an anti-UAV task as a tracking problem, where the prior information of UAVs is
always provided; such a scheme fails in real-world anti-UAV tasks (i.e., complex
scenes, UAVs that appear and reappear at indeterminate times, and real-time UAV
surveillance). In this paper, we first formulate a new and practical anti-UAV
problem featuring UAV perception in complex scenes without prior UAV
information. To benchmark such a challenging task, we propose the largest UAV
dataset dubbed AntiUAV600 and a new evaluation metric. The AntiUAV600 comprises
600 video sequences of challenging scenes with random, fast, and small-scale
UAVs, with over 723K thermal infrared frames densely annotated with bounding
boxes. Finally, we develop a novel anti-UAV approach via an evidential
collaboration of global UAV detection and local UAV tracking, which
effectively tackles the proposed problem and can serve as a strong baseline for
future research. Extensive experiments show our method outperforms SOTA
approaches and validate the ability of AntiUAV600 to enhance UAV perception
performance due to its large scale and complexity. Our dataset, pretrained
models, and source code will be released publicly.
The Ninth Visual Object Tracking VOT2021 Challenge Results
Visual Object Tracking on Multi-modal RGB-D Videos: A Review
Visual object tracking has been under development for decades. In recent
years, with the wide availability of low-cost RGB-D sensors, the task of
visual object tracking on RGB-D videos has drawn much attention. Compared to
conventional RGB-only tracking, RGB-D videos can provide more information
that facilitates object tracking in some complicated scenarios. The goal of
this review is to summarize the relevant knowledge in the research field of
RGB-D tracking. Specifically, we survey the related RGB-D tracking
benchmark datasets as well as the corresponding performance measures.
In addition, the existing RGB-D tracking methods are summarized.
Finally, we discuss possible future directions in the field of RGB-D
tracking.
Research on deformation of submarine slope in Zhoushan Islands by in-situ observation
Instability of submarine slopes in the Zhoushan Islands is widespread. Frequent submarine landslides pose a great threat to offshore facilities such as submarine optical cables, reclamation projects, ports and docks. In this paper, a self-developed in-situ observation system is used to observe the deformation of submarine slopes on the southwest side of Zhujiajian Island in the Zhoushan Islands for 75 days. The results show that sediments at different depths of the submarine slope deform differently: the lateral deformation of the bottom sediments is about 0.75 mm, three times that of the overlying sediments. The deformation process presents a step-like change, and the deformation direction is consistent with the trend of the submarine slope.
Generative-Based Fusion Mechanism for Multi-Modal Tracking
Generative models (GMs) have received increasing research interest for their remarkable capacity to achieve comprehensive understanding. However, their potential application in the domain of multi-modal tracking has remained unexplored. In this context, we seek to uncover the potential of harnessing generative techniques to address the critical challenge, information fusion, in multi-modal tracking. In this paper, we delve into two prominent GM techniques, namely, Conditional Generative Adversarial Networks (CGANs) and Diffusion Models (DMs). Different from the standard fusion process where the features from each modality are directly fed into the fusion block, we combine these multi-modal features with random noise in the GM framework, effectively transforming the original training samples into harder instances. This design excels at extracting discriminative clues from the features, enhancing the ultimate tracking performance. Based on this, we conduct extensive experiments across two multi-modal tracking tasks, three baseline methods, and four challenging benchmarks. The experimental results demonstrate that the proposed generative-based fusion mechanism achieves state-of-the-art performance by setting new records on GTOT, LasHeR and RGBD1K. Code will be available at https://github.com/Zhangyong-Tang/GMMT
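The step of combining multi-modal features with random noise can be sketched as follows. The abstract does not give the exact formulation, so this sketch assumes the standard forward noising step of a diffusion model applied to the concatenated features. The function name, the parameter `alpha_bar`, and the use of plain lists for features are all hypothetical.

```python
import math
import random


def noisy_fusion_input(rgb_feat, aux_feat, alpha_bar, rng=None):
    """Concatenate two modality features and perturb them with Gaussian noise.

    Uses the usual diffusion forward step
        x_t = sqrt(alpha_bar) * x_0 + sqrt(1 - alpha_bar) * eps,
    which is an assumption about the framework, not a detail from the abstract.
    alpha_bar = 1.0 leaves the features clean; smaller values make the
    training sample harder.
    """
    if not 0.0 <= alpha_bar <= 1.0:
        raise ValueError("alpha_bar must lie in [0, 1]")
    rng = rng or random.Random(0)
    fused = list(rgb_feat) + list(aux_feat)
    keep = math.sqrt(alpha_bar)
    noise = math.sqrt(1.0 - alpha_bar)
    return [keep * f + noise * rng.gauss(0.0, 1.0) for f in fused]


# Clean concatenation versus a noisier, harder training instance.
clean = noisy_fusion_input([0.2, 0.4], [0.6], alpha_bar=1.0)
hard = noisy_fusion_input([0.2, 0.4], [0.6], alpha_bar=0.5)
```

In the described mechanism, a fusion block would then be trained on inputs like `hard` rather than on the clean concatenation.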
Research on Submarine landslide monitoring and early warning system
Monitoring and early warning of submarine landslides can provide timely predictions of landslides and thereby help avoid destructive damage to submarine facilities such as pipelines and optical cables. However, research on submarine landslides has focused on numerical simulation and laboratory tests and lacks the support of in-situ observation data. This paper establishes a submarine landslide monitoring and early warning system by combining real-time monitoring data with a web platform and database technology. Based on computational analysis of key monitoring parameters in the process of seabed deformation and sliding, the system realizes accurate prediction and early warning of submarine landslides. The system has been applied to submarine landslide monitoring in the Zhoushan sea area, Zhejiang Province, China, where it has ensured the safety of offshore platforms and submarine projects. The establishment of this system provides a new idea and method for submarine landslide warning.
Complementary Discriminative Correlation Filters Based on Collaborative Representation for Visual Object Tracking
In recent years, discriminative correlation filter (DCF) based algorithms have significantly advanced the state of the art in visual object tracking. The key to the success of DCF is an efficient discriminative regression model trained with powerful multi-cue features, including both hand-crafted and deep neural network features. However, the tracking performance is hindered by their inability to respond adequately to abrupt target appearance variations. This issue is caused by the limited representation capability of fixed image features. In this work, we set out to rectify this shortcoming by proposing a complementary representation of visual content. Specifically, we propose the use of a collaborative representation between successive frames to extract the dynamic appearance information from a target with rapid appearance changes, which suppresses the undesirable impact of the background. The resulting collaborative representation coefficients are combined with the original feature maps using a spatially regularised DCF framework for performance boosting. The experimental results on several benchmarking datasets demonstrate the effectiveness and robustness of the proposed method, as compared with a number of state-of-the-art tracking algorithms.
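Collaborative representation is commonly computed as a ridge-regularised least-squares coding of one signal over a set of others. The abstract does not give the authors' exact formulation, so the sketch below assumes that common form: the current frame's feature vector `y` is coded over feature vectors (`atoms`) taken from the previous frame by solving (DᵀD + λI)a = Dᵀy. All names and the value of `lam` are hypothetical.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def collaborative_coefficients(atoms, y, lam=0.1):
    """Ridge-regularised coding of y over the given atoms.

    atoms: list of k feature vectors from the previous frame (assumption).
    y:     feature vector from the current frame, same length as each atom.
    Returns the k coefficients a minimising |y - D a|^2 + lam * |a|^2,
    where the atoms form the columns of D. Requires lam > 0.
    """
    k = len(atoms)
    gram = [
        [sum(p * q for p, q in zip(atoms[i], atoms[j])) + (lam if i == j else 0.0)
         for j in range(k)]
        for i in range(k)
    ]
    rhs = [sum(p * q for p, q in zip(atoms[i], y)) for i in range(k)]
    return solve(gram, rhs)


# With orthonormal atoms and lam = 1, each coefficient is half the projection.
coeffs = collaborative_coefficients([[1.0, 0.0], [0.0, 1.0]], [2.0, 4.0], lam=1.0)
```

A large coefficient indicates that an atom explains the current appearance well. The abstract states that such coefficients are combined with the original feature maps inside a spatially regularised DCF, a step this sketch does not cover.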