754 research outputs found
A Novel Improved Bat Algorithm in UAV Path Planning
Path planning algorithms are central to UAV path planning. Many traditional path planning methods still suffer from low convergence rates and insufficient robustness. This paper makes three contributions to address these problems. First, an improved artificial potential field (APF) method is adopted to accelerate the convergence of the bat’s position update. Second, an optimal success rate strategy is proposed to improve the adaptive inertia weight of the bat algorithm. Third, a chaos strategy is proposed to avoid falling into a local optimum. Compared with the standard APF and chaos strategy in UAV path planning scenarios, the improved algorithm, CPFIBA (the improved artificial potential field method combined with the chaotic bat algorithm), significantly increases the success rate of finding a suitable path and decreases the convergence time. Simulation results show that the proposed algorithm is also highly robust on path planning problems. It also overcomes a shortcoming of traditional meta-heuristic algorithms, whose convergence process tends to fall into a local optimum. The simulations further show that the proposed CPFIBA outperforms BA and DEBA on UAV path planning problems.
PixMIM: Rethinking Pixel Reconstruction in Masked Image Modeling
Masked Image Modeling (MIM) has achieved promising progress with the advent
of Masked Autoencoders (MAE) and BEiT. However, subsequent works have
complicated the framework with new auxiliary tasks or extra pre-trained models,
inevitably increasing computational overhead. This paper undertakes a
fundamental analysis of MIM from the perspective of pixel reconstruction, which
examines the input image patches and reconstruction target, and highlights two
critical but previously overlooked bottlenecks. Based on this analysis, we
propose a remarkably simple and effective method, PixMIM, that entails
two strategies: 1) filtering the high-frequency components from the
reconstruction target to de-emphasize the network's focus on texture-rich
details and 2) adopting a conservative data transform strategy to alleviate the
problem of missing foreground in MIM training. PixMIM can be easily
integrated into most existing pixel-based MIM approaches (i.e., using raw images
as reconstruction target) with negligible additional computation. Without bells
and whistles, our method consistently improves three MIM approaches, MAE,
ConvMAE, and LSMAE, across various downstream tasks. We believe this effective
plug-and-play method will serve as a strong baseline for self-supervised
learning and provide insights for future improvements of the MIM framework.
Code and models are available at
https://github.com/open-mmlab/mmselfsup/tree/dev-1.x/configs/selfsup/pixmim
Comment: Update code link and add additional results
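The first strategy in the PixMIM abstract, removing high-frequency components from the reconstruction target, can be illustrated with a minimal sketch. The abstract does not say which filter is used, so the ideal (hard cutoff) low-pass filter, the function names `dft2` and `low_pass_target`, and the `cutoff` parameter are all assumptions made for illustration. A naive DFT is used so the example needs only the standard library; it is meant for tiny inputs, not real images.

```python
import cmath


def dft2(img, inverse=False):
    """Naive 2D DFT of a square list-of-lists (illustrative, O(n^3))."""
    n = len(img)
    sign = 1 if inverse else -1
    # Twiddle factors shared by the row and column passes.
    w = [[cmath.exp(sign * 2j * cmath.pi * u * x / n) for x in range(n)]
         for u in range(n)]
    # Transform every row, then every column.
    rows = [[sum(w[u][x] * row[x] for x in range(n)) for u in range(n)]
            for row in img]
    out = [[sum(w[v][y] * rows[y][u] for y in range(n)) for u in range(n)]
           for v in range(n)]
    if inverse:
        out = [[val / (n * n) for val in r] for r in out]
    return out


def low_pass_target(img, cutoff):
    """Zero every frequency whose radius exceeds `cutoff` (assumed ideal filter)."""
    n = len(img)
    spec = dft2(img)
    for v in range(n):
        for u in range(n):
            # Fold indices so that frequency 0 sits at index 0.
            fu, fv = min(u, n - u), min(v, n - v)
            if (fu * fu + fv * fv) ** 0.5 > cutoff:
                spec[v][u] = 0
    return [[val.real for val in row] for row in dft2(spec, inverse=True)]


# A checkerboard is pure high-frequency texture on top of a constant mean.
img = [[0.5 + 0.5 * (-1) ** (x + y) for x in range(8)] for y in range(8)]
target = low_pass_target(img, cutoff=2)
```

With this input the filtered target keeps only the mean intensity, which is the kind of texture suppression the abstract describes; a cutoff larger than every frequency radius leaves the image unchanged.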
GenRES: Rethinking Evaluation for Generative Relation Extraction in the Era of Large Language Models
The field of relation extraction (RE) is experiencing a notable shift towards
generative relation extraction (GRE), leveraging the capabilities of large
language models (LLMs). However, we discovered that traditional relation
extraction (RE) metrics like precision and recall fall short in evaluating GRE
methods. This shortfall arises because these metrics rely on exact matching
with human-annotated reference relations, while GRE methods often produce
diverse and semantically accurate relations that differ from the references. To
fill this gap, we introduce GenRES for a multi-dimensional assessment in terms
of the topic similarity, uniqueness, granularity, factualness, and completeness
of the GRE results. With GenRES, we empirically identified that (1)
precision/recall fails to justify the performance of GRE methods; (2)
human-annotated referential relations can be incomplete; (3) prompting LLMs
with a fixed set of relations or entities can cause hallucinations. Next, we
conducted a human evaluation of GRE methods that shows GenRES is consistent
with human preferences for RE quality. Last, we made a comprehensive evaluation
of fourteen leading LLMs using GenRES across document, bag, and sentence level
RE datasets, respectively, to set the benchmark for future research in GRE.
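The shortfall of exact-match metrics described in the GenRES abstract can be seen in a small sketch. The function name `exact_match_prf` and the example triples are made up for illustration and are not taken from GenRES or its datasets.

```python
def exact_match_prf(predicted, reference):
    """Precision, recall, and F1 under exact matching of (head, relation, tail) triples."""
    pred, ref = set(predicted), set(reference)
    true_positives = len(pred & ref)
    precision = true_positives / len(pred) if pred else 0.0
    recall = true_positives / len(ref) if ref else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


# Hypothetical annotation and a generative model's free-form output.
reference = [("Marie Curie", "place_of_birth", "Warsaw")]
predicted = [("Marie Curie", "was born in", "Warsaw")]

scores = exact_match_prf(predicted, reference)
```

Both triples state the same fact, yet the differing relation strings give a precision, recall, and F1 of zero, which is the mismatch that motivates the additional evaluation dimensions listed in the abstract.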
Collaborative Propagation on Multiple Instance Graphs for 3D Instance Segmentation with Single-point Supervision
Instance segmentation on 3D point clouds has been attracting increasing
attention due to its wide applications, especially in scene understanding
areas. However, most existing methods operate on fully annotated data while
manually preparing ground-truth labels at point-level is very cumbersome and
labor-intensive. To address this issue, we propose a novel weakly supervised
method RWSeg that only requires labeling one object with one point. With these
sparse weak labels, we introduce a unified framework with two branches to
propagate semantic and instance information respectively to unknown regions
using self-attention and a cross-graph random walk method. Specifically, we
propose a Cross-graph Competing Random Walks (CRW) algorithm that encourages
competition among different instance graphs to resolve ambiguities in closely
placed objects, improving instance assignment accuracy. RWSeg generates
high-quality instance-level pseudo labels. Experimental results on ScanNet-v2
and S3DIS datasets show that our approach achieves comparable performance with
fully-supervised methods and outperforms previous weakly-supervised methods by
a substantial margin.
Learning-Based Biharmonic Augmentation for Point Cloud Classification
Point cloud datasets often suffer from inadequate sample sizes in comparison
to image datasets, making data augmentation challenging. While traditional
methods, like rigid transformations and scaling, have limited potential in
increasing dataset diversity due to their constraints on altering individual
sample shapes, we introduce the Biharmonic Augmentation (BA) method. BA is a
novel and efficient data augmentation technique that diversifies point cloud
data by imposing smooth non-rigid deformations on existing 3D structures. This
approach calculates biharmonic coordinates for the deformation function and
learns diverse deformation prototypes. Utilizing a CoefNet, our method predicts
coefficients to amalgamate these prototypes, ensuring comprehensive
deformation. Moreover, we present AdvTune, an advanced online augmentation
system that integrates adversarial training. This system synergistically
refines the CoefNet and the classification network, facilitating the automated
creation of adaptive shape deformations contingent on the learner status.
Comprehensive experimental analysis validates the superiority of Biharmonic
Augmentation, showcasing notable performance improvements over prevailing point
cloud augmentation techniques across varied network designs.
REACTO: Reconstructing Articulated Objects from a Single Video
In this paper, we address the challenge of reconstructing general articulated
3D objects from a single video. Existing works employing dynamic neural
radiance fields have advanced the modeling of articulated objects like humans
and animals from videos, but face challenges with piece-wise rigid general
articulated objects due to limitations in their deformation models. To tackle
this, we propose Quasi-Rigid Blend Skinning, a novel deformation model that
enhances the rigidity of each part while maintaining flexible deformation of
the joints. Our primary insight combines three distinct approaches: 1) an
enhanced bone rigging system for improved component modeling, 2) the use of
quasi-sparse skinning weights to boost part rigidity and reconstruction
fidelity, and 3) the application of geodesic point assignment for precise
motion and seamless deformation. Our method outperforms previous works in
producing higher-fidelity 3D reconstructions of general articulated objects, as
demonstrated on both real and synthetic datasets. Project page:
https://chaoyuesong.github.io/REACTO
HiCRISP: An LLM-based Hierarchical Closed-Loop Robotic Intelligent Self-Correction Planner
The integration of Large Language Models (LLMs) into robotics has
revolutionized human-robot interactions and autonomous task planning. However,
these systems are often unable to self-correct during the task execution, which
hinders their adaptability in dynamic real-world environments. To address this
issue, we present a Hierarchical Closed-loop Robotic Intelligent
Self-correction Planner (HiCRISP), an innovative framework that enables robots
to correct errors within individual steps during the task execution. HiCRISP
actively monitors and adapts the task execution process, addressing both
high-level planning and low-level action errors. Extensive benchmark
experiments, encompassing virtual and real-world scenarios, showcase HiCRISP's
exceptional performance, positioning it as a promising solution for robotic
task planning with LLMs.
Performance Analysis of Uplink/Downlink Decoupled Access in Cellular-V2X Networks
This paper first develops an analytical framework to investigate the
performance of uplink (UL) / downlink (DL) decoupled access in cellular
vehicle-to-everything (C-V2X) networks, in which a vehicle's UL/DL can be
connected to different macro/small base stations (MBSs/SBSs) separately. Using
the stochastic geometry analytical tool, the UL/DL decoupled access C-V2X is
modeled as a Cox process, and we obtain the following theoretical results:
1) the probability of each UL/DL joint association case, i.e., both
the UL and DL are associated with different MBSs or different SBSs, or they are
associated with different types of BSs; 2) the distance distribution of a
vehicle to its serving BSs in each case; 3) the spectral efficiency of UL/DL in
each case; and 4) the UL/DL coverage probability of MBS/SBS. The analyses
reveal the insights and performance gain of UL/DL decoupled access. Through
extensive simulations, the accuracy of the proposed
analytical framework is validated. Both the analytical and simulation results
show that UL/DL decoupled access can improve spectral efficiency. The
theoretical results can be directly used for estimating the statistical
performance of a UL/DL decoupled access C-V2X network.
Comment: 15 pages, 10 figures
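To make the idea of decoupled access concrete, here is a minimal sketch of how a vehicle's UL and DL can end up on different base stations. The abstract does not state the association rule, so the one used here is an assumption borrowed from common practice in the decoupling literature: the DL attaches to the station with the strongest received power, and the UL attaches to the nearest station (lowest path loss). The function name `associate`, the dictionary fields, the transmit powers, and the path-loss exponent `alpha` are all hypothetical.

```python
import math


def associate(vehicle, stations, alpha=4.0):
    """Return (dl_station, ul_station) names under the assumed association rule."""
    def distance(bs):
        return math.dist(vehicle, bs["pos"])

    # DL: strongest received power, modeled as tx_power * distance^(-alpha).
    dl = max(stations, key=lambda bs: bs["tx_power_w"] * distance(bs) ** -alpha)
    # UL: the vehicle's own power is the same toward every station,
    # so the best uplink is simply the nearest station.
    ul = min(stations, key=distance)
    return dl["name"], ul["name"]


# Hypothetical layout: one macro BS and one small BS on a straight road (metres).
stations = [
    {"name": "MBS", "pos": (0.0, 0.0), "tx_power_w": 40.0},
    {"name": "SBS", "pos": (300.0, 0.0), "tx_power_w": 1.0},
]

decoupled = associate((200.0, 0.0), stations)
coupled = associate((50.0, 0.0), stations)
```

At 200 m the macro station still dominates the DL because of its higher transmit power, while the closer small station wins the UL, so the two links decouple. At 50 m both links attach to the macro station. The sketch assumes the vehicle never sits exactly on a station, since a zero distance would make the power model undefined.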
