447 research outputs found
The localization of single pulse in VLBI observation
In our previous work, we proposed a cross-spectrum-based method, derived from
geodetic VLBI postprocessing, to extract single pulse signals from
RFI-contaminated data. This method fully utilizes the fringe phase
information of the cross spectrum and hence maximizes signal power; however,
localization was not discussed in that work. As a continuation of that
work, in this paper we study how to localize single pulses using an
astrometric solving method. Assuming that the burst is a point source, we
derive the burst position by solving a set of linear equations given the
relation between the residual delay and the offset from the a priori position.
We find that the single pulse localization results given by astrometric solving
and by radio imaging are consistent at the 3 sigma level. Therefore we claim
that it is possible to derive the position of a single pulse with reasonable
precision based on only 3 or even 2 baselines with 4 milliseconds of
integration. The combination of cross-spectrum-based detection and the
localization proposed in this work thus provides a thorough solution for
searching for single pulses in VLBI observations. According to our calculation,
our pipeline achieves accuracy comparable to that of a radio imaging pipeline.
Moreover, the computational cost of our pipeline is much smaller, which makes
it more practical for FRB searches in regular VLBI observations. The pipeline
is now publicly available; we name it "VOLKS", an acronym of "VLBI Observation
for frb Localization Keen Searcher".
Comment: 11 pages, 4 figures, 3 tables, accepted for publication in A
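The linear solve mentioned in the abstract above can be illustrated with a small sketch. It is not the VOLKS implementation: it assumes each baseline contributes one equation of the form tau_i = a_i * dx + b_i * dy, where (dx, dy) is the offset from the a priori position and (a_i, b_i) are the delay sensitivities for that baseline. The function name `solve_offset`, the coefficient values, and the units are hypothetical.

```python
# Hypothetical sketch: recover a 2-D position offset from residual delays.
# Assumed model: tau_i = a_i * dx + b_i * dy, one equation per baseline.

def solve_offset(coeffs, delays):
    """Least-squares solution for (dx, dy) via the 2x2 normal equations."""
    saa = sum(a * a for a, _ in coeffs)
    sab = sum(a * b for a, b in coeffs)
    sbb = sum(b * b for _, b in coeffs)
    sat = sum(a * t for (a, _), t in zip(coeffs, delays))
    sbt = sum(b * t for (_, b), t in zip(coeffs, delays))
    det = saa * sbb - sab * sab
    # A (near-)zero determinant means the baselines do not constrain both axes.
    if abs(det) <= 1e-12 * saa * sbb:
        raise ValueError("baselines are degenerate; offset is not constrained")
    dx = (sat * sbb - sbt * sab) / det
    dy = (saa * sbt - sab * sat) / det
    return dx, dy


# Made-up sensitivities for three baselines and delays generated from a
# true offset of (2.0, -1.0) in the same arbitrary units.
coeffs = [(0.8, -0.3), (0.2, 0.9), (-0.5, 0.4)]
delays = [1.9, -0.5, -1.4]

print(solve_offset(coeffs, delays))          # three baselines
print(solve_offset(coeffs[:2], delays[:2]))  # two baselines suffice here
```

With two non-degenerate baselines the system is exactly determined; a third baseline makes it overdetermined, and the normal equations then give the least-squares estimate.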
Data Augmentation in Human-Centric Vision
This survey presents a comprehensive analysis of data augmentation techniques
in human-centric vision tasks, a first of its kind in the field. It delves into
a wide range of research areas including person ReID, human parsing, human pose
estimation, and pedestrian detection, addressing the significant challenges
posed by overfitting and limited training data in these domains. Our work
categorizes data augmentation methods into two main types: data generation and
data perturbation. Data generation covers techniques like graphic engine-based
generation, generative model-based generation, and data recombination, while
data perturbation is divided into image-level and human-level perturbations.
Each method is tailored to the unique requirements of human-centric tasks, with
some applicable across multiple areas. Our contributions include an extensive
literature review, providing deep insights into the influence of these
augmentation techniques in human-centric vision and highlighting the nuances of
each method. We also discuss open issues and future directions, such as the
integration of advanced generative models like Latent Diffusion Models, for
creating more realistic and diverse training data. This survey not only
encapsulates the current state of data augmentation in human-centric vision but
also charts a course for future research, aiming to develop more robust,
accurate, and efficient human-centric vision systems.
True Knowledge Comes from Practice: Aligning LLMs with Embodied Environments via Reinforcement Learning
Despite their impressive performance across numerous tasks, large language
models (LLMs) often fail to solve simple decision-making tasks because the
knowledge in LLMs is misaligned with environments. In contrast,
reinforcement learning (RL) agents learn policies from scratch, which keeps
them aligned with environments but makes it difficult to incorporate prior
knowledge for efficient exploration. To narrow the gap, we propose TWOSOME, a
novel general online framework that deploys LLMs as decision-making agents to
efficiently interact and align with embodied environments via RL without
requiring any prepared datasets or prior knowledge of the environments.
First, we query the LLMs for the joint probability of each valid action to
form behavior policies. Then, to enhance the stability and robustness of the
policies, we propose two normalization methods and summarize four prompt design
principles. Finally, we design a novel parameter-efficient training
architecture where the actor and critic share one frozen LLM equipped with
low-rank adapters (LoRA) updated by PPO. We conduct extensive experiments to
evaluate TWOSOME. i) TWOSOME exhibits significantly better sample efficiency
and performance than the conventional RL method, PPO, and the prompt tuning
method, SayCan, in both a classical decision-making environment, Overcooked,
and a simulated household environment, VirtualHome. ii) Benefiting from LLMs'
open-vocabulary feature, TWOSOME shows superior generalization ability to
unseen tasks. iii) Under our framework, there is no significant loss of the
LLMs' original ability during online PPO finetuning.
Comment: Accepted by ICLR202
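The first step of the TWOSOME abstract above, forming a behavior policy from the joint probability of each valid action, can be sketched as follows. This is an illustration under stated assumptions, not the authors' code: the per-token log-probabilities are made-up numbers rather than real LLM outputs, the function name `action_policy` is hypothetical, and the per-token length normalization shown is only one plausible kind of normalization, since the abstract does not specify the two methods it proposes.

```python
import math

def action_policy(token_logprobs, normalize=True):
    """Map each action's per-token log-probs to a probability over actions."""
    scores = {}
    for action, logps in token_logprobs.items():
        joint = sum(logps)  # log of the joint probability of the action's tokens
        # Assumed normalization: divide by token count to reduce length bias.
        scores[action] = joint / len(logps) if normalize else joint
    # Numerically stable softmax over the valid actions.
    top = max(scores.values())
    weights = {a: math.exp(s - top) for a, s in scores.items()}
    total = sum(weights.values())
    return {a: w / total for a, w in weights.items()}


# Made-up log-probs: a four-token action and a one-token action.
logps = {
    "pick up the tomato": [-0.5, -0.5, -0.5, -0.5],
    "chop": [-1.2],
}
print(action_policy(logps, normalize=False))  # the raw joint favors the short action
print(action_policy(logps, normalize=True))   # the per-token average favors the long one
```

Without normalization, longer action prompts are penalized simply for having more tokens, which is the kind of instability a normalization step is meant to reduce.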
Adapter Learning in Pretrained Feature Extractor for Continual Learning of Diseases
Currently, intelligent diagnosis systems lack the ability to continually
learn to diagnose new diseases once deployed while preserving knowledge of
old diseases. In particular, updating an intelligent
diagnosis system with training data of new diseases would cause catastrophic
forgetting of old disease knowledge. To address the catastrophic forgetting
issue, a novel adapter-based strategy is proposed to help effectively learn a
set of new diseases at each round (or task) of continual learning, without
changing the shared feature extractor. The learnable lightweight task-specific
adapter(s) can be flexibly designed (e.g., two convolutional layers) and then
added to the pretrained and fixed feature extractor. Together with a specially
designed task-specific head which absorbs all previously learned old diseases
as a single 'out-of-distribution' category, task-specific adapter(s) can help
the pretrained feature extractor more effectively extract discriminative
features between diseases. In addition, a simple yet effective fine-tuning is
applied to collaboratively fine-tune multiple task-specific heads such that
outputs from different heads are comparable and consequently the appropriate
classifier head can be more accurately selected during model inference.
Extensive empirical evaluations on three image datasets demonstrate the
superior performance of the proposed method in continual learning of new
diseases. The source code will be released publicly.
Comment: 10 page
Individual and Structural Graph Information Bottlenecks for Out-of-Distribution Generalization
Out-of-distribution (OOD) graph generalization is critical for many
real-world applications. Existing methods neglect to discard spurious or noisy
features of inputs that are irrelevant to the label. In addition, they mainly
conduct instance-level class-invariant graph learning and fail to utilize the
structural class relationships between graph instances. In this work, we
endeavor to address these issues in a unified framework, dubbed Individual and
Structural Graph Information Bottlenecks (IS-GIB). To remove class-spurious
features caused by distribution shifts, we propose Individual Graph Information
Bottleneck (I-GIB) which discards irrelevant information by minimizing the
mutual information between the input graph and its embeddings. To leverage the
structural intra- and inter-domain correlations, we propose Structural Graph
Information Bottleneck (S-GIB). Specifically, for a batch of graphs with
multiple domains, S-GIB first computes the pair-wise input-input,
embedding-embedding, and label-label correlations. Then it minimizes the mutual
information between input graph and embedding pairs while maximizing the mutual
information between embedding and label pairs. The critical insight of S-GIB is
to simultaneously discard spurious features and learn invariant features from a
high-order perspective by maintaining class relationships under multiple
distributional shifts. Notably, we unify the proposed I-GIB and S-GIB to form
our complementary framework IS-GIB. Extensive experiments conducted on both
node- and graph-level tasks consistently demonstrate the superior
generalization ability of IS-GIB. The code is available at
https://github.com/YangLing0818/GraphOOD.
Comment: Accepted by IEEE Transactions on Knowledge and Data Engineering
(TKDE
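The pair-wise correlation step described in the IS-GIB abstract above can be made concrete with a toy sketch. It is not taken from the GraphOOD repository: it assumes cosine similarity as the embedding-embedding relation and a same-class indicator as the label-label relation, neither of which the abstract specifies, and it does not attempt the mutual information objectives. The helper names `cosine` and `pairwise` are hypothetical.

```python
import math

def cosine(u, v):
    """Cosine similarity of two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def pairwise(items, relation):
    """Square matrix of relation(x, y) for every pair in the batch."""
    return [[relation(x, y) for y in items] for x in items]


# Toy batch: three graph embeddings and their class labels (made-up values).
embeddings = [[1.0, 0.0], [0.8, 0.6], [0.0, 1.0]]
labels = [0, 0, 1]

emb_corr = pairwise(embeddings, cosine)
lab_corr = pairwise(labels, lambda a, b: 1.0 if a == b else 0.0)

for row in emb_corr:
    print([round(x, 3) for x in row])
for row in lab_corr:
    print(row)
```

Comparing the two matrices entry by entry shows where embedding similarity agrees or disagrees with class membership, which is the kind of batch-level structure the abstract refers to.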