A Computational Model of the Short-Cut Rule for 2D Shape Decomposition
We propose a new 2D shape decomposition method based on the short-cut rule.
The short-cut rule originates from cognition research, and states that the
human visual system prefers to partition an object into parts using the
shortest possible cuts. We propose and implement a computational model for the
short-cut rule and apply it to the problem of shape decomposition. The proposed
model generates a set of cut hypotheses passing through the points on the
silhouette that are negative minima of curvature. We then show that
most part-cut hypotheses can be eliminated by analyzing the local properties of
each. Finally, the remaining hypotheses are evaluated in ascending length
order, which guarantees that of any pair of conflicting cuts only the shorter
will be accepted. We demonstrate that, compared with state-of-the-art shape
decomposition methods, the proposed approach achieves decomposition results
which better correspond to human intuition as revealed in psychological
experiments.
Comment: 11 pages
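The final step described in this abstract (evaluating the surviving cut hypotheses in ascending length order so that the shorter of two conflicting cuts wins) can be illustrated with a minimal Python sketch. It is not the authors' implementation. The names `select_cuts` and `cuts_cross` are hypothetical, and the conflict test is an assumption: two cuts are treated as conflicting when their line segments properly cross, which the abstract does not specify.

```python
from math import dist

Point = tuple[float, float]
Cut = tuple[Point, Point]


def _orient(a: Point, b: Point, c: Point) -> float:
    """Signed area test: > 0 if c lies to the left of the directed line a->b."""
    return (b[0] - a[0]) * (c[1] - a[1]) - (b[1] - a[1]) * (c[0] - a[0])


def cuts_cross(c1: Cut, c2: Cut) -> bool:
    """Assumed conflict test: the two segments properly cross each other."""
    a, b = c1
    c, d = c2
    return (_orient(a, b, c) * _orient(a, b, d) < 0
            and _orient(c, d, a) * _orient(c, d, b) < 0)


def select_cuts(hypotheses: list[Cut]) -> list[Cut]:
    """Greedily accept cuts from shortest to longest, skipping any cut
    that conflicts with one already accepted."""
    accepted: list[Cut] = []
    for cut in sorted(hypotheses, key=lambda c: dist(c[0], c[1])):
        if not any(cuts_cross(cut, kept) for kept in accepted):
            accepted.append(cut)
    return accepted


# Illustrative data only: the long diagonal crosses the shorter anti-diagonal,
# so the diagonal is rejected; the short vertical cut conflicts with nothing.
example = [((0, 0), (4, 4)), ((0, 2), (2, 0)), ((5, 0), (5, 1))]
selected = select_cuts(example)
```

Because cuts are visited in ascending length, a cut can only be rejected by a shorter one that was already accepted, which is the property the abstract attributes to the ordering.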
A Hierarchical Hybrid Learning Framework for Multi-agent Trajectory Prediction
Accurate and robust trajectory prediction of neighboring agents is critical
for autonomous vehicles traversing complex scenes. Most methods proposed in
recent years are deep learning-based due to their strength in encoding complex
interactions. However, implausible predictions are often generated since they
rely heavily on past observations and cannot effectively capture the transient
and contingent interactions from sparse samples. In this paper, we propose a
hierarchical hybrid framework of deep learning (DL) and reinforcement learning
(RL) for multi-agent trajectory prediction, to cope with the challenge of
predicting motions shaped by multi-scale interactions. In the DL stage, the
traffic scene is divided into multiple intermediate-scale heterogeneous graphs
based on which Transformer-style GNNs are adopted to encode heterogeneous
interactions at intermediate and global levels. In the RL stage, we divide the
traffic scene into local sub-scenes utilizing the key future points predicted
in the DL stage. To emulate the motion planning procedure so as to produce
trajectory predictions, a Transformer-based Proximal Policy Optimization (PPO)
incorporated with a vehicle kinematics model is devised to plan motions under
the dominant influence of microscopic interactions. A multi-objective reward is
designed to balance between agent-centric accuracy and scene-wise
compatibility. Experimental results show that our proposal matches the
state of the art on the Argoverse forecasting benchmark. The visualized results
also show that the hierarchical learning framework captures the
multi-scale interactions and improves the feasibility and compliance of the
predicted trajectories.
Normal values of left atrial size and strain analyzed by dedicated speckle-tracking echocardiography in the Chinese population
Visual In-Context Prompting
In-context prompting in large language models (LLMs) has become a prevalent
approach to improve zero-shot capabilities, but this idea is less explored in
the vision domain. Existing visual prompting methods focus on referring
segmentation to segment the most relevant object, falling short of addressing
many generic vision tasks like open-set segmentation and detection. In this
paper, we introduce a universal visual in-context prompting framework for both
tasks. In particular, we build on top of an encoder-decoder architecture, and
develop a versatile prompt encoder to support a variety of prompts like
strokes, boxes, and points. We further enhance it to take an arbitrary number
of reference image segments as the context. Our extensive explorations show
that the proposed visual in-context prompting elicits extraordinary referring
and generic segmentation capabilities to refer and detect, yielding competitive
performance to closed-set in-domain datasets and showing promising results on
many open-set segmentation datasets. By joint training on COCO and SA-1B, our
model achieves PQ on COCO and PQ on ADE20K. Code will be
available at https://github.com/UX-Decoder/DINOv.
Comment: technical report
Grounded Language-Image Pre-training
This paper presents a grounded language-image pre-training (GLIP) model for
learning object-level, language-aware, and semantic-rich visual
representations. GLIP unifies object detection and phrase grounding for
pre-training. The unification brings two benefits: 1) it allows GLIP to learn
from both detection and grounding data to improve both tasks and bootstrap a
good grounding model; 2) GLIP can leverage massive image-text pairs by
generating grounding boxes in a self-training fashion, making the learned
representation semantic-rich. In our experiments, we pre-train GLIP on 27M
grounding data, including 3M human-annotated and 24M web-crawled image-text
pairs. The learned representations demonstrate strong zero-shot and few-shot
transferability to various object-level recognition tasks. 1) When directly
evaluated on COCO and LVIS (without seeing any images in COCO during
pre-training), GLIP achieves 49.8 AP and 26.9 AP, respectively, surpassing many
supervised baselines. 2) After fine-tuning on COCO, GLIP achieves 60.8 AP on val
and 61.5 AP on test-dev, surpassing prior SoTA. 3) When transferred to 13
downstream object detection tasks, a 1-shot GLIP rivals a fully supervised
Dynamic Head. Code is released at https://github.com/microsoft/GLIP.
Comment: CVPR 2022; updated visualizations; fixed hyper-parameters in Appendix C.
Design and Analysis of a Linear Hybrid Excitation Flux-Switching Generator for Direct Drive Wave Energy Converters
Linear generators have the advantage of a simple structure of the secondary, which makes them suitable for wave energy conversion. Based on the vernier hybrid machines (VHMs) widely used for direct drive wave energy converters, this paper proposes a novel linear hybrid excitation flux-switching generator (LHEFSG), which can effectively improve the performance of this kind of generator. DC hybrid excitation windings and a multitooth structure are used in the proposed generator to increase the magnetic energy and overcome the susceptibility of VHMs to irreversible demagnetization. Firstly, the operation principle and structure of the proposed generator are introduced. Secondly, using the finite element method, the no-load performance of the proposed generator is analyzed and compared with that of a conventional VHM. In addition, the on-load performance of the proposed generator is obtained by finite element analysis (FEA). A pole-alignment dislocation method is implemented to reduce the cogging force. Lastly, a prototype of the linear flux-switching generator is used to verify the correctness of the FEA results. All the results validate that the proposed generator has better performance than its counterparts.