42 research outputs found

    A Computational Model of the Short-Cut Rule for 2D Shape Decomposition

    Full text link
    We propose a new 2D shape decomposition method based on the short-cut rule. The short-cut rule originates from cognition research and states that the human visual system prefers to partition an object into parts using the shortest possible cuts. We propose and implement a computational model for the short-cut rule and apply it to the problem of shape decomposition. The proposed model generates a set of cut hypotheses passing through the points on the silhouette that represent the negative minima of curvature. We then show that most part-cut hypotheses can be eliminated by analyzing the local properties of each. Finally, the remaining hypotheses are evaluated in ascending length order, which guarantees that of any pair of conflicting cuts only the shorter will be accepted. We demonstrate that, compared with state-of-the-art shape decomposition methods, the proposed approach achieves decomposition results that better correspond to human intuition as revealed in psychological experiments. Comment: 11 pages
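
    A minimal sketch of the three-step pipeline described above (cut hypotheses anchored at negative curvature minima, local filtering, shortest-first acceptance), assuming the silhouette is a closed polygon given as an (N, 2) array of boundary points. The discrete curvature estimate, the single inside-the-shape filter, and the use of shapely are illustrative assumptions, not the authors' exact model.

        import numpy as np
        from shapely.geometry import Polygon, LineString

        def discrete_curvature(pts):
            # Signed curvature of a closed polyline via central differences; pts: (N, 2) array.
            d1 = np.gradient(pts, axis=0)
            d2 = np.gradient(d1, axis=0)
            num = d1[:, 0] * d2[:, 1] - d1[:, 1] * d2[:, 0]
            den = (d1[:, 0] ** 2 + d1[:, 1] ** 2) ** 1.5 + 1e-12
            return num / den

        def short_cut_decomposition(pts):
            poly = Polygon(pts)
            k = discrete_curvature(pts)
            # Cut end points: negative minima of curvature (concave corners).
            idx = [i for i in range(len(pts))
                   if k[i] < 0 and k[i] <= k[i - 1] and k[i] <= k[(i + 1) % len(pts)]]
            # Generate cut hypotheses between concave points; a simple local test
            # discards hypotheses that leave the silhouette.
            cuts = []
            for a in range(len(idx)):
                for b in range(a + 1, len(idx)):
                    line = LineString([pts[idx[a]], pts[idx[b]]])
                    if poly.contains(line):
                        cuts.append(line)
            # Evaluate hypotheses in ascending length order, so of any pair of
            # conflicting (crossing) cuts only the shorter one is accepted.
            accepted = []
            for line in sorted(cuts, key=lambda c: c.length):
                if not any(line.crosses(c) for c in accepted):
                    accepted.append(line)
            return accepted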

    A Hierarchical Hybrid Learning Framework for Multi-agent Trajectory Prediction

    Full text link
    Accurate and robust trajectory prediction of neighboring agents is critical for autonomous vehicles traversing complex scenes. Most methods proposed in recent years are deep learning-based due to their strength in encoding complex interactions. However, these methods often generate implausible predictions because they rely heavily on past observations and cannot effectively capture transient and contingent interactions from sparse samples. In this paper, we propose a hierarchical hybrid framework of deep learning (DL) and reinforcement learning (RL) for multi-agent trajectory prediction, to cope with the challenge of predicting motions shaped by multi-scale interactions. In the DL stage, the traffic scene is divided into multiple intermediate-scale heterogeneous graphs, on which Transformer-style GNNs are adopted to encode heterogeneous interactions at the intermediate and global levels. In the RL stage, we divide the traffic scene into local sub-scenes using the key future points predicted in the DL stage. To emulate the motion-planning procedure that produces trajectory predictions, a Transformer-based Proximal Policy Optimization (PPO) agent combined with a vehicle kinematics model is devised to plan motions under the dominant influence of microscopic interactions. A multi-objective reward is designed to balance agent-centric accuracy and scene-wise compatibility. Experimental results show that our proposal matches the state of the art on the Argoverse forecasting benchmark. The visualized results also show that the hierarchical learning framework captures the multi-scale interactions and improves the feasibility and compliance of the predicted trajectories.
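
    As a sketch of one ingredient described above, the snippet below shows how a multi-objective reward might balance agent-centric accuracy against scene-wise compatibility. The particular terms, weights, and the minimum-distance proxy for compatibility are illustrative assumptions, not the paper's actual reward.

        import numpy as np

        def multi_objective_reward(pred_traj, gt_traj, neighbor_trajs,
                                   w_acc=1.0, w_scene=0.5, safe_dist=2.0):
            """pred_traj, gt_traj: (T, 2) arrays; neighbor_trajs: (K, T, 2) array."""
            # Agent-centric accuracy: negative average displacement error.
            ade = np.linalg.norm(pred_traj - gt_traj, axis=-1).mean()
            r_acc = -ade
            # Scene-wise compatibility: penalize predicted positions that come
            # closer than safe_dist to any neighboring agent at the same timestep.
            if len(neighbor_trajs):
                dists = np.linalg.norm(neighbor_trajs - pred_traj[None], axis=-1)  # (K, T)
                violation = np.clip(safe_dist - dists, 0.0, None).mean()
            else:
                violation = 0.0
            r_scene = -violation
            return w_acc * r_acc + w_scene * r_scene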

    Visual In-Context Prompting

    Full text link
    In-context prompting in large language models (LLMs) has become a prevalent approach to improving zero-shot capabilities, but this idea is less explored in the vision domain. Existing visual prompting methods focus on referring segmentation to segment the most relevant object, falling short of addressing many generic vision tasks such as open-set segmentation and detection. In this paper, we introduce a universal visual in-context prompting framework for both tasks. In particular, we build on top of an encoder-decoder architecture and develop a versatile prompt encoder to support a variety of prompts such as strokes, boxes, and points. We further enhance it to take an arbitrary number of reference image segments as the context. Our extensive explorations show that the proposed visual in-context prompting elicits extraordinary referring and generic segmentation capabilities, yielding competitive performance on closed-set in-domain datasets and showing promising results on many open-set segmentation datasets. By joint training on COCO and SA-1B, our model achieves 57.7 PQ on COCO and 23.2 PQ on ADE20K. Code will be available at https://github.com/UX-Decoder/DINOv. Comment: technical report
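
    A minimal sketch of what a versatile prompt encoder of the kind described above could look like: points, boxes, and stroke masks are mapped into a shared query space that an encoder-decoder segmenter can attend to. Layer sizes, the mask pooling scheme, and the type embeddings are illustrative assumptions, not DINOv's actual implementation.

        import torch
        import torch.nn as nn

        class PromptEncoder(nn.Module):
            def __init__(self, d_model=256, mask_pool=16):
                super().__init__()
                self.point_proj = nn.Linear(2, d_model)        # (x, y) in [0, 1]
                self.box_proj = nn.Linear(4, d_model)          # (x1, y1, x2, y2) in [0, 1]
                self.mask_proj = nn.Linear(mask_pool * mask_pool, d_model)
                self.mask_pool = nn.AdaptiveAvgPool2d(mask_pool)
                self.type_embed = nn.Embedding(3, d_model)     # 0: point, 1: box, 2: stroke

            def forward(self, points=None, boxes=None, stroke_masks=None):
                queries = []
                if points is not None:                         # (N, 2)
                    queries.append(self.point_proj(points) + self.type_embed.weight[0])
                if boxes is not None:                          # (M, 4)
                    queries.append(self.box_proj(boxes) + self.type_embed.weight[1])
                if stroke_masks is not None:                   # (K, H, W) float masks
                    pooled = self.mask_pool(stroke_masks.unsqueeze(1)).flatten(1)
                    queries.append(self.mask_proj(pooled) + self.type_embed.weight[2])
                # Concatenate all prompt queries; the decoder attends to them as context.
                return torch.cat(queries, dim=0)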

    Grounded Language-Image Pre-training

    Full text link
    This paper presents a grounded language-image pre-training (GLIP) model for learning object-level, language-aware, and semantic-rich visual representations. GLIP unifies object detection and phrase grounding for pre-training. The unification brings two benefits: 1) it allows GLIP to learn from both detection and grounding data to improve both tasks and bootstrap a good grounding model; 2) GLIP can leverage massive image-text pairs by generating grounding boxes in a self-training fashion, making the learned representation semantic-rich. In our experiments, we pre-train GLIP on 27M grounding data, including 3M human-annotated and 24M web-crawled image-text pairs. The learned representations demonstrate strong zero-shot and few-shot transferability to various object-level recognition tasks. 1) When directly evaluated on COCO and LVIS (without seeing any images in COCO during pre-training), GLIP achieves 49.8 AP and 26.9 AP, respectively, surpassing many supervised baselines. 2) After fine-tuning on COCO, GLIP achieves 60.8 AP on val and 61.5 AP on test-dev, surpassing the prior SoTA. 3) When transferred to 13 downstream object detection tasks, a 1-shot GLIP rivals a fully supervised Dynamic Head. Code is released at https://github.com/microsoft/GLIP. Comment: CVPR 2022; updated visualizations; fixed hyper-parameters in Appendix C.
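
    A minimal sketch of the detection-as-grounding reformulation that GLIP builds on: category names are concatenated into a text prompt, and each region feature is scored against every token feature so that detection and phrase grounding share one word-region alignment objective. Feature shapes and the temperature are illustrative assumptions.

        import torch

        def grounding_alignment_scores(region_feats, token_feats, temperature=0.07):
            """region_feats: (R, D) visual features; token_feats: (T, D) features of the
            text prompt, e.g. the tokenized string 'person. bicycle. car. ...'."""
            region_feats = torch.nn.functional.normalize(region_feats, dim=-1)
            token_feats = torch.nn.functional.normalize(token_feats, dim=-1)
            # (R, T) word-region alignment logits; for detection, a region's score for
            # class c is aggregated over the tokens that spell out class c's name.
            return region_feats @ token_feats.t() / temperature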

    Design and Analysis of a Linear Hybrid Excitation Flux-Switching Generator for Direct Drive Wave Energy Converters

    No full text
    Linear generators have the advantage of a simple secondary structure, which makes them well suited to wave energy conversion. Building on the vernier hybrid machines (VHMs) widely used for direct drive wave energy converters, this paper proposes a novel linear hybrid excitation flux-switching generator (LHEFSG), which can effectively improve the performance of this class of generators. DC hybrid excitation windings and a multi-tooth structure are used in the proposed generator to increase the magnetic energy and to overcome the VHM's susceptibility to irreversible demagnetization. Firstly, the operating principle and structure of the proposed generator are introduced. Secondly, using the finite element method, the no-load performance of the proposed generator is analyzed and compared with that of a conventional VHM. In addition, the on-load performance of the proposed generator is obtained by finite element analysis (FEA). A dislocation of pole alignments method is implemented to reduce the cogging force. Lastly, a prototype of the linear flux-switching generator is used to verify the correctness of the FEA results. All the results validate that the proposed generator has better performance than its counterparts.
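
    As a rough illustration of the no-load analysis mentioned above, the snippet below recovers the per-phase back-EMF from a flux-linkage-versus-position curve via e = (dpsi/dx) * v. The sinusoidal flux linkage, pole pitch, and constant translator speed are placeholder assumptions standing in for the FEA data of the proposed LHEFSG.

        import numpy as np

        pole_pitch = 0.024                                        # translator pole pitch (m), assumed
        x = np.linspace(0.0, 2 * pole_pitch, 200)                 # translator position (m)
        psi = 0.12 * np.sin(2 * np.pi * x / pole_pitch)           # flux linkage (Wb), assumed sinusoidal
        v = 1.0                                                   # translator speed (m/s), assumed constant

        dpsi_dx = np.gradient(psi, x)                             # spatial derivative of flux linkage
        emf = dpsi_dx * v                                         # induced no-load EMF (V)
        print(f"peak no-load EMF: {emf.max():.2f} V")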