
    Hand Pose-based Task Learning from Visual Observations with Semantic Skill Extraction

    Learning from Demonstrations is a promising technique to transfer task knowledge from a user to a robot. We propose a framework for task programming by observing the human hand pose and object locations solely with a depth camera. By extracting skills from the demonstrations, we can represent what the robot has learned, generalize to unseen object locations, and optimize the robotic execution instead of replaying a non-optimal behavior. A two-stage segmentation algorithm that employs skill template matching via Hidden Markov Models extracts motion primitives from the demonstration and gives them semantic meaning. In this way, the transfer of task knowledge is improved from a simple replay of the demonstration towards a semantically annotated, optimized and generalized execution. We evaluated the extraction of a set of skills in simulation and show that the task execution can be optimized by these means.
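    The template-matching step described above can be sketched as scoring a motion segment against a small HMM per skill and taking the best-scoring label. This is a minimal illustrative sketch, not the authors' implementation: the two toy templates, their parameters, and the 1-D "hand height" observations are all assumptions for demonstration.

```python
# Hypothetical sketch of skill template matching: each candidate skill is a
# small Gaussian-output HMM, and a demonstrated motion segment receives the
# semantic label of the template with the highest log-likelihood.
import numpy as np

def log_likelihood(obs, pi, A, means, var):
    """Forward algorithm (log domain) for a 1-D Gaussian-output HMM."""
    def log_emis(x):
        return -0.5 * (np.log(2 * np.pi * var) + (x - means) ** 2 / var)
    alpha = np.log(pi) + log_emis(obs[0])
    for x in obs[1:]:
        m = alpha.max()                      # log-sum-exp over previous states
        alpha = m + np.log(np.exp(alpha - m) @ A) + log_emis(x)
    m = alpha.max()
    return m + np.log(np.exp(alpha - m).sum())

# Two toy skill templates over hand height: "reach" (rising) vs "hold" (flat).
pi = np.array([0.9, 0.1])
A = np.array([[0.8, 0.2], [0.0, 1.0]])       # left-to-right topology
reach = dict(pi=pi, A=A, means=np.array([0.0, 1.0]), var=0.05)
hold = dict(pi=pi, A=A, means=np.array([0.0, 0.0]), var=0.05)

segment = np.array([0.0, 0.1, 0.6, 0.9, 1.0])  # hand rises during the segment
scores = {name: log_likelihood(segment, **tpl)
          for name, tpl in [("reach", reach), ("hold", hold)]}
best = max(scores, key=scores.get)           # semantic label for the segment
```

    With a real demonstration stream, the same scoring would run on each candidate segment produced by the first segmentation stage.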

    Controlling Text-to-Image Diffusion by Orthogonal Finetuning

    Large text-to-image diffusion models have impressive capabilities in generating photorealistic images from text prompts. How to effectively guide or control these powerful models to perform different downstream tasks becomes an important open problem. To tackle this challenge, we introduce a principled finetuning method -- Orthogonal Finetuning (OFT) -- for adapting text-to-image diffusion models to downstream tasks. Unlike existing methods, OFT can provably preserve the hyperspherical energy that characterizes the pairwise neuron relationship on the unit hypersphere. We find that this property is crucial for preserving the semantic generation ability of text-to-image diffusion models. To improve finetuning stability, we further propose Constrained Orthogonal Finetuning (COFT), which imposes an additional radius constraint on the hypersphere. Specifically, we consider two important text-to-image finetuning tasks: subject-driven generation, where the goal is to generate subject-specific images given a few images of a subject and a text prompt, and controllable generation, where the goal is to enable the model to take in additional control signals. We empirically show that our OFT framework outperforms existing methods in generation quality and convergence speed. Comment: NeurIPS 2023 (43 pages, 34 figures, project page: https://oft.wyliu.com/)
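    The core idea can be illustrated in a few lines: instead of adding a delta to a pretrained weight, OFT multiplies it by a learned orthogonal matrix, which leaves pairwise neuron inner products (and hence the hyperspherical energy) unchanged. The sketch below uses a Cayley parameterization, one standard way to obtain an orthogonal matrix from unconstrained parameters; the shapes and variable names are illustrative assumptions, not the authors' code.

```python
# Minimal sketch of orthogonal finetuning on a single linear layer.
import numpy as np

def cayley_orthogonal(S):
    """Cayley map: with Q the skew-symmetric part of S,
    R = (I - Q) (I + Q)^{-1} is orthogonal."""
    Q = 0.5 * (S - S.T)                      # skew-symmetric: Q.T == -Q
    I = np.eye(S.shape[0])
    return (I - Q) @ np.linalg.inv(I + Q)

rng = np.random.default_rng(0)
W = rng.standard_normal((4, 6))              # pretrained layer: 4 neurons, 6 inputs
R = cayley_orthogonal(rng.standard_normal((6, 6)))
W_ft = W @ R.T                               # rotate every neuron by the same R

# Because R is orthogonal, the Gram matrix of the neurons -- their pairwise
# inner products, which determine the hyperspherical energy -- is preserved:
gram_before = W @ W.T
gram_after = W_ft @ W_ft.T
```

    In training, the entries of the unconstrained matrix would be the learned parameters; COFT would additionally bound how far R may move from the identity.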

    Parameter-Efficient Orthogonal Finetuning via Butterfly Factorization

    Large foundation models are becoming ubiquitous, but training them from scratch is prohibitively expensive. Thus, efficiently adapting these powerful models to downstream tasks is increasingly important. In this paper, we study a principled finetuning paradigm -- Orthogonal Finetuning (OFT) -- for downstream task adaptation. Despite demonstrating good generalizability, OFT still uses a fairly large number of trainable parameters due to the high dimensionality of orthogonal matrices. To address this, we start by examining OFT from an information transmission perspective, and then identify a few key desiderata that enable better parameter-efficiency. Inspired by how the Cooley-Tukey fast Fourier transform algorithm enables efficient information transmission, we propose an efficient orthogonal parameterization using butterfly structures. We apply this parameterization to OFT, creating a novel parameter-efficient finetuning method called Orthogonal Butterfly (BOFT). By subsuming OFT as a special case, BOFT introduces a generalized orthogonal finetuning framework. Finally, we conduct an extensive empirical study of adapting large vision transformers, large language models, and text-to-image diffusion models to various downstream tasks in vision and language. Comment: Technical Report (33 pages, 18 figures)
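    The parameter saving from butterfly structures can be made concrete with a small sketch: an orthogonal matrix is built as a product of FFT-style butterfly factors, each consisting of independent planar rotations, so the angle count grows like d log d instead of the d(d-1)/2 parameters of a dense orthogonal matrix. This is an illustrative construction under assumed wiring, not the paper's exact parameterization.

```python
# Build an 8x8 orthogonal matrix from three butterfly factors.
import numpy as np

def butterfly_factor(d, stride, thetas):
    """One butterfly factor: d // 2 planar rotations pairing index i with
    i + stride (FFT-style wiring). Returns a dense d x d orthogonal matrix."""
    B = np.zeros((d, d))
    t = iter(thetas)
    for block in range(0, d, 2 * stride):
        for i in range(block, block + stride):
            th = next(t)
            c, s = np.cos(th), np.sin(th)
            j = i + stride
            B[i, i], B[i, j] = c, -s
            B[j, i], B[j, j] = s, c
    return B

d = 8
rng = np.random.default_rng(0)
factors = [butterfly_factor(d, 2 ** k, rng.uniform(0, np.pi, d // 2))
           for k in range(3)]                      # strides 1, 2, 4
R = factors[2] @ factors[1] @ factors[0]           # product of orthogonal factors

n_params = 3 * (d // 2)                            # 12 rotation angles
full_params = d * (d - 1) // 2                     # 28 for a dense orthogonal R
```

    Each factor rotates disjoint coordinate pairs, so it is orthogonal by construction, and a product of orthogonal matrices stays orthogonal; the gap between the two parameter counts widens quickly as d grows.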

    Learn2Reg: comprehensive multi-task medical image registration challenge, dataset and evaluation in the era of deep learning

    Image registration is a fundamental medical image analysis task, and a wide variety of approaches have been proposed. However, only a few studies have comprehensively compared medical image registration approaches on a wide range of clinically relevant tasks. This limits the development of registration methods, the adoption of research advances into practice, and a fair benchmark across competing approaches. The Learn2Reg challenge addresses these limitations by providing a multi-task medical image registration data set for comprehensive characterisation of deformable registration algorithms. A continuous evaluation will be possible at https://learn2reg.grand-challenge.org. Learn2Reg covers a wide range of anatomies (brain, abdomen, and thorax), modalities (ultrasound, CT, MR), availability of annotations, as well as intra- and inter-patient registration evaluation. We established an easily accessible framework for training and validation of 3D registration methods, which enabled the compilation of results of over 65 individual method submissions from more than 20 unique teams. We used a complementary set of metrics, including robustness, accuracy, plausibility, and runtime, enabling unique insight into the current state-of-the-art of medical image registration. This paper describes datasets, tasks, evaluation methods and results of the challenge, as well as results of further analysis of transferability to new datasets, the importance of label supervision, and resulting bias. While no single approach worked best across all tasks, many methodological aspects could be identified that push the performance of medical image registration to a new state of the art. Furthermore, we dispelled the common belief that conventional registration methods have to be much slower than deep-learning-based methods.
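    A common way such challenges score registration accuracy is the overlap of warped anatomical label maps with the fixed-image labels, typically via the Dice coefficient. The toy 2-D label maps below are illustrative assumptions (real evaluations run on 3-D volumes); the sketch shows the metric, not the challenge's evaluation code.

```python
# Per-label Dice overlap between a fixed segmentation and a warped one.
import numpy as np

def dice_per_label(fixed_seg, warped_seg, labels):
    """Dice = 2 |A intersect B| / (|A| + |B|) for each anatomical label."""
    scores = {}
    for lab in labels:
        a, b = fixed_seg == lab, warped_seg == lab
        denom = a.sum() + b.sum()
        scores[lab] = float(2.0 * np.logical_and(a, b).sum() / denom) \
            if denom else float("nan")
    return scores

fixed = np.array([[0, 1, 1],
                  [0, 1, 2],
                  [0, 2, 2]])      # ground-truth labels on the fixed image
warped = np.array([[0, 1, 1],
                   [0, 2, 2],
                   [0, 2, 2]])     # labels carried over by the registration
scores = dice_per_label(fixed, warped, labels=[1, 2])
print(scores)  # → {1: 0.8, 2: 0.8571428571428571}
```

    Robustness is then often reported as a low percentile of these per-case scores rather than the mean, so that a method is penalized for occasional gross failures.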

    The effects of post-weld aging and cryogenic treatment on self-fusion welded austenitic stainless steel

    The effects of post-weld aging and cryogenic treatment on self-fusion welded austenitic stainless steel thick plates were investigated in the present work. The results showed that the fusion zone microstructure consisted of an austenite matrix and vermicular ferrite. Aging treatment promoted the decomposition of δ-ferrite into σ phase as well as the precipitation of carbides. Isothermal martensitic transformation in the Tungsten Inert Gas (TIG) welded specimen was induced by cryogenic treatment, and the number of stacking faults increased. The fusion zone microstructure of Electron Beam Welding (EBW) and Laser Welding (LW) was finer than that of TIG, with acicular ferrite distributed in the austenite matrix. A large number of twins were generated in the austenite matrix after LW. Cryogenic treatment produced a large number of sub-grains in LW specimens, owing to the entanglement and accumulation of dislocations in the vicinity of ferrite. Post-weld aging and cryogenic treatment had no influence on the strength of the weldments for any of the welding methods, while cryogenic treatment improved the impact toughness of the EBW and LW weldments by 7.4% and 8.8%, respectively. The aging treatment reduced the impact toughness by 49%, 33% and 15.5%, and the uniform elongation by 44%, 39% and 17%, for TIG, EBW and LW, respectively. Aging treatment reduced the surface residual stress of the TIG weldment by 58.8% in the Y direction and 61.2% in the X direction. Cryogenic treatment could also release the surface residual stress of the TIG weldment, by 36.8% in the X direction and 16.3% in the Y direction.