925 research outputs found

    Freeform User Interfaces for Graphical Computing

    Report number: 甲15222; Date of degree conferral: 2000-03-29; Degree type: Doctorate by coursework; Degree: Doctor of Engineering; Degree registration number: 博工第4717号; Graduate school / department: Graduate School of Engineering, Information Engineering

    Exploring Natural User Abstractions For Shared Perceptual Manipulator Task Modeling & Recovery

    State-of-the-art domestic robot assistants are essentially autonomous mobile manipulators capable of exerting human-scale precision grasps. To maximize utility and economy, non-technical end-users would need to be nearly as efficient as trained roboticists in controlling and collaborating on manipulation task behaviors. This remains a significant challenge, given that many WIMP-style tools require at least superficial proficiency in robotics, 3D graphics, and computer science for rapid task modeling and recovery. Research on robot-centric collaboration has nevertheless gained momentum in recent years: robots now plan in partially observable environments that maintain geometries and semantic maps, presenting opportunities for non-experts to cooperatively control task behavior with autonomous-planning agents that exploit this knowledge. However, since autonomous systems are not immune to errors under perceptual difficulty, a human-in-the-loop is needed to bias autonomous planning towards recovery conditions that resume the task and avoid similar errors. In this work, we explore interactive techniques that allow non-technical users to model task behaviors and perceive cooperatively with a service robot under robot-centric collaboration. We evaluate stylus and touch modalities through which users can intuitively and effectively convey natural abstractions of high-level tasks, semantic revisions, and geometries about the world. Experiments are conducted with 'pick-and-place' tasks in an idealized 'Blocks World' environment using a Kinova JACO six-degree-of-freedom manipulator. Possibilities for the architecture and interface are demonstrated with the following features: (1) semantic 'Object' and 'Location' grounding that describes function and ambiguous geometries; (2) task specification as an unordered list of goal predicates; and (3) guided task recovery with implied scene geometries and trajectories via symmetry cues and configuration-space abstraction. Empirical results from four user studies show that our interface was strongly preferred over the control condition, demonstrating high learnability and ease of use that enabled our non-technical participants to model complex tasks, provide effective recovery assistance, and exercise teleoperative control.
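    As a rough illustration of feature (2) above, the minimal Python sketch below represents a pick-and-place task as an unordered set of goal predicates that a planner could satisfy in any order; the predicate names, object symbols, and helper function are hypothetical and not taken from the thesis.

```python
from dataclasses import dataclass

# Hypothetical predicate type: a relation name applied to grounded object/location symbols.
@dataclass(frozen=True)
class Predicate:
    name: str
    args: tuple

# A task is specified as an *unordered set* of goal predicates; the planner,
# not the user, decides the ordering of pick-and-place actions that satisfies them.
goal = {
    Predicate("on", ("red_block", "blue_block")),
    Predicate("at", ("blue_block", "table_corner")),
    Predicate("clear", ("green_block",)),
}

def is_satisfied(goal: set, world_state: set) -> bool:
    """The goal holds once every predicate in the unordered set is true in the world."""
    return goal.issubset(world_state)

# Example check against a toy perceived world state.
world = {
    Predicate("on", ("red_block", "blue_block")),
    Predicate("at", ("blue_block", "table_corner")),
    Predicate("clear", ("green_block",)),
}
print(is_satisfied(goal, world))  # True
```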

    Using GANs to Create Training Images for Object Pose Estimation

    Object pose estimation is a significant challenge in computer vision, involving determining the position and orientation of an object within an image. This task requires a large training dataset representing various poses of the object; however, collecting such data can be costly and labor-intensive. This thesis examines the architecture of Generative Adversarial Networks (GANs), which can generate realistic images from random noise, and aims to study various GAN architectures and apply them to generate images that can be used in object pose estimation. Throughout the experiments, pre-rendered images of the Cat model from LINEMOD [1] serve as input to the considered GAN models, aiming to reproduce realistic images containing the reference Cat in the same predefined pose. First, a Domain Adaptation (DA) approach is used to generate underwater images based on an unpaired dataset. This experiment serves as a proof of concept illustrating the process of generating images in environments where data collection is notably challenging. Next, the focus shifts to generating new samples from the LINEMOD dataset. Here, the objective is to capture features and spatial relationships between objects from the real images and use them to generate new images consistent with the input Cat model. Finally, the best GAN models are combined with the Pixel-wise Voting Network (PVNet [2]) to compare their performance against the PVNet model trained using the conventional image Superimposing Method (PVNet-SM). The main conclusion of this work is that GANs can achieve performance comparable to PVNet-SM while eliminating the need for manual work in background selection and dataset preparation. [1] S. Hinterstoisser, V. Lepetit, S. Ilic, et al., "Model-based training, detection, and pose estimation of texture-less 3D objects in heavily cluttered scenes," in Computer Vision – ACCV 2012, K. M. Lee, Y. Matsushita, J. M. Rehg, and Z. Hu, Eds., vol. 7724, Springer Berlin Heidelberg, 2012, pp. 548–562. [2] S. Peng, Y. Liu, Q. Huang, X. Zhou, and H. Bao, "PVNet: Pixel-wise voting network for 6-DOF pose estimation," in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2019.
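    As a rough illustration of the adversarial training described above, the sketch below shows a minimal generator/discriminator pair trained with the standard GAN objective in PyTorch; the network sizes, image resolution, and placeholder data are assumptions, not the architectures evaluated in the thesis.

```python
import torch
import torch.nn as nn

# Minimal, illustrative GAN sketch: the generator maps random noise to an
# image-shaped tensor, the discriminator scores real vs. generated samples,
# and the two are trained adversarially.
latent_dim, img_pixels = 100, 64 * 64 * 3

G = nn.Sequential(nn.Linear(latent_dim, 512), nn.ReLU(), nn.Linear(512, img_pixels), nn.Tanh())
D = nn.Sequential(nn.Linear(img_pixels, 512), nn.LeakyReLU(0.2), nn.Linear(512, 1))

opt_g = torch.optim.Adam(G.parameters(), lr=2e-4)
opt_d = torch.optim.Adam(D.parameters(), lr=2e-4)
bce = nn.BCEWithLogitsLoss()

def train_step(real_images: torch.Tensor):
    batch = real_images.size(0)
    fake = G(torch.randn(batch, latent_dim))

    # Discriminator: real images labelled 1, generated images labelled 0.
    opt_d.zero_grad()
    loss_d = bce(D(real_images), torch.ones(batch, 1)) + bce(D(fake.detach()), torch.zeros(batch, 1))
    loss_d.backward()
    opt_d.step()

    # Generator: push the discriminator to label generated images as real.
    opt_g.zero_grad()
    loss_g = bce(D(fake), torch.ones(batch, 1))
    loss_g.backward()
    opt_g.step()
    return loss_d.item(), loss_g.item()

# Usage with dummy "real" data standing in for rendered LINEMOD crops.
print(train_step(torch.rand(8, img_pixels) * 2 - 1))
```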

    CARE: Confidence-rich Autonomous Robot Exploration using Bayesian Kernel Inference and Optimization

    In this paper, we consider improving the efficiency of information-based autonomous robot exploration in unknown and complex environments. We first utilize Gaussian process (GP) regression to learn a surrogate model to infer the confidence-rich mutual information (CRMI) of querying control actions, then adopt an objective function consisting of predicted CRMI values and prediction uncertainties to conduct Bayesian optimization (BO), i.e., GP-based BO (GPBO). This realizes a trade-off between the best action with the highest CRMI value (exploitation) and the action with high prediction variance (exploration). To further improve the efficiency of GPBO, we propose a novel lightweight information gain inference method based on Bayesian kernel inference and optimization (BKIO), achieving approximately logarithmic complexity without the need for training. BKIO can also infer the CRMI and generate the best action using BO with bounded cumulative regret, which ensures accuracy comparable to GPBO with much higher efficiency. Extensive numerical and real-world experiments show the desired efficiency of our proposed methods without losing exploration performance in different unstructured, cluttered environments. We also provide our open-source implementation code at https://github.com/Shepherd-Gregory/BKIO-Exploration. (Comment: Full version of the paper accepted by IEEE Robotics and Automation Letters (RA-L), 2023. arXiv admin note: text overlap with arXiv:2301.0052)
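    The sketch below illustrates the GPBO idea in its simplest form: fit a GP surrogate to (action, CRMI) samples and pick the next control action with an upper-confidence-bound objective that balances predicted information gain against predictive uncertainty. The action space, kernel, and stand-in CRMI values are assumptions for illustration, not the paper's implementation (see the repository linked above for that).

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF

rng = np.random.default_rng(0)

# Hypothetical evaluated actions (e.g. 2-D sensing poses) and their measured CRMI values.
actions = rng.uniform(-1.0, 1.0, size=(20, 2))
crmi = np.exp(-np.linalg.norm(actions, axis=1))  # stand-in for an expensive CRMI evaluation

# GP surrogate over the action space.
gp = GaussianProcessRegressor(kernel=RBF(length_scale=0.5), normalize_y=True)
gp.fit(actions, crmi)

# Score candidate actions: mean predicted CRMI (exploitation) + weighted uncertainty (exploration).
candidates = rng.uniform(-1.0, 1.0, size=(500, 2))
mean, std = gp.predict(candidates, return_std=True)
beta = 2.0  # exploration weight
ucb = mean + beta * std
best_action = candidates[np.argmax(ucb)]
print("next action to evaluate:", best_action)
```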

    Spatial Interaction for Immersive Mixed-Reality Visualizations

    Growing amounts of data, both in personal and professional settings, have caused an increased interest in data visualization and visual analytics. Especially for inherently three-dimensional data, immersive technologies such as virtual and augmented reality and advanced, natural interaction techniques have been shown to facilitate data analysis. Furthermore, in such use cases, the physical environment often plays an important role, both by directly influencing the data and by serving as context for the analysis. Therefore, there has been a trend to bring data visualization into new, immersive environments and to make use of the physical surroundings, leading to a surge in mixed-reality visualization research. One of the resulting challenges, however, is the design of user interaction for these often complex systems. In my thesis, I address this challenge by investigating interaction for immersive mixed-reality visualizations regarding three core research questions: 1) What are promising types of immersive mixed-reality visualizations, and how can advanced interaction concepts be applied to them? 2) How does spatial interaction benefit these visualizations and how should such interactions be designed? 3) How can spatial interaction in these immersive environments be analyzed and evaluated? To address the first question, I examine how various visualizations such as 3D node-link diagrams and volume visualizations can be adapted for immersive mixed-reality settings and how they stand to benefit from advanced interaction concepts. For the second question, I study how spatial interaction in particular can help to explore data in mixed reality. There, I look into spatial device interaction in comparison to touch input, the use of additional mobile devices as input controllers, and the potential of transparent interaction panels. Finally, to address the third question, I present my research on how user interaction in immersive mixed-reality environments can be analyzed directly in the original, real-world locations, and how this can provide new insights. Overall, with my research, I contribute interaction and visualization concepts, software prototypes, and findings from several user studies on how spatial interaction techniques can support the exploration of immersive mixed-reality visualizations.
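    As a loose illustration of one spatial-interaction idea mentioned above, the sketch below maps the tracked 6-DoF pose of a handheld device to a cutting plane through a volume visualization. The function names, plane convention, and clipping rule are hypothetical illustrations, not the thesis's actual implementation.

```python
import numpy as np

def cutting_plane_from_pose(position: np.ndarray, rotation: np.ndarray):
    """Derive a clipping plane (normal, offset) from a tracked device pose.

    position: device position in world coordinates, shape (3,)
    rotation: 3x3 rotation matrix of the device in world coordinates
    """
    normal = rotation @ np.array([0.0, 0.0, 1.0])   # device's facing direction
    offset = float(normal @ position)               # plane equation: normal . x = offset
    return normal, offset

def is_clipped(voxel_center: np.ndarray, normal: np.ndarray, offset: float) -> bool:
    """Voxels in front of the device-defined plane are hidden from the rendering."""
    return float(normal @ voxel_center) > offset

# Example: a device held at the origin facing +z clips voxels with z > 0.
n, d = cutting_plane_from_pose(np.zeros(3), np.eye(3))
print(is_clipped(np.array([0.0, 0.0, 0.5]), n, d))  # True
```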