
    Stereoscopic bimanual interaction for 3D visualization

    Virtual Environments (VEs) have been widely used in many research fields for several decades, including 3D visualization, education, training, and games. VEs have the potential to enhance visualization and act as a general medium for human-computer interaction (HCI). However, limited research has evaluated virtual reality (VR) display technologies, and monocular and binocular depth cues, for human depth perception of volumetric (non-polygonal) datasets. In addition, the lack of standardization of three-dimensional (3D) user interfaces (UIs) makes it challenging to interact with many VE systems. To address these issues, this dissertation evaluates the effects of stereoscopic and head-coupled displays on depth judgments of volumetric datasets. It also evaluates two-handed view manipulation techniques that support simultaneous 7-degree-of-freedom (DOF) navigation (x, y, z + yaw, pitch, roll + scale) in a multi-scale virtual environment (MSVE). Furthermore, it evaluates techniques that automatically adjust stereo view parameters to address stereoscopic fusion problems in an MSVE. Next, the dissertation presents a bimanual, hybrid user interface which combines traditional tracking devices with computer-vision-based "natural" 3D inputs for multi-dimensional visualization in a semi-immersive desktop VR system. In conclusion, the dissertation provides guidelines for research designs that evaluate UIs and interaction techniques.
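    For illustration only (this is not code from the dissertation), a minimal numpy sketch of the kind of 7-DOF view transform described above, combining translation (x, y, z), rotation (yaw, pitch, roll) and uniform scale into one homogeneous matrix:

        # Minimal sketch: a 7-DOF view transform for multi-scale navigation.
        import numpy as np

        def view_matrix(x, y, z, yaw, pitch, roll, scale):
            """Return a 4x4 homogeneous transform for the 7 DOF."""
            cy, sy = np.cos(yaw), np.sin(yaw)
            cp, sp = np.cos(pitch), np.sin(pitch)
            cr, sr = np.cos(roll), np.sin(roll)
            # Rotation order roll * pitch * yaw; other conventions work as well.
            Ry = np.array([[cy, 0, sy], [0, 1, 0], [-sy, 0, cy]])
            Rx = np.array([[1, 0, 0], [0, cp, -sp], [0, sp, cp]])
            Rz = np.array([[cr, -sr, 0], [sr, cr, 0], [0, 0, 1]])
            R = Rz @ Rx @ Ry
            M = np.eye(4)
            M[:3, :3] = scale * R      # uniform scale folded into the rotation block
            M[:3, 3] = [x, y, z]       # translation
            return M

        # Example: zoom in by 2x while translating and yawing the view.
        print(view_matrix(0.5, 0.0, -2.0, np.pi / 8, 0.0, 0.0, 2.0))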

    Leveraging Human-Machine Interactions for Computer Vision Dataset Quality Enhancement

    Large-scale datasets for single-label multi-class classification, such as ImageNet-1k, have been instrumental in advancing deep learning and computer vision. However, a critical and often understudied aspect is the comprehensive quality assessment of these datasets, especially regarding potential multi-label annotation errors. In this paper, we introduce a lightweight, user-friendly, and scalable framework that synergizes human and machine intelligence for efficient dataset validation and quality enhancement. We term this novel framework Multilabelfy. Central to Multilabelfy is an adaptable web-based platform that systematically guides annotators through the re-evaluation process, effectively leveraging human-machine interactions to enhance dataset quality. By using Multilabelfy on the ImageNetV2 dataset, we found that approximately 47.88% of the images contained at least two labels, underscoring the need for more rigorous assessments of such influential datasets. Furthermore, our analysis showed a negative correlation between the number of potential labels per image and model top-1 accuracy, illuminating a crucial factor in model evaluation and selection. Our open-source framework, Multilabelfy, offers a convenient, lightweight solution for dataset enhancement, emphasizing multi-label proportions. This study tackles major challenges in dataset integrity and provides key insights into model performance evaluation. Moreover, it underscores the advantages of integrating human expertise with machine capabilities to produce more robust models and trustworthy data development. The source code for Multilabelfy will be available at https://github.com/esla/Multilabelfy. Keywords: Computer Vision, Dataset Quality Enhancement, Dataset Validation, Human-Computer Interaction, Multi-label Annotation.
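    As a hedged illustration of the reported relationship (the arrays below are synthetic, not the paper's data), one could check how top-1 correctness varies with the number of potential labels per image roughly like this:

        # Illustrative sketch only, not the Multilabelfy codebase.
        import numpy as np

        rng = np.random.default_rng(0)
        label_counts = rng.integers(1, 6, size=1000)                     # potential labels per image (synthetic)
        top1_correct = (rng.random(1000) > 0.15 * label_counts).astype(float)  # synthetic correctness

        # Per-group accuracy: images with more plausible labels tend to score lower.
        for k in np.unique(label_counts):
            acc = top1_correct[label_counts == k].mean()
            print(f"{k} potential labels: top-1 accuracy = {acc:.3f}")

        # Pearson correlation between label count and correctness.
        r = np.corrcoef(label_counts, top1_correct)[0, 1]
        print(f"correlation: {r:.3f}")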

    Multi-touch For General-purpose Computing An Examination Of Text Entry

    In recent years, multi-touch has been heralded as a revolution in human-computer interaction. Multi-touch provides features such as gestural interaction, tangible interfaces, pen-based computing, and interface customization – features embraced by an increasingly tech-savvy public. However, multi-touch platforms have not been adopted as everyday computer interaction devices; that is, multi-touch has not been applied to general-purpose computing. The questions this thesis seeks to address are: Will the general public adopt these systems as their chief interaction paradigm? Can multi-touch provide such a compelling platform that it displaces the desktop mouse and keyboard? Is multi-touch truly the next revolution in human-computer interaction? As a first step toward answering these questions, we observe that general-purpose computing relies on text input, and ask: Can multi-touch, without a text entry peripheral, provide a platform for efficient text entry? And, by extension, is such a platform viable for general-purpose computing? We investigate these questions through four user studies that collected objective and subjective data for text entry and word processing tasks. The first of these studies establishes a benchmark for text entry performance on a multi-touch platform, across a variety of input modes. The second study attempts to improve this performance by examining an alternate input technique. The third and fourth studies include mouse-style interaction for formatting rich text on a multi-touch platform, in the context of a word processing task. These studies establish a foundation for future efforts in general-purpose computing on a multi-touch platform. Furthermore, this work details deficiencies in tactile feedback with modern multi-touch platforms, and describes an exploration of audible feedback. Finally, the thesis conveys a vision for a general-purpose multi-touch platform, its design, and its rationale.
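    For context on how text-entry benchmarks of this kind are commonly scored (a generic sketch, not the thesis's exact metrics), words-per-minute and a string-distance error rate can be computed as follows:

        # Generic text-entry metrics: WPM and a minimum-string-distance error rate.
        def levenshtein(a: str, b: str) -> int:
            prev = list(range(len(b) + 1))
            for i, ca in enumerate(a, 1):
                cur = [i]
                for j, cb in enumerate(b, 1):
                    cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (ca != cb)))
                prev = cur
            return prev[-1]

        def wpm(transcribed: str, seconds: float) -> float:
            # Standard convention: one "word" = 5 characters.
            return (len(transcribed) / 5) / (seconds / 60)

        presented = "the quick brown fox"
        transcribed = "the quick brwn fox"   # hypothetical transcription
        print(f"WPM: {wpm(transcribed, 12.0):.1f}")
        print(f"error rate: {levenshtein(presented, transcribed) / max(len(presented), len(transcribed)):.2%}")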

    Editor’s Note

    This special issue, “Artificial Intelligence and Social Application”, includes extended versions of selected papers from the Artificial Intelligence and Education area of the 13th edition of the Ibero-American Conference on Artificial Intelligence, held in Cartagena de Indias, Colombia, in November 2012. The issue thus includes five selected papers describing innovative research work in the Artificial Intelligence in Education area, covering, among others: Recommender Systems, Learning Objects, Intelligent Tutoring Systems, Multi-Agent Systems, Virtual Learning Environments, Case-Based Reasoning, and Classifier Algorithms. This issue also includes six papers in the Interactive Multimedia and Artificial Intelligence areas, dealing with subjects such as User Experience, E-Learning, Communication Tools, Multi-Agent Systems, and Grid Computing. IBERAMIA 2012 was the 13th edition of the Ibero-American Conference on Artificial Intelligence, a leading symposium where the Ibero-American AI community comes together to share research results and experiences with researchers in Artificial Intelligence from all over the world. The papers were organized in topical sections on knowledge representation and reasoning, information and knowledge processing, knowledge discovery and data mining, machine learning, bio-inspired computing, fuzzy systems, modelling and simulation, ambient intelligence, multi-agent systems, human-computer interaction, natural language processing, computer vision and robotics, planning and scheduling, AI in education, and knowledge engineering and applications.

    Natural User Interfaces for Human-Drone Multi-Modal Interaction

    Personal drones are becoming part of everyday life. To fully integrate them into society, it is crucial to design safe and intuitive ways to interact with these aerial systems. Recent advances in User-Centered Design (UCD) applied to Natural User Interfaces (NUIs) aim to make use of innate human abilities, such as speech, gestures and vision, to interact with technology the way humans would with one another. In this paper, a Graphical User Interface (GUI) and several NUI methods are studied and implemented, along with computer vision techniques, in a single software framework for aerial robotics called Aerostack, which allows for intuitive and natural human-quadrotor interaction in indoor GPS-denied environments. These strategies include speech, body position, hand gesture and visual marker interactions used to directly command tasks to the drone. The NUIs presented are based on devices such as the Leap Motion Controller, microphones and small monocular on-board cameras, which are unobtrusive to the user. Thanks to this UCD perspective, users can choose the most intuitive and effective type of interaction for their application. Additionally, the proposed strategies allow for multi-modal interaction between multiple users and the drone by integrating several of these interfaces in a single application, as shown in various real flight experiments performed with non-expert users.
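    As a rough sketch of the multi-modal idea (hypothetical code, not Aerostack's actual API), several NUI channels can feed a single dispatcher that maps them onto the same set of high-level drone commands:

        # Hypothetical multi-modal dispatcher: speech, gestures and markers all
        # produce the same high-level commands, so users pick their preferred modality.
        from dataclasses import dataclass
        from typing import Callable, Dict

        @dataclass
        class Command:
            name: str      # e.g. "take_off", "land", "follow_me"
            source: str    # which modality produced it

        class MultiModalDispatcher:
            def __init__(self) -> None:
                self.handlers: Dict[str, Callable[[Command], None]] = {}

            def register(self, name: str, handler: Callable[[Command], None]) -> None:
                self.handlers[name] = handler

            def dispatch(self, cmd: Command) -> None:
                handler = self.handlers.get(cmd.name)
                if handler is None:
                    print(f"ignoring unknown command {cmd.name!r} from {cmd.source}")
                    return
                handler(cmd)

        dispatcher = MultiModalDispatcher()
        dispatcher.register("take_off", lambda c: print(f"take off (via {c.source})"))
        dispatcher.register("land", lambda c: print(f"land (via {c.source})"))

        # The same command can arrive from any modality.
        dispatcher.dispatch(Command("take_off", "speech"))
        dispatcher.dispatch(Command("land", "hand_gesture"))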

    The Evolution of First Person Vision Methods: A Survey

    The emergence of new wearable technologies such as action cameras and smart glasses has increased the interest of computer vision scientists in the First Person perspective. Nowadays, this field is attracting the attention and investment of companies aiming to develop commercial devices with First Person Vision recording capabilities. Due to this interest, an increasing demand for methods to process these videos, possibly in real time, is expected. Current approaches present particular combinations of different image features and quantitative methods to accomplish specific objectives such as object detection, activity recognition, and user-machine interaction. This paper summarizes the evolution of the state of the art in First Person Vision video analysis between 1997 and 2014, highlighting, among others, the most commonly used features, methods, challenges and opportunities within the field.
    Comment: First Person Vision, Egocentric Vision, Wearable Devices, Smart Glasses, Computer Vision, Video Analytics, Human-Machine Interaction

    A real-time human-robot interaction system based on gestures for assistive scenarios

    Natural and intuitive human interaction with robotic systems is a key point in developing robots that assist people in an easy and effective way. In this paper, a Human-Robot Interaction (HRI) system able to recognize gestures usually employed in human non-verbal communication is introduced, and an in-depth study of its usability is performed. The system deals with dynamic gestures such as waving or nodding, which are recognized using a Dynamic Time Warping approach based on gesture-specific features computed from depth maps. A static gesture consisting of pointing at an object is also recognized. The pointed location is then estimated in order to detect candidate objects the user may be referring to. When the pointed object is unclear to the robot, a disambiguation procedure by means of either a verbal or gestural dialogue is performed. This skill allows the robot to pick up an object on behalf of a user who might have difficulty doing so themselves. The overall system, composed of NAO and Wifibot robots, a Kinect v2 sensor and two laptops, is first evaluated in a structured lab setup. Then, a broad set of user tests was completed, which allows correct performance to be assessed in terms of recognition rates, ease of use and response times.
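    A minimal Dynamic Time Warping sketch (illustrative only; the paper's feature extraction and implementation details differ) that compares an observed gesture's per-frame features against stored templates:

        # Dynamic Time Warping over per-frame gesture feature vectors.
        import numpy as np

        def dtw_distance(seq_a: np.ndarray, seq_b: np.ndarray) -> float:
            """seq_a, seq_b: (T, D) arrays of per-frame gesture features."""
            n, m = len(seq_a), len(seq_b)
            cost = np.full((n + 1, m + 1), np.inf)
            cost[0, 0] = 0.0
            for i in range(1, n + 1):
                for j in range(1, m + 1):
                    d = np.linalg.norm(seq_a[i - 1] - seq_b[j - 1])
                    cost[i, j] = d + min(cost[i - 1, j], cost[i, j - 1], cost[i - 1, j - 1])
            return cost[n, m]

        # Classify an observed gesture by its nearest template (synthetic data).
        rng = np.random.default_rng(1)
        templates = {"wave": rng.random((30, 3)), "nod": rng.random((25, 3))}
        observed = templates["wave"][::2] + 0.01 * rng.random((15, 3))   # a faster "wave"
        best = min(templates, key=lambda g: dtw_distance(observed, templates[g]))
        print(f"recognized gesture: {best}")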

    Vision systems with the human in the loop

    The emerging cognitive vision paradigm deals with vision systems that apply machine learning and automatic reasoning in order to learn from what they perceive. Cognitive vision systems can rate the relevance and consistency of newly acquired knowledge, and they can adapt to their environment and thus exhibit high robustness. This contribution presents vision systems that aim at flexibility and robustness. One is tailored for content-based image retrieval; the others are cognitive vision systems that constitute prototypes of visual active memories which evaluate, gather, and integrate contextual knowledge for visual analysis. All three systems are designed to interact with human users. After discussing adaptive content-based image retrieval and object and action recognition in an office environment, we raise the issue of assessing cognitive systems. Experiences from psychologically evaluated human-machine interactions are reported, and the promising potential of psychologically based usability experiments is stressed.
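    As a generic illustration of the content-based image retrieval loop mentioned above (not the systems described in this work), images can be ranked by distance between simple feature vectors, here a hypothetical color histogram:

        # Generic content-based image retrieval: feature extraction + nearest-neighbor ranking.
        import numpy as np

        def color_histogram(image: np.ndarray, bins: int = 8) -> np.ndarray:
            """image: (H, W, 3) uint8 array; returns a normalized per-channel histogram."""
            hist = np.concatenate([
                np.histogram(image[..., c], bins=bins, range=(0, 256))[0] for c in range(3)
            ]).astype(float)
            return hist / hist.sum()

        rng = np.random.default_rng(2)
        database = [rng.integers(0, 256, (64, 64, 3), dtype=np.uint8) for _ in range(5)]  # synthetic images
        query = database[3].copy()

        features = np.stack([color_histogram(img) for img in database])
        q = color_histogram(query)
        ranking = np.argsort(np.linalg.norm(features - q, axis=1))
        print("images ranked by similarity to the query:", ranking.tolist())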