
    Generic system for human-computer gesture interaction: applications on sign language recognition and robotic soccer refereeing

    Hand gestures are a powerful means of human communication, with many potential applications in human-computer interaction. Vision-based hand gesture recognition techniques have many proven advantages over traditional devices, giving users a simpler and more natural way to communicate with electronic devices. This work proposes a generic system architecture based on computer vision and machine learning that can be used with any interface for human-computer interaction. The proposed solution is composed mainly of three modules: a pre-processing and hand segmentation module, a static gesture interface module and a dynamic gesture interface module. The experiments showed that the core of vision-based interaction systems can be the same for all applications, which facilitates implementation. For hand posture recognition, an SVM (Support Vector Machine) model was trained and used, achieving a final accuracy of 99.4%. For dynamic gestures, an HMM (Hidden Markov Model) was trained for each gesture that the system could recognize, with a final average accuracy of 93.7%. The proposed solution has the advantage of being generic, with trained models able to work in real time, allowing its application in a wide range of human-machine applications. To validate the proposed framework, two applications were implemented. The first is a real-time system able to interpret the Portuguese Sign Language. The second is an online system able to help a robotic soccer referee judge a game in real time.
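The "one HMM per gesture" scheme mentioned in the abstract above is commonly realised by scoring an observation sequence against every model and picking the most likely one. The following is a minimal sketch of that idea, not the authors' implementation: the gesture names, the three-symbol observation alphabet and all probabilities are hypothetical toy values.

```python
import math

def log_likelihood(obs, start, trans, emit):
    """Scaled forward algorithm: log P(obs | HMM) for a discrete-observation HMM."""
    n = len(start)
    # Initialise with the first observation.
    alpha = [start[i] * emit[i][obs[0]] for i in range(n)]
    log_p = 0.0
    for t in range(len(obs)):
        if t > 0:
            # Propagate through the transition matrix, then weight by the emission.
            alpha = [
                sum(alpha[i] * trans[i][j] for i in range(n)) * emit[j][obs[t]]
                for j in range(n)
            ]
        scale = sum(alpha)
        if scale == 0.0:
            return float("-inf")  # sequence is impossible under this model
        log_p += math.log(scale)
        alpha = [a / scale for a in alpha]  # rescale to avoid underflow
    return log_p

def classify(obs, models):
    """Return the name of the model with the highest log-likelihood."""
    return max(models, key=lambda name: log_likelihood(obs, *models[name]))

# Hypothetical toy models: (start, transition, emission) over symbols 0, 1, 2.
LEFT_TO_RIGHT = [[0.7, 0.3], [0.0, 1.0]]
MODELS = {
    "circle": ([1.0, 0.0], LEFT_TO_RIGHT, [[0.8, 0.1, 0.1], [0.1, 0.8, 0.1]]),
    "wave":   ([1.0, 0.0], LEFT_TO_RIGHT, [[0.1, 0.1, 0.8], [0.1, 0.8, 0.1]]),
}

print(classify([0, 0, 1, 1], MODELS))
print(classify([2, 2, 1, 1], MODELS))
```

In a real system the observation symbols would come from quantised hand features and the model parameters from training; the sketch only shows the decision rule.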

    Hand gesture recognition system based in computer vision and machine learning

    "Lecture notes in computational vision and biomechanics series, ISSN 2212-9391, vol. 19". Hand gesture recognition is a natural way of human-computer interaction and an area of very active research in computer vision and machine learning. It has many possible applications, giving users a simpler and more natural way to communicate with robot and system interfaces without the need for extra devices. The primary goal of gesture recognition research applied to Human-Computer Interaction (HCI) is therefore to create systems that can identify specific human gestures and use them to convey information or control devices. To that end, vision-based hand gesture interfaces require fast and extremely robust hand detection and gesture recognition in real time. This paper presents a machine-learning-based solution for real-time gesture recognition that is generic enough to be applied in a wide range of human-computer interfaces. Experiments showed that the system achieved an accuracy of 99.4% for hand posture recognition and an average accuracy of 93.72% for dynamic gesture recognition. To validate the proposed framework, two applications were implemented. The first is a real-time system able to help a robotic soccer referee judge a game in real time. The prototype combines a vision-based hand gesture recognition system with a formal language definition, the Referee CommLang, into what is called the Referee Command Language Interface System (ReCLIS). The second is a real-time system able to interpret the Portuguese Sign Language. Sign languages are not standard and universal, and their grammars differ from country to country. Although the implemented prototype was trained to recognize only the vowels, it is easily extended to recognize the rest of the alphabet, making it a solid foundation for the development of any vision-based sign language recognition user interface system.

    Vision application of human robot interaction: Development of a ping pong playing robotic arm

    Robotics is a science that develops in parallel with human behavior. This work describes and implements techniques to mathematically model the game of ping pong as played by humans, and uses these methods in the design and development of a ping pong playing robotic arm as an application of robotic vision. Displaced frame difference (DFD) is used to segment the ball motion from background motion, and parametric calibration of a single CCD camera is used to track the ball in three dimensions. This visual information is temporally updated and then applied to guide a robot arm to hit the ball at a specified location in time. The results show that the system can be developed on the basis of single-camera tracking and that it works independently of the color of the ball. System latency is measured as a function of the camera interface, processor architecture, and robot motion. Various hardware and software parameters that influence the real-time system performance are also discussed.
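The displaced frame difference named in the abstract above compares each pixel of the current frame with the previous frame sampled at a motion-compensated position; pixels that still differ belong to something moving independently of the background. Below is a small illustrative sketch, assuming a single known integer background displacement `(dx, dy)` and grayscale frames stored as nested lists. The function names, the threshold and the toy frames are assumptions for illustration, not details taken from the paper.

```python
def displaced_frame_difference(prev, curr, dx, dy):
    """Per-pixel |curr(y, x) - prev(y - dy, x - dx)|; out-of-frame references give 0."""
    h, w = len(curr), len(curr[0])
    dfd = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            py, px = y - dy, x - dx
            if 0 <= py < h and 0 <= px < w:
                dfd[y][x] = abs(curr[y][x] - prev[py][px])
    return dfd

def segment(prev, curr, dx, dy, threshold):
    """Return (x, y) coordinates whose DFD exceeds the threshold."""
    dfd = displaced_frame_difference(prev, curr, dx, dy)
    return [(x, y) for y, row in enumerate(dfd)
            for x, value in enumerate(row) if value > threshold]

# Toy frames: a textured background that shifts one pixel to the right,
# plus a bright "ball" pixel that appears only in the current frame.
W, H = 5, 4
prev = [[10 * x for x in range(W)] for _ in range(H)]
curr = [[10 * (x - 1) if x > 0 else 0 for x in range(W)] for _ in range(H)]
curr[2][3] = 255

print(segment(prev, curr, dx=1, dy=0, threshold=50))   # compensated: ball only
print(len(segment(prev, curr, dx=0, dy=0, threshold=5)))  # uncompensated: background too
```

With the background displacement compensated, only the ball pixel survives the threshold; a plain frame difference (`dx = dy = 0`) also flags the shifted background texture.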

    AltURI: a thin middleware for simulated robot vision applications

    Fast software performance is often the focus when developing real-time vision-based control applications for robot simulators. In this paper we present a thin, high-performance middleware for USARSim and other simulators, designed for real-time vision-based control applications. It includes a fast image server providing images in OpenCV, Matlab or web formats and a simple command/sensor processor. The interface has been tested in USARSim with an Unmanned Aerial Vehicle using two control applications: landing using a reinforcement learning algorithm and altitude control using elementary motion detection. The middleware has been found to be fast enough to control the flying robot as well as very easy to set up and use.

    Virtual Skiing as an Art Installation

    The Virtual Skiing game allows the user to immerse himself in the skiing sensation without using any obvious hardware interfaces. To move down the virtual skiing slope, the skier, who stands on a pair of skis attached to the floor, performs the same movements as on real skis, in particular carving skis: tilting the body to the left initiates a left turn, tilting the body to the right initiates a right turn, and lowering the body increases the speed. The skier observes his progress down the virtual slope projected on the wall in front of him. The skier’s movements are recorded using a video camera placed in front of him and processed on a PC in real time to drive the projected animation of the virtual slope.

    Control y programación de un robot industrial con Microsoft Kinect

    Industrial robot programming relies on a suitable interface between a human operator and the robot hardware. This interface has evolved through the years to facilitate the task of programming, making it possible, for example, to position a robot in real time with a handheld unit, or to design ‘offline’ the layout and operation of a complex industrial process in the GUI of a computer application. The different approaches to robot programming aim at increasing efficiency without impairing the established capabilities of the programming environment, such as stability control, precision or safety. Some of these approaches have found their way into computer vision, and the latest generation of image sensors now boosts many applications, inside and outside the automation industry, featuring extended capabilities in a new low-cost market. The Kinect™ sensor from Microsoft® emerged in the market of video game consoles in 2010 as a webcam-style add-on peripheral for the Xbox 360™ console, enabling gamers to control and interact with it through a natural user interface using gestures and spoken commands. The project was nevertheless aimed at broadening the Xbox 360’s audience beyond its typical gamer base, and in 2011 Microsoft released the Kinect SDK for Windows® 7, allowing developers to write Kinect applications in C++/CLI, C# and Visual Basic .NET™. The aim of this thesis is the design, implementation, testing and documentation of software capable of identifying, tracking, locating and representing three objects in real time, based on their color characteristics, through the use of the Kinect sensor. Such a system is also the beginning of an interface for robot programming. The sensor is to be programmed in C++ using the Kinect for Windows SDK and the Desktop App UI for Windows. The OpenCV library is the tool for the image processing algorithms. For this thesis, the IDE selected for programming is Visual Studio 2012, running on a 32-bit OS.
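Locating an object from its color, as the thesis abstract above describes, is often done by masking the pixels close to a target color and taking the centroid of the mask. The sketch below illustrates that general idea in plain Python rather than the C++/OpenCV stack used in the thesis; the function name, the per-channel tolerance test and the toy image are assumptions made for illustration only.

```python
def color_centroid(image, target, tol):
    """Centroid (x, y) of pixels within `tol` of `target` on every RGB channel.

    `image` is a list of rows of (r, g, b) tuples. Returns None if nothing matches.
    """
    sum_x = sum_y = count = 0
    for y, row in enumerate(image):
        for x, pixel in enumerate(row):
            if all(abs(c - t) <= tol for c, t in zip(pixel, target)):
                sum_x += x
                sum_y += y
                count += 1
    if count == 0:
        return None
    return (sum_x / count, sum_y / count)

# Toy 4x4 image: black background, a 2x2 red block and a single blue pixel.
BLACK, RED, BLUE = (0, 0, 0), (250, 10, 10), (10, 10, 250)
image = [[BLACK] * 4 for _ in range(4)]
for x, y in [(1, 1), (2, 1), (1, 2), (2, 2)]:
    image[y][x] = RED
image[0][3] = BLUE

print(color_centroid(image, (255, 0, 0), tol=30))  # red block
print(color_centroid(image, (0, 0, 255), tol=30))  # blue pixel
print(color_centroid(image, (0, 255, 0), tol=30))  # no green object
```

Tracking several objects then amounts to running the same search once per target color on every frame; a depth sensor such as the Kinect could add the third coordinate at the returned pixel position.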

    A Consumer-tier based Visual-Brain Machine Interface for Augmented Reality Glasses Interactions

    Objective. Visual-Brain Machine Interface (V-BMI) has provided a novel interaction technique for the Augmented Reality (AR) industry. Several state-of-the-art works have demonstrated its high accuracy and real-time interaction capabilities. However, most studies employ EEG devices that are rigid and difficult to apply in real-life AR glasses application scenarios. Here we develop a consumer-tier Visual-Brain Machine Interface (V-BMI) system specialized for Augmented Reality (AR) glasses interactions. Approach. The developed system consists of wearable hardware that offers fast set-up, reliable recording and a comfortable wearing experience tailored to AR glasses applications. Complementing this hardware, we have devised a software framework that facilitates real-time interactions within the system while accommodating a modular configuration to enhance scalability. Main results. The developed hardware weighs only 110 g and measures 120 x 85 x 23 mm, with 1 TΩ and a peak-to-peak voltage below 1.5 µV. A V-BMI-based Angry Birds game and an Internet of Things (IoT) AR application were designed to demonstrate the technology's merits of intuitive experience and efficient interaction. The real-time interaction accuracy is between 85% and 96% on commercial AR glasses (DTI of 2.24 s and ITR of 65 bits/min). Significance. Our study indicates that the developed system can provide an essential hardware-software framework for consumer-based V-BMI AR glasses. We also derive several pivotal design factors for a consumer-grade V-BMI-based AR system: 1) dynamic adaptation of stimulation patterns and classification methods via computer vision algorithms is necessary for AR glasses applications; and 2) algorithmic localization fosters system stability and latency reduction. Comment: 15 pages, 10 figures

    Playing for Data: Ground Truth from Computer Games

    Recent progress in computer vision has been driven by high-capacity models trained on large datasets. Unfortunately, creating large datasets with pixel-level labels has been extremely costly due to the amount of human effort required. In this paper, we present an approach to rapidly creating pixel-accurate semantic label maps for images extracted from modern computer games. Although the source code and the internal operation of commercial games are inaccessible, we show that associations between image patches can be reconstructed from the communication between the game and the graphics hardware. This enables rapid propagation of semantic labels within and across images synthesized by the game, with no access to the source code or the content. We validate the presented approach by producing dense pixel-level semantic annotations for 25 thousand images synthesized by a photorealistic open-world computer game. Experiments on semantic segmentation datasets show that using the acquired data to supplement real-world images significantly increases accuracy and that the acquired data enables reducing the amount of hand-labeled real-world data: models trained with game data and just 1/3 of the CamVid training set outperform models trained on the complete CamVid training set. Comment: Accepted to the 14th European Conference on Computer Vision (ECCV 2016)