324,907 research outputs found

    Real-time human detection for video surveillance

    Recent research in computer vision has increasingly focused on building systems for observing humans and understanding their appearance, movements, and activities, providing advanced interfaces for interacting with humans, and creating realistic models of humans for various purposes (Ogale, 2006). For the last few decades, video analysis and understanding have been among the most active fields in computer vision and image analysis, with applications including video surveillance, object tracking, and traffic monitoring (Allili et al., 2007). The ability of computer vision to recognize humans in images, working much like the human eye, has led to its wide use in many applications, especially video analysis.

    On human motion prediction using recurrent neural networks

    Human motion modelling is a classical problem at the intersection of graphics and computer vision, with applications spanning human-computer interaction, motion synthesis, and motion prediction for virtual and augmented reality. Following the success of deep learning methods in several computer vision tasks, recent work has focused on using deep recurrent neural networks (RNNs) to model human motion, with the goal of learning time-dependent representations that perform tasks such as short-term motion prediction and long-term human motion synthesis. We examine recent work, with a focus on the evaluation methodologies commonly used in the literature, and show that, surprisingly, state-of-the-art performance can be achieved by a simple baseline that does not attempt to model motion at all. We investigate this result, and analyze recent RNN methods by looking at the architectures, loss functions, and training procedures used in state-of-the-art approaches. We propose three changes to the standard RNN models typically used for human motion, which result in a simple and scalable RNN architecture that obtains state-of-the-art performance on human motion prediction. (Accepted at CVPR 2017.)
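    The "simple baseline that does not attempt to model motion at all" is the zero-velocity baseline: predict that the body simply freezes, repeating the last observed pose for every future frame. A minimal sketch (function name and toy data are illustrative, not the paper's code):

```python
import numpy as np

def zero_velocity_baseline(observed, horizon):
    """Predict future poses by repeating the last observed frame.

    observed: array of shape (T, J) -- T frames of J joint angles.
    horizon:  number of future frames to predict.
    """
    last_frame = observed[-1]
    return np.tile(last_frame, (horizon, 1))

# Toy demo: 4 observed frames of 3 "joint angles".
seq = np.array([[0.0, 0.1, 0.2],
                [0.1, 0.2, 0.3],
                [0.2, 0.3, 0.4],
                [0.3, 0.4, 0.5]])
pred = zero_velocity_baseline(seq, horizon=2)
# pred repeats the last row [0.3, 0.4, 0.5] twice.
```

Because short-term human motion is smooth, this constant prediction is surprisingly hard to beat in mean-angle-error terms, which is the paper's point about evaluation methodology.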

    View-invariant action recognition

    Human action recognition is an important problem in computer vision. It has a wide range of applications in surveillance, human-computer interaction, augmented reality, video indexing, and retrieval. The varying pattern of spatio-temporal appearance generated by human action is key for identifying the performed action. Much research has explored these dynamics of spatio-temporal appearance for learning a visual representation of human actions. However, most of the research in action recognition is focused on some common viewpoints, and these approaches do not perform well when there is a change in viewpoint. Human actions are performed in a 3-dimensional environment and are projected to a 2-dimensional space when captured as a video from a given viewpoint. Therefore, an action will have a different spatio-temporal appearance from different viewpoints. The research in view-invariant action recognition addresses this problem and focuses on recognizing human actions from unseen viewpoints.
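    The viewpoint dependence described above can be illustrated numerically: rotating the camera before projecting a 3D pose to the image plane changes the 2D coordinates of the very same pose. A toy sketch using an orthographic projection (the pose and rotation are illustrative assumptions, not data from the work):

```python
import numpy as np

def project(points_3d, yaw):
    """Orthographically project 3D points after rotating the scene by `yaw`
    radians about the vertical (y) axis; returns (N, 2) image-plane coords."""
    c, s = np.cos(yaw), np.sin(yaw)
    rotation = np.array([[ c, 0.0,   s],
                         [0.0, 1.0, 0.0],
                         [ -s, 0.0,   c]])
    rotated = points_3d @ rotation.T
    return rotated[:, :2]  # drop the depth axis

# A toy "pose": three 3D joints.
pose = np.array([[0.0, 1.0, 0.0],
                 [0.5, 0.5, 0.2],
                 [0.0, 0.0, 0.0]])
front = project(pose, yaw=0.0)          # frontal view
side = project(pose, yaw=np.pi / 2)     # side view, 90 degrees away
# `front` and `side` differ: the same action looks different per viewpoint.
```

View-invariant methods aim to learn representations that map all such projections of one action to the same label.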

    FINE-GRAINED OBJECT DETECTION

    Object detection plays a vital role in many real-world computer vision applications such as self-driving cars, human-less stores, and general-purpose robotic systems. Convolutional Neural Network (CNN)-based deep learning has evolved to become the backbone of most computer vision algorithms, including object detection. Most of the research has focused on detecting objects that differ significantly, e.g. a car, a person, and a bird. The next step beyond general object detection is fine-grained object detection: distinguishing different types within one class of objects. Fine-grained object detection is crucial to tasks like automated retail checkout. This research has developed deep learning models to detect 200 types of birds of similar size and shape. The models were trained and tested on the CUB-200-2011 dataset. To the best of our knowledge, by attaining a mean Average Precision (mAP) of 71.5% we achieved an improvement of 5 percentage points over the previous best mAP of 66.2%.
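    The mAP figures quoted above are unweighted means of per-class Average Precision. One common definition is the PASCAL VOC 11-point interpolated AP (the thesis's exact evaluation protocol may differ); a minimal sketch:

```python
import numpy as np

def average_precision(recalls, precisions):
    """11-point interpolated AP (PASCAL VOC style): average, over 11 evenly
    spaced recall thresholds, the maximum precision attained at or beyond
    each threshold."""
    ap = 0.0
    for t in np.linspace(0.0, 1.0, 11):
        mask = recalls >= t
        p = precisions[mask].max() if mask.any() else 0.0
        ap += p / 11.0
    return ap

def mean_average_precision(per_class_ap):
    """mAP is the unweighted mean of per-class APs (e.g. 200 bird classes)."""
    return sum(per_class_ap) / len(per_class_ap)

# Toy precision/recall curve for one class.
ap = average_precision(np.array([0.2, 0.5, 1.0]),
                       np.array([1.0, 0.8, 0.6]))   # ~0.764
map_two_classes = mean_average_precision([0.70, 0.80])  # 0.75
```

An improvement from 66.2% to 71.5% mAP is thus a gain of 5.3 points averaged over all 200 classes.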

    Gesture-based interfaces for INTEROB: interacting with information and robotics systems

    We discuss in this paper several implementations of computer vision applications that were developed over the last two years in our laboratory and for which gesture-based interactions were introduced. The aim is to provide enhanced human-computer interfaces for several commonly encountered application scenarios: manipulating virtual objects and working inside virtual environments, playing computer games, and interacting with robotic systems. We particularly focused on table-based systems, which allow natural and intuitive interactions by turning a familiar tabletop into a comfortable interface.

    Learn to automate GUI tasks from demonstration

    This thesis explores and extends Computer Vision applications in the context of Graphical User Interface (GUI) environments to address the challenges of Programming by Demonstration (PbD). It examines challenges in PbD that can be addressed through innovations in Computer Vision when GUIs are treated as an application domain, analogous to automotive or factory settings. Existing PbD systems were restricted to particular domain applications or special application interfaces. Although they use the term Demonstration, these systems did not actually see what the user performs. Rather, they listen to the demonstrations through internal communication with the operating system. Machine Vision and Human-in-the-Loop Machine Learning are used to circumvent many restrictions, allowing the PbD system to watch the demonstration as another human observer would. This thesis will demonstrate that our prototype PbD systems allow non-programmer users to easily create their own automation scripts for their repetitive and looping tasks. Our PbD systems take their input from sequences of screenshots, and sometimes from easily available keyboard and mouse sniffer software. It will also be shown that the problem of inconsistent human demonstration can be remedied with our proposed Human-in-the-Loop Computer Vision techniques. Lastly, the problem is extended to learning from demonstration videos. Due to the sheer complexity of computer desktop GUI manipulation videos, attention is focused on the domain of video game environments. The initial studies illustrate that it is possible to teach a computer to watch gameplay videos and to estimate what buttons the user pressed.
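    A PbD system that "watches" screenshots must locate GUI elements visually rather than through operating-system hooks. Template matching is one standard way to do this; the sketch below (a brute-force sum-of-squared-differences search on toy arrays) is illustrative only and not the thesis's actual pipeline:

```python
import numpy as np

def locate_template(screenshot, template):
    """Return the (row, col) where `template` best matches inside
    `screenshot`, by exhaustive sum-of-squared-differences search over
    grayscale 2D arrays."""
    H, W = screenshot.shape
    h, w = template.shape
    best, best_pos = np.inf, (0, 0)
    for r in range(H - h + 1):
        for c in range(W - w + 1):
            ssd = np.sum((screenshot[r:r + h, c:c + w] - template) ** 2)
            if ssd < best:
                best, best_pos = ssd, (r, c)
    return best_pos

# Toy "screenshot" with a bright 2x2 "button" at row 3, column 4.
screen = np.zeros((8, 10))
screen[3:5, 4:6] = 1.0
button = np.ones((2, 2))
found = locate_template(screen, button)  # (3, 4)
```

Once an element is located, the recorded mouse and keyboard events can be re-anchored to it, which is what lets a visual PbD script survive windows moving between demonstrations.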

    Recognising Complex Mental States from Naturalistic Human-Computer Interactions

    New advances in computer vision techniques will revolutionize the way we interact with computers, as they, together with other improvements, will help us build machines that understand us better. The face is the main non-verbal channel for human-human communication and contains valuable information about emotion, mood, and mental state. Affective computing researchers have widely investigated how facial expressions can be used for automatically recognizing affect and mental states. Nowadays, physiological signals can be measured by video-based techniques, which can also be utilised for emotion detection. Physiological signals are an important indicator of internal feelings, and are more robust against social masking. This thesis focuses on computer vision techniques to detect facial expression and physiological changes for recognizing non-basic and natural emotions during human-computer interaction. It covers all stages of the research process, from data acquisition and integration to application. Most previous studies focused on acquiring data from prototypic basic emotions acted out under laboratory conditions. To evaluate the proposed method under more practical conditions, two different scenarios were used for data collection. In the first scenario, a set of controlled stimuli was used to trigger the user’s emotion. The second scenario aimed at capturing more naturalistic emotions that might occur during a writing activity; there, the engagement level of the participants, alongside other affective states, was the target of the system. For the first time, this thesis explores how video-based physiological measures can be used in affect detection. Video-based measuring of physiological signals is a new technique that needs further improvement before it can be used in practical applications. A machine learning approach is proposed and evaluated to improve the accuracy of heart rate (HR) measurement using an ordinary camera during a naturalistic interaction with a computer.
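    The classical, non-learning baseline for camera-based HR measurement picks the dominant frequency of the skin-colour signal within a physiologically plausible band; the machine learning approach above aims to improve on this kind of estimate. A hedged sketch with a synthetic signal and illustrative parameters:

```python
import numpy as np

def estimate_hr(signal, fps):
    """Estimate heart rate (beats/min) as the dominant frequency of a
    photoplethysmographic trace, restricted to the 0.7-4.0 Hz band
    (42-240 bpm)."""
    signal = signal - np.mean(signal)           # remove the DC component
    freqs = np.fft.rfftfreq(len(signal), d=1.0 / fps)
    power = np.abs(np.fft.rfft(signal)) ** 2
    band = (freqs >= 0.7) & (freqs <= 4.0)
    peak = freqs[band][np.argmax(power[band])]  # strongest in-band frequency
    return peak * 60.0

# Synthetic 10 s green-channel trace at 30 fps with a 1.2 Hz (72 bpm) pulse.
fps = 30
t = np.arange(0, 10, 1.0 / fps)
trace = 0.05 * np.sin(2 * np.pi * 1.2 * t) + 0.5
hr = estimate_hr(trace, fps)  # ~72 bpm
```

Real camera traces are far noisier (motion, lighting, compression), which is precisely the gap a learned correction model is meant to close.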
