
    The Evolution of First Person Vision Methods: A Survey

    The emergence of new wearable technologies such as action cameras and smart glasses has increased the interest of computer vision scientists in the First Person perspective. Nowadays, this field is attracting the attention and investment of companies aiming to develop commercial devices with First Person Vision recording capabilities. Due to this interest, an increasing demand for methods to process these videos, possibly in real time, is expected. Current approaches present particular combinations of different image features and quantitative methods to accomplish specific objectives such as object detection, activity recognition and user-machine interaction. This paper summarizes the evolution of the state of the art in First Person Vision video analysis between 1997 and 2014, highlighting, among others, the most commonly used features, methods, challenges and opportunities within the field. Comment: First Person Vision, Egocentric Vision, Wearable Devices, Smart Glasses, Computer Vision, Video Analytics, Human-machine Interaction

    Multi-sensor human action recognition with particular application to tennis event-based indexing

    The ability to automatically classify human actions and activities using visual sensors or by analysing body-worn sensor data has been an active research area for many years. Only recently, with advancements in both fields and the ubiquitous nature of low-cost sensors in our everyday lives, has automatic human action recognition become a reality. While traditional sports coaching systems rely on manual indexing of events from a single modality, such as visual or inertial sensors, this thesis investigates the possibility of capturing and automatically indexing events from multimodal sensor streams. In this work, we detail a novel approach to infer human actions by fusing multimodal sensors to improve recognition accuracy. State-of-the-art visual action recognition approaches are also investigated. Firstly, we apply these action recognition detectors to basic human actions in a non-sporting context. We then perform action recognition to infer tennis events in a tennis court instrumented with cameras and inertial sensing infrastructure. The system proposed in this thesis can use either visual or inertial sensors to automatically recognise the main tennis events during play. A complete event retrieval system is also presented to allow coaches to build advanced queries, which existing sports coaching solutions cannot facilitate, without an inordinate amount of manual indexing. The event retrieval interface is evaluated against a leading commercial sports coaching tool in terms of both usability and efficiency.
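
    The abstract does not spell out the fusion rule, so the sketch below is a minimal, hypothetical late-fusion scheme: each modality (video and inertial) independently produces per-class scores for a tennis event window, and the fused prediction is their weighted average. The event names, weights and scores are illustrative only.

```python
import numpy as np

# Hypothetical late fusion of per-class probabilities from two modalities.
EVENTS = ["serve", "forehand", "backhand", "non-event"]

def fuse_scores(visual_probs, inertial_probs, w_visual=0.6, w_inertial=0.4):
    """Weighted average of per-class scores, renormalised to a distribution."""
    visual_probs = np.asarray(visual_probs, dtype=float)
    inertial_probs = np.asarray(inertial_probs, dtype=float)
    fused = w_visual * visual_probs + w_inertial * inertial_probs
    return fused / fused.sum()

visual = [0.70, 0.15, 0.10, 0.05]    # e.g. from a video action detector
inertial = [0.40, 0.35, 0.15, 0.10]  # e.g. from an inertial-sensor classifier
fused = fuse_scores(visual, inertial)
print(EVENTS[int(np.argmax(fused))], fused.round(3))
```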

    Multi-set canonical correlation analysis for 3D abnormal gait behaviour recognition based on virtual sample generation

    Small sample datasets and two-dimensional (2D) approaches are challenges for vision-based abnormal gait behaviour recognition (AGBR). The lack of three-dimensional (3D) structure of the human body causes 2D-based methods to be limited in abnormal gait virtual sample generation (VSG). In this paper, 3D AGBR based on VSG and multi-set canonical correlation analysis (3D-AGRBMCCA) is proposed. First, the unstructured point cloud data of gait are obtained using a structured light sensor. A 3D parametric body model is then deformed to fit the point cloud data, both in shape and posture. The features of the point cloud data are then converted to a high-level structured representation of the body. The parametric body model is used for VSG based on the estimated body pose and shape data. Symmetry virtual samples, pose-perturbation virtual samples and various body-shape virtual samples with multiple views are generated to extend the training samples. The spatial-temporal features of the abnormal gait behaviour from different views, body pose and shape parameters are then extracted by a convolutional neural network (CNN)-based Long Short-Term Memory (LSTM) network. These are projected onto a uniform pattern space using deep-learning-based multi-set canonical correlation analysis. Experiments on four publicly available datasets show that the proposed system performs well under various conditions.
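
    As a rough illustration of the CNN-based LSTM feature extractor mentioned above (not the authors' exact architecture), the PyTorch sketch below applies a small per-frame CNN to one view's gait sequence and runs an LSTM over time; the final hidden state is the spatial-temporal feature vector that the multi-set CCA stage would then project into the common pattern space. All layer sizes and the input resolution are assumptions.

```python
import torch
import torch.nn as nn

class CNNLSTMGaitFeatures(nn.Module):
    """Per-frame CNN features pooled over space, then an LSTM over time."""
    def __init__(self, feat_dim=128, hidden_dim=64):
        super().__init__()
        self.cnn = nn.Sequential(
            nn.Conv2d(3, 32, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2),
            nn.Conv2d(32, feat_dim, kernel_size=3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),          # -> (B*T, feat_dim, 1, 1)
        )
        self.lstm = nn.LSTM(feat_dim, hidden_dim, batch_first=True)

    def forward(self, frames):                # frames: (B, T, 3, H, W)
        b, t, c, h, w = frames.shape
        x = self.cnn(frames.view(b * t, c, h, w)).view(b, t, -1)
        _, (h_n, _) = self.lstm(x)
        return h_n[-1]                        # (B, hidden_dim) feature vector

# Two synthetic gait clips of 10 RGB frames at 64x64 stand in for one view.
feats = CNNLSTMGaitFeatures()(torch.randn(2, 10, 3, 64, 64))
print(feats.shape)                            # torch.Size([2, 64])
```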

    Deep Learning Based Abnormal Gait Classification System Study with Heterogeneous Sensor Network

    Gait is one of the important biological characteristics of the human body. Abnormal gait is mostly related to the lesion site and has been demonstrated to play a guiding role in clinical research such as medical diagnosis and disease prevention. In order to promote research on automatic gait pattern recognition, this paper introduces the research status of abnormal gait recognition and systematically analyses common gait recognition technologies. On this basis, two gait information extraction methods, sensor-based and vision-based, are studied, including wearable system design and deep neural network-based algorithm design. In the sensor-based study, we propose a lower limb data acquisition system. The experiment was designed to collect acceleration signals and sEMG signals under normal and pathological gaits. Specifically, wearable hardware based on the MSP430 and host-computer software based on LabVIEW were designed. The hardware system consists of an sEMG foot ring, a high-precision IMU and a pressure-sensitive intelligent insole. Data from 15 healthy persons and 15 hemiplegic patients during walking were collected. Gait classification was carried out based on sEMG, and the average accuracy reached 92.8% for a CNN. For the IMU signals, five kinds of abnormal gait were classified using three models: BPNN, LSTM and CNN. The experimental results show that the system combined with these neural networks can classify different pathological gaits well, and the average accuracy of the six-class task reached 93%. In the vision-based research, human keypoint detection is used: the precise locations of the keypoints are obtained through the fusion of heatmaps and offsets, from which the spatio-temporal information of the keypoints is extracted. However, the results show that even state-of-the-art keypoint detection is not yet good enough to replace IMUs for gait analysis and classification. Encouragingly, the gait rhythm can be observed within 2 m, which shows that the spatio-temporal information of the extracted keypoints is highly correlated with the acceleration information collected by the IMU, paving the way for a vision-based abnormal gait classification algorithm.
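
    For the sensor-based branch, the sketch below shows a minimal 1D CNN over fixed-length sensor windows (e.g. tri-axial acceleration or sEMG envelopes), in the spirit of the classifiers described above. The channel count, window length, layer sizes and the six gait classes are assumptions for illustration, not the thesis' exact network.

```python
import torch
import torch.nn as nn

class Gait1DCNN(nn.Module):
    """Minimal 1D CNN for classifying fixed-length sensor windows."""
    def __init__(self, in_channels=3, n_classes=6):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv1d(in_channels, 16, kernel_size=5, padding=2), nn.ReLU(),
            nn.MaxPool1d(2),
            nn.Conv1d(16, 32, kernel_size=5, padding=2), nn.ReLU(),
            nn.AdaptiveAvgPool1d(1),          # global pooling over time
        )
        self.classifier = nn.Linear(32, n_classes)

    def forward(self, x):                     # x: (batch, channels, window)
        return self.classifier(self.features(x).squeeze(-1))

model = Gait1DCNN()
logits = model(torch.randn(4, 3, 128))        # four windows of 128 samples
print(logits.argmax(dim=1))                   # predicted gait class per window
```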

    Video Based Handwritten Characters Recognition


    Automated Tracking of Hand Hygiene Stages

    The European Centre for Disease Prevention and Control (ECDC) estimates that 2.5 million cases of Hospital Acquired Infections (HAIs) occur each year in the European Union. Hand hygiene is regarded as one of the most important preventive measures for HAIs. If it is implemented properly, hand hygiene can reduce the risk of cross-transmission of an infection in the healthcare environment. Good hand hygiene is not only important for healthcare settings. The recent ongoing coronavirus pandemic has highlighted the importance of hand hygiene practices in our daily lives, with governments and health authorities around the world promoting good hand hygiene practices. The WHO has published guidelines of hand hygiene stages to promote good hand washing practices. A significant amount of existing research has focused on the problem of tracking hands to enable hand gesture recognition. In this work, gesture tracking devices and image processing are explored in the context of the hand washing environment. Hand washing videos of professional healthcare workers were carefully observed and analyzed in order to recognize hand features associated with hand hygiene stages that could be extracted automatically. Selected hand features such as palm shape (flat or curved); palm orientation (palms facing or not); and hand trajectory (linear or circular movement) were then extracted and tracked with the help of a 3D gesture tracking device, the Leap Motion Controller. These features were further coupled together to detect the execution of a required WHO hand hygiene stage, "Rub hands palm to palm", with the help of the Leap sensor in real time. In certain conditions, the Leap Motion Controller enables a clear distinction to be made between the left and right hands. However, whenever the two hands came into contact with each other, sensor data from the Leap, such as palm position and palm orientation, was lost for one of the two hands. Hand occlusion was found to be a major drawback with the application of the device to this use case. Therefore, RGB digital cameras were selected for further processing and tracking of the hands. An image processing technique, using a skin detection algorithm, was applied to extract instantaneous hand positions for further processing, to enable various hand hygiene poses to be detected. Contour and centroid detection algorithms were further applied to track the hand trajectory in hand hygiene video recordings. In addition, feature detection algorithms were applied to a hand hygiene pose to extract the useful hand features. The video recordings did not suffer from occlusion as is the case for the Leap sensor, but the segmentation of one hand from another was identified as a major challenge with images because the contour detection resulted in a continuous mass when the two hands were in contact. For future work, the data from gesture trackers, such as the Leap Motion Controller, and cameras (with image processing) could be combined to make a robust hand hygiene gesture classification system.
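
    The skin-detection, contour and centroid steps described above can be sketched with standard OpenCV calls, as below. The YCrCb skin thresholds and minimum blob area are common illustrative values rather than the calibrated ones used in this work, a synthetic frame stands in for a hand-washing video frame, and the OpenCV 4 findContours signature is assumed.

```python
import cv2
import numpy as np

# Illustrative YCrCb skin range; a real deployment would calibrate these.
SKIN_LOW = np.array([0, 133, 77], dtype=np.uint8)
SKIN_HIGH = np.array([255, 173, 127], dtype=np.uint8)

def hand_centroids(frame_bgr, min_area=500):
    """Return (cx, cy) centroids of skin-coloured blobs in a BGR frame."""
    ycrcb = cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2YCrCb)
    mask = cv2.inRange(ycrcb, SKIN_LOW, SKIN_HIGH)
    mask = cv2.morphologyEx(mask, cv2.MORPH_OPEN, np.ones((5, 5), np.uint8))
    contours, _ = cv2.findContours(mask, cv2.RETR_EXTERNAL,
                                   cv2.CHAIN_APPROX_SIMPLE)
    centroids = []
    for c in contours:
        if cv2.contourArea(c) < min_area:     # ignore small noise blobs
            continue
        m = cv2.moments(c)
        centroids.append((int(m["m10"] / m["m00"]), int(m["m01"] / m["m00"])))
    return centroids

# Synthetic stand-in frame with one skin-like blob.
frame = np.zeros((480, 640, 3), dtype=np.uint8)
cv2.circle(frame, (200, 240), 60, (80, 120, 200), -1)
print(hand_centroids(frame))                  # e.g. [(200, 240)]
```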

    Development of a text reading system on video images

    Since the early days of computer science, researchers have sought to devise a machine which could automatically read text to help people with visual impairments. The problem of extracting and recognising text on document images has been largely resolved, but reading text from images of natural scenes remains a challenge. Scene text can present uneven lighting, complex backgrounds or perspective and lens distortion; it usually appears as short sentences or isolated words and shows a very diverse set of typefaces. However, video sequences of natural scenes provide a temporal redundancy that can be exploited to compensate for some of these deficiencies. Here we present a complete end-to-end, real-time scene text reading system on video images based on perspective-aware text tracking. The main contribution of this work is a system that automatically detects, recognises and tracks text in videos of natural scenes in real time. The focus of our method is on large text found in outdoor environments, such as shop signs, street names and billboards. We introduce novel efficient techniques for text detection, text aggregation and text perspective estimation. Furthermore, we propose using a set of Unscented Kalman Filters (UKF) to maintain each text region's identity and to continuously track the homography transformation of the text into a fronto-parallel view, thereby being resilient to erratic camera motion and wide baseline changes in orientation. The orientation of each text line is estimated using a method that relies on the geometry of the characters themselves to estimate a rectifying homography. This is done irrespective of the view of the text over a large range of orientations. We also demonstrate a wearable head-mounted device for text reading that encases a camera for image acquisition and a pair of headphones for synthesized speech output. Our system is designed for continuous and unsupervised operation over long periods of time. It is completely automatic and features quick failure recovery and interactive text reading. It is also highly parallelised in order to maximize the usage of available processing power and to achieve real-time operation. We show comparative results that improve the current state of the art when correcting perspective deformation of scene text. The end-to-end system performance is demonstrated on sequences recorded in outdoor scenarios. Finally, we also release a dataset of text tracking videos along with the annotated ground truth of text regions.
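
    The UKF-based tracking itself is more involved, but the fronto-parallel rectification it maintains can be illustrated with a plain homography warp: given the four corners of a tracked text region, map them to an axis-aligned rectangle before recognition. The corner coordinates and output patch size below are placeholders.

```python
import cv2
import numpy as np

def rectify_text_region(frame, corners, out_w=320, out_h=64):
    """Warp a text quadrilateral (TL, TR, BR, BL) to a fronto-parallel patch."""
    src = np.asarray(corners, dtype=np.float32)
    dst = np.array([[0, 0], [out_w - 1, 0],
                    [out_w - 1, out_h - 1], [0, out_h - 1]], dtype=np.float32)
    H = cv2.getPerspectiveTransform(src, dst)   # 3x3 rectifying homography
    return cv2.warpPerspective(frame, H, (out_w, out_h))

frame = np.random.randint(0, 255, (480, 640, 3), dtype=np.uint8)
quad = [(100, 120), (420, 90), (430, 160), (110, 200)]   # skewed text sign
patch = rectify_text_region(frame, quad)
print(patch.shape)                              # (64, 320, 3)
```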

    Feature Space Augmentation: Improving Prediction Accuracy of Classical Problems in Cognitive Science and Computer Vision

    The prediction accuracy in many classical problems across multiple domains has seen a rise since computational tools such as multi-layer neural nets and complex machine learning algorithms have become widely accessible to the research community. In this research, we take a step back and examine the feature space in two problems from very different domains. We show that novel augmentation of the feature space yields higher performance. Emotion Recognition in Adults from a Control Group: The objective is to quantify the emotional state of an individual at any time using data collected by wearable sensors. We define emotional state as a mixture of amusement, anger, disgust, fear, sadness, anxiety and neutral, and their respective levels at any time. The generated model predicts an individual's dominant state and generates an emotional spectrum, a 1x7 vector indicating the level of each emotional state. We present an iterative learning framework that alters the feature space uniquely to an individual's emotion perception and predicts the emotional state using the individual-specific feature space. Hybrid Feature Space for Image Classification: The objective is to improve the accuracy of existing image recognition by leveraging text features from the images. As humans, we perceive objects using colors, dimensions, geometry and any textual information we can gather. Current image recognition algorithms rely exclusively on the first three and do not use the textual information. This study develops and tests an approach that trains a classifier on a hybrid text-based feature space that has comparable accuracy to state-of-the-art CNNs while being significantly less expensive computationally. Moreover, when combined with CNNs, the approach yields a statistically significant boost in accuracy. Both models are validated using cross-validation and holdout validation, and are evaluated against the state of the art.
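
    A minimal sketch of the hybrid feature space idea, assuming the image branch yields CNN embeddings and the text branch yields, say, TF-IDF vectors over words read from the image: the two are concatenated and a single linear classifier is trained and cross-validated on the joint vector. The feature dimensions and synthetic data below are placeholders, not the study's actual features.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
n, d_img, d_txt = 200, 64, 32
image_feats = rng.normal(size=(n, d_img))      # stand-in for CNN embeddings
text_feats = rng.normal(size=(n, d_txt))       # stand-in for text (TF-IDF) features
labels = rng.integers(0, 2, size=n)            # synthetic binary labels

hybrid = np.hstack([image_feats, text_feats])  # the augmented (hybrid) feature space
clf = LogisticRegression(max_iter=1000)
scores = cross_val_score(clf, hybrid, labels, cv=5)
print("5-fold accuracy on the hybrid space:", scores.mean().round(3))
```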