
    Video based detection of driver fatigue

    This thesis addresses the problem of drowsy-driver detection using computer vision techniques applied to the human face. Specifically, we explore the possibility of discriminating drowsy from alert video segments using facial expressions automatically extracted from video. Several approaches have previously been proposed for the detection and prediction of drowsiness, and computer vision approaches have recently attracted increasing interest because of their non-invasive nature. Previous vision-based studies detect driver drowsiness primarily by making prior assumptions about the relevant behavior, focusing on blink rate, eye closure, and yawning. Here we employ machine learning to explore, understand, and exploit actual human behavior during drowsiness episodes. We collected two datasets including facial and head-movement measures. Head motion is captured by an accelerometer in the first dataset (UYAN-1) and by an automatic video-based head-pose detector in the second (UYAN-2). We use the outputs of automatic classifiers for the facial action coding system (FACS) to detect drowsiness. These facial actions include blinking and yawning, as well as a number of other facial movements. The measures are passed to a learning-based classifier based on multinomial logistic regression (MLR). On UYAN-1 the system predicts sleep and crash episodes during a driving computer game with an area of 0.98 under the receiver operating characteristic (ROC) curve in across-subjects tests, the highest prediction rate reported to date for detecting real drowsiness. Moreover, the analysis reveals new information about human facial behavior during drowsy driving. On UYAN-2, fine discrimination of drowsy states is explored on a separate dataset: we study the degree to which individual facial action units can discriminate between moderately and acutely drowsy states.
    Signal processing techniques and machine learning methods are employed to build a person-independent acute-drowsiness detection system. Temporal dynamics are captured using a bank of temporal filters, and the predictive power of individual action units is explored with an MLR-based classifier. The five best-performing action units are determined for a person-independent system. Using their combined features, the system obtains an area of 0.96 under the ROC curve on this more challenging dataset. Moreover, the analysis reveals new markers for different levels of drowsiness.
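    The pipeline described above can be sketched as follows. This is a minimal illustration, not the thesis's actual system: the filter bank (difference-of-moving-averages at three window sizes), the synthetic blink traces, and all parameter values are assumptions made for clarity; only the overall recipe (temporal filters over action-unit signals, then multinomial logistic regression) comes from the abstract.

    ```python
    import numpy as np

    def filter_bank(signal, windows=(5, 15, 45)):
        """Summarise the temporal dynamics of one action-unit trace at several scales."""
        feats = []
        for w in windows:
            smoothed = np.convolve(signal, np.ones(w) / w, mode="same")
            feats.append(smoothed.mean())            # slow component level
            feats.append((signal - smoothed).std())  # fast residual energy
        feats.append(1.0)                            # bias term
        return np.array(feats)

    def softmax(z):
        z = z - z.max(axis=1, keepdims=True)
        e = np.exp(z)
        return e / e.sum(axis=1, keepdims=True)

    def train_mlr(X, y, n_classes, lr=0.5, epochs=500):
        """Multinomial logistic regression fit by batch gradient descent."""
        W = np.zeros((X.shape[1], n_classes))
        Y = np.eye(n_classes)[y]
        for _ in range(epochs):
            W -= lr * X.T @ (softmax(X @ W) - Y) / len(y)
        return W

    rng = np.random.default_rng(0)
    X, y = [], []
    # Synthetic blink traces: "alert" = frequent short blinks, "drowsy" = long slow closures.
    for label, freq in [(0, 0.8), (1, 0.1)]:
        for _ in range(40):
            t = np.arange(200)
            trace = (np.sin(freq * t + rng.uniform(0, 6)) > 0.9).astype(float)
            X.append(filter_bank(trace + 0.05 * rng.standard_normal(200)))
            y.append(label)
    X, y = np.array(X), np.array(y)
    W = train_mlr(X, y, n_classes=2)
    acc = float((softmax(X @ W).argmax(axis=1) == y).mean())
    print(f"training accuracy: {acc:.2f}")
    ```

    The residual-energy features at short windows are what separate the two synthetic classes here; the thesis's actual features are FACS action-unit classifier outputs.
    
    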

    Design and modeling of a stair climber smart mobile robot (MSRox)


    Extraction and representation of semantic information in digital media


    Visual scene recognition with biologically relevant generative models

    This research focuses on developing visual object categorization methodologies based on machine learning techniques and biologically inspired generative models of visual scene recognition. Modelling the statistical variability of visual patterns, in the space of features extracted from them by an appropriate low-level signal processing technique, is an important matter of investigation for both humans and machines. To study this problem, we examined in detail two recent probabilistic models of vision: a simple multivariate Gaussian model as suggested by Karklin & Lewicki (2009) and a restricted Boltzmann machine (RBM) proposed by Hinton (2002). Both models have been widely used for visual object classification and scene analysis tasks. This research highlights that these models on their own are not sufficient to perform the classification task, and proposes the Fisher kernel as a means of inducing discriminative power into them. Our empirical results on standard benchmark datasets reveal that the classification performance of these generative models can be boosted to near state-of-the-art performance by deriving a Fisher kernel from compact generative models, computing the data labels in a fraction of the total computation time. We compare the proposed technique with other distance-based and kernel-based classifiers to show how computationally efficient Fisher kernels are. To the best of our knowledge, a Fisher kernel has not been derived from an RBM before, so the work presented in this thesis is novel in both its idea and its application to vision problems.
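    The Fisher-kernel recipe underlying this approach can be illustrated with the simplest generative model mentioned, a single Gaussian: take the gradient of the log-likelihood with respect to the model parameters as a feature map, and use the inner product of those gradients as a kernel for any discriminative classifier. The RBM variant in the thesis follows the same recipe with gradients of the RBM's log-likelihood; the one-dimensional model, the data values, and the identity approximation of the Fisher information below are assumptions made for a compact sketch.

    ```python
    import numpy as np

    def fisher_score(x, mu, sigma):
        """Gradient of log N(x | mu, sigma^2) with respect to (mu, sigma)."""
        return np.array([(x - mu) / sigma**2,
                         ((x - mu)**2 - sigma**2) / sigma**3])

    def fisher_kernel(x, y, mu, sigma):
        """Inner product of Fisher scores (identity used in place of the Fisher information)."""
        return float(fisher_score(x, mu, sigma) @ fisher_score(y, mu, sigma))

    # Fit the generative model (maximum-likelihood Gaussian) on pooled data;
    # the resulting kernel matrix can feed any kernel classifier (e.g. an SVM).
    data = np.array([1.2, 0.8, 1.1, 0.9, 1.5, 0.5])
    mu, sigma = data.mean(), data.std()
    K = np.array([[fisher_kernel(a, b, mu, sigma) for b in data] for a in data])
    print("symmetric kernel matrix:", np.allclose(K, K.T))
    ```

    Because each Fisher vector is computed with a single pass through the fitted model, the feature map is cheap relative to refitting a discriminative model, which is the efficiency argument the abstract makes.
    
    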

    Automatic object classification for surveillance videos

    The recent popularity of surveillance video systems, especially in urban scenarios, demands the development of visual techniques for monitoring purposes. A primary step towards intelligent surveillance video systems is automatic object classification, which remains an open research problem and the keystone for the development of more specific applications. Typically, object representation is based on inherent visual features. However, psychological studies have demonstrated that human beings can routinely categorise objects according to their behaviour. The gap between the features automatically extracted by a computer, such as appearance-based features, and the concepts perceived by human beings but unattainable by machines, such as behaviour features, is commonly known as the semantic gap. Consequently, this thesis proposes to narrow the semantic gap and bring machine and human understanding together for object classification. A Surveillance Media Management framework is proposed to automatically detect and classify objects by analysing both the physical properties inherent in their appearance (machine understanding) and the behaviour patterns that require a higher level of understanding (human understanding). Finally, a probabilistic multimodal fusion algorithm bridges the gap by performing an automatic classification that considers both machine and human understanding. The performance of the proposed Surveillance Media Management framework has been thoroughly evaluated on outdoor surveillance datasets. The experiments conducted demonstrate that the combination of machine and human understanding substantially enhances object classification performance, and that the inclusion of human reasoning and understanding provides the essential information to bridge the semantic gap towards smart surveillance video systems.
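    One standard way to realise the probabilistic fusion step described above is the Bayesian product rule: combine per-modality class posteriors under a conditional-independence assumption and renormalise. The abstract does not specify the thesis's exact fusion algorithm, so this sketch, including the class names and probabilities, is purely illustrative.

    ```python
    def fuse(post_appearance, post_behaviour, prior):
        """P(c | a, b) proportional to P(c | a) * P(c | b) / P(c),
        assuming the two modalities are independent given the class."""
        unnorm = {c: post_appearance[c] * post_behaviour[c] / prior[c]
                  for c in prior}
        z = sum(unnorm.values())
        return {c: v / z for c, v in unnorm.items()}

    prior = {"person": 0.5, "vehicle": 0.5}
    appearance = {"person": 0.6, "vehicle": 0.4}  # machine understanding
    behaviour = {"person": 0.8, "vehicle": 0.2}   # behaviour-pattern cue
    fused = fuse(appearance, behaviour, prior)
    print(fused)  # both modalities favour "person", so the fused posterior does too
    ```

    Note how a weak appearance cue (0.6) is sharpened by an agreeing behaviour cue, which is the intuition behind combining the two levels of understanding.
    
    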

    Assessing the repeatability of automated seafloor classification algorithms, with application in marine protected area monitoring

    The number and areal extent of marine protected areas worldwide are rapidly increasing as a result of numerous national targets that aim to see up to 30% of national waters protected by 2030. Automated seabed classification algorithms are emerging as faster and more objective methods for generating the benthic habitat maps used to monitor these areas. However, no study has yet systematically compared their repeatability. Here we address that problem by comparing the repeatability of maps derived from acoustic datasets collected on consecutive days using three automated seafloor classification algorithms: (1) Random Forest (RF), (2) K-Nearest Neighbour (KNN), and (3) K-means (KMEANS). The most robust and repeatable approach is then used to evaluate the change in seafloor habitats between 2012 and 2015 within the Greater Haig Fras Marine Conservation Zone, Celtic Sea, UK. Our results demonstrate that only RF and KNN provide statistically repeatable maps, with 60.3% and 47.2% agreement, respectively, between consecutive days. Additionally, this study suggests that in low-relief areas bathymetric derivatives are non-essential input parameters, while backscatter textural features, in particular Grey Level Co-occurrence Matrices (GLCMs), are substantially more effective for detecting different habitats. Habitat persistence in the test area between 2012 and 2015 was 48.8%, with swapping of habitats driving the changes in 38.2% of the area. Overall, this study highlights the importance of investigating the repeatability of automated seafloor classification methods before they are fully adopted for monitoring benthic habitats.
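    The GLCM textural features highlighted above can be computed directly from a quantised backscatter mosaic: count how often pairs of grey levels co-occur at a given pixel offset, then summarise the matrix with statistics such as contrast. The sketch below is a minimal, assumed implementation on synthetic patches (the study's actual offsets, quantisation, and feature set are not given in the abstract).

    ```python
    import numpy as np

    def glcm(img, levels, dx=1, dy=0):
        """Symmetric, normalised grey-level co-occurrence matrix for one offset."""
        p = np.zeros((levels, levels))
        h, w = img.shape
        for i in range(h - dy):
            for j in range(w - dx):
                a, b = img[i, j], img[i + dy, j + dx]
                p[a, b] += 1
                p[b, a] += 1  # symmetric counting
        return p / p.sum()

    def contrast(p):
        """GLCM contrast: sum over (i, j) of (i - j)^2 * p(i, j); high for rough texture."""
        idx = np.arange(p.shape[0])
        return float(((idx[:, None] - idx[None, :]) ** 2 * p).sum())

    # Synthetic patches with 4 grey levels: uniform "smooth" vs. alternating "rough".
    smooth = np.ones((8, 8), dtype=int)
    rough = np.indices((8, 8)).sum(axis=0) % 4
    c_smooth = contrast(glcm(smooth, 4))
    c_rough = contrast(glcm(rough, 4))
    print(c_smooth, c_rough)
    ```

    A smooth sediment patch yields zero contrast while a heterogeneous one does not, which is the property that lets GLCM statistics separate habitat types where bathymetric relief is uninformative.
    
    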

    Aligning computer and human visual representations

    Both computer vision and the human visual system target the same goal: to accomplish visual tasks easily via a set of representations. In this thesis, we study the extent to which representations from computer vision models align with human visual representations. To address this question we used an interdisciplinary approach, integrating methods from psychology, neuroscience, and computer vision, with the aim of providing new insight into human visual representations. In the four chapters of the thesis, we tested computer vision models against brain data obtained with electroencephalography (EEG) and functional magnetic resonance imaging (fMRI). The main findings can be summarized as follows: 1) computer vision models with one or two computational stages correlate with visual representations of intermediate complexity in the human brain; 2) models with multiple computational stages correlate best with the hierarchy of representations in the human visual system; 3) computer vision models do not align one-to-one with the temporal hierarchy of representations in the visual cortex; and 4) not only visual but also semantic representations correlate with representations in the human visual system.
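    A common way to test model-brain alignment of the kind described above is representational similarity analysis (RSA): build a representational dissimilarity matrix (RDM) over stimulus conditions for each system, then correlate the two RDMs. The abstract does not state the thesis's exact analysis, so the sketch below, with synthetic "model" and "brain" responses, is an assumed illustration of the general method.

    ```python
    import numpy as np

    def rdm(features):
        """Representational dissimilarity matrix: 1 - Pearson correlation
        between the response patterns of each pair of conditions."""
        return 1.0 - np.corrcoef(features)

    def rsa(rdm_a, rdm_b):
        """Correlate the upper triangles of two RDMs (the RSA score)."""
        iu = np.triu_indices_from(rdm_a, k=1)
        return float(np.corrcoef(rdm_a[iu], rdm_b[iu])[0, 1])

    rng = np.random.default_rng(1)
    model = rng.standard_normal((10, 50))                  # 10 conditions x 50 model units
    brain = model + 0.3 * rng.standard_normal((10, 50))    # noisy copy: an "aligned" system
    control = rng.standard_normal((10, 50))                # unrelated representation
    aligned = rsa(rdm(model), rdm(brain))
    baseline = rsa(rdm(model), rdm(control))
    print(aligned, baseline)
    ```

    Comparing RDMs rather than raw responses is what makes the analysis work across systems with incompatible measurement spaces, such as network activations versus EEG or fMRI signals.
    
    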