Video based detection of driver fatigue
This thesis addresses the problem of drowsy driver detection using computer vision techniques applied to the human face. Specifically, we explore the possibility of discriminating drowsy from alert video segments using facial expressions automatically extracted from video. Several approaches to detecting and predicting drowsiness have been proposed previously, and interest in computer vision approaches has grown because they are non-invasive. Previous vision-based studies detect driver drowsiness primarily by making prior assumptions about the relevant behavior, focusing on blink rate, eye closure, and yawning. Here we employ machine learning to explore, understand, and exploit actual human behavior during drowsiness episodes. We have collected two datasets that include facial and head movement measures. Head motion is collected through an accelerometer for the first dataset (UYAN-1) and an automatic video-based head pose detector for the second dataset (UYAN-2). We use the outputs of automatic classifiers for the facial action coding system (FACS) to detect drowsiness. These facial actions include blinking and yawning motions, as well as a number of other facial movements. The measures are passed to a learning-based classifier based on multinomial logistic regression. On UYAN-1 the system predicts sleep and crash episodes during a driving computer game with an area under the receiver operating characteristic curve of 0.98 in across-subject tests, the highest prediction rate reported to date for detecting real drowsiness. Moreover, the analysis reveals new information about human facial behavior during drowsy driving. On UYAN-2, finer discrimination of drowsy states is explored on a separate dataset, studying the degree to which individual facial action units can discriminate moderately drowsy from acutely drowsy states.
Signal processing techniques and machine learning methods are employed to build a person-independent acute drowsiness detection system. Temporal dynamics are captured using a bank of temporal filters. The predictive power of individual action units is explored with an MLR-based classifier, and the five best-performing action units are determined for a person-independent system. Using the combined features of these five action units, the system obtains an area under the receiver operating characteristic curve of 0.96 on this more challenging dataset. Moreover, the analysis reveals new markers for different levels of drowsiness.
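The classification stage described above can be sketched as follows. This is a minimal illustration only, using synthetic stand-ins for the FACS action-unit features (the actual feature extraction, temporal filter bank, and datasets are not reproduced here); it shows a logistic-regression classifier scored by the area under the ROC curve, as in the thesis.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(0)

# Hypothetical stand-in for per-segment action unit (AU) intensities:
# 200 video segments x 5 AU features; "drowsy" segments get a positive shift.
n_segments, n_aus = 200, 5
labels = rng.integers(0, 2, size=n_segments)        # 0 = alert, 1 = drowsy
features = rng.normal(size=(n_segments, n_aus)) + 1.5 * labels[:, None]

# Logistic regression classifier (the binary case of multinomial
# logistic regression used in the thesis pipeline).
clf = LogisticRegression().fit(features, labels)
scores = clf.predict_proba(features)[:, 1]

# Area under the receiver operating characteristic curve,
# the performance measure reported above.
auc = roc_auc_score(labels, scores)
print(round(auc, 2))
```

On real data the features would be the per-segment outputs of the FACS action-unit detectors rather than random draws, and evaluation would be across subjects rather than on the training set.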
Recommended from our members
Video content analysis for automated detection and tracking of humans in CCTV surveillance applications
This thesis was submitted for the degree of Doctor of Philosophy and awarded by Brunel University. The problems of achieving a high detection rate with a low false alarm rate for human detection and tracking in video sequences, performance scalability, and improving response time are addressed in this thesis. The underlying difficulties are the effects of scene complexity, human-to-human interactions, scale changes, and background-human interactions. A two-stage processing solution, comprising human detection and human tracking with two novel pattern classifiers, is presented. Scale-independent human detection is achieved by processing in the wavelet domain using square wavelet features. These features, used to characterise human silhouettes at different scales, are similar to the rectangular features used in [Viola 2001]. At the detection stage, two detectors are combined to improve the detection rate. The first detector is based on the shape outline of humans extracted from the scene using a reduced-complexity outline extraction algorithm; a shape-mismatch measure is used to differentiate between the human and the background class. The second detector uses rectangular features as primitives for silhouette description in the wavelet domain. The marginal distribution of features collocated at a particular position on a candidate human (a patch of the image) is used to describe the silhouette statistically. Two similarity measures are computed between a candidate human and the model histograms of the human and non-human classes, and used to discriminate between the two classes. At the tracking stage, a tracker based on the joint probabilistic data association filter (JPDAF) for data association and motion correspondence is presented. Track clustering is used to reduce hypothesis enumeration complexity.
Towards improving response time with increases in frame dimension, scene complexity, and number of channels, a scalable algorithmic architecture and an operating-accuracy prediction technique are presented. A scheduling strategy for improving response time and throughput through parallel processing is also presented.
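The rectangular features mentioned above, in the style of [Viola 2001], can be illustrated with a short sketch. This is not the thesis's wavelet-domain implementation; it only shows the underlying idea that an integral image makes any rectangle sum, and hence any two-rectangle contrast feature, computable in constant time. The image and window sizes are arbitrary choices for the example.

```python
import numpy as np

def integral_image(img):
    """Cumulative sum table so any rectangle sum costs four lookups."""
    return img.cumsum(axis=0).cumsum(axis=1)

def rect_sum(ii, top, left, height, width):
    """Sum of pixels in a rectangle, read from the integral image ii."""
    bottom, right = top + height - 1, left + width - 1
    total = ii[bottom, right]
    if top > 0:
        total -= ii[top - 1, right]
    if left > 0:
        total -= ii[bottom, left - 1]
    if top > 0 and left > 0:
        total += ii[top - 1, left - 1]
    return total

def two_rect_feature(ii, top, left, height, width):
    """Left-minus-right two-rectangle feature, as in Viola-Jones."""
    half = width // 2
    left_sum = rect_sum(ii, top, left, height, half)
    right_sum = rect_sum(ii, top, left + half, height, half)
    return left_sum - right_sum

img = np.zeros((8, 8))
img[:, :4] = 1.0                          # bright left half, dark right half
ii = integral_image(img)
print(two_rect_feature(ii, 0, 0, 8, 8))   # -> 32.0, a strong edge response
```

In the thesis the analogous primitives operate on wavelet coefficients rather than raw pixels, and their marginal distributions feed the histogram-based silhouette detector.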
Visual scene recognition with biologically relevant generative models
This research focuses on developing visual object categorization methodologies based on machine learning techniques and biologically inspired generative models of visual scene recognition. Modelling the statistical variability in visual patterns, in the space of features extracted from them by an appropriate low-level signal processing technique, is an important matter of investigation for both humans and machines. To study this problem, we have examined in detail two recent probabilistic models of vision: a simple multivariate Gaussian model, as suggested by Karklin and Lewicki (2009), and a restricted Boltzmann machine (RBM), as proposed by Hinton (2002). Both models have been widely used for visual object classification and scene analysis tasks. This research highlights that these models on their own are not sufficiently discriminative to perform the classification task, and suggests the Fisher kernel as a means of inducing discrimination into these models. Our empirical results on standard benchmark datasets reveal that the classification performance of these generative models can be boosted close to state-of-the-art performance by drawing a Fisher kernel from compact generative models that compute the data labels in a fraction of the total computation time. We compare the proposed technique with other distance-based and kernel-based classifiers to show how computationally efficient the Fisher kernels are. To the best of our knowledge, the Fisher kernel had not previously been drawn from the RBM, so the work presented in the thesis is novel in terms of both its idea and its application to vision problems.
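The Fisher kernel idea can be sketched for the simpler of the two generative models, a diagonal Gaussian. This is an illustrative toy only (random data, identity approximation of the Fisher information matrix, gradient taken with respect to the mean alone), not the thesis's RBM construction: the Fisher score of a sample is the gradient of the model's log-likelihood with respect to its parameters, and the kernel is an inner product of such scores.

```python
import numpy as np

# Hypothetical toy data; in the thesis the generative model is fitted to
# low-level image features. Here: 50 samples of a 4-dimensional Gaussian.
rng = np.random.default_rng(1)
data = rng.normal(loc=2.0, scale=1.5, size=(50, 4))

# Fit the generative model (maximum-likelihood mean and variance).
mu = data.mean(axis=0)
var = data.var(axis=0)

def fisher_score(x):
    """Gradient of the Gaussian log-likelihood w.r.t. the mean:
    Sigma^-1 (x - mu) for a diagonal covariance."""
    return (x - mu) / var

def fisher_kernel(x, y):
    """Inner product of Fisher scores, using the common identity
    approximation of the Fisher information matrix."""
    return fisher_score(x) @ fisher_score(y)

k = fisher_kernel(data[0], data[1])
```

The resulting kernel can be plugged into any kernel classifier (e.g. an SVM), which is how the generative model is given discriminative power.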
Automatic object classification for surveillance videos.
PhD thesis. The recent popularity of surveillance video systems, especially in urban scenarios, demands the development of visual techniques for monitoring purposes. A primary step towards intelligent surveillance video systems consists of automatic object classification, which remains an open research problem and the keystone for the development of more specific applications.
Typically, object representation is based on inherent visual features. However, psychological studies have demonstrated that human beings can routinely categorise objects according to their behaviour. The gap between the features automatically extracted by a computer, such as appearance-based features, and the concepts unconsciously perceived by human beings but unattainable for machines, such as behaviour features, is commonly known as the semantic gap. Consequently, this thesis proposes to narrow the semantic gap and bring machine and human understanding together for object classification. Thus, a Surveillance Media Management framework is proposed to automatically detect and classify objects by analysing both the physical properties inherent in their appearance (machine understanding) and the behaviour patterns that require a higher level of understanding (human understanding). Finally, a probabilistic multimodal fusion algorithm bridges the gap, performing an automatic classification that considers both machine and human understanding.
The performance of the proposed Surveillance Media Management framework has been thoroughly evaluated on outdoor surveillance datasets. The experiments conducted demonstrate that the combination of machine and human understanding substantially enhances object classification performance. Finally, the inclusion of human reasoning and understanding provides the essential information needed to bridge the semantic gap towards smart surveillance video systems.
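One simple form of probabilistic multimodal fusion, combining an appearance-based posterior with a behaviour-based posterior, can be sketched as below. The class names and the numbers are invented for illustration, and the product rule (which assumes the two modalities are conditionally independent given the class) is only one possible instance of the fusion algorithm described above.

```python
import numpy as np

CLASSES = ["person", "vehicle"]   # illustrative classes, not from the thesis

def fuse(appearance_post, behaviour_post):
    """Product-rule fusion of two posteriors over the same classes,
    renormalised to sum to one (assumes conditional independence
    of the two modalities given the class)."""
    combined = np.asarray(appearance_post) * np.asarray(behaviour_post)
    return combined / combined.sum()

# Appearance (machine understanding) weakly prefers "vehicle", while
# behaviour (higher-level cues, e.g. motion pattern) strongly prefers
# "person"; the fused decision follows the stronger evidence.
appearance = [0.45, 0.55]
behaviour = [0.90, 0.10]
fused = fuse(appearance, behaviour)
print(CLASSES[int(np.argmax(fused))])   # -> person
```

In a full system each posterior would come from its own trained classifier, and the fusion weights could themselves be learned rather than implied by a plain product.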
Assessing the repeatability of automated seafloor classification algorithms, with application in marine protected area monitoring
The number and areal extent of marine protected areas worldwide are rapidly increasing as a result of numerous national targets aiming to see up to 30% of national waters protected by 2030. Automated seabed classification algorithms are emerging as faster and more objective methods to generate the benthic habitat maps needed to monitor these areas. However, no study has yet systematically compared their repeatability. Here we aim to address that problem by comparing the repeatability of maps derived from acoustic datasets collected on consecutive days using three automated seafloor classification algorithms: (1) Random Forest (RF), (2) K-Nearest Neighbour (KNN), and (3) K-means (KMEANS). The most robust and repeatable approach is then used to evaluate the change in seafloor habitats between 2012 and 2015 within the Greater Haig Fras Marine Conservation Zone, Celtic Sea, UK. Our results demonstrate that only RF and KNN provide statistically repeatable maps, with 60.3% and 47.2% agreement between consecutive days, respectively. Additionally, this study suggests that in low-relief areas, bathymetric derivatives are non-essential input parameters, while backscatter textural features, in particular Grey Level Co-occurrence Matrices, are substantially more effective in the detection of different habitats. Habitat persistence in the test area between 2012 and 2015 was 48.8%, with swapping of habitats driving the changes in 38.2% of the area. Overall, this study highlights the importance of investigating the repeatability of automated seafloor classification methods before they can be fully used in the monitoring of benthic habitats.
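The repeatability comparison above rests on measuring how well two classified maps of the same area agree. A minimal sketch of that measurement, using randomly generated habitat maps in place of the real acoustic-derived ones (the class labels and the 40% disturbance rate are invented), is:

```python
import numpy as np

rng = np.random.default_rng(2)

# Hypothetical habitat maps for two consecutive survey days: integer class
# labels on the same grid (e.g. 0 = sand, 1 = coarse sediment, 2 = rock).
day1 = rng.integers(0, 3, size=(100, 100))
day2 = day1.copy()
disturbed = rng.random(day1.shape) < 0.4              # perturb ~40% of cells
day2[disturbed] = rng.integers(0, 3, size=disturbed.sum())

# Repeatability as overall percent agreement between the two maps,
# the kind of figure reported above (60.3% for RF, 47.2% for KNN).
agreement = 100.0 * np.mean(day1 == day2)
print(round(agreement, 1))
```

The study's statistical assessment of repeatability goes beyond raw percent agreement, but per-cell agreement between consecutive-day maps is the quantity being summarised.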
Aligning computer and human visual representations
Both computer vision and the human visual system target the same goal: to accomplish visual tasks easily via a set of representations. In this thesis, we study to what extent representations from computer vision models align with human visual representations. To study this question we used an interdisciplinary approach, integrating methods from psychology, neuroscience, and computer vision; such an approach aims to provide new insight into the understanding of human visual representations. In the four chapters of the thesis, we tested computer vision models against brain data obtained with electroencephalography (EEG) and functional magnetic resonance imaging (fMRI). The main findings can be summarized as follows: 1) computer vision models with one or two computational stages correlate with visual representations of intermediate complexity in the human brain; 2) models with multiple computational stages correlate best with the hierarchy of representations in the human visual system; 3) computer vision models do not align one-to-one with the temporal hierarchy of representations in the visual cortex; and 4) not only visual but also semantic representations correlate with representations in the human visual system.
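Comparisons between model representations and brain data of this kind are commonly carried out with representational similarity analysis, which can be sketched as follows. The data here are synthetic (the "brain patterns" are just a noisy copy of the model features, and the stimulus and feature counts are arbitrary); the thesis's actual EEG/fMRI analyses are not reproduced.

```python
import numpy as np

rng = np.random.default_rng(3)

# Hypothetical responses to 20 stimuli: model feature vectors, and
# "brain patterns" simulated as a noisy copy so the geometries are related.
n_stimuli = 20
model_features = rng.normal(size=(n_stimuli, 50))
brain_patterns = model_features + 0.5 * rng.normal(size=(n_stimuli, 50))

def rdm(responses):
    """Representational dissimilarity matrix (upper triangle, flattened):
    1 - Pearson correlation between stimulus response patterns."""
    z = responses - responses.mean(axis=1, keepdims=True)
    z /= np.linalg.norm(z, axis=1, keepdims=True)
    corr = z @ z.T
    iu = np.triu_indices(len(responses), k=1)
    return 1.0 - corr[iu]

def spearman(a, b):
    """Spearman rank correlation via Pearson correlation of ranks
    (valid here because the values are continuous, so no ties)."""
    ranks_a = a.argsort().argsort()
    ranks_b = b.argsort().argsort()
    return np.corrcoef(ranks_a, ranks_b)[0, 1]

# Alignment score: rank correlation between model and brain RDMs.
rho = spearman(rdm(model_features), rdm(brain_patterns))
```

A higher rho indicates that stimuli the model treats as similar are also represented similarly in the measured brain responses, which is the sense of "correlate" used in the findings above.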