A Deep Learning Approach to Video Processing for Scene Recognition in Smart Office Environments
The field of computer vision, whose goal is to enable computer systems to interpret and understand image data, has seen great advances in recent years with the emergence of deep learning. Deep learning, a technique inspired by the information processing of the human brain, has come close to solving the problem of object recognition in image data. One of the next big challenges in computer vision is to allow computers to recognize not only objects but also activities. This study explores the capabilities of deep learning for the specific problem of activity recognition in office environments. The study used a re-labeled subset of the AMI Meeting Corpus video data set to comparatively evaluate the performance of different neural network models on this problem, and then evaluated the best-performing model on a novel data set of office activities captured in a research lab at Malmö University. The results showed that the best-performing model was a 3D convolutional neural network (3DCNN) that encodes temporal information in its third dimension; however, a recurrent convolutional network (RCNN) that uses a pre-trained VGG16 model to extract features and feeds them into a recurrent neural network with a unidirectional Long Short-Term Memory (LSTM) layer performed almost as well with the right configuration. An analysis of the results suggests that a 3DCNN's performance depends on the camera angle, specifically on how well movement is spatially distributed between the people in frame.
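The temporal third dimension that distinguishes the 3DCNN can be illustrated with a minimal sketch: a single "valid" 3D convolution over a (time, height, width) clip, so that each output value mixes information from several consecutive frames. This is a toy NumPy illustration, not the thesis's actual architecture; all names and sizes are hypothetical.

```python
import numpy as np

def conv3d_valid(clip, kernel):
    """Naive 'valid' 3D convolution over a (time, height, width) clip.

    Illustrative only: a real 3DCNN stacks many such kernels with
    non-linearities and pooling; this shows just the temporal mixing.
    """
    T, H, W = clip.shape
    t, h, w = kernel.shape
    out = np.zeros((T - t + 1, H - h + 1, W - w + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            for k in range(out.shape[2]):
                out[i, j, k] = np.sum(clip[i:i + t, j:j + h, k:k + w] * kernel)
    return out

# A 16-frame, 32x32 grayscale clip and a 3x3x3 kernel: the kernel spans
# three consecutive frames, so each output value mixes motion over time.
clip = np.random.rand(16, 32, 32)
kernel = np.ones((3, 3, 3)) / 27.0
features = conv3d_valid(clip, kernel)
print(features.shape)  # (14, 30, 30)
```

Because the kernel extends across frames, motion patterns (rather than single-frame appearance alone) can drive the learned features, which is what the abstract's "temporal information in the third dimension" refers to.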
PiEye in the Wild: Exploring Eye Contact Detection for Small Inexpensive Hardware
Eye contact detection sensors have the possibility of inferring user attention, which can be
utilized by a system in a multitude of different ways, including supporting human-computer
interaction and measuring human attention patterns. In this thesis we attempt to build
a versatile eye contact sensor using a Raspberry Pi that is suited for real-world practical
usage. In order to ensure practicality, we constructed a set of criteria for the system based
on previous implementations. To meet these criteria, we opted to use an appearance-based
machine learning method where we train a classifier with training images in order to infer
if users look at the camera or not. Our aim was to investigate how well we could detect
eye contact on the Raspberry Pi in terms of accuracy, speed, and range. After extensive
testing on combinations of four different feature extraction methods, we found that Linear
Discriminant Analysis compression of pixel data provided the best overall accuracy, but
Principal Component Analysis compression performed the best when tested on images
from the same dataset as the training data. When investigating the speed of the system,
we found that down-scaling input images had a huge effect on the speed, but also lowered
the accuracy and range. While we managed to mitigate the effect that scaling had on the
accuracy, the range of the system is still tied to the scale of the input images and, by
extension, to the speed.
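The appearance-based pipeline described above, compressing flattened pixel data and then classifying in the reduced space, can be sketched minimally. This NumPy illustration uses PCA compression and a nearest-centroid classifier on synthetic data; the thesis's actual feature extraction methods, classifier, and data are not reproduced here, and all names are hypothetical.

```python
import numpy as np

def pca_fit(X, n_components):
    """Fit PCA via SVD on mean-centred data; returns (mean, components)."""
    mean = X.mean(axis=0)
    _, _, Vt = np.linalg.svd(X - mean, full_matrices=False)
    return mean, Vt[:n_components]

def pca_transform(X, mean, components):
    return (X - mean) @ components.T

# Synthetic stand-in for flattened face crops: two classes ("eye contact"
# vs. "no eye contact"), separated along a single pixel direction.
rng = np.random.default_rng(0)
n, d = 200, 64 * 64
X0 = rng.normal(0.0, 1.0, (n, d))
X1 = rng.normal(0.0, 1.0, (n, d))
X1[:, 0] += 5.0
X = np.vstack([X0, X1])
y = np.array([0] * n + [1] * n)

mean, comps = pca_fit(X, n_components=20)
Z = pca_transform(X, mean, comps)

# Nearest-centroid classification in the 20-dimensional compressed space.
centroids = np.array([Z[y == c].mean(axis=0) for c in (0, 1)])
pred = np.argmin(((Z[:, None, :] - centroids[None]) ** 2).sum(-1), axis=1)
print((pred == y).mean())
```

The point of the compression step is exactly the speed/accuracy trade-off the abstract discusses: classifying in 20 dimensions instead of 4096 is far cheaper on a Raspberry Pi, at the cost of discarding pixel information.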
An investigation of transfer learning for deep architectures in group activity recognition
Pervasive technologies permeating our immediate surroundings provide a wide variety of means for sensing and actuating in our environment, and have great potential to impact not only the way we live but also how we work. In this paper, we address the problem of activity recognition in office environments as a means of inferring contextual information in order to automatically and proactively assist people in their daily activities. To this end, we employ state-of-the-art image processing techniques and evaluate their capabilities in a real-world setup. Traditional machine learning assumes that the training and test data share the same distribution; when this is not the case, the performance of the learned model deteriorates. However, data is often expensive or difficult to collect and label, so it is important to develop techniques that make the best possible use of existing data sets from domains related to the target domain. To this end, we further investigate transfer learning techniques in deep learning architectures for the task of activity recognition in office settings. We provide a solution model that attains 94% accuracy under the right conditions.
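The core idea of transfer learning described above, reusing representations learned on a source domain and retraining only a small head on the scarce target-domain data, can be sketched minimally. Here the "pretrained backbone" is faked as a frozen random projection with a ReLU and the data is synthetic; in practice the backbone would be something like a VGG16 network trained on ImageNet. Nothing in this sketch is the paper's actual model.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical stand-in for a frozen, pretrained feature extractor: a fixed
# random projection plus ReLU. Its weights are never updated below.
d_in, d_feat = 4096, 128
W_frozen = rng.normal(size=(d_in, d_feat)) / np.sqrt(d_in)

def extract_features(x):
    return np.maximum(x @ W_frozen, 0.0)  # frozen backbone output

# Small labelled target-domain set (e.g. flattened office-activity frames).
n = 120
X = rng.normal(size=(n, d_in))
y = (rng.random(n) < 0.5).astype(float)
X[y == 1] += 0.3  # weak class-dependent signal in the raw input

# Transfer learning: keep the extractor fixed, train only a logistic head.
F = extract_features(X)
w = np.zeros(d_feat)
b = 0.0
for _ in range(500):
    p = 1.0 / (1.0 + np.exp(-(F @ w + b)))  # predicted P(class 1)
    g = p - y                               # logistic-loss gradient signal
    w -= 0.1 * (F.T @ g) / n
    b -= 0.1 * g.mean()

acc = ((p > 0.5) == (y == 1)).mean()
print(acc)  # training accuracy of the retrained head
```

Training only the head means the target-domain data has to fit far fewer parameters than retraining the whole network, which is why the approach helps when labelled data in the target domain is expensive or difficult to collect.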