    3 research outputs found

    A Deep Learning Approach to Video Processing for Scene Recognition in Smart Office Environments

    No full text
    The field of computer vision, whose goal is to allow computer systems to interpret and understand image data, has in recent years seen great advances with the emergence of deep learning. Deep learning, a technique that emulates the information processing of the human brain, has been shown to come close to solving the problem of object recognition in image data. One of the next big challenges in computer vision is to allow computers to recognize not only objects but also activities. This study explores the capabilities of deep learning for the specific problem of activity recognition in office environments. The study used a re-labeled subset of the AMI Meeting Corpus video data set to comparatively evaluate the performance of different neural network models in this problem area, and then evaluated the best-performing model on a novel data set of office activities captured in a research lab at Malmö University. The results showed that the best-performing model was a 3D convolutional neural network (3DCNN) with temporal information in the third dimension; however, a recurrent convolutional network (RCNN) that uses a pre-trained VGG16 model to extract per-frame features and feeds them into a recurrent neural network with a unidirectional Long Short-Term Memory (LSTM) layer performed almost as well with the right configuration. An analysis of the results suggests that a 3DCNN's performance depends on the camera angle, specifically on how well movement is spatially distributed between the people in frame.
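
    As a concrete illustration of the two architectures the abstract contrasts, below is a minimal Keras sketch of what the described models could look like: a 3DCNN that convolves over time as the third dimension, and an RCNN that feeds frozen, pre-trained VGG16 per-frame features into a unidirectional LSTM. The frame count, image size, layer sizes, and number of classes are illustrative assumptions, not values from the thesis.

```python
# Minimal sketches (assumed shapes and layer sizes, not the thesis code) of
# the two compared architectures: a 3DCNN with time as the third dimension,
# and an RCNN pairing a frozen VGG16 feature extractor with an LSTM.
import tensorflow as tf
from tensorflow.keras import layers, models

NUM_FRAMES, H, W, NUM_CLASSES = 16, 112, 112, 5  # illustrative values

# --- 3DCNN: spatiotemporal convolutions over (frames, height, width) ---
cnn3d = models.Sequential([
    layers.Input(shape=(NUM_FRAMES, H, W, 3)),
    layers.Conv3D(32, kernel_size=(3, 3, 3), activation="relu"),
    layers.MaxPooling3D(pool_size=(1, 2, 2)),
    layers.Conv3D(64, kernel_size=(3, 3, 3), activation="relu"),
    layers.MaxPooling3D(pool_size=(2, 2, 2)),
    layers.GlobalAveragePooling3D(),
    layers.Dense(NUM_CLASSES, activation="softmax"),
])

# --- RCNN: frozen VGG16 per-frame features + unidirectional LSTM ---
vgg16 = tf.keras.applications.VGG16(include_top=False, weights="imagenet",
                                    pooling="avg", input_shape=(H, W, 3))
vgg16.trainable = False  # use VGG16 purely as a feature extractor

rcnn = models.Sequential([
    layers.Input(shape=(NUM_FRAMES, H, W, 3)),
    layers.TimeDistributed(vgg16),  # one 512-d feature vector per frame
    layers.LSTM(256),               # unidirectional Long Short-Term Memory
    layers.Dense(NUM_CLASSES, activation="softmax"),
])

for m in (cnn3d, rcnn):
    m.compile(optimizer="adam", loss="categorical_crossentropy",
              metrics=["accuracy"])
```

    The key structural difference is visible in the shapes: the 3DCNN mixes spatial and temporal information inside each convolution, while the RCNN keeps frames independent until the LSTM aggregates them.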

    PiEye in the Wild: Exploring Eye Contact Detection for Small Inexpensive Hardware

    No full text
    Eye contact detection sensors make it possible to infer user attention, which a system can use in a multitude of ways, including supporting human-computer interaction and measuring human attention patterns. In this thesis we attempt to build a versatile eye contact sensor on a Raspberry Pi that is suited for practical, real-world usage. To ensure practicality, we constructed a set of criteria for the system based on previous implementations. To meet these criteria, we opted for an appearance-based machine learning method in which we train a classifier with training images to infer whether users are looking at the camera or not. Our aim was to investigate how well we could detect eye contact on the Raspberry Pi in terms of accuracy, speed, and range. After extensive testing of combinations of four different feature extraction methods, we found that Linear Discriminant Analysis (LDA) compression of the pixel data provided the best overall accuracy, while Principal Component Analysis (PCA) compression performed best when tested on images from the same data set as the training data. When investigating the speed of the system, we found that down-scaling the input images had a large effect on speed, but this also lowered both accuracy and range. While we managed to mitigate the effect that reduced image scale had on accuracy, the maximum range at which the sensor works remains relative to the scale of the input images and, by extension, to the speed.
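
    As an illustration of the appearance-based pipeline described above, here is a minimal sketch: flattened pixel data is compressed with LDA or PCA and fed to a classifier that decides eye contact versus no eye contact. The synthetic data, image size, and the choice of an SVM as the classifier are assumptions for illustration only; the thesis evaluated several feature extraction combinations.

```python
# A minimal sketch (assumed data and classifier, not the thesis code) of
# pixel-data compression via LDA or PCA followed by a binary classifier.
import numpy as np
from sklearn.decomposition import PCA
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.svm import SVC

# Placeholder data: flattened grayscale eye-region crops (e.g. 32x32 pixels)
# with binary labels (1 = eye contact, 0 = no eye contact).
rng = np.random.default_rng(0)
X = rng.random((500, 32 * 32))
y = rng.integers(0, 2, 500)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# LDA compression: with two classes, LDA yields one discriminant component.
lda_clf = make_pipeline(LinearDiscriminantAnalysis(n_components=1), SVC())
# PCA compression: keep enough components to explain 95% of the variance.
pca_clf = make_pipeline(PCA(n_components=0.95), SVC())

for name, clf in [("LDA", lda_clf), ("PCA", pca_clf)]:
    clf.fit(X_train, y_train)
    print(name, "accuracy:", clf.score(X_test, y_test))
```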

    An investigation of transfer learning for deep architectures in group activity recognition

    No full text
    Pervasive technologies permeating our immediate surroundings provide a wide variety of means for sensing and actuating in our environment, with great potential to impact not only the way we live but also how we work. In this paper, we address the problem of activity recognition in office environments as a means of inferring contextual information in order to automatically and proactively assist people in their daily activities. To this end we employ state-of-the-art image processing techniques and evaluate their capabilities in a real-world setup. Traditional machine learning assumes that the training and test data share the same distribution; when this is not the case, the performance of the learned model deteriorates. However, data is often expensive or difficult to collect and label, so it is important to develop techniques that make the best possible use of existing data sets from domains related to the target domain. We therefore further investigate transfer learning techniques in deep learning architectures for the task of activity recognition in office settings. We provide a solution model that attains 94% accuracy under the right conditions.
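
    As a sketch of the transfer learning setup the paper investigates, the snippet below reuses a network pre-trained on a large source data set, freezes its convolutional base, trains a new classification head on the target office-activity data, and then optionally fine-tunes only the topmost convolutional block at a lower learning rate. The VGG16 backbone, input shape, learning rates, and class count are illustrative assumptions, not details from the paper.

```python
# A hedged transfer-learning sketch: pre-trained source features, a new
# target-domain head, and optional fine-tuning of the top block only.
import tensorflow as tf
from tensorflow.keras import layers, models

NUM_CLASSES = 5  # assumed number of office activities

base = tf.keras.applications.VGG16(include_top=False, weights="imagenet",
                                   pooling="avg", input_shape=(224, 224, 3))
base.trainable = False  # stage 1: freeze all source-domain features

model = models.Sequential([
    base,
    layers.Dense(256, activation="relu"),
    layers.Dropout(0.5),
    layers.Dense(NUM_CLASSES, activation="softmax"),
])
model.compile(optimizer=tf.keras.optimizers.Adam(1e-4),
              loss="categorical_crossentropy", metrics=["accuracy"])
# ... fit the new head on the (small) target data set here ...

# Stage 2 (optional): unfreeze only VGG16's last convolutional block and
# continue training with a lower learning rate to adapt to the target domain.
base.trainable = True
for layer in base.layers:
    layer.trainable = layer.name.startswith("block5")
model.compile(optimizer=tf.keras.optimizers.Adam(1e-5),
              loss="categorical_crossentropy", metrics=["accuracy"])
```

    Freezing the base first prevents the randomly initialized head from destroying the pre-trained features; the lower learning rate in the second stage serves the same purpose during fine-tuning.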