7 research outputs found

    Human Attention Detection Using AM-FM Representations

    Get PDF
    Human activity detection from digital videos presents many challenges to the computer vision and image processing communities. Recently, many methods have been developed to detect human activities with varying degree of success. Yet, the general human activity detection problem remains very challenging, especially when the methods need to work “in the wild” (e.g., without having precise control over the imaging geometry). The thesis explores phase-based solutions for (i) detecting faces, (ii) back of the heads, (iii) joint detection of faces and back of the heads, and (iv) whether the head is looking to the left or the right, using standard video cameras without any control on the imaging geometry. The proposed phase-based approach is based on the development of simple and robust methods that relie on the use of Amplitude Modulation - Frequency Modulation (AM-FM) models. The approach is validated using video frames extracted from the Advancing Outof- school Learning in Mathematics and Engineering (AOLME) project. The dataset consisted of 13,265 images from ten students looking at the camera, and 6,122 images from five students looking away from the camera. For the students facing the camera, the method was able to correctly classify 97.1% of them looking to the left and 95.9% of them looking to the right. For the students facing the back of the camera, the method was able to correctly classify 87.6% of them looking to the left and 93.3% of them looking to the right. The results indicate that AM-FM based methods hold great promise for analyzing human activity videos

    The Importance of the Instantaneous Phase in Detecting Faces with Convolutional Neural Networks

    Get PDF
    Convolutional Neural Networks (CNN) have provided new and accurate methods for processing digital images and videos. Yet, training CNNs is extremely demanding in terms of computational resources. Also, for simple applications, the standard use of transfer learning also tends to require far more resources than what may be needed. Furthermore, the final systems tend to operate as black boxes that are difficult to interpret. The current thesis considers the problem of detecting faces from the AOLME video dataset. The AOLME dataset consists of a large video collection of group interactions that are recorded in unconstrained classroom environments. For the thesis, still image frames were extracted at every minute from 18 24-minute videos. Then, each video frame was divided into 9x5 blocks with 50x50 pixels each. For each of the 19440 blocks, the percentage of face pixels was set as ground truth. Face detection was then defined as a regression problem for determining the face pixel percentage for each block. For testing different methods, 12 videos were used for training and validation. The remaining 6 videos were used for testing. The thesis examines the impact of using the instantaneous phase for the AOLME block-based face detection application. For comparison, the thesis compares the use of the Frequency modulation image based on the instantaneous phase, the use of the instantaneous amplitude, and the original gray scale image. To generate the FM and AM inputs, the thesis uses dominant component analysis that aims to decrease the training overhead while maintaining interpretability. The results indicate that the use of the FM image yielded about the same performance as the MobileNet V2 architecture (AUC of 0.78 vs 0.79), with vastly reduced training times. Training was 7x faster for an Intel Xeon with a GTX 1080 based desktop and 11x faster on a laptop with Intel i5 with a GTX 1050. Furthermore, the proposed architecture trains 123x less parameters than what is needed for MobileNet V2. The FM-based neural network architecture uses a single convolutional layer. In comparison, the full LeNet-5 on the same image block using the original image could not be trained for face detection (AUC of 0.5)

    The Importance of the Instantaneous Phase in Detecting Faces with Convolutional Neural Networks

    Full text link
    Convolutional Neural Networks (CNN) have provided new and accurate methods for processing digital images and videos. Yet, training CNNs is extremely demanding in terms of computational resources. Also, for specific applications, the standard use of transfer learning also tends to require far more resources than what may be needed. Furthermore, the final systems tend to operate as black boxes that are difficult to interpret. The current thesis considers the problem of detecting faces from the AOLME video dataset. The AOLME dataset consists of a large video collection of group interactions that are recorded in unconstrained classroom environments. For the thesis, still image frames were extracted at every minute from 18 24-minute videos. Then, each video frame was divided into 9x5 blocks with 50x50 pixels each. For each of the 19440 blocks, the percentage of face pixels was set as ground truth. Face detection was then defined as a regression problem for determining the face pixel percentage for each block. For testing different methods, 12 videos were used for training and validation. The remaining 6 videos were used for testing. The thesis examines the impact of using the instantaneous phase for the AOLME block-based face detection application. For comparison, the thesis compares the use of the Frequency Modulation image based on the instantaneous phase, the use of the instantaneous amplitude, and the original gray scale image. To generate the FM and AM inputs, the thesis uses dominant component analysis that aims to decrease the training overhead while maintaining interpretability.Comment: Master Thesi

    Distributed and Scalable Video Analysis Architecture for Human Activity Recognition Using Cloud Services

    Get PDF
    This thesis proposes an open-source, maintainable system for detecting human activity in large video datasets using scalable hardware architectures. The system is validated by detecting writing and typing activities that were collected as part of the Advancing Out of School Learning in Mathematics and Engineering (AOLME) project. The implementation of the system using Amazon Web Services (AWS) is shown to be both horizontally and vertically scalable. The software associated with the system was designed to be robust so as to facilitate reproducibility and extensibility for future research

    Despeckle Filtering for Multiscale Amplitude-Modulation Frequency-Modulation (AM-FM) Texture Analysis of Ultrasound Images of the Intima-Media Complex

    Get PDF
    The intima-media thickness (IMT) of the common carotid artery (CCA) is widely used as an early indicator of cardiovascular disease (CVD). Typically, the IMT grows with age and this is used as a sign of increased risk of CVD. Beyond thickness, there is also clinical interest in identifying how the composition and texture of the intima-media complex (IMC) changed and how these textural changes grow into atherosclerotic plaques that can cause stroke. Clearly though texture analysis of ultrasound images can be greatly affected by speckle noise, our goal here is to develop effective despeckle noise methods that can recover image texture associated with increased rates of atherosclerosis disease. In this study, we perform a comparative evaluation of several despeckle filtering methods, on 100 ultrasound images of the CCA, based on the extracted multiscale Amplitude-Modulation Frequency-Modulation (AM-FM) texture features and visual image quality assessment by two clinical experts. Texture features were extracted from the automatically segmented IMC for three different age groups. The despeckle filters hybrid median and the homogeneous mask area filter showed the best performance by improving the class separation between the three age groups and also yielded significantly improved image quality

    Advancements and Breakthroughs in Ultrasound Imaging

    Get PDF
    Ultrasonic imaging is a powerful diagnostic tool available to medical practitioners, engineers and researchers today. Due to the relative safety, and the non-invasive nature, ultrasonic imaging has become one of the most rapidly advancing technologies. These rapid advances are directly related to the parallel advancements in electronics, computing, and transducer technology together with sophisticated signal processing techniques. This book focuses on state of the art developments in ultrasonic imaging applications and underlying technologies presented by leading practitioners and researchers from many parts of the world
    corecore