    Evaluation of Motion Velocity as a Feature for Sign Language Detection

    Popular video sharing websites contain a large collection of videos in various sign languages. These websites have the potential to be a significant source of knowledge sharing and communication for members of the deaf and hard-of-hearing community. However, prior studies have shown that traditional keyword-based search does a poor job of discovering these videos. Dr. Frank Shipman and others have been working towards building a distributed digital library by indexing the sign language videos available online. This system employs an automatic detector, based on visual features extracted from the video, to filter out non-sign-language content. Features such as the amount and location of hand movements and the symmetry of motion have been explored for this purpose. Caio Monteiro and his team designed a classifier that uses face detection to identify the region of interest (ROI) in a frame and foreground segmentation to estimate the amount of hand motion within that region. Karappa et al. later improved on this design by dividing the ROI using polar coordinates and estimating motion in each division to form a composite feature set. This thesis examines another visual feature associated with signing activity: the speed of hand movements. Speed-based features outperformed the foreground-based features on a complex dataset of SL and non-SL videos, with the F1 score rising from 0.73 to 0.78. However, for a second dataset consisting of videos with single signers and static backgrounds, the classification scores dipped. More consistent improvements were observed when features from the two feature sets were used in conjunction: an F1 score of 0.76 was observed for the complex dataset, and for the second dataset the F1 score changed from 0.85 to 0.86. A related problem is identifying which sign language a video contains. The impact of speed of motion on classifying American Sign Language (ASL) versus British Sign Language (BSL) was found to be minimal; we concluded that the location of motion influences this problem more than either the speed or the amount of motion. Non-speed-related analyses of sign language detection were also explored. Since the American Sign Language alphabet is one-handed, it was expected that videos with left-handed signing might be falsely identified as British Sign Language, which has a two-handed alphabet. We briefly studied this issue with respect to our corpus of ASL and BSL videos and found that our classifier design does not suffer from it. Apart from this, we explored speeding up classification by computing the symmetry of motion in the ROI on selected keyframes as a single feature. The resulting feature extraction was significantly faster, but precision and recall dropped to 59% and 62% respectively, for an F1 score of 0.61.
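    As a rough illustration of a speed-of-motion feature of this kind, the sketch below estimates per-frame hand-movement speed with dense optical flow inside a face-anchored region. This is only a minimal sketch assuming OpenCV and NumPy, not the thesis's implementation: the ROI heuristic, the flow-based speed estimate, and the summary statistics are all illustrative choices.

        # Hypothetical sketch, not the thesis code: per-frame motion speed
        # inside a face-anchored ROI, estimated via dense optical flow.
        import cv2
        import numpy as np

        face_cascade = cv2.CascadeClassifier(
            cv2.data.haarcascades + "haarcascade_frontalface_default.xml")

        def speed_features(video_path, max_frames=1800):
            cap = cv2.VideoCapture(video_path)
            prev, roi, speeds = None, None, []
            while len(speeds) < max_frames:
                ok, frame = cap.read()
                if not ok:
                    break
                gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
                if roi is None:
                    faces = face_cascade.detectMultiScale(gray, 1.3, 5)
                    if len(faces) == 0:
                        continue
                    x, y, w, h = faces[0]
                    # Heuristic signing space around and below the face.
                    roi = (max(0, x - 2 * w), y,
                           min(gray.shape[1], x + 3 * w),
                           min(gray.shape[0], y + 4 * h))
                x0, y0, x1, y1 = roi
                crop = gray[y0:y1, x0:x1]
                if prev is not None:
                    flow = cv2.calcOpticalFlowFarneback(
                        prev, crop, None, 0.5, 3, 15, 3, 5, 1.2, 0)
                    # Mean flow magnitude approximates motion speed this frame.
                    speeds.append(np.linalg.norm(flow, axis=2).mean())
                prev = crop
            cap.release()
            s = np.asarray(speeds) if speeds else np.zeros(1)
            # Summary statistics of per-frame speed form the feature vector.
            return np.array([s.mean(), s.std(), np.percentile(s, 90)])

    A vector like this could be concatenated with foreground-based features before classification, mirroring the combined feature set evaluated above.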

    Evaluation of Alternative Face Detection Techniques and Video Segment Lengths on Sign Language Detection

    Sign language is the primary medium of communication for people who are hearing impaired. Sign language videos are hard to discover on video sharing sites because text-based search relies on metadata rather than on the content of the videos. The sign language community currently shares content through ad-hoc mechanisms, as no library meets their requirements. Low-cost or even real-time classification techniques are valuable for building a sign language digital library whose content is updated as new videos are uploaded to YouTube and other video sharing sites. Prior research detected sign language videos using face detection and background subtraction, with recall and precision suitable for creating a digital library; this approach analyzed one minute of each video being classified. Polar Motion Profiles achieved better recall on videos containing multiple signers, but at a significant computational cost because the method ran five face detectors. This thesis explores techniques to reduce the computation time involved in feature extraction without deeply impacting precision and recall, examining three optimizations to the above techniques. First, we compared the individual performance of the five face detectors and determined the best-performing single detector. Second, we evaluated detection performance using Polar Motion Profiles when face detection was performed on sampled frames rather than on every frame; our results show that Polar Motion Profiles perform well even when information between sampled frames is sacrificed. Finally, we looked at the effect of using shorter video segment lengths for feature extraction and found that the drop in precision was minor as segments were shortened from the initial empirical length of one minute. Through this work, we found an empirical configuration that can classify videos with close to two orders of magnitude less computation, with precision and recall not far below the original voting scheme. Our model improves the detection time for sign language videos, which in turn helps enrich the digital library with fresh content quickly. Future work could focus on enabling diarization, segmenting a video into sign language and non-sign-language content, with effective background subtraction techniques for shorter videos.
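    The frame-sampling optimization can be pictured with a short sketch: run the costly face detector only on every Nth frame, reuse the last detection in between, and restrict extraction to a segment shorter than the original minute. This assumes OpenCV; the step size and segment length below are illustrative placeholders, not the tuned configuration found in the thesis.

        # Illustrative sampling sketch (parameters are placeholders).
        import cv2

        face_cascade = cv2.CascadeClassifier(
            cv2.data.haarcascades + "haarcascade_frontalface_default.xml")

        def sampled_face_boxes(video_path, segment_secs=15, detect_every=10):
            cap = cv2.VideoCapture(video_path)
            fps = cap.get(cv2.CAP_PROP_FPS) or 30.0
            last, boxes = [], []
            for i in range(int(segment_secs * fps)):    # shortened segment
                ok, frame = cap.read()
                if not ok:
                    break
                if i % detect_every == 0:               # detect on sampled frames
                    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
                    found = face_cascade.detectMultiScale(gray, 1.3, 5)
                    if len(found):
                        last = list(found)
                boxes.append(last)                      # reuse between samples
            cap.release()
            return boxes

    Because downstream feature extraction still touches every frame but detection runs only on a fraction of them, most of the per-frame detection cost disappears; combined with a single detector and shorter segments, this is the kind of configuration behind the reported savings.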

    Detection of Sign-Language Content in Video through Polar Motion Profiles

    Locating sign language (SL) videos on video sharing sites (e.g., YouTube) is challenging because search engines generally do not use the visual content of videos for indexing. Instead, indexing is done solely on textual content (e.g., title, description, and other metadata). As a result, untagged SL videos do not appear in search results. In this thesis, we present and evaluate an approach to detect SL content in videos based on their visual content. Our work focuses on the detection of SL content, not on transcription. Our approach relies on face detection and background modeling techniques, combined with a head-centric polar representation of hand movements. The approach uses an ensemble of Haar-based face detectors to define regions of interest (ROIs) and a probabilistic background model to segment movements within each ROI. The resulting two-dimensional (2D) distribution of foreground pixels in the ROI is then reduced to two one-dimensional (1D) polar motion profiles (PMPs) by means of a polar-coordinate transformation. These profiles are then used to distinguish SL videos from others. We evaluate three distinct approaches to processing the information in the PMPs for classification/detection of SL videos. In the first method, we average the PMPs across all ROIs to obtain a single PMP vector for each video; these vectors are then used as input features for an SVM classifier. In the second method, we follow the bag-of-words approach of information retrieval to compute a distribution of PMPs (bag-of-PMPs) for each video. In the third method, we perform linear discriminant analysis (LDA) of the PMPs and use the distribution of PMPs projected into the LDA space for classification. When evaluated on a dataset comprising 205 videos (obtained from YouTube), the average-PMP approach achieves a precision of 81% and a recall of 94%, whereas the bag-of-PMPs approach leads to a precision of 72% and a recall of 70%. In contrast to the first two methods, supervised feature extraction by the third method achieves a higher precision (84%) with a recall of 94%. Though this thesis presents a successful means of detecting sign language in videos, our approaches do not consider temporal information, only the distribution of profiles for a given video. Future work should consider extracting temporal information from the sequence of PMPs to exploit the dynamic signatures of sign languages and potentially improve retrieval results. The SL detection techniques presented in this thesis may be used as an automatic tagging tool to annotate user-contributed videos on sharing sites such as YouTube, thereby making sign-language content more accessible to members of the deaf community.
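    To make the pipeline concrete, here is a compact sketch of the average-PMP method in the spirit of the description above. It assumes OpenCV and scikit-learn; the bin counts, the fixed face center, and the normalization are simplified stand-ins for the ensemble face detection and head-centric geometry actually used.

        # Simplified PMP sketch (bin counts and geometry are assumptions).
        import cv2
        import numpy as np
        from sklearn.svm import SVC

        N_ANGLE, N_RADIUS = 36, 10                  # assumed profile resolutions

        def average_pmp(video_path, face_center, max_radius, n_frames=900):
            cap = cv2.VideoCapture(video_path)
            bg = cv2.createBackgroundSubtractorMOG2()  # probabilistic background model
            cx, cy = face_center
            profiles = []
            for _ in range(n_frames):
                ok, frame = cap.read()
                if not ok:
                    break
                fg = bg.apply(frame)                # mask of moving pixels
                ys, xs = np.nonzero(fg > 127)       # keep confident foreground
                if len(xs) == 0:
                    continue
                # Head-centric polar transform of foreground pixel positions.
                theta = np.arctan2(ys - cy, xs - cx)
                r = np.hypot(xs - cx, ys - cy) / max_radius
                ang, _ = np.histogram(theta, bins=N_ANGLE, range=(-np.pi, np.pi))
                rad, _ = np.histogram(np.clip(r, 0, 1), bins=N_RADIUS, range=(0, 1))
                profiles.append(np.concatenate([ang, rad]).astype(float))
            cap.release()
            pmp = np.mean(profiles, axis=0) if profiles else np.zeros(N_ANGLE + N_RADIUS)
            return pmp / (pmp.sum() or 1.0)         # one normalized vector per video

        def train_detector(train_pmps, train_labels):
            # First method above: averaged PMP vectors feed an SVM classifier.
            return SVC(kernel="rbf").fit(np.vstack(train_pmps), train_labels)

    The bag-of-PMPs and LDA variants differ only in how the per-frame profiles are aggregated: quantized against a learned codebook in one case, projected onto discriminant axes in the other.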