841 research outputs found

    Designing a Visual Front End in Audio-Visual Automatic Speech Recognition System

    Get PDF
    Audio-visual automatic speech recognition (AVASR) is a speech recognition technique integrating audio and video signals as input. Traditional audio-only speech recognition system only uses acoustic information from an audio source. However the recognition performance degrades significantly in acoustically noisy environments. It has been shown that visual information also can be used to identify speech. To improve the speech recognition performance, audio-visual automatic speech recognition has been studied. In this paper, we focus on the design of the visual front end of an AVASR system, which mainly consists of face detection and lip localization. The front end is built upon the AVICAR database that was recorded in moving vehicles. Therefore, diverse lighting conditions and poor quality of imagery are the problems we must overcome. We first propose the use of the Viola-Jones face detection algorithm that can process images rapidly with high detection accuracy. When the algorithm is applied to the AVICAR database, we reach an accuracy of 89% face detection rate. By separately detecting and integrating the detection results from all different color channels, we further improve the detection accuracy to 95%. To reliably localize the lips, three algorithms are studied and compared: the Gabor filter algorithm, the lip enhancement algorithm, and the modified Viola-Jones algorithm for lip features. Finally, to increase detection rate, a modified Viola-Jones algorithm and lip enhancement algorithms are cascaded based on the results of three lip localization methods. Overall, the front end achieves an accuracy of 90% for lip localization

    What makes for effective detection proposals?

    Full text link
    Current top performing object detectors employ detection proposals to guide the search for objects, thereby avoiding exhaustive sliding window search across images. Despite the popularity and widespread use of detection proposals, it is unclear which trade-offs are made when using them during object detection. We provide an in-depth analysis of twelve proposal methods along with four baselines regarding proposal repeatability, ground truth annotation recall on PASCAL, ImageNet, and MS COCO, and their impact on DPM, R-CNN, and Fast R-CNN detection performance. Our analysis shows that for object detection improving proposal localisation accuracy is as important as improving recall. We introduce a novel metric, the average recall (AR), which rewards both high recall and good localisation and correlates surprisingly well with detection performance. Our findings show common strengths and weaknesses of existing methods, and provide insights and metrics for selecting and tuning proposal methods.Comment: TPAMI final version, duplicate proposals removed in experiment

    Object detection and activity recognition in digital image and video libraries

    Get PDF
    This thesis is a comprehensive study of object-based image and video retrieval, specifically for car and human detection and activity recognition purposes. The thesis focuses on the problem of connecting low level features to high level semantics by developing relational object and activity presentations. With the rapid growth of multimedia information in forms of digital image and video libraries, there is an increasing need for intelligent database management tools. The traditional text based query systems based on manual annotation process are impractical for today\u27s large libraries requiring an efficient information retrieval system. For this purpose, a hierarchical information retrieval system is proposed where shape, color and motion characteristics of objects of interest are captured in compressed and uncompressed domains. The proposed retrieval method provides object detection and activity recognition at different resolution levels from low complexity to low false rates. The thesis first examines extraction of low level features from images and videos using intensity, color and motion of pixels and blocks. Local consistency based on these features and geometrical characteristics of the regions is used to group object parts. The problem of managing the segmentation process is solved by a new approach that uses object based knowledge in order to group the regions according to a global consistency. A new model-based segmentation algorithm is introduced that uses a feedback from relational representation of the object. The selected unary and binary attributes are further extended for application specific algorithms. Object detection is achieved by matching the relational graphs of objects with the reference model. The major advantages of the algorithm can be summarized as improving the object extraction by reducing the dependence on the low level segmentation process and combining the boundary and region properties. The thesis then addresses the problem of object detection and activity recognition in compressed domain in order to reduce computational complexity. New algorithms for object detection and activity recognition in JPEG images and MPEG videos are developed. It is shown that significant information can be obtained from the compressed domain in order to connect to high level semantics. Since our aim is to retrieve information from images and videos compressed using standard algorithms such as JPEG and MPEG, our approach differentiates from previous compressed domain object detection techniques where the compression algorithms are governed by characteristics of object of interest to be retrieved. An algorithm is developed using the principal component analysis of MPEG motion vectors to detect the human activities; namely, walking, running, and kicking. Object detection in JPEG compressed still images and MPEG I frames is achieved by using DC-DCT coefficients of the luminance and chrominance values in the graph based object detection algorithm. The thesis finally addresses the problem of object detection in lower resolution and monochrome images. Specifically, it is demonstrated that the structural information of human silhouettes can be captured from AC-DCT coefficients

    Face Detection And Lip Localization

    Get PDF
    Integration of audio and video signals for automatic speech recognition has become an important field of study. The Audio-Visual Speech Recognition (AVSR) system is known to have accuracy higher than audio-only or visual-only system. The research focused on the visual front end and has been centered around lip segmentation. Experiments performed for lip feature extraction were mainly done in constrained environment with controlled background noise. In this thesis we focus our attention to a database collected in the environment of a moving car which hampered the quality of the imagery. We first introduce the concept of illumination compensation, where we try to reduce the dependency of light from over- or under-exposed images. As a precursor to lip segmentation, we focus on a robust face detection technique which reaches an accuracy of 95%. We have detailed and compared three different face detection techniques and found a successful way of concatenating them in order to increase the overall accuracy. One of the detection techniques used was the object detection algorithm proposed by Viola-Jones. We have experimented with different color spaces using the Viola-Jones algorithm and have reached interesting conclusions. Following face detection we implement a lip localization algorithm based on the vertical gradients of hybrid equations of color. Despite the challenging background and image quality, success rate of 88% was achieved for lip segmentation

    Solar rotation speed by detecting and tracking of Coronal Bright Points

    Get PDF
    Coronal Bright Points are one of many Solar manifestations that provide scientists evi-dences of its activity and are usually recognized by being small light dots, like scattered jewels. For many years these Bright Points have been overlooked due to another element of solar activity, sunspots, which drawn scientists full attention mainly because they were easier to detect. Never-theless, CBPs as a result of a clear distribution across all latitudes, provide better tracers to study Solar corona rotation. A literature review on CBPs detection and tracking unveiled limitations both in detection accuracy and lacking an automated image processing feature. The purpose of this dissertation was to present an alternative method for detecting CBPs using advanced image processing techniques and provide an automatic recognition software. The proposed methodology is divided into pre-processing methods, a segmentation section, post processing and a data evaluation approach to increase the CBP detection efficiency. As iden-tified by the study of the available data, pre-processing transformations were needed to ensure each image met certain specifications for future detection. The detection process includes a gra-dient based segmentation algorithm, previously developed for retinal image analysis, which is now successfully applied to this CBP case study. The outcome is the CBP list obtained by the detection algorithm which is then filtered and evaluated to remove false positives. To validate the proposed methodology, CBPs need to be tracked along time, to obtain the rotation of the Solar corona. Therefore, the images used in this study were taken from 19.3nm wavelength by the AIA 193 instrument on board of the Solar Dynamics Observatory (SDO) sat-ellite over 3 days during august 2010. These images allowed the perception of how CBPs angular rotation velocity not only depends on heliographic latitude, but also on other factors such as time. From the results obtained it was clear that the proposed methodology is an effective method to detect and track CBPs providing a consistent method for its detection

    Improving Iris Recognition through Quality and Interoperability Metrics

    Get PDF
    The ability to identify individuals based on their iris is known as iris recognition. Over the past decade iris recognition has garnered much attention because of its strong performance in comparison with other mainstream biometrics such as fingerprint and face recognition. Performance of iris recognition systems is driven by application scenario requirements. Standoff distance, subject cooperation, underlying optics, and illumination are a few examples of these requirements which dictate the nature of images an iris recognition system has to process. Traditional iris recognition systems, dubbed stop and stare , operate under highly constrained conditions. This ensures that the captured image is of sufficient quality so that the success of subsequent processing stages, segmentation, encoding, and matching are not compromised. When acquisition constraints are relaxed, such as for surveillance or iris on the move, the fidelity of subsequent processing steps lessens.;In this dissertation we propose a multi-faceted framework for mitigating the difficulties associated with non-ideal iris. We develop and investigate a comprehensive iris image quality metric that is predictive of iris matching performance. The metric is composed of photometric measures such as defocus, motion blur, and illumination, but also contains domain specific measures such as occlusion, and gaze angle. These measures are then combined through a fusion rule based on Dempster-Shafer theory. Related to iris segmentation, which is arguably one of the most important tasks in iris recognition, we develop metrics which are used to evaluate the precision of the pupil and iris boundaries. Furthermore, we illustrate three methods which take advantage of the proposed segmentation metrics for rectifying incorrect segmentation boundaries. Finally, we look at the issue of iris image interoperability and demonstrate that techniques from the field of hardware fingerprinting can be utilized to improve iris matching performance when images captured from distinct sensors are involved

    Robust iris recognition under unconstrained settings

    Get PDF
    Tese de mestrado integrado. Bioengenharia. Faculdade de Engenharia. Universidade do Porto. 201
    corecore