23 research outputs found

    Asymmetric Pruning for Learning Cascade Detectors

    Cascade classifiers are one of the most important contributions to real-time object detection. Nonetheless, many challenging problems arise in training cascade detectors. One common issue is that each node classifier is trained with a symmetric learning criterion: a low misclassification error rate does not guarantee the optimal node learning goal of a cascade, namely an extremely high detection rate with only a moderate false positive rate. In this work, we present a new approach to training an effective node classifier in a cascade detector. The algorithm is based on two key observations: 1) redundant weak classifiers can be safely discarded; 2) the final detector should satisfy the asymmetric learning objective of the cascade architecture. To achieve this, we separate classifier training into two steps: finding a pool of discriminative weak classifiers/features, and training the final classifier by pruning weak classifiers that contribute little to the asymmetric learning criterion (asymmetric classifier construction). Our model-reduction approach reduces the training time while still achieving the pre-determined learning objective. Experimental results on both face and car data sets verify the effectiveness of the proposed algorithm. On the FDDB face data set, our approach achieves state-of-the-art performance, which demonstrates its advantage. Comment: 14 pages
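    The two-step scheme above can be made concrete with a small sketch: given the outputs and weights of a pool of weak classifiers, greedily discard those whose removal still leaves the pruned ensemble meeting an asymmetric node goal (very high detection rate at a moderate false-positive rate). The greedy order, thresholds and function names below are illustrative assumptions, not the paper's exact pruning criterion:

```python
import numpy as np

def prune_asymmetric(H, alpha, y, min_det=0.99, max_fp=0.5):
    """Greedily prune weak classifiers while the pruned ensemble still
    meets the asymmetric node goal: detection rate >= min_det at a node
    threshold giving false-positive rate <= max_fp.

    H: (n_samples, n_weak) matrix of weak classifier outputs in {-1, +1}
    alpha: (n_weak,) weak classifier weights; y: labels in {-1, +1}.
    Returns the indices of the weak classifiers kept.
    Illustrative sketch, not the paper's exact criterion."""
    keep = list(range(H.shape[1]))

    def meets_goal(idx):
        score = H[:, idx] @ alpha[idx]
        # Set the node threshold from positive scores so that at least
        # min_det of the positives are accepted by construction.
        pos = np.sort(score[y == 1])
        thr = pos[int((1 - min_det) * len(pos))]
        fp = np.mean(score[y == -1] >= thr)
        return fp <= max_fp

    changed = True
    while changed and len(keep) > 1:
        changed = False
        # Try dropping the lowest-weight weak classifier first.
        for j in sorted(keep, key=lambda j: abs(alpha[j])):
            trial = [k for k in keep if k != j]
            if meets_goal(trial):
                keep = trial
                changed = True
                break
    return keep
```

    Because the node threshold is chosen from the positive scores, the detection-rate constraint holds by construction, and a candidate pruning is accepted only if the false-positive rate stays within budget.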

    Taking the bite out of automated naming of characters in TV video

    We investigate the problem of automatically labelling appearances of characters in TV or film material with their names. This is tremendously challenging due to the huge variation in imaged appearance of each character and the weakness and ambiguity of available annotation. However, we demonstrate that high precision can be achieved by combining multiple sources of information, both visual and textual. The principal novelties that we introduce are: (i) automatic generation of time-stamped character annotation by aligning subtitles and transcripts; (ii) strengthening the supervisory information by identifying when characters are speaking. In addition, we incorporate complementary cues of face matching and clothing matching to propose common annotations for face tracks, and consider choices of classifier which can potentially correct errors made in the automatic extraction of training data from the weak textual annotation. Results are presented on episodes of the TV series "Buffy the Vampire Slayer".
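    Novelty (i) rests on a simple observation: subtitles carry time stamps but no speaker names, while transcripts carry names but no times, so matching their text transfers times onto names. A minimal sketch of the idea follows; the hypothetical helper matches each transcript line to its closest subtitle by fuzzy string similarity, a simplification of a proper sequence alignment over the whole episode:

```python
import difflib

def align_transcript_to_subtitles(transcript, subtitles):
    """Attach subtitle time stamps to named transcript lines by text
    matching.

    transcript: list of (speaker, text) without times
    subtitles:  list of (start, end, text) without speakers
    Returns a list of (speaker, start, end, text).
    Hypothetical minimal alignment, not the paper's method."""
    out = []
    for speaker, text in transcript:
        # Pick the subtitle whose text is most similar to this line.
        best = max(
            subtitles,
            key=lambda s: difflib.SequenceMatcher(
                None, text.lower(), s[2].lower()
            ).ratio(),
        )
        out.append((speaker, best[0], best[1], text))
    return out
```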

    Learning a Family of Detectors

    Object detection and recognition are important problems in computer vision. The challenges of these problems come from the presence of noise, background clutter, large within-class variations of the object class, and limited training data. In addition, the computational complexity of the recognition process is a concern in practice. In this thesis, we propose one approach to handle the problem of detecting an object class that exhibits large within-class variations, and a second approach to speed up the classification process. In the first approach, we show that foreground-background classification (detection) and within-class classification of the foreground class (pose estimation) can be jointly solved using a multiplicative form of two kernel functions. One kernel measures similarity for foreground-background classification. The other kernel accounts for latent factors that control within-class variation and implicitly enables feature sharing among foreground training samples. For applications where explicit parameterization of the within-class states is unavailable, a nonparametric formulation of the kernel can be constructed with a proper foreground distance/similarity measure. Detector training is accomplished via standard Support Vector Machine learning. The resulting detectors are tuned to specific variations in the foreground class. They also serve to evaluate hypotheses of the foreground state. When image masks for foreground objects are provided in training, the detectors can also produce object segmentation. Methods for generating a representative sample set of detectors are proposed that enable efficient detection and tracking. In addition, because individual detectors verify hypotheses of the foreground state, they can also be incorporated in a tracking-by-detection framework to recover foreground state in image sequences.
To run the detectors efficiently at the online stage, an input-sensitive speedup strategy is proposed to select the most relevant detectors quickly. The proposed approach is tested on data sets of human hands, vehicles and human faces. On all data sets, the proposed approach achieves improved detection accuracy over the best competing approaches. In the second part of the thesis, we formulate a filter-and-refine scheme to speed up recognition processes. The binary outputs of the weak classifiers in a boosted detector are used to identify a small number of candidate foreground state hypotheses quickly via Hamming distance or weighted Hamming distance. The approach is evaluated in three applications: face recognition on the Face Recognition Grand Challenge version 2 data set, hand shape detection and parameter estimation on a hand data set, and vehicle detection and estimation of the view angle on a multi-pose vehicle data set. On all data sets, our approach is at least five times faster than simply evaluating all foreground state hypotheses, with virtually no loss in classification accuracy.
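    The filter-and-refine scheme in the second part can be sketched in a few lines: the binary weak-classifier outputs act as short codes, Hamming distance gives a cheap filter over all hypotheses, and only the surviving candidates receive the expensive evaluation. Function and parameter names here are hypothetical:

```python
import numpy as np

def filter_and_refine(query_bits, db_bits, refine_fn, k=5):
    """Filter-and-refine: rank database hypotheses by Hamming distance
    between binary weak-classifier outputs, then run the expensive
    refine_fn (higher is better) only on the k best candidates.
    Sketch of the scheme with invented helper names."""
    # Hamming distance = mismatch count on {0, 1} codes
    # (a single XOR + popcount per hypothesis in a native implementation).
    dists = np.count_nonzero(db_bits != query_bits, axis=1)
    candidates = np.argsort(dists)[:k]          # filter step
    scores = [refine_fn(i) for i in candidates]  # refine step
    return candidates[int(np.argmax(scores))]
```

    The speedup comes from the filter step touching every hypothesis with only bit operations, while the costly scoring runs on a handful of candidates.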

    Minimal Training, Large Lexicon, Unconstrained Sign Language Recognition

    This paper presents a flexible monocular system capable of recognising sign lexicons far greater in number than previous approaches. The power of the system is due to four key elements: (i) head and hand detection based upon boosting, which removes the need for temperamental colour segmentation; (ii) a body-centred description of activity, which overcomes issues with camera placement, calibration and user; (iii) a two-stage classification in which stage I generates a high-level linguistic description of activity that naturally generalises and hence reduces training; (iv) a stage II classifier bank which does not require HMMs, further reducing training requirements. The outcome is a system capable of running in real time and generating extremely high recognition rates for large lexicons with as little as a single training instance per sign. We demonstrate classification rates as high as 92% for a lexicon of 164 words with extremely low training requirements, outperforming previous approaches where thousands of training examples are required.
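    Elements (ii)-(iv) can be illustrated with a toy sketch: stage I turns body-centred head/hand positions into coarse linguistic features, and stage II matches a feature sequence against a single stored template per sign. The feature names and thresholds below are invented for illustration and are not the paper's actual linguistic description:

```python
def stage1_description(head, left, right):
    """Stage I: map tracked head/hand positions (x, y) to a coarse,
    body-centred linguistic description. Hypothetical features and
    thresholds; image y grows downward, so smaller y means higher up."""
    feats = set()
    if abs(left[0] - right[0]) < 20 and abs(left[1] - right[1]) < 20:
        feats.add("hands_together")
    if right[1] < head[1]:
        feats.add("right_hand_above_head")
    return feats

def stage2_classify(sequence, templates):
    """Stage II: nearest template by per-frame feature overlap, which
    works with a single training instance per sign (a simple stand-in
    for the paper's classifier bank)."""
    def sim(seq, tmpl):
        return sum(len(a & b) for a, b in zip(seq, tmpl))
    return max(templates, key=lambda name: sim(sequence, templates[name]))
```

    Because stage I discards signer- and camera-specific geometry, a single template per sign already generalises across users, which is what keeps the training requirements so low.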

    A Near Real-Time, Highly Scalable, Parallel and Distributed Adaptive Object Detection and Re-Training Framework Based on the Adaboost Algorithm

    Object detection, such as face detection using supervised learning, often requires extensive training, which results in high execution times. If the trained system needs re-training in order to accommodate a missed detection, waiting several hours or days before the system is ready may be unacceptable in practical implementations. This dissertation presents a generalized object detection framework whereby the system can efficiently adapt to misclassified data and be re-trained within a few minutes. Our methodology is based on the popular AdaBoost algorithm for object detection. AdaBoost functions by iteratively selecting the best among weak classifiers, and then combining several weak classifiers to obtain a stronger classifier. Even though AdaBoost has proven to be very effective, its learning execution time can be high depending upon the application. For example, in face detection, learning can take several days. In this dissertation, we present two techniques that contribute to reducing the learning execution time of the AdaBoost algorithm. Our first technique utilizes a highly parallel and distributed AdaBoost algorithm that exploits the multiple cores in a CPU via lightweight threads. In addition, our technique uses multiple machines in a web service similar to a map-reduce architecture in order to achieve high scalability, which results in a training execution time of a few minutes rather than several days. Our second technique is a methodology for creating an optimal training subset to further reduce the training execution time. We obtained this subset through a novel score-keeping of the weight distribution within the AdaBoost algorithm, and then removed the images that had a minimal effect on the overall trained classifier.
Finally, we incorporated our parallel and distributed AdaBoost algorithm, along with the optimized training subset, into a generalized object detection framework that efficiently adapts and makes corrections when it encounters misclassified data. We demonstrated the usefulness of our adaptive framework with detailed testing on face and car detection, and explained how the framework can be applied to developing other object detection tasks.
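    The first technique can be sketched as a single boosting round whose weak-learner search is split map-reduce style: each worker finds the best decision stump on its shard of candidate features, and the master reduces to the overall best and reweights the samples. This is a single-machine thread-pool sketch with invented names, not the dissertation's multi-machine web service:

```python
import numpy as np
from concurrent.futures import ThreadPoolExecutor

def best_stump_on_shard(X, y, w, features):
    """Map step: return (error, feature, threshold, polarity) of the best
    decision stump among this shard of features, under sample weights w."""
    best = (np.inf, None, None, 1)
    for f in features:
        for thr in np.unique(X[:, f]):
            for pol in (1, -1):
                pred = np.where(pol * (X[:, f] - thr) >= 0, 1, -1)
                err = w[pred != y].sum()
                if err < best[0]:
                    best = (err, f, thr, pol)
    return best

def adaboost_round(X, y, w, n_workers=4):
    """One AdaBoost round with the weak-learner search distributed
    across workers. Labels y are in {-1, +1}; w are sample weights."""
    shards = np.array_split(np.arange(X.shape[1]), n_workers)
    with ThreadPoolExecutor(n_workers) as pool:
        results = list(
            pool.map(lambda s: best_stump_on_shard(X, y, w, s), shards)
        )
    err, f, thr, pol = min(results)  # reduce step: overall best stump
    alpha = 0.5 * np.log((1 - err) / max(err, 1e-12))
    pred = np.where(pol * (X[:, f] - thr) >= 0, 1, -1)
    w = w * np.exp(-alpha * y * pred)  # upweight mistakes for next round
    return w / w.sum(), (f, thr, pol, alpha)
```

    The stump search dominates each round's cost, so sharding the feature pool gives near-linear scaling until the reduce step and reweighting become the bottleneck.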

    Energy efficient enabling technologies for semantic video processing on mobile devices

    Semantic object-based processing will play an increasingly important role in future multimedia systems due to the ubiquity of digital multimedia capture/playback technologies and increasing storage capacity. Although the object-based paradigm has many undeniable benefits, numerous technical challenges remain before such applications become pervasive, particularly on computationally constrained mobile devices. A fundamental issue is the ill-posed problem of semantic object segmentation. Furthermore, on battery-powered mobile computing devices, the additional algorithmic complexity of semantic object-based processing compared to conventional video processing is highly undesirable from both a real-time operation and a battery-life perspective. This thesis attempts to tackle these issues by firstly constraining the solution space and focusing on the human face as a primary semantic concept of use to users of mobile devices. A novel face detection algorithm is proposed, designed from the outset to be offloaded from the host microprocessor to dedicated hardware, thereby providing real-time performance and reducing power consumption. The algorithm uses an Artificial Neural Network (ANN) whose topology and weights are evolved via a genetic algorithm (GA). The computational burden of the ANN evaluation is offloaded to a dedicated hardware accelerator, which is capable of processing any evolved network topology. Efficient arithmetic circuitry, which leverages modified Booth recoding, column compressors and carry-save adders, is adopted throughout the design. To tackle the increased computational costs associated with object tracking and object-based shape encoding, a novel energy-efficient binary motion estimation architecture is proposed. Energy is reduced in the proposed motion estimation architecture by minimising the redundant operations inherent in the binary data. Both architectures are shown to compare favourably with the relevant prior art.
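    The binary motion estimation idea can be sketched as a full search in which the matching cost of two binary blocks is just a mismatch count, which maps directly onto XOR-and-popcount hardware; the redundancy the proposed architecture exploits (e.g. skipping all-zero words) is omitted from this illustrative version:

```python
import numpy as np

def binary_sad(block, candidate):
    """Mismatch count between two binary blocks
    (XOR + popcount in hardware)."""
    return int(np.count_nonzero(block != candidate))

def binary_motion_search(cur, ref, bx, by, size=8, radius=4):
    """Full-search binary motion estimation: find the displacement
    (dx, dy) whose reference block best matches the current block at
    (bx, by). Illustrative sketch of the algorithm only."""
    block = cur[by:by + size, bx:bx + size]
    best, best_mv = None, (0, 0)
    for dy in range(-radius, radius + 1):
        for dx in range(-radius, radius + 1):
            y0, x0 = by + dy, bx + dx
            # skip candidates that fall outside the reference frame
            if (0 <= y0 and 0 <= x0
                    and y0 + size <= ref.shape[0]
                    and x0 + size <= ref.shape[1]):
                d = binary_sad(block, ref[y0:y0 + size, x0:x0 + size])
                if best is None or d < best:
                    best, best_mv = d, (dx, dy)
    return best_mv, best
```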

    Automatic target recognition in sonar imagery using a cascade of boosted classifiers

    This thesis is concerned with the problem of automating the interpretation of data representing the underwater environment retrieved from sensors. This is an important task which potentially allows underwater robots to become completely autonomous, keeping humans out of harm's way and reducing the operational time and cost of many underwater applications. Typical applications include unexploded ordnance clearance, ship/plane wreck hunting (e.g. Malaysia Airlines flight MH370), and oilfield inspection (e.g. the Deepwater Horizon disaster). Two attributes of the processing are crucial if automated interpretation is to be successful. First, computational efficiency is required to allow real-time analysis to be performed on board robots with limited resources. Second, detection accuracy comparable to that of human experts is required in order to replace them. Approaches in the open literature do not appear capable of meeting both requirements, and this has therefore become the objective of this thesis. This thesis proposes a novel approach capable of recognizing targets in sonar data extremely rapidly with a low number of false alarms. The approach was originally developed for face detection in video, and it is applied to sonar data here for the first time. Aside from the application, the main contribution of this thesis is the way this approach is extended to reduce its training time and improve its detection accuracy. Results obtained on large sets of real sonar data over a variety of challenging terrains are presented to show the discriminative power of the proposed approach. In real field trials, the proposed approach was capable of processing sonar data in real time on board underwater robots. In direct comparison with human experts, the proposed approach offers a 40% reduction in the number of false alarms.
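    The detector referred to above follows the boosted-cascade pattern: a window passes through a sequence of increasingly discriminative boosted stages and is rejected at the first stage whose score falls below threshold, so most background windows are discarded after only a few weak-classifier evaluations. A minimal sketch with a hypothetical stage structure:

```python
def cascade_classify(window, stages):
    """Evaluate a cascade of boosted classifiers on one window.

    stages: list of (weak_classifiers, threshold), where each weak
    classifier is a (score_fn, alpha) pair. The window is rejected at
    the first stage whose boosted score falls below the stage
    threshold; early rejection of background is what makes the
    detector fast. Illustrative sketch."""
    for weak, thr in stages:
        score = sum(alpha * fn(window) for fn, alpha in weak)
        if score < thr:
            return False  # early rejection
    return True  # survived all stages: detection
```

    In a trained cascade the early stages use very few features tuned for near-perfect detection rates, so almost all of the computation is spent on the rare windows that look like targets.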