Modification of the AdaBoost-based Detector for Partially Occluded Faces
While face detection seems a solved problem under general conditions, most state-of-the-art systems degrade rapidly when faces are partially occluded by other objects. This paper presents a solution to detect partially occluded faces by modifying the AdaBoost-based face detector. Our basic idea is that the weak classifiers in the AdaBoost-based face detector, each corresponding to a Haar-like feature, inherently form a patch-based model. Therefore, one can divide the whole face region into multiple patches and map those weak classifiers to the patches. The weak classifiers belonging to each patch are re-formed into a new classifier that determines whether the patch is a valid face patch, that is, one without occlusion. Finally, we combine all of the valid face patches, assigning the patches different weights, to make the final decision on whether the input subwindow is a face. The experimental results show that the proposed method is promising for the detection of occluded faces.
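The patch-based decision described in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the `WeakClassifier` type, the zero patch threshold, the 0.5 face threshold, and the toy classifiers are all assumptions made for illustration. Each weak classifier is assumed to be already mapped to one patch; a patch counts as valid when its weighted votes exceed a threshold, and the window is accepted when the valid patches carry enough of the total patch weight.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence


@dataclass
class WeakClassifier:
    patch_id: int                              # patch the Haar-like feature is mapped to (assumed given)
    alpha: float                               # AdaBoost vote weight of this weak classifier
    predict: Callable[[Sequence[float]], int]  # returns +1 (face-like) or -1


def patch_scores(weak: List[WeakClassifier], window: Sequence[float], n_patches: int) -> List[float]:
    """Sum the weighted votes of the weak classifiers that belong to each patch."""
    scores = [0.0] * n_patches
    for wc in weak:
        scores[wc.patch_id] += wc.alpha * wc.predict(window)
    return scores


def is_face(weak, window, patch_weights, patch_threshold=0.0, face_threshold=0.5) -> bool:
    """Accept the window if the valid (unoccluded) patches hold enough of the total weight."""
    scores = patch_scores(weak, window, len(patch_weights))
    support = sum(w for w, s in zip(patch_weights, scores) if s > patch_threshold)
    return support / sum(patch_weights) >= face_threshold


# Hypothetical toy setup: one weak classifier per patch, firing when the
# patch's single feature value exceeds 0.5.
toy_weak = [WeakClassifier(i, 1.0, lambda w, i=i: 1 if w[i] > 0.5 else -1) for i in range(4)]
```

With the toy classifiers, a window such as `[0.9, 0.9, 0.1, 0.9]` has one occluded patch out of four and is still accepted, while a window with three failing patches is rejected.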
A Deep Cascade Model for Multi-Document Reading Comprehension
A fundamental trade-off between effectiveness and efficiency needs to be
balanced when designing an online question answering system. Effectiveness
comes from sophisticated functions such as extractive machine reading
comprehension (MRC), while efficiency is obtained from improvements in
preliminary retrieval components such as candidate document selection and
paragraph ranking. Given the complexity of the real-world multi-document MRC
scenario, it is difficult to jointly optimize both in an end-to-end system. To
address this problem, we develop a novel deep cascade learning model, which
progressively evolves from the document-level and paragraph-level ranking of
candidate texts to more precise answer extraction with machine reading
comprehension. Specifically, irrelevant documents and paragraphs are first
filtered out with simple functions for efficiency. Then we
jointly train three modules on the remaining texts to better track the
answer: document extraction, paragraph extraction, and answer
extraction. Experimental results show that the proposed method outperforms the
previous state-of-the-art methods on two large-scale multi-document benchmark
datasets, i.e., TriviaQA and DuReader. In addition, our online system can
stably serve typical scenarios with millions of daily requests in less than
50 ms.
Comment: Accepted at AAAI 201
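The coarse-to-fine filtering in this abstract can be sketched as follows. This is only an illustration of the cascade idea, not the paper's model: the token-overlap score stands in for the unspecified "simple functions", the `extract` callable stands in for the reading-comprehension module, and the joint training of the three modules is not shown. All names and the `doc_keep` and `para_keep` parameters are hypothetical.

```python
from typing import Callable, List


def overlap(question: str, text: str) -> float:
    """Cheap relevance score: fraction of question tokens that appear in the text."""
    q = set(question.lower().split())
    t = set(text.lower().split())
    return len(q & t) / len(q) if q else 0.0


def cascade_answer(
    question: str,
    documents: List[List[str]],                # each document is a list of paragraphs
    extract: Callable[[str, List[str]], object],
    doc_keep: int = 2,
    para_keep: int = 2,
):
    # Stage 1: keep only the top-scoring documents.
    docs = sorted(documents, key=lambda d: overlap(question, " ".join(d)), reverse=True)[:doc_keep]
    # Stage 2: keep only the top-scoring paragraphs from the surviving documents.
    paragraphs = [p for d in docs for p in d]
    kept = sorted(paragraphs, key=lambda p: overlap(question, p), reverse=True)[:para_keep]
    # Stage 3: run the expensive extractor on the small remaining set.
    return extract(question, kept)
```

The design point is that the costly extractor only ever sees `para_keep` paragraphs, regardless of how many documents were retrieved.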
Enhanced face detection framework based on skin color and false alarm rejection
Fast and precise face detection is a challenging task in computer vision. Human face detection plays an essential role in the first stage of face processing applications such as recognition, tracking, and image database management. In these applications, faces often occupy an inconsequential part of images that contain variations such as different illumination, pose, and occlusion. These variations can decrease the face detection rate noticeably. In addition, detection time is an important factor, especially in real-time systems. Most existing face detection approaches are not accurate because they cannot handle unstructured images with large appearance variations and can only detect human faces under one particular variation. Existing face detection frameworks need enhancement to detect human faces under the stated variations, improving detection rate and reducing detection time. In this study, an enhanced face detection framework is proposed to improve detection rate based on skin color and a validity process. A preliminary segmentation of input images based on skin color can significantly reduce the search space and accelerate face detection. The main detection process is based on Haar-like features and the AdaBoost algorithm. A validity process is introduced to reject non-face objects that may be selected during face detection. The validity process is based on a two-stage Extended Local Binary Patterns. Experimental results on the CMU-MIT and Caltech 10000 datasets over a wide range of facial variations in color, position, scale, and lighting conditions indicated a successful face detection rate. In conclusion, the proposed enhanced face detection framework, applied to color images with varying lighting conditions and different poses, achieved a high detection rate and reduced overall detection time.
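How a skin-color pre-segmentation shrinks the search space can be shown with a small sketch. The abstract does not state which skin-color model is used, so the explicit RGB thresholds below are an assumption (a commonly cited rule for uniform daylight illumination), and the function names are hypothetical. The detector would then scan only inside the returned bounding box instead of the full image.

```python
from typing import List, Optional, Tuple

Pixel = Tuple[int, int, int]


def is_skin(r: int, g: int, b: int) -> bool:
    """Assumed explicit RGB skin rule; the paper's actual color model may differ."""
    return (
        r > 95 and g > 40 and b > 20
        and max(r, g, b) - min(r, g, b) > 15
        and abs(r - g) > 15
        and r > g and r > b
    )


def skin_bounding_box(image: List[List[Pixel]]) -> Optional[Tuple[int, int, int, int]]:
    """Return (x_min, y_min, x_max, y_max) around all skin pixels, or None if there are none."""
    coords = [
        (x, y)
        for y, row in enumerate(image)
        for x, px in enumerate(row)
        if is_skin(*px)
    ]
    if not coords:
        return None
    xs = [x for x, _ in coords]
    ys = [y for _, y in coords]
    return (min(xs), min(ys), max(xs), max(ys))
```

A single bounding box is the simplest choice; a practical system would more likely extract one region per connected skin component.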
Vision system for the detection of moving people (Sistema de visão para detecção de pessoas em movimento)
Internship carried out at Acronym-I.T. and supervised by Eng. Carlos Silva. Integrated master's thesis. Informatics and Computing Engineering. Universidade do Porto. Faculdade de Engenharia. 201
Tracking the emergence of visual recognition through multivariate approaches
Thesis (Ph. D.)--Massachusetts Institute of Technology, Dept. of Brain and Cognitive Sciences, 2011. Cataloged from PDF version of thesis. Includes bibliographical references (p. 121-130). The visual system is a dynamic entity whose response properties depend on context and experience. In this thesis, I examine how the brain changes as we learn to see: what changes occur during the onset of recognition, in the mature visual system on the one hand, and in a developmentally nascent one, on the other? Working with normal adults, I focus on the processes that underlie the interpretation of images as meaningful entities. This interpretation is greatly facilitated by prior information about a stimulus. What are the neural sites that exhibit experience-dependent changes? Using multivariate decoding techniques, I find pervasive evidence of such changes throughout the visual system. Critically, cortical regions previously implicated in such learning are not the same loci as sites of increased information. Examining the temporal mechanisms of recognition, I identify the perceptual state transitions corresponding to the onset of meaning in an observed image. Furthermore, decoding techniques reveal the flow of information during this 'eureka moment.' I find feedback processing when a degraded image is first meaningfully interpreted, and then a rapid transition into feed-forward processing for more coherent images. Complementing the studies with mature subjects, my work with developmentally nascent observers explores the genesis of visual interpretation. What neural changes accompany the earliest stages of visual learning? I show that children treated for congenital blindness exhibit significant cortical re-organization after sight onset, in contrast to the classical notion of a critical period for visual plasticity. The specific kind of reorganization suggests that visual experience enhances information coding efficiency in visual cortex.
Additionally, I present evidence of rapid development of functionally specialized cortical regions. Overall, the thesis presents two complementary perspectives on the genesis of visual meaning. The results help advance our understanding of how short-term experience, as well as developmental history, shapes our interpretation of the complex visual world. by Scott Gorlin. Ph.D.
Video content analysis for automated detection and tracking of humans in CCTV surveillance applications
This thesis was submitted for the degree of Doctor of Philosophy and awarded by Brunel University. The problems of achieving a high detection rate with a low false alarm rate for human detection and tracking in video sequences, performance scalability, and improving response time are addressed in this thesis. The underlying causes are the effect of scene complexity, human-to-human interactions, scale changes, and scene background-human interactions. A two-stage processing solution, namely human detection and human tracking, with two novel pattern classifiers is presented. Scale-independent human detection is achieved by processing in the wavelet domain using square wavelet features. These features, used to characterise human silhouettes at different scales, are similar to the rectangular features used in [Viola 2001]. At the detection stage two detectors are combined to improve detection rate. The first detector is based on the shape outline of humans extracted from the scene using a reduced-complexity outline extraction algorithm. A shape mismatch measure is used to differentiate between the human and the background class. The second detector uses rectangular features as primitives for silhouette description in the wavelet domain. The marginal distribution of features collocated at a particular position on a candidate human (a patch of the image) is used to describe the silhouette statistically. Two similarity measures are computed between a candidate human and the model histograms of the human and non-human classes. These similarity measures are used to discriminate between the human and the non-human class. At the tracking stage, a tracker based on the joint probabilistic data association filter (JPDAF) for data association and motion correspondence is presented. Track clustering is used to reduce hypothesis enumeration complexity.
To improve response time as frame dimension, scene complexity, and number of channels increase, a scalable algorithmic architecture and an operating accuracy prediction technique are presented. A scheduling strategy for improving the response time and throughput by parallel processing is also presented.
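The histogram comparison used by the second detector in this abstract can be sketched as below. The abstract does not name the similarity measure, so the Bhattacharyya coefficient is an assumption chosen for illustration, and the function names and the simple "larger similarity wins" rule are hypothetical.

```python
import math
from typing import List, Sequence


def normalize(hist: Sequence[float]) -> List[float]:
    """Scale a histogram of counts so its bins sum to 1."""
    total = sum(hist)
    if total <= 0:
        raise ValueError("histogram must have positive total mass")
    return [h / total for h in hist]


def bhattacharyya(p: Sequence[float], q: Sequence[float]) -> float:
    """Bhattacharyya coefficient: 1.0 for identical distributions, 0.0 for disjoint ones."""
    return sum(math.sqrt(a * b) for a, b in zip(normalize(p), normalize(q)))


def classify(candidate, human_model, non_human_model) -> str:
    """Assign the candidate patch to the class whose model histogram it resembles more."""
    s_human = bhattacharyya(candidate, human_model)
    s_non_human = bhattacharyya(candidate, non_human_model)
    return "human" if s_human > s_non_human else "non-human"
```

For example, a candidate histogram `[8, 1, 1]` is closer to a human model `[7, 2, 1]` than to a non-human model `[1, 1, 8]`, so it is labelled `"human"`.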