72 research outputs found

    Object Detection: Current and Future Directions

    Object detection and segmentation using discriminative learning

    Object detection and segmentation algorithms need to use prior knowledge of objects' shape and appearance to guide solutions toward correct ones. A promising way of obtaining prior knowledge is to learn it directly from expert annotations by using machine learning techniques. Previous approaches commonly use generative learning to achieve this goal. In this dissertation, I propose a series of discriminative learning algorithms based on boosting principles to learn prior knowledge from image databases with expert annotations. The learned knowledge improves the performance of detection and segmentation, leading to fast and accurate solutions. For object detection, I present a learning procedure called a Probabilistic Boosting Network (PBN) suitable for real-time object detection and pose estimation. Based on the law of total probability, PBN integrates evidence from two building blocks, namely a multiclass classifier for pose estimation and a detection cascade for object detection. Both the classifier and detection cascade employ boosting. By inferring the pose parameter, I avoid the exhaustive scan over pose parameters, which hampers real-time detection. I implement PBN using a graph-structured network that alternates the two tasks of object detection and pose estimation in an effort to reject negative cases as quickly as possible. Compared with previous approaches, PBN has higher accuracy in object localization and pose estimation with noticeably reduced computation. For object segmentation, I cast deformable object segmentation as optimizing the conditional probability density function p(C|I), where I is an image and C is a vector of model parameters describing the object shape. I propose a regression approach to learn the density p(C|I) discriminatively based on boosting principles. The learned density p(C|I) possesses a desired unimodal, smooth shape, which can be used by optimization algorithms to efficiently estimate a solution. 
To handle the high-dimensional learning challenges, I propose a multi-level approach and a gradient-based sampling strategy to learn regression functions efficiently. I show that the regression approach consistently outperforms state-of-the-art methods on a variety of testing datasets. Finally, I present a comparative study on how to apply three discriminative learning approaches (classification, regression, and ranking) to deformable shape segmentation. I discuss how to extend the idea of the regression approach to build discriminative models using classification and ranking. I propose sampling strategies to collect training examples from a high-dimensional model space for the classification and the ranking approaches. I also propose a ranking algorithm based on RankBoost to learn a discriminative model for segmentation. Experimental results on left ventricle and left atrium segmentation from ultrasound images and facial feature localization demonstrate that the discriminative models outperform generative models and energy minimization methods by a large margin.
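
As a rough illustration of how the law of total probability can combine a pose classifier with pose-specific detectors, the sketch below marginalizes the detector response over a small set of discretized poses. The function name, the pose labels, and all numbers are hypothetical and are not taken from the dissertation; this is a minimal sketch of the marginalization step only, not of the PBN network itself.

```python
def detection_probability(pose_posterior, detector_given_pose):
    """Marginalize pose-specific detector outputs over the pose posterior.

    pose_posterior:      dict mapping pose -> p(pose | I), assumed to sum to 1
    detector_given_pose: dict mapping pose -> p(object | pose, I)
    Returns p(object | I) = sum over poses of p(object | pose, I) * p(pose | I).
    """
    return sum(p * detector_given_pose[pose] for pose, p in pose_posterior.items())


# Hypothetical outputs for three discretized poses (illustrative values only).
pose_posterior = {"left": 0.7, "frontal": 0.2, "right": 0.1}
detector_given_pose = {"left": 0.9, "frontal": 0.4, "right": 0.1}

p_object = detection_probability(pose_posterior, detector_given_pose)
print(round(p_object, 2))
```

Because the pose posterior concentrates on a few poses, a practical system could evaluate only the detectors for the most probable poses instead of scanning all of them, which is the saving the abstract attributes to inferring the pose parameter.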

    An Efficient Boosted Classifier Tree-Based Feature Point Tracking System for Facial Expression Analysis

    The study of facial movement and expression has been a prominent area of research since the early work of Charles Darwin. The Facial Action Coding System (FACS), developed by Paul Ekman, introduced the first universal method of coding and measuring facial movement. Human-Computer Interaction seeks to make human interaction with computer systems more effective, easier, safer, and more seamless. Facial expression recognition can be broken down into three distinctive subsections: Facial Feature Localization, Facial Action Recognition, and Facial Expression Classification. The first and most important stage in any facial expression analysis system is the localization of key facial features. Localization must be accurate and efficient to ensure reliable tracking and leave time for computation and comparisons to learned facial models while maintaining real-time performance. Two possible methods for localizing facial features are discussed in this dissertation. The Active Appearance Model is a statistical model describing an object's parameters through the use of both shape and texture models, resulting in appearance. Statistical model-based training for object recognition takes multiple instances of the object class of interest, or positive samples, and multiple negative samples, i.e., images that do not contain objects of interest. Viola and Jones present a highly robust real-time face detection system, and a statistically boosted attentional detection cascade composed of many weak feature detectors. A basic algorithm for the elimination of unnecessary sub-frames while using Viola-Jones face detection is presented to further reduce image search time. A real-time emotion detection system is presented which is capable of identifying seven affective states (agreeing, concentrating, disagreeing, interested, thinking, unsure, and angry) from a near-infrared video stream. 
The Active Appearance Model is used to place 23 landmark points around key areas of the eyes, brows, and mouth. A prioritized binary decision tree then detects, based on the actions of these key points, whether one of the seven emotional states occurs as frames pass. The completed system runs accurately and achieves a real-time frame rate of approximately 36 frames per second. A novel facial feature localization technique utilizing a nested cascade classifier tree is proposed. A coarse-to-fine search is performed in which the regions of interest are defined by the response of Haar-like features comprising the cascade classifiers. The individual responses of the Haar-like features are also used to activate finer-level searches. A specially cropped training set derived from the Cohn-Kanade AU-Coded database is also developed and tested. Extensions of this research include further testing to verify the novel facial feature localization technique presented for a full 26-point face model, and implementation of a real-time intensity-sensitive automated Facial Action Coding System.
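
The early-rejection idea behind an attentional detection cascade can be sketched as follows. The stage functions and thresholds here are toy stand-ins chosen for illustration, not boosted Haar-like feature classifiers and not the nested classifier tree the abstract proposes; `run_cascade` is a hypothetical helper name.

```python
def run_cascade(window, stages):
    """Evaluate stages in order and stop at the first rejection.

    window: a flat list of pixel intensities in [0, 1]
    stages: list of (score_function, threshold) pairs, cheapest first
    Returns (accepted, number_of_stages_evaluated).
    """
    for index, (score, threshold) in enumerate(stages, start=1):
        if score(window) < threshold:
            return False, index  # rejected early; later stages are skipped
    return True, len(stages)


# Toy stages (assumptions for illustration only).
stages = [
    (lambda w: sum(w) / len(w), 0.2),  # mean intensity
    (lambda w: max(w) - min(w), 0.3),  # contrast
    (lambda w: w[0] - w[-1], 0.1),     # crude two-region difference
]

print(run_cascade([0.0, 0.1, 0.1, 0.0], stages))  # rejected at the first stage
print(run_cascade([0.9, 0.8, 0.3, 0.2], stages))  # passes all three stages
```

Most windows in an image contain no face, so rejecting them after one or two cheap stages is what keeps the average cost per window low.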

    Matryoshka Representation Learning

    Learned representations are a central component in modern ML systems, serving a multitude of downstream tasks. When training such representations, it is often the case that computational and statistical constraints for each downstream task are unknown. In this context, rigid, fixed-capacity representations can be either over- or under-accommodating to the task at hand. This leads us to ask: can we design a flexible representation that can adapt to multiple downstream tasks with varying computational resources? Our main contribution is Matryoshka Representation Learning (MRL), which encodes information at different granularities and allows a single embedding to adapt to the computational constraints of downstream tasks. MRL minimally modifies existing representation learning pipelines and imposes no additional cost during inference and deployment. MRL learns coarse-to-fine representations that are at least as accurate and rich as independently trained low-dimensional representations. The flexibility within the learned Matryoshka Representations offers: (a) up to 14x smaller embedding size for ImageNet-1K classification at the same level of accuracy; (b) up to 14x real-world speed-ups for large-scale retrieval on ImageNet-1K and 4K; and (c) up to 2% accuracy improvements for long-tail few-shot classification, all while being as robust as the original representations. Finally, we show that MRL extends seamlessly to web-scale datasets (ImageNet, JFT) across various modalities -- vision (ViT, ResNet), vision + language (ALIGN) and language (BERT). MRL code and pretrained models are open-sourced at https://github.com/RAIVNLab/MRL. Comment: 35 pages, 12 figures. NeurIPS 2022 camera-ready publication.
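
One way a single embedding might be adapted to a smaller compute budget is to keep only its leading coordinates and renormalize. The sketch below assumes the embedding was trained so that its prefixes are meaningful on their own, which is what the abstract describes; the helper names and example vectors are hypothetical and are not taken from the MRL repository.

```python
import math


def truncate_and_normalize(embedding, dim):
    """Keep the first `dim` coordinates and rescale them to unit length."""
    prefix = list(embedding[:dim])
    norm = math.sqrt(sum(x * x for x in prefix))
    if norm == 0.0:
        return prefix  # leave an all-zero prefix unchanged
    return [x / norm for x in prefix]


def cosine_similarity(a, b):
    """Dot product of two unit-length vectors of equal dimension."""
    return sum(x * y for x, y in zip(a, b))


# Hypothetical 4-dimensional embeddings (illustrative values only).
query = [3.0, 4.0, 12.0, 1.0]
item = [6.0, 8.0, 0.5, 2.0]

small_query = truncate_and_normalize(query, 2)
small_item = truncate_and_normalize(item, 2)
print(small_query, cosine_similarity(small_query, small_item))
```

A retrieval system could shortlist candidates with the short prefix and re-rank the shortlist with the full embedding, trading a small amount of accuracy for lower search cost.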

    Efficient Human Pose Estimation with Image-dependent Interactions

    Human pose estimation from 2D images is one of the most challenging and computationally demanding problems in computer vision. Standard models such as Pictorial Structures consider interactions between kinematically connected joints or limbs, leading to inference cost that is quadratic in the number of pixels. As a result, researchers and practitioners have restricted themselves to simple models which only measure the quality of limb-pair possibilities by their 2D geometric plausibility. In this talk, we propose novel methods which allow for efficient inference in richer models with data-dependent interactions. First, we introduce structured prediction cascades, a structured analog of binary cascaded classifiers, which learn to focus computational effort where it is needed, filtering out many states cheaply while ensuring the correct output is unfiltered. Second, we propose a way to decompose models of human pose with cyclic dependencies into a collection of tree models, and provide novel methods to impose model agreement. Finally, we develop a local linear approach that learns bases centered around modes in the training data, giving us image-dependent local models which are fast and accurate. These techniques allow for sparse and efficient inference on the order of minutes or seconds per image. As a result, we can afford to model pairwise interaction potentials much more richly with data-dependent features such as contour continuity, segmentation alignment, color consistency, optical flow and multiple modes. We show empirically that these richer models are worthwhile, obtaining significantly more accurate pose estimation on popular datasets.

    Pattern Recognition

    Pattern recognition is a very wide research field. It involves factors as diverse as sensors, feature extraction, pattern classification, decision fusion, applications and others. The signals processed are commonly one-, two- or three-dimensional; the processing is done in real time or takes hours or days; some systems look for one narrow object class, while others search huge databases for entries with at least a small amount of similarity. No single person can claim expertise across the whole field, which develops rapidly, updates its paradigms and comprehends several philosophical approaches. This book reflects this diversity by presenting a selection of recent developments within the area of pattern recognition and related fields. It covers theoretical advances in classification and feature extraction as well as application-oriented works. Authors of these 25 works present and advocate recent achievements of their research related to the field of pattern recognition.

    Better prognostic markers for nonmuscle invasive papillary urothelial carcinomas

    Bladder cancer is a common type of cancer, especially among men in developed countries. Most cancers in the urinary bladder are papillary urothelial carcinomas. They are characterized by a high recurrence frequency (up to 70%) after local resection. It is crucial for prognosis to discover these recurrent tumours at an early stage, especially before they become muscle-invasive. Reliable prognostic biomarkers for tumour recurrence and stage progression are lacking. This is why patients diagnosed with a non-muscle invasive bladder cancer follow extensive follow-up regimens with possible serious side effects and with high costs for the healthcare systems. WHO grade and tumour stage are two central biomarkers currently having great impact on both treatment decisions and follow-up regimens. However, there are concerns regarding the reproducibility of WHO grading, and stage classification is challenging in small and fragmented tumour material. In Paper I, we examined the reproducibility and the prognostic value of all the individual microscopic features making up the WHO grading system. Among thirteen extracted features there was considerable variation in both reproducibility and prognostic value. The only feature that was both reasonably reproducible and statistically significantly prognostic was cell polarity. We concluded that further validation studies are needed on these features, and that future grading systems should be based on well-defined features with true prognostic value. With the implementation of immunotherapy, there is increasing interest in tumour immune response and the tumour microenvironment. In a search for better prognostic biomarkers for tumour recurrence and stage progression, in Paper II, we investigated the prognostic value of tumour infiltrating immune cells (CD4, CD8, CD25 and CD138) and previously investigated cell proliferation markers (Ki-67, PPH3 and MAI). Low Ki-67 and tumour multifocality were associated with increased recurrence risk. 
Recurrence risk was not affected by the composition of immune cells. For stage progression, the only prognostic immune cell marker was CD25. High values for MAI were also strongly associated with stage progression. However, in a multivariate analysis, the most prognostic feature was a combination of MAI and CD25. BCG instillations in the bladder are indicated in intermediate and high-risk non-muscle invasive bladder cancer patients. This old-fashioned immunotherapy has proved to reduce both recurrence and progression risk, although it is frequently followed by unpleasant side effects. As many as 30-50% of high-risk patients receiving BCG instillations fail by developing high-grade recurrences. They not only suffer from unnecessary side effects but also experience a delay in further treatment. Together with colleagues at three different Dutch hospitals, in Paper III, we looked at the prognostic and predictive value of T1-substaging. A T1-tumour invades the lamina propria, and we wanted to separate those with micro-invasion from those with extensive invasion. We found that BCG failure was more common among patients with extensive invasion. Furthermore, T1-substaging was associated with both high-grade recurrence-free and progression-free survival. Finally, in Paper IV, we wanted to investigate the prognostic value of two classical immunohistochemical markers, p53 and CK20, and compare them with previously investigated proliferation markers. p53 is a surrogate marker for mutations in the gene TP53, considered to be a main characteristic of muscle-invasive tumours. CK20 is a surrogate marker for luminal tumours in the molecular classification of bladder cancer, and is frequently used to distinguish reactive urothelial changes from urothelial carcinoma in situ. We found positivity for both p53 and CK20 to be significantly associated with stage progression, although neither performed better than WHO grade and stage. 
The proliferation marker MAI had the highest prognostic value in our study. No combination of variables performed better in a multivariate analysis than MAI alone.