
    Multiclass Alignment of Confidence and Certainty for Network Calibration

    Deep neural networks (DNNs) have made great strides in pushing the state-of-the-art in several challenging domains. Recent studies reveal that they are prone to making overconfident predictions, which greatly reduces overall trust in model predictions, especially in safety-critical applications. Early work on improving model calibration employs post-processing techniques that rely on limited parameters and require a hold-out set. Some recent train-time calibration methods, which involve all model parameters, can outperform the post-processing methods. To this end, we propose a new train-time calibration method featuring a simple, plug-and-play auxiliary loss: multi-class alignment of predictive mean confidence and predictive certainty (MACC). It is based on the observation that model miscalibration is directly related to predictive certainty: a larger gap between mean confidence and certainty indicates poorer calibration, for both in-distribution and out-of-distribution predictions. Armed with this insight, our proposed loss explicitly encourages a confident (or underconfident) model to also produce a low (or high) spread in the pre-softmax distribution. Extensive experiments on ten challenging datasets, covering in-domain, out-of-domain, non-visual recognition, and medical image classification scenarios, show that our method achieves state-of-the-art calibration performance for both in-domain and out-of-domain predictions. Our code and models will be publicly released. Comment: Accepted at GCPR 202
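The confidence/certainty alignment idea can be sketched as a toy auxiliary loss. Everything below is an illustrative assumption, not the paper's exact MACC formulation: in particular the certainty proxy (inverse logit spread) and the per-class gap penalty are stand-ins.

```python
import numpy as np

def macc_auxiliary_loss(logits):
    """Toy confidence/certainty alignment penalty (illustrative, not MACC itself).

    Per class, compare the mean predictive confidence (mean softmax probability
    over the batch) with a simple certainty proxy derived from the spread of
    the pre-softmax logits, and penalize the gap.
    """
    # Numerically stable softmax over classes for each sample.
    z = logits - logits.max(axis=1, keepdims=True)
    probs = np.exp(z) / np.exp(z).sum(axis=1, keepdims=True)

    # Mean confidence per class over the batch.
    mean_confidence = probs.mean(axis=0)

    # Certainty proxy: low logit spread -> high certainty, squashed into (0, 1].
    spread = logits.std(axis=0)
    certainty = 1.0 / (1.0 + spread)

    # A well-aligned model keeps mean confidence and certainty close per class.
    return np.abs(mean_confidence - certainty).mean()
```

Added to the task loss with a small weight, a term of this shape pushes confident predictions toward tight pre-softmax distributions and vice versa.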

    Visual tracking over multiple temporal scales

    Visual tracking is the task of repeatedly inferring the state (position, motion, etc.) of the desired target in an image sequence. It is an important scientific problem, as humans can visually track targets in a broad range of settings. However, visual tracking algorithms struggle to robustly follow a target in unconstrained scenarios. Among the many challenges faced by visual trackers, two important ones are occlusions and abrupt motion variations. Occlusions take place when one or more other objects obscure the camera's view of the tracked target. A target may exhibit abrupt variations in apparent motion due to its own unexpected movement, camera movement, or low-frame-rate image acquisition. Each of these issues can cause a tracker to lose its target. This thesis introduces the idea of learning and propagating tracking information over multiple temporal scales to overcome occlusions and abrupt motion variations. A temporal scale is a specific sequence of moments in time. Models (describing appearance and/or motion of the target) can be learned from the target's tracking history over multiple temporal scales and applied over multiple temporal scales in the future. With the rise of multiple-motion-model tracking frameworks, there is a need for a broad range of search methods and ways of selecting between the available motion models. The potential benefits of learning over multiple temporal scales are first assessed by studying both motion and appearance variations in the ground-truth data associated with several image sequences. A visual tracker operating over multiple temporal scales is then proposed that is capable of handling occlusions and abrupt motion variations. Experiments are performed to compare the performance of the tracker with competing methods, and to analyze the impact on performance of various elements of the proposed approach. Results reveal a simple, yet general framework for dealing with occlusions and abrupt motion variations.
In refining the proposed framework, a search method is generalized for multiple competing hypotheses in visual tracking, and a new motion model selection criterion is proposed.
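One way to picture "multiple temporal scales" is a bank of motion models, each fit to a different length of tracking history, with the model that best explains the latest observation winning. The sketch below is a minimal illustration under assumed simplifications (1D positions, constant-velocity models, hand-picked scales), not the thesis's actual tracker or selection criterion.

```python
import numpy as np

def predict_from_scale(history, scale):
    """Constant-velocity prediction fit to the last `scale` frames."""
    window = history[-scale:]
    velocity = (window[-1] - window[0]) / max(len(window) - 1, 1)
    return window[-1] + velocity

def select_motion_model(history, observation, scales=(2, 5, 10)):
    """Pick the temporal scale whose prediction best matches the observation."""
    errors = {s: np.linalg.norm(predict_from_scale(history, s) - observation)
              for s in scales if len(history) >= s}
    return min(errors, key=errors.get)
```

After an abrupt speed change, short-history models adapt first, so the selector naturally shifts toward small scales; during smooth motion or brief occlusion, longer histories give steadier predictions.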

    Synergy between face alignment and tracking via Discriminative Global Consensus Optimization

    An open question in facial landmark localization in video is whether one should perform tracking or tracking-by-detection (i.e. face alignment). Tracking produces fittings of high accuracy but is prone to drifting. Tracking-by-detection is drift-free but results in low-accuracy fittings. To provide a solution to this problem, we describe the very first, to the best of our knowledge, synergistic approach between detection (face alignment) and tracking which completely eliminates drifting from face tracking, and does not merely perform tracking-by-detection. Our first main contribution is to show that one can achieve this synergy between detection and tracking using a principled optimization framework based on the theory of Global Variable Consensus Optimization using ADMM. Our second contribution is to show how the proposed analytic framework can be integrated within state-of-the-art discriminative methods for face alignment and tracking based on cascaded regression and deeply learned features. Overall, we call our method the Discriminative Global Consensus Model (DGCM). Our third contribution is to show that DGCM achieves a large performance improvement over the currently best-performing face tracking methods on the most challenging category of the 300-VW dataset.
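Global Variable Consensus Optimization with ADMM can be illustrated on a toy problem: several local variables, each pulled toward its own target, constrained to agree on one global value. The quadratic local objective below is an assumption chosen for a closed-form update; DGCM uses discriminative alignment/tracking objectives instead.

```python
import numpy as np

def consensus_admm(local_targets, rho=1.0, iters=100):
    """Minimal global-variable consensus ADMM sketch.

    Solves: min_x sum_i 0.5 * (x_i - a_i)^2  subject to  x_i = z for all i,
    whose solution is z = mean(a). Each iteration runs the three standard
    ADMM steps: local x-update, global z-update, scaled dual update.
    """
    a = np.asarray(local_targets, dtype=float)
    z = 0.0
    u = np.zeros(a.size)  # scaled dual variables
    for _ in range(iters):
        # x-update: closed form for the quadratic local objective.
        x = (a + rho * (z - u)) / (1.0 + rho)
        # z-update: averaging enforces consensus across local copies.
        z = (x + u).mean()
        # Dual update accumulates the consensus residual.
        u = u + x - z
    return z
```

In a detection-plus-tracking setting one can think of each local objective as one cue's fitting cost, with the consensus variable z being the agreed-upon landmark configuration.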


    Generalizing to Unseen Domains in Diabetic Retinopathy Classification

    Diabetic retinopathy (DR) is caused by long-standing diabetes and is the fifth leading cause of visual impairment. Early diagnosis and treatment can help cure the disease; however, the detection procedure is rather challenging and mostly tedious. Therefore, automated diabetic retinopathy classification using deep learning techniques has gained interest in the medical imaging community. Akin to several other real-world applications of deep learning, the typical assumption of i.i.d. data is also violated in DR classification that relies on deep learning. Therefore, developing DR classification methods robust to unseen distributions is of great value. In this paper, we study the problem of generalizing a model to unseen distributions or domains (a.k.a. domain generalization) in DR classification. To this end, we propose a simple and effective domain generalization (DG) approach that achieves self-distillation in vision transformers (ViTs) via a novel prediction softening mechanism. This prediction softening is an adaptive convex combination of one-hot labels with the model's own knowledge. We perform extensive experiments on challenging open-source DR classification datasets under both multi-source and single-source DG settings with three different ViT backbones to establish the efficacy and applicability of our approach against competing methods. For the first time, we report the performance of several state-of-the-art DG methods on open-source DR classification datasets after conducting thorough experiments. Finally, our method also delivers better calibration performance than competing methods, showing its suitability for safety-critical applications, including healthcare. We hope that our contributions will encourage more DG research across the medical imaging community. Comment: Accepted at WACV 202
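The prediction-softening step described above is a convex combination of the one-hot label and the model's own predicted distribution. The sketch below shows only that combination; how the mixing coefficient alpha is adapted is the paper's mechanism and is not reproduced here.

```python
import numpy as np

def soften_labels(one_hot, model_probs, alpha):
    """Convex combination of one-hot labels and the model's own predictions.

    target = alpha * one_hot + (1 - alpha) * model_probs
    `alpha` may be a scalar or one value per sample; both inputs are assumed
    to be valid probability distributions over classes (rows sum to 1).
    """
    alpha = np.asarray(alpha, dtype=float).reshape(-1, 1)
    return alpha * one_hot + (1.0 - alpha) * model_probs
```

Because both inputs are probability distributions and the weights sum to 1, each softened target row is again a valid distribution, suitable as a self-distillation target.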

    DNA-based Eye Color Prediction of Pakhtun Population Living in District Swat KP Pakistan

    Background: Forensic DNA Phenotyping (FDP), the prediction of Externally Visible Characteristics (EVCs) from a DNA sample, has gained importance in the forensic community over the last decade or so. When traditional forensic DNA typing via Short Tandem Repeats (STRs) fails due to the absence of a reference sample, an individual can still be traced from a DNA sample using FDP. Among the many available EVCs, eye colour is one character that can be predicted with the previously developed IrisPlex system using a Single Nucleotide Polymorphism (SNP) assay. In this study, we applied the IrisPlex system to samples collected from the population of District Swat to predict eye colour from DNA. Method: Eye-colour digital photographs and buccal swab samples were collected from 267 Pakhtun individuals of District Swat. Any person with eye disease was excluded from the study. Genomic DNA was extracted by the phenol-chloroform method. The amplified SNPs were typed using Multiplexed Single Base Extension (SBE). Genotypes were converted to eye-colour phenotypes with the IrisPlex online tool, and correlations were examined between SNPs, gender, pie score, and eye colour. Result: Brown eye colour was more prevalent than intermediate and blue. Brown eye colour was more frequent in females, while intermediate and blue were more frequent in males. Three SNPs, rs12913832 (HERC2 gene), rs1393350 (TYR gene), and rs1800407 (OCA2 gene), were strongly associated with eye colour. The pie score was also significantly associated with eye colour and the rs12913832 SNP. IrisPlex analysis was performed in 20 individuals of District Swat; the prediction accuracy for blue or brown eyes was 100% in the studied individuals. However, the IrisPlex tool predicted the intermediate phenotype incorrectly as brown or blue. Conclusion: The data show that intermediate eye colour was not predicted accurately; therefore, inclusion of more SNPs in the IrisPlex system is needed to predict intermediate eye colour accurately. Keywords: Eye colour, IrisPlex, SNPs, Multiplex genotyping, DNA, District Swa
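Predictors in the spirit of IrisPlex map minor-allele counts at a handful of SNPs to class probabilities via a multinomial-logistic model. The sketch below shows only that shape; the weight matrix and bias are placeholders, NOT the published IrisPlex parameters, and the SNP list in the docstring is taken from the abstract for illustration.

```python
import numpy as np

def predict_eye_colour(minor_allele_counts, weights, bias):
    """Toy multinomial-logistic eye-colour predictor (placeholder parameters).

    `minor_allele_counts` is a vector of 0/1/2 counts per SNP (e.g. for
    rs12913832, rs1393350, rs1800407). Returns a class -> probability dict.
    """
    scores = weights @ np.asarray(minor_allele_counts, dtype=float) + bias
    z = scores - scores.max()            # numerically stable softmax
    probs = np.exp(z) / np.exp(z).sum()
    return dict(zip(["blue", "intermediate", "brown"], probs))
```

The abstract's finding, that intermediate phenotypes collapse onto blue or brown, corresponds in this picture to the middle class rarely receiving the highest probability without additional informative SNPs.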

    Unsupervised Landmark Discovery Using Consistency Guided Bottleneck

    We study the challenging problem of unsupervised discovery of object landmarks. Many recent methods rely on bottlenecks to generate 2D Gaussian heatmaps; however, these are limited in generating informative heatmaps during training, presumably due to the lack of effective structural cues. Also, it is assumed that all predicted landmarks are semantically relevant despite there being no ground-truth supervision. In the current work, we introduce a consistency-guided bottleneck in an image-reconstruction-based pipeline that leverages landmark consistency, a measure of compatibility with the pseudo-ground truth, to generate adaptive heatmaps. We propose obtaining pseudo-supervision by forming landmark correspondences across images. The consistency then modulates the uncertainty of the discovered landmarks in the generation of adaptive heatmaps, which rank consistent landmarks above their noisy counterparts, providing effective structural information for improved robustness. Evaluations on five diverse datasets, including MAFL, AFLW, LS3D, Cats, and Shoes, demonstrate excellent performance of the proposed approach compared to the existing state-of-the-art methods. Our code is publicly available at https://github.com/MamonaAwan/CGB_ULD. Comment: Accepted ORAL at BMVC 2023; Code: https://github.com/MamonaAwan/CGB_UL
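The adaptive-heatmap idea can be sketched as a 2D Gaussian whose sharpness is modulated by a consistency score: consistent landmarks get tight, confident peaks, noisy ones get diffuse maps. The mapping from consistency to sigma below is an illustrative assumption, not the paper's exact modulation.

```python
import numpy as np

def adaptive_heatmap(center, consistency, size=32, base_sigma=4.0):
    """Render a 2D Gaussian heatmap whose sharpness grows with consistency.

    `consistency` in (0, 1]: values near 1 yield a tight peak (confident
    landmark), small values spread the mass out (uncertain landmark).
    """
    sigma = base_sigma / max(consistency, 1e-3)  # high consistency -> small sigma
    ys, xs = np.mgrid[0:size, 0:size]
    cy, cx = center
    return np.exp(-((xs - cx) ** 2 + (ys - cy) ** 2) / (2.0 * sigma ** 2))
```

In a reconstruction pipeline, the sharper maps of consistent landmarks carry more spatial information to the decoder, effectively ranking them above their noisy counterparts.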

    Prevalence of Diabetic Retinopathy and Correlation with HbA1c in Patients Admitted in Khyber Teaching Hospital Peshawar

    Objective: To determine the prevalence of diabetic retinopathy in patients admitted to Khyber Teaching Hospital Peshawar and to correlate different stages of diabetic retinopathy with HbA1c levels. Methodology: This cross-sectional study was conducted at the Department of Ophthalmology, Khyber Teaching Hospital, MTI, Peshawar, from December 2019 to May 2020. All patients over the age of 15 years who were diagnosed with diabetes mellitus were included in the study, while patients with cataract or retinopathy due to other pathologies were excluded. All diabetic patients were admitted through the outpatient department. In the ward, their blood pressure was recorded and HbA1c levels were measured. Visual acuity (VA) was checked. Screening for diabetic retinopathy was done by a consultant ophthalmologist using Optos ultra-widefield imaging of the retina and Optical Coherence Tomography (OCT) of the macula, to establish the stage of diabetic retinopathy and the presence of diabetic macular edema, respectively. Results: A total of 103 diabetic patients were included. Their retinas were photographed, viewed, and analyzed. Diabetic retinopathy, irrespective of type, was found in 69 patients, giving a prevalence of 66.9%. Patients with lower HbA1c values (below 6%) showed no evidence of DR. The clustering of the majority of patients with diabetic retinopathy at HbA1c levels of 8 to 12% showed a significant relationship between high blood sugar levels and retinopathy severity. Conclusion: The high frequency of retinopathy in our study is alarming, considering that it is one of the leading causes of blindness in the working class. It is highly recommended that routine ophthalmologic examination be carried out along with optimal diabetic control.
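The headline figure is simple arithmetic: prevalence is the number of cases over the number of patients examined, expressed as a percentage. A trivial helper makes the calculation explicit (69 of 103 gives approximately 66.99%, reported as 66.9% in the abstract).

```python
def prevalence(cases, total):
    """Point prevalence as a percentage: cases observed / patients examined."""
    return 100.0 * cases / total
```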