188 research outputs found
Multiclass Alignment of Confidence and Certainty for Network Calibration
Deep neural networks (DNNs) have made great strides in pushing the
state-of-the-art in several challenging domains. Recent studies reveal that
they are prone to making overconfident predictions. This greatly reduces the
overall trust in model predictions, especially in safety-critical applications.
Early work in improving model calibration employs post-processing techniques
which rely on limited parameters and require a hold-out set. Some recent
train-time calibration methods, which involve all model parameters, can
outperform the post-processing methods. Motivated by this, we propose a new train-time
calibration method, which features a simple, plug-and-play auxiliary loss known
as multi-class alignment of predictive mean confidence and predictive certainty
(MACC). It is based on the observation that a model's miscalibration is
directly related to its predictive certainty, so a larger gap between the mean
confidence and certainty amounts to poorer calibration, both for in-distribution
and out-of-distribution predictions. Armed with this insight, our proposed loss
explicitly encourages a confident (or underconfident) model to also provide a
low (or high) spread in the presoftmax distribution. Extensive experiments on
ten challenging datasets, covering in-domain, out-domain, non-visual
recognition and medical image classification scenarios, show that our method
achieves state-of-the-art calibration performance for both in-domain and
out-domain predictions. Our code and models will be publicly released.
Comment: Accepted at GCPR 202
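The confidence–certainty gap that drives the loss can be illustrated with a small sketch. This is not the paper's exact formulation: here, mean confidence is the averaged softmax probability of the predicted class over several stochastic forward passes, and certainty is proxied by the fraction of passes that agree on that class.

```python
import numpy as np

def softmax(z, axis=-1):
    """Numerically stable softmax."""
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def confidence_certainty_gap(logit_samples):
    """Gap between mean confidence and a certainty proxy for one input.

    logit_samples: (S, C) array of S stochastic forward passes
    (e.g. MC-dropout). Certainty is proxied by the fraction of
    passes agreeing with the overall prediction; the paper's
    definitions differ, this only illustrates the alignment idea.
    """
    probs = softmax(logit_samples)                     # (S, C)
    mean_probs = probs.mean(axis=0)                    # averaged distribution
    pred = mean_probs.argmax()                         # predicted class
    mean_conf = mean_probs[pred]                       # mean confidence
    agreement = (probs.argmax(axis=1) == pred).mean()  # certainty proxy
    return abs(mean_conf - agreement)                  # auxiliary penalty
```

A loss of this shape pushes a model whose samples all agree (high certainty) to also be numerically confident, and vice versa, which is the alignment the abstract describes.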
Visual tracking over multiple temporal scales
Visual tracking is the task of repeatedly inferring the state (position, motion, etc.) of the desired target in an image sequence. It is an important scientific problem as humans can visually track targets in a broad range of settings. However, visual tracking algorithms struggle to robustly follow a target in unconstrained scenarios. Among the many challenges faced by visual trackers, two important ones are occlusions and abrupt motion variations. Occlusions take place when (an)other object(s) obscures the camera's view of the tracked target. A target may exhibit abrupt variations in apparent motion due to its own unexpected movement, camera movement, and low frame rate image acquisition. Each of these issues can cause a tracker to lose its target.
This thesis introduces the idea of learning and propagation of tracking information over multiple temporal scales to overcome occlusions and abrupt motion variations. A temporal scale is a specific sequence of moments in time. Models (describing appearance and/or motion of the target) can be learned from the target tracking history over multiple temporal scales and applied over multiple temporal scales in the future. With the rise of multiple motion model tracking frameworks, there is a need for a broad range of search methods and ways of selecting between the available motion models.
The potential benefits of learning over multiple temporal scales are first assessed by studying both motion and appearance variations in the ground-truth data associated with several image sequences. A visual tracker operating over multiple temporal scales is then proposed that is capable of handling occlusions and abrupt motion variations.
Experiments are performed to compare the performance of the tracker with competing methods, and to analyze the impact on performance of the various elements of the proposed approach. Results reveal a simple, yet general framework for dealing with occlusions and abrupt motion variations. In refining the proposed framework, a search method is generalized for multiple competing hypotheses in visual tracking, and a new motion model selection criterion is proposed.
Synergy between face alignment and tracking via Discriminative Global Consensus Optimization
An open question in facial landmark localization in video is whether one should perform tracking or tracking-by-detection (i.e. face alignment). Tracking produces fittings of high accuracy but is prone to drifting. Tracking-by-detection is drift-free but results in low-accuracy fittings. To provide a solution to this problem, we describe the very first, to the best of our knowledge, synergistic approach between detection (face alignment) and tracking which completely eliminates drifting from face tracking, and does not merely perform tracking-by-detection. Our first main contribution is to show that one can achieve this synergy between detection and tracking using a principled optimization framework based on the theory of Global Variable Consensus Optimization using ADMM. Our second contribution is to show how the proposed analytic framework can be integrated within state-of-the-art discriminative methods for face alignment and tracking based on cascaded regression and deeply learned features. Overall, we call our method Discriminative Global Consensus Model (DGCM). Our third contribution is to show that DGCM achieves a large performance improvement over the currently best-performing face tracking methods on the most challenging category of the 300-VW dataset.
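The consensus machinery underlying DGCM can be illustrated with a toy global-variable consensus ADMM. The quadratic local costs below are placeholders for the actual detection- and tracking-based objectives (which in DGCM involve cascaded regression); they only show the x-update / z-update / dual-update structure.

```python
import numpy as np

def consensus_admm(targets, rho=1.0, iters=50):
    """Global-variable consensus ADMM for quadratic local costs
    f_i(x) = 0.5 * ||x - a_i||^2, stand-ins for the per-cue
    (detection, tracking) terms. The consensus variable z converges
    to the minimizer of the summed costs (here, the mean of a_i)."""
    a = np.asarray(targets, dtype=float)        # (n, d) local targets
    n, d = a.shape
    x = np.zeros((n, d))                        # local variables
    u = np.zeros((n, d))                        # scaled dual variables
    z = np.zeros(d)                             # consensus variable
    for _ in range(iters):
        x = (a + rho * (z - u)) / (1.0 + rho)   # closed-form local updates
        z = (x + u).mean(axis=0)                # consensus (averaging) update
        u = u + x - z                           # dual ascent updates
    return z
```

In the face-tracking setting, each local variable would hold one cue's landmark estimate, and the consensus update fuses them, which is how drift from the tracking cue is corrected by the drift-free detection cue.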
Generalizing to Unseen Domains in Diabetic Retinopathy Classification
Diabetic retinopathy (DR) is caused by long-standing diabetes and is among
the five leading causes of visual impairment. Early diagnosis and treatment
can help manage the disease; however, the detection procedure is
challenging and often tedious. Therefore, automated
diabetic retinopathy classification using deep learning techniques has gained
interest in the medical imaging community. Akin to several other real-world
applications of deep learning, the typical assumption of i.i.d data is also
violated in DR classification that relies on deep learning. Therefore,
developing DR classification methods robust to unseen distributions is of great
value. In this paper, we study the problem of generalizing a model to unseen
distributions or domains (a.k.a. domain generalization) in DR classification. To
this end, we propose a simple and effective domain generalization (DG) approach
that achieves self-distillation in vision transformers (ViT) via a novel
prediction softening mechanism. This prediction softening is an adaptive convex
combination of one-hot labels and the model's own knowledge. We perform extensive
experiments on challenging open-source DR classification datasets under both
multi-source and single-source DG settings with three different ViT backbones
to establish the efficacy and applicability of our approach against competing
methods. For the first time, we report the performance of several
state-of-the-art DG methods on open-source DR classification datasets after
conducting thorough experiments. Finally, our method also delivers better
calibration performance than competing methods, showing its suitability for
safety-critical applications, including healthcare. We hope that our
contributions will spur further DG research across the medical imaging
community.
Comment: Accepted at WACV 202
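The prediction-softening step can be sketched directly. The mixing weight `alpha` and its adaptation rule are left abstract here, since the paper's specific mechanism is not detailed in the abstract; only the convex-combination structure is shown.

```python
import numpy as np

def soften_targets(one_hot, model_probs, alpha):
    """Self-distillation-style target: an adaptive convex combination of
    the one-hot label and the model's own predictive distribution.
    `alpha` in [0, 1] is the mixing weight; how it is adapted per
    sample is an assumption left abstract in this sketch."""
    return alpha * one_hot + (1.0 - alpha) * model_probs
```

Since both inputs are probability distributions and the combination is convex, the softened target is itself a valid distribution, so it can replace the one-hot label in a standard cross-entropy loss.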
DNA-based Eye Color Prediction of Pakhtun Population Living in District Swat KP Pakistan
Background: Forensic DNA Phenotyping (FDP), the prediction of Externally Visible Characteristics (EVCs) from a DNA sample, has gained importance in the forensic community over the last decade or so. If and when traditional forensic DNA typing via Short Tandem Repeats (STRs) fails due to the absence of a reference sample, an individual can still be traced from a DNA sample using FDP. Among the many available EVCs, eye colour is one character that can be predicted with the previously developed IrisPlex system, which uses a Single Nucleotide Polymorphism (SNP) assay. In this study, we applied the IrisPlex system to samples collected from the population of District Swat to predict eye colour from DNA.
Method: Digital eye-colour photographs and buccal swab samples were collected from 267 Pakhtun individuals of District Swat. Any person with eye disease was excluded from the study. Genomic DNA was extracted by the phenol-chloroform method. The amplified SNPs were typed using multiplexed Single Base Extension (SBE). Genotypes were converted to eye-colour phenotypes with the IrisPlex online tool, and correlations were assessed between SNPs, gender, PIE score, and eye colour.
Result: Brown eye colour was prevalent compared with intermediate and blue. Brown eye colour was more frequent in females, whereas intermediate and blue were more frequent in males. Three SNPs, rs12913832 (HERC2 gene), rs1393350 (TYR gene), and rs1800407 (OCA2 gene), were strongly associated with eye colour. The PIE score was also significantly associated with eye colour and with rs12913832. IrisPlex analysis was performed in 20 individuals of District Swat; the prediction accuracy for blue or brown eyes was 100% in these individuals. However, the IrisPlex tool predicted the intermediate phenotype incorrectly as brown or blue.
Conclusion: It is concluded from the data that intermediate eye colour was not predicted accurately; therefore, inclusion of more SNPs in the IrisPlex system is needed to predict intermediate eye colour accurately.
Keywords: Eye colour, IrisPlex, SNPs, Multiplex genotyping, DNA, District Swa
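An IrisPlex-style prediction can be sketched as a multinomial logistic model over minor-allele counts (0/1/2) at the assayed SNPs, with brown as the reference category. The coefficients in the example below are hypothetical placeholders, not the published IrisPlex parameters, which cover six SNPs.

```python
import math

# SNPs named in the study; the real IrisPlex panel has six.
SNPS = ["rs12913832", "rs1393350", "rs1800407"]

def predict_eye_colour(allele_counts, beta_blue, beta_inter):
    """Multinomial logistic prediction: return probabilities for
    (blue, intermediate, brown), with brown as the reference class.
    Each beta is [intercept, coef_per_SNP...]; values are illustrative."""
    s_blue = beta_blue[0] + sum(b * g for b, g in zip(beta_blue[1:], allele_counts))
    s_int = beta_inter[0] + sum(b * g for b, g in zip(beta_inter[1:], allele_counts))
    denom = 1.0 + math.exp(s_blue) + math.exp(s_int)
    p_blue = math.exp(s_blue) / denom
    p_int = math.exp(s_int) / denom
    return p_blue, p_int, 1.0 - p_blue - p_int
```

Under such a model, the call would be the class with the highest probability, and the abstract's finding (intermediate often misassigned to blue or brown) corresponds to the intermediate class rarely winning this comparison.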
Unsupervised Landmark Discovery Using Consistency Guided Bottleneck
We study a challenging problem of unsupervised discovery of object landmarks.
Many recent methods rely on bottlenecks to generate 2D Gaussian heatmaps;
however, these are limited in generating informed heatmaps while training,
presumably due to the lack of effective structural cues. Also, it is assumed
that all predicted landmarks are semantically relevant despite having no ground
truth supervision. In the current work, we introduce a consistency-guided
bottleneck in an image reconstruction-based pipeline that leverages landmark
consistency, a measure of compatibility with the pseudo-ground truth, to
generate adaptive heatmaps. We propose obtaining pseudo-supervision via forming
landmark correspondence across images. The consistency then modulates the
uncertainty of the discovered landmarks in the generation of adaptive heatmaps
which rank consistent landmarks above their noisy counterparts, providing
effective structural information for improved robustness. Evaluations on five
diverse datasets including MAFL, AFLW, LS3D, Cats, and Shoes demonstrate
excellent performance of the proposed approach compared to the existing
state-of-the-art methods. Our code is publicly available at
https://github.com/MamonaAwan/CGB_ULD.
Comment: Accepted ORAL at BMVC 2023; Code: https://github.com/MamonaAwan/CGB_UL
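The consistency-modulated heatmap generation can be sketched as follows. The modulation rule is an assumption (the consistency score scales the Gaussian's sharpness), not the paper's exact formulation; it only illustrates how consistent landmarks end up with peaked heatmaps and noisy ones with diffuse heatmaps.

```python
import numpy as np

def adaptive_heatmap(center, consistency, size=32, base_sigma=4.0):
    """2D Gaussian heatmap for one landmark whose sharpness grows with
    its consistency score in (0, 1]: high consistency -> peaked heatmap,
    low consistency -> broad, effectively down-weighted heatmap."""
    sigma = base_sigma / (consistency + 1e-6)     # low consistency -> large sigma
    ys, xs = np.mgrid[0:size, 0:size]             # pixel coordinate grids
    d2 = (xs - center[0]) ** 2 + (ys - center[1]) ** 2
    return np.exp(-d2 / (2.0 * sigma ** 2))
```

Feeding such adaptive heatmaps into a reconstruction pipeline lets the reconstruction loss rely more on landmarks whose pseudo-ground-truth correspondence is stable across images.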
Prevalence of Diabetic Retinopathy and Correlation with HbA1c in Patients Admitted in Khyber Teaching Hospital Peshawar
Objective: To determine the prevalence of diabetic retinopathy in patients admitted in Khyber Teaching Hospital Peshawar and to correlate different stages of diabetic retinopathy with HbA1C levels.
Methodology: This cross-sectional study was conducted at the Department of Ophthalmology, Khyber Teaching Hospital, MTI, Peshawar, from December 2019 to May 2020. All patients over the age of 15 years diagnosed with diabetes mellitus were included in the study, while patients with cataract or retinopathy due to other pathologies were excluded. All diabetic patients were admitted through the outpatient department. In the ward, their blood pressure was recorded and HbA1c levels were measured. Visual acuity (VA) was checked. Screening for diabetic retinopathy was performed by a consultant ophthalmologist using Optos ultra-widefield imaging of the retina and Optical Coherence Tomography (OCT) of the macula, to establish the stage of diabetic retinopathy and the presence of diabetic macular edema, respectively.
Results: A total of 103 diabetic patients were included. Their retinas were photographed and analyzed. Diabetic retinopathy, irrespective of type, was found in 69 patients, a prevalence of 66.9%. Patients with lower HbA1c (below 6%) showed no evidence of DR. The clustering of the majority of diabetic retinopathy patients at HbA1c levels of 8 to 12% showed a significant relationship between high blood sugar levels and retinopathy severity.
Conclusion: The high frequency of retinopathy observed in our study is alarming, given that diabetic retinopathy is among the leading causes of blindness in the working-age population. It is highly recommended that routine ophthalmologic examination be carried out alongside optimal diabetic control.
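As a quick arithmetic check, the reported prevalence follows from the stated counts:

```python
# Prevalence from the abstract's counts: 69 of 103 patients.
cases, total = 69, 103
prevalence = 100.0 * cases / total   # 66.99...
# Rounded to one decimal place this is 67.0%; the abstract's 66.9%
# corresponds to truncating rather than rounding the same ratio.
```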