144 research outputs found

    MoCaE: Mixture of Calibrated Experts Significantly Improves Object Detection

    We propose an extremely simple and highly effective approach to faithfully combine different object detectors to obtain a Mixture of Experts (MoE) that has superior accuracy to the individual experts in the mixture. We find that naively combining these experts, in a similar way to the well-known Deep Ensembles (DEs), does not result in an effective MoE. We identify the incompatibility between the confidence score distributions of different detectors as the primary reason for such failure cases. Therefore, to construct the MoE, we propose first calibrating each individual detector against a target calibration function, and then filtering and refining all the predictions from the different detectors in the mixture. We term this approach MoCaE and demonstrate its effectiveness through extensive experiments on object detection, instance segmentation and rotated object detection tasks. Specifically, MoCaE improves (i) three strong object detectors on COCO test-dev by 2.4 AP, reaching 59.0 AP; (ii) instance segmentation methods on the challenging long-tailed LVIS dataset by 2.3 AP; and (iii) all existing rotated object detectors, reaching 82.62 AP50 on the DOTA dataset and establishing a new state-of-the-art (SOTA). Code will be made public.

    Localization Recall Precision (LRP): A New Performance Metric for Object Detection

    Average precision (AP), the area under the recall-precision (RP) curve, is the standard performance measure for object detection. Despite its wide acceptance, it has a number of shortcomings, the most important of which are (i) the inability to distinguish very different RP curves, and (ii) the lack of a direct measure of bounding box localization accuracy. In this paper, we propose 'Localization Recall Precision (LRP) Error', a new metric which we specifically designed for object detection. LRP Error is composed of three components related to localization, false negative (FN) rate and false positive (FP) rate. Based on LRP, we introduce the 'Optimal LRP', the minimum achievable LRP error representing the best achievable configuration of the detector in terms of recall-precision and the tightness of the boxes. In contrast to AP, which considers precisions over the entire recall domain, Optimal LRP determines the 'best' confidence score threshold for a class, which balances the trade-off between localization and recall-precision. In our experiments, we show that, for state-of-the-art (SOTA) object detectors, Optimal LRP provides richer and more discriminative information than AP. We also demonstrate that the best confidence score thresholds vary significantly among classes and detectors. Moreover, we present LRP results of a simple online video object detector which uses a SOTA still image object detector and show that the class-specific optimized thresholds increase the accuracy compared with the common approach of using a general threshold for all classes. At https://github.com/cancam/LRP we provide the source code that can compute LRP for the PASCAL VOC and MSCOCO datasets. Our source code can easily be adapted to other datasets as well. Comment: to appear in ECCV 201
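
    The abstract above names the three components of LRP Error but does not spell out the formula. The sketch below assumes the form usually quoted for the metric: the normalized localization error of the true positives, (1 - IoU) / (1 - tau), is summed with the FP and FN counts and divided by the total number of true positives, false positives and false negatives. The function names, the `per_threshold` input layout and the default `tau = 0.5` are illustrative assumptions, not the API of the authors' repository.

```python
from typing import Mapping, Sequence, Tuple

# (IoUs of the true positives, number of FPs, number of FNs) at one score threshold.
Stats = Tuple[Sequence[float], int, int]


def lrp_error(tp_ious: Sequence[float], num_fp: int, num_fn: int,
              tau: float = 0.5) -> float:
    """LRP error for one class at a fixed confidence threshold (assumed form).

    tp_ious holds the IoU of every true positive, each expected to be >= tau.
    """
    if not 0.0 <= tau < 1.0:
        raise ValueError("tau must lie in [0, 1)")
    total = len(tp_ious) + num_fp + num_fn
    if total == 0:
        raise ValueError("LRP is undefined without detections or ground truths")
    # Localization term: 0 for a perfect box, 1 for a box that barely passes tau.
    localization = sum((1.0 - iou) / (1.0 - tau) for iou in tp_ious)
    return (localization + num_fp + num_fn) / total


def optimal_lrp(per_threshold: Mapping[float, Stats],
                tau: float = 0.5) -> Tuple[float, float]:
    """Return (best score threshold, minimum LRP error) over candidate thresholds."""
    errors = {s: lrp_error(ious, fp, fn, tau)
              for s, (ious, fp, fn) in per_threshold.items()}
    best = min(errors, key=errors.get)
    return best, errors[best]


# Hypothetical statistics for one class at two confidence thresholds.
candidates = {
    0.3: ([0.75, 0.6], 2, 0),  # more detections kept: no misses, two FPs
    0.7: ([0.75], 0, 1),       # stricter threshold: no FPs, one miss
}
best_threshold, best_error = optimal_lrp(candidates)
```

    Under these assumptions, a detector with perfect boxes and no errors scores 0, and one with no true positives scores 1, so lower is better. Picking the threshold that minimizes the error per class is what the abstract calls the Optimal LRP.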

    Segment, Select, Correct: A Framework for Weakly-Supervised Referring Segmentation

    Referring Image Segmentation (RIS) - the problem of identifying objects in images through natural language sentences - is a challenging task currently mostly solved through supervised learning. However, while collecting referred annotation masks is a time-consuming process, the few existing weakly-supervised and zero-shot approaches fall significantly short in performance compared to fully-supervised learning ones. To bridge the performance gap without mask annotations, we propose a novel weakly-supervised framework that tackles RIS by decomposing it into three steps: obtaining instance masks for the object mentioned in the referencing instruction (segment), using zero-shot learning to select a potentially correct mask for the given instruction (select), and bootstrapping a model which allows for fixing the mistakes of zero-shot selection (correct). In our experiments, using only the first two steps (zero-shot segment and select) outperforms other zero-shot baselines by as much as 19%, while our full method improves upon this much stronger baseline and sets the new state-of-the-art for weakly-supervised RIS, reducing the gap between the weakly-supervised and fully-supervised methods in some cases from around 33% to as little as 14%. Code is available at https://github.com/fgirbal/segment-select-correct

    Learning associations between clinical information and motion-based descriptors using a large scale MR-derived cardiac motion atlas

    The availability of large scale databases containing imaging and non-imaging data, such as the UK Biobank, represents an opportunity to improve our understanding of healthy and diseased bodily function. Cardiac motion atlases provide a space of reference in which the motion fields of a cohort of subjects can be directly compared. In this work, a cardiac motion atlas is built from cine MR data from the UK Biobank (~6000 subjects). Two automated quality control strategies are proposed to reject subjects with insufficient image quality. Based on the atlas, three dimensionality reduction algorithms are evaluated to learn data-driven cardiac motion descriptors, and statistical methods are used to study the association between these descriptors and non-imaging data. Results show a positive correlation between the atlas motion descriptors and body fat percentage, basal metabolic rate, hypertension, smoking status and alcohol intake frequency. The proposed method identifies changes in cardiac function due to these known cardiovascular risk factors better than ejection fraction, the most commonly used descriptor of cardiac function. In conclusion, this work represents a framework for further investigation of the factors influencing cardiac health. Comment: 2018 International Workshop on Statistical Atlases and Computational Modeling of the Hear

    What makes and breaks safety fine-tuning? a mechanistic study

    Safety fine-tuning helps align Large Language Models (LLMs) with human preferences for their safe deployment. To better understand the underlying factors that make models safe via safety fine-tuning, we design a synthetic data generation framework that captures salient aspects of an unsafe input by modeling the interaction between the task the model is asked to perform (e.g., "design") versus the specific concepts the task is asked to be performed upon (e.g., a "cycle" vs. a "bomb"). Using this, we investigate three well-known safety fine-tuning methods---supervised safety fine-tuning, direct preference optimization, and unlearning---and provide significant evidence demonstrating that these methods minimally transform MLP weights to specifically align unsafe inputs into its weights' null space. This yields a clustering of inputs based on whether the model deems them safe or not. Correspondingly, when an adversarial input (e.g., a jailbreak) is provided, its activations are closer to safer samples, leading to the model processing such an input as if it were safe. We validate our findings, wherever possible, on real-world models---specifically, Llama-2 7B and Llama-3 8B

    Localization recall precision (LRP): A new performance metric for object detection

    Average precision (AP), the area under the recall-precision (RP) curve, is the standard performance measure for object detection. Despite its wide acceptance, it has a number of shortcomings, the most important of which are (i) the inability to distinguish very different RP curves, and (ii) the lack of a direct measure of bounding box localization accuracy. In this paper, we propose “Localization Recall Precision (LRP) Error”, a new metric specifically designed for object detection. LRP Error is composed of three components related to localization, false negative (FN) rate and false positive (FP) rate. Based on LRP, we introduce the “Optimal LRP” (oLRP), the minimum achievable LRP error representing the best achievable configuration of the detector in terms of recall-precision and the tightness of the boxes. In contrast to AP, which considers precisions over the entire recall domain, oLRP determines the “best” confidence score threshold for a class, which balances the trade-off between localization and recall-precision. In our experiments, we show that oLRP provides richer and more discriminative information than AP. We also demonstrate that the best confidence score thresholds vary significantly among classes and detectors. Moreover, we present LRP results of a simple online video object detector and show that the class-specific optimized thresholds increase the accuracy compared with the common approach of using a general threshold for all classes. Our experiments demonstrate that LRP is more competent than AP in capturing the performance of detectors. Our source code for the PASCAL VOC and MSCOCO datasets is provided at https://github.com/cancam/LRP

    Comparative efficacy of topical tetraVisc versus lidocaine gel in cataract surgery

    Background: To compare the clinical efficacy of lidocaine 2% with tetracaine 0.5% for cataract surgery. Methods: In a randomized, multi-surgeon, controlled clinical trial, 122 consecutive cataract cases eligible for topical anesthesia were randomly assigned to receive lidocaine 2% gel (1 ml) or tetracaine solution 0.5% (TetraVisc, 0.5 ml) before clear corneal phacoemulsification. Main outcome measure was visual analog scale (0 to 10), which was used to measure intra-operative pain. Secondary outcome measures included patients' discomfort due to tissue manipulation and surgeon graded patients' cooperation. Duration of surgery and intra-operative complications were also recorded. Results: The mean age in TetraVisc (TV) group was 70.4 years and in the lidocaine gel group (LG) it was 70.6 years (p = 0.89). Patient reported mean intra-operative pain scores by visual analog scale were 0.70 ± 0.31 in TV group and 1.8 ± 0.4 in LG group (P < 0.001). Mean patient cooperation was also marginally better in the TV group (8.3 ± 0.3) compared to LG group (8.4 ± 0.6) (P = 0.25). 96% of patients in TV group showed intra-operative corneal clarity compared to 91% in LG group. TV group had fewer (1 out of 61 patients, 1.6%) intra-operative complications than LG group (3 out of 61 patients, 4.8%). No anesthesia related complications were noted in either group. Conclusion: Topical TetraVisc solution was superior to lidocaine 2% gel for pain control in patients undergoing clear corneal phacoemulsification. Lidocaine 2% gel is similar to TetraVisc in patient comfort and surgeon satisfaction. Trial Registration: Clinical trials number: ISRCTN78374774

    Second-look PET-CT following an initial incomplete PET-CT response to (chemo)radiotherapy for head and neck squamous cell carcinoma

    OBJECTIVES: The limited positive predictive value of an incomplete response on PET-CT following (chemo)radiotherapy for head and neck squamous cell carcinoma (HNSCC) means that the optimal management strategy remains uncertain. The aim of the study is to assess the utility of a 'second-look' interval PET-CT. METHODS: Patients with HNSCC who were treated with (chemo)radiotherapy between 2008 and 2017 and underwent (i) baseline and (ii) response assessment PET-CT and (iii) second-look PET-CT following incomplete (positive or equivocal scan) response were included. Endpoints were conversion rate to complete response (CR) and test characteristics of the second-look PET-CT. RESULTS: Five hundred sixty-two patients with HNSCC underwent response assessment PET-CT at a median of 17 weeks post-radiotherapy. Following an incomplete response on PET-CT, 40 patients underwent a second-look PET-CT at a median of 13 weeks (range 6-25) from the first response PET-CT. Thirty-four out of 40 (85%) patients had oropharyngeal carcinoma. Twenty-four out of 40 (60%) second-look PET-CT scans converted to a complete locoregional response. The primary tumour conversion rate was 15/27 (56%) and the lymph node conversion rate was 14/19 (74%). The sensitivity, specificity, positive predictive value and negative predictive value (NPV) of the second-look PET-CT were 75%, 75%, 25% and 96% for the primary tumour and 100%, 92%, 40% and 100% for lymph nodes. There were no cases of progression following conversion to CR in the primary site or lymph nodes. CONCLUSIONS: The majority of patients who undergo a second-look PET-CT convert to a CR. The NPV of a second-look PET-CT is high, suggesting the potential to avoid surgical intervention. KEY POINTS: • PET-CT is a useful tool for response assessment following (chemo)radiotherapy for head and neck squamous cell carcinoma. • An incomplete response on PET-CT has a limited positive predictive value and optimal management is uncertain. 
• These data show that with a 'second-look' interval PET-CT, the majority of patients convert to a complete metabolic response. When there is doubt about clinical and radiological response, a 'second-look' PET-CT can be used to spare patients unnecessary surgical intervention
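
    The four test characteristics reported in the abstract above follow the standard definitions from a 2x2 table of test result against true outcome. The sketch below shows those definitions only. The class and function names are illustrative, and the counts in the example are hypothetical, chosen for easy arithmetic and not taken from the study.

```python
import math
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class ConfusionCounts:
    """Counts from a 2x2 table of test result versus true disease status."""
    tp: int  # positive scan, disease present
    fp: int  # positive scan, disease absent
    fn: int  # negative scan, disease present
    tn: int  # negative scan, disease absent


def _ratio(numerator: int, denominator: int) -> float:
    # Return NaN instead of raising when a row or column of the table is empty.
    return numerator / denominator if denominator else math.nan


def diagnostic_metrics(c: ConfusionCounts) -> Dict[str, float]:
    """Standard definitions of sensitivity, specificity, PPV and NPV."""
    return {
        "sensitivity": _ratio(c.tp, c.tp + c.fn),  # diseased cases detected
        "specificity": _ratio(c.tn, c.tn + c.fp),  # disease-free cases cleared
        "ppv": _ratio(c.tp, c.tp + c.fp),          # positive scans that are correct
        "npv": _ratio(c.tn, c.tn + c.fn),          # negative scans that are correct
    }


# Hypothetical counts, for illustration only.
metrics = diagnostic_metrics(ConfusionCounts(tp=8, fp=2, fn=2, tn=88))
```

    A high NPV means a negative scan is rarely wrong, which is the property the abstract relies on when it suggests that surgery might be avoided after a negative second-look scan.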

    Mediator Condensates Localize Signaling Factors to Key Cell Identity Genes

    The gene expression programs that define the identity of each cell are controlled by master transcription factors (TFs) that bind cell-type-specific enhancers, as well as signaling factors, which bring extracellular stimuli to these enhancers. Recent studies have revealed that master TFs form phase-separated condensates with the Mediator coactivator at super-enhancers. Here, we present evidence that signaling factors for the WNT, TGF-β, and JAK/STAT pathways use their intrinsically disordered regions (IDRs) to enter and concentrate in Mediator condensates at super-enhancers. We show that the WNT coactivator β-catenin interacts both with components of condensates and DNA-binding factors to selectively occupy super-enhancer-associated genes. We propose that the cell-type specificity of the response to signaling is mediated in part by the IDRs of the signaling factors, which cause these factors to partition into condensates established by the master TFs and Mediator at genes with prominent roles in cell identity