
    ODN: Opening the Deep Network for Open-set Action Recognition

    In recent years, the performance of action recognition has been significantly improved with the help of deep neural networks. Most existing action recognition works hold the \textit{closed-set} assumption that all action categories are known beforehand, so that deep networks can be well trained for these categories. However, action recognition in the real world is essentially an \textit{open-set} problem: it is impossible to know all action categories beforehand, and consequently infeasible to prepare sufficient training samples for emerging categories. In this case, applying closed-set recognition methods will inevitably lead to unseen-category errors. To address this challenge, we propose the Open Deep Network (ODN) for the open-set action recognition task. Technically, ODN detects new categories by applying a multi-class triplet thresholding method, then dynamically reconstructs the classification layer and "opens" the deep network by continually adding predictors for new categories. To transfer the learned knowledge to each new category, two novel methods, Emphasis Initialization and Allometry Training, are adopted to initialize and incrementally train the new predictor, so that only a few samples are needed to fine-tune the model. Extensive experiments show that ODN can effectively detect and recognize new categories with little human intervention, making it applicable to open-set action recognition tasks in the real world. Moreover, ODN can even achieve performance comparable to some closed-set methods.
    Comment: 6 pages, 3 figures, ICME 2018
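    To make the "opening" step concrete, the following PyTorch sketch grows a classification layer for a newly detected category. The class name OpenClassifier, the method open_for_new_category, the top_k parameter, and the equal-weight mix of the global mean with the most-activated class weights are illustrative assumptions, not the paper's exact Emphasis Initialization formula.

    import torch
    import torch.nn as nn

    class OpenClassifier(nn.Module):
        """Classification head that can be 'opened' with predictors for new categories."""

        def __init__(self, feat_dim: int, num_known: int):
            super().__init__()
            self.fc = nn.Linear(feat_dim, num_known)

        def forward(self, feats: torch.Tensor) -> torch.Tensor:
            return self.fc(feats)

        @torch.no_grad()
        def open_for_new_category(self, support_feats: torch.Tensor, top_k: int = 3):
            """Add one predictor row, initialized from the existing class weights."""
            w, b = self.fc.weight, self.fc.bias              # (C, D), (C,)
            # Which known classes respond most strongly to the new-category samples?
            scores = self.fc(support_feats).mean(dim=0)      # (C,)
            top = scores.topk(min(top_k, w.size(0))).indices
            # Assumed form of Emphasis Initialization: mix the global mean
            # with the mean of the emphasized (most-activated) class weights.
            new_w = 0.5 * w.mean(dim=0) + 0.5 * w[top].mean(dim=0)
            new_b = 0.5 * b.mean() + 0.5 * b[top].mean()
            fc = nn.Linear(w.size(1), w.size(0) + 1).to(w.device)
            fc.weight.copy_(torch.cat([w, new_w.unsqueeze(0)], dim=0))
            fc.bias.copy_(torch.cat([b, new_b.unsqueeze(0)], dim=0))
            self.fc = fc

    After the new row is added, only the extended layer needs fine-tuning on the few available samples, which matches the paper's claim of low annotation cost.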

    Learning Multimodal Volumetric Features for Large-Scale Neuron Tracing

    The current neuron reconstruction pipeline for electron microscopy (EM) data usually includes automatic image segmentation followed by extensive human expert proofreading. In this work, we aim to reduce the human workload by predicting connectivity between over-segmented neuron pieces, taking both microscopy image and 3D morphology features into account, similar to the human proofreading workflow. To this end, we first construct a dataset, named FlyTracing, that contains millions of pairwise connections of segments spanning the whole fly brain, which is three orders of magnitude larger than existing datasets for neuron segment connection. To learn sophisticated biological imaging features from the connectivity annotations, we propose a novel connectivity-aware contrastive learning method to generate dense volumetric EM image embeddings. The learned embeddings can be easily incorporated with any point- or voxel-based morphological representation for automatic neuron tracing. Extensive comparisons of different combination schemes of image and morphological representations in identifying split errors across the whole fly brain demonstrate the superiority of the proposed approach, especially for locations that contain severe imaging artifacts, such as section missing and misalignment. The dataset and code are available at https://github.com/Levishery/Flywire-Neuron-Tracing.
    Comment: 9 pages, 6 figures, AAAI 2024 accepted
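    As a rough illustration of connectivity-supervised contrastive training, the sketch below applies a standard margin-based contrastive loss to pooled embeddings of segment pairs: annotated-connected pairs are pulled together and unconnected pairs are pushed apart. The function name and the margin form are assumptions; the paper's dense volumetric objective may differ.

    import torch.nn.functional as F

    def connectivity_contrastive_loss(emb_a, emb_b, connected, margin=1.0):
        """Margin-based contrastive loss over pooled segment-pair embeddings.

        emb_a, emb_b: (N, D) embeddings of the two segments in each pair.
        connected:    (N,) float labels, 1.0 if a pair is annotated as the
                      same neuron and 0.0 otherwise.
        """
        d = F.pairwise_distance(emb_a, emb_b)                # (N,)
        pos = connected * d.pow(2)                           # pull connected pairs together
        neg = (1.0 - connected) * F.relu(margin - d).pow(2)  # push unconnected pairs apart
        return (pos + neg).mean()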

    Coded Residual Transform for Generalizable Deep Metric Learning

    A fundamental challenge in deep metric learning is the generalization capability of the feature embedding network, since the embedding network learned on training classes needs to be evaluated on new test classes. To address this challenge, in this paper we introduce a new method called coded residual transform (CRT) for deep metric learning to significantly improve its generalization capability. Specifically, we learn a set of diversified prototype features, project the feature map onto each prototype, and then encode its features using their projection residuals weighted by their correlation coefficients with each prototype. The proposed CRT method has two unique characteristics. First, it represents and encodes the feature map from a set of complementary perspectives based on projections onto diversified prototypes. Second, unlike existing transformer-based feature representation approaches, which encode the original values of features based on global correlation analysis, the proposed coded residual transform encodes the relative differences between the original features and their projected prototypes. Embedding space density and spectral decay analysis show that this multi-perspective projection onto diversified prototypes and coded residual representation achieve significantly improved generalization capability in metric learning. Finally, to further enhance generalization performance, we propose to enforce consistency between the feature similarity matrices of coded residual transforms with different numbers of projection prototypes and embedding dimensions. Our extensive experimental results and ablation studies demonstrate that the proposed CRT method outperforms state-of-the-art deep metric learning methods by large margins, improving upon the current best method by up to 4.28% on the CUB dataset.
    Comment: Accepted by NeurIPS 2022
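    The residual-coding idea can be sketched as follows: each local feature is compared to a set of learned prototypes, and correlation-weighted residuals, rather than the raw feature values, form the embedding. This NetVLAD-style reading, including the softmax correlation weights and mean pooling over locations, is an assumption about the exact formulation.

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class CodedResidualTransform(nn.Module):
        """Encode a feature map as prototype-weighted projection residuals."""

        def __init__(self, feat_dim: int, num_prototypes: int):
            super().__init__()
            self.prototypes = nn.Parameter(torch.randn(num_prototypes, feat_dim))

        def forward(self, feats: torch.Tensor) -> torch.Tensor:
            # feats: (B, N, D) -- N spatial locations with D-dim features.
            protos = F.normalize(self.prototypes, dim=-1)               # (K, D)
            x = F.normalize(feats, dim=-1)
            # Correlation coefficients between each feature and each prototype.
            corr = torch.softmax(x @ protos.t(), dim=-1)                # (B, N, K)
            # Residuals between the features and each prototype.
            resid = x.unsqueeze(2) - protos.view(1, 1, *protos.shape)   # (B, N, K, D)
            # Correlation-weighted residual code, pooled over locations.
            code = (corr.unsqueeze(-1) * resid).mean(dim=1)             # (B, K, D)
            return F.normalize(code.flatten(1), dim=-1)                 # (B, K*D)

    Encoding differences to prototypes rather than absolute feature values is what the abstract credits for the improved generalization to unseen classes.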

    STS-TransUNet: Semi-supervised Tooth Segmentation Transformer U-Net for dental panoramic image

    In this paper, we introduce a novel deep learning method for dental panoramic image segmentation, which is crucial in oral medicine and orthodontics for accurate diagnosis and treatment planning. Traditional methods often fail to effectively combine global and local context and struggle with unlabeled data, limiting performance in varied clinical settings. We address these issues with an advanced TransUNet architecture, enhancing feature retention and utilization by connecting the input and output layers directly. Our architecture further employs spatial and channel attention mechanisms in the decoder segments for targeted region focus, and deep supervision techniques to overcome the vanishing gradient problem for more efficient training. Additionally, our network includes a self-learning algorithm that uses unlabeled data, boosting generalization capabilities. Named the Semi-supervised Tooth Segmentation Transformer U-Net (STS-TransUNet), our method demonstrated superior performance on the MICCAI STS-2D dataset, proving its effectiveness and robustness in tooth segmentation tasks.
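    The self-learning component can be illustrated with a generic pseudo-labeling step: confident per-pixel predictions on unlabeled panoramic images are treated as targets alongside the supervised loss. The function below, its confidence threshold, and the unweighted loss sum are assumptions, not the exact STS-TransUNet training recipe.

    import torch
    import torch.nn.functional as F

    def self_training_step(model, labeled, unlabeled, optimizer, conf_thresh=0.9):
        """One semi-supervised step: supervised loss plus confident pseudo-labels.

        labeled:   (images, masks) with masks as (B, H, W) long tensors.
        unlabeled: (B, C, H, W) images without annotations.
        The model is assumed to return per-pixel logits of shape (B, C, H, W).
        """
        imgs, masks = labeled
        loss = F.cross_entropy(model(imgs), masks)

        # Pseudo-label the unlabeled batch, keeping only confident pixels.
        with torch.no_grad():
            probs = torch.softmax(model(unlabeled), dim=1)
            conf, pseudo = probs.max(dim=1)                  # both (B, H, W)
        keep = conf > conf_thresh
        if keep.any():
            u_logits = model(unlabeled).permute(0, 2, 3, 1)  # (B, H, W, C)
            loss = loss + F.cross_entropy(u_logits[keep], pseudo[keep])

        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        return loss.detach()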

    The efficacy and safety of intra-articular injection of corticosteroids in multimodal analgesic cocktails in total knee arthroplasty—a historically controlled study

    Background: Total knee arthroplasty (TKA) is a common and effective procedure. Optimizing pain control and reducing postoperative discomfort are essential for patient satisfaction. No studies have examined the safety and efficacy of intra-articular corticosteroid injections following TKA. This study aims to examine the safety and efficacy of corticosteroids in intra-articular multimodal analgesic injections.
    Materials and methods: This was a historically controlled study conducted at a single academic institution. Before May 2019, patients received an intraoperative intra-articular cocktail injection without corticosteroids, referred to as the non-corticosteroid (NC) group. After June 2019, patients received an intraoperative intra-articular cocktail injection containing corticosteroids, referred to as the corticosteroid (C) group. In total, 738 patients were evaluated: 370 in the C cohort and 368 in the NC cohort. The mean follow-up duration was 30.4 months for the C group and 48.4 months for the NC group.
    Results: The mean VAS scores at rest on postoperative day (POD) 1 (2.35) and POD3 (3.88) were significantly lower in the C group than in the NC group (2.86 on POD1 and 5.26 on POD3, p < 0.05). Walking pain in the C group (4.42) was also significantly lower than in the NC group (5.96) on POD3 (p < 0.05). Patients in the C group had a significantly higher mean range of motion (ROM) on POD3 (92.55) than the NC group (86.38). The mean time to straight leg raise for the C group (2.77) was significantly shorter than for the NC group (3.61) (p < 0.05). The C group also had significantly fewer rescue morphine (1.9) and metoclopramide (0.21) uses per patient than the NC group (3.1 and 0.24, respectively). No significant differences in fever or vomiting rates were found between the groups. No patient in either group developed a periprosthetic joint infection or skin necrosis. One patient in the C group suffered wound dehiscence, and the wound healed well after debridement. No patient died or underwent re-operation in either group.
    Conclusions: This pilot trial found that intra-articular injection of multimodal analgesia (including corticosteroids) reduced initial postoperative pain, increased ROM in the early postoperative days (up to POD3), and did not increase wound complications or infection rates over approximately 30 months of follow-up.

    Exploring Contextual Relationships for Cervical Abnormal Cell Detection

    Cervical abnormal cell detection is a challenging task because the morphological discrepancies between abnormal and normal cells are usually subtle. To determine whether a cervical cell is normal or abnormal, cytopathologists always take surrounding cells as references to identify its abnormality. To mimic this behavior, we propose to explore contextual relationships to boost the performance of cervical abnormal cell detection. Specifically, both cell-to-cell and cell-to-global-image contextual relationships are exploited to enhance the features of each region of interest (RoI) proposal. Accordingly, two modules, dubbed the RoI-relationship attention module (RRAM) and the global RoI attention module (GRAM), are developed, and their combination strategies are also investigated. We establish a strong baseline by using Double-Head Faster R-CNN with a feature pyramid network (FPN) and integrate our RRAM and GRAM into it to validate the effectiveness of the proposed modules. Experiments conducted on a large cervical cell detection dataset reveal that introducing either RRAM or GRAM achieves better average precision (AP) than the baseline methods. Moreover, when cascading RRAM and GRAM, our method outperforms the state-of-the-art (SOTA) methods. Furthermore, we show that the proposed feature-enhancing scheme can facilitate both image-level and smear-level classification. The code and trained models are publicly available at https://github.com/CVIU-CSU/CR4CACD.
    Comment: 10 pages, 14 tables, and 3 figures
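    A plain self-attention layer over pooled RoI features captures the gist of relating each proposal to its surrounding cells. The sketch below assumes multi-head attention with a residual connection and LayerNorm; the paper's actual RRAM may differ in normalization and fusion details.

    import torch
    import torch.nn as nn

    class RoIRelationshipAttention(nn.Module):
        """Enhance each RoI feature with attention over all RoIs in the image."""

        def __init__(self, dim: int, num_heads: int = 8):
            super().__init__()
            self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
            self.norm = nn.LayerNorm(dim)

        def forward(self, roi_feats: torch.Tensor) -> torch.Tensor:
            # roi_feats: (B, R, D) -- R RoI proposals with D-dim pooled features.
            ctx, _ = self.attn(roi_feats, roi_feats, roi_feats)
            return self.norm(roi_feats + ctx)   # residual fusion with cell context

    A GRAM-style counterpart would instead attend from each RoI to a global image embedding, mirroring how cytopathologists consult the whole field of view.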