ODN: Opening the Deep Network for Open-set Action Recognition
In recent years, the performance of action recognition has been significantly
improved with the help of deep neural networks. Most existing action
recognition works hold the \textit{closed-set} assumption that all action
categories are known beforehand, so that deep networks can be well trained for
these categories. However, action recognition in the real world is essentially
an \textit{open-set} problem: it is impossible to know all action categories
beforehand, and consequently infeasible to prepare sufficient training samples
for emerging categories. In this case, applying closed-set recognition methods
will inevitably lead to unseen-category errors.
To address this challenge, we propose the Open Deep Network (ODN) for the
open-set action recognition task. Technically, ODN detects new categories
by applying a multi-class triplet thresholding method, and then dynamically
reconstructs the classification layer and "opens" the deep network by
continually adding predictors for new categories. To transfer the learned
knowledge to a new category, two novel methods, Emphasis Initialization and
Allometry Training, are adopted to initialize and incrementally train the new
predictor so that only a few samples are needed to fine-tune the model.
Extensive experiments show that ODN can effectively detect and recognize new
categories with little human intervention, making it applicable to open-set
action recognition tasks in the real world. Moreover, ODN can even achieve comparable
performance to some closed-set methods.
Comment: 6 pages, 3 figures, ICME 2018
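To make the mechanism concrete, here is a minimal PyTorch sketch of what "opening" the classification layer could look like. The `open_category` method and its mean-of-top-k initialization are one reading of Emphasis Initialization from the abstract; all names and the choice of k are illustrative, not the paper's exact procedure.

```python
import torch
import torch.nn as nn

class OpenClassifier(nn.Module):
    """Classification head that can be 'opened' with predictors for new categories."""

    def __init__(self, feat_dim: int, num_classes: int):
        super().__init__()
        self.fc = nn.Linear(feat_dim, num_classes)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.fc(x)

    @torch.no_grad()
    def open_category(self, support_feats: torch.Tensor, k: int = 3) -> None:
        """Append one predictor for a newly detected category.

        Emphasis-style initialization (an assumption, not the paper's exact
        rule): the new weight row is the mean of the k known-class rows whose
        predictors respond most strongly to the new category's support samples.
        """
        w, b = self.fc.weight, self.fc.bias               # (C, D), (C,)
        scores = (support_feats @ w.t() + b).mean(dim=0)  # (C,) mean response per class
        topk = scores.topk(min(k, w.size(0))).indices
        new_w = w[topk].mean(dim=0, keepdim=True)         # (1, D)
        new_b = b[topk].mean().view(1)

        fc = nn.Linear(w.size(1), w.size(0) + 1)
        fc.weight.data = torch.cat([w.data, new_w], dim=0)
        fc.bias.data = torch.cat([b.data, new_b], dim=0)
        self.fc = fc                                      # the network is now "open"
```

A plausible stand-in for Allometry Training would then fine-tune the enlarged head with a larger learning rate on the appended row than on the old ones, so the new predictor catches up from only a few samples.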
Learning Multimodal Volumetric Features for Large-Scale Neuron Tracing
The current neuron reconstruction pipeline for electron microscopy (EM) data
usually includes automatic image segmentation followed by extensive human
expert proofreading. In this work, we aim to reduce human workload by
predicting connectivity between over-segmented neuron pieces, taking both
microscopy image and 3D morphology features into account, similar to human
proofreading workflow. To this end, we first construct a dataset, named
FlyTracing, that contains millions of pairwise connections of segments
spanning the whole fly brain, which is three orders of magnitude larger than
existing datasets for neuron segment connection. To learn sophisticated
biological imaging features from the connectivity annotations, we propose a
novel connectivity-aware contrastive learning method to generate dense
volumetric EM image embedding. The learned embeddings can be easily
incorporated with any point or voxel-based morphological representations for
automatic neuron tracing. Extensive comparisons of different combination
schemes of image and morphological representation in identifying split errors
across the whole fly brain demonstrate the superiority of the proposed
approach, especially at locations with severe imaging artifacts, such as
missing sections and misalignment. The dataset and code are available at
https://github.com/Levishery/Flywire-Neuron-Tracing.
Comment: 9 pages, 6 figures, accepted by AAAI 2024
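As a rough illustration of how connectivity annotations could drive contrastive embedding learning, the sketch below implements an InfoNCE-style loss over pooled segment embeddings. The pooling, loss form, and all names are assumptions for illustration; the paper's connectivity-aware objective may differ.

```python
import torch
import torch.nn.functional as F

def connectivity_contrastive_loss(emb_a: torch.Tensor,
                                  emb_b: torch.Tensor,
                                  connected: torch.Tensor,
                                  tau: float = 0.07) -> torch.Tensor:
    """InfoNCE-style loss over segment pairs (an illustrative sketch).

    emb_a, emb_b: (N, D) pooled embeddings of the two segments in each pair,
                  e.g. the mean of the dense volumetric embedding over each mask.
    connected:    (N,) bool tensor, True if the pair is annotated as belonging
                  to the same neuron.
    """
    emb_a = F.normalize(emb_a, dim=1)
    emb_b = F.normalize(emb_b, dim=1)
    sim = emb_a @ emb_b.t() / tau              # (N, N) pairwise similarities
    # For each annotated-connected pair i, segment b_i is the positive for a_i
    # and every other segment in the batch serves as a negative.
    targets = torch.arange(sim.size(0), device=sim.device)
    return F.cross_entropy(sim[connected], targets[connected])
```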
Coded Residual Transform for Generalizable Deep Metric Learning
A fundamental challenge in deep metric learning is the generalization
capability of the feature embedding network model since the embedding network
learned on training classes needs to be evaluated on new test classes. To
address this challenge, in this paper, we introduce a new method called coded
residual transform (CRT) for deep metric learning to significantly improve its
generalization capability. Specifically, we learn a set of diversified
prototype features, project the feature map onto each prototype, and then
encode its features using their projection residuals weighted by their
correlation coefficients with each prototype. The proposed CRT method has the
following two unique characteristics. First, it represents and encodes the
feature map from a set of complementary perspectives based on projections onto
diversified prototypes. Second, unlike existing transformer-based feature
representation approaches which encode the original values of features based on
global correlation analysis, the proposed coded residual transform encodes the
relative differences between the original features and their projected
prototypes. Embedding space density and spectral decay analysis show that this
multi-perspective projection onto diversified prototypes and coded residual
representation are able to achieve significantly improved generalization
capability in metric learning. Finally, to further enhance the generalization
performance, we propose to enforce the consistency on their feature similarity
matrices between coded residual transforms with different sizes of projection
prototypes and embedding dimensions. Our extensive experimental results and
ablation studies demonstrate that the proposed CRT method outperforms
state-of-the-art deep metric learning methods by large margins, improving upon
the current best method by up to 4.28% on the CUB dataset.
Comment: Accepted by NeurIPS 2022
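The encoding step described above maps naturally onto a few tensor operations. Below is a minimal sketch of the coded residual idea as the abstract describes it: correlation-weighted residuals against learned prototypes, pooled into an embedding. The normalization, softmax weighting, and pooling choices are assumptions, not the paper's exact formulation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class CodedResidualTransform(nn.Module):
    """Sketch of the CRT idea from the abstract: encode local features by their
    residuals to learned prototypes, weighted by feature-prototype correlation.
    Normalization, weighting, and pooling choices here are assumptions."""

    def __init__(self, feat_dim: int, num_prototypes: int):
        super().__init__()
        self.prototypes = nn.Parameter(torch.randn(num_prototypes, feat_dim))

    def forward(self, feats: torch.Tensor) -> torch.Tensor:
        # feats: (B, N, D) local features, i.e. a flattened spatial feature map
        f = F.normalize(feats, dim=-1)
        p = F.normalize(self.prototypes, dim=-1)                 # (K, D)
        corr = torch.softmax(f @ p.t(), dim=-1)                  # (B, N, K) weights
        residual = f.unsqueeze(2) - p.unsqueeze(0).unsqueeze(0)  # (B, N, K, D)
        coded = corr.unsqueeze(-1) * residual                    # weighted residuals
        emb = coded.mean(dim=1).flatten(1)                       # (B, K*D) embedding
        return F.normalize(emb, dim=-1)
```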
STS-TransUNet: Semi-supervised Tooth Segmentation Transformer U-Net for dental panoramic image
In this paper, we introduce a novel deep learning method for dental panoramic image segmentation, which is crucial in oral medicine and orthodontics for accurate diagnosis and treatment planning. Traditional methods often fail to effectively combine global and local context and struggle with unlabeled data, limiting performance in varied clinical settings. We address these issues with an advanced TransUNet architecture, enhancing feature retention and utilization by connecting the input and output layers directly. Our architecture further employs spatial and channel attention mechanisms in the decoder segments for targeted region focus, and deep supervision techniques to overcome the vanishing-gradient problem for more efficient training. Additionally, our network includes a self-learning algorithm that uses unlabeled data, boosting generalization capabilities. Named the Semi-supervised Tooth Segmentation Transformer U-Net (STS-TransUNet), our method demonstrated superior performance on the MICCAI STS-2D dataset, proving its effectiveness and robustness in tooth segmentation tasks.
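The self-learning component is described only at a high level; one common way to realize it is confidence-thresholded pseudo-labeling on the unlabeled images, sketched below for binary tooth masks. The threshold, loss, and function names are illustrative assumptions rather than the paper's exact algorithm.

```python
import torch
import torch.nn.functional as F

def pseudo_label_loss(model, unlabeled: torch.Tensor,
                      conf_thresh: float = 0.9) -> torch.Tensor:
    """Confidence-thresholded pseudo-labeling for binary tooth masks.

    A common self-training scheme, used here as a stand-in for the paper's
    self-learning algorithm. unlabeled: (B, 1, H, W) panoramic images.
    """
    with torch.no_grad():
        probs = torch.sigmoid(model(unlabeled))      # (B, 1, H, W) predictions
        pseudo = (probs > 0.5).float()               # hard pseudo-masks
        # Trust only pixels where the model is confident either way.
        keep = ((probs > conf_thresh) | (probs < 1.0 - conf_thresh)).float()
    logits = model(unlabeled)                        # second pass, with gradients
    loss = F.binary_cross_entropy_with_logits(logits, pseudo, reduction="none")
    return (loss * keep).sum() / keep.sum().clamp(min=1.0)
```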
The efficacy and safety of intra-articular injection of corticosteroids in multimodal analgesic cocktails in total knee arthroplasty—a historically controlled study
Background: Total knee arthroplasty (TKA) is a common and effective procedure. Optimizing pain control and reducing postoperative discomfort are essential for patient satisfaction. No studies have examined the safety and efficacy of intra-articular corticosteroid injections following TKA. This study aims to examine the safety and efficacy of corticosteroids in intra-articular multimodal analgesic injections.
Materials and methods: This was a historically controlled study conducted at a single academic institution. Before May 2019, patients received an intra-articular cocktail injection without corticosteroids during surgery, referred to as the non-corticosteroid (NC) group. After June 2019, patients received an intra-articular cocktail injection containing corticosteroids intraoperatively, referred to as the corticosteroid (C) group. In total, 738 patients were evaluated: 370 in the C group and 368 in the NC group. The mean follow-up duration was 30.4 months for the C group and 48.4 months for the NC group.
Results: The mean VAS scores at rest on postoperative day (POD) 1 (2.35) and POD 3 (3.88) were significantly lower in the C group than in the NC group (2.86 on POD 1 and 5.26 on POD 3; p < 0.05). Walking pain in the C group (4.42) was also significantly lower than that in the NC group (5.96) on POD 3 (p < 0.05). Patients in the C group had a significantly higher mean range of motion (ROM) on POD 3 (92.55) than those in the NC group (86.38). The mean time to straight-leg raise in the C group (2.77) was significantly shorter than in the NC group (3.61) (p < 0.05). The C group also required significantly fewer rescue doses of morphine (1.9) and metoclopramide (0.21) per patient than the NC group (3.1 and 0.24, respectively). No significant differences in fever or vomiting rates were found between the groups. No patient in either group developed a periprosthetic joint infection or skin necrosis. One patient in the C group suffered wound dehiscence, and the wound healed well after debridement. No patient died or underwent re-operation in either group.
Conclusions: This pilot trial found that intra-articular injection of multimodal analgesia (including corticosteroids) reduced initial postoperative pain, increased ROM in the early postoperative days (up to POD 3), and did not increase wound complications or infection rates over approximately 30 months of follow-up.
Exploring Contextual Relationships for Cervical Abnormal Cell Detection
Cervical abnormal cell detection is a challenging task as the morphological
discrepancies between abnormal and normal cells are usually subtle. To
determine whether a cervical cell is normal or abnormal, cytopathologists
always take surrounding cells as references to identify its abnormality. To
mimic these behaviors, we propose to explore contextual relationships to boost
the performance of cervical abnormal cell detection. Specifically, both
contextual relationships between cells and cell-to-global images are exploited
to enhance the features of each region of interest (RoI) proposal. Accordingly,
two modules, dubbed the RoI-relationship attention module (RRAM) and the global
RoI attention module (GRAM), are developed, and their combination strategies are
also investigated. We establish a strong baseline by using Double-Head Faster
R-CNN with feature pyramid network (FPN) and integrate our RRAM and GRAM into
it to validate the effectiveness of the proposed modules. Experiments conducted
on a large cervical cell detection dataset reveal that introducing RRAM and
GRAM each yields better average precision (AP) than the baseline methods.
Moreover, when cascading RRAM and GRAM, our method outperforms the
state-of-the-art (SOTA) methods. Furthermore, we show that the proposed
feature-enhancing scheme can facilitate both image-level and smear-level
classification. The code and trained models are publicly available at
https://github.com/CVIU-CSU/CR4CACD.
Comment: 10 pages, 14 tables, and 3 figures
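To illustrate the flavor of the RoI-relationship attention described above, here is a small PyTorch sketch in which each RoI feature attends to all other RoI features from the same image; a GRAM-style variant would instead use pooled global-image features as keys and values. Dimensions and layer choices are illustrative, not the authors' implementation.

```python
import torch
import torch.nn as nn

class RoIRelationshipAttention(nn.Module):
    """RRAM-flavored sketch: each RoI feature attends to the other RoI features
    of the same image, mimicking how cytopathologists compare a cell with its
    neighbors. Sizes are illustrative."""

    def __init__(self, dim: int = 256, heads: int = 8):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)

    def forward(self, roi_feats: torch.Tensor) -> torch.Tensor:
        # roi_feats: (B, R, D) pooled features of R RoI proposals per image
        ctx, _ = self.attn(roi_feats, roi_feats, roi_feats)
        # A GRAM-style variant would use a pooled global-image feature as
        # the key/value instead of the RoI set.
        return self.norm(roi_feats + ctx)  # residual keeps the original cues
```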