AdaCompress: Adaptive Compression for Online Computer Vision Services
With the growth of computer vision based applications and services, an
explosive number of images has been uploaded to cloud servers that host such
computer vision algorithms, usually in the form of deep learning models. Owing
to its wide adoption, JPEG has been used as the {\em de facto} compression and
encapsulation method before one uploads the images. However, the standard JPEG
configuration does not always perform well for compressing images that are to
be processed by a deep learning model, e.g., the standard quality level of JPEG
leads to a 50\% size overhead (compared with the best quality level selection)
on ImageNet under the same inference accuracy in popular computer vision models
including InceptionNet, ResNet, etc. Knowing this, designing a better JPEG
configuration for online computer vision services is still extremely
challenging: 1) Cloud-based computer vision models are usually a black box to
end-users, so it is difficult to design a JPEG configuration without knowing
their model structures. 2) The appropriate JPEG configuration differs from one
user to another. In this paper, we propose a reinforcement learning based JPEG
configuration framework. In particular, we design an agent that adaptively
chooses the compression level according to the input image's features and
backend deep learning models. We then train the agent with reinforcement
learning, using the different deep learning cloud services as the {\em
interactive training environment}, which feeds back a reward that jointly
accounts for accuracy and data size. In our real-world
evaluation on Amazon Rekognition, Face++ and Baidu Vision, our approach can
reduce the size of images by 1/2 -- 1/3 while the overall classification
accuracy decreases only slightly.
Comment: ACM Multimedia
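To make the setup concrete, here is a minimal Python sketch of the compression-level selection loop, reduced to a simple epsilon-greedy bandit over JPEG quality levels; the paper's agent additionally conditions on image features, and `cloud_classify` is a hypothetical placeholder for a black-box service such as those evaluated above.

```python
# A minimal sketch, not the authors' implementation: an epsilon-greedy
# agent picks a JPEG quality level and updates its value estimates from
# a reward that trades recognition accuracy against payload size.
import io
import random

from PIL import Image

QUALITY_LEVELS = [95, 75, 50, 30]            # candidate JPEG quality levels
q_values = {q: 0.0 for q in QUALITY_LEVELS}  # running value estimates
counts = {q: 0 for q in QUALITY_LEVELS}
ALPHA, EPSILON = 1.0, 0.1                    # size-penalty weight, exploration

def compress(image: Image.Image, quality: int) -> bytes:
    buf = io.BytesIO()
    image.save(buf, format="JPEG", quality=quality)
    return buf.getvalue()

def cloud_classify(payload: bytes) -> str:
    # Hypothetical stand-in for a black-box cloud vision API.
    raise NotImplementedError("replace with a real cloud vision API call")

def step(image: Image.Image, reference_label: str) -> None:
    # Epsilon-greedy action selection over quality levels.
    q = (random.choice(QUALITY_LEVELS) if random.random() < EPSILON
         else max(q_values, key=q_values.get))
    payload = compress(image, q)
    accuracy = 1.0 if cloud_classify(payload) == reference_label else 0.0
    size_ratio = len(payload) / len(compress(image, max(QUALITY_LEVELS)))
    reward = accuracy - ALPHA * size_ratio   # joint accuracy/size reward
    counts[q] += 1
    q_values[q] += (reward - q_values[q]) / counts[q]  # incremental mean
```

A training phase would call `step` on sample images with known labels; the learned estimates then drive the quality choice at upload time and keep adapting as the backend service changes.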
SCPAT-GAN: Structural Constrained and Pathology Aware Convolutional Transformer-GAN for Virtual Histology Staining of Human Coronary OCT images
There is a significant need for the generation of virtual histological
information from coronary optical coherence tomography (OCT) images to better
guide the treatment of coronary artery disease. However, existing methods
either require a large pixel-wise paired training dataset or have limited
capability to map pathological regions. To address these issues, we propose a
structurally constrained, pathology-aware transformer generative adversarial
network, namely SCPAT-GAN, to generate virtually stained H&E histology from OCT
images. The proposed SCPAT-GAN advances existing methods via a novel design to
impose pathological guidance on structural layers using a transformer-based
network.
Comment: 9 pages, 4 figures
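As a rough illustration of how such guidance can be imposed during training, the following PyTorch sketch composes an adversarial term with structural-layer and pathology supervision terms; the weights and loss choices are assumptions for illustration, not the paper's actual formulation.

```python
import torch
import torch.nn.functional as F

def generator_loss(disc_score_fake,               # discriminator logits on generated H&E
                   pred_layers, layer_maps,       # (N, C, H, W) logits, (N, H, W) targets
                   pred_pathology, pathology_maps,
                   w_struct=10.0, w_path=10.0):   # assumed weights
    # Non-saturating adversarial term: fool the discriminator.
    adv = F.binary_cross_entropy_with_logits(
        disc_score_fake, torch.ones_like(disc_score_fake))
    # Structural constraint: predicted tissue layers match layer guidance.
    struct = F.cross_entropy(pred_layers, layer_maps)
    # Pathology-aware term: predicted pathology map matches guidance.
    path = F.cross_entropy(pred_pathology, pathology_maps)
    return adv + w_struct * struct + w_path * path
```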
Frequency-aware optical coherence tomography image super-resolution via conditional generative adversarial neural network
Optical coherence tomography (OCT) has stimulated a wide range of medical
image-based diagnosis and treatment in fields such as cardiology and
ophthalmology. Such applications can be further facilitated by deep
learning-based super-resolution technology, which improves the capability of
resolving morphological structures. However, existing deep learning-based
methods focus only on spatial distribution and disregard frequency fidelity in
image reconstruction, leading to a frequency bias. To overcome this limitation,
we propose a frequency-aware super-resolution framework that integrates three
critical frequency-based modules (i.e., frequency transformation, frequency
skip connection, and frequency alignment) and a frequency-based loss function
into a conditional generative adversarial network (cGAN). We conducted a
large-scale quantitative study from an existing coronary OCT dataset to
demonstrate the superiority of our proposed framework over existing deep
learning frameworks. In addition, we confirmed the generalizability of our
framework by applying it to fish corneal images and rat retinal images,
demonstrating its capability to super-resolve morphological details in eye
imaging.
Comment: 13 pages, 7 figures, submitted to Biomedical Optics Express special
issue
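To give a concrete sense of what a frequency-based loss can look like, here is a short PyTorch sketch that compares super-resolved and ground-truth images in the Fourier domain; the paper's exact modules and loss formulation may differ.

```python
import torch

def frequency_loss(sr: torch.Tensor, hr: torch.Tensor) -> torch.Tensor:
    # 2-D FFT over the spatial dimensions of (N, C, H, W) tensors.
    sr_f = torch.fft.fft2(sr, norm="ortho")
    hr_f = torch.fft.fft2(hr, norm="ortho")
    # L1 distance between the complex spectra, penalizing frequency bias.
    return (sr_f - hr_f).abs().mean()
```

Such a term would typically be added to the standard cGAN objective with a weighting coefficient, so the generator is penalized for spectral as well as spatial errors.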
WGIT*: Workspace-Guided Informed Tree for Motion Planning in Restricted Environments
The motion planning of robots faces formidable challenges in restricted environments, particularly in rapidly finding feasible solutions and converging toward optimal ones. This paper proposes the Workspace-guided Informed Tree (WGIT*) to improve planning efficiency and ensure high-quality solutions in restricted environments. Specifically, WGIT* preprocesses the workspace by constructing a hierarchical structure to obtain critical restricted regions and connectivity information sequentially. The refined workspace information guides the sampling and exploration of WGIT*, increasing the sample density in restricted areas and prioritizing search-tree exploration in promising directions, respectively. Furthermore, WGIT* utilizes gradually enriched configuration-space information as feedback to rectify the guidance from the workspace and balance the information of the two spaces, which leads to efficient convergence toward the optimal solution. A theoretical analysis highlights the valuable properties of the proposed WGIT*. Finally, a series of simulations and experiments verifies the ability of WGIT* to quickly find initial solutions and converge toward optimal solutions.
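A minimal Python sketch of the workspace-guided sampling idea follows; the region representation and bias probability are assumptions for illustration, and the real planner additionally guides tree exploration and feeds configuration-space information back into the guidance.

```python
import random

def workspace_guided_sample(bounds, restricted_regions, bias=0.7):
    """Sample a workspace point, biased toward restricted regions.

    bounds: [(lo, hi), ...] per dimension of the workspace.
    restricted_regions: list of axis-aligned boxes [(lo, hi), ...].
    bias: probability of drawing the sample from a restricted region.
    """
    if restricted_regions and random.random() < bias:
        # Densify sampling inside the critical restricted regions.
        region = random.choice(restricted_regions)
        return tuple(random.uniform(lo, hi) for lo, hi in region)
    # Otherwise fall back to uniform sampling over the whole workspace.
    return tuple(random.uniform(lo, hi) for lo, hi in bounds)
```

The `bias` parameter trades exploration of the full workspace against sample density in narrow passages; feedback from the configuration space would adjust this balance online.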
Push the Boundary of SAM: A Pseudo-label Correction Framework for Medical Segmentation
The Segment Anything Model (SAM) has emerged as the leading approach for
zero-shot learning in segmentation, offering the advantage of avoiding
pixel-wise annotation. It is particularly appealing in medical image
segmentation where annotation is laborious and expertise-demanding. However,
the direct application of SAM often yields inferior results compared to
conventional fully supervised segmentation networks. While SAM-generated
pseudo labels can also benefit the training of fully supervised segmentation,
the performance is limited by the quality of the pseudo labels. In this paper,
we propose a novel label correction framework to push the boundary of SAM-based
segmentation. Our model utilizes a novel noise detection module to distinguish
noisy labels from clean labels. This enables us to correct the noisy
labels using an uncertainty-based self-correction module, thereby enriching the
clean training set. Finally, we retrain the network with updated labels to
optimize its weights for future predictions. One key advantage of our model is
its ability to train deep networks using SAM-generated pseudo labels without
relying on a subset of expert-level annotations. We demonstrate the
effectiveness of our proposed model on both X-ray and lung CT datasets,
indicating its ability to improve segmentation accuracy and outperform baseline
methods in label correction.
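As an illustration of the correction step, the sketch below implements a simplified uncertainty-based relabeling rule in PyTorch; the threshold and mechanics are assumptions, whereas the paper's noise detection module is a learned component.

```python
import torch

def correct_pseudo_labels(probs: torch.Tensor, pseudo: torch.Tensor,
                          tau: float = 0.9) -> torch.Tensor:
    """Replace suspect pseudo labels with confident model predictions.

    probs:  (N, C, H, W) softmax output of the segmentation network.
    pseudo: (N, H, W) SAM-generated pseudo labels.
    tau:    confidence threshold above which a pixel is relabeled.
    """
    confidence, prediction = probs.max(dim=1)   # per-pixel max probability
    disagrees = prediction != pseudo            # candidate noisy pixels
    corrected = pseudo.clone()
    # Relabel only pixels where the network confidently disagrees.
    relabel = disagrees & (confidence > tau)
    corrected[relabel] = prediction[relabel]
    return corrected
```

The corrected labels would then enrich the clean training set for the retraining stage the abstract describes.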
Deep Learning based 3D Segmentation: A Survey
3D object segmentation is a fundamental and challenging problem in computer
vision with applications in autonomous driving, robotics, augmented reality and
medical image analysis. It has received significant attention from the computer
vision, graphics and machine learning communities. Traditionally, 3D
segmentation was performed with hand-crafted features and engineered methods
which failed to achieve acceptable accuracy and could not generalize to
large-scale data. Driven by their great success in 2D computer vision, deep
learning techniques have recently become the tool of choice for 3D segmentation
tasks as well. This has led to an influx of methods in the literature that
have been evaluated on different benchmark datasets. This paper
provides a comprehensive survey of recent progress in deep learning based 3D
segmentation covering over 150 papers. It summarizes the most commonly used
pipelines, discusses their highlights and shortcomings, and analyzes the
competitive results of these segmentation methods. Based on the analysis, it
also provides promising research directions for the future.
Comment: Under review at ACM Computing Surveys, 36 pages, 10 tables, 9 figures
First Place Solution to the CVPR'2023 AQTC Challenge: A Function-Interaction Centric Approach with Spatiotemporal Visual-Language Alignment
Affordance-Centric Question-driven Task Completion (AQTC) has been proposed
to acquire knowledge from videos to furnish users with comprehensive and
systematic instructions. However, existing methods have hitherto neglected the
necessity of aligning spatiotemporal visual and linguistic signals, as well as
the crucial interactional information between humans and objects. To tackle
these limitations, we propose to combine large-scale pre-trained
vision-language and video-language models, which provide stable and reliable
multimodal features and facilitate effective spatiotemporal visual-textual
alignment. Additionally, a novel hand-object-interaction (HOI) aggregation
module is proposed that aids in capturing human-object interaction
information, thereby further augmenting the capacity to understand the
presented scenario. Our method achieved first place in the CVPR'2023 AQTC
Challenge, with a Recall@1 score of 78.7\%. The code is available at
https://github.com/tomchen-ctj/CVPR23-LOVEU-AQTC.
Comment: Winner of CVPR2023 Long-form Video Understanding and Generation
Challenge (Track 3)
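To sketch how hand-object-interaction features might be aggregated with the language signal, here is a minimal PyTorch module using plain attention pooling; the shapes and design are assumptions for illustration and do not reproduce the released code linked above.

```python
import torch
import torch.nn as nn

class HOIAggregator(nn.Module):
    def __init__(self, dim: int = 512):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, num_heads=8, batch_first=True)

    def forward(self, question: torch.Tensor, hoi: torch.Tensor) -> torch.Tensor:
        """Pool per-frame hand-object features with the question as query.

        question: (B, 1, D) text embedding of the user question.
        hoi:      (B, T, D) per-frame hand-object-interaction features.
        """
        pooled, _ = self.attn(question, hoi, hoi)   # (B, 1, D)
        return pooled.squeeze(1)                    # (B, D)
```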