Camouflaged Object Detection with Feature Grafting and Distractor Aware
The task of Camouflaged Object Detection (COD) aims to accurately segment
camouflaged objects that are integrated into the environment, which is more
challenging than ordinary detection as the textures of the target and
background are visually indistinguishable. In this paper, we propose a novel
Feature Grafting and Distractor Aware network (FDNet) to handle the COD task.
Specifically, we use CNN and Transformer to encode multi-scale images in
parallel. In order to better explore the advantages of the two encoders, we
design a cross-attention-based Feature Grafting Module to graft features
extracted from Transformer branch into CNN branch, after which the features are
aggregated in the Feature Fusion Module. A Distractor Aware Module is designed
to explicitly model the two possible distractors in the COD task to refine the
coarse camouflage map. We also propose the largest artificial camouflaged
object dataset, named ACOD2K, which contains 2000 images with annotations. We
conducted extensive experiments on four widely used benchmark datasets and the
ACOD2K dataset. The results show that our method significantly outperforms
other state-of-the-art methods. The code and the ACOD2K dataset will be
available at https://github.com/syxvision/FDNet.
Comment: ICME2023 paper
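The grafting step described above can be illustrated with a minimal sketch of cross-attention, in which CNN tokens act as queries and Transformer tokens act as keys and values. This is not the FDNet module itself: the function name `graft`, the identity projections (no learned weights), and the residual addition are assumptions made for illustration only.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def graft(cnn_tokens, trans_tokens):
    """Cross-attention sketch: each CNN token (query) attends over the
    Transformer tokens (keys and values), and the attended feature is added
    back to the CNN token. Identity projections and the residual add are
    assumptions of this sketch, not details taken from the abstract."""
    d = len(cnn_tokens[0])
    grafted = []
    for q in cnn_tokens:
        # Scaled dot-product scores between this query and every key.
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d)
                  for k in trans_tokens]
        weights = softmax(scores)
        # Weighted sum of the Transformer tokens (values).
        attended = [sum(w * v[i] for w, v in zip(weights, trans_tokens))
                    for i in range(d)]
        grafted.append([qi + ai for qi, ai in zip(q, attended)])
    return grafted


if __name__ == "__main__":
    cnn = [[1.0, 0.0], [0.0, 1.0]]
    trans = [[0.5, 0.5], [1.0, -1.0]]
    for row in graft(cnn, trans):
        print(row)
```

A real implementation would apply learned query, key, and value projections and operate on multi-scale feature maps; the sketch only shows the direction of information flow from the Transformer branch into the CNN branch.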
HiQA: A Hierarchical Contextual Augmentation RAG for Massive Documents QA
As language model agents leveraging external tools rapidly evolve,
significant progress has been made in question-answering (QA) methodologies
utilizing supplementary documents and the Retrieval-Augmented Generation (RAG)
approach. This advancement has improved the response quality of language models
and alleviated hallucination. However, these methods exhibit
limited retrieval accuracy when faced with massive indistinguishable documents,
presenting notable challenges in their practical application. In response to
these emerging challenges, we present HiQA, an advanced framework for
multi-document question-answering (MDQA) that integrates cascading metadata
into content as well as a multi-route retrieval mechanism. We also release a
benchmark called MasQA for evaluation and research in MDQA. Finally, HiQA
demonstrates state-of-the-art performance in multi-document environments.
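One way to picture why cascading metadata into content helps with nearly identical documents is the toy sketch below. It is not the HiQA implementation: the `Chunk` structure, the `cascade_metadata` header format, and the token-overlap scorer (a stand-in for a real retriever) are all hypothetical names and choices introduced for illustration.

```python
from dataclasses import dataclass


@dataclass
class Chunk:
    doc_title: str      # title of the source document
    section_path: list  # chapter/section headings leading to the chunk
    text: str           # raw chunk content


def cascade_metadata(chunk):
    """Prepend the document title and section path to the chunk text
    (an assumed header format, chosen for readability)."""
    header = " > ".join([chunk.doc_title, *chunk.section_path])
    return f"[{header}] {chunk.text}"


def overlap_score(query, passage):
    """Count shared lowercase tokens; a toy stand-in for a real retriever."""
    cleaned = passage.lower()
    for ch in "[]>":
        cleaned = cleaned.replace(ch, " ")
    return len(set(query.lower().split()) & set(cleaned.split()))


def retrieve(query, chunks, augment=True):
    """Return the best-scoring chunk, with or without cascaded metadata.
    Ties resolve to the earliest chunk in the list."""
    texts = [cascade_metadata(c) if augment else c.text for c in chunks]
    scores = [overlap_score(query, t) for t in texts]
    best = max(range(len(chunks)), key=lambda i: scores[i])
    return chunks[best]


if __name__ == "__main__":
    body = "Maximum input voltage is 5 V."
    chunks = [
        Chunk("Model B200 Datasheet", ["Electrical", "Limits"], body),
        Chunk("Model A100 Datasheet", ["Electrical", "Limits"], body),
    ]
    query = "A100 maximum input voltage"
    print(retrieve(query, chunks, augment=False).doc_title)
    print(retrieve(query, chunks, augment=True).doc_title)
```

Because both chunks share the same body text, the plain-text scorer cannot tell them apart, while the metadata-augmented text carries the model name that the query mentions.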
Robust Calibrate Proxy Loss for Deep Metric Learning
Mainstream research in deep metric learning can be divided into two
genres: proxy-based and pair-based methods. Proxy-based methods have attracted
extensive attention due to their lower training complexity and fast network
convergence. However, these methods have limitations because the proxy
optimization is done by the network, which makes it challenging for the proxy
to accurately represent the feature distribution of the real class of data. In
this paper, we
propose a Calibrate Proxy (CP) structure, which uses the real sample
information to improve the similarity calculation in proxy-based loss and
introduces a calibration loss to constrain the proxy optimization towards the
center of the class features. At the same time, we set a small number of
proxies for each class to alleviate the impact of intra-class differences on
retrieval performance. The effectiveness of our method is evaluated by
extensive experiments on three public datasets and multiple synthetic
label-noise datasets. The results show that our approach can effectively
improve the performance of commonly used proxy-based losses on both regular and
noisy datasets.
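The idea of pulling each proxy toward the center of its class features can be sketched as follows. The abstract does not give the exact loss, so everything here is an assumed form: the squared-distance calibration term, the Proxy-NCA-style softmax base loss, the weight `lam`, and the use of a single proxy per class (the abstract mentions a small number of proxies per class).

```python
import math


def class_center(features, labels, cls):
    """Mean feature vector of the samples in the batch that belong to cls."""
    members = [f for f, y in zip(features, labels) if y == cls]
    dim = len(members[0])
    return [sum(f[d] for f in members) / len(members) for d in range(dim)]


def calibration_loss(proxies, features, labels):
    """Mean squared distance between each class proxy and the batch center
    of that class (an assumed form of the calibration term)."""
    classes = sorted(set(labels))
    total = 0.0
    for cls in classes:
        center = class_center(features, labels, cls)
        total += sum((p - c) ** 2 for p, c in zip(proxies[cls], center))
    return total / len(classes)


def proxy_softmax_loss(proxies, features, labels):
    """Cross-entropy over negative squared distances to all proxies
    (a Proxy-NCA-style stand-in for the base proxy loss)."""
    classes = sorted(proxies)
    total = 0.0
    for f, y in zip(features, labels):
        logits = [-sum((a - b) ** 2 for a, b in zip(f, proxies[c]))
                  for c in classes]
        m = max(logits)
        log_z = m + math.log(sum(math.exp(l - m) for l in logits))
        total += log_z - logits[classes.index(y)]
    return total / len(features)


def total_loss(proxies, features, labels, lam=0.5):
    """Base proxy loss plus the weighted calibration term (lam is assumed)."""
    return (proxy_softmax_loss(proxies, features, labels)
            + lam * calibration_loss(proxies, features, labels))


if __name__ == "__main__":
    feats = [[0.0, 0.0], [2.0, 0.0], [4.0, 4.0]]
    labels = [0, 0, 1]
    proxies = {0: [1.0, 1.0], 1: [4.0, 4.0]}
    print(calibration_loss(proxies, feats, labels))
    print(total_loss(proxies, feats, labels))
```

In training, both terms would be minimized jointly by gradient descent over the network and the proxies; the sketch only evaluates the losses for fixed inputs.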
Prompt Learning with Optimal Transport for Vision-Language Models
With the increasing attention to large vision-language models such as CLIP,
there has been a significant amount of effort dedicated to building efficient
prompts. Unlike conventional methods that learn only a single prompt, we
propose to learn multiple comprehensive prompts to describe diverse
characteristics of categories such as intrinsic attributes or extrinsic
contexts. However, directly matching each prompt to the same visual feature is
problematic, as it pushes the prompts to converge to one point. To solve this
problem, we propose to apply optimal transport to match the vision and text
modalities. Specifically, we first model images and the categories with visual
and textual feature sets. Then, we apply a two-stage optimization strategy to
learn the prompts. In the inner loop, we optimize the optimal transport
distance to align visual features and prompts by the Sinkhorn algorithm, while
in the outer loop, we learn the prompts by this distance from the supervised
data. Extensive experiments are conducted on the few-shot recognition task and
the improvement demonstrates the superiority of our method.
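The inner loop described above, which aligns a set of visual features with a set of prompts through an optimal transport distance, can be sketched with a plain Sinkhorn iteration. This is a minimal illustration under stated assumptions rather than the authors' code: the cost `1 - cosine similarity`, the uniform marginals, the regularization strength `eps`, and the function names are all chosen here for illustration.

```python
import math


def cosine_cost(visual, prompts):
    """Cost matrix C[i][j] = 1 - cosine similarity (an assumed cost)."""
    def norm(v):
        return math.sqrt(sum(x * x for x in v))
    return [[1.0 - sum(a * b for a, b in zip(v, p)) / (norm(v) * norm(p))
             for p in prompts] for v in visual]


def sinkhorn(cost, eps=0.1, n_iters=200):
    """Entropy-regularized optimal transport with uniform marginals.
    Returns the transport plan and the transport cost <plan, cost>."""
    m, n = len(cost), len(cost[0])
    mu = [1.0 / m] * m  # mass on each visual feature
    nu = [1.0 / n] * n  # mass on each prompt
    kernel = [[math.exp(-c / eps) for c in row] for row in cost]
    u = [1.0] * m
    v = [1.0] * n
    for _ in range(n_iters):
        # Alternately rescale rows and columns to match the marginals.
        u = [mu[i] / sum(kernel[i][j] * v[j] for j in range(n))
             for i in range(m)]
        v = [nu[j] / sum(kernel[i][j] * u[i] for i in range(m))
             for j in range(n)]
    plan = [[u[i] * kernel[i][j] * v[j] for j in range(n)] for i in range(m)]
    dist = sum(plan[i][j] * cost[i][j] for i in range(m) for j in range(n))
    return plan, dist


if __name__ == "__main__":
    visual = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # toy local features
    prompts = [[1.0, 0.0], [0.0, 1.0]]             # toy prompt embeddings
    plan, dist = sinkhorn(cosine_cost(visual, prompts))
    for row in plan:
        print([round(x, 4) for x in row])
    print(round(dist, 4))
```

In the two-stage scheme from the abstract, a distance of this kind would be computed in the inner loop with the prompts held fixed, and the outer loop would then update the prompts from the supervised data using that distance.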
Advances in reprogramming of energy metabolism in tumor T cells
Cancer is a leading cause of human death worldwide, and modulating the metabolic properties of T cells used in cancer immunotherapy holds great promise for combating it. As a crucial factor, energy metabolism influences the activation, proliferation, and function of T cells, and thus metabolic reprogramming of T cells is a unique research perspective in cancer immunology. Special conditions within the tumor microenvironment and high energy demands lead to alterations in the energy metabolism of T cells. In-depth research on the reprogramming of energy metabolism in T cells can reveal the mechanisms underlying tumor immune tolerance and provide important clues for the development of new tumor immunotherapy strategies. Therefore, the study of T cell energy metabolism has important clinical significance and potential applications. In this study, the current achievements in the reprogramming of T cell energy metabolism are reviewed. Then, the factors that influence T cell energy metabolism are introduced. In addition, the role of T cell energy metabolism in cancer immunotherapy is summarized, highlighting its potential significance in enhancing T cell function and therapeutic outcomes. In summary, energy exhaustion of T cells leads to functional exhaustion, resulting in immune evasion by cancer cells. A better understanding of the reprogramming of T cell energy metabolism may enable immunotherapy to combat cancer and holds promise for optimizing and enhancing existing therapeutic approaches.
Tactile-based Object Retrieval From Granular Media
We introduce GEOTACT, a robotic manipulation method capable of retrieving
objects buried in granular media. This is a challenging task due to the need to
interact with granular media, and doing so based exclusively on tactile
feedback, since a buried object can be completely hidden from vision. Tactile
feedback is in itself challenging in this context, due to ubiquitous contact
with the surrounding media, and the inherent noise level induced by the tactile
readings. To address these challenges, we use a learning method trained
end-to-end with simulated sensor noise. We show that our problem formulation
leads to the natural emergence of learned pushing behaviors that the
manipulator uses to reduce uncertainty and funnel the object to a stable grasp
despite spurious and noisy tactile readings. We also introduce a training
curriculum that enables learning these behaviors in simulation, followed by
zero-shot transfer to real hardware. To the best of our knowledge, GEOTACT is
the first method to reliably retrieve a number of different objects from a
granular environment, doing so on real hardware and with integrated tactile
sensing. Videos and additional information can be found at
https://jxu.ai/geotact
Deep Learning and Medical Imaging for COVID-19 Diagnosis: A Comprehensive Survey
COVID-19 (Coronavirus disease 2019) has been quickly spreading since its
outbreak, impacting financial markets and healthcare systems globally.
Countries all around the world have adopted a number of extraordinary steps to
restrict the spread of the virus, for which early COVID-19 diagnosis is essential.
Medical images such as X-ray images and Computed Tomography scans are becoming
one of the main diagnostic tools to combat COVID-19 with the aid of deep
learning-based systems. In this survey, we investigate the main contributions
of deep learning applications using medical images in fighting against COVID-19
from the aspects of image classification, lesion localization, and severity
quantification, and review different deep learning architectures and some image
preprocessing techniques for achieving a more precise diagnosis. We also provide a
summary of the X-ray and CT image datasets used in various studies for COVID-19
detection. The key difficulties and potential applications of deep learning in
fighting against COVID-19 are finally discussed. This work summarizes the
latest methods of deep learning using medical images to diagnose COVID-19,
highlighting the challenges and inspiring more studies to keep utilizing the
advantages of deep learning to combat COVID-19.