258 research outputs found

    Camouflaged Object Detection with Feature Grafting and Distractor Aware

    The task of Camouflaged Object Detection (COD) aims to accurately segment camouflaged objects that are integrated into the environment, which is more challenging than ordinary detection because the textures of the target and the background are visually indistinguishable. In this paper, we propose a novel Feature Grafting and Distractor Aware network (FDNet) to handle the COD task. Specifically, we use a CNN and a Transformer to encode multi-scale images in parallel. To better exploit the advantages of the two encoders, we design a cross-attention-based Feature Grafting Module to graft features extracted from the Transformer branch into the CNN branch, after which the features are aggregated in the Feature Fusion Module. A Distractor Aware Module is designed to explicitly model the two possible distractors in the COD task and refine the coarse camouflage map. We also propose the largest artificial camouflaged object dataset, named ACOD2K, which contains 2000 images with annotations. We conducted extensive experiments on four widely used benchmark datasets and the ACOD2K dataset. The results show that our method significantly outperforms other state-of-the-art methods. The code and ACOD2K will be available at https://github.com/syxvision/FDNet. Comment: ICME2023 paper

    HiQA: A Hierarchical Contextual Augmentation RAG for Massive Documents QA

    As language model agents leveraging external tools rapidly evolve, significant progress has been made in question-answering (QA) methodologies that use supplementary documents and the Retrieval-Augmented Generation (RAG) approach. This advancement has improved the response quality of language models and reduced hallucination. However, these methods exhibit limited retrieval accuracy when faced with massive numbers of indistinguishable documents, which presents notable challenges for their practical application. In response to these challenges, we present HiQA, an advanced framework for multi-document question-answering (MDQA) that integrates cascading metadata into content and adds a multi-route retrieval mechanism. We also release a benchmark called MasQA for evaluation and research in MDQA. Finally, HiQA demonstrates state-of-the-art performance in multi-document environments.
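One way to picture "cascading metadata into content" is to prepend a chunk's document title and heading path to its text before it is indexed, so that chunks from near-identical documents stay distinguishable. The following is only a minimal sketch under that assumption: the function name `cascade_metadata`, the `" > "` separator, and the bracketed header format are hypothetical and are not taken from HiQA.

```python
def cascade_metadata(doc_title, sections):
    """Prefix each chunk with its document title and heading path.

    sections: list of (heading_path, text) pairs, where heading_path is a
    list of headings from the outermost to the innermost level.
    The output format is an illustrative assumption, not HiQA's format.
    """
    chunks = []
    for heading_path, text in sections:
        # Join title and headings into one hierarchical header line.
        header = " > ".join([doc_title] + list(heading_path))
        chunks.append(f"[{header}]\n{text}")
    return chunks


sections = [
    (["Specifications", "Power"], "Rated input voltage is 12 V."),
    (["Specifications", "Thermal"], "Operating range is 0 to 70 C."),
]
for chunk in cascade_metadata("Model A Manual", sections):
    print(chunk)
```

With the header attached, two manuals that contain the same sentence under "Power" would still produce different chunk texts, which is the property a retriever needs when documents are otherwise hard to tell apart.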

    Robust Calibrate Proxy Loss for Deep Metric Learning

    Mainstream research in deep metric learning can be divided into two genres: proxy-based and pair-based methods. Proxy-based methods have attracted extensive attention due to their lower training complexity and fast network convergence. However, these methods have limitations because the proxy optimization is done by the network, which makes it challenging for the proxy to accurately represent the feature distribution of the real class of data. In this paper, we propose a Calibrate Proxy (CP) structure, which uses real sample information to improve the similarity calculation in proxy-based losses and introduces a calibration loss to constrain the proxy optimization towards the center of the class features. At the same time, we set a small number of proxies for each class to alleviate the impact of intra-class differences on retrieval performance. The effectiveness of our method is evaluated by extensive experiments on three public datasets and multiple synthetic label-noise datasets. The results show that our approach can effectively improve the performance of commonly used proxy-based losses on both regular and noisy datasets.
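The idea of pulling a proxy towards the center of its class features can be illustrated with a very small sketch. The abstract does not give the form of the calibration loss, so the squared Euclidean distance used here is an assumption, and the names `class_center` and `calibration_loss` are hypothetical.

```python
def class_center(features):
    """Mean feature vector of the samples belonging to one class."""
    n = len(features)
    dim = len(features[0])
    return [sum(f[d] for f in features) / n for d in range(dim)]


def calibration_loss(proxy, features):
    """Squared Euclidean distance between a proxy and its class center.

    Assumed form for illustration only; the paper's loss may differ.
    """
    center = class_center(features)
    return sum((p - c) ** 2 for p, c in zip(proxy, center))


# A proxy at the origin, with class samples centered at (2, 0).
features = [[1.0, 0.0], [3.0, 0.0]]
print(calibration_loss([0.0, 0.0], features))
```

Minimizing such a term alongside the usual proxy-based loss would penalize proxies that drift away from where the real samples of the class actually lie.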

    HPMC: A multi-target tracking algorithm for the IoT


    Prompt Learning with Optimal Transport for Vision-Language Models

    With the increasing attention to large vision-language models such as CLIP, a significant amount of effort has been dedicated to building efficient prompts. Unlike conventional methods that learn only a single prompt, we propose to learn multiple comprehensive prompts to describe diverse characteristics of categories, such as intrinsic attributes or extrinsic contexts. However, directly matching each prompt to the same visual feature is problematic, as it pushes the prompts to converge to one point. To solve this problem, we propose to apply optimal transport to match the vision and text modalities. Specifically, we first model images and categories with visual and textual feature sets. Then, we apply a two-stage optimization strategy to learn the prompts. In the inner loop, we optimize the optimal transport distance to align visual features and prompts using the Sinkhorn algorithm, while in the outer loop, we learn the prompts with this distance from the supervised data. Extensive experiments are conducted on the few-shot recognition task, and the improvement demonstrates the superiority of our method.
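The inner loop described above relies on the Sinkhorn algorithm, which approximates an optimal transport plan by alternately rescaling the rows and columns of a kernel matrix. Below is a minimal pure-Python sketch of entropy-regularized Sinkhorn iterations. The cost matrix, the marginals, and the parameters `eps` and `n_iters` are illustrative assumptions; how the paper builds its cost from visual features and prompts is not specified here.

```python
import math


def sinkhorn(cost, a, b, eps=0.1, n_iters=200):
    """Entropy-regularized optimal transport via Sinkhorn iterations.

    cost: n x m matrix, e.g. a distance between n visual features and
          m prompts (assumed input, computed elsewhere).
    a, b: marginal weights over rows and columns, each summing to 1.
    Returns the transport plan and the transport cost sum(plan * cost).
    """
    n, m = len(a), len(b)
    # Gibbs kernel: small cost gives a large kernel value.
    K = [[math.exp(-cost[i][j] / eps) for j in range(m)] for i in range(n)]
    u = [1.0] * n
    v = [1.0] * m
    for _ in range(n_iters):
        # Rescale rows to match marginal a, then columns to match b.
        u = [a[i] / sum(K[i][j] * v[j] for j in range(m)) for i in range(n)]
        v = [b[j] / sum(K[i][j] * u[i] for i in range(n)) for j in range(m)]
    plan = [[u[i] * K[i][j] * v[j] for j in range(m)] for i in range(n)]
    dist = sum(plan[i][j] * cost[i][j] for i in range(n) for j in range(m))
    return plan, dist


# Two features and two prompts; each feature is close to one prompt.
cost = [[0.0, 1.0], [1.0, 0.0]]
plan, dist = sinkhorn(cost, [0.5, 0.5], [0.5, 0.5])
print(plan, dist)
```

In this toy case almost all mass travels along the zero-cost diagonal, so different prompts end up matched to different features instead of all being pulled towards the same one.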

    Advances in reprogramming of energy metabolism in tumor T cells

    Cancer is a leading cause of human death worldwide, and modulating the metabolic properties of T cells employed in cancer immunotherapy holds great promise for combating cancer. As a crucial factor, energy metabolism influences the activation, proliferation, and function of T cells, and thus metabolic reprogramming of T cells is a unique research perspective in cancer immunology. Special conditions within the tumor microenvironment and high energy demands lead to alterations in the energy metabolism of T cells. In-depth research on the reprogramming of energy metabolism in T cells can reveal the mechanisms underlying tumor immune tolerance and provide important clues for the development of new tumor immunotherapy strategies. Therefore, the study of T cell energy metabolism has important clinical significance and potential applications. In this study, the current achievements in the reprogramming of T cell energy metabolism are reviewed. Then, the factors influencing T cell energy metabolism are introduced. In addition, T cell energy metabolism in cancer immunotherapy is summarized, highlighting its potential significance in enhancing T cell function and therapeutic outcomes. In summary, energy exhaustion of T cells leads to functional exhaustion, thus resulting in immune evasion by cancer cells. A better understanding of the reprogramming of T cell energy metabolism may enable immunotherapy to combat cancer and holds promise for optimizing and enhancing existing therapeutic approaches.

    Tactile-based Object Retrieval From Granular Media

    We introduce GEOTACT, a robotic manipulation method capable of retrieving objects buried in granular media. This is a challenging task due to the need to interact with granular media, and doing so based exclusively on tactile feedback, since a buried object can be completely hidden from vision. Tactile feedback is in itself challenging in this context, due to ubiquitous contact with the surrounding media, and the inherent noise level induced by the tactile readings. To address these challenges, we use a learning method trained end-to-end with simulated sensor noise. We show that our problem formulation leads to the natural emergence of learned pushing behaviors that the manipulator uses to reduce uncertainty and funnel the object to a stable grasp despite spurious and noisy tactile readings. We also introduce a training curriculum that enables learning these behaviors in simulation, followed by zero-shot transfer to real hardware. To the best of our knowledge, GEOTACT is the first method to reliably retrieve a number of different objects from a granular environment, doing so on real hardware and with integrated tactile sensing. Videos and additional information can be found at https://jxu.ai/geotact

    Deep Learning and Medical Imaging for COVID-19 Diagnosis: A Comprehensive Survey

    COVID-19 (Coronavirus disease 2019) has been spreading quickly since its outbreak, impacting financial markets and healthcare systems globally. Countries all around the world have adopted a number of extraordinary steps to restrict the spread of the virus, for which early COVID-19 diagnosis is essential. Medical images such as X-ray images and Computed Tomography (CT) scans are becoming one of the main diagnostic tools to combat COVID-19 with the aid of deep learning-based systems. In this survey, we investigate the main contributions of deep learning applications that use medical images in fighting COVID-19 from the aspects of image classification, lesion localization, and severity quantification, and review different deep learning architectures and some image preprocessing techniques for achieving a more precise diagnosis. We also provide a summary of the X-ray and CT image datasets used in various studies for COVID-19 detection. Finally, the key difficulties and potential applications of deep learning in fighting COVID-19 are discussed. This work summarizes the latest deep learning methods that use medical images to diagnose COVID-19, highlighting the challenges and encouraging more studies to keep utilizing the advantages of deep learning to combat COVID-19.