
    Assessment of IBM and NASA's geospatial foundation model in flood inundation mapping

    Vision foundation models are a new frontier in GeoAI research because of their potential to enable powerful image analysis by learning and extracting important image features from vast amounts of geospatial data. This paper evaluates the performance of the first-of-its-kind geospatial foundation model, IBM-NASA's Prithvi, on a crucial geospatial analysis task: flood inundation mapping. The model is compared with popular convolutional neural network and vision transformer-based architectures in terms of mapping accuracy for flooded areas. A benchmark dataset, Sen1Floods11, is used in the experiments, and the models' predictability, generalizability, and transferability are evaluated on both a held-out test dataset and a dataset that is completely unseen by the model. Results show the impressive transferability of the Prithvi model, highlighting its performance advantages in segmenting flooded areas in previously unseen regions. The findings also suggest areas for improvement for the Prithvi model: adopting multi-scale representation learning, developing more end-to-end pipelines for high-level image analysis tasks, and offering more flexibility in input data bands.
    Comment: 11 pages, 4 figures
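
    To make the reported segmentation accuracies concrete, here is a minimal, hypothetical sketch of how per-class IoU and F1 can be computed for a binary flood mask. This is not the paper's evaluation code; the function name, array shapes, and random inputs are assumptions for illustration only.

        # Hypothetical evaluation sketch (not the paper's code): per-class IoU and F1
        # for a binary flood mask, assuming 0/1 numpy arrays of identical shape.
        import numpy as np

        def flood_iou_f1(pred, label):
            """Return (IoU, F1) for the flooded class (pixel value 1)."""
            pred = pred.astype(bool)
            label = label.astype(bool)
            tp = np.logical_and(pred, label).sum()
            fp = np.logical_and(pred, ~label).sum()
            fn = np.logical_and(~pred, label).sum()
            iou = tp / (tp + fp + fn) if (tp + fp + fn) else 0.0
            f1 = 2 * tp / (2 * tp + fp + fn) if (2 * tp + fp + fn) else 0.0
            return float(iou), float(f1)

        # Toy usage with random 512x512 chips standing in for model output and labels.
        pred = np.random.randint(0, 2, (512, 512))
        label = np.random.randint(0, 2, (512, 512))
        print(flood_iou_f1(pred, label))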

    P2RBox: A Single Point is All You Need for Oriented Object Detection

    Oriented object detection, a specialized subfield of computer vision, finds applications across diverse scenarios and excels particularly when dealing with objects of arbitrary orientations. Point annotation, which treats each object as a single point, offers a cost-effective alternative to rotated and horizontal bounding boxes but sacrifices performance because size and orientation information is lost. In this study, we introduce the P2RBox network, which leverages point annotations and a mask generator to create mask proposals, which are then filtered by our Inspector Module and Constrainer Module. This process selects high-quality masks, which are subsequently converted into rotated box annotations for training a fully supervised detector. Specifically, the Inspector Module, rooted in multi-instance learning principles, evaluates the semantic score of masks, and we propose a more robust mask quality assessment in conjunction with the Constrainer Module. Furthermore, we introduce a Symmetry Axis Estimation (SAE) Module, inspired by the spectral theorem for symmetric matrices, to transform the top-performing mask proposal into a rotated bounding box. P2RBox works well with three fully supervised rotated object detectors: RetinaNet, Rotated FCOS, and Oriented R-CNN. Combined with Oriented R-CNN, P2RBox achieves 62.26% on the DOTA-v1.0 test set. To the best of our knowledge, this is the first attempt at training an oriented object detector with point supervision.
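
    As an illustration of the symmetry-axis idea behind the SAE module, the hypothetical sketch below fits an oriented box to a binary mask using the eigenvectors of its pixel covariance matrix (an application of the spectral theorem for symmetric matrices). It is not the authors' implementation; the function name, return convention, and toy mask are assumptions.

        # Hypothetical sketch of the symmetry-axis idea (not the P2RBox code): fit an
        # oriented box to a binary mask via the eigenvectors of its pixel covariance.
        import numpy as np

        def mask_to_rotated_box(mask):
            """Return (cx, cy, w, h, angle_rad) for a 0/1 mask with at least 2 pixels."""
            ys, xs = np.nonzero(mask)
            pts = np.stack([xs, ys], axis=1).astype(float)
            center = pts.mean(axis=0)
            cov = np.cov((pts - center).T)                      # 2x2 symmetric matrix
            eigvals, eigvecs = np.linalg.eigh(cov)              # spectral decomposition
            major = eigvecs[:, np.argmax(eigvals)]              # estimated symmetry axis
            minor = np.array([-major[1], major[0]])             # perpendicular axis
            angle = float(np.arctan2(major[1], major[0]))
            proj = (pts - center) @ np.stack([major, minor]).T  # coords in the box frame
            w = float(np.ptp(proj[:, 0]))
            h = float(np.ptp(proj[:, 1]))
            return float(center[0]), float(center[1]), w, h, angle

        # Toy usage: an axis-aligned rectangular blob, recovered with angle near zero.
        mask = np.zeros((64, 64), dtype=np.uint8)
        mask[20:30, 10:50] = 1
        print(mask_to_rotated_box(mask))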

    Turning a CLIP Model into a Scene Text Detector

    The recent large-scale Contrastive Language-Image Pretraining (CLIP) model has shown great potential in various downstream tasks by leveraging pretrained vision and language knowledge. Scene text, which contains rich textual and visual information, has an inherent connection with a model like CLIP. Recently, pretraining approaches based on vision-language models have made effective progress in the field of text detection. In contrast to these works, this paper proposes a new method, termed TCM, focusing on Turning the CLIP Model directly for text detection without a pretraining process. We demonstrate the advantages of the proposed TCM as follows: (1) The underlying principle of our framework can be applied to improve existing scene text detectors. (2) It facilitates the few-shot training capability of existing methods; e.g., using only 10% of the labeled data, we significantly improve the performance of the baseline method by an average of 22% in terms of F-measure on 4 benchmarks. (3) By incorporating the CLIP model into existing scene text detection methods, we further achieve promising domain adaptation ability. The code will be publicly released at https://github.com/wenwenyu/TCM.
    Comment: CVPR 2023
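
    For orientation only, the sketch below shows one hedged way to reuse CLIP's pretrained RN50 image encoder as a frozen backbone with a toy text-presence head on top. It assumes PyTorch and the OpenAI clip package; the head, the use of the global image embedding, and the tensor sizes are illustrative assumptions, not the TCM architecture.

        # Hypothetical sketch (not the TCM architecture): reuse CLIP's pretrained RN50
        # image encoder as a frozen backbone and attach a toy text/no-text head.
        # Assumes PyTorch and the OpenAI `clip` package.
        import torch
        import clip

        device = "cuda" if torch.cuda.is_available() else "cpu"
        model, preprocess = clip.load("RN50", device=device)   # pretrained CLIP backbone

        class ToyTextHead(torch.nn.Module):
            """Illustrative head mapping a CLIP image embedding to a text-presence score."""
            def __init__(self, in_dim=1024):                    # 1024 = CLIP RN50 embed dim
                super().__init__()
                self.proj = torch.nn.Linear(in_dim, 1)

            def forward(self, feats):
                return torch.sigmoid(self.proj(feats))

        head = ToyTextHead().to(device)

        # Global embedding only; a real detector would tap spatial feature maps instead.
        with torch.no_grad():
            dummy = torch.randn(1, 3, 224, 224, device=device)
            feats = model.encode_image(dummy).float()           # (1, 1024)
        score = head(feats)                                     # (1, 1) text-likelihood
        print(score.item())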

    Turning a CLIP Model into a Scene Text Spotter

    We exploit the potential of the large-scale Contrastive Language-Image Pretraining (CLIP) model to enhance scene text detection and spotting tasks, transforming it into a robust backbone, FastTCM-CR50. This backbone uses visual prompt learning and cross-attention in CLIP to extract image- and text-based prior knowledge. Using predefined and learnable prompts, FastTCM-CR50 introduces an instance-language matching process to enhance the synergy between image and text embeddings, thereby refining text regions. Our Bimodal Similarity Matching (BSM) module enables dynamic language prompt generation, allowing offline computation and improving performance. FastTCM-CR50 offers several advantages: 1) It can enhance existing text detectors and spotters, improving performance by an average of 1.7% and 1.5%, respectively. 2) It outperforms the previous TCM-CR50 backbone, yielding an average improvement of 0.2% and 0.56% on text detection and spotting tasks, along with a 48.5% increase in inference speed. 3) It showcases robust few-shot training capabilities: using only 10% of the supervised data, FastTCM-CR50 improves performance by an average of 26.5% and 5.5% for text detection and spotting tasks, respectively. 4) It consistently enhances performance on out-of-distribution text detection and spotting datasets, particularly the NightTime-ArT subset of ICDAR2019-ArT and the DOTA dataset for oriented object detection. The code is available at https://github.com/wenwenyu/TCM.
    Comment: arXiv admin note: text overlap with arXiv:2302.1433
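
    As a rough illustration of instance-language matching, the hypothetical snippet below scores candidate text regions by cosine similarity between region embeddings and a prompt embedding. The tensor names, sizes, and thresholding rule are assumptions, not the FastTCM-CR50 design.

        # Hypothetical sketch of instance-language matching (not the FastTCM-CR50 code):
        # score candidate text regions by cosine similarity with a prompt embedding.
        import torch
        import torch.nn.functional as F

        num_regions, embed_dim = 5, 512                        # assumed sizes
        region_embeds = torch.randn(num_regions, embed_dim)    # stand-in image-branch features
        prompt_embed = torch.randn(embed_dim)                  # stand-in text-prompt embedding

        # Cosine similarity between each region and the prompt acts as a matching score.
        region_embeds = F.normalize(region_embeds, dim=-1)
        prompt_embed = F.normalize(prompt_embed, dim=-1)
        match_scores = region_embeds @ prompt_embed            # shape: (num_regions,)

        # Keep regions whose embeddings align with the language prompt (toy threshold).
        keep = match_scores > match_scores.mean()
        print(match_scores.tolist(), keep.tolist())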

    Upregulated expression of indoleamine 2, 3-dioxygenase in CHO cells induces apoptosis of competent T cells and increases proportion of Treg cells

    Introduction: The inflammatory enzyme indoleamine 2,3-dioxygenase (IDO) participates in immune tolerance and promotes immune escape of IDO+ tumors. A recent hypothesis suggested that IDO may contribute to the differentiation of new regulatory T cells (Tregs) from naive CD4+ T cells. In this study we investigated the role of IDO in inducing immunosuppression in breast cancer by increasing T cell apoptosis and the proportion of Tregs.
    Methods: An IDO expression plasmid was constructed and Chinese hamster ovary (CHO) cells were stably transfected with human IDO. Purified CD3+ T cells were isolated from the peripheral blood mononuclear cells of breast cancer patients. After co-culturing IDO-expressing or untransfected (control) CHO cells with T cells, T cell apoptosis was determined by flow cytometry with annexin-V and PI staining. The proportion of the regulatory T cell subset (Tregs; CD4+CD25+CD127-) was measured by flow cytometry. Total RNA and cellular protein were isolated from the T cells to detect Foxp3 gene and protein expression.
    Results: IDO-transgenic CHO cells yielded high levels of IDO enzymatic activity, resulting in complete depletion of tryptophan from the culture medium. We found that apoptosis occurred in 79.07 ± 8.13% of CD3+ T cells after co-culture with IDO+ CHO cells for 3 days, and the proportion of CD4+CD25+CD127- T cells increased from 3.43 ± 1.07% to 8.98 ± 1.88% (P < 0.05). The specific IDO inhibitor, 1-MT, efficiently reversed the enhancement of T cell apoptosis and the expansion of Tregs in vitro. Increased expression of Foxp3, a key molecular marker of Tregs, was confirmed by RT-PCR, real-time RT-PCR and Western blot analysis.
    Conclusions: These results suggest that IDO helps to create a tolerogenic milieu in breast tumors by directly inducing T cell apoptosis and enhancing Treg-mediated immunosuppression.