357 research outputs found

    Towards Open-Ended Visual Recognition with Large Language Model

    Full text link
    Localizing and recognizing objects in the open-ended physical world poses a long-standing challenge within the domain of machine perception. Recent methods have endeavored to address the issue by employing a class-agnostic mask (or box) proposal model, complemented by an open-vocabulary classifier (e.g., CLIP) using pre-extracted text embeddings. However, these open-vocabulary recognition models still exhibit limitations in practical applications. On one hand, they rely on class names being provided at test time, so recognition performance depends heavily on the set of semantic classes predefined by users. On the other hand, when training with multiple datasets, human intervention is required to alleviate label definition conflicts between them. In this paper, we introduce the OmniScient Model (OSM), a novel Large Language Model (LLM) based mask classifier, as a straightforward and effective solution to the aforementioned challenges. Specifically, OSM predicts class labels in a generative manner, thus removing the need to supply class names during both training and testing. It also enables cross-dataset training without any human intervention, exhibiting robust generalization capabilities due to the world knowledge acquired from the LLM. By combining OSM with an off-the-shelf mask proposal model, we present promising results on various benchmarks and demonstrate its effectiveness in handling novel concepts. Code and model are available at https://github.com/bytedance/OmniScient-Model.
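
    A minimal sketch of the generative mask-classification idea described above, assuming that mask-pooled image features are projected into an LLM's token space and the class name is then decoded as free-form text. The GRU below stands in for a real LLM, and all module names and dimensions are illustrative; this is not the released OSM code.

```python
# Hypothetical sketch of generative mask classification (not the official OSM code).
# A mask-pooled visual feature is projected into an LLM-like token space and the class
# name is decoded as free-form text, so no fixed class list is needed at test time.
import torch
import torch.nn as nn

class GenerativeMaskClassifier(nn.Module):
    def __init__(self, vis_dim=256, llm_dim=512, vocab_size=1000):
        super().__init__()
        self.proj = nn.Linear(vis_dim, llm_dim)                 # visual feature -> token space
        self.llm = nn.GRU(llm_dim, llm_dim, batch_first=True)   # stand-in for a real LLM
        self.lm_head = nn.Linear(llm_dim, vocab_size)           # next-token prediction

    def forward(self, image_feats, mask):
        # image_feats: (B, C, H, W); mask: (B, 1, H, W) binary mask proposal
        masked = image_feats * mask
        pooled = masked.flatten(2).sum(-1) / mask.flatten(2).sum(-1).clamp(min=1)  # (B, C)
        prefix = self.proj(pooled).unsqueeze(1)                  # (B, 1, llm_dim) visual prompt
        out, _ = self.llm(prefix)                                # full decoding would loop here
        return self.lm_head(out)                                 # logits over the vocabulary

feats = torch.randn(2, 256, 32, 32)
mask = (torch.rand(2, 1, 32, 32) > 0.5).float()
print(GenerativeMaskClassifier()(feats, mask).shape)  # torch.Size([2, 1, 1000])
```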

    Convolutions Die Hard: Open-Vocabulary Segmentation with Single Frozen Convolutional CLIP

    Full text link
    Open-vocabulary segmentation is a challenging task requiring segmenting and recognizing objects from an open set of categories. One way to address this challenge is to leverage multi-modal models, such as CLIP, to provide image and text features in a shared embedding space, which bridges the gap between closed-vocabulary and open-vocabulary recognition. Hence, existing methods often adopt a two-stage framework to tackle the problem, where the inputs first go through a mask generator and then through the CLIP model along with the predicted masks. This process involves extracting features from images multiple times, which can be ineffective and inefficient. By contrast, we propose to build everything into a single-stage framework using a shared frozen convolutional CLIP backbone, which not only significantly simplifies the current two-stage pipeline but also yields a remarkably better accuracy-cost trade-off. The proposed FC-CLIP benefits from the following observations: the frozen CLIP backbone maintains the ability of open-vocabulary classification and can also serve as a strong mask generator, and the convolutional CLIP generalizes well to a larger input resolution than the one used during contrastive image-text pretraining. When training on COCO panoptic data only and testing in a zero-shot manner, FC-CLIP achieves 26.8 PQ, 16.8 AP, and 34.1 mIoU on ADE20K; 18.2 PQ and 27.9 mIoU on Mapillary Vistas; and 44.0 PQ, 26.8 AP, and 56.2 mIoU on Cityscapes, outperforming the prior art by +4.2 PQ, +2.4 AP, and +4.2 mIoU on ADE20K, +4.0 PQ on Mapillary Vistas, and +20.1 PQ on Cityscapes, respectively. Additionally, the training and testing times of FC-CLIP are 7.5x and 6.6x faster, respectively, than the same prior art, while using 5.9x fewer parameters. FC-CLIP also sets a new state-of-the-art performance across various open-vocabulary semantic segmentation datasets. Code and model are available at https://github.com/bytedance/fc-clip.
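
    A hedged sketch of the single-stage structure described above: image features are extracted once by a frozen convolutional backbone and reused both for class-agnostic mask prediction and for mask-pooled classification against precomputed text embeddings. The layer choices and dimensions are placeholders, not the FC-CLIP release.

```python
# Illustrative single-stage open-vocabulary segmentation sketch (assumed structure).
import torch
import torch.nn as nn
import torch.nn.functional as F

class SingleStageOpenVocabSeg(nn.Module):
    def __init__(self, dim=64, num_masks=16):
        super().__init__()
        self.backbone = nn.Conv2d(3, dim, 3, padding=1)    # stand-in for a frozen conv CLIP
        for p in self.backbone.parameters():
            p.requires_grad = False                        # backbone stays frozen
        self.mask_head = nn.Conv2d(dim, num_masks, 1)      # class-agnostic mask proposals

    def forward(self, image, text_emb):
        feats = self.backbone(image)                       # features are extracted once
        masks = self.mask_head(feats).sigmoid()            # (B, N, H, W)
        # mask-pooled embeddings classified against text embeddings (open vocabulary)
        pooled = torch.einsum('bnhw,bchw->bnc', masks, feats)
        pooled = pooled / masks.sum(dim=(2, 3)).clamp(min=1e-6).unsqueeze(-1)
        logits = F.normalize(pooled, dim=-1) @ F.normalize(text_emb, dim=-1).T
        return masks, logits

model = SingleStageOpenVocabSeg()
masks, logits = model(torch.randn(1, 3, 64, 64), torch.randn(5, 64))  # 5 text classes
print(masks.shape, logits.shape)  # (1, 16, 64, 64) (1, 16, 5)
```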

    EvIcon: Designing High-Usability Icon with Human-in-the-loop Exploration and IconCLIP

    Full text link
    Interface icons are prevalent in various digital applications. Due to limited time and budgets, many designers rely on informal evaluation, which often results in icons with poor usability. In this paper, we propose a unique human-in-the-loop framework that allows our target users, i.e., novice and professional UI designers, to improve the usability of interface icons efficiently. We formulate several usability criteria into a perceptual usability function and enable users to iteratively revise an icon set with an interactive design tool, EvIcon. We take a large-scale pre-trained joint image-text embedding (CLIP) and fine-tune it to embed icon visuals with icon tags in the same embedding space (IconCLIP). During the revision process, our design tool provides two types of instant perceptual usability feedback. First, we provide perceptual usability feedback modeled by deep learning models trained on IconCLIP embeddings and crowdsourced perceptual ratings. Second, we use the embedding space of IconCLIP to assist users in improving the visual distinguishability of icons within the user-prepared icon set. To provide the perceptual prediction, we compiled IconCEPT10K, the first large-scale dataset of perceptual usability ratings over 10,000 interface icons, by conducting a crowdsourcing study. We demonstrated that our framework could benefit the interface icon revision process of UI designers with a wide range of professional experience. Moreover, the interface icons designed using our framework achieved better semantic distance and familiarity, as verified by an additional online user study.
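
    A hedged sketch of the distinguishability check described above: embed each icon in the user-prepared set with a CLIP-style image encoder and flag pairs whose cosine similarity is suspiciously high. The threshold and the random embeddings are placeholders, not the IconCLIP model or its calibration.

```python
# Flag visually confusable icon pairs from their embeddings (illustrative, not IconCLIP code).
import torch
import torch.nn.functional as F

def flag_confusable_icons(icon_embeddings: torch.Tensor, threshold: float = 0.9):
    """icon_embeddings: (N, D) embeddings of the user's icon set."""
    emb = F.normalize(icon_embeddings, dim=-1)
    sim = emb @ emb.T                                   # (N, N) cosine similarity
    sim.fill_diagonal_(-1.0)                            # ignore self-similarity
    pairs = (sim > threshold).nonzero()
    return [(int(i), int(j), float(sim[i, j])) for i, j in pairs if i < j]

icons = torch.randn(8, 512)                             # e.g., 8 icons embedded by an encoder
print(flag_confusable_icons(icons, threshold=0.5))      # list of (i, j, similarity) pairs
```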

    MaXTron: Mask Transformer with Trajectory Attention for Video Panoptic Segmentation

    Full text link
    Video panoptic segmentation requires consistently segmenting (for both `thing' and `stuff' classes) and tracking objects in a video over time. In this work, we present MaXTron, a general framework that exploits Mask XFormer with Trajectory Attention to tackle the task. MaXTron enriches an off-the-shelf mask transformer by leveraging trajectory attention. The deployed mask transformer takes as input a short clip consisting of only a few frames and predicts the clip-level segmentation. To enhance the temporal consistency, MaXTron employs within-clip and cross-clip tracking modules, efficiently utilizing trajectory attention. Originally designed for video classification, trajectory attention learns to model the temporal correspondences between neighboring frames and aggregates information along the estimated motion paths. However, it is nontrivial to directly extend trajectory attention to per-pixel dense prediction tasks due to its quadratic dependency on input size. To alleviate the issue, we propose to adapt the trajectory attention for both the dense pixel features and object queries, aiming to improve the short-term and long-term tracking results, respectively. Particularly, in our within-clip tracking module, we propose axial-trajectory attention that effectively computes the trajectory attention for tracking dense pixels sequentially along the height- and width-axes. The axial decomposition significantly reduces the computational complexity for dense pixel features. In our cross-clip tracking module, since the object queries in the mask transformer are learned to encode the object information, we are able to capture the long-term temporal connections by applying trajectory attention to object queries, which learns to track each object across different clips. Without bells and whistles, MaXTron demonstrates state-of-the-art performance on video segmentation benchmarks. Code at https://github.com/TACJu/MaXTron
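
    A minimal sketch of the axial decomposition idea mentioned above, applied here to plain self-attention rather than the full trajectory-attention formulation: instead of attending over all T*H*W tokens at once (quadratic in the spatial size), attention is applied sequentially along the height axis and then the width axis of a clip's feature map. Module names and shapes are assumptions for illustration, not the MaXTron code.

```python
# Axial attention over a short clip's dense features (simplified illustration).
import torch
import torch.nn as nn

class AxialAttention(nn.Module):
    def __init__(self, dim=32, heads=4):
        super().__init__()
        self.attn_h = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.attn_w = nn.MultiheadAttention(dim, heads, batch_first=True)

    def forward(self, x):
        # x: (T, H, W, C) dense features of a short clip
        T, H, W, C = x.shape
        # attend along the height axis: sequences of length T*H, one per width column
        h_seq = x.permute(2, 0, 1, 3).reshape(W, T * H, C)
        h_out, _ = self.attn_h(h_seq, h_seq, h_seq)
        x = h_out.reshape(W, T, H, C).permute(1, 2, 0, 3)
        # attend along the width axis: sequences of length T*W, one per height row
        w_seq = x.permute(1, 0, 2, 3).reshape(H, T * W, C)
        w_out, _ = self.attn_w(w_seq, w_seq, w_seq)
        return w_out.reshape(H, T, W, C).permute(1, 0, 2, 3)

clip_feats = torch.randn(2, 16, 16, 32)    # 2 frames, 16x16 feature map, 32 channels
print(AxialAttention()(clip_feats).shape)  # torch.Size([2, 16, 16, 32])
```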

    Single nucleotide polymorphisms of one-carbon metabolism and cancers of the esophagus, stomach, and liver in a Chinese population.

    Get PDF
    One-carbon metabolism (folate metabolism) is considered important in carcinogenesis because of its involvement in DNA synthesis and biological methylation reactions. We investigated the associations of single nucleotide polymorphisms (SNPs) in the folate metabolic pathway with the risk of three GI cancers in a population-based case-control study in Taixing City, China, with 218 esophageal cancer cases, 206 stomach cancer cases, 204 liver cancer cases, and 415 healthy population controls. Study participants were interviewed with a standardized questionnaire, and blood samples were collected after the interviews. We genotyped SNPs of the MTHFR, MTR, MTRR, DNMT1, and ALDH2 genes using PCR-RFLP, SNPlex, or TaqMan assays. To account for multiple comparisons and reduce the chance of false reports, we employed semi-Bayes (SB) shrinkage analysis. After shrinkage and adjustment for potential confounding factors, we found positive associations between MTHFR rs1801133 and stomach cancer (any T versus C/C, SB odds ratio [SBOR]: 1.79, 95% posterior limits: 1.18, 2.71) and liver cancer (SBOR: 1.51, 95% posterior limits: 0.98, 2.32). There was an inverse association between DNMT1 rs2228612 and esophageal cancer (any G versus A/A, SBOR: 0.60, 95% posterior limits: 0.39, 0.94). In addition, we detected potential heterogeneity across alcohol drinking status for ORs relating MTRR rs1801394 to esophageal (posterior homogeneity P = 0.005) and stomach cancer (posterior homogeneity P = 0.004), and for ORs relating MTR rs1805087 to liver cancer (posterior homogeneity P = 0.021). Among non-alcohol drinkers, the variant allele (allele G) of these two SNPs was inversely associated with the risk of these cancers, whereas a positive association was observed among ever-alcohol drinkers. Our results suggest that genetic polymorphisms related to one-carbon metabolism may be associated with cancers of the esophagus, stomach, and liver. Heterogeneity across alcohol consumption status in the associations between MTR/MTRR polymorphisms and these cancers indicates potential interactions between alcohol drinking and the one-carbon metabolic pathway.
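
    A hedged numeric sketch of semi-Bayes shrinkage as it is commonly described (a normal prior on the log odds ratio, combined with the maximum-likelihood estimate by inverse-variance weighting), not the study's actual analysis code; the input odds ratio, variance, and prior below are made-up illustration values.

```python
# Semi-Bayes shrinkage of a log odds ratio toward a normal prior (illustrative only).
import math

def semi_bayes_or(ml_or, ml_var, prior_mean_log_or=0.0, prior_var=0.5):
    """Shrink an ML odds ratio, given its log-scale variance, toward a normal prior."""
    beta = math.log(ml_or)
    w_data, w_prior = 1.0 / ml_var, 1.0 / prior_var        # inverse-variance weights
    post_beta = (w_data * beta + w_prior * prior_mean_log_or) / (w_data + w_prior)
    post_var = 1.0 / (w_data + w_prior)
    lo = post_beta - 1.96 * math.sqrt(post_var)
    hi = post_beta + 1.96 * math.sqrt(post_var)
    return math.exp(post_beta), (math.exp(lo), math.exp(hi))  # SB OR and 95% posterior limits

# e.g., an imprecise ML OR of 2.2 (log-scale variance 0.3) is pulled toward the null
print(semi_bayes_or(2.2, 0.3))
```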

    Anti-IL-17A antibody-associated de novo vitiligo: Case report and review of literature

    Get PDF
    Interleukin (IL)-17 inhibitors are biological therapies approved for moderate to severe psoriasis and psoriatic arthritis. The common adverse events of IL-17 inhibitors include injection site reactions, infections, nasopharyngitis, and headache. However, vitiligo associated with the use of IL-17 inhibitors has rarely been reported in the literature. Here we describe a woman who developed de novo vitiligo after 4 months of IL-17A inhibitor treatment for psoriasis and psoriatic arthritis. Upon discontinuation of the IL-17A inhibitor and a shift to a broader T-cell inhibitor, cyclosporine, our patient had control of both psoriasis and vitiligo and achieved 75% repigmentation after 3 months of oral cyclosporine without phototherapy. Given the increasing use of anti-IL-17 biologics in psoriasis patients, clinicians should inquire about a history of vitiligo before treatment and inform patients of this possible adverse effect.

    Endothelial FGF signaling is protective in hypoxia-induced pulmonary hypertension

    Get PDF
    Hypoxia-induced pulmonary hypertension (PH) is one of the most common and deadly forms of PH. Fibroblast growth factor receptors 1 and 2 (FGFR1/2) are elevated in patients with PH and in mice exposed to chronic hypoxia. Endothelial FGFR1/2 signaling is important for the adaptive response to several injury types, and we hypothesized that endothelial FGFR1/2 signaling would protect against hypoxia-induced PH. Mice lacking endothelial FGFR1/2, mice with activated endothelial FGFR signaling, and human pulmonary artery endothelial cells (HPAECs) were challenged with hypoxia. We assessed the effect of FGFR activation and inhibition on right ventricular pressure, vascular remodeling, and endothelial-mesenchymal transition (EndMT), a known pathologic change seen in patients with PH. Hypoxia-exposed mice lacking endothelial FGFRs developed increased PH, while mice overexpressing a constitutively active FGFR in endothelial cells did not develop PH. Mechanistically, lack of endothelial FGFRs or inhibition of FGFRs in HPAECs led to increased TGF-β signaling and increased EndMT in response to hypoxia. These phenotypes were reversed in mice with activated endothelial FGFR signaling, suggesting that FGFR signaling inhibits TGF-β pathway-mediated EndMT during chronic hypoxia. Consistent with these observations, lung tissue from patients with PH showed activation of FGFR and TGF-β signaling. Collectively, these data suggest that activation of endothelial FGFR signaling could be therapeutic for hypoxia-induced PH.

    Deploying Image Deblurring across Mobile Devices: A Perspective of Quality and Latency

    Full text link
    Recently, image enhancement and restoration, such as super-resolution and image deblurring, have become important applications on mobile devices. However, most state-of-the-art networks have extremely high computational complexity, which makes them difficult to deploy on mobile devices with acceptable latency. Moreover, when deploying to different mobile devices, there is a large latency variation due to the differences and limitations of deep learning accelerators across devices. In this paper, we conduct a search for portable network architectures with a better quality-latency trade-off across mobile devices. We further present the effectiveness of widely used network optimizations for the image deblurring task. This paper provides comprehensive experiments and comparisons with in-depth analysis of both latency and image quality. Through all of the above work, we demonstrate the successful deployment of an image deblurring application on mobile devices with the acceleration of deep learning accelerators. To the best of our knowledge, this is the first paper that addresses all the deployment issues of the image deblurring task across mobile devices. This paper provides practical deployment guidelines and was adopted by the championship-winning team in the NTIRE 2020 Image Deblurring Challenge, Smartphone Track. Comment: CVPR 2020 Workshop on New Trends in Image Restoration and Enhancement (NTIRE).
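
    A simple sketch of the quality-latency trade-off selection described above: time several candidate deblurring networks and keep only those within a latency budget. This is not the paper's benchmark harness, runs on the host CPU rather than a mobile accelerator, and the candidate widths and budget are made-up values.

```python
# Time candidate networks and check them against a latency budget (illustrative only).
import time
import torch
import torch.nn as nn

def measure_latency_ms(model, input_shape=(1, 3, 256, 256), runs=20):
    model.eval()
    x = torch.randn(*input_shape)
    with torch.no_grad():
        for _ in range(3):                      # warm-up runs before timing
            model(x)
        start = time.perf_counter()
        for _ in range(runs):
            model(x)
    return (time.perf_counter() - start) / runs * 1000.0

# hypothetical candidate architectures with different channel widths
candidates = {f"width{c}": nn.Sequential(nn.Conv2d(3, c, 3, padding=1),
                                         nn.ReLU(),
                                         nn.Conv2d(c, 3, 3, padding=1))
              for c in (16, 32, 64)}
budget_ms = 50.0
for name, net in candidates.items():
    ms = measure_latency_ms(net)
    print(f"{name}: {ms:.1f} ms", "(within budget)" if ms <= budget_ms else "(too slow)")
```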