Towards Open-Ended Visual Recognition with Large Language Model
Localizing and recognizing objects in the open-ended physical world poses a
long-standing challenge within the domain of machine perception. Recent methods
have endeavored to address the issue by employing a class-agnostic mask (or
box) proposal model, complemented by an open-vocabulary classifier (e.g., CLIP)
using pre-extracted text embeddings. However, it is worth noting that these
open-vocabulary recognition models still exhibit limitations in practical
applications. On one hand, they rely on class names being provided at test
time, so recognition performance heavily depends on the set of semantic
classes predefined by users. On the other hand, when training with multiple
datasets, human intervention is required to resolve label definition
conflicts between them. In this paper, we introduce the OmniScient
Model (OSM), a novel Large Language Model (LLM) based mask classifier, as a
straightforward and effective solution to the aforementioned challenges.
Specifically, OSM predicts class labels in a generative manner, thus removing
the need to supply class names during both training and testing. It also enables
cross-dataset training without any human interference, exhibiting robust
generalization capabilities due to the world knowledge acquired from the LLM.
By combining OSM with an off-the-shelf mask proposal model, we present
promising results on various benchmarks, and demonstrate its effectiveness in
handling novel concepts. Code and model are available at
https://github.com/bytedance/OmniScient-Model
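The core idea of the abstract, predicting a label by generating tokens rather than scoring a fixed class list, can be sketched as follows. This is a minimal illustration, not the actual OSM architecture: the `decoder_step` interface and the toy token vocabulary are hypothetical stand-ins.

```python
def generate_label(mask_embedding, decoder_step, eos=0, max_tokens=8):
    """Greedily decode a class name, one token at a time.

    decoder_step(mask_embedding, tokens) -> list of logits over a token
    vocabulary. No class list is supplied at inference: the label is
    whatever token sequence the model emits, so novel concepts need no
    predefined vocabulary entry. (Hypothetical interface for illustration.)
    """
    tokens = []
    for _ in range(max_tokens):
        logits = decoder_step(mask_embedding, tokens)
        nxt = int(max(range(len(logits)), key=logits.__getitem__))
        if nxt == eos:  # end-of-sequence token terminates the label
            break
        tokens.append(nxt)
    return tokens
```

The contrast with the open-vocabulary classifiers criticized above is that nothing in this loop depends on a user-supplied set of class names.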
Convolutions Die Hard: Open-Vocabulary Segmentation with Single Frozen Convolutional CLIP
Open-vocabulary segmentation is a challenging task requiring segmenting and
recognizing objects from an open set of categories. One way to address this
challenge is to leverage multi-modal models, such as CLIP, to provide image and
text features in a shared embedding space, which bridges the gap between
closed-vocabulary and open-vocabulary recognition. Hence, existing methods
often adopt a two-stage framework to tackle the problem, where the inputs first
go through a mask generator and then through the CLIP model along with the
predicted masks. This process involves extracting features from images multiple
times, which can be ineffective and inefficient. By contrast, we propose to
build everything into a single-stage framework using a shared Frozen
Convolutional CLIP backbone, which not only significantly simplifies the
current two-stage pipeline, but also remarkably yields a better accuracy-cost
trade-off. The proposed FC-CLIP benefits from the following observations: the
frozen CLIP backbone maintains the ability of open-vocabulary classification
and can also serve as a strong mask generator, and the convolutional CLIP
generalizes well to a larger input resolution than the one used during
contrastive image-text pretraining. When training on COCO panoptic data only
and testing in a zero-shot manner, FC-CLIP achieves 26.8 PQ, 16.8 AP, and 34.1
mIoU on ADE20K, 18.2 PQ, 27.9 mIoU on Mapillary Vistas, 44.0 PQ, 26.8 AP, 56.2
mIoU on Cityscapes, outperforming the prior art by +4.2 PQ, +2.4 AP, +4.2 mIoU
on ADE20K, +4.0 PQ on Mapillary Vistas and +20.1 PQ on Cityscapes,
respectively. Additionally, the training and testing of FC-CLIP are 7.5x and
6.6x faster than the same prior art, while using 5.9x fewer
parameters. FC-CLIP also sets a new state-of-the-art performance across various
open-vocabulary semantic segmentation datasets. Code and model are available at
https://github.com/bytedance/fc-clip
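The open-vocabulary classification step that approaches like FC-CLIP inherit from CLIP can be sketched as matching pooled per-mask features against pre-extracted text embeddings by cosine similarity. A minimal numpy sketch with hypothetical shapes; the actual model adds learned components around this:

```python
import numpy as np

def open_vocab_classify(mask_feats, text_embeds):
    """Assign each mask to the nearest class text embedding.

    mask_feats:  (num_masks, dim) pooled image features, one per mask
    text_embeds: (num_classes, dim) pre-extracted CLIP text embeddings
    Returns the argmax class index per mask.
    """
    # L2-normalize both sides so the dot product is cosine similarity
    m = mask_feats / np.linalg.norm(mask_feats, axis=1, keepdims=True)
    t = text_embeds / np.linalg.norm(text_embeds, axis=1, keepdims=True)
    logits = m @ t.T  # (num_masks, num_classes)
    return logits.argmax(axis=1)
```

Because the text side is just a matrix of embeddings, swapping in a different vocabulary at test time only changes `text_embeds`, which is what makes the classifier "open-vocabulary".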
T Oligo-Primed Polymerase Chain Reaction (TOP-PCR): A Robust Method for the Amplification of Minute DNA Fragments in Body Fluids.
Body fluid DNA sequencing is a powerful noninvasive approach for the diagnosis of genetic defects, infectious agents, and diseases. Its success relies on the quantity and quality of the DNA samples. However, numerous clinical samples are of low quantity or poor quality for various reasons. To overcome these problems, we have developed T oligo-primed polymerase chain reaction (TOP-PCR) for full-length, nonselective amplification of minute quantities of DNA fragments. TOP-PCR adopts a homogeneous "half adaptor" (HA), generated by annealing the P oligo (carrying a phosphate group at the 5' end) and the T oligo (carrying a T-tail at the 3' end), for efficient ligation to target DNA and subsequent PCR amplification primed by the T oligo alone. Using DNA samples from body fluids, we demonstrate that TOP-PCR recovers minute DNA fragments and maintains the DNA size profile while enhancing the major molecular populations. Our results also show that TOP-PCR is superior for detecting apoptosis and outperforms the method adopted by Illumina for DNA amplification.
EvIcon: Designing High-Usability Icon with Human-in-the-loop Exploration and IconCLIP
Interface icons are prevalent in various digital applications. Due to limited
time and budgets, many designers rely on informal evaluation, which often
results in icons with poor usability. In this paper, we propose a unique
human-in-the-loop framework that allows our target users, i.e., novice and
professional UI designers, to improve the usability of interface icons
efficiently. We formulate several usability criteria into a perceptual
usability function and enable users to iteratively revise an icon set with an
interactive design tool, EvIcon. We take a large-scale pre-trained joint
image-text embedding (CLIP) and fine-tune it to embed icon visuals with icon
tags in the same embedding space (IconCLIP). During the revision process, our
design tool provides two types of instant perceptual usability feedback. First,
we provide perceptual usability feedback modeled by deep learning models
trained on IconCLIP embeddings and crowdsourced perceptual ratings. Second, we
use the embedding space of IconCLIP to assist users in improving icons' visual
distinguishability among icons within the user-prepared icon set. To provide
the perceptual prediction, we compiled IconCEPT10K, the first large-scale
dataset of perceptual usability ratings over interface icons, by
conducting a crowdsourcing study. We demonstrated that our framework could
benefit the interface icon revision process for UI designers with a wide range
of professional experience. Moreover, the interface icons designed using our
framework achieved better semantic distance and familiarity, verified by an
additional online user study.
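The distinguishability feedback described above amounts to flagging icon pairs whose embeddings sit too close together in the joint space. A minimal sketch, assuming unit-normalizable embedding vectors; the threshold value is an arbitrary illustration, not a figure from the paper:

```python
import numpy as np

def confusable_pairs(icon_embeds, threshold=0.9):
    """Return icon index pairs whose cosine similarity exceeds a threshold.

    icon_embeds: (num_icons, dim) embedding matrix (e.g. from an
    image-text model); highly similar pairs are candidates for revision
    because users may confuse them visually.
    """
    z = icon_embeds / np.linalg.norm(icon_embeds, axis=1, keepdims=True)
    sim = z @ z.T  # pairwise cosine similarities
    n = len(z)
    return [(i, j, float(sim[i, j]))
            for i in range(n) for j in range(i + 1, n)
            if sim[i, j] > threshold]
```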
MaXTron: Mask Transformer with Trajectory Attention for Video Panoptic Segmentation
Video panoptic segmentation requires consistently segmenting (for both
`thing' and `stuff' classes) and tracking objects in a video over time. In this
work, we present MaXTron, a general framework that exploits Mask XFormer with
Trajectory Attention to tackle the task. MaXTron enriches an off-the-shelf mask
transformer by leveraging trajectory attention. The deployed mask transformer
takes as input a short clip consisting of only a few frames and predicts the
clip-level segmentation. To enhance the temporal consistency, MaXTron employs
within-clip and cross-clip tracking modules, efficiently utilizing trajectory
attention. Originally designed for video classification, trajectory attention
learns to model the temporal correspondences between neighboring frames and
aggregates information along the estimated motion paths. However, it is
nontrivial to directly extend trajectory attention to the per-pixel dense
prediction tasks due to its quadratic dependency on input size. To alleviate
the issue, we propose to adapt the trajectory attention for both the dense
pixel features and object queries, aiming to improve the short-term and
long-term tracking results, respectively. Particularly, in our within-clip
tracking module, we propose axial-trajectory attention that effectively
computes the trajectory attention for tracking dense pixels sequentially along
the height- and width-axes. The axial decomposition significantly reduces the
computational complexity for dense pixel features. In our cross-clip tracking
module, since the object queries in mask transformer are learned to encode the
object information, we are able to capture the long-term temporal connections
by applying trajectory attention to object queries, which learns to track each
object across different clips. Without bells and whistles, MaXTron demonstrates
state-of-the-art performance on video segmentation benchmarks.
Comment: Code at https://github.com/TACJu/MaXTro
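The axial decomposition described above can be illustrated with a cost argument: full dense attention over an H×W feature map costs O((HW)²), while attending along the height axis and then the width axis costs O(HW·(H+W)). A minimal numpy sketch of single-head self-attention restricted to one spatial axis; queries, keys, and values are the raw features here for brevity, whereas the real model uses learned projections and trajectory attention across frames:

```python
import numpy as np

def axial_attention(x, axis):
    """Single-head self-attention along one spatial axis of x.

    x: (H, W, C) feature map. Applying this along axis 0 and then axis 1
    approximates dense spatial attention at O(H*W*(H+W)) cost instead of
    O((H*W)**2). Q, K, V are the raw features (illustration only).
    """
    xm = np.moveaxis(x, axis, -2)  # bring the chosen axis to position -2
    scores = xm @ np.swapaxes(xm, -1, -2) / np.sqrt(x.shape[-1])
    scores -= scores.max(axis=-1, keepdims=True)  # numerical stability
    attn = np.exp(scores)
    attn /= attn.sum(axis=-1, keepdims=True)      # softmax over axis length
    return np.moveaxis(attn @ xm, -2, axis)
```

Chaining the two axes (`axial_attention(axial_attention(x, 0), 1)`) gives every pixel an indirect path to every other pixel, which is the efficiency argument behind the within-clip tracking module.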
Single nucleotide polymorphisms of one-carbon metabolism and cancers of the esophagus, stomach, and liver in a Chinese population.
One-carbon metabolism (folate metabolism) is considered important in carcinogenesis because of its involvement in DNA synthesis and biological methylation reactions. We investigated the associations of single nucleotide polymorphisms (SNPs) in the folate metabolic pathway with the risk of three GI cancers in a population-based case-control study in Taixing City, China, with 218 esophageal cancer cases, 206 stomach cancer cases, 204 liver cancer cases, and 415 healthy population controls. Study participants were interviewed with a standardized questionnaire, and blood samples were collected after the interviews. We genotyped SNPs of the MTHFR, MTR, MTRR, DNMT1, and ALDH2 genes using PCR-RFLP, SNPlex, or TaqMan assays. To account for multiple comparisons and reduce the chance of false reports, we employed semi-Bayes (SB) shrinkage analysis. After shrinkage and adjustment for potential confounding factors, we found positive associations between MTHFR rs1801133 and stomach cancer (any T versus C/C, SB odds ratio [SBOR]: 1.79, 95% posterior limits: 1.18, 2.71) and liver cancer (SBOR: 1.51, 95% posterior limits: 0.98, 2.32). There was an inverse association between DNMT1 rs2228612 and esophageal cancer (any G versus A/A, SBOR: 0.60, 95% posterior limits: 0.39, 0.94). In addition, we detected potential heterogeneity across alcohol drinking status for ORs relating MTRR rs1801394 to esophageal (posterior homogeneity P = 0.005) and stomach cancer (posterior homogeneity P = 0.004), and for ORs relating MTR rs1805087 to liver cancer (posterior homogeneity P = 0.021). Among non-drinkers, the variant allele (allele G) of these two SNPs was inversely associated with the risk of these cancers, while a positive association was observed among ever-drinkers. Our results suggest that genetic polymorphisms related to one-carbon metabolism may be associated with cancers of the esophagus, stomach, and liver.
Heterogeneity across alcohol consumption status of the associations between MTR/MTRR polymorphisms and these cancers indicates potential interactions between alcohol drinking and the one-carbon metabolic pathway.
Anti-IL-17A antibody-associated de novo vitiligo: Case report and review of literature
Interleukin (IL)-17 inhibitors are biological therapies approved for moderate to severe psoriasis and psoriatic arthritis. Common adverse events of IL-17 inhibitors include injection site reactions, infections, nasopharyngitis, and headache. However, vitiligo associated with the use of IL-17 inhibitors has rarely been reported in the literature. Here we describe a woman who developed de novo vitiligo after 4 months of IL-17A inhibitor treatment for psoriasis and psoriatic arthritis. Upon discontinuation of the IL-17A inhibitor and a shift to a broader T cell inhibitor, cyclosporine, our patient achieved control of both psoriasis and vitiligo and 75% repigmentation after 3 months of oral cyclosporine without phototherapy. Given the increasing use of anti-IL-17 biologics in psoriasis patients, clinicians should ask about a history of vitiligo before treatment and inform patients of this possible adverse effect.
Endothelial FGF signaling is protective in hypoxia-induced pulmonary hypertension
Hypoxia-induced pulmonary hypertension (PH) is one of the most common and deadliest forms of PH. Fibroblast growth factor receptors 1 and 2 (FGFR1/2) are elevated in patients with PH and in mice exposed to chronic hypoxia. Endothelial FGFR1/2 signaling is important for the adaptive response to several injury types and we hypothesized that endothelial FGFR1/2 signaling would protect against hypoxia-induced PH. Mice lacking endothelial FGFR1/2, mice with activated endothelial FGFR signaling, and human pulmonary artery endothelial cells (HPAECs) were challenged with hypoxia. We assessed the effect of FGFR activation and inhibition on right ventricular pressure, vascular remodeling, and endothelial-mesenchymal transition (EndMT), a known pathologic change seen in patients with PH. Hypoxia-exposed mice lacking endothelial FGFRs developed increased PH, while mice overexpressing a constitutively active FGFR in endothelial cells did not develop PH. Mechanistically, lack of endothelial FGFRs or inhibition of FGFRs in HPAECs led to increased TGF-β signaling and increased EndMT in response to hypoxia. These phenotypes were reversed in mice with activated endothelial FGFR signaling, suggesting that FGFR signaling inhibits TGF-β pathway-mediated EndMT during chronic hypoxia. Consistent with these observations, lung tissue from patients with PH showed activation of FGFR and TGF-β signaling. Collectively, these data suggest that activation of endothelial FGFR signaling could be therapeutic for hypoxia-induced PH
Deploying Image Deblurring across Mobile Devices: A Perspective of Quality and Latency
Recently, image enhancement and restoration have become important
applications on mobile devices, such as super-resolution and image deblurring.
However, most state-of-the-art networks exhibit extremely high computational
complexity, which makes them difficult to deploy on mobile devices with
acceptable latency. Moreover, when deploying to different mobile devices, there
is a large latency variation due to the difference and limitation of deep
learning accelerators on mobile devices. In this paper, we conduct a search of
portable network architectures for better quality-latency trade-off across
mobile devices. We further present the effectiveness of widely used network
optimizations for the image deblurring task. This paper provides comprehensive
experiments and comparisons, offering an in-depth analysis of both latency
and image quality. Through this work, we demonstrate the successful
deployment of image deblurring application on mobile devices with the
acceleration of deep learning accelerators. To the best of our knowledge, this
is the first paper that addresses all the deployment issues of the image
deblurring task across mobile devices. This paper provides practical
deployment guidelines and was adopted by the championship-winning team in the
NTIRE 2020 Image Deblurring Challenge, Smartphone Track.
Comment: CVPR 2020 Workshop on New Trends in Image Restoration and Enhancement
(NTIRE)
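A quality-latency study of the kind described requires repeatable on-device timing. A generic sketch of a latency harness with warm-up iterations; this is an illustration of the measurement methodology, not the paper's actual benchmarking code:

```python
import time

def measure_latency_ms(run_once, warmup=3, iters=10):
    """Average wall-clock latency of run_once() in milliseconds.

    Warm-up iterations let caches, JIT compilers, and mobile DSP/NPU
    runtimes reach steady state before timing begins, which reduces the
    latency variation noted across different accelerators.
    """
    for _ in range(warmup):
        run_once()
    t0 = time.perf_counter()
    for _ in range(iters):
        run_once()
    return (time.perf_counter() - t0) / iters * 1000.0
```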