RSGPT: A Remote Sensing Vision Language Model and Benchmark
The emergence of large language models, with GPT-4 as a prominent
example, has significantly propelled the advancement of artificial
general intelligence and sparked the revolution of Artificial Intelligence 2.0.
In the realm of remote sensing (RS), there is a growing interest in developing
large vision language models (VLMs) specifically tailored for data analysis in
this domain. However, current research predominantly revolves around visual
recognition tasks, and the field lacks comprehensive, large-scale, aligned
image-text datasets suitable for training large VLMs, which poses a
significant obstacle to effectively training such models for RS applications. In computer
vision, recent research has demonstrated that fine-tuning large vision language
models on small-scale, high-quality datasets can yield impressive performance
in visual and language understanding. These results are comparable to
state-of-the-art VLMs trained from scratch on massive amounts of data, such as
GPT-4. Inspired by this captivating idea, in this work, we build a high-quality
Remote Sensing Image Captioning dataset (RSICap) that facilitates the
development of large VLMs in the RS field. Unlike previous RS datasets that
either employ model-generated captions or short descriptions, RSICap comprises
2,585 human-annotated captions with rich and high-quality information. This
dataset offers detailed descriptions for each image, encompassing scene
descriptions (e.g., residential area, airport, or farmland) as well as object
information (e.g., color, shape, quantity, absolute position, etc). To
facilitate the evaluation of VLMs in the field of RS, we also provide a
benchmark evaluation dataset called RSIEval. This dataset consists of
human-annotated captions and visual question-answer pairs, allowing for a
comprehensive assessment of VLMs in the context of RS.
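As a rough illustration of what such annotations contain, the record layout below is a hypothetical sketch; the field names `image_id`, `caption`, and `qa_pairs` are assumptions for illustration, not taken from the released datasets.

```python
from dataclasses import dataclass
from typing import List, Tuple

# Hypothetical record layouts for RSICap / RSIEval entries; field names are
# illustrative assumptions, not the datasets' actual schema.

@dataclass
class RSICapEntry:
    image_id: str
    caption: str  # human-annotated: scene type plus object color/shape/count/position

@dataclass
class RSIEvalEntry:
    image_id: str
    caption: str
    qa_pairs: List[Tuple[str, str]]  # (question, answer) pairs for VQA evaluation

entry = RSIEvalEntry(
    image_id="rs_0001",
    caption="A residential area with rows of red-roofed houses.",
    qa_pairs=[("How many houses are visible?", "about twenty")],
)
```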
Multi Receptive Field Network for Semantic Segmentation
Semantic segmentation is one of the key tasks in computer vision, which is to
assign a category label to each pixel in an image. Despite significant progress
achieved recently, most existing methods still suffer from two challenging
issues: 1) the size of objects and stuff in an image can be very diverse,
demanding the incorporation of multi-scale features into fully convolutional
networks (FCNs); 2) the pixels close to or at the boundaries of object/stuff
are hard to classify due to the intrinsic weakness of convolutional networks.
To address the first issue, we propose a new Multi-Receptive Field Module
(MRFM), explicitly taking multi-scale features into account. For the second
issue, we design an edge-aware loss which is effective in distinguishing the
boundaries of object/stuff. With these two designs, our Multi Receptive Field
Network achieves new state-of-the-art results on two widely-used semantic
segmentation benchmark datasets. Specifically, we achieve a mean IoU of 83.0 on
the Cityscapes dataset and 88.4 mean IoU on the Pascal VOC2012 dataset. Comment: Accepted by WACV 202
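The mean IoU metric reported above can be sketched in a few lines of Python. This is a minimal per-class version; real benchmark tooling also handles ignore labels and accumulates a confusion matrix over the whole dataset.

```python
import numpy as np

def mean_iou(pred, gt, num_classes):
    """Mean intersection-over-union across classes, the metric reported above."""
    ious = []
    for c in range(num_classes):
        inter = np.logical_and(pred == c, gt == c).sum()
        union = np.logical_or(pred == c, gt == c).sum()
        if union > 0:  # skip classes absent from both maps
            ious.append(inter / union)
    return float(np.mean(ious))

pred = np.array([[0, 0], [1, 2]])
gt = np.array([[0, 1], [1, 2]])
score = mean_iou(pred, gt, num_classes=3)  # per-class IoUs 0.5, 0.5, 1.0
```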
UniNeXt: Exploring A Unified Architecture for Vision Recognition
Vision Transformers have shown great potential in computer vision tasks. Most
recent works have focused on elaborating the spatial token mixer for
performance gains. However, we observe that a well-designed general
architecture can significantly improve the performance of the entire backbone,
regardless of which spatial token mixer is equipped. In this paper, we propose
UniNeXt, an improved general architecture for the vision backbone. To verify
its effectiveness, we instantiate the spatial token mixer with various typical
and modern designs, including both convolution and attention modules. Compared
with the architectures in which they were first proposed, our UniNeXt
architecture can steadily boost the performance of all the spatial token
mixers, and narrows the performance gap among them. Surprisingly, our UniNeXt
equipped with naive local window attention even outperforms the previous
state-of-the-art. Interestingly, the ranking of these spatial token mixers also
changes under our UniNeXt, suggesting that an excellent spatial token mixer may
be stifled due to a suboptimal general architecture, which further shows the
importance of studying the general architecture of vision backbones. All
models and code will be made publicly available.
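The core architectural idea, a fixed block design with a swappable spatial token mixer, can be sketched as a toy NumPy illustration. The mixers and the norm/MLP stand-ins below are illustrative assumptions, not the paper's actual modules.

```python
import numpy as np

def avgpool_mixer(x):
    # convolution-like local mixing: average each token with its two neighbours
    return (x + np.roll(x, 1, axis=0) + np.roll(x, -1, axis=0)) / 3.0

def identity_mixer(x):
    # trivial mixer: no spatial mixing at all
    return x

def block(x, mixer):
    """One backbone block: centering (norm stand-in) -> pluggable mixer -> residual MLP."""
    h = x - x.mean(axis=-1, keepdims=True)  # stand-in for LayerNorm
    x = x + mixer(h)                        # spatial token mixing with a residual
    return x + np.tanh(x)                   # stand-in for a channel MLP, also residual

tokens = np.ones((4, 8))                    # (num_tokens, channels)
out_a = block(tokens, avgpool_mixer)        # same block, different mixers
out_b = block(tokens, identity_mixer)
```

The point of the sketch is that `block` never changes; only the `mixer` argument does, mirroring how UniNeXt holds the surrounding architecture fixed while swapping token mixers.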
Learning Profitable NFT Image Diffusions via Multiple Visual-Policy Guided Reinforcement Learning
We study the task of generating profitable Non-Fungible Token (NFT) images
from user-input texts. Recent advances in diffusion models have shown great
potential for image generation. However, existing works can fall short in
generating visually-pleasing and highly-profitable NFT images, mainly due to
the lack of 1) plentiful and fine-grained visual attribute prompts for an NFT
image, and 2) effective optimization metrics for generating high-quality NFT
images. To solve these challenges, we propose a Diffusion-based generation
framework with Multiple Visual-Policies as rewards (i.e., Diffusion-MVP) for
NFT images. The proposed framework consists of a large language model (LLM), a
diffusion-based image generator, and a series of visual rewards by design.
First, the LLM enhances a basic human input (such as "panda") by generating
more comprehensive NFT-style prompts that include specific visual attributes,
such as "panda with Ninja style and green background." Second, the
diffusion-based image generator is fine-tuned using a large-scale NFT dataset
to capture fine-grained image styles and accessory compositions of popular NFT
elements. Third, we further propose to utilize multiple visual-policies as
optimization goals, including visual rarity levels, visual aesthetic scores,
and CLIP-based text-image relevances. This design ensures that our proposed
Diffusion-MVP is capable of minting NFT images with high visual quality and
market value. To facilitate this research, we have collected the largest
publicly available NFT image dataset to date, consisting of 1.5 million
high-quality images with corresponding texts and market values. Extensive
experiments including objective evaluations and user studies demonstrate that
our framework can generate NFT images showing more visually engaging elements
and higher market value, compared with SOTA approaches
MM-Diffusion: Learning Multi-Modal Diffusion Models for Joint Audio and Video Generation
We propose the first joint audio-video generation framework that brings
engaging watching and listening experiences simultaneously, towards
high-quality realistic videos. To generate joint audio-video pairs, we propose
a novel Multi-Modal Diffusion model (i.e., MM-Diffusion) with two coupled
denoising autoencoders. In contrast to existing single-modal diffusion models,
MM-Diffusion consists of a sequential multi-modal U-Net for a joint denoising
process by design. Two subnets for audio and video learn to gradually generate
aligned audio-video pairs from Gaussian noises. To ensure semantic consistency
across modalities, we propose a novel random-shift based attention block
bridging over the two subnets, which enables efficient cross-modal alignment
and thus mutually reinforces audio-video fidelity. Extensive
experiments show superior results in unconditional audio-video generation, and
zero-shot conditional tasks (e.g., video-to-audio). In particular, we achieve
the best FVD and FAD on Landscape and AIST++ dancing datasets. Turing tests of
10k votes further demonstrate dominant preferences for our model. The code and
pre-trained models can be downloaded at
https://github.com/researchmm/MM-Diffusion. Comment: Accepted by CVPR 202
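The random-shift idea can be sketched as follows: rather than attending over every audio-video token pair, each video step attends only to a small audio window at a randomly shifted offset. This toy NumPy version is an assumption-laden illustration of that efficiency idea, not the paper's actual block; the window size and shift range are made up.

```python
import numpy as np

rng = np.random.default_rng(0)

def random_shift_attend(video, audio, window=2):
    """Each video step attends to a small, randomly shifted audio window."""
    shift = int(rng.integers(0, window))        # one random temporal offset
    out = np.empty_like(video)
    for t in range(video.shape[0]):
        idx = [(t + shift + k) % audio.shape[0] for k in range(window)]
        keys = audio[idx]                       # small shifted window, not all tokens
        att = np.exp(keys @ video[t])           # unnormalised attention scores
        att /= att.sum()
        out[t] = att @ keys                     # attention-weighted audio summary
    return out

video = rng.standard_normal((6, 4))             # (time, channels)
audio = rng.standard_normal((6, 4))
fused = random_shift_attend(video, audio)
```

Because each query sees only `window` keys instead of the full sequence, the cost grows linearly with sequence length rather than quadratically, which is the efficiency argument behind the block.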
MTR4 drives liver tumorigenesis by promoting cancer metabolic switch through alternative splicing.
The metabolic switch from oxidative phosphorylation to glycolysis is required for tumorigenesis in order to provide cancer cells with energy and biosynthetic substrates. It is therefore important to elucidate the mechanisms controlling the cancer metabolic switch. MTR4 is an RNA helicase associated with the nuclear exosome that plays key roles in RNA processing and surveillance. We demonstrate that MTR4 is frequently overexpressed in hepatocellular carcinoma (HCC) and is an independent diagnostic marker predicting poor prognosis of HCC patients. MTR4 drives cancer metabolism by ensuring correct alternative splicing of the pre-mRNAs of critical glycolytic genes such as GLUT1 and PKM2. c-Myc binds to the promoter of the MTR4 gene and is important for MTR4 expression in HCC cells, indicating that MTR4 is a mediator of the functions of c-Myc in cancer metabolism. These findings reveal important roles of MTR4 in the cancer metabolic switch and present MTR4 as a promising therapeutic target for treating HCC.
Metallogenic Dynamics Background of Ga’erqiong Cu-Au Deposit in Tibet, China
The Ga’erqiong Cu-Au deposit, located on the north side of the Coqên-Xainza magmatite belt, is a large-scale skarn-type deposit whose ore body formed in the skarn zone at the contact between quartz diorite and marble of the Duoai Formation, or in fractures within the quartz diorite. Its mineralization is closely related to the quartz diorite, and a granite-porphyry-related molybdenum ore body exists at depth. The metallogenic dynamic setting of this deposit is currently disputed. Building on previous studies, this paper presents zircon LA-ICPMS U-Pb dating and a petrogeochemical study of the quartz diorite of the Ga’erqiong Cu-Au deposit. The results indicate that the quartz diorite and granite porphyry were formed at 88±2 Ma and 83±1 Ma, respectively, belonging to magmatic activity of the early Late Cretaceous; both have geochemical characteristics similar to those of island-arc rocks of a subduction zone and geochemical indexes similar to adakite. Combined with the regional tectonic evolution, we conclude that the quartz diorite and granite porphyry were formed in the extensional environment following the collision of the Lhasa and Qiangtang blocks. The quartz diorite results from the mixing of basic and acid melts triggered by asthenospheric upwelling caused by delamination of the lower crust; the granite porphyry may result from partial melting of the delaminated lower crust. Therefore, the Ga’erqiong skarn-type Cu-Au deposit is the metallogenic response to collisional orogeny during the closure of the Meso-Tethys.
Propolis Reduces Phosphatidylcholine-Specific Phospholipase C Activity and Increases Annexin a7 Level in Oxidized-LDL-Stimulated Human Umbilical Vein Endothelial Cells
To understand the mechanisms underlying the dyslipidemia-regulating action of Chinese propolis and Brazilian green propolis, we investigated their effects on phosphatidylcholine-specific phospholipase C (PC-PLC) activity and annexin A7 (ANXA7) level, which play crucial roles in controlling the progression of atherosclerosis. Furthermore, reactive oxygen species (ROS) levels, nuclear factor-kappaB p65 (NF-κB p65), and mitochondrial membrane potential (MMP) were also investigated in oxidized-LDL- (ox-LDL-) stimulated human umbilical vein endothelial cells (HUVECs). Our data indicated that treatment with both types of propolis at 12.5 μg/mL significantly increased cell viability, attenuated the apoptosis rate, increased ANXA7 level, and decreased PC-PLC activity. Both types of propolis also inhibited ROS generation, as well as the subsequent MMP collapse and NF-κB p65 activation induced by ox-LDL in HUVECs. Our results also indicated that Chinese propolis and Brazilian green propolis had similar biological activities and prevented ox-LDL-induced cellular dysfunction in HUVECs.
The Research Value of Biphasic Registration Quantitative Computed Tomography Emphysema Index in the Evaluation of Mild to Moderate COPD
Objective: To find the optimal quantitative index of emphysema by comparing and analyzing quantitative emphysema indexes in patients with mild to moderate chronic obstructive pulmonary disease (COPD) via registered biphasic quantitative computed tomography (QCT). Methods: We retrospectively collected 55 healthy controls, 21 Global Initiative for Chronic Obstructive Lung Disease (GOLD) 1 cases, and 31 GOLD 2 cases in our hospital. We imported the raw CT DICOM data into the "Digital Lung" analysis platform and measured LAA%-950 at the end of deep inspiration and LAA%-910 at the end of deep expiration. The expiratory and inspiratory CT images were registered, and the percentage of emphysema area (PRMEmph%), the percentage of functional small airway disease area (PRMfSAD%), and the percentage of normal area (PRMNormal%) were calculated according to the threshold method. Pulmonary function indicators included FVC, FEV1%, and FEV1/FVC. Differences in general data, quantitative CT indexes, and pulmonary function between groups were assessed using the independent-sample t-test, Mann–Whitney U test, or chi-square test, and correlations were analyzed using Spearman correlation. Receiver operating characteristic (ROC) curves were drawn to analyze the diagnostic performance of the quantitative CT parameters for emphysema in patients with mild to moderate COPD. Results: There were significant differences in sex, smoking index, FEV1%, FEV1/FVC, inspiratory-phase LAA%-950, expiratory-phase LAA%-910, PRMEmph%, PRMfSAD%, and PRMNormal% between the mild to moderate COPD patients and the normal control group. Inspiratory-phase LAA%-950 was negatively correlated with FEV1/FVC; expiratory-phase LAA%-910 and PRMEmph% were negatively correlated with FVC, FEV1%, and FEV1/FVC. ROC curve analysis showed that the areas under the curve for inspiratory-phase LAA%-950, expiratory-phase LAA%-910, and PRMEmph% were 0.742, 0.861, and 0.876, respectively.
Among them, the area under the curve for the PRMEmph% index was the largest, with a corresponding critical value of 9.84%, a sensitivity of 76.90%, and a specificity of 94.50%. Conclusion: The quantitative CT emphysema indexes LAA%-950 in the inspiratory phase, LAA%-910 in the expiratory phase, and biphasic PRMEmph% can objectively evaluate emphysema in patients with mild to moderate COPD, among which PRMEmph% is the best evaluation index.
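The threshold method for PRM can be sketched as a minimal voxel classifier. This illustration uses the LAA cutoffs quoted above (-950 HU inspiratory, -910 HU expiratory) as the classification thresholds; the study's exact PRM rule may differ.

```python
import numpy as np

# Minimal sketch of threshold-based PRM classification on registered biphasic
# CT: each voxel carries an inspiratory and an expiratory HU value. Cutoffs
# mirror the LAA%-950 / LAA%-910 thresholds; the study's rule may differ.

def prm_percentages(insp_hu, exp_hu):
    insp_hu, exp_hu = np.asarray(insp_hu), np.asarray(exp_hu)
    emph = (insp_hu < -950) & (exp_hu < -910)   # PRM-Emph: low HU in both phases
    fsad = (insp_hu >= -950) & (exp_hu < -910)  # PRM-fSAD: expiratory air trapping only
    norm = ~(emph | fsad)                       # PRM-Normal: everything else
    n = insp_hu.size
    return 100 * emph.sum() / n, 100 * fsad.sum() / n, 100 * norm.sum() / n

# Four toy voxels: emphysema, fSAD, normal, and low-inspiratory-only (normal)
insp = [-980, -900, -800, -960]
expi = [-930, -920, -700, -850]
result = prm_percentages(insp, expi)
```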