15 research outputs found

    Boosting Text-to-Image Diffusion Models with Fine-Grained Semantic Rewards

    Recent advances in text-to-image diffusion models have achieved remarkable success in generating high-quality, realistic images from text prompts. However, previous methods fail to align text concepts accurately with generated images because they lack fine-level semantic guidance that can diagnose the modality discrepancy. In this paper, we propose FineRewards to improve the alignment between text and images in text-to-image diffusion models by introducing two new fine-grained semantic rewards: the caption reward and the Semantic Segment Anything (SAM) reward. From the global semantic view, the caption reward uses a BLIP-2 model to generate a detailed caption that depicts all important contents of the synthetic image, and then computes the reward score as the similarity between the generated caption and the given prompt. From the local semantic view, the SAM reward segments the generated image into local parts with category labels, and scores the segmented parts by using a large language model, i.e., Vicuna-7B, to measure the likelihood of each category appearing in the prompted scene. Additionally, we adopt an assemble reward-ranked learning strategy that integrates multiple reward functions to jointly guide model training. Results of adapting text-to-image models on the MS-COCO benchmark show that the proposed semantic rewards outperform other baseline reward functions by a considerable margin in both visual quality and semantic similarity with the input prompt. Moreover, with the assemble reward-ranked learning strategy, model performance improves further when the proposed semantic rewards are combined with existing image rewards during adaptation.
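
    The abstract does not say how the multiple rewards are combined or how samples are ranked, so the following is only a minimal sketch of the general idea of reward-ranked selection with several rewards. The z-score normalization, the plain averaging, and the names `zscores` and `select_top_candidates` are all assumptions made for illustration, not the authors' method.

```python
from statistics import mean, pstdev


def zscores(values):
    """Standardize scores so rewards on different scales can be compared."""
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        return [0.0 for _ in values]
    return [(v - mu) / sigma for v in values]


def select_top_candidates(candidates, reward_fns, keep=1):
    """Rank candidates by the average of their normalized rewards.

    candidates: generated samples for one prompt (any objects).
    reward_fns: callables mapping a candidate to a float score.
    Assumption: rewards are z-scored per candidate set, then averaged.
    """
    per_reward = [zscores([fn(c) for c in candidates]) for fn in reward_fns]
    combined = [mean(scores) for scores in zip(*per_reward)]
    order = sorted(range(len(candidates)), key=lambda i: combined[i], reverse=True)
    return [candidates[i] for i in order[:keep]]


# Toy candidates with made-up, precomputed reward values.
toy = [
    {"id": "a", "caption": 0.2, "segment": 0.9},
    {"id": "b", "caption": 0.8, "segment": 0.7},
    {"id": "c", "caption": 0.5, "segment": 0.1},
]
rewards = [lambda c: c["caption"], lambda c: c["segment"]]
best = select_top_candidates(toy, rewards, keep=2)
```

    In a reward-ranked setup, the selected candidates would then serve as fine-tuning targets for the generator; that training step is outside this sketch.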

    3D-TOGO: Towards Text-Guided Cross-Category 3D Object Generation

    Text-guided 3D object generation aims to generate 3D objects described by user-defined captions, offering a flexible way to visualize what we imagine. Although some works have addressed this challenging task, they either use explicit 3D representations (e.g., mesh), which lack texture and require post-processing to render photo-realistic views, or require time-consuming optimization for every individual case. Here, we make the first attempt to achieve generic text-guided cross-category 3D object generation via a new 3D-TOGO model, which integrates a text-to-views generation module and a views-to-3D generation module. The text-to-views generation module is designed to generate different views of the target 3D object given an input caption. Prior-guidance, caption-guidance and view contrastive learning are proposed to achieve better view-consistency and caption similarity. Meanwhile, a pixelNeRF model is adopted for the views-to-3D generation module to obtain the implicit 3D neural representation from the previously generated views. Our 3D-TOGO model generates 3D objects in the form of a neural radiance field with good texture and requires no time-consuming optimization for each caption. In addition, 3D-TOGO can control the category, color and shape of generated 3D objects through the input caption. Extensive experiments on the largest 3D object dataset (i.e., ABO) verify that 3D-TOGO generates higher-quality 3D objects from the input captions across 98 different categories, in terms of PSNR, SSIM, LPIPS and CLIP-score, than text-NeRF and Dreamfields.

    Subsynchronous control interaction analysis between PMSG-based offshore wind farm and SVGs

    With the rapid development of wind energy, subsynchronous oscillations (SSO) involving wind energy conversion systems have become a focus of academic concern. To analyse the subsynchronous control interaction (SSCI) among multiple converters (wind-power converters and widely deployed SVGs), the state matrix of a typical PMSG-based offshore wind farm system including SVGs is constructed. It is proposed that SVGs in a wind farm may cause SSCI through their uncoordinated control targets. Under certain conditions, the SSCI between SVGs forms the dominant oscillation mode, accompanied by variation of the SSO frequency. By calculating the influence factors of the different converters, it can be concluded that unreasonable converter parameter configuration and variations in operating conditions affect the SSCI characteristics. Finally, time-domain simulation is used to verify the SSCI mechanism and characteristics of the PMSG-based offshore wind farm and SVGs.
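
    As a rough illustration of how oscillation modes are usually read off a state matrix in small-signal analysis, the sketch below computes the eigenvalues of a 2x2 matrix and converts one of them to an oscillation frequency and a damping ratio. The matrix entries are made-up numbers, not values from the paper, and the helper names `modes_2x2` and `describe_mode` are hypothetical; the paper's actual system matrix is far larger.

```python
import cmath
import math


def modes_2x2(a11, a12, a21, a22):
    """Eigenvalues of a 2x2 state matrix from its characteristic polynomial."""
    trace = a11 + a22
    det = a11 * a22 - a12 * a21
    disc = cmath.sqrt(trace * trace - 4 * det)
    return (trace + disc) / 2, (trace - disc) / 2


def describe_mode(eig):
    """Return (frequency in Hz, damping ratio) of a nonzero eigenvalue."""
    freq_hz = abs(eig.imag) / (2 * math.pi)
    damping_ratio = -eig.real / abs(eig)
    return freq_hz, damping_ratio


# Illustrative matrix with a lightly damped 20 Hz mode (assumed values).
omega = 2 * math.pi * 20.0
eig_a, eig_b = modes_2x2(-2.0, omega, -omega, -2.0)
freq, zeta = describe_mode(eig_a)
```

    A mode with a frequency below the grid frequency and a small or negative damping ratio is the kind of subsynchronous mode such an analysis would flag.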

    Multi-view registration of unordered range scans by fast correspondence propagation of multi-scale descriptors.

    This paper proposes a global approach for the multi-view registration of unordered range scans. Our method starts with pair-wise registration, in which a multi-scale descriptor is selected for each feature point and the propagation of feature correspondences is accelerated accordingly. Subsequently, we design an effective rule to judge the reliability of these pair-wise registration results. Based on this reliability judgment, we propose a model fusion method that uses reliable pair-wise registration results to augment the model shape. Finally, multi-view registration is achieved by alternating pair-wise registration, reliability judgment, and model fusion. The proposed approach can be applied to scene reconstruction and robot mapping. Experimental results on public datasets show that the proposed approach can automatically achieve multi-view registration of unordered range scans. Compared with other related approaches, the proposed approach has superior performance in accuracy and effectiveness.

    Study on circuit breaker TRV issues of UHV high series compensation lines

    Under the influence of the series compensation (SC) capacitance and the increased rated voltage, when an internal fault occurs on an ultra-high-voltage (UHV) line with high series compensation (a high SC line), the short-circuit current through the line is larger and the residual voltage across the SC is higher, so the electromagnetic transients excited by the high-amplitude current and voltage are more severe during fault occurrence and clearing. This threatens the safety of circuit breaker (CB) equipment and calls for further study of these characteristics and for effective solutions. This study presents, for UHV high SC lines, the macroscopic statistical distribution of the peak transient recovery voltage (TRV) across the CB break against short-circuit current, for switching after the short-circuit current zero crossing, together with the microscopic waveform parameters of rise rate and time to peak. The results show that the TRV switching capability of existing UHV CBs cannot cover all fault scenarios on high SC lines. The proposed solutions, a comprehensive comparison of TRV suppression measures to identify the optimal one and stricter expected-TRV test requirements for UHV CB switching, provide a feasible technical scheme for reliable CB switching on UHV high SC lines.

    Erratum to: 3D-TOGO: Towards Text-Guided Cross-Category 3D Object Generation

    The Original Article was published on 26 June 2023.

    Inferring decelerated land subsidence and groundwater storage dynamics in Tianjin–Langfang using Sentinel-1 InSAR

    To meet the growing demand of socioeconomic development, large amounts of groundwater are extracted from confined aquifers worldwide. The North China Plain has experienced considerable groundwater depletion and subsidence during the past six decades. In this study, we use Sentinel-1A/B SAR images from 2015 to 2020 to map ground subsidence in the Tianjin–Langfang area. Three subsiding zones centered at Guangyang, Wuqing–Bazhou, and Jinghai are identified, with maximum subsidence rates of 98.1, 121.8, and 104.7 mm/yr, respectively. Seasonal and long-term signals are separated from the subsidence time series and hydraulic measurements using the continuous wavelet transform to retrieve aquifer parameters. The long-term subsidence, which fits an exponentially decaying model well, has slowed markedly in our study area. The elastic skeletal storage coefficients range between 0.52×10⁻³ and 9.66×10⁻³. We then retrieve the spatial–temporal variations of total groundwater storage, recoverable groundwater storage, and irreversible groundwater storage. Groundwater storage depletion rates are evidently decreasing, which benefits from the operation of the South-to-North Water Transfer Project and local groundwater management practices.
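
    The abstract does not give the form of the exponential model, so the sketch below assumes a subsidence rate of the form v(t) = v0 · exp(−t/τ) and fits it by linear least squares on the logarithm of the rates. The function name `fit_exponential_decay` and the input values are hypothetical; the data are synthetic, generated from the assumed model, and are not measurements from the study.

```python
import math


def fit_exponential_decay(times, rates):
    """Fit rate = v0 * exp(-t / tau) via least squares on ln(rate).

    Assumes all rates are positive and actually decay over time.
    Returns (v0, tau) in the units of the inputs.
    """
    logs = [math.log(r) for r in rates]
    n = len(times)
    t_mean = sum(times) / n
    y_mean = sum(logs) / n
    sxx = sum((t - t_mean) ** 2 for t in times)
    sxy = sum((t - t_mean) * (y - y_mean) for t, y in zip(times, logs))
    slope = sxy / sxx
    intercept = y_mean - slope * t_mean
    return math.exp(intercept), -1.0 / slope


# Synthetic example: rates in mm/yr sampled yearly from the assumed model.
years = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
synthetic_rates = [100.0 * math.exp(-t / 4.0) for t in years]
v0, tau = fit_exponential_decay(years, synthetic_rates)
```

    With noisy real data, a nonlinear fit on the untransformed rates may be preferable, since the log transform changes how errors are weighted.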