
    Motion-state Alignment for Video Semantic Segmentation

    Full text link
    In recent years, video semantic segmentation has made great progress with advanced deep neural networks. However, two main challenges remain, i.e., information inconsistency and computation cost. To deal with these two difficulties, we propose a novel motion-state alignment framework for video semantic segmentation that maintains both motion and state consistency. In the framework, we first construct a motion alignment branch armed with an efficient decoupled transformer to capture dynamic semantics, guaranteeing region-level temporal consistency. Then, a state alignment branch composed of a stage transformer is designed to enrich the feature space of the current frame, extracting static semantics and achieving pixel-level state consistency. Next, through a semantic assignment mechanism, the region descriptor of each semantic category is obtained from the dynamic semantics and linked with pixel descriptors from the static semantics. Benefiting from the alignment of these two kinds of information, the proposed method captures dynamic and static semantics in a targeted way, so that video semantic regions are consistently segmented and precisely located with low computational complexity. Extensive experiments on the Cityscapes and CamVid datasets show that the proposed approach outperforms state-of-the-art methods, validating the effectiveness of the motion-state alignment framework. Comment: Accepted by CVPR Workshops 202
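
    The abstract does not include an implementation; the following is a minimal PyTorch-style sketch of the kind of region-to-pixel semantic assignment it describes, in which one region descriptor per semantic category (dynamic semantics) is matched against per-pixel descriptors (static semantics) via a dot-product affinity. All names, shapes and the scaling choice (SemanticAssignment, region_desc, pixel_desc, 19 Cityscapes classes) are illustrative assumptions, not the authors' code.

        # Illustrative sketch only -- not the paper's implementation.
        import torch
        import torch.nn as nn

        class SemanticAssignment(nn.Module):
            def __init__(self, channels):
                super().__init__()
                self.proj = nn.Conv2d(channels, channels, kernel_size=1)

            def forward(self, region_desc, pixel_desc):
                # region_desc: (B, K, C) -- one descriptor per semantic category (dynamic semantics)
                # pixel_desc:  (B, C, H, W) -- per-pixel descriptors (static semantics)
                b, c, h, w = pixel_desc.shape
                pixels = self.proj(pixel_desc).flatten(2)                      # (B, C, H*W)
                affinity = torch.einsum('bkc,bcn->bkn', region_desc, pixels)   # (B, K, H*W)
                return (affinity / c ** 0.5).view(b, -1, h, w)                 # per-class pixel scores

        # Dummy usage: 19 semantic categories, 256-channel descriptors, 64x128 feature map.
        logits = SemanticAssignment(256)(torch.randn(2, 19, 256), torch.randn(2, 256, 64, 128))
        print(logits.shape)  # torch.Size([2, 19, 64, 128])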

    Perceive, Excavate and Purify: A Novel Object Mining Framework for Instance Segmentation

    Full text link
    Recently, instance segmentation has made great progress with the rapid development of deep neural networks. However, two main challenges remain: discovering indistinguishable objects and modeling the relationships between instances. To deal with these difficulties, we propose a novel object mining framework for instance segmentation. In this framework, we first introduce a semantics-perceiving subnetwork to capture, from the bottom up, pixels that may belong to an obvious instance. Then, we propose an object excavating mechanism to discover indistinguishable objects. In this mechanism, the preliminarily perceived semantics are treated as original instances with classifications and locations, and indistinguishable objects around these original instances are then mined, ensuring that hard objects are fully excavated. Next, an instance purifying strategy is put forward to model the relationships between instances: it pulls similar instances close and pushes different instances apart to preserve intra-instance similarity and inter-instance discrimination. In this manner, the same objects are merged into one instance and different objects are distinguished as independent instances. Extensive experiments on the COCO dataset show that the proposed approach outperforms state-of-the-art methods, validating the effectiveness of the proposed object mining framework. Comment: Accepted by CVPR Workshops 202
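
    As a rough illustration of the "pull similar instances close, push different instances apart" idea, here is a generic pull/push embedding loss sketch in PyTorch. The margins, the use of instance centers, and the name pull_push_loss are assumptions for illustration; the paper's actual instance purifying strategy is not specified in the abstract.

        # Illustrative sketch only: generic pull/push loss over pixel embeddings.
        import torch

        def pull_push_loss(embeddings, instance_ids, pull_margin=0.5, push_margin=1.5):
            # embeddings:   (N, D) pixel embeddings
            # instance_ids: (N,)   integer instance label per pixel
            ids = instance_ids.unique()
            centers = torch.stack([embeddings[instance_ids == i].mean(0) for i in ids])  # (K, D)

            # Pull term: draw each pixel toward its own instance center.
            pull = 0.0
            for k, i in enumerate(ids):
                dist = (embeddings[instance_ids == i] - centers[k]).norm(dim=1)
                pull = pull + torch.clamp(dist - pull_margin, min=0).pow(2).mean()
            pull = pull / len(ids)

            # Push term: keep different instance centers at least push_margin apart.
            if len(ids) > 1:
                center_dist = torch.cdist(centers, centers)
                off_diag = ~torch.eye(len(ids), dtype=torch.bool, device=centers.device)
                push = torch.clamp(push_margin - center_dist[off_diag], min=0).pow(2).mean()
            else:
                push = torch.zeros((), device=embeddings.device)
            return pull + push

        loss = pull_push_loss(torch.randn(100, 16), torch.randint(0, 4, (100,)))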

    Text2Street: Controllable Text-to-image Generation for Street Views

    Full text link
    Text-to-image generation has made remarkable progress with the emergence of diffusion models. However, generating street-view images from text remains difficult, mainly because the road topology of street scenes is complex, the traffic status is diverse, and the weather conditions vary, which conventional text-to-image models struggle to handle. To address these challenges, we propose a novel controllable text-to-image framework, named Text2Street. In the framework, we first introduce a lane-aware road topology generator, which achieves text-to-map generation with accurate road structure and lane lines via a counting adapter, realizing controllable road topology generation. Then, a position-based object layout generator is proposed to obtain text-to-layout generation through an object-level bounding-box diffusion strategy, realizing controllable traffic object layout generation. Finally, a multiple-control image generator is designed to integrate the road topology, object layout and weather description to realize controllable street-view image generation. Extensive experiments show that the proposed approach achieves controllable street-view text-to-image generation and validates the effectiveness of the Text2Street framework for street views.
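
    The abstract describes a three-stage pipeline (road topology, then object layout, then controlled image synthesis). The sketch below only shows how such stages might be orchestrated; the class and function names (StreetPrompt, generate_street_image, and the three generator callables) are hypothetical and do not come from a released Text2Street codebase.

        # Hypothetical orchestration of the three controllable stages described above.
        from dataclasses import dataclass

        @dataclass
        class StreetPrompt:
            road_text: str      # e.g. "a four-lane crossroad"
            traffic_text: str   # e.g. "three cars and one bus"
            weather_text: str   # e.g. "a rainy evening"

        def generate_street_image(prompt, topology_gen, layout_gen, image_gen):
            # Stage 1: text -> lane-aware local road topology (semantic map).
            road_map = topology_gen(prompt.road_text)
            # Stage 2: text + road map -> traffic object bounding-box layout.
            layout = layout_gen(prompt.traffic_text, road_map)
            # Stage 3: fuse road topology, object layout and weather text into an image.
            return image_gen(road_map, layout, prompt.weather_text)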

    Estimation of Asian Dust Aerosol Effect on Cloud Radiation Forcing Using Fu-Liou Radiative Model and CERES Measurements

    Get PDF
    The impact of Asian dust on cloud radiative forcing during 2003-2006 is studied using Clouds and the Earth's Radiant Energy System (CERES) data and the Fu-Liou radiative transfer model. Analysis of satellite data shows that dust aerosol significantly reduced the cloud cooling effect at the top of the atmosphere (TOA). In dust-contaminated cloudy regions, the 4-year mean values of the instantaneous shortwave, longwave and net cloud radiative forcing are -138.9, 69.1, and -69.7 Wm^-2, which are 57.0%, 74.2%, and 46.3%, respectively, of the corresponding values in more pristine cloudy regions. The satellite-retrieved cloud properties differ significantly in the dusty regions and can influence the radiative forcing indirectly. The contributions of the dust direct, indirect and semi-direct effects to the cloud radiative forcing are estimated using combined satellite observations and Fu-Liou model simulations. The 4-year mean of the combined indirect and semi-direct shortwave radiative forcing (SWRF) is 82.2 Wm^-2, which is 78.4% of the total dust effect. The direct effect is only 22.7 Wm^-2, which is 21.6% of the total effect. Because both the first and second indirect effects enhance cloud cooling, the aerosol-induced cloud warming is mainly the result of the semi-direct effect of dust.
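
    A quick arithmetic check (a worked example, not part of the original abstract) confirms that the quoted percentages follow from the reported forcing values:

        # Consistency check of the shortwave forcing partition quoted above.
        indirect_plus_semidirect = 82.2   # Wm^-2, combined indirect + semi-direct SWRF
        direct = 22.7                     # Wm^-2, direct effect
        total = indirect_plus_semidirect + direct

        print(f"total dust SW effect: {total:.1f} Wm^-2")                             # 104.9
        print(f"indirect + semi-direct share: {indirect_plus_semidirect/total:.1%}")  # 78.4%
        print(f"direct share: {direct/total:.1%}")                                    # 21.6%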