Motion-state Alignment for Video Semantic Segmentation
In recent years, video semantic segmentation has made great progress with
advanced deep neural networks. However, two main challenges remain, i.e.,
information inconsistency and computational cost. To deal with these two
difficulties, we propose a novel motion-state alignment framework for video
semantic segmentation to keep both motion and state consistency. In the
framework, we first construct a motion alignment branch armed with an efficient
decoupled transformer to capture dynamic semantics, guaranteeing region-level
temporal consistency. Then, a state alignment branch composed of a stage
transformer is designed to enrich feature spaces for the current frame to
extract static semantics and achieve pixel-level state consistency. Next, by a
semantic assignment mechanism, the region descriptor of each semantic category
is gained from dynamic semantics and linked with pixel descriptors from static
semantics. Benefiting from the alignment of these two kinds of effective
information, the proposed method captures dynamic and static semantics in a
targeted way, so that video semantic regions are segmented consistently,
yielding precise locations at low computational complexity. Extensive
experiments on Cityscapes and CamVid datasets show that the proposed approach
outperforms state-of-the-art methods and validates the effectiveness of the
motion-state alignment framework.
Comment: Accepted by CVPR Workshops 202
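The semantic assignment mechanism described above (region descriptors from dynamic semantics linked with pixel descriptors from static semantics) can be pictured with a minimal sketch. This is not the paper's implementation; the function name, tensor shapes, and cosine-similarity linking are illustrative assumptions only.

```python
import torch
import torch.nn.functional as F

def semantic_assignment(region_desc, pixel_desc):
    """Link per-category region descriptors with dense pixel descriptors.

    region_desc: (K, D) one descriptor per semantic category (dynamic semantics)
    pixel_desc:  (D, H, W) descriptors for the current frame (static semantics)
    Returns per-pixel class logits of shape (K, H, W).
    """
    D, H, W = pixel_desc.shape
    pixels = F.normalize(pixel_desc.flatten(1).t(), dim=-1)   # (H*W, D)
    regions = F.normalize(region_desc, dim=-1)                # (K, D)
    logits = pixels @ regions.t()                             # (H*W, K) cosine similarity
    return logits.t().reshape(-1, H, W)                       # (K, H, W)

# Toy usage: 19 Cityscapes classes, 256-dim descriptors, a 64x128 feature map.
seg_logits = semantic_assignment(torch.randn(19, 256), torch.randn(256, 64, 128))
pred = seg_logits.argmax(dim=0)                               # (64, 128) class map
```

The property this sketch captures is that segmentation reduces to matching each pixel descriptor against one descriptor per semantic category, which is consistent with the low per-frame cost the abstract claims.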
Perceive, Excavate and Purify: A Novel Object Mining Framework for Instance Segmentation
Recently, instance segmentation has made great progress with the rapid
development of deep neural networks. However, two main challenges remain:
discovering indistinguishable objects and modeling the relationships between
instances. To deal with these difficulties, we propose a
novel object mining framework for instance segmentation. In this framework, we
first introduce a semantics-perceiving subnetwork to capture pixels that may
belong to an obvious instance from the bottom up. Then, we propose an object
excavating mechanism to discover indistinguishable objects. In the mechanism,
preliminary perceived semantics are regarded as original instances with
classifications and locations, and then indistinguishable objects around these
original instances are mined, which ensures that hard objects are fully
excavated. Next, an instance purifying strategy is put forward to model the
relationships between instances, pulling similar instances close and pushing
different instances apart to maintain intra-instance similarity and
inter-instance discrimination. In this manner, the same objects are combined
into one instance and different objects are distinguished as independent
instances. Extensive experiments on the COCO dataset show that the proposed
approach outperforms state-of-the-art methods, which validates the
effectiveness of the proposed object mining framework.
Comment: Accepted by CVPR Workshops 202
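The instance purifying strategy (pull similar instances close, push different instances apart) resembles a contrastive objective. Below is a generic sketch of that pull/push idea, not the paper's actual loss; the function name, the margin, and the use of cosine similarity are assumptions.

```python
import torch
import torch.nn.functional as F

def instance_purify_loss(embeddings, instance_ids, margin=0.5):
    """Pull embeddings with the same instance id together and push embeddings
    of different instances apart (intra-instance similarity, inter-instance
    discrimination)."""
    emb = F.normalize(embeddings, dim=-1)
    sim = emb @ emb.t()                                   # pairwise cosine similarity
    same = instance_ids.unsqueeze(0) == instance_ids.unsqueeze(1)
    eye = torch.eye(len(emb), dtype=torch.bool, device=emb.device)
    pull = (1.0 - sim[same & ~eye]).mean()                # same instance -> sim near 1
    push = F.relu(sim[~same] - margin).mean()             # different -> sim under margin
    return pull + push

# Toy usage: six embeddings belonging to three instances.
emb = torch.randn(6, 32, requires_grad=True)
loss = instance_purify_loss(emb, torch.tensor([0, 0, 1, 1, 2, 2]))
loss.backward()
```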
Text2Street: Controllable Text-to-image Generation for Street Views
Text-to-image generation has made remarkable progress with the emergence of
diffusion models. However, generating street-view images from text remains
difficult, mainly because the road topology of street scenes is complex,
traffic conditions are diverse, and weather varies widely, which makes such
scenes hard for conventional text-to-image models to handle. To
address these challenges, we propose a novel controllable text-to-image
framework named Text2Street. In this framework, we first introduce a
lane-aware road topology generator, which achieves text-to-map generation with
accurate road structure and lane lines, aided by a counting adapter, realizing
controllable road topology generation. Then, a position-based
object layout generator is proposed to obtain text-to-layout generation through
an object-level bounding-box diffusion strategy, realizing controllable
traffic object layout generation. Finally, the multiple-control image generator
is designed to integrate the road topology, object layout and weather
description to realize controllable street-view image generation. Extensive
experiments show that the proposed approach achieves controllable street-view
text-to-image generation and validates the effectiveness of the Text2Street
framework for street views.
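The abstract describes a three-stage pipeline: text to road topology, text plus topology to object layout, and all controls to the final image. The sketch below only mirrors that control flow with stubs; every name and type here is hypothetical, and each stand-in would be a conditional generative model in a real system.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

@dataclass
class RoadTopologyMap:
    lanes: int = 2                                    # lane count from the prompt

@dataclass
class ObjectLayout:
    boxes: List[Tuple[float, float, float, float]] = field(default_factory=list)

def road_topology_generator(prompt: str) -> RoadTopologyMap:
    # Stand-in for the lane-aware text-to-map stage with its counting adapter.
    return RoadTopologyMap(lanes=4 if "four-lane" in prompt else 2)

def object_layout_generator(prompt: str, topo: RoadTopologyMap) -> ObjectLayout:
    # Stand-in for the object-level bounding-box diffusion stage.
    return ObjectLayout(boxes=[(0.1, 0.6, 0.3, 0.9)])  # one dummy car box

def street_image_generator(prompt, topo: RoadTopologyMap, layout: ObjectLayout) -> str:
    # Stand-in for the multiple-control image stage fusing all conditions.
    return f"<image | {topo.lanes} lanes | {len(layout.boxes)} objects | '{prompt}'>"

prompt = "a four-lane street on a rainy day"
topo = road_topology_generator(prompt)
layout = object_layout_generator(prompt, topo)
print(street_image_generator(prompt, topo, layout))
```

The design point the sketch reflects is that each control signal (topology, layout, weather) is generated separately and only fused at the final image stage, which is what makes each factor independently controllable.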
Estimation of Asian Dust Aerosol Effect on Cloud Radiation Forcing Using Fu-Liou Radiative Model and CERES Measurements
The impact of Asian dust on cloud radiative forcing during 2003-2006 is studied using Clouds and the Earth's Radiant Energy System (CERES) data and the Fu-Liou radiative transfer model. Analysis of satellite data shows that dust aerosol significantly reduced the cloud cooling effect at the top of the atmosphere (TOA). In dust-contaminated cloudy regions, the 4-year mean values of the instantaneous shortwave, longwave, and net cloud radiative forcing are -138.9, 69.1, and -69.7 W m⁻², which are 57.0%, 74.2%, and 46.3%, respectively, of the corresponding values in more pristine cloudy regions. The satellite-retrieved cloud properties differ significantly in the dusty regions and can influence the radiative forcing indirectly. The contributions of the dust direct, indirect, and semi-direct effects to the cloud radiative forcing are estimated using combined satellite observations and Fu-Liou model simulations. The 4-year mean value of the combined indirect and semi-direct shortwave radiative forcing (SWRF) is 82.2 W m⁻², which is 78.4% of the total dust effect. The direct effect is only 22.7 W m⁻², 21.6% of the total effect. Because both the first and second indirect effects enhance cloud cooling, the aerosol-induced cloud warming is mainly the result of the semi-direct effect of dust.
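As a quick arithmetic check of the stated decomposition, the combined indirect and semi-direct effect (82.2 W m⁻²) plus the direct effect (22.7 W m⁻²) gives a total dust effect of 104.9 W m⁻², from which the quoted shares follow:

```python
# Sanity check of the stated decomposition of the total dust SWRF effect.
indirect_semidirect = 82.2            # W m^-2, combined indirect + semi-direct
direct = 22.7                         # W m^-2, direct effect

total = indirect_semidirect + direct  # 104.9 W m^-2
print(f"indirect + semi-direct share: {indirect_semidirect / total:.1%}")  # 78.4%
print(f"direct share: {direct / total:.1%}")                               # 21.6%

# The dusty-region SW cloud forcing (-138.9 W m^-2) is 57.0% of the pristine
# value, which implies a pristine-region SW forcing of roughly:
print(f"pristine SW CRF: {-138.9 / 0.570:.1f} W m^-2")                     # -243.7
```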