Environmental Justice in Greater Los Angeles: Impacts of Spatial and Ethnic Factors on Residents' Socioeconomic and Health Status
Environmental justice holds that all people should be protected from disproportionate impacts of environmental hazards. Despite this aspiration, social and environmental inequalities persist throughout greater Los Angeles. Previous research has identified and mapped pollutant levels, demographic information, and the population's socioeconomic status and health issues; nevertheless, the complex interrelationships among these factors remain unclear. To close this knowledge gap, we first measured spatial centrality using sDNA software. These data were then integrated with socioeconomic and health data collected from CalEnviroScreen, with the census tract as the unit of analysis. Finally, structural equation modeling (SEM) was used to explore direct, indirect, and total effects among the variables. The results show that the White population tends to reside in more segregated areas and to live closer to green space, contributing to higher housing stability, greater financial security, and higher educational attainment. In contrast, people of color, especially Latinx residents, are largely excluded from these environmental benefits. Spatial centrality exhibits a significant indirect effect on environmental justice outcomes by influencing ethnic composition and pollution levels. Moreover, green space accessibility significantly influences environmental justice via pollution. These findings can assist decision-makers in creating a more inclusive society and curtailing social segregation.
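A minimal sketch of the SEM step this abstract describes, using the semopy package (lavaan-style model syntax). The variable names and path structure below are illustrative placeholders inferred from the abstract, not the study's actual specification:

```python
# Hypothetical sketch: SEM over per-census-tract data (sDNA centrality joined
# with CalEnviroScreen fields). Column names are illustrative assumptions.
import pandas as pd
import semopy

data = pd.read_csv("tracts.csv")  # hypothetical input: one row per census tract

model_desc = """
pollution ~ centrality + green_access
pct_latinx ~ centrality
housing_stability ~ pollution + pct_latinx
education ~ pollution + pct_latinx
"""

model = semopy.Model(model_desc)
model.fit(data)

# Direct effects are the fitted path coefficients; an indirect effect is the
# product of coefficients along a path (e.g. centrality -> pollution ->
# housing_stability), and the total effect is their sum with the direct path.
print(model.inspect())
```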
Turning a CLIP Model into a Scene Text Detector
The recent large-scale Contrastive Language-Image Pretraining (CLIP) model has shown great potential in various downstream tasks by leveraging its pretrained vision and language knowledge. Scene text, which contains rich textual and visual information, has an inherent connection with a model like CLIP. Recently, pretraining approaches based on vision-language models have made effective progress in the field of text detection. In contrast to these works, this paper proposes a new method, termed TCM, focused on Turning the CLIP Model directly into a text detector without a dedicated pretraining process. We demonstrate the advantages of the proposed TCM as follows: (1) The underlying principle of our framework can be applied to improve existing scene text detectors. (2) It facilitates the few-shot training capability of existing methods; e.g., using 10% of the labeled data, we significantly improve the performance of the baseline method, by an average of 22% in F-measure across 4 benchmarks. (3) By turning the CLIP model into existing scene text detection methods, we further achieve promising domain adaptation ability. The code will be publicly released at https://github.com/wenwenyu/TCM.
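A rough sketch of the underlying idea (not the paper's TCM architecture): compare CLIP patch embeddings against a text-prompt embedding to obtain a coarse "where is text?" heatmap that a detector could be conditioned on. The Hugging Face CLIP wrapper, prompt wording, and patch-projection trick are all assumptions:

```python
import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

image = Image.open("scene.jpg")  # hypothetical input image
inputs = processor(text=["an image region containing text"],
                   images=image, return_tensors="pt", padding=True)

with torch.no_grad():
    vision_out = model.vision_model(pixel_values=inputs["pixel_values"])
    patches = vision_out.last_hidden_state[:, 1:]       # drop the [CLS] token
    # Project patch tokens into the joint image-text space (a common hack;
    # CLIP only trains this projection on the pooled [CLS] feature).
    patches = model.visual_projection(
        model.vision_model.post_layernorm(patches))
    text_emb = model.get_text_features(input_ids=inputs["input_ids"],
                                       attention_mask=inputs["attention_mask"])

# Cosine similarity between every patch and the prompt -> 7x7 heatmap,
# since ViT-B/32 on a 224x224 input yields 224/32 = 7 patches per side.
sim = torch.nn.functional.cosine_similarity(patches, text_emb[:, None, :], dim=-1)
heatmap = sim.reshape(1, 7, 7)
```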
To Explain or Not to Explain: A Study on the Necessity of Explanations for Autonomous Vehicles
Explainable AI, in the context of autonomous systems such as self-driving cars, has drawn broad interest from researchers. Recent studies have found that providing explanations for an autonomous vehicle's actions has many benefits (e.g., increased trust and acceptance), but have put little emphasis on when an explanation is needed and how the content of an explanation changes with context. In this work, we investigate in which scenarios people need explanations and how the perceived necessity of an explanation shifts with the situation and driver type. Through a user experiment, we ask participants to evaluate how necessary an explanation is and measure its impact on their trust in self-driving cars in different contexts. We also present a self-driving explanation dataset with first-person explanations and associated necessity ratings for 1103 video clips, augmenting the Berkeley Deep Drive Attention dataset. Additionally, we propose a learning-based model that predicts how necessary an explanation is for a given situation in real time, using camera data as input. Our research reveals that driver type and context dictate whether an explanation is necessary and what content is helpful for improved interaction and understanding.
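A minimal sketch, under assumptions, of a learning-based necessity predictor like the one described: a frame-level CNN backbone with a regression head that scores how necessary an explanation is. The backbone choice, head size, and loss are illustrative, not the paper's reported architecture:

```python
import torch
import torch.nn as nn
from torchvision import models

class ExplanationNecessityModel(nn.Module):
    def __init__(self):
        super().__init__()
        backbone = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)
        backbone.fc = nn.Identity()          # keep the 512-d frame features
        self.backbone = backbone
        self.head = nn.Sequential(
            nn.Linear(512, 128), nn.ReLU(),
            nn.Linear(128, 1))               # scalar necessity score per frame

    def forward(self, frames):               # frames: (batch, 3, 224, 224)
        return self.head(self.backbone(frames)).squeeze(-1)

model = ExplanationNecessityModel()
criterion = nn.MSELoss()                      # regress onto human necessity ratings
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)
```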
Looking and Listening: Audio Guided Text Recognition
Text recognition in the wild is a long-standing problem in computer vision. Driven by end-to-end deep learning, recent studies suggest that joint vision and language processing is effective for scene text recognition. Yet, edit errors such as character additions, deletions, and substitutions remain the main challenge for existing approaches. In fact, the content of a text and its audio naturally correspond to each other; i.e., a single-character error may result in a clearly different pronunciation. In this paper, we propose AudioOCR, a simple yet effective probabilistic audio decoder for mel-spectrogram sequence prediction that guides scene text recognition; it participates only in the training phase and brings no extra cost during inference. The underlying principle of AudioOCR can be easily applied to existing approaches. Experiments using 7 previous scene text recognition methods on 12 existing regular, irregular, and occluded benchmarks demonstrate that our proposed method brings consistent improvements. More importantly, our experiments show that AudioOCR generalizes to more challenging scenarios, including recognizing non-English text, out-of-vocabulary words, and text with various accents. Code will be available at https://github.com/wenwenyu/AudioOCR.
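A rough sketch of the training-only auxiliary-decoder idea: alongside the usual recognition loss, a small decoder regresses mel-spectrogram frames from the recognizer's visual features, and is simply dropped at inference. All module sizes and the loss weighting are assumptions, not the paper's configuration:

```python
import torch
import torch.nn as nn

class MelDecoder(nn.Module):
    """Maps recognizer features (B, T, D) to mel frames (B, T, n_mels)."""
    def __init__(self, feat_dim=256, n_mels=80):
        super().__init__()
        self.rnn = nn.GRU(feat_dim, 256, batch_first=True, bidirectional=True)
        self.proj = nn.Linear(512, n_mels)   # 2 x 256 from the bi-GRU

    def forward(self, feats):
        out, _ = self.rnn(feats)
        return self.proj(out)

def training_loss(rec_loss, visual_feats, mel_target, mel_decoder, lam=0.1):
    # Auxiliary regression toward the mel spectrogram of the text's
    # pronunciation (assumes targets resampled to the feature length T);
    # at inference time, mel_decoder is not used at all.
    mel_pred = mel_decoder(visual_feats)
    return rec_loss + lam * nn.functional.l1_loss(mel_pred, mel_target)
```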
ID Embedding as Subtle Features of Content and Structure for Multimodal Recommendation
Multimodal recommendation aims to model user and item representations comprehensively, with the involvement of multimedia content, for effective recommendations. Existing research has shown that combining (user- and item-) ID embeddings with multimodal salient features benefits recommendation performance, indicating the value of IDs. However, the literature lacks a thorough analysis of ID embeddings in terms of feature semantics. In this paper, we revisit the value of ID embeddings for multimodal recommendation and conduct a thorough study of their semantics, which we characterize as subtle features of content and structure. We then propose a novel recommendation model that incorporates ID embeddings to enhance the semantic features of both content and structure. Specifically, we put forward a hierarchical attention mechanism that incorporates ID embeddings in modality fusion, coupled with contrastive learning, to enhance content representations. Meanwhile, we propose a lightweight graph convolutional network for each modality that amalgamates neighborhood and ID embeddings to improve structural representations. Finally, the content and structure representations are combined to form the final item embedding for recommendation. Extensive experiments on three real-world datasets (Baby, Sports, and Clothing) demonstrate the superiority of our method over state-of-the-art multimodal recommendation methods and the effectiveness of fine-grained ID embeddings.
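A simplified sketch of the two components described: (1) attention that weights modality features using the ID embedding as the query, and (2) a LightGCN-style parameter-free propagation that mixes ID embeddings with neighborhood information. The dimensions and single-head attention are illustrative assumptions, not the paper's exact design:

```python
import torch
import torch.nn as nn

class IDGuidedFusion(nn.Module):
    def __init__(self, dim=64):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, num_heads=1, batch_first=True)

    def forward(self, id_emb, modality_feats):
        # id_emb: (B, dim); modality_feats: (B, n_modalities, dim).
        # The ID embedding queries the stack of modality features, so the
        # fusion weights are conditioned on the item's identity.
        fused, _ = self.attn(id_emb.unsqueeze(1), modality_feats, modality_feats)
        return fused.squeeze(1)              # content representation (B, dim)

def light_gcn_layer(embeddings, norm_adj):
    # norm_adj: sparse symmetrically-normalized adjacency (nodes x nodes).
    # One propagation step: no transform, no nonlinearity, just averaging
    # ID embeddings over graph neighborhoods.
    return torch.sparse.mm(norm_adj, embeddings)
```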
Consecutive Insulator-Metal-Insulator Phase Transitions of Vanadium Dioxide by Hydrogen Doping
We report the modulation of a reversible phase transition in VO2 films by hydrogen doping. A metallic phase and a new insulating phase are successively observed at room temperature as the doping concentration increases. We suggest that polarized charges from the doped hydrogen play an important role: these charges gradually occupy the V3d-O2p hybridized orbitals and consequently modulate the filling of the VO2 conduction band-edge states, which eventually evolve into new valence band-edge states. This demonstrates the exceptional sensitivity of the electronic properties of VO2 to electron concentration and orbital occupancy, providing key information on the phase transition mechanism.