ARBEx: Attentive Feature Extraction with Reliability Balancing for Robust Facial Expression Learning
In this paper, we introduce ARBEx, a novel attentive feature extraction
framework driven by a Vision Transformer with reliability balancing to
cope with poor class distributions, bias, and uncertainty in the facial
expression learning (FEL) task. We reinforce several data pre-processing and
refinement methods along with a window-based cross-attention ViT to get the
most out of the data. We also employ learnable anchor points in the embedding
space with label distributions and a multi-head self-attention mechanism to
optimize performance against weak predictions through reliability balancing, a
strategy that leverages anchor points, attention scores, and confidence values
to enhance the resilience of label predictions. To ensure correct label
classification and improve the model's discriminative power, we introduce an
anchor loss, which encourages large margins between anchor points.
Additionally, the multi-head self-attention mechanism, which is also trainable,
plays an integral role in identifying accurate labels. This approach provides
critical elements for improving the reliability of predictions and has a
substantial positive effect on final prediction capabilities. Our adaptive
model can be integrated with any deep neural network to address challenges in
various recognition tasks. Our strategy outperforms current state-of-the-art
methodologies, according to extensive experiments conducted in a variety of
contexts.Comment: 10 pages, 7 figures. Code: https://github.com/takihasan/ARBE
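The anchor loss above is only described at a high level here. A minimal sketch of one margin-based formulation, assuming a simple pairwise squared-hinge penalty between anchor coordinates (the function name and exact form are illustrative, not the paper's implementation):

```python
import math

def anchor_margin_loss(anchors, margin=1.0):
    """Penalize pairs of anchor points that sit closer than `margin`
    in the embedding space, encouraging large inter-anchor margins."""
    loss = 0.0
    for i in range(len(anchors)):
        for j in range(i + 1, len(anchors)):
            dist = math.dist(anchors[i], anchors[j])  # Euclidean distance
            loss += max(0.0, margin - dist) ** 2      # squared hinge penalty
    return loss
```

Well-separated anchors incur zero loss; in an actual trainer this term would operate on learnable tensors and be added to the classification objective.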
Uncovering local aggregated air quality index with smartphone captured images leveraging efficient deep convolutional neural network
The prevalence and mobility of smartphones make them a widely used tool for
environmental health research. However, their potential for determining the
aggregated air quality index (AQI) based on PM2.5 concentration in specific
locations remains largely unexplored in the existing literature. In this paper,
we thoroughly examine the challenges associated with predicting
location-specific PM2.5 concentration using images taken with smartphone
cameras. The focus of our study is on Dhaka, the capital of Bangladesh, due to
its significant air pollution levels and the large population exposed to it.
Our research involves the development of a Deep Convolutional Neural Network
(DCNN), which we train using over a thousand annotated outdoor images. These
photos are captured at various locations in Dhaka, and their
labels are based on PM2.5 concentration data obtained from the local US
consulate, calculated using the NowCast algorithm. Through supervised learning,
our model establishes a correlation index during training, enhancing its
ability to function as a Picture-based Predictor of PM2.5 Concentration (PPPC).
This enables the algorithm to calculate an equivalent daily averaged AQI index
from a smartphone image. Unlike popular over-parameterized models, our model
is resource-efficient, using fewer parameters. Furthermore, test
results indicate that our model outperforms popular models like ViT and INN, as
well as popular CNN-based models such as VGG19, ResNet50, and MobileNetV2, in
predicting location-specific PM2.5 concentration. Our dataset is the first
publicly available collection that includes atmospheric images and
corresponding PM2.5 measurements from Dhaka. Our code and dataset will be made
public upon publication of the paper.

Comment: 18 pages, 7 figures, submitted to Nature Scientific Report
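The NowCast weighting used to label the images is a public EPA algorithm. A minimal sketch of that hourly weighting for PM2.5, simplified by omitting NowCast's data-completeness rules for missing hours:

```python
def nowcast_pm25(hourly):
    """EPA NowCast for PM2.5: a weighted average of up to 12 hourly
    concentrations (most recent first) that leans on recent readings
    more heavily when values fluctuate."""
    vals = hourly[:12]
    w = min(vals) / max(vals) if max(vals) > 0 else 1.0
    w = max(w, 0.5)  # EPA floors the weight factor at 0.5
    num = sum(c * w ** i for i, c in enumerate(vals))
    den = sum(w ** i for i in range(len(vals)))
    return num / den
```

The resulting concentration is then mapped to an AQI category via the EPA breakpoint table (not shown here).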