Computer-based scene understanding has influenced fields ranging from urban
planning to autonomous vehicle performance, yet little is known about how well
these technologies work across social differences. We investigate the biases of
deep convolutional neural networks (dCNNs) in scene classification, using
nearly one million images from global and US sources, including user-submitted
home photographs and Airbnb listings. We applied statistical models to quantify
the impact of socioeconomic indicators such as family income, Human Development
Index (HDI), and demographic factors from public data sources (CIA and US
Census) on dCNN performance. Our analyses revealed significant socioeconomic
bias: pretrained dCNNs demonstrated lower classification accuracy, lower
classification confidence, and a higher tendency to assign labels that could be
offensive when applied to homes (e.g., "ruin", "slum"), especially for images
from homes with lower socioeconomic status (SES). This trend was consistent
across two datasets of international images and within the diverse economic and
racial landscapes of the United States. This research contributes to
understanding biases in computer vision, emphasizing the need for more
inclusive and representative training datasets. By mitigating bias in computer
vision pipelines, we can ensure fairer and more equitable outcomes for applied
computer vision, including home valuation and smart home security systems.
Addressing these biases is urgent, as they can significantly
impact critical decisions in urban development and resource allocation. Our
findings also motivate the development of AI systems that better understand and
serve diverse communities, moving towards technology that equitably benefits
all sectors of society.
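
As a rough illustration of the kind of analysis summarized above (a minimal sketch, not the authors' actual pipeline), the Python snippet below fits a logistic regression relating per-image classification correctness to household income and HDI. The data, column names, and coefficients are synthetic and purely hypothetical.

import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(0)
n = 1000

# Hypothetical per-image records: household income (USD), country-level HDI,
# and whether a pretrained scene classifier's top-1 label was judged correct.
df = pd.DataFrame({
    "income": rng.lognormal(mean=10.5, sigma=0.6, size=n),
    "hdi": rng.uniform(0.4, 0.95, size=n),
})
p_correct = 1 / (1 + np.exp(-(-3.0 + 0.25 * np.log(df["income"]) + 2.0 * df["hdi"])))
df["correct"] = rng.binomial(1, p_correct)

# Logistic regression of top-1 correctness on log-income and HDI; positive
# coefficients here would indicate better performance at higher SES.
model = smf.logit("correct ~ np.log(income) + hdi", data=df).fit()
print(model.summary())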