Show, Prefer and Tell: Incorporating User Preferences into Image Captioning
Image Captioning (IC) is the task of generating natural language descriptions for images. Models encode the image using a convolutional neural network (CNN) and generate the caption via a recurrent model or a multi-modal transformer. Success is measured by the similarity between generated captions and human-written “ground-truth” captions, using the CIDEr [14], SPICE [1] and METEOR [2] metrics. While incremental gains have been made on these metrics, little attention has been paid to end-user opinions on the amount of content in captions. Studies with blind and low-vision participants have found that lack of detail is a problem [6, 13, 17], and that the preferred amount of content varies between individuals [13], as do individual opinions on the trade-off between correctness and including additional content with lower confidence [9]. We propose a more user-centered approach in which the amount of content is adjustable through the number of regions to describe.
The Care Work of Access
Current approaches to AI and Assistive Technology (AT) often foreground task completion over other encounters such as expressions of care. Our paper challenges and complements such task-completion approaches by attending to the care work of access: the continual affective and emotional adjustments that people make by noticing and attending to one another. We explore how this work impacts encounters among people with and without vision impairments who complete tasks together. We find that bound up in attempts to get things done are concerns for one another and how well people are doing together. Reading this work through emerging disability studies and feminist STS scholarship, we account for two important forms of work that give rise to access: (1) mundane attunements and (2) noninnocent authorizations. Together these processes work as sensitizing concepts to help HCI scholars account for the ways that intelligent ATs produce access while sometimes subverting people with disabilities.