Input multimodality combining speech and hand gestures has motivated numerous
usability studies. By contrast, issues relating to the design and ergonomic
evaluation of multimodal output messages combining speech with visual
modalities have not yet been addressed extensively. The experimental study
presented here addresses one of these issues. It assesses the actual efficiency
and usability of oral system messages that include brief spatial information
in helping users rapidly locate objects on crowded displays.
Target presentation mode, scene spatial structure and task difficulty were
chosen as independent variables. Two conditions were defined: the visual target
presentation mode (VP condition) and the multimodal target presentation mode
(MP condition). Each participant carried out two blocks of visual search tasks
(120 tasks per block, one block per condition). Target presentation
mode, scene structure and task difficulty were found to be significant factors.
Multimodal target presentation proved to be more efficient than visual target
presentation. In addition, participants judged multimodal target presentations
very positively, and a majority preferred them to visual presentations.
Moreover, the contribution of spatial messages to
visual search speed and accuracy was influenced by scene spatial structure and
task difficulty: (i) messages improved search efficiency to a lesser extent for
2D array layouts than for some other symmetrical layouts, although 2D arrays
are currently the prevailing layout for displaying pictures; (ii) message
usefulness increased with task difficulty. Most of these results are
statistically significant.

Comment: 4 pages