Smart home voice assistants enable users to conveniently interact with IoT
devices and perform Internet searches; however, they also collect voice
input that can carry sensitive personal information about users. Previous
papers investigated how information inferred from the contents of users' voice
commands is shared or leaked for tracking and advertising purposes. In this
paper, we systematically evaluate how voice itself is used for user profiling
in the Google ecosystem. To do so, we simulate various user personas by
engaging with specific categories of websites. We then use \textit{neutral
voice commands}, which we define as voice commands that neither reveal personal
interests nor require Google smart speakers to use the search APIs, to interact
with these speakers. We also explore the effects of non-neutral voice
commands on user profiling. Notably, we employ voices that typically would not
match the predefined personas. We then iteratively improve our experiments
based on observations of profile changes to better simulate real-world user
interactions with smart speakers. We find that Google uses these voice
recordings for user profiling, and in some cases, up to 5 out of the 8
categories reported by Google for customizing advertisements are altered
following the collection of the voice commands.

Comment: 11 pages, 1 figure, 7 tables