From Speech to Data: Unraveling Google's Use of Voice Data for User Profiling

Chen, Sirui; Ma, Xinhang

Publication date: 3 March 2024

Abstract

Smart home voice assistants enable users to conveniently interact with IoT devices and perform Internet searches; however, they also collect voice input that can carry sensitive personal information about users. Previous papers investigated how information inferred from the contents of users' voice commands is shared or leaked for tracking and advertising purposes. In this paper, we systematically evaluate how voice itself is used for user profiling in the Google ecosystem. To do so, we simulate various user personas by engaging with specific categories of websites. We then interact with Google smart speakers using "neutral voice commands", which we define as voice commands that neither reveal personal interests nor require the speakers to use the search APIs. We also explore the effects of non-neutral voice commands on user profiling. Notably, we employ voices that typically would not match the predefined personas. We then iteratively improve our experiments based on observations of profile changes to better simulate real-world user interactions with smart speakers. We find that Google uses these voice recordings for user profiling, and in some cases, up to 5 out of the 8 categories reported by Google for customizing advertisements are altered following the collection of the voice commands.

Comment: 11 pages, 1 figure, 7 tables

Available Versions

arXiv.org e-Print Archive

oai:arXiv.org:2403.05586

Last time updated on 27/09/2024