
    You can't see what you can't see: Experimental evidence for how much relevant information may be missed due to Google's Web search personalisation

    The influence of Web search personalisation on professional knowledge work is an understudied area. Here we investigate how public sector officials self-assess their dependency on the Google Web search engine, whether they are aware of the potential impact of algorithmic biases on their ability to retrieve all relevant information, and how much relevant information may actually be missed due to Web search personalisation. We find that the majority of participants in our experimental study are neither aware that there is a potential problem nor have a strategy to mitigate the risk of missing relevant information when performing online searches. Most significantly, we provide empirical evidence that up to 20% of relevant information may be missed due to Web search personalisation. This work has significant implications for Web research by public sector professionals, who should be provided with training about the potential algorithmic biases that may affect their judgments and decision making, as well as clear guidelines on how to minimise the risk of missing relevant information.
    Comment: paper submitted to the 11th Intl. Conf. on Social Informatics; revision corrects an error in the interpretation of the parameter Psi/p in RBO resulting from a discrepancy between the documentation of the R implementation (https://rdrr.io/bioc/gespeR/man/rbo.html) and the original definition (https://dl.acm.org/citation.cfm?id=1852106) as per 20/05/201
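    The RBO (rank-biased overlap) measure mentioned in the comment compares two ranked result lists, with a parameter p controlling how strongly the top ranks are weighted. A minimal sketch of the truncated RBO sum from the original definition (Webber et al.), assuming plain Python lists of result URLs:

```python
def rbo(list1, list2, p=0.9):
    """Truncated rank-biased overlap (Webber et al., 2010).

    p controls top-weightedness: smaller p concentrates the weight
    on the top ranks. This is the parameter whose interpretation
    differs between the R documentation and the original paper.
    """
    depth = max(len(list1), len(list2))
    seen1, seen2 = set(), set()
    score = 0.0
    for d in range(1, depth + 1):
        if d <= len(list1):
            seen1.add(list1[d - 1])
        if d <= len(list2):
            seen2.add(list2[d - 1])
        agreement = len(seen1 & seen2) / d  # A_d: overlap at depth d
        score += (p ** (d - 1)) * agreement
    return (1 - p) * score
```

Note that the truncated sum stays below 1 even for identical lists, because the infinite tail of the geometric series is cut off at the list length.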

    Auditing the representation of migrants in image web search results

    Search engines serve as information gatekeepers on a multitude of topics that are prone to gender, ethnicity, and race misrepresentations. In this paper, we specifically look at the image search representation of migrant population groups that are often subjected to discrimination and biased representation in mainstream media, increasingly so with the rise of right-wing populist actors in Western countries. Using multiple (n = 200) virtual agents to simulate human browsing behavior in a controlled environment, we collect image search results related to various terms referring to migrants (e.g., expats, immigrants, and refugees; seven queries in English and German in total) from the six most popular search engines. Then, with the aid of manual coding, we investigate which features are used to represent these groups and whether the representations are subjected to bias. Our findings indicate that search engines reproduce ethnic and gender biases common in mainstream media representations of different subgroups of the migrant population. For instance, migrant representations tend to be highly racialized, and female migrants as well as migrants at work tend to be underrepresented in the results. Our findings highlight the need for further algorithmic impact auditing studies in the context of representation of potentially vulnerable groups in web search results.
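    The manual-coding step described above ultimately reduces to tallying coded features per query and engine. A hypothetical sketch of that aggregation (the engine, query, and feature labels below are illustrative, not the paper's actual codebook):

```python
from collections import Counter

def representation_share(coded_images):
    """Given (engine, query, feature) codes from manual annotation,
    return the share of each feature per (engine, query) pair."""
    totals = Counter()
    counts = Counter()
    for engine, query, feature in coded_images:
        totals[(engine, query)] += 1
        counts[(engine, query, feature)] += 1
    return {key: counts[key] / totals[(key[0], key[1])] for key in counts}

# Illustrative codes, not real data.
codes = [
    ("engine_a", "refugees", "male"),
    ("engine_a", "refugees", "male"),
    ("engine_a", "refugees", "female"),
    ("engine_a", "expats", "female"),
]
shares = representation_share(codes)
```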

    Scaling up search engine audits: Practical insights for algorithm auditing

    Algorithm audits have increased in recent years due to a growing need to independently assess the performance of automatically curated services that process, filter and rank the large and dynamic amount of information available on the Internet. Among several methodologies to perform such audits, virtual agents stand out because they offer the ability to perform systematic experiments, simulating human behaviour without the associated costs of recruiting participants. Motivated by the importance of research transparency and replicability of results, this article focuses on the challenges of such an approach. It provides methodological details, recommendations, lessons learned and limitations based on our experience of setting up experiments for eight search engines (including main, news, image and video sections) with hundreds of virtual agents placed in different regions. We demonstrate the successful performance of our research infrastructure across multiple data collections, with diverse experimental designs, and point to different changes and strategies that improve the quality of the method. We conclude that virtual agents are a promising avenue for monitoring the performance of algorithms across long periods of time, and we hope that this article can serve as a basis for further research in this area.
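    A systematic design like the one described, agents crossed with regions, engines, sections, and queries, can be enumerated up front so that every agent's workload is explicit and reproducible. A minimal sketch of such an experiment matrix (the field names are assumptions for illustration, not the authors' infrastructure):

```python
from dataclasses import dataclass
from itertools import product

@dataclass(frozen=True)
class Task:
    agent_id: int
    region: str
    engine: str
    section: str  # e.g. main, news, image, video
    query: str

def build_schedule(n_agents_per_region, regions, engines, sections, queries):
    """Enumerate the full experiment matrix: each agent in each region
    runs every (engine, section, query) combination."""
    tasks = []
    agent_id = 0
    for region in regions:
        for _ in range(n_agents_per_region):
            for engine, section, query in product(engines, sections, queries):
                tasks.append(Task(agent_id, region, engine, section, query))
            agent_id += 1
    return tasks

# 2 regions x 2 agents x 2 engines x 2 sections x 1 query = 16 tasks
schedule = build_schedule(
    n_agents_per_region=2,
    regions=["DE", "CH"],
    engines=["engine_a", "engine_b"],
    sections=["main", "image"],
    queries=["q1"],
)
```

Enumerating the matrix before the run makes it easy to verify coverage and to rerun an identical data collection later, which is the replicability concern the article raises.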

    Evaluating Web Search Engines Results for Personalization and User Tracking

    Recently, attention has turned to personalization, which comes into play whenever different search results are tailored to users who have issued the same search query. The fact that search results are manipulated in this way has raised widespread concern, since personalization can contribute to the Filter Bubble effect, in which certain users cannot access content that the search engine's algorithm deems irrelevant to them. There is a wealth of research in this area, most of it relying on a common technique: creating Google accounts that differ in a single feature, issuing identical search queries from each account, and comparing the results to determine whether they vary per account. Following this approach, we conducted six experiments that closely inspect the patterns of personalization in search results and examine how the results vary. In all tasks, three metrics are measured: the number of total hits, the first hit, and the correlation between hits. The experiments comprise the following tasks. First, setting up four VPNs located at different geographic locations and comparing the search results with those obtained in the UAE. Second, performing the search while logged in and out of a Google account. Third, searching while connected to different networks: home, phone, and university. Fourth, issuing the search queries through different search engines. Fifth, carrying out the search process with different web browsers. Finally, creating and training six Google accounts.
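    The three metrics above can be made concrete for a pair of ranked result lists: the count of shared hits, whether the first hit matches, and a rank correlation over the hits both lists contain. A sketch assuming plain lists of result URLs, with Spearman's correlation computed by hand over the shared results (the exact metric definitions in the study may differ):

```python
def compare_results(a, b):
    """Compare two ranked result lists on three metrics:
    shared hits, first hit, and rank correlation."""
    shared = set(a) & set(b)
    metrics = {
        "total_shared_hits": len(shared),
        "same_first_hit": bool(a and b and a[0] == b[0]),
    }
    n = len(shared)
    if n < 2:
        metrics["rank_correlation"] = None  # undefined for < 2 shared hits
    else:
        # Rank shared hits by their relative order within each list,
        # then apply the Spearman rank-correlation formula.
        ranks_a = {u: r for r, u in enumerate(x for x in a if x in shared)}
        ranks_b = {u: r for r, u in enumerate(x for x in b if x in shared)}
        d2 = sum((ranks_a[u] - ranks_b[u]) ** 2 for u in shared)
        metrics["rank_correlation"] = 1 - 6 * d2 / (n * (n * n - 1))
    return metrics
```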

    "Foreign beauties want to meet you": The sexualization of women in Google's organic and sponsored text search results

    Search engines serve as information gatekeepers on a multitude of topics dealing with different aspects of society. However, the ways search engines filter and rank information are prone to biases related to gender, ethnicity, and race. In this article, we conduct a systematic algorithm audit to examine how one specific form of bias, namely sexualization, is manifested in Google’s text search results about different national and gender groups. We find evidence of the sexualization of women, particularly those from the Global South and East, in both organic and sponsored search results. Our findings contribute to research on the sexualization of people in different forms of media, bias in web search, and algorithm auditing, and have important implications for the ongoing debates about the responsibility of transnational tech companies for preventing the systems they design from amplifying discrimination.

    "It is just a flu": {A}ssessing the Effect of Watch History on {YouTube}'s Pseudoscientific Video Recommendations

    YouTube has revolutionized the way people discover and consume videos, becoming one of the primary news sources for Internet users. Since content on YouTube is generated by its users, the platform is particularly vulnerable to misinformative and conspiratorial videos. Worse, the role played by YouTube's recommendation algorithm in unwittingly promoting questionable content is not well understood and could exacerbate the problem. This can have dire real-world consequences, especially when pseudoscientific content is promoted to users at critical times, e.g., during the COVID-19 pandemic. In this paper, we set out to characterize and detect pseudoscientific misinformation on YouTube. We collect 6.6K videos related to COVID-19, the flat earth theory, and the anti-vaccination and anti-mask movements; using crowdsourcing, we annotate them as pseudoscience, legitimate science, or irrelevant. We then train a deep learning classifier to detect pseudoscientific videos with an accuracy of 76.1%. Next, we quantify user exposure to this content on various parts of the platform (i.e., a user's homepage, recommended videos while watching a specific video, or search results) and how this exposure changes based on the user's watch history. We find that YouTube's recommendation algorithm is more aggressive in suggesting pseudoscientific content when users are searching for specific topics, while these recommendations are less common on a user's homepage or when actively watching pseudoscientific videos. Finally, we shed light on how a user's watch history substantially affects the type of recommended videos.
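    The exposure measurement described, how often pseudoscientific videos surface in each part of the platform for each watch-history profile, reduces to a labeled-fraction computation over the collected recommendations. A hypothetical sketch (profile names, section names, and data are illustrative, not the paper's dataset):

```python
def exposure(recommendations, labels):
    """Fraction of recommended videos labeled pseudoscientific,
    per (watch-history profile, platform section).

    recommendations: {(profile, section): [video_id, ...]}
    labels: {video_id: "pseudoscience" | "science" | "irrelevant"}
    """
    out = {}
    for key, videos in recommendations.items():
        if videos:
            flagged = sum(labels.get(v) == "pseudoscience" for v in videos)
            out[key] = flagged / len(videos)
    return out

# Illustrative data only.
recs = {
    ("flat_earth_history", "search"): ["v1", "v2", "v3", "v4"],
    ("neutral_history", "homepage"): ["v2", "v5"],
}
labels = {"v1": "pseudoscience", "v2": "science",
          "v3": "pseudoscience", "v4": "irrelevant", "v5": "science"}
```

Comparing these fractions across profiles and sections is what supports conclusions like "recommendations are more aggressive in search results than on the homepage".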