
    Why People Search for Images using Web Search Engines

    What are the intents or goals behind human interactions with image search engines? Knowing why people search for images is of major concern to Web image search engines because user satisfaction may vary as intent varies. Previous analyses of image search behavior have mostly been query-based, focusing on what images people search for, rather than intent-based, that is, why people search for images. To date, there is no thorough investigation of how different image search intents affect users' search behavior. In this paper, we address the following questions: (1) Why do people search for images in text-based Web image search systems? (2) How does image search behavior change with user intent? (3) Can we predict user intent effectively from interactions during the early stages of a search session? To this end, we conduct both a lab-based user study and a commercial search log analysis. We show that user intents in image search can be grouped into three classes: Explore/Learn, Entertain, and Locate/Acquire. Our lab-based user study reveals different user behavior patterns under these three intents, such as first click time, query reformulation, dwell time and mouse movement on the result page. Based on user interaction features during the early stages of an image search session, that is, before mouse scroll, we develop an intent classifier that is able to achieve promising results for classifying intents into our three intent classes. Given that all features can be obtained online and unobtrusively, the predicted intents can provide guidance for choosing ranking methods immediately after scrolling.

    Fast and reliable online learning to rank for information retrieval

    The amount of digital data we produce every day far surpasses our ability to process this data, and finding useful information in this constant flow of data has become one of the major challenges of the 21st century. Search engines are one way of accessing large data collections. Their algorithms have evolved far beyond simply matching search queries to sets of documents. Today’s most sophisticated search engines combine hundreds of relevance signals to provide the best possible results for each searcher. Current approaches for tuning the parameters of search engines can be highly effective. However, they typically require considerable expertise and manual effort. They rely on supervised learning to rank, meaning that they learn from manually annotated examples of relevant documents for given queries. Obtaining large quantities of sufficiently accurate manual annotations is becoming increasingly difficult, especially for personalized search, access to sensitive data, or search in settings that change over time. In this thesis, I develop new online learning to rank techniques, based on insights from reinforcement learning. In contrast to supervised approaches, these methods allow search engines to learn directly from users’ interactions. User interactions can typically be observed easily and cheaply, and reflect the preferences of real users. Interpreting user interactions and learning from them is challenging, because they can be biased and noisy. The contributions of this thesis include a novel interleaved comparison method, called probabilistic interleave, that allows unbiased comparisons of search engine result rankings, and methods for learning quickly and effectively from the resulting relative feedback. The obtained analytical and experimental results show how search engines can effectively learn from user interactions. In the future, these and similar techniques can open up new ways for gaining useful information from ever larger amounts of data.

    Search Bias Quantification: Investigating Political Bias in Social Media and Web Search

    Users frequently use search systems on the Web as well as online social media to learn about ongoing events and public opinion on personalities. Prior studies have shown that the top-ranked results returned by these search engines can shape user opinion about the topic (e.g., event or person) being searched. In the case of polarizing topics like politics, where multiple competing perspectives exist, the political bias in the top search results can play a significant role in shaping public opinion towards (or away from) certain perspectives. Given the considerable impact that search bias can have on the user, we propose a generalizable search bias quantification framework that not only measures the political bias in the ranked list output by the search system but also decouples the bias introduced by the different sources: input data and ranking system. We apply our framework to study the political bias in searches related to the 2016 US Presidential primaries in Twitter social media search and find that both input data and ranking system matter in determining the final search output bias seen by the users. Finally, we use the framework to compare the relative bias for two popular search systems, Twitter social media search and Google web search, for queries related to politicians and political events. We end by discussing some potential solutions to signal the bias in the search results to make the users more aware of them.

    Sequential Selection of Correlated Ads by POMDPs

    Online advertising has become a key source of revenue for both web search engines and online publishers. For them, the ability to allocate the right ads to the right webpages is critical because any mismatched ads would not only harm web users' satisfaction but also lower the ad income. In this paper, we study how online publishers could optimally select ads to maximize their ad incomes over time. The conventional offline, content-based matching between webpages and ads is a fine start but cannot solve the problem completely because good matching does not necessarily lead to good payoff. Moreover, with the limited display impressions, we need to balance the need of selecting ads to learn true ad payoffs (exploration) with that of allocating ads to generate high immediate payoffs based on the current belief (exploitation). In this paper, we address the problem by employing partially observable Markov decision processes (POMDPs) and discuss how to utilize the correlation of ads to improve the efficiency of the exploration and increase ad incomes in the long run. Our mathematical derivation shows that the belief states of correlated ads can be naturally updated using a formula similar to collaborative filtering. To test our model, a real-world ad dataset from a major search engine is collected and categorized. Experimenting over the data, we provide an analysis of the effect of the underlying parameters, and demonstrate that our algorithms significantly outperform other strong baselines.

    DIALOG and Mead Join the Relevance Ranks

    New, non-Boolean, natural language search techniques - Westlaw's WIN, DIALOG's TARGET, and Mead Data Central's FREESTYLE - are based on the assumption that the standard command-driven online systems coupled with Boolean logic searching are not only difficult to learn, but may sometimes miss relevant documents. Although each new product works somewhat differently, all 3 offer an alternative to searching with command interfaces and Boolean/proximity operators. They offer natural language input, with no need for commands or logical operators. This input method is coupled with so-called associative or statistical retrieval techniques that provide relevance ranking of search results. The question of how relevance search systems retrieve compared to the tried-and-true Boolean search engines is explored.

    Learning about computer-assisted language learning: online tools and professional development

    The study reported in this chapter investigates computer-assisted language learning (CALL) practitioners' use of online tools and ways of developing their professionalism in the field of CALL. Participants in the study were members of an international association for CALL. They were invited to complete an online questionnaire on a voluntary basis. The questionnaire was employed to collect the participants' demographic information and self-reported data on the use of online tools. It also asked the participants to indicate how they keep up to date with what is happening in CALL. The results of the study indicate that the participants use web search engines, communication tools and social networking sites most frequently among twelve categorised online tools while most participants consider themselves as good or excellent users of the Internet. Many participants often read journal articles or books, read email list messages or connect with others in social networks to learn about new developments in CALL. They also regularly search the web and collect information from blog posts or email list messages. Findings contribute to our understanding of CALL practitioners' experiences with online tools and professional development activities and provide recommendations for teacher training for CALL.

    More diverse, more politically varied: How social media, search engines and aggregators shape news repertoires in the United Kingdom

    There is still much to learn about how the rise of new, ‘distributed’, forms of news access through search engines, social media and aggregators is shaping people’s news use. We analyse passive web tracking data from the United Kingdom to make a comparison between direct access (primarily determined by self-selection) and distributed access (determined by a combination of self-selection and algorithmic selection). We find that (1) people who use search engines, social media and aggregators for news have more diverse news repertoires. However, (2) social media, search engine and aggregator news use is also associated with repertoires where more partisan outlets feature more prominently. The findings add to the growing evidence challenging the existence of filter bubbles, and highlight alternative ways of characterizing people’s online news use.

    “The Same Information Is Given to Everyone”: Algorithmic Awareness of Online Platforms

    After years of discourse surrounding the concept of “filter bubbles,” information seekers still find themselves in echo chambers of their own thoughts and ideas. This study is an exploratory, mixed methods analysis of platform privacy/data policies and user awareness of the personal and usage data collected and user awareness of how platforms use this data to moderate and serve online content. Utilizing Bucher’s (2018) framework to research algorithms through the black box heuristic, this project learns how users inform themselves about data collection and use policies, and their awareness of algorithmic curation. The algorithmic systems that return search results or populate newsfeeds are opaque, black boxed systems. In an attempt to open the black box, this dissertation analyzes the privacy and data policies of the top three platforms by traffic in the United States – Google, YouTube, and Facebook – to first learn how they describe their data collection practices and how they explain data usage. Then a cross-sectional survey provides user perception data about what personal data is collected about them and how that data is used, based on the privacy policy analysis. The findings of this dissertation identify a need for algorithmic literacy and develop a new frame for the ACRL’s Information Literacy Framework to address algorithmic systems in information retrieval. Additionally, the findings draw attention to two subgroups of internet users – those who believe they do not use search engines and those who use only privacy-focused search engines. Both groups require additional research and demonstrate how online information retrieval is complicated through multiple points of access and unclear methods of information curation.

    An Unified Search and Recommendation Foundation Model for Cold-Start Scenario

    In modern commercial search engines and recommendation systems, data from multiple domains is available to jointly train the multi-domain model. Traditional methods train multi-domain models in the multi-task setting, with shared parameters to learn the similarity of multiple tasks, and task-specific parameters to learn the divergence of features, labels, and sample distributions of individual tasks. With the development of large language models (LLMs), an LLM can extract global domain-invariant text features that serve both search and recommendation tasks. We propose a novel framework called S&R Multi-Domain Foundation, which uses an LLM to extract domain-invariant features, and Aspect Gating Fusion to merge the ID feature, domain-invariant text features and task-specific heterogeneous sparse features to obtain the representations of query and item. Additionally, samples from multiple search and recommendation scenarios are trained jointly with the Domain Adaptive Multi-Task module to obtain the multi-domain foundation model. We apply the S&R Multi-Domain Foundation model to cold-start scenarios in the pretrain-finetune manner, which achieves better performance than other SOTA transfer learning methods. The S&R Multi-Domain Foundation model has been successfully deployed in Alipay Mobile Application's online services, such as content query recommendation and service card recommendation. Comment: CIKM 2023, 6 pages

    A virtue epistemology of the Internet: Search engines, intellectual virtues and education

    This paper applies a virtue epistemology approach to using the Internet, so as to improve our information-seeking behaviours. Virtue epistemology focusses on the cognitive character of agents and is less concerned with the nature of truth and epistemic justification as compared to traditional analytic epistemology. Due to this focus on cognitive character and agency, it is a fruitful but underexplored approach to using the Internet in an epistemically desirable way. Thus, the central question in this paper is: How to use the Internet in an epistemically virtuous way? Using the work of Jason Baehr, it starts by outlining nine intellectual or epistemic virtues: curiosity, intellectual autonomy, intellectual humility, attentiveness, intellectual carefulness, intellectual thoroughness, open-mindedness, intellectual courage and intellectual tenacity. It then explores how we should deploy these virtues and avoid the corresponding vices when interacting with the Internet, particularly search engines. Whilst an epistemically virtuous use of the Internet will not guarantee that one will acquire true beliefs, understanding or even knowledge, it will strongly improve one’s information-seeking behaviours. The paper ends by arguing that teaching and assessing online intellectual virtues should be part of school and university curricula, perhaps embedded in critical thinking courses, or even better, as individual units.