    Towards the Evaluation of Recommender Systems with Impressions

    In Recommender Systems, impressions are a relatively new type of information that records all products previously shown to the users. They are also a complex source of information, combining the effects of the recommender system that generated them, search results, or business rules that may select specific products for recommendations. The fact that the user interacted with a specific item given a list of recommended ones may benefit from a richer interaction signal, in which some items the user did not interact with may be considered negative interactions. This work presents a preliminary evaluation of recommendation models with impressions. First, impressions are characterized by describing their assumptions, signals, and challenges. Then, an evaluation study with impressions is described. The study's goal is two-fold: to measure the effects of impressions data on properly-tuned recommendation models using current open-source datasets and disentangle the signals within impressions data. Preliminary results suggest that impressions data and signals are nuanced, complex, and effective at improving the recommendation quality of recommenders. This work publishes the source code, datasets, and scripts used in the evaluation to promote reproducibility in the domain.

    Impression-Aware Recommender Systems

    Novel data sources bring new opportunities to improve the quality of recommender systems. Impressions are a novel data source containing past recommendations (shown items) and traditional interactions. Researchers may use impressions to refine user preferences and overcome the current limitations in recommender systems research. The relevance and interest of impressions have increased over the years; hence, the need for a review of relevant work on this type of recommenders. We present a systematic literature review on recommender systems using impressions, focusing on three fundamental angles in research: recommenders, datasets, and evaluation methodologies. We provide three categorizations of papers describing recommenders using impressions, present each reviewed paper in detail, describe datasets with impressions, and analyze the existing evaluation methodologies. Lastly, we present open questions and future directions of interest, highlighting aspects missing in the literature that can be addressed in future works. Comment: 34 pages, 103 references, 6 tables, 2 figures, ACM UNDER REVIEW

    Report from Dagstuhl Seminar 23031: Frontiers of Information Access Experimentation for Research and Education

    This report documents the program and the outcomes of Dagstuhl Seminar 23031 "Frontiers of Information Access Experimentation for Research and Education", which brought together 37 participants from 12 countries. The seminar addressed technology-enhanced information access (information retrieval, recommender systems, natural language processing) and specifically focused on developing more responsible experimental practices leading to more valid results, both for research and for scientific education. The seminar brought together experts from various sub-fields of information access, namely IR, RS, NLP, information science, and human-computer interaction, to create a joint understanding of the problems and challenges presented by next-generation information access systems, from both the research and the experimentation points of view, to discuss existing solutions and impediments, and to propose next steps to be pursued in the area in order to improve not only our research methods and findings but also the education of the new generation of researchers and developers. The seminar featured a series of long and short talks delivered by participants, who helped in setting a common ground and in letting topics of interest emerge to be explored as the main output of the seminar. This led to the definition of five groups which investigated challenges, opportunities, and next steps in the following areas: reality check, i.e. conducting real-world studies, human-machine-collaborative relevance judgment frameworks, overcoming methodological challenges in information retrieval and recommender systems through awareness and education, results-blind reviewing, and guidance for authors. Comment: Dagstuhl Seminar 23031, report

    Decision-making style, nicotine and caffeine use and dependence

    Rationale: As therapeutic interventions are being developed utilising telehealth and mobile phones, it is important to understand how substance-dependent individuals will respond to offers of online assistance. Objectives: The present paper considered the following: (1) how decision-making style is associated with use and dependence upon commonly used stimulants and (2) how it influences behavioural responses to electronic offers of further information about these drugs. Method: An online survey examined patterns of nicotine and caffeine use, administered Severity of Dependence Scales for caffeine and nicotine and assessed decision-making style using the Melbourne Decision Making Questionnaire and mood using the Kessler Distress Scale. Upon completing these scales, the 181 participants with a mean age of 28.14 years were offered further information online. Results: Stimulant dependence was associated with psychological distress. Caffeine dependence was linked to hypervigilance (panic). Decisional self-esteem varied with stimulant dependence and Kessler Distress Scale score. Participants with high decisional self-esteem declined electronic offers of further information. Conclusion: Confidence rather than defensive avoidance was a factor in reducing information-seeking behaviours on the Internet.

    Introducing CARONTE: a Crawler for Adversarial Resources Over Non Trusted Environments

    The monitoring of underground criminal activities is often automated to maximize the data collection and to train ML models to automatically adapt data collection tools to different communities. On the other hand, sophisticated adversaries may adopt crawling-detection capabilities that may significantly jeopardize researchers' opportunities to perform the data collection, for example by putting their accounts under the spotlight and being expelled from the community. This is particularly undesirable in prominent and high-profile criminal communities where entry costs are significant (either monetarily or for example for background checking or other trust-building mechanisms). This work presents CARONTE, a tool to semi-automatically learn virtually any forum structure for parsing and data-extraction, while maintaining a low profile for the data collection and avoiding the requirement of collecting massive datasets to maintain tool scalability. We showcase CARONTE against four underground forum communities, and show that from the adversary's perspective CARONTE maintains a profile similar to humans, whereas state-of-the-art crawling tools show clearly distinct and easy to detect patterns of automated activity.

    Ensembles of choice-based models for recommender systems

    In this thesis, we focused on three main paradigms: Recommender Systems, Decision Making, and Ensembles. The work is structured as follows. First, the thesis analyzes the potential of choice-based models. The motivation behind this was based on the idea of applying sound decision-making paradigms, such as choice and utility theory, in the field of Recommender Systems. Second, this research analyzes the cognitive process underlying choice behavior. On the one hand, neural and gaze activity were recorded experimentally from different subjects performing a choice task in a Web Interface. On the other hand, cognitive models were fitted using rational, emotional, and attentional features. Finally, the work explores the hybridization of choice-based models with ensembles. The goal is to take the best of the two worlds: transparency and performance. Two main methods were analyzed to build optimal choice-based ensembles: uninformed and informed. For the first, two strategies were evaluated: 1-Learner and N-Learners ensembles. For the second, we relied on three types of prior information: (1) High diversity, (2) Low error prediction (MSE), and (3) Low crowd error.

    Mediating chance encounters through opportunistic social matching

    Chance encounters, the unintended meeting between people unfamiliar with each other, serve as an important social lubricant helping people to create new social ties, such as making new friends or finding an activity, study or collaboration partner. Unfortunately, social barriers often prevent chance encounters in environments where people do not know each other and people have to rely on serendipity to meet or be introduced to interesting people around them. Little is known about the underlying dynamics of chance encounters and how systems could utilize contextual data to mediate chance encounters. This dissertation addresses this gap in research literature by exploring the design space of opportunistic social matching systems that aim to introduce relevant people to each other in the opportune moment and the opportune place in order to encourage face-to-face interaction. A theoretical framework of relational, social and personal context as predictors of encounter opportunities is proposed and validated through a mixed method approach using interviews, experience sampling and a field study of a design prototype. Key contributions of the field interview study (n=58) include novel context-aware social matching concepts such as: sociability of others as an indicator of opportune social context; activity involvement as an indicator of opportune personal context; and contextual rarity as an indicator of opportune relational context. The following study combining Experience Sampling Method (ESM) and participant interviews extends prior research on social matching by providing an empirical foundation for the design of opportunistic social matching systems. A generalized linear mixed model analysis (n=1781) shows that personal context (mood and busyness) together with the sociability of others nearby are the strongest predictors of people’s interest in a social match. 
Interview findings provide novel approaches on how to operationalize relational context based on social network rarity and discoverable rarity. Moreover, insights from this study highlight that additional meta-information about user interests is needed to operationalize relational context, such as users’ passion level for an interest and their skill levels for an activity. Based on these findings, the novel design concept of passive context-awareness for social matching is put forward. In the last study, Encount’r, an instantiation of an opportunistic social matching system, is designed and evaluated through a field study and participant interviews. A large-scale user profiling survey provides baseline rarity measures to operationalize relational context using rarity, passion levels, skills, needs, and offers. Findings show that attribute type, computed attribute rarity, self-reported passion levels for interest, and response time are associated with people’s interest in a match opportunity. Moreover, this study extends prior work by showing that the concept of passive context-awareness for opportunistic social matching is promising. Collectively, contributions of this work include a theoretical framework encompassing relational, social, and personal context; new concepts to operationalize each of these aspects for opportunistic social matching; and field-tested design affordances for opportunistic social matching systems. This is important because opportunistic social matching systems can lead to new social ties and improved social capital.

    Learning Explainable User Sentiment and Preferences for Information Filtering

    In the last decade, online social networks have enabled people to interact in many ways with each other and with content. The digital traces of such actions reveal people's preferences towards online content such as news or products. These traces often result from interactions such as sharing or liking, but also from interactions in natural language. The continuous growth of the amount of content and of digital traces has led to information overload: surrounded by large volumes of information, people are facing difficulties when searching for information relevant to their interests. To improve user experience, information systems must be able to assist users in achieving their search goals, effectively and efficiently. This thesis is concerned with two important challenges that information systems need to address in order to significantly improve search experience and overcome information overload. First, these systems need to model accurately the variety of user traces, and second, they need to meaningfully explain search results and recommendations to users. To address these challenges, this thesis proposes novel methods based on machine learning to model user sentiment and preferences for information filtering systems, which are effective, scalable, and easily interpretable by humans. We focus on two prominent types of user traces in social networks: on the one hand, user comments accompanied by unary preferences such as likes, and on the other hand, user reviews accompanied by numerical preferences such as star ratings. In both cases, we advocate that by better understanding user text through mining its semantics and modeling its structure, we can not only improve information filtering, but also explain predictions to users. 
Within this context, we aim to answer three main research questions, namely: (i) how do item semantics help to predict unary preferences; (ii) how do sentiments of free-form user texts help to predict unary preferences; and (iii) how to model fine-grained numerical preferences from user review texts. Our goal is to model and extract from user text the knowledge required to answer these questions, and to obtain insights on how to design better information filtering systems that are more effective and improve user experience. To answer the first question, we formulate the recommendation problem based on unary preferences as a top-N retrieval task and we define an appropriate dataset and metrics for measuring performance. Then, we propose and evaluate several content-based methods based on semantic similarities under presence or absence of preferences. To answer the second question, we propose a sentiment-aware neighborhood model which integrates the sentiment of user comments with unary preferences, either through fixed or through learned mapping functions. For the latter type, we propose a learning algorithm which adapts the sentiment of user comments to unary preferences at collective or individual levels. To answer the third question, we cast the problem of modeling user attitude toward aspects of items as a weakly supervised problem, and we propose a weighted multiple-instance learning method for solving it. Lastly, we show that the learned saliency weights, apart from being easily interpretable, are useful indicators for review segmentation and summarization.

    Policy Implications of User-Generated Data Network Effects

    User-generated data (UGD) network effects are an exciting and novel economic force. They upset conventional market competition dynamics, and they lead to the formation of dominant data platforms with market power that spans different and seemingly unrelated markets. This article explains that UGD network effects are a blessing and a curse. They provide dominant data platforms with the opportunity to generate welfare-enhancing efficiencies as well as welfare-reducing anticompetitive harms. After exploring the economic opportunities and social threats, this article explores the implications of UGD network effects on competition policy. Drawing on traditional network effects theory, this article proposes and critically examines a host of remedial approaches for policymakers to consider. These remedies include modernized public utility-style regulation, open access policies, and adjusted standards for anti-monopolization and merger scrutiny.
