83 research outputs found

    KuaiSim: A Comprehensive Simulator for Recommender Systems

    Reinforcement Learning (RL)-based recommender systems (RSs) have garnered considerable attention due to their ability to learn optimal recommendation policies and maximize long-term user rewards. However, deploying RL models directly in online environments and generating authentic data through A/B tests can pose challenges and require substantial resources. Simulators offer an alternative by providing training and evaluation environments for RS models, reducing reliance on real-world data. Existing simulators have shown promising results but also have limitations, such as simplified user feedback, a lack of consistency with real-world data, the difficulty of evaluating simulators, and obstacles to migration and expansion across RSs. To address these challenges, we propose KuaiSim, a comprehensive user environment that provides user feedback with multi-behavior and cross-session responses. The resulting simulator supports three levels of recommendation problems: the request-level list-wise recommendation task, the whole-session-level sequential recommendation task, and the cross-session-level retention optimization task. For each task, KuaiSim also provides evaluation protocols and baseline recommendation algorithms that further serve as benchmarks for future research. We also restructure existing competitive simulators on the KuaiRand dataset and compare them against KuaiSim to further assess their performance and behavioral differences. Furthermore, to showcase KuaiSim's flexibility in accommodating different datasets, we demonstrate its versatility and robustness when deploying it on the ML-1m dataset.
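The kind of user environment the abstract describes can be illustrated with a toy sketch. All names and dynamics below are illustrative assumptions, not KuaiSim's actual API: the agent recommends a slate of item ids, the simulated user returns multi-behavior feedback (click/like) per item, and the session ends stochastically to mimic a cross-session retention signal.

```python
import random

class ToyUserEnv:
    """Toy list-wise recommendation environment in the spirit of KuaiSim.

    Hypothetical sketch: hidden per-item appeal drives simulated
    multi-behavior feedback; the episode ends when the user leaves.
    """

    def __init__(self, n_items=100, slate_size=5, seed=0):
        self.rng = random.Random(seed)
        self.n_items = n_items
        self.slate_size = slate_size
        # hidden per-item appeal, unknown to the agent
        self.appeal = [self.rng.random() for _ in range(n_items)]

    def reset(self):
        self.done = False
        return {"session_step": 0}

    def step(self, slate):
        assert len(slate) == self.slate_size
        feedback = []
        for item in slate:
            p = self.appeal[item]
            click = self.rng.random() < p
            like = click and self.rng.random() < p  # likes are rarer than clicks
            feedback.append({"item": item, "click": click, "like": like})
        reward = sum(f["click"] + 2 * f["like"] for f in feedback)
        self.done = self.rng.random() < 0.1  # user may leave (retention signal)
        return feedback, reward, self.done
```

An RL agent would interact with such an environment exactly as with any gym-style loop: `reset`, then repeated `step` calls until `done`.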

    Deep Reinforcement Learning in Recommender Systems

    Recommender systems aim to help customers find content of interest by presenting suggestions they are most likely to prefer. Reinforcement Learning, a Machine Learning paradigm in which agents learn by interaction which actions to perform in an environment so as to maximize a reward, can be trained to give good recommendations. One of the problems when working with Reinforcement Learning algorithms is the dimensionality explosion, especially in the observation space, and industrial recommender systems deal with extremely large observation spaces. New Deep Reinforcement Learning (DRL) algorithms can address this problem, but they are mainly focused on images. A technique has been developed that converts raw data into images, enabling DRL algorithms to be applied properly. This project addresses this line of investigation. Its contributions are: (1) defining a generalization of the Markov Decision Process formulation for recommender systems, (2) defining a way to express the observation as an image, and (3) demonstrating the use of both concepts by addressing a particular recommender-system case through Reinforcement Learning. Results show that the trained agents offer better recommendations than an arbitrary choice. However, the system does not achieve great performance, mainly due to the lack of interactions in the dataset.
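The raw-data-to-image idea in contribution (2) can be sketched generically. This is an assumed encoding, not the project's exact one: the flat observation is min-max normalized, zero-padded, and reshaped into a square single-channel "image" that CNN-based DRL policies can consume.

```python
import numpy as np

def observation_to_image(features, side=8):
    """Pack a flat observation vector into a side x side 'image'.

    Illustrative sketch: values are normalized to [0, 1], padded to
    side*side entries, then reshaped for image-based DRL architectures.
    """
    x = np.asarray(features, dtype=np.float64)
    lo, hi = x.min(), x.max()
    x = (x - lo) / (hi - lo) if hi > lo else np.zeros_like(x)
    padded = np.zeros(side * side)
    padded[: x.size] = x
    return padded.reshape(side, side)
```

Any per-user feature vector (history counts, ratings, context) could be packed this way before being fed to a convolutional policy network.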

    Efficient Exploration in Reinforcement Learning-Based Online Slate Recommendation Systems

    Master's thesis -- Seoul National University, Graduate School of Data Science, Department of Data Science, February 2023. Advisor: Min-hwan Oh. Deep reinforcement learning (RL) is a promising approach for recommender systems, whose ultimate goal is to maximize long-term user value. However, practical exploration strategies for real-world applications have not been addressed. We propose RESR, an efficient exploration strategy for deep RL-based recommendation. We develop a latent-state learning scheme and an off-policy learning objective with randomized Q-values to foster efficient learning. Online simulation experiments conducted with synthetic and real-world data validate the effectiveness of our method. The recommendation problem, consisting of an RL agent, users, and items, is formulated as a sequential decision-making problem using a partially observable Markov decision process (POMDP), in which a list of several items, called a slate, is recommended to the user. The thesis generalizes prior work by handling latent user states that cannot be observed directly. RESR combines latent user embeddings and a user choice model with a posterior approximation obtained by sampling several randomized Q-functions; in online simulation experiments, the proposed method showed better exploration efficiency than the compared baselines.
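The randomized-Q-value exploration idea can be shown in a hedged toy form. This is not the RESR algorithm itself (which operates on learned latent user states and slates); it is a tabular illustration of the underlying principle: keep an ensemble of randomly initialized Q-functions, sample one per episode as an approximate posterior draw, and act greedily under the sampled member.

```python
import random

class RandomizedQExplorer:
    """Toy exploration via randomized Q-functions (ensemble sampling).

    Illustrative sketch, not the thesis algorithm: K tabular Q-functions
    with randomized inits approximate a posterior over Q; one member is
    sampled per episode and followed greedily.
    """

    def __init__(self, n_states, n_actions, k=5, lr=0.1, gamma=0.9, seed=0):
        self.rng = random.Random(seed)
        self.k, self.lr, self.gamma = k, lr, gamma
        self.q = [[[self.rng.gauss(0, 1) for _ in range(n_actions)]
                   for _ in range(n_states)] for _ in range(k)]
        self.active = 0

    def start_episode(self):
        self.active = self.rng.randrange(self.k)  # posterior-style sample

    def act(self, state):
        row = self.q[self.active][state]
        return max(range(len(row)), key=row.__getitem__)

    def update(self, state, action, reward, next_state):
        for q in self.q:  # every ensemble member learns from shared experience
            target = reward + self.gamma * max(q[next_state])
            q[state][action] += self.lr * (target - q[state][action])
```

Because members disagree where data is scarce, sampling a member per episode yields deep, temporally consistent exploration without epsilon-greedy dithering.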

    Copyright Policies of Scientific Publications in Institutional Repositories: The Case of INESC TEC

    The progressive transformation of scientific practice, driven by the development of new Information and Communication Technologies (ICT), has increased access to information and is gradually opening up the research cycle. In the long term, this will help resolve a long-standing adversity faced by researchers: the existence of geographical and financial barriers that limit access. Although scientific production is largely dominated by big commercial publishers, and therefore subject to the rules they impose, the Open Access movement, whose first public declaration, the Budapest Declaration (BOAI), dates from 2002, proposes significant changes that benefit both authors and readers. The movement has gained importance in Portugal since 2003, when the first institutional repository at the national level was established. Institutional repositories emerged as a tool for disseminating an institution's scientific production, opening up research results both before publication and peer review (preprint) and after (postprint), and consequently increasing the visibility of the work of researchers and their institutions. The study presented here, based on an analysis of the copyright policies of INESC TEC's most relevant scientific publications, showed not only that publishers are increasingly adopting policies that allow self-archiving of publications in institutional repositories, but also that considerable awareness-raising work remains to be done, for researchers, for the institution, and for society as a whole. A set of recommendations, including the implementation of an institutional policy that encourages self-archiving of institutionally produced publications in the repository, serves as a starting point for a greater appreciation of INESC TEC's scientific production.

    Understanding and Improving Continuous Experimentation : From A/B Testing to Continuous Software Optimization

    Controlled experiments (i.e., A/B tests) are used by many companies with user-intensive products to improve their software with user data. Some companies adopt an experiment-driven approach to software development with continuous experimentation (CE). With CE, every user-affecting software change is evaluated in an experiment, and specialized roles seek out opportunities to experiment with functionality. The goal of the thesis is to describe current practice and support CE in industry. The main contributions are threefold. First, a review of the CE literature on infrastructure and processes, the problem-solution pairs applied in industry practice, and the benefits and challenges of the practice. Second, a multi-case study with 12 companies to analyze how experimentation is used and why some companies fail to fully realize the benefits of CE; a theory of Factors Affecting Continuous Experimentation (FACE) is constructed to realize this goal. Finally, a toolkit called Constraint Oriented Multi-variate Bandit Optimization (COMBO) is developed to support automated experimentation with many variables simultaneously, live in a production environment. The research in the thesis is conducted under the design-science paradigm using empirical research methods, with simulation experiments of tool proposals and a multi-case study on company usage of CE; other research methods include systematic literature review and theory building. From FACE we derive three factors that explain CE utility: (1) investments in data infrastructure, (2) user problem complexity, and (3) incentive structures for experimentation. Guidelines are provided on how to strive towards state-of-the-art CE based on company factors. All three factors are relevant for companies wanting to use CE, in particular those wanting to apply algorithms such as those in COMBO to support personalization of software to users' context in a process of continuous optimization.
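The bandit-optimization idea behind a toolkit like COMBO can be illustrated in miniature. The real toolkit handles many variables and constraints jointly; the sketch below assumes a single categorical variable with a handful of software variants and uses Thompson sampling, so exploration of variants decays automatically as evidence accumulates.

```python
import random

def thompson_step(successes, failures, rng):
    """Pick one software variant to serve next via Thompson sampling.

    Illustrative sketch of live bandit experimentation: draw one sample
    from a Beta(successes+1, failures+1) posterior per variant and deploy
    the variant with the highest sampled conversion rate.
    """
    samples = [rng.betavariate(s + 1, f + 1)
               for s, f in zip(successes, failures)]
    return max(range(len(samples)), key=samples.__getitem__)
```

In a live loop, each user request calls `thompson_step`, the chosen variant is served, and the observed outcome (e.g., conversion) increments that variant's success or failure count.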

    Memory Models for Incremental Learning Architectures

    Losing V. Memory Models for Incremental Learning Architectures. Bielefeld: Universität Bielefeld; 2019. Technological advancement constantly leads to an exponential growth of generated data in basically every domain, drastically increasing the burden of data storage and maintenance. Most of the data is instantaneously extracted and available in the form of endless streams that contain the most current information. Machine learning methods constitute one fundamental way of processing such data automatically, as they generate models that capture the processes behind the data. They are omnipresent in our everyday life, with applications including personalized advertising, recommendations, fraud detection, surveillance, credit ratings, high-speed trading, and smart-home devices. Batch learning, denoting the offline construction of a static model based on large datasets, is the predominant scheme. However, it is increasingly unfit to deal with the accumulating masses of data in the given time, and in particular its static nature cannot handle changing patterns. In contrast, incremental learning constitutes an attractive alternative that is a natural fit for current demands. Its dynamic adaptation allows continuous processing of data streams without the necessity of storing all past data, and results in always up-to-date models that can even perform in non-stationary environments. In this thesis, we tackle crucial research questions in the domain of incremental learning by contributing new algorithms or significantly extending existing ones. We consider stationary and non-stationary environments and present multiple real-world applications that showcase the merits of the methods as well as their versatility. The main contributions are: a novel approach that addresses the question of how to extend a model for prototype-based algorithms based on cost minimization; local split-time prediction for incremental decision trees to mitigate the trade-off between adaptation speed and model complexity and run time; an extensive survey of the strengths and weaknesses of state-of-the-art methods that provides guidance for choosing a suitable algorithm for a given task; a new approach to extract valuable information about the type of change in a dataset; a biologically inspired architecture able to handle different types of drift using dedicated memories that are kept consistent; application of the novel methods within three diverse real-world tasks, highlighting their robustness and versatility; and an investigation of personalized online models in the context of two real-world applications.
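The model-extension idea for prototype-based algorithms can be illustrated with a hedged toy version. This is not the thesis's cost-minimization method; it is a minimal sketch of incremental prototype learning: classify by nearest prototype, insert the sample as a new prototype on a mistake, and otherwise nudge the winning prototype toward the sample.

```python
class IncrementalPrototypeClassifier:
    """Toy prototype-based incremental learner (illustrative sketch).

    On a misclassification, the model grows by one prototype; on a
    correct prediction, the winning prototype adapts toward the sample.
    """

    def __init__(self, lr=0.2):
        self.protos = []  # list of (vector, label) pairs
        self.lr = lr

    def _nearest(self, x):
        return min(self.protos,
                   key=lambda p: sum((a - b) ** 2 for a, b in zip(p[0], x)))

    def predict(self, x):
        return self._nearest(x)[1] if self.protos else None

    def partial_fit(self, x, y):
        if not self.protos or self.predict(x) != y:
            self.protos.append((list(x), y))  # extend the model
        else:
            w, _ = self._nearest(x)
            for i in range(len(w)):
                w[i] += self.lr * (x[i] - w[i])  # adapt the winner
```

The attraction of such schemes for streaming data is that the model never needs the full history: each sample is processed once and discarded.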