522 research outputs found

    Building A Personalized Tourist Attraction Recommender System Using Crowdsourcing (Demonstration)

    Get PDF
    We demonstrate how crowdsourcing can be used to automatically build a personalized tourist attraction recommender system, which tailors recommendations to specific individuals, so that different people who use the system each get their own list of recommendations appropriate to their own traits. Recommender systems crucially depend on the availability of reliable and large-scale data that allows predicting how a new individual is likely to rate items from the catalog of possible items to recommend. We show how to automate the process of generating this data using crowdsourcing, so that such a system can be built even when such a dataset is not initially available. We first find possible tourist attractions to recommend by scraping such information from Wikipedia. Next, we use crowdsourced workers to filter the data and then provide their opinions on these items. Finally, we use machine learning methods to predict how new individuals are likely to rate each attraction, and recommend the items with the highest predicted ratings.
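    The pipeline described here (scrape candidate attractions, have crowd workers filter and rate them, learn a traits-to-ratings predictor, recommend the top-scoring items) can be illustrated with a minimal sketch of the final predict-and-rank step. The regression model, toy trait vectors, and ratings below are hypothetical stand-ins for illustration, not the authors' implementation.

```python
# Minimal sketch of the predict-and-rank step, assuming crowd ratings have
# already been collected. All names and the toy data are hypothetical.
import numpy as np
from sklearn.linear_model import Ridge

# Toy training data: each row is a user-trait vector (e.g. interest in
# museums, interest in nature) followed by a one-hot attraction id.
X_train = np.array([
    [0.9, 0.1, 1, 0, 0],
    [0.2, 0.8, 0, 1, 0],
    [0.5, 0.5, 0, 0, 1],
])
y_train = np.array([4.5, 3.0, 4.0])  # crowdsourced star ratings

model = Ridge(alpha=1.0).fit(X_train, y_train)

def recommend(traits, n_attractions, top_k=2):
    """Score every attraction for one new user; return the best ones."""
    rows = [np.concatenate([traits, np.eye(n_attractions)[a]])
            for a in range(n_attractions)]
    scores = model.predict(np.array(rows))
    return np.argsort(scores)[::-1][:top_k]

print(recommend(np.array([0.8, 0.2]), n_attractions=3))
```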

    Netflix and Forget: Efficient and Exact Machine Unlearning from Bi-linear Recommendations

    Full text link
    People break up, miscarry, and lose loved ones. Their online streaming and shopping recommendations, however, do not necessarily update, and may serve as unhappy reminders of their loss. When users want to renege on their past actions, they expect the recommender platforms to selectively erase data at the model level. Ideally, given any specified user history, the recommender can unwind or "forget" it, as if the record had not been part of training. To that end, this paper focuses on simple but widely deployed bi-linear models for recommendations based on matrix completion. Without incurring the cost of re-training, and without degrading the model unnecessarily, we develop Unlearn-ALS by making a few key modifications to the fine-tuning procedure under Alternating Least Squares optimisation, making it applicable to any bi-linear model regardless of the training procedure. We show that Unlearn-ALS is consistent with retraining, without any model degradation, and exhibits rapid convergence, making it suitable for a large class of existing recommenders.
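    As rough intuition for the fine-tuning idea, the sketch below drops the entries to be forgotten from the observation mask and runs a few alternating least-squares passes from the current factors instead of retraining from scratch. It is a generic ALS-on-a-mask illustration over toy data, not the paper's exact Unlearn-ALS procedure.

```python
# Generic ALS fine-tuning after deleting observed entries (illustrative
# simplification, not the paper's Unlearn-ALS).
import numpy as np

def als_step(R, mask, U, V, lam=0.1):
    """One alternating pass: solve for U with V fixed, then V with U fixed."""
    k = U.shape[1]
    for i in range(R.shape[0]):
        Vi = V[mask[i]]                      # items user i still rates
        if len(Vi):
            U[i] = np.linalg.solve(Vi.T @ Vi + lam * np.eye(k),
                                   Vi.T @ R[i, mask[i]])
    for j in range(R.shape[1]):
        Uj = U[mask[:, j]]                   # users still rating item j
        if len(Uj):
            V[j] = np.linalg.solve(Uj.T @ Uj + lam * np.eye(k),
                                   Uj.T @ R[mask[:, j], j])
    return U, V

rng = np.random.default_rng(0)
R = rng.integers(1, 6, size=(6, 5)).astype(float)   # toy rating matrix
mask = rng.random((6, 5)) < 0.7                     # observed entries
U, V = rng.normal(size=(6, 2)), rng.normal(size=(5, 2))
for _ in range(20):                                 # initial training
    U, V = als_step(R, mask, U, V)

mask[0, 3] = False          # user 0 asks to forget their rating of item 3
for _ in range(5):          # short fine-tune instead of full retraining
    U, V = als_step(R, mask, U, V)
```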

    NBPMF: Novel Network-Based Inference Methods for Peptide Mass Fingerprinting

    Get PDF
    Proteins are large, complex molecules that perform a vast array of functions in every living cell. A proteome is the set of proteins produced in an organism, and proteomics is the large-scale study of proteomes. Several high-throughput technologies have been developed in proteomics, of which the most commonly applied are mass spectrometry (MS) based approaches. MS is an analytical technique for determining the composition of a sample. Recently, it has become a primary tool for protein identification, quantification, and post-translational modification (PTM) characterization in proteomics research. There are usually two different ways to identify proteins: top-down and bottom-up. Top-down approaches are based on subjecting intact protein ions and large fragment ions to tandem MS directly, while bottom-up methods are based on mass spectrometric analysis of peptides derived from proteolytic digestion, usually with trypsin. In bottom-up techniques, peptide mass fingerprinting (PMF) is widely used to identify proteins from MS datasets. Conventional PMF methods, such as the probabilistic MOWSE algorithm, are based on the mass distribution of tryptic peptides. In this thesis, we developed a novel network-based inference software tool termed NBPMF. By analyzing the peptide-protein bipartite network, we designed new peptide-protein matching score functions. We present two methods: the static one, ProbS, is based on an independent-probability framework, while the dynamic one, HeatS, treats the input dataset as dependent peptides. Moreover, we use linear regression to adjust the matching score according to the masses of proteins, and we consider the order of retention time to further correct the score function. In post-processing, we design two algorithms: assignment of peaks and protein filtration. The former requires that a peak be assigned to only one peptide, in order to reduce random matches; the latter assumes each peak can be assigned to only one protein. For result validation, we propose two new target-decoy search strategies to estimate the false discovery rate (FDR). Experiments on simulated, authentic, and simulated-authentic datasets demonstrate that our NBPMF approaches lead to significantly improved performance compared to several state-of-the-art methods.
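    ProbS and HeatS are spreading processes on the peptide-protein bipartite network. The sketch below shows the classic one-step mass-diffusion (ProbS-style) and heat-conduction (HeatS-style) scores on a toy network; the matrix, normalisations, and matched-peptide vector are illustrative assumptions, not the NBPMF code.

```python
# Toy bipartite spreading: ProbS-style diffusion vs. HeatS-style conduction.
import numpy as np

# B[i, j] = 1 if peptide i can originate from protein j
B = np.array([[1, 1, 0],
              [1, 0, 1],
              [0, 1, 1],
              [0, 0, 1]], dtype=float)
f = np.array([1.0, 1.0, 0.0, 1.0])   # peptides matched by observed peaks

pep_deg = B.sum(axis=1)    # how many proteins share each peptide
prot_deg = B.sum(axis=0)   # how many peptides each protein has

# ProbS: each matched peptide splits its unit of resource equally among
# the proteins it could come from, so ambiguous peptides count for less.
probs_score = (B / pep_deg[:, None]).T @ f

# HeatS: each protein averages the resource over its own peptides, so
# proteins with many peptides are not automatically favoured.
heats_score = (B.T @ f) / prot_deg

print(probs_score, heats_score)
```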

    From Frequency to Meaning: Vector Space Models of Semantics

    Full text link
    Computers understand very little of the meaning of human language. This profoundly limits our ability to give instructions to computers, the ability of computers to explain their actions to us, and the ability of computers to analyse and process text. Vector space models (VSMs) of semantics are beginning to address these limits. This paper surveys the use of VSMs for semantic processing of text. We organize the literature on VSMs according to the structure of the matrix in a VSM. There are currently three broad classes of VSMs, based on term-document, word-context, and pair-pattern matrices, yielding three classes of applications. We survey a broad range of applications in these three categories and we take a detailed look at a specific open-source project in each category. Our goal in this survey is to show the breadth of applications of VSMs for semantics, to provide a new perspective on VSMs for those who are already familiar with the area, and to provide pointers into the literature for those who are less familiar with the field.
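    For the term-document class, the standard pipeline (a weighted term-document matrix plus cosine similarity between document vectors) can be shown in a few lines; the toy corpus and the choice of scikit-learn here are illustrative, not drawn from the survey's featured projects.

```python
# Minimal term-document VSM: tf-idf weighting plus cosine similarity.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

docs = ["the cat sat on the mat",
        "the dog sat on the log",
        "stocks fell on recession fears"]

X = TfidfVectorizer().fit_transform(docs)   # rows: documents, cols: terms
print(cosine_similarity(X))                 # doc-doc similarity matrix
```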

    Recommendations in Academic Social Media: the shaping of scholarly communication through algorithmic mediation

    Get PDF
    Scholarly communication is increasingly being mediated by Academic Social Media (ASM) platforms, which combine the functions of a scientific repository with social media features such as personal profiles, followers and comments. In ASM, algorithmic mediation is responsible for filtering content and distributing it in personalised individual feeds and recommendations according to inferred relevance to users. However, if communication among researchers is intertwined with these platforms, in what ways may the recommendation algorithms in ASM shape scholarly communication? The scientific literature has investigated how content is mediated in data-driven environments ranging from social media platforms to specific apps, whereas algorithmic mediation in scientific environments remains neglected. This thesis starts from the premise that ASM platforms are sociocultural artefacts embedded in a mutually shaping relationship with research practices and economic, political and social arrangements. Therefore, the implications of algorithmic mediation can be studied through the artefact itself, people's practices, and the social, political and economic arrangements that affect and are affected by such interactions. Most studies on ASM focus on one of these elements at a time, examining either design elements or users' behaviour on and perceptions of such platforms. In this thesis, a multifaceted approach is taken to analyse the artefact as well as the practices and arrangements traversed by algorithmic mediation.

    Chapter 1 reviews the literature on ASM platforms and traces the history of algorithmic recommendations, from the first Information Retrieval systems to current Recommender Systems, highlighting the use of different data sources and techniques. The chapter also presents the mediation framework and how it applies to ASM platforms, before outlining the thesis. The rest of the thesis is divided into two parts. Part I focuses on how recommender systems in ASM shape what users can see and how users interact with and through the platform. Part II investigates how, in turn, researchers make sense of their online interactions within ASM. The end of Chapter 1 presents the methodological choices for each following chapter.

    Part I presents a case study of one of the most popular ASM platforms, in which a walkthrough method was conducted in four steps (interface analysis, web code inspection, patent analysis, and company inquiry using the General Data Protection Regulation (GDPR)). Chapter 2 shows that almost all content on ASM platforms is algorithmically mediated through mechanisms of profiling, information selection and commodification. It also discusses how the company avoids explaining the workings of its recommender systems, and the mutually shaping character of ASM platforms. Chapter 3 explores the distortions and biases that ASM platforms can uphold. Results show how profiling, datafication and prioritization have the potential to foster homogeneity bias, discrimination, the Matthew effect of cumulative advantage in science, and other distortions.

    Part II consists of two empirical studies involving participants from different countries in interviews (n=11) and a research game (n=13). Chapter 4 presents the interviews, combined with the show-and-tell technique.
    The results show the participants' perceptions of ASM affordances, which revolve around six main themes: (1) getting access to relevant content; (2) reaching out to other scholars; (3) algorithmic impact on exposure to content; (4) to see and to be seen; (5) blurred boundaries of potential ethical or legal infringements; and (6) the more I give, the more I get. We argue that algorithmic mediation constructs not only a narration of the self but also a narration of the relevant other in ASM platforms, configuring an image of the relevant other that is both participatory and productive. Chapter 5 presents the design process of a research game and the results of the empirical sessions, in which participants were observed while playing the game. The study has two outcomes: first, the human values researchers relate to algorithmic features in ASM, the most prominent being stimulation, universalism and self-direction; second, the role of the researcher's approach (collaborative, competitive or ambivalent) in academic tasks, showing the consequential choices people make regarding algorithmic features and the motivations behind those choices. The results led to four archetypal profiles: (1) the collaborative reader; (2) the competitive writer; (3) the collaborative disseminator; and (4) the ambivalent evaluator. The final chapter (Chapter 6) summarises the ways in which ASM platforms forge people's perceptions and the strategies people employ to use the systems to benefit their careers, answering each research question, and discusses the implications of algorithmic mediation for scholarly communication and science in general. The dissertation ends with reflections on human agency in data-driven environments, the role of algorithmic inferences in science, and the challenge of reconciling individual users' needs with the broader goals of the scientific community. In doing so, the contribution of this thesis is twofold: (1) providing in-depth knowledge about the ASM artefact, and (2) unfolding different aspects of the human perspective in dealing with algorithmic mediation in ASM. Both perspectives are discussed in light of social arrangements that are mutually shaped by artefact and practices.

    Enhancing the museum experience with a sustainable solution based on contextual information obtained from an on-line analysis of users’ behaviour

    Get PDF
    Human-computer interaction has evolved in recent years to enhance users' experiences and provide more intuitive and usable systems. A major leap forward in this scenario comes from embedding, in the physical environment, sensors capable of detecting and processing users' context (position, pose, gaze, ...). Fed by the information flows collected in this way, user interface paradigms may shift from stereotyped gestures on physical devices to more direct and intuitive ones that reduce the semantic gap between an action and the corresponding system reaction, or even anticipate the user's needs, thus limiting the overall learning effort and increasing user satisfaction. To make this process effective, the context of the user (i.e. where s/he is, what s/he is doing, who s/he is, what her/his preferences are, and also her/his actual perception and needs) must be properly understood. While collecting data on some aspects can be easy, interpreting them all in a meaningful way in order to improve the overall user experience is much harder. This is more evident when we consider informal learning environments like museums, i.e. places that are designed to elicit visitor response towards the artifacts on display and the cultural themes proposed. In such a situation, the system should adapt to the attention paid by the user, choosing content appropriate to the user's purposes and presenting an intuitive interface to navigate it. My research goal is to collect, in a simple, unobtrusive, and sustainable way, contextual information about visitors, with the purpose of creating more engaging and personalized experiences.
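    As a deliberately small illustration of the adaptation step, the sketch below ranks an exhibit's content items by how well they match the visitor's sensed context; the context fields, catalogue, and selection rule are hypothetical.

```python
# Hypothetical content selection driven by sensed visitor context.
from dataclasses import dataclass

@dataclass
class Context:
    dwell_seconds: float     # how long the visitor has stood here
    gaze_on_artifact: bool   # coarse gaze estimate from a camera
    prefers_depth: float     # 0..1, inferred from earlier choices

CATALOGUE = [
    ("30s audio teaser",     {"min_dwell": 0,  "depth": 0.2}),
    ("2min curator story",   {"min_dwell": 20, "depth": 0.5}),
    ("full scholarly essay", {"min_dwell": 60, "depth": 0.9}),
]

def pick_content(ctx: Context):
    """Prefer deeper content the longer and more attentively one looks."""
    eligible = [(name, meta) for name, meta in CATALOGUE
                if ctx.dwell_seconds >= meta["min_dwell"]]
    if not ctx.gaze_on_artifact:
        eligible = eligible[:1]          # distracted visitor: keep it light
    return min(eligible,
               key=lambda c: abs(c[1]["depth"] - ctx.prefers_depth))[0]

print(pick_content(Context(dwell_seconds=45, gaze_on_artifact=True,
                           prefers_depth=0.6)))
```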

    Advances in privacy-preserving machine learning

    Get PDF
    Building useful predictive models often involves learning from personal data. For instance, companies use customer data to target advertisements, online education platforms collect student data to recommend content and improve user engagement, and medical researchers fit diagnostic models to patient data. A recent line of research aims to design learning algorithms that provide rigorous privacy guarantees for user data, in the sense that their outputs (models or predictions) leak as little information as possible about individuals in the training data. The goal of this dissertation is to design private learning algorithms with performance comparable to the best possible non-private ones. We quantify privacy using differential privacy, a well-studied privacy notion that limits how much information the output of an algorithm leaks about any individual. Training a model using a differentially private algorithm prevents an adversary from confidently determining whether a specific person's data was used to train the model. We begin by presenting a technique for practical differentially private convex optimization that can leverage any off-the-shelf optimizer as a black box. We also perform an extensive empirical evaluation of state-of-the-art algorithms on a range of publicly available datasets, as well as in an industry application. Next, we present a learning algorithm that outputs a private classifier when given black-box access to a non-private learner and a limited amount of unlabeled public data. We prove that the accuracy guarantee of our private algorithm in the PAC model of learning is comparable to that of the underlying non-private learner. Such a guarantee is not possible, in general, without public data. Lastly, we consider building recommendation systems, which we model using matrix completion. We present the first algorithm for matrix completion with provable user-level privacy and accuracy guarantees. Our algorithm consistently outperforms the state-of-the-art private algorithms on a suite of datasets. Along the way, we give an optimal algorithm for differentially private singular vector computation, which leads to significant savings in space and time when operating on sparse matrices. It can also be used for private low-rank approximation.
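    As a flavour of differentially private training (though not the dissertation's black-box optimizer technique), the sketch below applies the standard gradient-perturbation recipe (clip per-example gradients, add Gaussian noise) to logistic regression; the noise multiplier is left symbolic rather than calibrated to a specific (epsilon, delta) budget.

```python
# Generic gradient-perturbation sketch for private convex optimization.
import numpy as np

def dp_logreg(X, y, steps=200, lr=0.1, clip=1.0, noise_mult=1.0, seed=0):
    rng = np.random.default_rng(seed)
    w = np.zeros(X.shape[1])
    n = len(y)
    for _ in range(steps):
        p = 1 / (1 + np.exp(-X @ w))
        g = X * (p - y)[:, None]                 # per-example gradients
        norms = np.maximum(np.linalg.norm(g, axis=1) / clip, 1.0)
        g = g / norms[:, None]                   # clip each to norm <= clip
        noisy = g.sum(0) + rng.normal(0, noise_mult * clip, w.shape)
        w -= lr * noisy / n                      # noisy gradient step
    return w

X = np.array([[0.0, 1], [1, 1], [2, 1], [3, 1.0]])
y = np.array([0, 0, 1, 1])
print(dp_logreg(X, y))
```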