156 research outputs found
Deep Learning based Recommender System: A Survey and New Perspectives
With the ever-growing volume of online information, recommender systems have
been an effective strategy to overcome such information overload. The utility
of recommender systems cannot be overstated, given their widespread adoption in
many web applications, along with their potential to ameliorate many
problems related to over-choice. In recent years, deep learning has garnered
considerable interest in many research fields such as computer vision and
natural language processing, owing not only to stellar performance but also the
attractive property of learning feature representations from scratch. The
influence of deep learning is also pervasive, recently demonstrating its
effectiveness when applied to information retrieval and recommender systems
research. Evidently, the field of deep learning in recommender systems is
flourishing. This article aims to provide a comprehensive review of recent
research efforts on deep learning based recommender systems. More concretely,
we devise a taxonomy of deep learning based recommendation models and
provide a comprehensive summary of the state of the art. Finally,
we expand on current trends and provide new perspectives pertaining to this
exciting new development in the field.
Comment: The paper has been accepted by ACM Computing Surveys.
https://doi.acm.org/10.1145/328502
Machine learning applications for censored data
The amount of data being gathered has increased tremendously as many aspects of our lives are becoming increasingly digital. Data alone is not useful, because the ultimate goal is to use the data to obtain new insights and create new applications. The largest challenge in computer science has been on the algorithmic front: how can we create machines that help us do useful things with the data? To address this challenge, the field of data science has emerged as the systematic and interdisciplinary study of how knowledge can be extracted from both structured and unstructured data sets. Machine learning is a subfield of data science, where the task of building predictive models from data has been automated by a general learning algorithm and high prediction accuracy is the primary goal.
Many practical problems can be formulated as questions and there is often data that describes the problem. The solution therefore seems simple: formulate a data set of inputs and outputs, and then apply machine learning to these examples in order to learn to predict the outputs. However, many practical problems are such that the correct outputs are not available because it takes years to collect them. For example, if one wants to predict the total amount of money spent by different customers, in principle one has to wait until all customers have decided to stop buying to add all of the purchases together to get the answers. We say that the data is ’censored’; the correct answers are only partially available because we cannot wait potentially years to collect a data set of historical inputs and outputs.
This thesis presents new applications of machine learning to censored data sets, with the goal of answering the most relevant question in each application. These applications include digital marketing, peer-to-peer lending, unemployment, and game recommendation. Our solution takes into account the censoring in the data set, whereas previous applications have obtained biased results or used older data sets where censoring is not a problem. The solution is based on a three-stage process that combines a mathematical description of the problem with machine learning: 1) deconstruct the problem as pairwise data, 2) apply machine learning to predict the missing pairs, 3) reconstruct the correct answer from these pairs. The abstract solution is similar in all domains, but the specific machine learning model and the pairwise description of the problem depend on the application.
The amount of collected data has grown as digitalization has advanced. The data itself is not valuable, however; the aim is to use data to acquire knowledge and to build new applications. The greatest challenge therefore lies in method development: how can we develop machines that know how to make use of data? The meeting point of many fields has been called Data Science. Its aim is to understand how knowledge can be obtained systematically from both structured and unstructured data sets. Machine learning can be seen as part of data science, where the goal is to build predictive models automatically from data using a so-called general learning algorithm and the focus of the method is on prediction accuracy.
Many practical problems can be formulated as questions for which data has been collected. From a machine learning standpoint the solution seems easy: define a data set of inputs and correct answers, and when machine learning is applied to this data set, it learns to predict the answer. In many practical problems, however, the correct answer is not fully available, because collecting the data can take years. For example, if one wants to predict how much money different customers spend over their lifetime, one would in principle have to wait until all of the company's customers stop making purchases so that the purchases can be added together to obtain the final answer. We call such a data set ’censored’; the correct answers have been observed only partially because collecting examples of inputs and correct answers can take years.
This dissertation presents new applications of machine learning to censored data sets, and the goal is to answer the most important question in each application. The applications include digital marketing, peer-to-peer lending, unemployment, and game recommendation. The solution takes censoring into account, whereas previous solutions have obtained biased results or focused on solving a simpler problem in data sets where censoring is not an issue. Our proposed solution is based on three stages that combine a mathematical understanding of the problem with machine learning: 1) the problem is deconstructed as pairwise data, 2) machine learning is applied to predict the missing pairs, 3) the correct answer is reconstructed from the predicted pairs. At an abstract level the idea is the same in all papers, but each application uses a machine learning method and a pairwise description designed for it
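The three-stage process described in this abstract can be illustrated with a minimal sketch. It is not the thesis's method: the function name `predict_totals`, the (customer, period) pairing, and the per-period mean used as a stand-in for a learned model are all assumptions made for illustration, and the sketch assumes at least one observed pair.

```python
from collections import defaultdict


def predict_totals(observed, customers, periods):
    """Estimate each customer's total spend over all periods.

    `observed` maps (customer, period) -> amount for the pairs seen so far;
    pairs that are absent are the censored part of the data.
    Hypothetical helper, assuming `observed` is non-empty.
    """
    # Stage 1: deconstruct the problem as (customer, period) pairs.
    by_period = defaultdict(list)
    for (_, period), amount in observed.items():
        by_period[period].append(amount)
    all_amounts = list(observed.values())
    fallback = sum(all_amounts) / len(all_amounts)

    # Stage 2: predict a missing pair. The per-period mean is only a
    # placeholder for a real machine learning model.
    def predict(period):
        seen = by_period.get(period)
        return sum(seen) / len(seen) if seen else fallback

    # Stage 3: reconstruct the answer from observed and predicted pairs.
    totals = {}
    for c in customers:
        totals[c] = sum(
            observed[(c, p)] if (c, p) in observed else predict(p)
            for p in periods
        )
    return totals


# Customer "a" is fully observed; customer "b" is censored after period 0.
observed = {("a", 0): 10, ("a", 1): 20, ("a", 2): 30, ("b", 0): 20}
totals = predict_totals(observed, ["a", "b"], [0, 1, 2])
```

Here the total for "b" combines one observed amount with two predicted ones, which is the point of the reconstruction step: the censored answer is never observed directly.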
Learning from Multi-View Multi-Way Data via Structural Factorization Machines
Real-world relations among entities can often be observed and determined by
different perspectives/views. For example, the decision made by a user on
whether to adopt an item relies on multiple aspects such as the contextual
information of the decision, the item's attributes, the user's profile and the
reviews given by other users. Different views may exhibit multi-way
interactions among entities and provide complementary information. In this
paper, we introduce a multi-tensor-based approach that can preserve the
underlying structure of multi-view data in a generic predictive model.
Specifically, we propose structural factorization machines (SFMs) that learn
the common latent spaces shared by multi-view tensors and automatically adjust
the importance of each view in the predictive model. Furthermore, the
complexity of SFMs is linear in the number of parameters, which makes SFMs
suitable for large-scale problems. Extensive experiments on real-world datasets
demonstrate that the proposed SFMs outperform several state-of-the-art methods
in terms of prediction accuracy and computational cost.
Comment: 10 page
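To make the linear-complexity claim concrete, the sketch below shows the well-known reformulation used by ordinary second-order factorization machines, where the pairwise interaction term is computed in time linear in the number of factor parameters. This is the standard factorization machine identity, not the SFM model from the paper; the function names and the list-of-lists layout of `V` are assumptions for illustration.

```python
def fm_pairwise_naive(x, V):
    """Sum over i < j of <V[i], V[j]> * x[i] * x[j], in O(n^2 * k) time."""
    n = len(x)
    total = 0.0
    for i in range(n):
        for j in range(i + 1, n):
            dot = sum(a * b for a, b in zip(V[i], V[j]))
            total += dot * x[i] * x[j]
    return total


def fm_pairwise_linear(x, V):
    """Same quantity in O(n * k) time, using for each factor f:
    0.5 * ((sum_i V[i][f] * x[i])^2 - sum_i (V[i][f] * x[i])^2).
    """
    n, k = len(x), len(V[0])
    total = 0.0
    for f in range(k):
        s = sum(V[i][f] * x[i] for i in range(n))
        sq = sum((V[i][f] * x[i]) ** 2 for i in range(n))
        total += 0.5 * (s * s - sq)
    return total


# Hypothetical toy input: 3 features, 2 latent factors per feature.
x = [1.0, 2.0, 3.0]
V = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
naive = fm_pairwise_naive(x, V)
linear = fm_pairwise_linear(x, V)
```

Both functions return the same value on this input; the second avoids the double loop over feature pairs, which is what makes factorization-based models practical on sparse, high-dimensional data.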
Adversarial Attacks on Linear Contextual Bandits
Contextual bandit algorithms are applied in a wide range of domains, from
advertising to recommender systems, from clinical trials to education. In many
of these domains, malicious agents may have incentives to attack the bandit
algorithm to induce it to perform a desired behavior. For instance, an
unscrupulous ad publisher may try to increase their own revenue at the expense
of the advertisers; a seller may want to increase the exposure of their
products, or thwart a competitor's advertising campaign. In this paper, we
study several attack scenarios and show that a malicious agent can force a
linear contextual bandit algorithm to pull any desired arm $T - o(T)$ times
over a horizon of $T$ steps, while applying adversarial modifications to either
rewards or contexts that only grow logarithmically as $O(\log T)$. We also
investigate the case when a malicious agent is interested in affecting the
behavior of the bandit algorithm in a single context (e.g., a specific user).
We first provide sufficient conditions for the feasibility of the attack and we
then propose an efficient algorithm to perform the attack. We validate our
theoretical results on experiments performed on both synthetic and real-world
datasets.
SHELDON Smart habitat for the elderly.
An insightful document concerning active and assisted living under different perspectives: Furniture and habitat, ICT solutions and Healthcare
A novel hybrid recommendation system for library book selection
Abstract. The increasing number of books published each year and decreasing budgets have made collection development increasingly difficult in libraries. Although data that could support decision making is available in library systems, librarians have little means to utilize it. In addition, modern key technologies, such as machine learning, that generate more value out of data have not yet been utilized to their full extent in the field of libraries. This study set out to discover a way to build a recommendation system that could help librarians who are struggling with the book selection process.
This thesis proposed a novel hybrid recommendation system for library book selection. The data used to build the system consisted of book metadata and book circulation data of books located in Joensuu City Library’s adult fiction collection. The proposed system was based on both rule-based components and a machine learning model. The user interface for the system was built using web technologies so that the system could be used via a web browser.
The proposed recommendation system was evaluated using two different methods: automated tests and focus group methodology. The system achieved an accuracy of 79.79% and an F1 score of 0.86 in automated tests. The uncertainty rate of the system was 27.87%. With these results in automated tests, the proposed system outperformed baseline machine learning models. The main finding from the focus group evaluation was that, while the librarians found the proposed system interesting, they thought it would need more features and configurability in order to be usable in real-world scenarios.
The results indicate that making good quality recommendations using book metadata is challenging because the data is by nature high-dimensional and categorical. The main implication of the results is that recommendation systems in the domain of library collection development should focus on data pre-processing and feature engineering. Further investigation is suggested to be carried out regarding knowledge representation
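For readers unfamiliar with the two metrics reported in this abstract, the sketch below shows how accuracy and F1 score are conventionally computed from binary confusion-matrix counts. The function name and the counts in the example are hypothetical and are not taken from the thesis, whose evaluation setup is not described here.

```python
def accuracy_and_f1(tp, fp, fn, tn):
    """Return (accuracy, F1) from binary confusion-matrix counts.

    Assumes at least one true positive so that precision and recall
    are both defined and non-zero.
    """
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    precision = tp / (tp + fp)  # share of predicted positives that are correct
    recall = tp / (tp + fn)     # share of actual positives that are found
    f1 = 2 * precision * recall / (precision + recall)
    return accuracy, f1


# Hypothetical counts, for illustration only.
acc, f1 = accuracy_and_f1(tp=8, fp=2, fn=2, tn=8)
```

Because F1 ignores true negatives, it can differ noticeably from accuracy when the classes are imbalanced, which is one reason both figures are often reported together.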