

RECOMMENDATION SYSTEM USING COLLABORATIVE FILTERING

Author: Lee Yunkyoung
Publication venue: SJSU ScholarWorks
Publication date: 01/10/2015

Collaborative filtering is one of the best-known and most widely used techniques in recommendation systems. Its basic idea is to predict which items a user would be interested in based on their preferences. Because the technique relies on users' preferences, recommendation systems using collaborative filtering can provide accurate predictions when enough data is available. User-based collaborative filtering has been very successful at predicting customer behavior, the most important part of a recommendation system. However, its widespread use has revealed real challenges, such as data sparsity and scalability, as the number of users and items gradually increases. To improve the execution time and accuracy of prediction, this paper proposes item-based collaborative filtering with dimension reduction in a recommendation system. Evaluated with the Mean Absolute Error (MAE) metric, the proposed approach is shown to achieve better performance and execution time with respect to these existing challenges.
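Item-based collaborative filtering and MAE evaluation can be sketched as follows. This is a minimal illustration, not the paper's implementation: the toy ratings are hypothetical, the similarity measure (plain cosine over co-rated users) is an assumption, and the paper's dimension-reduction step is omitted. Each known rating is hidden in turn, predicted from the user's other ratings weighted by item-item similarity, and the errors are averaged into MAE.

```python
from math import sqrt

# Hypothetical toy ratings (user -> {item: rating}); not data from the paper.
RATINGS = {
    "u1": {"i1": 5, "i2": 3, "i3": 4},
    "u2": {"i1": 4, "i2": 2, "i3": 4, "i4": 1},
    "u3": {"i1": 1, "i2": 5, "i4": 4},
    "u4": {"i2": 4, "i3": 2, "i4": 5},
}


def item_vectors(ratings):
    """Transpose user -> item ratings into item -> {user: rating}."""
    items = {}
    for user, row in ratings.items():
        for item, r in row.items():
            items.setdefault(item, {})[user] = r
    return items


def cosine(a, b):
    """Cosine similarity computed over users who rated both items."""
    common = a.keys() & b.keys()
    if not common:
        return 0.0
    dot = sum(a[u] * b[u] for u in common)
    norm_a = sqrt(sum(a[u] ** 2 for u in common))
    norm_b = sqrt(sum(b[u] ** 2 for u in common))
    return dot / (norm_a * norm_b)


def predict(ratings, items, user, target):
    """Similarity-weighted average of the user's ratings on other items."""
    if target not in items:
        return None
    num = den = 0.0
    for item, r in ratings[user].items():
        if item == target:
            continue
        s = cosine(items[target], items[item])
        num += s * r
        den += abs(s)
    return num / den if den else None


def mae(pairs):
    """Mean Absolute Error over (predicted, actual) pairs."""
    return sum(abs(p - a) for p, a in pairs) / len(pairs)


def leave_one_out_mae(ratings):
    """Hide each known rating in turn and score its prediction with MAE."""
    pairs = []
    for user, row in ratings.items():
        for target, actual in row.items():
            held = {u: dict(r) for u, r in ratings.items()}
            del held[user][target]
            pred = predict(held, item_vectors(held), user, target)
            if pred is not None:
                pairs.append((pred, actual))
    return mae(pairs)
```

Calling `leave_one_out_mae(RATINGS)` returns a single error figure; a dimension-reduction variant would compute the item similarities from low-rank item vectors instead of the raw rating columns.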

SJSU ScholarWorks

An Efficient and Scalable Recommender System for the Smart Web

Author: Albacete García Esperanza
Baldominos Gómez Alejandro
Marrero Ignacio
Sáez Achaerandio Yago
Publication venue: Institute of Electrical and Electronics Engineers (IEEE)
Publication date: 01/01/2015

This paper was presented at the 11th International Conference on Innovations in Information Technology (IIT) Innovations 2015, Special Theme: Smart Cities, Big Data, Sustainable Development, held 1-3 November 2015 in Dubai, United Arab Emirates (IEEE IIT 2015). This work describes the development of a web recommender system implementing both collaborative filtering and content-based filtering. It supports two working modes, sponsored or related, depending on whether websites are recommended based on a list of ongoing ad campaigns or on the user's preferences. Novel recommendation algorithms are proposed and implemented that rely entirely on set operations such as union and intersection to compute the set of recommendations provided to end users. The recommender system is deployed over a real-time big data architecture designed to work with the Apache Hadoop ecosystem, thus supporting horizontal scalability, and provides recommendations as a service through a RESTful API. Performance measurements show that the system can provide dozens of recommendations in a few milliseconds on a single-node cluster setup. This research work is part of the Memento Data Analysis project, co-funded by the Spanish Ministry of Industry, Energy and Tourism under grants no. TSI-020601-2012-99 and TSI-020110-2009-137.
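To make "recommendations computed purely with set operations" concrete, here is a hypothetical sketch; it is not the paper's algorithm. The visit histories, site tags, and function names are invented for illustration. The collaborative part takes the union of sites visited by users whose history intersects the target user's; the content-based part keeps sites whose tags intersect the user's tag profile; both subtract sites already visited.

```python
# Hypothetical data: which users visited which websites, and each site's tags.
VISITS = {
    "alice": {"news.example", "tech.example"},
    "bob": {"tech.example", "games.example"},
    "carol": {"news.example", "sports.example"},
    "dave": {"cooking.example"},
}
TAGS = {
    "news.example": {"news"},
    "tech.example": {"tech", "news"},
    "games.example": {"games", "tech"},
    "sports.example": {"sports"},
    "cooking.example": {"food"},
}


def collaborative(user, visits):
    """Union the sites of every other user whose history intersects ours."""
    mine = visits[user]
    recs = set()
    for other, sites in visits.items():
        if other != user and mine & sites:  # non-empty intersection
            recs |= sites
    return recs - mine


def content_based(user, visits, tags):
    """Sites sharing at least one tag with the union of the user's site tags."""
    profile = set().union(*(tags[s] for s in visits[user]))
    return {s for s, t in tags.items() if t & profile} - visits[user]


def recommend(user, visits, tags):
    """Hybrid result: union of collaborative and content-based candidates."""
    return collaborative(user, visits) | content_based(user, visits, tags)
```

Because every step is a set union, intersection, or difference, the work partitions naturally across machines, which is one plausible reason such a design suits a horizontally scalable deployment.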

Crossref

LAReferencia - Red Federada de Repositorios Institucionales de Publicaciones Científicas Latinoamericanas

Universidad Carlos III de Madrid e-Archivo

Apache Mahout: Machine Learning on Distributed Dataflow Systems

Author: Anil R.
Capan G.
Drost-Fromm I.
Dunning T.
Friedman E.
Grant T.
Quinn S.
Ranjan P.
Schelter S.
Yılmazel Ö.
Publication date: 01/06/2020

Apache Mahout is a library for scalable machine learning (ML) on distributed dataflow systems, offering various implementations of classification, clustering, dimensionality reduction and recommendation algorithms. Mahout pioneered large-scale machine learning when it started in 2008, targeting MapReduce, the predominant abstraction for scalable computing in industry at that time. Mahout has been widely used by leading web companies and is part of several commercial cloud offerings. In recent years, Mahout migrated to a general framework enabling a mix of dataflow programming and linear algebraic computations on backends such as Apache Spark and Apache Flink. This design allows users to execute data preprocessing and model training in a single, unified dataflow system, instead of requiring a complex integration of several specialized systems. Mahout is maintained as a community-driven open source project at the Apache Software Foundation, and is available at https://mahout.apache.org.
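The mix of dataflow programming and linear algebra described above can be illustrated with a small, hypothetical example that does not use Mahout's API. For a binary user-item matrix A, the off-diagonal entries of A^T A count how many users interacted with both items, a common building block for item-to-item recommendation. Below, those counts are computed in map/reduce style (a map over user histories, then an associative reduce) and checked against a direct per-entry definition of A^T A. The histories are invented.

```python
from collections import Counter
from functools import reduce
from itertools import combinations

# Hypothetical interaction histories (user -> items), i.e. rows of a binary A.
HISTORIES = {
    "u1": ["a", "b", "c"],
    "u2": ["a", "b"],
    "u3": ["b", "c"],
}


def map_pairs(items):
    """Map step: count each unordered item pair once per user history."""
    return Counter(combinations(sorted(set(items)), 2))


def reduce_counts(left, right):
    """Reduce step: add partial counts; associative, so it can run in parallel."""
    return left + right


def ata_entry(histories, i, j):
    """Entry (i, j) of A^T A: users whose rows have a 1 in both columns."""
    return sum(1 for items in histories.values() if i in items and j in items)


cooc = reduce(reduce_counts, map(map_pairs, HISTORIES.values()), Counter())
```

In a distributed dataflow engine the map runs per partition of users and the reduce merges partial counts; the linear-algebra view (A^T A) and the dataflow view give the same numbers.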

International Migration, Integration and Social Cohesion online publications

UvA-DARE

Recommender Systems in Light of Big Data

Author: A. Almohsen Khadija
Al-Jobori Huda
Publication venue: Institute of Advanced Engineering and Science
Publication date: 01/12/2015

The growth in web usage, especially of e-commerce websites, has led to the development of recommender systems (RSs), which aim to personalize web content for each user and reduce the cognitive load that information places on the user. However, as the world enters the Big Data era and lives through the contemporary data explosion, the main goal of an RS becomes providing millions of high-quality recommendations in a few seconds for a growing number of users and items. One of the successful RS techniques is collaborative filtering (CF), which makes recommendations for users based on what other like-minded users have preferred. Despite its success, CF faces challenges posed by Big Data, such as scalability, sparsity and cold start. As a consequence, new CF approaches that overcome these problems, such as singular value decomposition (SVD), have been studied. This paper surveys the RS literature and reviews the current state of RSs and the main concerns surrounding them due to Big Data. Furthermore, it thoroughly investigates SVD, one of the promising approaches expected to perform well in tackling Big Data challenges, and provides an implementation of it using successful Big Data tools (i.e. Apache Hadoop and Spark). This implementation is intended to validate the applicability of existing contributions to the field of SVD-based RSs, as well as the effectiveness of Hadoop and Spark in developing large-scale systems. The implementation was evaluated empirically by measuring mean absolute error, which gave results comparable to experiments previously conducted by other researchers on a relatively smaller data set in a non-distributed environment. This proved the scalability of SVD-based RSs and their applicability to Big Data.
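A single-machine sketch of an SVD-style recommender with MAE follows. It is an assumption-laden illustration, not the paper's Hadoop/Spark implementation: it uses Funk-style matrix factorization trained by stochastic gradient descent (which the RS literature often loosely calls "SVD") rather than an exact SVD, and the ratings, rank `k`, learning rate, and regularization values are hypothetical. For brevity MAE is computed on the training ratings; a real evaluation would use held-out ratings.

```python
import random

# Hypothetical (user, item, rating) triples on a 1-5 scale.
TRAIN = [
    (0, 0, 5), (0, 1, 3), (0, 3, 1),
    (1, 0, 4), (1, 2, 1), (1, 3, 1),
    (2, 0, 1), (2, 1, 1), (2, 3, 5),
    (3, 1, 1), (3, 2, 5), (3, 3, 4),
]


def train_mf(data, n_users, n_items, k=2, lr=0.01, reg=0.02, epochs=2000, seed=0):
    """Learn rating ~ mu + P[u] . Q[i] by SGD on regularized squared error."""
    rng = random.Random(seed)
    mu = sum(r for _, _, r in data) / len(data)  # global mean rating
    P = [[rng.gauss(0, 0.1) for _ in range(k)] for _ in range(n_users)]
    Q = [[rng.gauss(0, 0.1) for _ in range(k)] for _ in range(n_items)]
    for _ in range(epochs):
        for u, i, r in data:
            err = r - (mu + sum(p * q for p, q in zip(P[u], Q[i])))
            for f in range(k):
                pu, qi = P[u][f], Q[i][f]  # update both factors from old values
                P[u][f] += lr * (err * qi - reg * pu)
                Q[i][f] += lr * (err * pu - reg * qi)
    return mu, P, Q


def predict(model, u, i):
    """Predicted rating for user u on item i."""
    mu, P, Q = model
    return mu + sum(p * q for p, q in zip(P[u], Q[i]))


def mae(model, data):
    """Mean absolute error of the model's predictions on the given triples."""
    return sum(abs(r - predict(model, u, i)) for u, i, r in data) / len(data)


model = train_mf(TRAIN, n_users=4, n_items=4)
```

The low-rank factors play the role of the truncated singular vectors; distributed versions typically partition the ratings and update or solve for the factors in parallel.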

Crossref

Institute of Advanced Engineering and Science

THE USE OF RECOMMENDER SYSTEMS IN WEB APPLICATIONS – THE TROI CASE

Author: Mulaj Donjeta
Publication venue: UBT Knowledge Center
Publication date: 01/07/2018

In the digital age, neglecting digital marketing, surveys, reviews and the analysis of online user behavior is a key reason why even powerful businesses fail, so their systems should adopt artificial intelligence techniques. In this direction, using data mining to recommend relevant items, a state-of-the-art technique, is increasing both user satisfaction and business revenue, together with other information-gathering approaches that help our systems think and act like humans. To this end, this thesis elaborates a recommender system: how people interact, and how to accurately identify what people like or dislike based on their previous online behavior. The thesis also covers the methodologies recommender systems use and how mathematical equations help recommender systems calculate user behavior and similarities. Filters are central to recommender systems: if similar users like the same product or item, the filter estimates the probability that a neighboring user will like it too. Using collaborative filtering, neighborhood-based filtering, hybrid recommender systems and various algorithms, a recommender system can predict whether a particular user would prefer an item, based on the user's profile and activities. Recommender systems benefit both service providers and users. The thesis also covers the strengths and weaknesses of recommender systems and how involving ontologies can improve them; ontology-based methods can reduce problems that content-based recommender systems are known to suffer from. Given Kosovo's GDP, job prospects for young people need improvement, as demand for jobs exceeds supply. I therefore set out to build an intelligent system that makes it easier for Kosovars to find a job that suits their profile, skills, knowledge, character and location.
That system is the TROI search engine, which indexes and merges all locally operating job-seeking websites into one platform with intelligent features. The thesis presents the design, implementation, testing and evaluation of the TROI search engine. Testing was done through user experiments with the running TROI search engine. Results show that the functionality of the recommender system is satisfactory and helpful.
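The neighborhood idea above ("if similar users like the same item, how likely is a neighboring user to like it too?") is commonly implemented as user-based k-nearest-neighbor prediction. The sketch below is a generic textbook version, not the TROI implementation: the job-interest ratings and user names are hypothetical, and Pearson correlation with mean-centered aggregation is an assumed design choice.

```python
from math import sqrt

# Hypothetical job-interest ratings on a 1-5 scale (user -> {job: rating}).
PROFILES = {
    "ana": {"dev": 5, "qa": 4, "sales": 1},
    "besa": {"dev": 4, "qa": 5, "sales": 2, "ops": 4},
    "dren": {"dev": 1, "qa": 2, "sales": 5, "ops": 1},
}


def pearson(a, b):
    """Pearson correlation over co-rated items (0.0 if fewer than two)."""
    common = a.keys() & b.keys()
    if len(common) < 2:
        return 0.0
    mean_a = sum(a[i] for i in common) / len(common)
    mean_b = sum(b[i] for i in common) / len(common)
    num = sum((a[i] - mean_a) * (b[i] - mean_b) for i in common)
    den = sqrt(sum((a[i] - mean_a) ** 2 for i in common)) * sqrt(
        sum((b[i] - mean_b) ** 2 for i in common))
    return num / den if den else 0.0


def mean_rating(ratings):
    """Average of all ratings a user has given."""
    return sum(ratings.values()) / len(ratings)


def predict(profiles, user, item, k=2):
    """Mean-centered prediction from the k most similar users who rated item."""
    me = profiles[user]
    candidates = sorted(
        ((pearson(me, r), name) for name, r in profiles.items()
         if name != user and item in r),
        reverse=True,
    )
    top = [(s, name) for s, name in candidates[:k] if s > 0]
    if not top:
        return mean_rating(me)  # no positively correlated neighbor
    num = sum(s * (profiles[n][item] - mean_rating(profiles[n])) for s, n in top)
    den = sum(s for s, _ in top)
    return mean_rating(me) + num / den
```

Here `predict(PROFILES, "ana", "ops")` lands slightly above Ana's own average, because her only positively correlated neighbor rated "ops" above that neighbor's average; the opposite-minded user is ignored.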

University of Business and Technology in Kosovo: UBT Knowledge Center Collections

Exploiting distributional semantics for content-based and context-aware recommendation

Author: Codina Busquet Victor
Publication venue: Universitat Politècnica de Catalunya
Publication date: 01/01/2014

During the last decade, the use of recommender systems has grown to the point that, nowadays, the success of many well-known services depends on these technologies. Recommender systems help people tackle the choice-overload problem by effectively presenting new content adapted to the user's preferences. However, current recommendation algorithms commonly suffer from data sparsity, which refers to their inability to produce acceptable recommendations until a minimum number of user ratings is available for training the prediction models. This thesis investigates how the distributional semantics of concepts describing the entities of the recommendation space can be exploited to mitigate the data-sparsity problem and improve prediction accuracy with respect to state-of-the-art recommendation techniques. The fundamental idea behind distributional semantics is that concepts repeatedly co-occurring in the same context or usage tend to be related. In this thesis, we propose and evaluate two novel semantically enhanced prediction models that address sparsity-related limitations: (1) a content-based approach, which exploits the distributional semantics of item attributes during item and user-profile matching, and (2) a context-aware recommendation approach that exploits the distributional semantics of contextual conditions during context modeling. We demonstrate in an exhaustive experimental evaluation that the proposed algorithms outperform state-of-the-art ones, especially when data are sparse. Finally, this thesis presents a recommendation framework, which extends the widespread machine learning library Apache Mahout, including all the proposed and evaluated recommendation algorithms as well as a tool for offline evaluation and meta-parameter optimization.
The framework has been developed to allow other researchers to reproduce the described evaluation experiments and to make further progress in the Recommender Systems field easier.
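The distributional hypothesis above, that concepts co-occurring in the same contexts tend to be related, can be sketched for content-based matching. This is an illustrative assumption, not the thesis's model: items serve as the "contexts", each attribute concept is represented by the set of items it occurs in, and similarity is the cosine of those binary occurrence vectors. The item catalogue and helper names are hypothetical. The point is that a profile can match an item even with zero exact attribute overlap.

```python
from math import sqrt

# Hypothetical item descriptions: item -> set of attribute concepts.
ITEMS = {
    "m1": {"space", "alien"},
    "m2": {"space", "robot"},
    "m3": {"alien", "robot"},
    "m4": {"romance", "paris"},
    "m5": {"robot", "space"},
}


def concept_vectors(items):
    """Represent each concept by the set of items (contexts) it occurs in."""
    vecs = {}
    for item, concepts in items.items():
        for c in concepts:
            vecs.setdefault(c, set()).add(item)
    return vecs


def similarity(vecs, a, b):
    """Cosine similarity of two binary occurrence vectors."""
    va, vb = vecs[a], vecs[b]
    return len(va & vb) / sqrt(len(va) * len(vb))


def soft_match(vecs, profile, item_concepts):
    """Average, over profile concepts, of the best similarity to the item."""
    return sum(
        max(similarity(vecs, p, c) for c in item_concepts) for p in profile
    ) / len(profile)


vecs = concept_vectors(ITEMS)
```

A user interested only in "space" gets a positive score for an item tagged {"alien", "robot"} because those concepts share contexts with "space", while an exact keyword match would score it zero; with sparse data, this kind of semantic smoothing is what the thesis exploits.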

LAReferencia - Red Federada de Repositorios Institucionales de Publicaciones Científicas Latinoamericanas

UPCommons. Portal del coneixement obert de la UPC

Tesis Doctorals en Xarxa

Machine Learning and Integrative Analysis of Biomedical Big Data.

Author: Choi Howard
Chung Neo Christopher
Mirza Bilal
Ping Peipei
Wang Jie
Wang Wei
Publication venue: eScholarship, University of California
Publication date: 01/01/2019

Recent developments in high-throughput technologies have accelerated the accumulation of massive amounts of omics data from multiple sources: genome, epigenome, transcriptome, proteome, metabolome, etc. Traditionally, data from each source (e.g., genome) is analyzed in isolation using statistical and machine learning (ML) methods. Integrative analysis of multi-omics and clinical data is key to new biomedical discoveries and advancements in precision medicine. However, data integration poses new computational challenges and exacerbates those associated with single-omics studies. Specialized computational approaches are required to perform integrative analysis of biomedical data acquired from diverse modalities effectively and efficiently. In this review, we discuss state-of-the-art ML-based approaches for tackling five specific computational challenges associated with integrative analysis: curse of dimensionality, data heterogeneity, missing data, class imbalance and scalability issues.
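As a concrete, hypothetical illustration of two of these challenges (data heterogeneity and missing data), the sketch below shows one simple baseline often called early integration: within each omics block, missing values are mean-imputed per feature and each feature is z-scored so blocks on different scales become comparable, then the blocks are concatenated per sample. This is not a method from the review; the blocks and values are invented, and a feature with no observed values would need separate handling.

```python
from statistics import mean, pstdev


def impute_and_scale(block):
    """Per feature column: replace None with the column mean, then z-score."""
    out_cols = []
    for col in zip(*block):
        observed = [v for v in col if v is not None]
        m = mean(observed)  # assumes at least one observed value per feature
        filled = [m if v is None else v for v in col]
        sd = pstdev(filled) or 1.0  # guard against constant columns
        out_cols.append([(v - m) / sd for v in filled])
    return [list(row) for row in zip(*out_cols)]


def early_integration(*blocks):
    """Concatenate the scaled features of every block, sample by sample."""
    scaled = [impute_and_scale(b) for b in blocks]
    return [sum(rows, []) for rows in zip(*scaled)]


# Hypothetical blocks: rows are the same three samples, columns are features.
genome = [[1.0, 2.0], [3.0, None], [5.0, 6.0]]
proteome = [[10.0], [20.0], [None]]
X = early_integration(genome, proteome)
```

Concatenation enlarges the feature space, which is exactly where the curse of dimensionality mentioned above begins to bite; the review's more specialized approaches aim at that trade-off.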

Multidisciplinary Digital Publishing Institute

Ezid

Directory of Open Access Journals

eScholarship - University of California

Empirical study of the behavior of several Recommender System methods on SAPO Videos

Author: Guaicaipuro Alberto Oliveira Neves
Publication date: 20/07/2015

In recent years, the internet has become an indispensable tool for companies and individual users alike, putting a huge amount of information at every internet user's disposal. This information overload has become a pressing problem, leaving users unable to keep track of their own interests. To solve this issue, recommender systems are developed to automatically suggest items that may fit users' interests. The most popular strategies for predicting user preferences are: 1) collaborative filtering, 2) content-based filtering, 3) social-based filtering, 4) social tagging filtering, 5) knowledge-based filtering, 6) hybrid filtering, 7) context-aware filtering and 8) time-aware filtering. This thesis aims to carry out an empirical study of recommender system strategies for the Sapo Videos website. The motivation for this work lies in assessing which strategy is best for the proposed problem, which leads to finding the best tools and evaluation metrics. There are many different tools and metrics for implementing and evaluating such strategies, and finding the best combination will point to the best strategy. To accomplish this study, it will be necessary to survey different recommendation tools and to collect and prepare the data to be used on an experimental platform developed with some of the surveyed tools. At the end of each experimental run, the data will be analyzed using the offline evaluation metrics that best suit the problem. Given the growing number of online video platforms, this kind of recommendation system also offers companies a great competitive advantage: it provides users with personalized recommendations while also promoting the companies' products.
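Offline evaluation of ranked video recommendations is often reported with precision@k and recall@k. The thesis does not say which metrics it settled on, so the sketch below is only an assumed example: per user, it compares the top-k recommended video IDs against the set of videos the user actually watched in a held-out period, then averages across users. The IDs are hypothetical. By the convention used here, precision divides by k even if fewer than k items were recommended.

```python
def precision_recall_at_k(recommended, relevant, k):
    """Precision@k and recall@k for one user's ranked list."""
    hits = len(set(recommended[:k]) & set(relevant))
    precision = hits / k
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


def mean_metrics(runs, k):
    """Average precision@k and recall@k over (recommended, relevant) pairs."""
    pairs = [precision_recall_at_k(rec, rel, k) for rec, rel in runs]
    n = len(pairs)
    return sum(p for p, _ in pairs) / n, sum(r for _, r in pairs) / n


# Hypothetical run: ranked recommendations vs. videos watched later.
example = precision_recall_at_k(["v1", "v2", "v3", "v4"], {"v2", "v4", "v9"}, 3)
```

Comparing tools on the same held-out split with the same k keeps these numbers comparable across strategies, which is the kind of side-by-side assessment the study describes.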

Repositório Aberto da Universidade do Porto