459 research outputs found
An Efficient and Scalable Recommender System for the Smart Web
This proceeding at: 11th International Conference on Innovations in Information Technology (IIT) Innovations 2015. Special Theme: Smart Cities, Big Data, Sustainable Development. Took place at 2015, November, 01 - 03, in Dubai, United Arab Emirates (IEEE IIT 2015).This work describes the development of a web recommender system implementing both collaborative filtering and content-based filtering. Moreover, it supports two different working modes, either sponsored or related, depending on whether websites are to be recommended based on a list of ongoing ad campaigns or in the user preferences. Novel recommendation algorithms are proposed and implemented, which fully rely on set operations such as union and intersection in order to compute the set of recommendations to be provided to end users. The recommender system is deployed over a real-time big data architecture designed to work with Apache Hadoop ecosystem, thus supporting horizontal scalability, and is able to provide recommendations as a service by means of a RESTful API. The performance of the recommender is measured, resulting in the system being able to provide dozens of recommendations in few milliseconds in a single-node cluster setup.This research work is part of Memento Data Analysis project, co-funded by the Spanish Ministry of Industry, Energy and Tourism with no. TSI-020601-2012-99 and TSI-020110-2009-137.Publicad
Machine Learning and Integrative Analysis of Biomedical Big Data.
Recent developments in high-throughput technologies have accelerated the accumulation of massive amounts of omics data from multiple sources: genome, epigenome, transcriptome, proteome, metabolome, etc. Traditionally, data from each source (e.g., genome) is analyzed in isolation using statistical and machine learning (ML) methods. Integrative analysis of multi-omics and clinical data is key to new biomedical discoveries and advancements in precision medicine. However, data integration poses new computational challenges as well as exacerbates the ones associated with single-omics studies. Specialized computational approaches are required to effectively and efficiently perform integrative analysis of biomedical data acquired from diverse modalities. In this review, we discuss state-of-the-art ML-based approaches for tackling five specific computational challenges associated with integrative analysis: curse of dimensionality, data heterogeneity, missing data, class imbalance and scalability issues
Recommended from our members
Accelerating Iterative Computations for Large-Scale Data Processing
Recent advances in sensing, storage, and networking technologies are creating massive amounts of data at an unprecedented scale and pace. Large-scale data processing is commonly leveraged to make sense of these data, which will enable companies, governments, and organizations, to make better decisions and bring convenience to our daily life. However, the massive amount of data involved makes it challenging to perform data processing in a timely manner. On the one hand, huge volumes of data might not even fit into the disk of a single machine. On the other hand, data mining and machine learning algorithms, which are usually involved in large-scale data processing, typically require time-consuming iterative computations. Therefore, it is imperative to efficiently perform iterative computations on large computer clusters or cloud using highly-parallel and shared-nothing distributed systems.
This research aims to explore new forms of iterative computations that reduce unnecessary computations so as to accelerate large-scale data processing in a distributed environment. We propose the iterative computation transformation for well-known data mining and machine learning algorithms, such as expectation-maximization, nonnegative matrix factorization, belief propagation, and graph algorithms (e.g., PageRank). These algorithms have been used in a wide range of application domains. First, we show how to accelerate expectation-maximization algorithms with frequent updates in a distributed environment. Then, we illustrate the way of efficiently scaling distributed nonnegative matrix factorization with block-wise updates. Next, our approach of scaling distributed belief propagation with prioritized block updates is presented. Last, we illustrate how to efficiently perform distributed incremental computation on evolving graphs.
We will elaborate how to implement these transformed iterative computations on existing distributed programming models such as the MapReduce-based model, as well as develop new scalable and efficient distributed programming models and frameworks when necessary. The goal of these supporting distributed frameworks is to lift the burden of the programmers in specifying transformation of iterative computations and communication mechanisms, and automatically optimize the execution of the computation. Our techniques are evaluated extensively to demonstrate their efficiency. While the techniques we propose are in the context of specific algorithms, they address the challenges commonly faced in many other algorithms
A Survey of Recommendation Systems and Performance Enhancing Methods
With the development of web services like E-commerce, job hunting websites, movie websites, recommendation system plays a more and more importance role in helping users finding their potential interests among the overloading information. There are a great number of researches available in this field, which leads to various recommendation approaches to choose from when researchers try to implement their recommendation systems.
This paper gives a systematic literature review of recommendation systems where the sources are extracted from Scopus. The research problem to address, similarity metrics used, proposed method and evaluation metrics used are the focus of summary of these papers. In spite of the methodology used in traditional recommendation systems, how additional performance enhancement methods like machine learning methods, matrix factorization techniques and big data tools are applied in several papers are also introduced.
Through reading this paper, researchers are able to understand what are the existing types of recommendation systems, what is the general process of recommendation systems, how the performance enhancement methods can be used to improve the system's performance. Therefore, they can choose a recommendation system which interests them for either implementation or research purpose
Graph Convolutional Neural Networks for Web-Scale Recommender Systems
Recent advancements in deep neural networks for graph-structured data have
led to state-of-the-art performance on recommender system benchmarks. However,
making these methods practical and scalable to web-scale recommendation tasks
with billions of items and hundreds of millions of users remains a challenge.
Here we describe a large-scale deep recommendation engine that we developed and
deployed at Pinterest. We develop a data-efficient Graph Convolutional Network
(GCN) algorithm PinSage, which combines efficient random walks and graph
convolutions to generate embeddings of nodes (i.e., items) that incorporate
both graph structure as well as node feature information. Compared to prior GCN
approaches, we develop a novel method based on highly efficient random walks to
structure the convolutions and design a novel training strategy that relies on
harder-and-harder training examples to improve robustness and convergence of
the model. We also develop an efficient MapReduce model inference algorithm to
generate embeddings using a trained model. We deploy PinSage at Pinterest and
train it on 7.5 billion examples on a graph with 3 billion nodes representing
pins and boards, and 18 billion edges. According to offline metrics, user
studies and A/B tests, PinSage generates higher-quality recommendations than
comparable deep learning and graph-based alternatives. To our knowledge, this
is the largest application of deep graph embeddings to date and paves the way
for a new generation of web-scale recommender systems based on graph
convolutional architectures.Comment: KDD 201
- …