675 research outputs found

Cross domain recommender systems using matrix and tensor factorization

Author: Pourheidari Vahid 1988-
Publication venue: University of Saskatchewan Library
Publication date: 12/03/2019

Today, the amount and importance of data available on the internet are growing exponentially. These digital data have become a primary source of information, and people's lives are tightly bound to them. The data come in diverse shapes and from various sources, and users rely on them in almost all of their personal and social activities. However, selecting a desirable option from a huge list of available options can be frustrating and time-consuming. Recommender systems aim to ease this process by finding the items that are most likely to interest users. Undoubtedly, no social media platform or online service can continue to work properly without recommender systems. On the other hand, almost all available recommendation techniques suffer from some common issues: data sparsity, cold start, and the new-user problem. This thesis tackles these problems using different methods. While most recommender methods rely on single-domain information, the main focus of this thesis is on using multi-domain information to create cross-domain recommender systems. A cross-domain recommender system not only handles cold-start and new-user situations much better, but also helps to combine features exposed in diverse domains and to capture a better understanding of users' preferences, which means producing more accurate recommendations. In this thesis, a pre-clustering stage is also proposed to reduce data sparsity. Various cross-domain knowledge-based recommender systems are suggested to recommend items in two popular social media platforms, Twitter and LinkedIn, using different information available in both domains. The state-of-the-art techniques in this field, namely matrix factorization and tensor decomposition, are implemented to develop cross-domain recommender systems.
The presented recommender systems, based on coupled nonnegative matrix factorization and PARAFAC-style tensor decomposition, are evaluated using real-world datasets and are shown to be superior to the baseline matrix factorization collaborative filtering. In addition, network analysis is performed on the data extracted from Twitter and LinkedIn.

eCommons@USASK

University of Saskatchewan Research Archive
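
The coupled nonnegative matrix factorization named in the abstract above can be illustrated with a minimal sketch. The abstract does not give the thesis's formulation, so everything here is an assumption: two user-by-item matrices (one per domain) share a single user factor U, unobserved entries are treated as zeros, and the function name `coupled_nmf` and its parameters are hypothetical. Sharing U across both domains is equivalent to factoring the column-wise concatenation of the two matrices, which is what the standard multiplicative updates below do.

```python
import random


def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def transpose(a):
    """Return the transpose of a matrix stored as a list of rows."""
    return [list(col) for col in zip(*a)]


def coupled_nmf(x_a, x_b, rank=2, iters=200, seed=0, eps=1e-9):
    """Hypothetical sketch: factor two domains with a shared user factor.

    x_a ~= U @ V_a.T and x_b ~= U @ V_b.T with nonnegative entries.
    Assumption: missing ratings are stored as 0 and fitted like any entry.
    """
    rng = random.Random(seed)
    # Concatenate the domains column-wise: users x (items_a + items_b).
    x = [row_a + row_b for row_a, row_b in zip(x_a, x_b)]
    n_users, n_items = len(x), len(x[0])
    u = [[rng.random() + 0.1 for _ in range(rank)] for _ in range(n_users)]
    v = [[rng.random() + 0.1 for _ in range(rank)] for _ in range(n_items)]
    xt = transpose(x)
    for _ in range(iters):
        # Multiplicative update: U <- U * (X V) / (U V^T V)
        num, den = matmul(x, v), matmul(u, matmul(transpose(v), v))
        u = [[u[i][k] * num[i][k] / (den[i][k] + eps) for k in range(rank)]
             for i in range(n_users)]
        # Multiplicative update: V <- V * (X^T U) / (V U^T U)
        num, den = matmul(xt, u), matmul(v, matmul(transpose(u), u))
        v = [[v[j][k] * num[j][k] / (den[j][k] + eps) for k in range(rank)]
             for j in range(n_items)]
    n_a = len(x_a[0])
    # Split the item factors back into the two domains.
    return u, v[:n_a], v[n_a:]


if __name__ == "__main__":
    # Toy data (made up): 3 users, 2 items in domain A, 1 item in domain B.
    domain_a = [[1.0, 2.0], [2.0, 4.0], [3.0, 6.0]]
    domain_b = [[2.0], [4.0], [6.0]]
    u, v_a, v_b = coupled_nmf(domain_a, domain_b, rank=1, iters=50)
    print(matmul(u, transpose(v_b)))  # predicted scores in domain B
```

Because the user factor is fitted against both domains at once, a user with few interactions in one domain still receives a factor informed by the other, which is the intuition behind using cross-domain data against cold-start cases.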

Scalable and Reliable Sparse Data Computation on Emergent High Performance Computing Systems

Author: Miao Zheng
Publication venue: Clemson University Libraries
Publication date: 01/05/2022

Heterogeneous systems with both CPUs and GPUs have become important system architectures in emergent High Performance Computing (HPC) systems. Heterogeneous systems must address both performance-scalability and power-scalability in the presence of failures. Aggressive power reduction pushes hardware to its operating limit and increases the failure rate. Resilience allows programs to progress when subjected to faults and is an integral component of large-scale systems, but incurs significant time and energy overhead. Future exascale systems are expected to have higher power consumption and higher fault rates. Sparse data computation is a fundamental kernel in many scientific applications. Its computational characteristics make it suitable for studying scalability and resilience on heterogeneous systems. To deliver the promised performance within the given power budget, heterogeneous computing mandates a deep understanding of the interplay between scalability and resilience. Managing scalability and resilience is challenging in heterogeneous systems because compute capability, power consumption, and failure rates differ between CPUs and GPUs. Scalability and resilience have traditionally been studied in isolation, and optimizing one typically harms the other. While prior work has proved successful in optimizing scalability and resilience on CPU-based homogeneous systems, simply extending current approaches to heterogeneous systems results in suboptimal performance-scalability and/or power-scalability. To address these research challenges, we propose novel resilience and energy-efficiency technologies to optimize scalability and resilience for sparse data computation on heterogeneous systems with CPUs and GPUs.
First, we present generalized analytical and experimental methods to analyze and quantify the time and energy costs of various recovery schemes, and develop and prototype performance optimization and power management strategies to improve scalability for sparse linear solvers. Our results quantitatively reveal that each resilience scheme has its own advantages depending on the fault rate, system size, and power budget, and that forward recovery can further benefit from our performance and power optimizations for large-scale computing. Second, we design a novel resilience technique that relaxes the requirement that processes be synchronized and identical, and allows them to run on heterogeneous resources with power reduction. Our results show a significant reduction in energy for unmodified programs in various fault situations compared to exact replication techniques. Third, we propose a novel distributed sparse tensor decomposition that utilizes an asynchronous RDMA-based approach with OpenSHMEM to improve scalability on large-scale systems and prove that our method works well in heterogeneous systems. Our results show that our irregularity-aware workload partition and balanced-asynchronous algorithms are scalable and outperform the state-of-the-art distributed implementations. We demonstrate that understanding the different bottlenecks for various types of tensors plays a critical role in improving scalability.

Clemson University: TigerPrints

Mobile app recommendations using deep learning and big data

Author: Pinto Luís António Galego
Publication date: 18/01/2019

Dissertation presented as the partial requirement for obtaining a Master's degree in Statistics and Information Management, specialization in Marketing Research and CRM. Recommender systems were first introduced to solve information overload problems in enterprises. Over the last decades, recommender systems have found applications in several major websites related to e-commerce, music and video streaming, travel and movie sites, social media and mobile app stores. Several methods have been proposed over the years to build recommender systems. The most popular approaches are based on collaborative filtering techniques, which leverage the similarities between consumer tastes. But the current state of the art in recommender systems is deep-learning methods, which can leverage not only item consumption data but also content, context, and user attributes. Mobile app stores generate data with Big Data properties from app consumption data, behavioral, geographic, demographic, social network and user-generated content data, which includes reviews, comments and search queries. In this dissertation, we propose a deep-learning architecture for recommender systems in mobile app stores that leverages most of these data sources. We analyze three issues: the impact of the data sources, the impact of embedding layer pretraining, and the efficiency of using kernel methods to improve app scoring at a Big Data scale. An experiment is conducted on a Portuguese Android app store. Results suggest that models can be improved by combining structured and unstructured data. The results also suggest that embedding layer pretraining is essential to obtain good results. Some evidence is provided showing that kernel-based methods might not be efficient when deployed in Big Data contexts.

Repositório da Universidade Nova de Lisboa

AdaptGear: Accelerating GNN Training via Adaptive Subgraph-Level Kernels on GPUs

Author: Chen Quan
Cui Weihao
Guo Cong
Guo Minyi
Leng Jingwen
Li Li
Liu Zihan
Song Yaoxu
Zhang Zhendong
Zhou Yangjie
Publication date: 27/05/2023

Graph neural networks (GNNs) are powerful tools for exploring and learning from graph structures and features. As such, achieving high-performance execution for GNNs becomes crucially important. Prior works have proposed exploiting the sparsity (i.e., low density) of the input graph to accelerate GNNs, using a full-graph-level or block-level sparsity format. We show that they fail to balance the sparsity benefit and kernel execution efficiency. In this paper, we propose a novel system, referred to as AdaptGear, that addresses the challenge of optimizing GNN performance by leveraging kernels tailored to the density characteristics at the subgraph level. We also propose a method that dynamically chooses the optimal set of kernels for a given input graph. Our evaluation shows that AdaptGear can achieve a significant performance improvement, up to $6.49\times$ ($1.87\times$ on average), over the state-of-the-art works on two mainstream NVIDIA GPUs across various datasets.

arXiv.org e-Print Archive
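
The idea of picking a kernel per subgraph according to its density, as described in the AdaptGear abstract above, might be sketched as follows. This is only an illustration under stated assumptions, not AdaptGear's method: nodes are split into contiguous blocks of a fixed size, each non-empty block of the adjacency matrix counts as one subgraph, and a fixed threshold stands in for the system's dynamic kernel selection. The function name `choose_kernels`, the threshold value, and the kernel labels are all hypothetical.

```python
from collections import Counter


def choose_kernels(num_nodes, edges, block_size, dense_threshold=0.5):
    """Map each non-empty adjacency block (i, j) to 'dense' or 'sparse'.

    Assumption: a block whose edge density reaches `dense_threshold` is
    better served by a dense kernel; all others use a sparse kernel.
    """
    # Count distinct edges falling into each (source block, target block).
    counts = Counter((src // block_size, dst // block_size)
                     for src, dst in set(edges))

    def span(block_index):
        # Number of nodes in this block (the last block may be smaller).
        return min(block_size, num_nodes - block_index * block_size)

    plan = {}
    for (bi, bj), nnz in counts.items():
        density = nnz / (span(bi) * span(bj))
        plan[(bi, bj)] = "dense" if density >= dense_threshold else "sparse"
    return plan


if __name__ == "__main__":
    # Toy graph (made up): 4 nodes, one fully connected 2x2 block.
    toy_edges = [(0, 0), (0, 1), (1, 0), (1, 1), (2, 3), (0, 3)]
    print(choose_kernels(4, toy_edges, block_size=2))
```

A static threshold is the simplest possible policy; the abstract states that the actual system chooses kernels dynamically for a given input graph, which this sketch does not attempt to reproduce.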

Galvatron: Efficient Transformer Training over Multiple GPUs Using Automatic Parallelism

Author: Cui Bin
Jiang Youhe
Miao Xupeng
Nie Xiaonan
Shi Chunan
Wang Yujie
Zhang Hailin
Publication venue: VLDB Endowment
Publication date: 24/11/2022

Transformer models have achieved state-of-the-art performance in various application domains and have gradually become the foundation of advanced large deep learning (DL) models. However, training these models efficiently over multiple GPUs is still challenging due to the large number of parallelism choices. Existing DL systems either rely on manual effort to make distributed training plans or apply parallelism combinations within a very limited search space. In this work, we propose Galvatron, a new system framework that incorporates multiple popular parallelism dimensions and automatically finds the most efficient hybrid parallelism strategy. To better explore such a huge search space, we 1) use a decision tree to perform decomposition and pruning based on some reasonable intuitions, and then 2) design a dynamic programming search algorithm to generate the optimal plan. Evaluations on four representative Transformer workloads show that Galvatron can automatically perform distributed training under different GPU memory budgets. Among all evaluated scenarios, Galvatron always achieves superior system throughput compared to previous work with limited parallelism.

arXiv.org e-Print Archive
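
A dynamic programming search over per-layer parallelism choices under a memory budget, of the kind the Galvatron abstract above mentions, can be sketched roughly as below. This is not Galvatron's algorithm: the cost model, the function name `plan_layers`, the strategy labels, and all numbers are hypothetical, and the sketch assumes that each layer's time and memory costs are independent of its neighbours' strategies and that memory is measured in integer units.

```python
def plan_layers(layer_options, memory_budget):
    """Pick one (name, time, memory) option per layer.

    Minimizes total time subject to total memory <= memory_budget.
    Returns (total_time, [names]) or None if no plan fits.
    """
    # best[m] = fastest plan found so far that uses exactly m memory units.
    best = {0: (0.0, [])}
    for options in layer_options:
        nxt = {}
        for used, (time_so_far, names) in best.items():
            for name, time_cost, mem_cost in options:
                total_mem = used + mem_cost
                if total_mem > memory_budget:
                    continue  # prune plans that exceed the budget
                candidate = (time_so_far + time_cost, names + [name])
                if total_mem not in nxt or candidate[0] < nxt[total_mem][0]:
                    nxt[total_mem] = candidate
        best = nxt
    if not best:
        return None
    return min(best.values(), key=lambda plan: plan[0])


if __name__ == "__main__":
    # Made-up costs: "dp" is faster but uses more memory than "tp".
    layers = [
        [("dp", 1.0, 4), ("tp", 2.0, 2)],
        [("dp", 1.0, 4), ("tp", 3.0, 2)],
    ]
    for budget in (8, 6, 3):
        print(budget, plan_layers(layers, budget))
```

Keeping only the fastest plan per memory total bounds the table size by the budget instead of by the number of strategy combinations, which is why a dynamic program is attractive when the combined search space is large.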