Integer Echo State Networks: Hyperdimensional Reservoir Computing
We propose an approximation of Echo State Networks (ESN) that can be
efficiently implemented on digital hardware based on the mathematics of
hyperdimensional computing. The reservoir of the proposed Integer Echo State
Network (intESN) is a vector containing only n-bit integers (where n<8 is
normally sufficient for a satisfactory performance). The recurrent matrix
multiplication is replaced with an efficient cyclic shift operation. The intESN
architecture is verified with typical tasks in reservoir computing: memorizing
a sequence of inputs, classifying time series, and learning dynamic processes.
Such an architecture results in dramatic improvements in memory footprint and
computational efficiency, with minimal performance loss.
Comment: 10 pages, 10 figures, 1 table
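A minimal sketch of the kind of update the abstract describes: the reservoir is an integer vector, and the recurrent matrix multiplication is replaced by a cyclic shift. The clipping bound `kappa`, the random bipolar `codebook` for input symbols, and the function name `intesn_step` are illustrative assumptions, not details given in the abstract.

```python
import random

def intesn_step(state, input_vec, kappa):
    """One reservoir update: cyclic shift, integer add, clip to [-kappa, kappa]."""
    # Cyclic shift by one position stands in for the recurrent matrix product.
    shifted = state[-1:] + state[:-1]
    # Clipping keeps every entry inside a small integer range (assumed here).
    return [max(-kappa, min(kappa, s + u)) for s, u in zip(shifted, input_vec)]

rng = random.Random(0)
dim, kappa = 16, 3  # values in [-3, 3] fit in 3 bits

# Hypothetical encoding: one random bipolar vector per input symbol.
codebook = [[rng.choice([-1, 1]) for _ in range(dim)] for _ in range(4)]

state = [0] * dim
for symbol in [0, 2, 1, 3, 2]:
    state = intesn_step(state, codebook[symbol], kappa)

print(state)
```

Every operation is an integer shift, add, or compare, which is why such an update maps well onto digital hardware.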
Personalized Purchase Prediction of Market Baskets with Wasserstein-Based Sequence Matching
Personalization in marketing aims at improving the shopping experience of
customers by tailoring services to individuals. In order to achieve this,
businesses must be able to make personalized predictions regarding the next
purchase. That is, one must forecast the exact list of items that will comprise
the next purchase, i.e., the so-called market basket. Despite its relevance to
firm operations, this problem has received surprisingly little attention in
prior research, largely due to its inherent complexity. In fact,
state-of-the-art approaches are limited to intuitive decision rules for pattern
extraction. However, the simplicity of the pre-coded rules impedes performance,
since decision rules operate in an autoregressive fashion: the rules can only
make inferences from past purchases of a single customer without taking into
account the knowledge transfer that takes place between customers. In contrast,
our research overcomes the limitations of pre-set rules by contributing a novel
predictor of market baskets from sequential purchase histories: our predictions
are based on similarity matching in order to identify similar purchase habits
among the complete shopping histories of all customers. Our contributions are
as follows: (1) We propose similarity matching based on subsequential dynamic
time warping (SDTW) as a novel predictor of market baskets. Thereby, we can
effectively identify cross-customer patterns. (2) We leverage the Wasserstein
distance for measuring the similarity among embedded purchase histories. (3) We
develop a fast approximation algorithm for computing a lower bound of the
Wasserstein distance in our setting. An extensive series of computational
experiments demonstrates the effectiveness of our approach. The accuracy of
identifying the exact market baskets based on state-of-the-art decision rules
from the literature is outperformed by a factor of 4.0.
Comment: Accepted for oral presentation at 25th ACM SIGKDD Conference on
Knowledge Discovery and Data Mining (KDD 2019)
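A minimal sketch of subsequence dynamic time warping with a pluggable basket distance, to illustrate the matching idea in contributions (1) and (2). It is not the authors' implementation: baskets are assumed to be equal-sized lists of scalar item embeddings, and `wasserstein_1d` is a one-dimensional stand-in for the Wasserstein distance between embedded baskets. All function names are hypothetical.

```python
def wasserstein_1d(a, b):
    """Wasserstein-1 distance between two equal-sized 1-D samples."""
    assert len(a) == len(b)
    return sum(abs(x - y) for x, y in zip(sorted(a), sorted(b))) / len(a)

def subsequence_dtw(query, history, dist):
    """Find the window of `history` that best matches `query`.

    Returns (cost, end), where the best match ends at history[end - 1].
    """
    n, m = len(query), len(history)
    inf = float("inf")
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0] = [0.0] * (m + 1)  # the match may start anywhere in history
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = dist(query[i - 1], history[j - 1])
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    end = min(range(1, m + 1), key=lambda j: cost[n][j])
    return cost[n][end], end

# Recent baskets of one customer, and a longer history of another customer.
query = [[0.0, 1.0], [2.0, 3.0]]
history = [[5.0, 5.0], [0.0, 1.0], [2.0, 3.0], [9.0, 9.0]]

cost, end = subsequence_dtw(query, history, wasserstein_1d)
print(cost, end)
```

Under these assumptions, the basket that follows the matched window (`history[end]`, when it exists) is a natural candidate for the next-basket prediction.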
Finding Top-k Dominance on Incomplete Big Data Using Map-Reduce Framework
Incomplete data is a major kind of multi-dimensional dataset that has randomly distributed missing nodes in its dimensions. Retrieving information from this type of dataset becomes very difficult as it grows, and finding top-k dominant values in it is a challenging procedure. Some algorithms exist to improve this process, but most are efficient only on small incomplete datasets. One of the algorithms that makes top-k dominance (TKD) queries possible is the Bitmap Index Guided (BIG) algorithm. This algorithm strongly improves performance on incomplete data, but it was not designed for, and is not capable of, finding top-k dominant values in incomplete big data. Several other algorithms have been proposed to answer the TKD query, such as the Skyband Based and Upper Bound Based algorithms, but their performance is also questionable. Previously developed algorithms were among the first attempts to apply the TKD query to incomplete data; however, all of them performed weakly or were not compatible with incomplete data. This thesis proposes the MapReduced Enhanced Bitmap Index Guided Algorithm (MRBIG) to address these issues. MRBIG uses the MapReduce framework to improve the performance of top-k dominance queries on huge incomplete datasets. The framework divides the tasks among several computing nodes that work independently and simultaneously to find the result. This method is up to two times faster at finding the TKD query result than previously presented algorithms.
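A minimal sketch of a naive top-k dominance query split into map and reduce steps. It is not the BIG or MRBIG algorithm. It assumes the commonly used dominance definition for incomplete data (compare only dimensions observed in both objects, smaller is better), which the abstract does not spell out, and it runs the "map" calls sequentially as a stand-in for parallel computing nodes. All function names are hypothetical.

```python
def dominates(a, b):
    """True if a dominates b on the dimensions observed in both (None = missing)."""
    shared = [(x, y) for x, y in zip(a, b) if x is not None and y is not None]
    return (
        bool(shared)
        and all(x <= y for x, y in shared)
        and any(x < y for x, y in shared)
    )

def map_partial_scores(points, chunk):
    """Map step: for every point, count the objects in one chunk that it dominates."""
    return [sum(dominates(p, q) for q in chunk) for p in points]

def reduce_scores(partials):
    """Reduce step: add up the partial counts from all chunks."""
    return [sum(column) for column in zip(*partials)]

def top_k(scores, k):
    """Return (index, score) for the k objects with the highest dominance score."""
    order = sorted(range(len(scores)), key=lambda i: -scores[i])
    return [(i, scores[i]) for i in order[:k]]

points = [(1, None, 2), (2, 3, None), (None, 4, 5), (3, 5, 6)]
chunks = [points[:2], points[2:]]  # each chunk would go to a separate node

scores = reduce_scores([map_partial_scores(points, c) for c in chunks])
print(top_k(scores, 2))
```

Because each chunk is scored independently, the map calls can run on separate nodes and only the small per-point count vectors need to be combined.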
LiveRank: How to Refresh Old Datasets
This paper considers the problem of refreshing a dataset. More precisely,
given a collection of nodes gathered at some time (Web pages, users from an
online social network) along with some structure (hyperlinks, social
relationships), we want to identify a significant fraction of the nodes that
still exist at present time. The liveness of an old node can be tested through
an online query at present time. We call LiveRank a ranking of the old pages so
that active nodes are more likely to appear first. The quality of a LiveRank is
measured by the number of queries necessary to identify a given fraction of the
active nodes when using the LiveRank order. We study different scenarios from a
static setting where the LiveRank is computed before any query is made, to
dynamic settings where the LiveRank can be updated as queries are processed.
Our results show that building on the PageRank can lead to efficient LiveRanks,
for Web graphs as well as for online social networks.
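A minimal sketch of the quality measure described above: the number of liveness queries needed, following a given ranking, to identify a target fraction of the active nodes. The function name `liverank_cost` and the toy data are hypothetical, and the set of active nodes is assumed to be known here only so that two rankings can be compared offline.

```python
import math

def liverank_cost(ranking, alive, fraction):
    """Queries needed, in ranking order, to find `fraction` of the alive nodes."""
    target = math.ceil(fraction * len(alive))
    if target == 0:
        return 0
    found = 0
    for queries, node in enumerate(ranking, start=1):
        if node in alive:  # one online liveness query per ranked node
            found += 1
        if found >= target:
            return queries
    return None  # the ranking does not contain enough alive nodes

alive = {"a", "c", "e"}
arbitrary_order = ["a", "b", "c", "d", "e"]
better_order = ["a", "c", "e", "b", "d"]

print(liverank_cost(arbitrary_order, alive, 0.6))
print(liverank_cost(better_order, alive, 0.6))
```

A lower cost means the ranking places active nodes earlier, which is the property a good LiveRank is meant to have.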
User's Privacy in Recommendation Systems Applying Online Social Network Data, A Survey and Taxonomy
Recommender systems have become an integral part of many social networks and
extract knowledge from a user's personal and sensitive data both explicitly,
with the user's knowledge, and implicitly. This trend has created major privacy
concerns as users are mostly unaware of what data and how much data is being
used and how securely it is used. In this context, several works have sought
to address privacy concerns about the use of online social network data by
recommender systems. This paper surveys the main privacy concerns, measurements
and privacy-preserving techniques used in large-scale online social networks
and recommender systems. It draws on prior work on security, privacy
preservation, statistical modeling, and datasets to provide an overview
of the technical difficulties and problems associated with preserving privacy
in online social networks.
Comment: 26 pages, IET book chapter on big data recommender system
Gunrock: GPU Graph Analytics
For large-scale graph analytics on the GPU, the irregularity of data access
and control flow, and the complexity of programming GPUs, have presented two
significant challenges to developing a programmable high-performance graph
library. "Gunrock", our graph-processing system designed specifically for the
GPU, uses a high-level, bulk-synchronous, data-centric abstraction focused on
operations on a vertex or edge frontier. Gunrock achieves a balance between
performance and expressiveness by coupling high performance GPU computing
primitives and optimization strategies with a high-level programming model that
allows programmers to quickly develop new graph primitives with small code size
and minimal GPU programming knowledge. We characterize the performance of
various optimization strategies and evaluate Gunrock's overall performance on
different GPU architectures on a wide range of graph primitives that span from
traversal-based algorithms and ranking algorithms, to triangle counting and
bipartite-graph-based algorithms. The results show that on a single GPU,
Gunrock has on average at least an order of magnitude speedup over Boost and
PowerGraph, comparable performance to the fastest GPU hardwired primitives and
CPU shared-memory graph libraries such as Ligra and Galois, and better
performance than any other GPU high-level graph library.
Comment: 52 pages, invited paper to ACM Transactions on Parallel Computing
(TOPC), an extended version of PPoPP'16 paper "Gunrock: A High-Performance
Graph Processing Library on the GPU"
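A minimal CPU sketch of the frontier-centric, bulk-synchronous style the abstract describes, using breadth-first search as the example primitive. This is plain Python, not Gunrock's API: the function name `bfs_frontier` and the "expand" and "filter" step labels are illustrative assumptions.

```python
def bfs_frontier(adj, source):
    """Breadth-first search written as repeated operations on a vertex frontier."""
    depth = {source: 0}
    frontier = [source]
    level = 0
    while frontier:
        level += 1
        # Expand: gather the neighbors of every vertex in the current frontier.
        candidates = [v for u in frontier for v in adj.get(u, [])]
        # Filter: keep only vertices seen for the first time.
        next_frontier = []
        for v in candidates:
            if v not in depth:
                depth[v] = level
                next_frontier.append(v)
        # Synchronize: the next iteration starts only after the whole frontier is done.
        frontier = next_frontier
    return depth

adj = {0: [1, 2], 1: [3], 2: [3], 3: [4], 4: []}
print(bfs_frontier(adj, 0))
```

On a GPU, each expand and filter step would be a data-parallel operation over the frontier; here the loops are sequential and only show the structure.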