159,853 research outputs found
The Bases of Association Rules of High Confidence
We develop a new approach for distributed computing of the association rules
of high confidence in a binary table. It is derived from the D-basis algorithm
in K. Adaricheva and J.B. Nation (TCS 2017), which is performed on multiple
sub-tables of a table given by removing several rows at a time. The set of
rules is then aggregated using the same approach as the D-basis is retrieved
from a larger set of implications. This allows to obtain a basis of association
rules of high confidence, which can be used for ranking all attributes of the
table with respect to a given fixed attribute using the relevance parameter
introduced in K. Adaricheva et al. (Proceedings of ICFCA-2015). This paper
focuses on the technical implementation of the new algorithm. Some testing
results are performed on transaction data and medical data.Comment: Presented at DTMN, Sydney, Australia, July 28, 201
What attracts vehicle consumers’ buying:A Saaty scale-based VIKOR (SSC-VIKOR) approach from after-sales textual perspective?
Purpose:
The increasingly booming e-commerce development has stimulated vehicle consumers to express individual reviews through online forum. The purpose of this paper is to probe into the vehicle consumer consumption behavior and make recommendations for potential consumers from textual comments viewpoint.
Design/methodology/approach:
A big data analytic-based approach is designed to discover vehicle consumer consumption behavior from online perspective. To reduce subjectivity of expert-based approaches, a parallel NaĂŻve Bayes approach is designed to analyze the sentiment analysis, and the Saaty scale-based (SSC) scoring rule is employed to obtain specific sentimental value of attribute class, contributing to the multi-grade sentiment classification. To achieve the intelligent recommendation for potential vehicle customers, a novel SSC-VIKOR approach is developed to prioritize vehicle brand candidates from a big data analytical viewpoint.
Findings:
The big data analytics argue that “cost-effectiveness” characteristic is the most important factor that vehicle consumers care, and the data mining results enable automakers to better understand consumer consumption behavior.
Research limitations/implications:
The case study illustrates the effectiveness of the integrated method, contributing to much more precise operations management on marketing strategy, quality improvement and intelligent recommendation.
Originality/value:
Researches of consumer consumption behavior are usually based on survey-based methods, and mostly previous studies about comments analysis focus on binary analysis. The hybrid SSC-VIKOR approach is developed to fill the gap from the big data perspective
AUC Optimisation and Collaborative Filtering
In recommendation systems, one is interested in the ranking of the predicted
items as opposed to other losses such as the mean squared error. Although a
variety of ways to evaluate rankings exist in the literature, here we focus on
the Area Under the ROC Curve (AUC) as it widely used and has a strong
theoretical underpinning. In practical recommendation, only items at the top of
the ranked list are presented to the users. With this in mind, we propose a
class of objective functions over matrix factorisations which primarily
represent a smooth surrogate for the real AUC, and in a special case we show
how to prioritise the top of the list. The objectives are differentiable and
optimised through a carefully designed stochastic gradient-descent-based
algorithm which scales linearly with the size of the data. In the special case
of square loss we show how to improve computational complexity by leveraging
previously computed measures. To understand theoretically the underlying matrix
factorisation approaches we study both the consistency of the loss functions
with respect to AUC, and generalisation using Rademacher theory. The resulting
generalisation analysis gives strong motivation for the optimisation under
study. Finally, we provide computation results as to the efficacy of the
proposed method using synthetic and real data
Palgol: A High-Level DSL for Vertex-Centric Graph Processing with Remote Data Access
Pregel is a popular distributed computing model for dealing with large-scale
graphs. However, it can be tricky to implement graph algorithms correctly and
efficiently in Pregel's vertex-centric model, especially when the algorithm has
multiple computation stages, complicated data dependencies, or even
communication over dynamic internal data structures. Some domain-specific
languages (DSLs) have been proposed to provide more intuitive ways to implement
graph algorithms, but due to the lack of support for remote access --- reading
or writing attributes of other vertices through references --- they cannot
handle the above mentioned dynamic communication, causing a class of Pregel
algorithms with fast convergence impossible to implement.
To address this problem, we design and implement Palgol, a more declarative
and powerful DSL which supports remote access. In particular, programmers can
use a more declarative syntax called chain access to naturally specify dynamic
communication as if directly reading data on arbitrary remote vertices. By
analyzing the logic patterns of chain access, we provide a novel algorithm for
compiling Palgol programs to efficient Pregel code. We demonstrate the power of
Palgol by using it to implement several practical Pregel algorithms, and the
evaluation result shows that the efficiency of Palgol is comparable with that
of hand-written code.Comment: 12 pages, 10 figures, extended version of APLAS 2017 pape
Efficient Large-scale Approximate Nearest Neighbor Search on the GPU
We present a new approach for efficient approximate nearest neighbor (ANN)
search in high dimensional spaces, extending the idea of Product Quantization.
We propose a two-level product and vector quantization tree that reduces the
number of vector comparisons required during tree traversal. Our approach also
includes a novel highly parallelizable re-ranking method for candidate vectors
by efficiently reusing already computed intermediate values. Due to its small
memory footprint during traversal, the method lends itself to an efficient,
parallel GPU implementation. This Product Quantization Tree (PQT) approach
significantly outperforms recent state of the art methods for high dimensional
nearest neighbor queries on standard reference datasets. Ours is the first work
that demonstrates GPU performance superior to CPU performance on high
dimensional, large scale ANN problems in time-critical real-world applications,
like loop-closing in videos
- …