User-Device Authentication in Mobile Banking using APHEN for Paratuck2 Tensor Decomposition
New European financial regulations such as PSD2 are changing retail banking
services. Notably, the monitoring of personal expenses is now open to
institutions other than retail banks. Nonetheless, retail banks are looking to
leverage user-device authentication in mobile banking applications to enhance
personalized financial advertising. To address the profiling of authentication
events, we rely on tensor decomposition, a higher-dimensional analogue of
matrix decomposition. Because of the imbalance between the number of users and
the number of devices, we use Paratuck2, which expresses a tensor as a product
of matrices and diagonal tensors. We highlight why Paratuck2 is more
appropriate in this case than the popular CP tensor decomposition, which
decomposes a tensor into a sum of rank-one tensors. However, the computation of
Paratuck2 is computationally intensive. We propose a new APproximate
HEssian-based Newton resolution algorithm, APHEN, capable of solving Paratuck2
more accurately and faster than other popular approaches based on alternating
least squares or gradient descent. The results of Paratuck2 are then used to
predict users' authentication events with neural networks. We apply our method
to the concrete case of targeting clients for financial advertising campaigns
based on the authentication events generated by mobile banking applications.
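To make the decomposition concrete, here is a minimal NumPy sketch of how Paratuck2 expresses each frontal slice of a users x devices x time tensor as X_k = A D_k^A H D_k^B B^T, with separate latent ranks for the imbalanced user and device modes. All sizes and factor values are illustrative, not the paper's data.

```python
import numpy as np

rng = np.random.default_rng(0)
n_users, n_devices, n_time = 50, 8, 12   # imbalanced user/device counts
p, q = 4, 3                               # separate latent ranks per mode

A = rng.standard_normal((n_users, p))     # user factor matrix
B = rng.standard_normal((n_devices, q))   # device factor matrix
H = rng.standard_normal((p, q))           # user-device interaction matrix
DA = rng.standard_normal((n_time, p))     # diagonal entries per time slice
DB = rng.standard_normal((n_time, q))

def paratuck2_reconstruct(A, DA, H, DB, B):
    """Rebuild each frontal slice X_k = A diag(DA[k]) H diag(DB[k]) B^T."""
    slices = [A @ np.diag(DA[k]) @ H @ np.diag(DB[k]) @ B.T
              for k in range(DA.shape[0])]
    return np.stack(slices, axis=-1)      # (n_users, n_devices, n_time)

X = paratuck2_reconstruct(A, DA, H, DB, B)
print(X.shape)  # (50, 8, 12)
```

Unlike CP, the two modes carry different ranks p and q, which is what makes Paratuck2 a natural fit when users far outnumber devices.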
Driving with Data: Modeling and Forecasting Vehicle Fleet Maintenance in Detroit
The City of Detroit maintains an active fleet of over 2500 vehicles, spending
an annual average of over $5 million on new vehicle purchases and over $7.7
million on maintaining this fleet. Understanding the existence of patterns and
trends in this data could be useful to a variety of stakeholders, particularly
as Detroit emerges from Chapter 9 bankruptcy, but the patterns in such data are
often complex and multivariate and the city lacks dedicated resources for
detailed analysis of this data. This work, a data collaboration between the
Michigan Data Science Team (http://midas.umich.edu/mdst) and the City of
Detroit's Operations and Infrastructure Group, seeks to address this unmet need
by analyzing data from the City of Detroit's entire vehicle fleet from
2010-2017. We utilize tensor decomposition techniques to discover and visualize
unique temporal patterns in vehicle maintenance; apply differential sequence
mining to demonstrate the existence of common and statistically unique
maintenance sequences by vehicle make and model; and, after showing these
time-dependencies in the dataset, demonstrate an application of a predictive
Long Short Term Memory (LSTM) neural network model to predict maintenance
sequences. Our analysis shows both the complexities of municipal vehicle fleet
data and useful techniques for mining and modeling such data.
Comment: Presented at the Data For Good Exchange 201
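As a toy illustration of the differential sequence mining step, one can compare the relative frequency of consecutive maintenance-code pairs across vehicle makes. The makes and codes below are hypothetical, and a real analysis would use a proper significance test rather than a simple frequency comparison.

```python
from collections import Counter

# Hypothetical maintenance-code sequences keyed by vehicle make.
sequences = {
    "MakeA": [["oil", "brakes", "oil", "tires"], ["oil", "brakes", "engine"]],
    "MakeB": [["tires", "tires", "oil"], ["tires", "brakes", "tires"]],
}

def bigram_rates(seqs):
    """Relative frequency of consecutive maintenance-code pairs."""
    counts = Counter()
    for s in seqs:
        counts.update(zip(s, s[1:]))
    total = sum(counts.values())
    return {pair: c / total for pair, c in counts.items()}

rates = {make: bigram_rates(seqs) for make, seqs in sequences.items()}

# A pair is "differential" for a make when its rate clearly exceeds the
# other make's rate (a stand-in for a statistical test).
pair = ("oil", "brakes")
print(rates["MakeA"].get(pair, 0.0) > rates["MakeB"].get(pair, 0.0))  # True
```

Here the pair ("oil", "brakes") appears in 2 of MakeA's 5 bigrams but never for MakeB, so it would be flagged as characteristic of MakeA.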
QANet: Tensor Decomposition Approach for Query-based Anomaly Detection in Heterogeneous Information Networks
Complex networks have now become integral parts of modern information
infrastructures. This paper proposes a user-centric method for detecting
anomalies in heterogeneous information networks, in which nodes and/or edges
may be of different types. In the proposed anomaly detection method, users
interact directly with the system and anomalous entities can be detected
through queries. Our approach is based on tensor decomposition and clustering
methods. We also propose a network generation model to construct synthetic
heterogeneous information networks to test the performance of the proposed
method. The proposed anomaly detection method is compared with state-of-the-art
methods in both synthetic and real-world networks. Experimental results show
that the proposed tensor-based method considerably outperforms the existing
anomaly detection methods.
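Since the method builds on tensor decomposition plus clustering, a bare-bones CP decomposition via alternating least squares in NumPy may help fix ideas; this is a generic textbook routine, not the paper's implementation, and in a QANet-style pipeline the rows of the resulting factor matrices would then be clustered to surface anomalous entities.

```python
import numpy as np

def khatri_rao(U, V):
    """Column-wise Kronecker product: (I x R), (J x R) -> (I*J x R)."""
    I, R = U.shape
    J, _ = V.shape
    return (U[:, None, :] * V[None, :, :]).reshape(I * J, R)

def unfold(X, mode):
    """Mode-n unfolding of a 3-way tensor (C-order columns)."""
    return np.moveaxis(X, mode, 0).reshape(X.shape[mode], -1)

def cp_als(X, rank, n_iter=200, seed=0):
    """CP decomposition of a 3-way tensor by alternating least squares."""
    rng = np.random.default_rng(seed)
    A, B, C = (rng.standard_normal((s, rank)) for s in X.shape)
    for _ in range(n_iter):
        A = unfold(X, 0) @ khatri_rao(B, C) @ np.linalg.pinv((B.T @ B) * (C.T @ C))
        B = unfold(X, 1) @ khatri_rao(A, C) @ np.linalg.pinv((A.T @ A) * (C.T @ C))
        C = unfold(X, 2) @ khatri_rao(A, B) @ np.linalg.pinv((A.T @ A) * (B.T @ B))
    return A, B, C

# Fit an exactly rank-3 synthetic tensor.
rng = np.random.default_rng(1)
G = [rng.standard_normal((s, 3)) for s in (6, 7, 8)]
X = np.einsum('ir,jr,kr->ijk', *G)
A, B, C = cp_als(X, rank=3)
err = np.linalg.norm(X - np.einsum('ir,jr,kr->ijk', A, B, C)) / np.linalg.norm(X)
```

Each least-squares update solves for one factor with the other two fixed; the Khatri-Rao column ordering here matches the C-order unfolding used above.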
Tensor Completion Algorithms in Big Data Analytics
Tensor completion is a problem of filling the missing or unobserved entries
of partially observed tensors. Due to the multidimensional character of tensors
in describing complex datasets, tensor completion algorithms and their
applications have received wide attention and achieved success in areas like data
mining, computer vision, signal processing, and neuroscience. In this survey,
we provide a modern overview of recent advances in tensor completion algorithms
from the perspective of big data analytics characterized by diverse variety,
large volume, and high velocity. We characterize these advances from four
perspectives: general tensor completion algorithms, tensor completion with
auxiliary information (variety), scalable tensor completion algorithms
(volume), and dynamic tensor completion algorithms (velocity). Further, we
identify several tensor completion applications on real-world data-driven
problems and present some common experimental frameworks popularized in the
literature. Our goal is to summarize these popular methods and introduce them
to researchers and practitioners for promoting future research and
applications. We conclude with a discussion of key challenges and promising
research directions in this community for future exploration.
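As a minimal sketch of the general tensor completion setting, one can alternate between a low-rank truncation and re-imposing the observed entries; this "hard-impute" scheme truncates only one unfolding for brevity, whereas established algorithms such as HaLRTC combine all mode unfoldings.

```python
import numpy as np

def complete_tensor(X_obs, mask, rank=1, n_iter=200):
    """Fill missing entries by alternating a rank-r truncation of the
    mode-1 unfolding with re-imposing the observed entries."""
    X = np.where(mask, X_obs, 0.0)
    I = X.shape[0]
    for _ in range(n_iter):
        U, s, Vt = np.linalg.svd(X.reshape(I, -1), full_matrices=False)
        X = ((U[:, :rank] * s[:rank]) @ Vt[:rank]).reshape(X.shape)
        X[mask] = X_obs[mask]         # keep observed entries fixed
    return X

rng = np.random.default_rng(0)
a, b, c = rng.standard_normal(5), rng.standard_normal(6), rng.standard_normal(7)
X_true = np.einsum('i,j,k->ijk', a, b, c)   # an exactly rank-one tensor
mask = rng.random(X_true.shape) > 0.2       # observe roughly 80% of entries
X_hat = complete_tensor(X_true, mask)
err = np.linalg.norm(X_hat - X_true) / np.linalg.norm(X_true)
```

With enough observed entries relative to the rank, the missing values of the low-rank tensor are recovered to high accuracy.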
TLib: A Flexible C++ Tensor Framework for Numerical Tensor Calculus
Numerical tensor calculus comprises basic tensor operations such as the
entrywise addition and the contraction of higher-order tensors. We present
TLib, a flexible tensor framework with generic tensor functions and tensor
classes that assists users in implementing generic and flexible tensor
algorithms in C++. The
number of dimensions, the extents of the dimensions of the tensors and the
contraction modes of the tensor operations can be runtime variable. Our
framework provides tensor classes that simplify the management of
multidimensional data and utilization of tensor operations using
object-oriented and generic programming techniques. Additional stream classes
help the user to verify and compare numerical results with MATLAB. Tensor
operations are implemented with generic tensor functions and in terms of
multidimensional iterator types only, decoupling data storage representation
and computation. The user can combine tensor functions with different tensor
types and extend the framework without further modification of the classes or
functions. We discuss the design and implementation of the framework and
demonstrate its usage with examples that have been discussed in the literature.
Comment: 29 pages
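TLib itself is a C++ framework, but the core idea of the abstract above (tensor order and contraction modes as ordinary runtime values rather than compile-time constants) can be illustrated with a short NumPy analogue; the function name below is ours, not TLib's API.

```python
import numpy as np

def contract(A, B, modes_a, modes_b):
    """Contract tensors A and B over the given modes; the tensor orders
    and the contraction modes are ordinary runtime values."""
    return np.tensordot(A, B, axes=(modes_a, modes_b))

A = np.arange(24.0).reshape(2, 3, 4)
B = np.arange(12.0).reshape(4, 3)

# Contract A's modes 1 and 2 (extents 3 and 4) against B's modes 1 and 0.
C = contract(A, B, modes_a=[1, 2], modes_b=[1, 0])
print(C.shape)  # (2,)
```

Because the mode lists are plain data, the same routine handles tensors of any order decided at runtime, which is the flexibility the framework advertises.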
Discovering patterns of online popularity from time series
How is popularity gained online? Is being successful strictly related to
rapidly becoming viral in an online platform or is it possible to acquire
popularity in a steady and disciplined fashion? What are other temporal
characteristics that can unveil the popularity of online content? To answer
these questions, we leverage a multi-faceted temporal analysis of the evolution
of popular online content. Here, we present dipm-SC: a multi-dimensional
shape-based time-series clustering algorithm with a heuristic to find the
optimal number of clusters. First, we validate the accuracy of our algorithm on
synthetic datasets generated from benchmark time series models. Second, we show
that dipm-SC can uncover meaningful clusters of popularity behaviors in a
real-world Twitter dataset. By clustering the multidimensional time series of
content popularity coupled with other domain-specific dimensions, we
uncover two main patterns of popularity: bursty and steady temporal behaviors.
Moreover, we find that the way popularity is gained over time has no
significant impact on the final cumulative popularity.
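dipm-SC itself is multi-dimensional and picks the number of clusters with a heuristic; as a minimal one-dimensional sketch of shape-based time-series clustering, one can z-normalize each series (so only its shape matters, not its scale) and run plain k-means. The bursty and steady series below are synthetic stand-ins.

```python
import numpy as np

def znorm(ts):
    """Remove offset and scale so clustering compares shapes only."""
    return (ts - ts.mean(axis=-1, keepdims=True)) / (
        ts.std(axis=-1, keepdims=True) + 1e-12)

def kmeans_shapes(series, k, n_iter=50, seed=0):
    """Lloyd's k-means on z-normalized time series."""
    X = znorm(series)
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        d = ((X[:, None, :] - centers[None]) ** 2).sum(-1)
        labels = d.argmin(1)
        for j in range(k):
            if np.any(labels == j):
                centers[j] = X[labels == j].mean(0)
    return labels

t = np.linspace(0, 1, 50)
bursty = np.exp(-((t - 0.2) ** 2) / 0.002)   # sharp early popularity peak
steady = t                                    # slow, disciplined growth
series = np.stack([bursty, bursty * 3 + 1, steady, steady * 5])
labels = kmeans_shapes(series, k=2)
print(labels[0] == labels[1] and labels[2] == labels[3])  # True
```

After z-normalization the two bursty series coincide exactly, as do the two steady ones, so each pair always lands in the same cluster regardless of scale.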
A review of heterogeneous data mining for brain disorders
With rapid advances in neuroimaging techniques, the research on brain
disorder identification has become an emerging area in the data mining
community. Brain disorder data poses many unique challenges for data mining
research. For example, the raw data generated by neuroimaging experiments is in
tensor representations, with typical characteristics of high dimensionality,
structural complexity and nonlinear separability. Furthermore, brain
connectivity networks can be constructed from the tensor data, embedding subtle
interactions between brain regions. Other clinical measures are usually
available reflecting the disease status from different perspectives. It is
expected that integrating complementary information in the tensor data and the
brain network data, and incorporating other clinical parameters will be
potentially transformative for investigating disease mechanisms and for
informing therapeutic interventions. Many research efforts have been devoted to
this area. They have achieved great success in various applications, such as
tensor-based modeling, subgraph pattern mining, and multi-view feature analysis. In
this paper, we review some recent data mining methods that are used for
analyzing brain disorders.
Transform-Based Multilinear Dynamical System for Tensor Time Series Analysis
We propose a novel multilinear dynamical system (MLDS) in a transform domain,
named L-MLDS, to model tensor time series. With transformations applied to the
tensor data, the latent multidimensional correlations among the frontal slices
are captured, resulting in computational independence in the transform domain.
This allows the exact separability of the multi-dimensional problem into
multiple smaller LDS problems. To estimate the system parameters, we utilize
the expectation-maximization (EM) algorithm to determine the parameters of each
LDS. Further, L-MLDSs significantly reduce the number of model parameters and
allow parallel processing. Our general L-MLDS model is implemented based on
different transforms: the discrete Fourier transform, the discrete cosine
transform and the discrete wavelet transform. Due to the nonlinearity of these
transformations, L-MLDS is able to capture nonlinear correlations within the
data, unlike the MLDS \cite{rogers2013multilinear}, which assumes multi-way
linear correlations. Using four real datasets, the proposed L-MLDS is shown to
achieve much higher prediction accuracy than the state-of-the-art MLDS and LDS
with an equal number of parameters under different noise models. In particular,
the relative errors are substantially reduced. Simultaneously, L-MLDS achieves
an exponential improvement in training time over MLDS.
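The separability argument can be seen in a few lines of NumPy: after a transform along the third mode (here the DFT), each frontal slice can be processed by its own small model independently, and the inverse transform returns the result to the original domain. The per-slice operation below is a trivial stand-in for fitting an LDS.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.standard_normal((4, 5, 8))   # a small tensor time series

# Move to the transform domain along the third mode.
Xf = np.fft.fft(X, axis=2)

# Each frontal slice Xf[:, :, k] can now be handled independently
# (here: a trivial per-slice operation standing in for one LDS each).
Yf = np.stack([2.0 * Xf[:, :, k] for k in range(Xf.shape[2])], axis=2)

# Invert the transform to return to the original domain.
Y = np.fft.ifft(Yf, axis=2).real
print(np.allclose(Y, 2.0 * X))  # True
```

Because the transform decouples the slices, the per-slice models can also be fit in parallel, which is the source of the reported training-time improvement.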
Parallel Active Subspace Decomposition for Scalable and Efficient Tensor Robust Principal Component Analysis
Tensor robust principal component analysis (TRPCA) has received a substantial
amount of attention in various fields. Most existing methods, normally relying
on tensor nuclear norm minimization, need to pay an expensive computational
cost due to multiple singular value decompositions (SVDs) at each iteration. To
overcome the drawback, we propose a scalable and efficient method, named
Parallel Active Subspace Decomposition (PASD), which divides the unfolding
along each mode of the tensor into a columnwise orthonormal matrix (active
subspace) and another small-size matrix in parallel. Such a transformation
leads to a nonconvex optimization problem in which the scale of nuclear norm
minimization is generally much smaller than that in the original problem.
Furthermore, we introduce an alternating direction method of multipliers (ADMM)
method to solve the reformulated problem and provide rigorous analyses for its
convergence and suboptimality. Experimental results on synthetic and real-world
data show that our algorithm is more accurate than the state-of-the-art
approaches, and is orders of magnitude faster.
Comment: 19 pages, 2 figures, 2 tables
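The computational point can be checked directly: if an unfolding M factors as a columnwise-orthonormal matrix Q times a small matrix S, then M and S share their singular values, so nuclear-norm computations can run on the much smaller S. The sizes below are illustrative, not from the paper's experiments.

```python
import numpy as np

rng = np.random.default_rng(0)
m, n, r = 500, 300, 20

# A hypothetical mode unfolding M = Q S, with Q (m x r) columnwise
# orthonormal ("active subspace") and S (r x n) small, as in PASD.
Q, _ = np.linalg.qr(rng.standard_normal((m, r)))
S = rng.standard_normal((r, n))
M = Q @ S

# Since Q^T Q = I, M^T M = S^T S, so M and S have identical singular
# values and hence identical nuclear norms.
nn_M = np.linalg.svd(M, compute_uv=False).sum()
nn_S = np.linalg.svd(S, compute_uv=False).sum()
print(np.isclose(nn_M, nn_S))  # True
```

An SVD of the r x n matrix S costs far less than an SVD of the m x n matrix M, which is why replacing repeated full SVDs with work on the small factor pays off at every iteration.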
PASTA: A Parallel Sparse Tensor Algorithm Benchmark Suite
Tensor methods have gained increasing attention from various applications,
including machine learning, quantum chemistry, healthcare analytics, social
network analysis, data mining, and signal processing, to name a few. Sparse
tensors and their algorithms have become critical to further improve the
performance of these methods and enhance the interpretability of their output.
This work presents a sparse tensor algorithm benchmark suite (PASTA) for
single- and multi-core CPUs. To the best of our knowledge, this is the first
benchmark suite for the sparse tensor world. PASTA targets: 1) helping
application users evaluate different computer systems using its representative
computational workloads; and 2) providing insights to better utilize existing
computer architectures and systems, and inspiration for future designs. This
benchmark suite is publicly released at https://gitlab.com/tensorworld/pasta.
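One representative workload such a suite covers is the sparse MTTKRP kernel (matricized tensor times Khatri-Rao product) at the heart of sparse CP-ALS. A tiny COO-format sketch follows; the indices and values are made up, and a benchmark implementation would of course be an optimized compiled kernel rather than a Python loop.

```python
import numpy as np

# A sparse 3-way tensor in COO format: one row of indices per nonzero.
indices = np.array([[0, 1, 2], [1, 0, 2], [2, 2, 0], [1, 1, 1]])
values = np.array([1.0, 2.0, 3.0, 4.0])

def mttkrp(indices, values, B, C, n_rows):
    """Mode-1 MTTKRP for a COO tensor: M[i] += v * (B[j] * C[k]) for
    each stored nonzero (i, j, k) = v, skipping all zero entries."""
    rank = B.shape[1]
    out = np.zeros((n_rows, rank))
    for (i, j, k), v in zip(indices, values):
        out[i] += v * B[j] * C[k]
    return out

rng = np.random.default_rng(0)
B = rng.standard_normal((3, 2))
C = rng.standard_normal((3, 2))
M = mttkrp(indices, values, B, C, n_rows=3)
print(M.shape)  # (3, 2)
```

The kernel touches only the stored nonzeros, so its cost scales with the number of nonzeros rather than with the full tensor volume, which is exactly the behavior a sparse tensor benchmark needs to measure.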