User-Device Authentication in Mobile Banking using APHEN for Paratuck2 Tensor Decomposition
New European financial regulations such as PSD2 are changing retail banking
services. Notably, the monitoring of personal expenses is now open to
institutions other than retail banks. Nonetheless, retail banks are looking to
leverage user-device authentication in mobile banking applications to enhance
personalized financial advertising. To address the profiling of authentication
events, we rely on tensor decomposition, a higher-dimensional analogue of
matrix decomposition. Because of the imbalance between the number of users and
the number of devices, we use Paratuck2, which expresses a tensor as a product
of matrices and diagonal tensors. We highlight why Paratuck2 is more
appropriate in this case than the popular CP tensor decomposition, which
decomposes a tensor into a sum of rank-one tensors. However, the computation of
Paratuck2 is computationally intensive. We propose a new APproximate
HEssian-based Newton resolution algorithm, APHEN, capable of solving Paratuck2
more accurately and faster than other popular approaches based on alternating
least squares or gradient descent. The results of Paratuck2 are then used to
predict users' authentication events with neural networks. We apply our method
to the concrete case of targeting clients for financial advertising campaigns
based on the authentication events generated by mobile banking applications.
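To make the decomposition concrete, here is a minimal NumPy sketch of how Paratuck2 expresses each frontal slice of a users x devices x time tensor as X_k = A D_k^A H D_k^B B^T, with separate latent ranks for the imbalanced user and device modes. All sizes and factor values are illustrative, not the paper's data.

```python
import numpy as np

rng = np.random.default_rng(0)
n_users, n_devices, n_time = 50, 8, 12   # imbalanced user/device counts
p, q = 4, 3                               # separate latent ranks per mode

A = rng.standard_normal((n_users, p))     # user factor matrix
B = rng.standard_normal((n_devices, q))   # device factor matrix
H = rng.standard_normal((p, q))           # user-device interaction matrix
DA = rng.standard_normal((n_time, p))     # diagonal entries per time slice
DB = rng.standard_normal((n_time, q))

def paratuck2_reconstruct(A, DA, H, DB, B):
    """Rebuild each frontal slice X_k = A diag(DA[k]) H diag(DB[k]) B^T."""
    slices = [A @ np.diag(DA[k]) @ H @ np.diag(DB[k]) @ B.T
              for k in range(DA.shape[0])]
    return np.stack(slices, axis=-1)      # (n_users, n_devices, n_time)

X = paratuck2_reconstruct(A, DA, H, DB, B)
print(X.shape)  # (50, 8, 12)
```

Unlike CP, the two modes carry different ranks p and q, which is what makes Paratuck2 a natural fit when users far outnumber devices.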
Driving with Data: Modeling and Forecasting Vehicle Fleet Maintenance in Detroit
The City of Detroit maintains an active fleet of over 2500 vehicles, spending
an annual average of over $5 million on new vehicle purchases and over $7.7
million on maintaining this fleet. Understanding the existence of patterns and
trends in this data could be useful to a variety of stakeholders, particularly
as Detroit emerges from Chapter 9 bankruptcy, but the patterns in such data are
often complex and multivariate and the city lacks dedicated resources for
detailed analysis of this data. This work, a data collaboration between the
Michigan Data Science Team (http://midas.umich.edu/mdst) and the City of
Detroit's Operations and Infrastructure Group, seeks to address this unmet need
by analyzing data from the City of Detroit's entire vehicle fleet from
2010-2017. We utilize tensor decomposition techniques to discover and visualize
unique temporal patterns in vehicle maintenance; apply differential sequence
mining to demonstrate the existence of common and statistically unique
maintenance sequences by vehicle make and model; and, after showing these
time-dependencies in the dataset, demonstrate an application of a predictive
Long Short Term Memory (LSTM) neural network model to predict maintenance
sequences. Our analysis shows both the complexities of municipal vehicle fleet
data and useful techniques for mining and modeling such data.
Comment: Presented at the Data For Good Exchange 201
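As a toy illustration of the differential sequence mining step, one can compare the relative frequency of consecutive maintenance-code pairs across vehicle makes. The makes and codes below are hypothetical, and a real analysis would use a proper significance test rather than a simple frequency comparison.

```python
from collections import Counter

# Hypothetical maintenance-code sequences keyed by vehicle make.
sequences = {
    "MakeA": [["oil", "brakes", "oil", "tires"], ["oil", "brakes", "engine"]],
    "MakeB": [["tires", "tires", "oil"], ["tires", "brakes", "tires"]],
}

def bigram_rates(seqs):
    """Relative frequency of consecutive maintenance-code pairs."""
    counts = Counter()
    for s in seqs:
        counts.update(zip(s, s[1:]))
    total = sum(counts.values())
    return {pair: c / total for pair, c in counts.items()}

rates = {make: bigram_rates(seqs) for make, seqs in sequences.items()}

# A pair is "differential" for a make when its rate clearly exceeds the
# other make's rate (a stand-in for a statistical test).
pair = ("oil", "brakes")
print(rates["MakeA"].get(pair, 0.0) > rates["MakeB"].get(pair, 0.0))  # True
```

Here the pair ("oil", "brakes") appears in 2 of MakeA's 5 bigrams but never for MakeB, so it would be flagged as characteristic of MakeA.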
QANet: Tensor Decomposition Approach for Query-based Anomaly Detection in Heterogeneous Information Networks
Complex networks have now become integral parts of modern information
infrastructures. This paper proposes a user-centric method for detecting
anomalies in heterogeneous information networks, in which nodes and/or edges
may be of different types. In the proposed anomaly detection method, users
interact directly with the system and anomalous entities can be detected
through queries. Our approach is based on tensor decomposition and clustering
methods. We also propose a network generation model to construct synthetic
heterogeneous information networks to test the performance of the proposed
method. The proposed anomaly detection method is compared with state-of-the-art
methods in both synthetic and real-world networks. Experimental results show
that the proposed tensor-based method considerably outperforms the existing
anomaly detection methods.
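Since the method builds on tensor decomposition plus clustering, a bare-bones CP decomposition via alternating least squares in NumPy may help fix ideas; this is a generic textbook routine, not the paper's implementation, and in a QANet-style pipeline the rows of the resulting factor matrices would then be clustered to surface anomalous entities.

```python
import numpy as np

def khatri_rao(U, V):
    """Column-wise Kronecker product: (I x R), (J x R) -> (I*J x R)."""
    I, R = U.shape
    J, _ = V.shape
    return (U[:, None, :] * V[None, :, :]).reshape(I * J, R)

def unfold(X, mode):
    """Mode-n unfolding of a 3-way tensor (C-order columns)."""
    return np.moveaxis(X, mode, 0).reshape(X.shape[mode], -1)

def cp_als(X, rank, n_iter=200, seed=0):
    """CP decomposition of a 3-way tensor by alternating least squares."""
    rng = np.random.default_rng(seed)
    A, B, C = (rng.standard_normal((s, rank)) for s in X.shape)
    for _ in range(n_iter):
        A = unfold(X, 0) @ khatri_rao(B, C) @ np.linalg.pinv((B.T @ B) * (C.T @ C))
        B = unfold(X, 1) @ khatri_rao(A, C) @ np.linalg.pinv((A.T @ A) * (C.T @ C))
        C = unfold(X, 2) @ khatri_rao(A, B) @ np.linalg.pinv((A.T @ A) * (B.T @ B))
    return A, B, C

# Fit an exactly rank-3 synthetic tensor.
rng = np.random.default_rng(1)
G = [rng.standard_normal((s, 3)) for s in (6, 7, 8)]
X = np.einsum('ir,jr,kr->ijk', *G)
A, B, C = cp_als(X, rank=3)
err = np.linalg.norm(X - np.einsum('ir,jr,kr->ijk', A, B, C)) / np.linalg.norm(X)
```

Each least-squares update solves for one factor with the other two fixed; the Khatri-Rao column ordering here matches the C-order unfolding used above.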
Tensor Completion Algorithms in Big Data Analytics
Tensor completion is a problem of filling the missing or unobserved entries
of partially observed tensors. Due to the multidimensional character of tensors
in describing complex datasets, tensor completion algorithms and their
applications have received wide attention and achieved success in areas like data
mining, computer vision, signal processing, and neuroscience. In this survey,
we provide a modern overview of recent advances in tensor completion algorithms
from the perspective of big data analytics characterized by diverse variety,
large volume, and high velocity. We characterize these advances from four
perspectives: general tensor completion algorithms, tensor completion with
auxiliary information (variety), scalable tensor completion algorithms
(volume), and dynamic tensor completion algorithms (velocity). Further, we
identify several tensor completion applications on real-world data-driven
problems and present some common experimental frameworks popularized in the
literature. Our goal is to summarize these popular methods and introduce them
to researchers and practitioners for promoting future research and
applications. We conclude with a discussion of key challenges and promising
research directions in this community for future exploration.
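As a minimal sketch of the general tensor completion setting, one can alternate between a low-rank truncation and re-imposing the observed entries; this "hard-impute" scheme truncates only one unfolding for brevity, whereas established algorithms such as HaLRTC combine all mode unfoldings.

```python
import numpy as np

def complete_tensor(X_obs, mask, rank=1, n_iter=200):
    """Fill missing entries by alternating a rank-r truncation of the
    mode-1 unfolding with re-imposing the observed entries."""
    X = np.where(mask, X_obs, 0.0)
    I = X.shape[0]
    for _ in range(n_iter):
        U, s, Vt = np.linalg.svd(X.reshape(I, -1), full_matrices=False)
        X = ((U[:, :rank] * s[:rank]) @ Vt[:rank]).reshape(X.shape)
        X[mask] = X_obs[mask]         # keep observed entries fixed
    return X

rng = np.random.default_rng(0)
a, b, c = rng.standard_normal(5), rng.standard_normal(6), rng.standard_normal(7)
X_true = np.einsum('i,j,k->ijk', a, b, c)   # an exactly rank-one tensor
mask = rng.random(X_true.shape) > 0.2       # observe roughly 80% of entries
X_hat = complete_tensor(X_true, mask)
err = np.linalg.norm(X_hat - X_true) / np.linalg.norm(X_true)
```

With enough observed entries relative to the rank, the missing values of the low-rank tensor are recovered to high accuracy.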
TLib: A Flexible C++ Tensor Framework for Numerical Tensor Calculus
Numerical tensor calculus comprises basic tensor operations such as the
entrywise addition and the contraction of higher-order tensors. We present
TLib, a flexible tensor framework with generic tensor functions and tensor
classes that assists users in implementing generic and flexible tensor
algorithms in C++. The
number of dimensions, the extents of the dimensions of the tensors and the
contraction modes of the tensor operations can be runtime variable. Our
framework provides tensor classes that simplify the management of
multidimensional data and utilization of tensor operations using
object-oriented and generic programming techniques. Additional stream classes
help the user to verify and compare numerical results with MATLAB. Tensor
operations are implemented with generic tensor functions and in terms of
multidimensional iterator types only, decoupling data storage representation
and computation. The user can combine tensor functions with different tensor
types and extend the framework without further modification of the classes or
functions. We discuss the design and implementation of the framework and
demonstrate its usage with examples that have been discussed in the literature.
Comment: 29 pages
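TLib itself is a C++ framework, but the core idea of the abstract above (tensor order and contraction modes as ordinary runtime values rather than compile-time constants) can be illustrated with a short NumPy analogue; the function name below is ours, not TLib's API.

```python
import numpy as np

def contract(A, B, modes_a, modes_b):
    """Contract tensors A and B over the given modes; the tensor orders
    and the contraction modes are ordinary runtime values."""
    return np.tensordot(A, B, axes=(modes_a, modes_b))

A = np.arange(24.0).reshape(2, 3, 4)
B = np.arange(12.0).reshape(4, 3)

# Contract A's modes 1 and 2 (extents 3 and 4) against B's modes 1 and 0.
C = contract(A, B, modes_a=[1, 2], modes_b=[1, 0])
print(C.shape)  # (2,)
```

Because the mode lists are plain data, the same routine handles tensors of any order decided at runtime, which is the flexibility the framework advertises.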
Discovering patterns of online popularity from time series
How is popularity gained online? Is being successful strictly related to
rapidly becoming viral in an online platform or is it possible to acquire
popularity in a steady and disciplined fashion? What are other temporal
characteristics that can unveil the popularity of online content? To answer
these questions, we leverage a multi-faceted temporal analysis of the evolution
of popular online content. Here, we present dipm-SC: a multi-dimensional
shape-based time-series clustering algorithm with a heuristic to find the
optimal number of clusters. First, we validate the accuracy of our algorithm on
synthetic datasets generated from benchmark time series models. Second, we show
that dipm-SC can uncover meaningful clusters of popularity behaviors in a
real-world Twitter dataset. By clustering the multidimensional time series of
content popularity coupled with other domain-specific dimensions, we
uncover two main patterns of popularity: bursty and steady temporal behaviors.
Moreover, we find that the way popularity is gained over time has no
significant impact on the final cumulative popularity.
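dipm-SC itself is multi-dimensional and picks the number of clusters with a heuristic; as a minimal one-dimensional sketch of shape-based time-series clustering, one can z-normalize each series (so only its shape matters, not its scale) and run plain k-means. The bursty and steady series below are synthetic stand-ins.

```python
import numpy as np

def znorm(ts):
    """Remove offset and scale so clustering compares shapes only."""
    return (ts - ts.mean(axis=-1, keepdims=True)) / (
        ts.std(axis=-1, keepdims=True) + 1e-12)

def kmeans_shapes(series, k, n_iter=50, seed=0):
    """Lloyd's k-means on z-normalized time series."""
    X = znorm(series)
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        d = ((X[:, None, :] - centers[None]) ** 2).sum(-1)
        labels = d.argmin(1)
        for j in range(k):
            if np.any(labels == j):
                centers[j] = X[labels == j].mean(0)
    return labels

t = np.linspace(0, 1, 50)
bursty = np.exp(-((t - 0.2) ** 2) / 0.002)   # sharp early popularity peak
steady = t                                    # slow, disciplined growth
series = np.stack([bursty, bursty * 3 + 1, steady, steady * 5])
labels = kmeans_shapes(series, k=2)
print(labels[0] == labels[1] and labels[2] == labels[3])  # True
```

After z-normalization the two bursty series coincide exactly, as do the two steady ones, so each pair always lands in the same cluster regardless of scale.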
A review of heterogeneous data mining for brain disorders
With rapid advances in neuroimaging techniques, the research on brain
disorder identification has become an emerging area in the data mining
community. Brain disorder data poses many unique challenges for data mining
research. For example, the raw data generated by neuroimaging experiments is in
tensor representations, with typical characteristics of high dimensionality,
structural complexity and nonlinear separability. Furthermore, brain
connectivity networks can be constructed from the tensor data, embedding subtle
interactions between brain regions. Other clinical measures are usually
available reflecting the disease status from different perspectives. It is
expected that integrating complementary information in the tensor data and the
brain network data, and incorporating other clinical parameters will be
potentially transformative for investigating disease mechanisms and for
informing therapeutic interventions. Many research efforts have been devoted to
this area. They have achieved great success in various applications, such as
tensor-based modeling, subgraph pattern mining, and multi-view feature analysis. In
this paper, we review some recent data mining methods that are used for
analyzing brain disorders.
Transform-Based Multilinear Dynamical System for Tensor Time Series Analysis
We propose a novel multilinear dynamical system (MLDS) in a transform domain,
named L-MLDS, to model tensor time series. With transformations applied to the
tensor data, the latent multidimensional correlations among the frontal slices
are captured, resulting in computational independence in the transform domain.
This allows the exact separability of the multi-dimensional problem into
multiple smaller LDS problems. To estimate the system parameters, we utilize
the expectation-maximization (EM) algorithm to determine the parameters of each
LDS. Further, L-MLDSs significantly reduce the number of model parameters and
allow parallel processing. Our general L-MLDS model is implemented based on
different transforms: the discrete Fourier transform, the discrete cosine
transform and the discrete wavelet transform. Due to the nonlinearity of these
transformations, L-MLDS is able to capture nonlinear correlations within the
data, unlike the MLDS \cite{rogers2013multilinear}, which assumes multi-way
linear correlations. Using four real datasets, the proposed L-MLDS is shown to
achieve much higher prediction accuracy than the state-of-the-art MLDS and LDS
with an equal number of parameters under different noise models. In particular,
the relative errors are substantially reduced. Simultaneously, L-MLDS achieves
an exponential improvement in training time over MLDS.
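The separability argument can be seen in a few lines of NumPy: after a transform along the third mode (here the DFT), each frontal slice can be processed by its own small model independently, and the inverse transform returns the result to the original domain. The per-slice operation below is a trivial stand-in for fitting an LDS.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.standard_normal((4, 5, 8))   # a small tensor time series

# Move to the transform domain along the third mode.
Xf = np.fft.fft(X, axis=2)

# Each frontal slice Xf[:, :, k] can now be handled independently
# (here: a trivial per-slice operation standing in for one LDS each).
Yf = np.stack([2.0 * Xf[:, :, k] for k in range(Xf.shape[2])], axis=2)

# Invert the transform to return to the original domain.
Y = np.fft.ifft(Yf, axis=2).real
print(np.allclose(Y, 2.0 * X))  # True
```

Because the transform decouples the slices, the per-slice models can also be fit in parallel, which is the source of the reported training-time improvement.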
Parallel Active Subspace Decomposition for Scalable and Efficient Tensor Robust Principal Component Analysis
Tensor robust principal component analysis (TRPCA) has received a substantial
amount of attention in various fields. Most existing methods, normally relying
on tensor nuclear norm minimization, need to pay an expensive computational
cost due to multiple singular value decompositions (SVDs) at each iteration. To
overcome the drawback, we propose a scalable and efficient method, named
Parallel Active Subspace Decomposition (PASD), which divides the unfolding
along each mode of the tensor into a columnwise orthonormal matrix (active
subspace) and another small-size matrix in parallel. Such a transformation
leads to a nonconvex optimization problem in which the scale of nuclear norm
minimization is generally much smaller than that in the original problem.
Furthermore, we introduce an alternating direction method of multipliers (ADMM)
method to solve the reformulated problem and provide rigorous analyses for its
convergence and suboptimality. Experimental results on synthetic and real-world
data show that our algorithm is more accurate than the state-of-the-art
approaches, and is orders of magnitude faster.
Comment: 19 pages, 2 figures, 2 tables
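The computational point can be checked directly: if an unfolding M factors as a columnwise-orthonormal matrix Q times a small matrix S, then M and S share their singular values, so nuclear-norm computations can run on the much smaller S. The sizes below are illustrative, not from the paper's experiments.

```python
import numpy as np

rng = np.random.default_rng(0)
m, n, r = 500, 300, 20

# A hypothetical mode unfolding M = Q S, with Q (m x r) columnwise
# orthonormal ("active subspace") and S (r x n) small, as in PASD.
Q, _ = np.linalg.qr(rng.standard_normal((m, r)))
S = rng.standard_normal((r, n))
M = Q @ S

# Since Q^T Q = I, M^T M = S^T S, so M and S have identical singular
# values and hence identical nuclear norms.
nn_M = np.linalg.svd(M, compute_uv=False).sum()
nn_S = np.linalg.svd(S, compute_uv=False).sum()
print(np.isclose(nn_M, nn_S))  # True
```

An SVD of the r x n matrix S costs far less than an SVD of the m x n matrix M, which is why replacing repeated full SVDs with work on the small factor pays off at every iteration.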
PASTA: A Parallel Sparse Tensor Algorithm Benchmark Suite
Tensor methods have gained increasing attention from various applications,
including machine learning, quantum chemistry, healthcare analytics, social
network analysis, data mining, and signal processing, to name a few. Sparse
tensors and their algorithms have become critical to further improve the
performance of these methods and enhance the interpretability of their output.
This work presents a sparse tensor algorithm benchmark suite (PASTA) for
single- and multi-core CPUs. To the best of our knowledge, this is the first
benchmark suite for the sparse tensor world. PASTA targets: 1) helping
application users evaluate different computer systems using its representative
computational workloads; and 2) providing insights to better utilize existing
computer architectures and systems, and inspiration for future designs. This
benchmark suite is publicly released at https://gitlab.com/tensorworld/pasta.
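One representative workload such a suite covers is the sparse MTTKRP kernel (matricized tensor times Khatri-Rao product) at the heart of sparse CP-ALS. A tiny COO-format sketch follows; the indices and values are made up, and a benchmark implementation would of course be an optimized compiled kernel rather than a Python loop.

```python
import numpy as np

# A sparse 3-way tensor in COO format: one row of indices per nonzero.
indices = np.array([[0, 1, 2], [1, 0, 2], [2, 2, 0], [1, 1, 1]])
values = np.array([1.0, 2.0, 3.0, 4.0])

def mttkrp(indices, values, B, C, n_rows):
    """Mode-1 MTTKRP for a COO tensor: M[i] += v * (B[j] * C[k]) for
    each stored nonzero (i, j, k) = v, skipping all zero entries."""
    rank = B.shape[1]
    out = np.zeros((n_rows, rank))
    for (i, j, k), v in zip(indices, values):
        out[i] += v * B[j] * C[k]
    return out

rng = np.random.default_rng(0)
B = rng.standard_normal((3, 2))
C = rng.standard_normal((3, 2))
M = mttkrp(indices, values, B, C, n_rows=3)
print(M.shape)  # (3, 2)
```

The kernel touches only the stored nonzeros, so its cost scales with the number of nonzeros rather than with the full tensor volume, which is exactly the behavior a sparse tensor benchmark needs to measure.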