Kernel discriminant analysis and clustering with parsimonious Gaussian process models
This work presents a family of parsimonious Gaussian process models that
make it possible to build, from a finite sample, a model-based classifier in an
infinite-dimensional space. The proposed parsimonious models are obtained by
constraining the eigen-decomposition of the Gaussian processes modeling each
class. In particular, this allows the use of non-linear mapping functions that
project the observations into infinite-dimensional spaces. It is also
demonstrated that the classifier can be built directly from the
observation space through a kernel function. The proposed classification method
is thus able to classify data of various types, such as categorical data,
functional data, or networks. Furthermore, mixed data can be classified
by combining different kernels. The methodology is also extended to the
unsupervised classification case. Experimental results on various data sets
demonstrate the effectiveness of the proposed method.
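The remark above about classifying mixed data by combining kernels can be illustrated with a minimal sketch. The specific kernels (a Gaussian RBF kernel for numeric features and an overlap kernel for categorical features), the convex weighting, and all function names are assumptions made for illustration; the abstract does not say which kernels or combination rule the authors use.

```python
import math

def rbf_kernel(x, y, gamma=1.0):
    """Gaussian RBF kernel on numeric vectors (assumed choice)."""
    squared_distance = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * squared_distance)

def overlap_kernel(x, y):
    """Fraction of categorical attributes on which x and y agree (assumed choice)."""
    return sum(a == b for a, b in zip(x, y)) / len(x)

def mixed_kernel(u, v, weight=0.5, gamma=1.0):
    """Convex combination of the two kernels above.

    Each observation is a pair (numeric_part, categorical_part).
    A non-negative weighted sum of valid kernels is itself a valid kernel.
    """
    (u_num, u_cat), (v_num, v_cat) = u, v
    return (weight * rbf_kernel(u_num, v_num, gamma)
            + (1.0 - weight) * overlap_kernel(u_cat, v_cat))

# Hypothetical observations: two numeric features, two categorical features.
a = ((0.0, 0.0), ("red", "small"))
b = ((0.0, 0.0), ("red", "medium"))
similarity = mixed_kernel(a, b)
```

Any kernel-based classifier that only needs pairwise kernel values could consume `mixed_kernel` without ever forming an explicit feature map.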
The discriminative functional mixture model for a comparative analysis of bike sharing systems
Bike sharing systems (BSSs) have become a means of sustainable intermodal
transport and are now proposed in many cities worldwide. Most BSSs also provide
open access to their data, particularly to real-time status reports on their
bike stations. The analysis of the mass of data generated by such systems is of
particular interest to BSS providers to update system structures and policies.
This work was motivated by interest in analyzing and comparing several European
BSSs to identify common operating patterns in BSSs and to propose practical
solutions to avoid potential issues. Our approach relies on the identification
of common patterns between and within systems. To this end, a model-based
clustering method, called FunFEM, for time series (or more generally functional
data) is developed. It is based on a functional mixture model that allows the
clustering of the data in a discriminative functional subspace. In this
context, the model has the advantage of being parsimonious and of allowing the
visualization of the clustered systems. Numerical experiments confirm the good
behavior of FunFEM, particularly compared to state-of-the-art methods. The
application of FunFEM to BSS data from JCDecaux and the Transport for London
Initiative allows us to identify 10 general patterns, including pathological
ones, and to propose practical improvement strategies based on the system
comparison. The visualization of the clustered data within the discriminative
subspace turns out to be particularly informative regarding the system
efficiency. The proposed methodology is implemented in a package for the R
software, named funFEM, which is available on CRAN. The package also
provides a subset of the data analyzed in this work.
Comment: Published at http://dx.doi.org/10.1214/15-AOAS861 in the Annals of
Applied Statistics (http://www.imstat.org/aoas/) by the Institute of
Mathematical Statistics (http://www.imstat.org)
Dissimilarity for functional data clustering based on smoothing parameter commutation
Many studies measure the same type of information longitudinally on the same subject at multiple time points, and clustering of such functional data has many important applications. We propose a novel and easy-to-implement dissimilarity measure for functional data clustering based on smoothing splines and smoothing parameter commutation. This method handles data observed at regular or irregular time points in the same way. We measure the dissimilarity between subjects based on varying curve estimates with pairwise commutation of smoothing parameters. The intuition is that the smoothing parameters of smoothing splines reflect the inverse of the signal-to-noise ratios and that, when an identical smoothing parameter is applied, the smoothed curves for two similar subjects are expected to be close. Our method takes into account the estimation uncertainty using smoothing parameter commutation and is not strongly affected by outliers. It can also be used for outlier detection. The effectiveness of our proposal is shown by simulations comparing it to other dissimilarity measures and by a real application to methadone dosage maintenance levels.
CUDA-bigPSF: An optimized version of bigPSF accelerated with Graphics Processing Unit
Accurate and fast short-term load forecasting is crucial for efficiently managing energy production and distribution. As such, many different algorithms have been proposed to address this topic, including hybrid models that combine clustering with other forecasting techniques. One of these algorithms is bigPSF, which combines K-means clustering and a similarity search and is optimized for use in distributed environments. The work presented in this paper aims to reduce the time required to execute the algorithm with two main contributions. First, some of the issues of the original proposal that limited the number of cores used simultaneously are studied and highlighted. Second, a version of the algorithm optimized for Graphics Processing Units (GPUs) is proposed, solving the previously mentioned issues while taking into account the GPU architecture and memory structure. Experiments were carried out with seven years of real-world electric demand data from Uruguay. Results show that the proposed algorithm executed consistently faster than the original version, achieving speedups of up to 500 times during the training phase.
Funding for open access charge: Universidad de Granada / CBUA. Grant PID2020-112495RB-C21 funded by MCIN/AEI/10.13039/501100011033. I+D+i FEDER 2020 project B-TIC-42-UGR2
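The combination of cluster labels and a similarity search mentioned in the CUDA-bigPSF abstract can be sketched as follows. This is a minimal, sequential illustration of the general pattern-sequence idea, not the bigPSF or CUDA-bigPSF implementation: it assumes the days have already been labelled by a clustering step such as K-means, and the function name `psf_forecast`, its parameters, and the toy data are hypothetical.

```python
def psf_forecast(days, labels, window):
    """Forecast the next day from past occurrences of the latest label pattern.

    days   : list of daily load profiles (each a list of floats)
    labels : cluster label of each day, assumed to come from a prior
             clustering step (e.g. K-means), in chronological order
    window : number of most recent labels that form the search pattern
    Returns the pointwise mean of the days that followed each earlier
    occurrence of the pattern, or None if the pattern never occurred before.
    """
    if len(days) != len(labels):
        raise ValueError("days and labels must have the same length")
    if not 0 < window < len(labels):
        raise ValueError("window must be between 1 and len(labels) - 1")

    pattern = labels[-window:]
    followers = []
    # Stop early enough that every match has a following day in the history.
    for start in range(len(labels) - window):
        if labels[start:start + window] == pattern:
            followers.append(days[start + window])

    if not followers:
        return None
    points_per_day = len(followers[0])
    return [sum(day[h] for day in followers) / len(followers)
            for h in range(points_per_day)]

# Toy history: five days with two readings each, already labelled.
history = [[1.0, 2.0], [3.0, 4.0], [1.0, 2.0], [5.0, 6.0], [1.0, 2.0]]
history_labels = [0, 1, 0, 2, 0]
forecast = psf_forecast(history, history_labels, window=1)
```

In this toy history the pattern `[0]` occurred twice before, followed by `[3.0, 4.0]` and `[5.0, 6.0]`, so the forecast is their mean. A common refinement, not shown here, is to retry with a shorter window when no match is found.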
Functional Factorial K-means Analysis
A new procedure for simultaneously finding the optimal cluster structure of
multivariate functional objects and finding the subspace to represent the
cluster structure is presented. The method is based on the k-means criterion
for projected functional objects on a subspace in which a cluster structure
exists. An efficient alternating least-squares algorithm is described, and the
proposed method is extended to a regularized method for smoothness of weight
functions. To deal with the negative effect of the correlation of the
coefficient matrix of the basis function expansion in the proposed algorithm, a
two-step approach to the proposed method is also described. Analyses of
artificial and real data demonstrate that the proposed method gives correct and
interpretable results compared with existing methods, namely the functional
principal component k-means (FPCK) method and the tandem clustering approach.
It is also shown that the proposed method can be considered complementary to
FPCK.
Comment: 39 pages, 17 figures