12,584 research outputs found
Incremental eigenpair computation for graph Laplacian matrices: theory and applications
The smallest eigenvalues and the associated eigenvectors (i.e., eigenpairs) of a graph Laplacian matrix have been widely used for spectral clustering and community detection. However, in real-life applications, the number of clusters or communities (say, K) is generally unknown a priori. Consequently, the majority of the existing methods either choose K heuristically or they repeat the clustering method with different choices of K and accept the best clustering result. The first option, more often, yields suboptimal result, while the second option is computationally expensive. In this work, we propose an incremental method for constructing the eigenspectrum of the graph Laplacian matrix. This method leverages the eigenstructure of graph Laplacian matrix to obtain the Kth smallest eigenpair of the Laplacian matrix given a collection of all previously compute
Data mining: a tool for detecting cyclical disturbances in supply networks.
Disturbances in supply chains may be either exogenous or endogenous. The ability automatically to detect, diagnose, and distinguish between the causes of disturbances is of prime importance to decision makers in order to avoid uncertainty. The spectral principal component analysis (SPCA) technique has been utilized to distinguish between real and rogue disturbances in a steel supply network. The data set used was collected from four different business units in the network and consists of 43 variables; each is described by 72 data points. The present paper will utilize the same data set to test an alternative approach to SPCA in detecting the disturbances. The new approach employs statistical data pre-processing, clustering, and classification learning techniques to analyse the supply network data. In particular, the incremental k-means
clustering and the RULES-6 classification rule-learning algorithms, developed by the present authors’ team, have been applied to identify important patterns in the data set. Results show that the proposed approach has the capability automatically to detect and characterize network-wide cyclical disturbances and generate hypotheses about their root cause
Graph Summarization
The continuous and rapid growth of highly interconnected datasets, which are
both voluminous and complex, calls for the development of adequate processing
and analytical techniques. One method for condensing and simplifying such
datasets is graph summarization. It denotes a series of application-specific
algorithms designed to transform graphs into more compact representations while
preserving structural patterns, query answers, or specific property
distributions. As this problem is common to several areas studying graph
topologies, different approaches, such as clustering, compression, sampling, or
influence detection, have been proposed, primarily based on statistical and
optimization methods. The focus of our chapter is to pinpoint the main graph
summarization methods, but especially to focus on the most recent approaches
and novel research trends on this topic, not yet covered by previous surveys.Comment: To appear in the Encyclopedia of Big Data Technologie
Online Unsupervised Multi-view Feature Selection
In the era of big data, it is becoming common to have data with multiple
modalities or coming from multiple sources, known as "multi-view data".
Multi-view data are usually unlabeled and come from high-dimensional spaces
(such as language vocabularies), unsupervised multi-view feature selection is
crucial to many applications. However, it is nontrivial due to the following
challenges. First, there are too many instances or the feature dimensionality
is too large. Thus, the data may not fit in memory. How to select useful
features with limited memory space? Second, how to select features from
streaming data and handles the concept drift? Third, how to leverage the
consistent and complementary information from different views to improve the
feature selection in the situation when the data are too big or come in as
streams? To the best of our knowledge, none of the previous works can solve all
the challenges simultaneously. In this paper, we propose an Online unsupervised
Multi-View Feature Selection, OMVFS, which deals with large-scale/streaming
multi-view data in an online fashion. OMVFS embeds unsupervised feature
selection into a clustering algorithm via NMF with sparse learning. It further
incorporates the graph regularization to preserve the local structure
information and help select discriminative features. Instead of storing all the
historical data, OMVFS processes the multi-view data chunk by chunk and
aggregates all the necessary information into several small matrices. By using
the buffering technique, the proposed OMVFS can reduce the computational and
storage cost while taking advantage of the structure information. Furthermore,
OMVFS can capture the concept drifts in the data streams. Extensive experiments
on four real-world datasets show the effectiveness and efficiency of the
proposed OMVFS method. More importantly, OMVFS is about 100 times faster than
the off-line methods
- …