Proximal Symmetric Non-negative Latent Factor Analysis: A Novel Approach to Highly-Accurate Representation of Undirected Weighted Networks
An Undirected Weighted Network (UWN) is commonly found in big data-related
applications. Note that the information connected with such a network's nodes
and edges can be expressed as a Symmetric, High-Dimensional and Incomplete
(SHDI) matrix. However, existing models fail to model either its intrinsic
symmetry or its low data density, resulting in low model scalability or weak
representation learning ability. To address this issue, a Proximal
Symmetric Nonnegative Latent-factor-analysis (PSNL) model is proposed. It
incorporates a proximal term into a symmetry-aware and data density-oriented
objective function for high representation accuracy. An adaptive
Alternating Direction Method of Multipliers (ADMM)-based learning scheme is
then implemented through a Tree-structured Parzen Estimator (TPE) method for
high computational efficiency. Empirical studies on four UWNs demonstrate that
PSNL achieves higher accuracy than state-of-the-art models, as well as
highly competitive computational efficiency.
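The core factorization idea behind PSNL can be sketched in a few lines: fit a shared factor matrix U so that U @ U.T approximates the observed entries of the SHDI matrix, with a proximal term pulling each update toward the previous iterate. This is a deliberately simplified illustration; the actual model uses an adaptive ADMM learning scheme tuned by TPE, and all parameter choices below are assumptions.

```python
import numpy as np

def psnl_sketch(entries, n, rank=4, lr=0.05, prox=0.1, epochs=300, seed=0):
    """Toy symmetric nonnegative latent factor fit: X ~= U @ U.T on the
    observed entries only, with a proximal term damping each update toward
    the previous iterate. A hypothetical simplification of the PSNL idea,
    not the paper's ADMM/TPE scheme."""
    rng = np.random.default_rng(seed)
    U = rng.random((n, rank)) * 0.1
    for _ in range(epochs):
        U_prev = U.copy()
        for i, j, x in entries:
            err = x - U[i] @ U[j]
            # a single shared factor matrix U keeps the factorization symmetric
            U[i] += lr * (err * U[j] - prox * (U[i] - U_prev[i]))
            U[j] += lr * (err * U[i] - prox * (U[j] - U_prev[j]))
        np.maximum(U, 0.0, out=U)  # project onto the nonnegative orthant
    return U
```

Because only observed entries drive the updates, the cost per epoch scales with the number of known edges rather than with n squared, which is the point of density-oriented modeling.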
A Dynamic Linear Bias Incorporation Scheme for Nonnegative Latent Factor Analysis
High-Dimensional and Incomplete (HDI) data is commonly encountered in big
data-related applications like social network services systems, which concern
limited interactions among numerous nodes. Knowledge acquisition
from HDI data is a vital issue in the domain of data science due to the rich
patterns embedded in it, like node behaviors, where the fundamental task is to
perform HDI data representation learning. Nonnegative Latent Factor Analysis
(NLFA) models have proven superior at addressing this issue,
where a linear bias incorporation (LBI) scheme is important in preventing
training overshooting and fluctuation, as well as keeping the model from
premature convergence. However, existing LBI schemes are all static ones
where the linear biases are fixed, which significantly restricts the
scalability of the resultant NLFA model and results in a loss of representation
learning ability on HDI data. Motivated by these observations, this paper
presents a dynamic linear bias incorporation (DLBI) scheme. It
first extends the linear bias vectors into matrices, and then builds a binary
weight matrix to switch the active/inactive states of the linear biases. Each
entry of the weight matrix switches between the binary states dynamically
according to the variation of the corresponding linear bias value, thereby
establishing dynamic linear biases for an NLFA model. Empirical studies on
three HDI datasets from real applications demonstrate that the proposed
DLBI-based NLFA model obtains higher representation accuracy than
state-of-the-art models do, as well as highly competitive computational
efficiency.
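The gating mechanism described above can be sketched as follows. The bias vectors are extended to matrices, and a binary weight matrix masks each bias entry in the prediction. The magnitude-threshold switching rule used here is an illustrative assumption, not the paper's exact criterion.

```python
import numpy as np

def bias_mask(B, threshold=1e-3):
    """Binary weight matrix gating the linear biases: a bias entry is active
    only while its magnitude exceeds a threshold. The rule and threshold are
    illustrative assumptions standing in for the DLBI switching criterion."""
    return (np.abs(B) > threshold).astype(float)

def predict(U, V, Bu, Bv, Wu, Wv, i, j):
    # latent-factor interaction plus dynamically gated linear bias matrices
    return U[i] @ V[j] + (Wu[i] * Bu[i]).sum() + (Wv[j] * Bv[j]).sum()
```

Recomputing the mask after each training epoch makes the set of active biases evolve with the bias values themselves, which is what distinguishes a dynamic scheme from a static one.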
An Online Sparse Streaming Feature Selection Algorithm
Online streaming feature selection (OSFS), which conducts feature selection
in an online manner, plays an important role in dealing with high-dimensional
data. In many real applications, such as intelligent healthcare platforms,
streaming features often contain missing data, which raises a crucial
challenge in conducting OSFS, i.e., how to establish the uncertain relationship
between sparse streaming features and labels. Unfortunately, existing OSFS
algorithms never consider such an uncertain relationship. To fill this gap, in
this paper we propose an online sparse streaming feature selection with
uncertainty (OS2FSU) algorithm. OS2FSU consists of two main parts: 1) latent
factor analysis is utilized to pre-estimate the missing data in sparse
streaming features before conducting feature selection, and 2) fuzzy logic and
neighborhood rough sets are employed to alleviate the uncertainty between
estimated streaming features and labels during feature selection. In
the experiments, OS2FSU is compared with five state-of-the-art OSFS algorithms
on six real datasets. The results demonstrate that OS2FSU outperforms its
competitors when missing data are encountered in OSFS.
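The two-stage structure of OS2FSU can be sketched as a pipeline: complete the sparse feature matrix with a latent factor model, then score the completed features against the labels. The correlation-based scoring below is a simple stand-in for the fuzzy-logic/rough-set stage, and all hyperparameters are assumptions.

```python
import numpy as np

def lfa_complete(X, mask, rank=2, lr=0.02, epochs=400, seed=0):
    """Stage 1 (sketch): pre-estimate missing entries of the feature matrix X
    via a plain latent factor model fit on observed entries only. A stand-in
    for the paper's latent factor analysis step."""
    rng = np.random.default_rng(seed)
    n, m = X.shape
    P, Q = rng.random((n, rank)) * 0.1, rng.random((m, rank)) * 0.1
    rows, cols = np.where(mask)
    for _ in range(epochs):
        for i, j in zip(rows, cols):
            err = X[i, j] - P[i] @ Q[j]
            P[i], Q[j] = P[i] + lr * err * Q[j], Q[j] + lr * err * P[i]
    filled = X.copy()
    filled[~mask] = (P @ Q.T)[~mask]  # fill only the missing positions
    return filled

def rank_features(X, y):
    """Stage 2 (sketch): score each completed feature by absolute correlation
    with the labels -- a simple proxy for the fuzzy-logic / rough-set step."""
    scores = [abs(np.corrcoef(X[:, j], y)[0, 1]) for j in range(X.shape[1])]
    return np.argsort(scores)[::-1]
```

Pre-estimating the missing values before scoring is what lets the selector see a dense matrix instead of guessing relationships from sparse columns.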
Large-scale Dynamic Network Representation via Tensor Ring Decomposition
Large-scale Dynamic Networks (LDNs) are becoming increasingly important in
the Internet age. The dynamic nature of these networks captures the
evolution of the network structure and how edge weights change over time,
posing unique challenges for data analysis and modeling. A Latent Factorization
of Tensors (LFT) model facilitates efficient representation learning for an
LDN, but existing LFT models are almost all based on Canonical Polyadic
Factorization (CPF). Therefore, this work proposes a model based on Tensor Ring
(TR) decomposition for efficient representation learning for an LDN.
Specifically, we incorporate the principle of single latent factor-dependent,
non-negative, and multiplicative update (SLF-NMU) into the TR decomposition
model, and analyze the particular bias form of TR decomposition. Experimental
studies on two real LDNs demonstrate that the proposed method achieves higher
accuracy than existing models.
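For reference, a TR decomposition represents each tensor entry as the trace of a chain product of core slices, where core G_k has shape (r_{k-1}, n_k, r_k) and the ring closes with r_0 = r_d. The evaluation below is the generic TR form, independent of the paper's SLF-NMU training rule.

```python
import numpy as np

def tr_element(cores, idx):
    """Tensor Ring evaluation: entry X[i1, ..., id] is the trace of the chain
    product of the selected lateral slices of the TR cores. Each core G_k has
    shape (r_{k-1}, n_k, r_k), with the ring closed by r_0 = r_d."""
    M = cores[0][:, idx[0], :]
    for G, i in zip(cores[1:], idx[1:]):
        M = M @ G[:, i, :]   # chain the lateral slices along the ring
    return np.trace(M)       # closing the ring = trace over the shared rank
```

Unlike CPF, which forces every mode to share one global rank, the TR ranks r_k can differ per mode, which is one reason TR models are attractive for dynamic network tensors.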
Exact Clustering of Weighted Graphs via Semidefinite Programming
As a model problem for clustering, we consider the densest k-disjoint-clique
problem of partitioning a weighted complete graph into k disjoint subgraphs
such that the sum of the densities of these subgraphs is maximized. We
establish that such subgraphs can be recovered from the solution of a
particular semidefinite relaxation with high probability if the input graph is
sampled from a distribution of clusterable graphs. Specifically, the
semidefinite relaxation is exact if the graph consists of k large disjoint
subgraphs, corresponding to clusters, with weight concentrated within these
subgraphs, plus a moderate number of outliers. Further, we establish that if
noise is weakly obscuring these clusters, i.e., the between-cluster edges are
assigned very small weights, then we can recover significantly smaller
clusters. For example, we show that in approximately sparse graphs, where the
between-cluster weights tend to zero as the size n of the graph tends to
infinity, we can recover clusters of size polylogarithmic in n. Empirical
evidence from numerical simulations is also provided to support these
theoretical phase transitions to perfect recovery of the cluster structure.
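The objective being relaxed can be stated concretely: sum, over the k disjoint subgraphs, of each subgraph's density. The normalization below (intra-cluster weight divided by cluster size) is a standard choice; the paper's exact definition may differ.

```python
import numpy as np

def partition_density(W, clusters):
    """Objective of the densest k-disjoint-clique problem (sketch): the sum of
    subgraph densities over a candidate partition of the weighted complete
    graph. Density here is total intra-cluster edge weight divided by cluster
    size -- a common convention, assumed rather than taken from the paper."""
    total = 0.0
    for c in clusters:
        c = list(c)
        sub = W[np.ix_(c, c)]          # weights restricted to this cluster
        total += sub.sum() / len(c)    # per-cluster density
    return total
```

A partition that keeps heavy edges inside clusters scores strictly higher than one that splits them, which is exactly the structure the semidefinite relaxation is shown to recover exactly under the clusterable-graph distribution.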
Centrality measures for graphons: Accounting for uncertainty in networks
As relational datasets modeled as graphs keep increasing in size and their
data-acquisition is permeated by uncertainty, graph-based analysis techniques
can become computationally and conceptually challenging. In particular, node
centrality measures rely on the assumption that the graph is perfectly known --
a premise not necessarily fulfilled for large, uncertain networks. Accordingly,
centrality measures may fail to faithfully extract the importance of nodes in
the presence of uncertainty. To mitigate these problems, we suggest a
statistical approach based on graphon theory: we introduce formal definitions
of centrality measures for graphons and establish their connections to
classical graph centrality measures. A key advantage of this approach is that
centrality measures defined at the modeling level of graphons are inherently
robust to stochastic variations of specific graph realizations. Using the
theory of linear integral operators, we define degree, eigenvector, Katz and
PageRank centrality functions for graphons and establish concentration
inequalities demonstrating that graphon centrality functions arise naturally as
limits of their counterparts defined on sequences of graphs of increasing size.
The same concentration inequalities also provide high-probability bounds
between the graphon centrality functions and the centrality measures on any
sampled graph, thereby establishing a measure of uncertainty of the measured
centrality score.
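The simplest of these graphon centrality functions, degree centrality, is the integral of the graphon kernel over the second argument: d(x) is the integral of W(x, y) over y in [0, 1]. The integral-operator definition comes from this line of work; the midpoint-rule quadrature below is just a numerical sketch.

```python
import numpy as np

def graphon_degree(W, x, grid=2000):
    """Degree centrality function of a graphon: d(x) = integral of W(x, y)
    over y in [0, 1], approximated by a midpoint rule. W is any symmetric
    kernel on [0, 1]^2 taking values in [0, 1]."""
    ys = (np.arange(grid) + 0.5) / grid   # midpoints of a uniform grid
    return np.mean(W(x, ys))              # midpoint-rule approximation
```

For the product graphon W(x, y) = x * y this gives d(x) = x / 2 exactly, and the concentration results quoted above say that empirical degrees of graphs sampled from W converge to this function as the graphs grow.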