N2VSCDNNR: A Local Recommender System Based on Node2vec and Rich Information Network
Recommender systems are becoming more and more important in our daily lives.
However, traditional recommendation methods are challenged by data sparsity and
efficiency, as the numbers of users, items, and interactions between the two in
many real-world applications increase fast. In this work, we propose a novel
clustering recommender system based on node2vec technology and rich information
network, namely N2VSCDNNR, to solve these challenges. In particular, we use a
bipartite network to construct the user-item network, and represent the
interactions among users (or items) by the corresponding one-mode projection
network. In order to alleviate the data sparsity problem, we enrich the network
structure according to user and item categories, and construct the one-mode
projection category network. Then, considering the data sparsity problem in the
network, we employ node2vec to capture the complex latent relationships among
users (or items) from the corresponding one-mode projection category network.
Moreover, considering the dependency on parameter settings and information loss
problem in clustering methods, we use a novel spectral clustering method, which
is based on dynamic nearest-neighbors (DNN) and a novel automatically
determining cluster number (ADCN) method that determines the cluster centers
based on the normal distribution method, to cluster the users and items
separately. After clustering, we propose a two-phase personalized
recommendation scheme that recommends items to each user. A series of
experiments validates the outstanding performance of N2VSCDNNR over several
advanced embedding and side-information-based recommendation algorithms.
Meanwhile, N2VSCDNNR seems to have lower time complexity than the baseline
methods in online recommendations, indicating its potential to be widely
applied in large-scale systems.
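The one-mode projection mentioned in the abstract above can be illustrated with a minimal sketch. It assumes interactions arrive as (user, item) pairs and that the projected edge weight is the number of shared items; both the data and the weighting are illustrative assumptions, not details taken from the paper.

```python
from collections import defaultdict
from itertools import combinations


def one_mode_projection(interactions):
    """Project a user-item bipartite edge list onto the user side.

    Two users are linked when they share at least one item. The edge weight
    counts the shared items (an assumed weighting, chosen for illustration).
    """
    users_by_item = defaultdict(set)
    for user, item in interactions:
        users_by_item[item].add(user)

    weights = defaultdict(int)
    for users in users_by_item.values():
        # Every pair of users who touched the same item gains one unit of weight.
        for u, v in combinations(sorted(users), 2):
            weights[(u, v)] += 1
    return dict(weights)


# Hypothetical toy data.
interactions = [
    ("alice", "book"), ("bob", "book"),
    ("bob", "film"), ("carol", "film"), ("alice", "film"),
]
projection = one_mode_projection(interactions)
# projection == {("alice", "bob"): 2, ("alice", "carol"): 1, ("bob", "carol"): 1}
```

Swapping the two positions of each pair gives the item-side projection with the same function.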
Structural and Functional Discovery in Dynamic Networks with Non-negative Matrix Factorization
Time series of graphs are increasingly prevalent in modern data and pose
unique challenges to visual exploration and pattern extraction. This paper
describes the development and application of matrix factorizations for
exploration and time-varying community detection in time-evolving graph
sequences. The matrix factorization model allows the user to home in on and
display interesting, underlying structure and its evolution over time. The
methods are scalable to weighted networks with a large number of time points or
nodes, and can accommodate sudden changes to graph topology. Our techniques are
demonstrated with several dynamic graph series from both synthetic and real
world data, including citation and trade networks. These examples illustrate
how users can steer the techniques and combine them with existing methods to
discover and display meaningful patterns in sizable graphs over many time
points.
Comment: 16 pages, 17 figures
Network-based Distance Metric with Application to Discover Disease Subtypes in Cancer
While we once thought of cancer as a single monolithic disease affecting a
specific organ site, we now understand that there are many subtypes of cancer
defined by unique patterns of gene mutations. These gene mutational data, which
can be more reliably obtained than gene expression data, help to determine how
the subtypes develop, evolve, and respond to therapies. Unlike the dense,
continuous-valued gene expression data that most existing cancer subtype
discovery algorithms use, somatic mutational data are extremely sparse and
heterogeneous: fewer than 0.5% of the 20,000 human protein-coding genes are
mutated (recorded as discrete 1/0 values), and identical mutated genes are
rarely shared by cancer patients.
Our focus is to search for cancer subtypes from extremely sparse and high
dimensional gene mutational data in discrete 1 and 0 values using unsupervised
learning. We propose a new network-based distance metric. We project cancer
patients' mutational profile into their gene network structure and measure the
distance between two patients using the similarity between genes and between
the gene vertices of the patients in the network. Experimental results on
synthetic data and real-world data show that our approach outperforms the top
competitors in cancer subtype discovery. Furthermore, our approach can identify
cancer subtypes that cannot be detected by other clustering algorithms in real
cancer data.
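To see why sparsity is a problem for ordinary distances, consider a small sketch in which each patient is a set of mutated genes and distance is plain Jaccard distance. This is not the network-based metric proposed in the paper, and the patient profiles are hypothetical. When patients rarely share a mutated gene, every pairwise distance saturates at 1 and carries no information for clustering.

```python
def jaccard_distance(a, b):
    """Jaccard distance between two sets of mutated genes."""
    union = a | b
    if not union:
        return 0.0
    return 1.0 - len(a & b) / len(union)


# Hypothetical patient profiles: each set lists the genes with value 1.
patients = {
    "p1": {"TP53", "KRAS"},
    "p2": {"BRCA1", "PTEN"},
    "p3": {"EGFR"},
}

names = sorted(patients)
distances = {
    (u, v): jaccard_distance(patients[u], patients[v])
    for i, u in enumerate(names)
    for v in names[i + 1:]
}
# Every pair is at distance 1.0 because no two patients share a mutated gene.
```

A metric that also uses gene-to-gene similarity in a network can separate such patients, which is the motivation stated in the abstract.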
Simultaneous Dimension Reduction and Clustering via the NMF-EM Algorithm
Mixture models are among the most popular tools for clustering. However, when
the dimension and the number of clusters are large, the estimation of the
clusters becomes challenging, as does their interpretation. Restrictions on
the parameters can be used to reduce the dimension. An example is given by the
mixture of factor analyzers (MFA) for Gaussian mixtures. The extension of MFA
to non-Gaussian mixtures is not straightforward. We propose a new constraint
for the parameters in non-Gaussian mixture models: the component parameters
are combinations of elements from a small dictionary. Including a nonnegative
matrix factorization (NMF) in the EM algorithm allows us to simultaneously
estimate the dictionary and the parameters of the mixture. We propose the
acronym NMF-EM for this algorithm, implemented in the R package nmfem. This
original approach is motivated by passenger clustering from ticketing data: we
apply NMF-EM to data from two Transdev public transport networks. In this
case, the words are easily interpreted as typical slots in a timetable.
Analysis of multiview legislative networks with structured matrix factorization: Does Twitter influence translate to the real world?
The rise of social media platforms has fundamentally altered the public
discourse by providing easy-to-use and ubiquitous forums for the exchange of
ideas and opinions. Elected officials often use such platforms for
communication with the broader public to disseminate information and engage
with their constituencies and other public officials. In this work, we
investigate whether Twitter conversations between legislators reveal their
real-world position and influence by analyzing multiple Twitter networks that
feature different types of link relations between the Members of Parliament
(MPs) in the United Kingdom and an identical data set for politicians within
Ireland. We develop and apply a matrix factorization technique that allows the
analyst to emphasize nodes with contextual local network structures by
specifying network statistics that guide the factorization solution. Leveraging
only link relation data, we find that important politicians in Twitter networks
are associated with real-world leadership positions, and that rankings from the
proposed method are correlated with the number of future media headlines.
Comment: Published at http://dx.doi.org/10.1214/15-AOAS858 in the Annals of
Applied Statistics (http://www.imstat.org/aoas/) by the Institute of
Mathematical Statistics (http://www.imstat.org)
Multi-Level Network Embedding with Boosted Low-Rank Matrix Approximation
As opposed to manual feature engineering, which is tedious and difficult to
scale, network representation learning has attracted a surge of research
interest as it automates the process of feature learning on graphs. The
learned low-dimensional node vector representation is generalizable and eases
the knowledge discovery process on graphs by enabling various off-the-shelf
machine learning tools to be directly applied. Recent research has shown that
the past decade of network embedding approaches either explicitly factorize a
carefully designed matrix to obtain the low-dimensional node vector
representation or are closely related to implicit matrix factorization, with
the fundamental assumption that the factorized node connectivity matrix is
low-rank. Nonetheless, the global low-rank assumption does not necessarily hold
especially when the factorized matrix encodes complex node interactions, and
the resultant single low-rank embedding matrix is insufficient to capture all
the observed connectivity patterns. In this regard, we propose a novel
multi-level network embedding framework BoostNE, which can learn multiple
network embedding representations of different granularity from coarse to fine
without imposing the prevalent global low-rank assumption. The proposed BoostNE
method is also in line with the successful gradient boosting method in ensemble
learning as multiple weak embeddings lead to a stronger and more effective one.
We assess the effectiveness of the proposed BoostNE framework by comparing it
with existing state-of-the-art network embedding methods on various datasets,
and the experimental results corroborate the superiority of the proposed
BoostNE network embedding framework.
Clustered Multitask Nonnegative Matrix Factorization for Spectral Unmixing of Hyperspectral Data
In this paper, a new algorithm based on a clustered multitask network is
proposed to solve the spectral unmixing problem in hyperspectral imagery. Each
pixel in the hyperspectral image is considered a node in this network. The
nodes in the network are clustered using the fuzzy c-means clustering method.
A diffusion least mean square strategy is used to optimize the proposed cost
function. To evaluate the proposed method, experiments are conducted on
synthetic and real datasets. Simulation results based on spectral angle
distance, abundance angle distance and reconstruction error metrics illustrate
the advantage of the proposed algorithm compared with other methods.
Comment: one column, 22 pages, 12 figures, journal. arXiv admin note:
substantial text overlap with arXiv:1902.07593, arXiv:1812.1078
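Spectral angle distance, one of the evaluation metrics named in the abstract above, is commonly defined as the angle between two spectra treated as vectors. The sketch below uses that common definition; the function name and the sample spectra are illustrative assumptions, and the paper may apply additional averaging across endmembers.

```python
import math


def spectral_angle_distance(a, b):
    """Angle in radians between two spectra of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    # Clamp to [-1, 1] so rounding error cannot push acos out of its domain.
    cosine = max(-1.0, min(1.0, dot / norm))
    return math.acos(cosine)


# A spectrum and a scaled copy point in the same direction: angle close to 0.
same_shape = spectral_angle_distance([1.0, 2.0, 3.0], [2.0, 4.0, 6.0])
# Orthogonal spectra are a quarter turn apart: angle pi / 2.
orthogonal = spectral_angle_distance([1.0, 0.0], [0.0, 1.0])
```

Because the metric depends only on direction, it is insensitive to overall brightness scaling of a spectrum.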
Time-Series Analysis via Low-Rank Matrix Factorization Applied to Infant-Sleep Data
We propose a nonparametric model for time series with missing data based on
low-rank matrix factorization. The model expresses each instance in a set of
time series as a linear combination of a small number of shared basis
functions. Constraining the functions and the corresponding coefficients to be
nonnegative yields an interpretable low-dimensional representation of the data.
A time-smoothing regularization term ensures that the model captures meaningful
trends in the data, instead of overfitting short-term fluctuations. The
low-dimensional representation makes it possible to detect outliers and cluster
the time series according to the interpretable features extracted by the model,
and also to perform forecasting via kernel regression. We apply our methodology
to a large real-world dataset of infant-sleep data gathered by caregivers with
a mobile-phone app. Our analysis automatically extracts daily-sleep patterns
consistent with the existing literature. This allows us to compute
sleep-development trends for the cohort, which characterize the emergence of
circadian sleep and different napping habits. We apply our methodology to
detect anomalous individuals, to cluster the cohort into groups with different
sleeping tendencies, and to obtain improved predictions of future sleep
behavior.
Comment: Machine Learning for Health (ML4H) at NeurIPS 2019 - Extended
Abstract
Nonnegative Multi-level Network Factorization for Latent Factor Analysis
Nonnegative Matrix Factorization (NMF) aims to factorize a matrix into two
optimized nonnegative matrices and has been widely used for unsupervised
learning tasks such as product recommendation based on a rating matrix.
However, although networks between nodes with the same nature exist, standard
NMF overlooks them, e.g., the social network between users. This problem leads
to comparatively low recommendation accuracy because these networks are also
reflections of the nature of the nodes, such as the preferences of users in a
social network. Also, social networks, as complex networks, have many different
structures. Each structure is a composition of links between nodes and reflects
the nature of nodes, so retaining the different network structures will lead to
differences in recommendation performance. To investigate the impact of these
network structures on the factorization, this paper proposes four multi-level
network factorization algorithms based on the standard NMF, which integrate
the vertical network (e.g., rating matrix) with the structures of the
horizontal network (e.g., user social network). These algorithms are carefully
designed
with corresponding convergence proofs to retain four desired network
structures. Experiments on synthetic data show that the proposed algorithms are
able to preserve the desired network structures as designed. Experiments on
real-world data show that considering the horizontal networks improves the
accuracy of document clustering and recommendation with standard NMF, and
various structures show their differences in performance on these two tasks.
These results can be directly used in document clustering and recommendation
systems.
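For reference, the standard NMF that the abstract above builds on can be sketched with the classical multiplicative update rules for the squared Frobenius error. This is plain NMF only, not any of the four multi-level algorithms from the paper; the matrix, rank, and iteration count are illustrative assumptions.

```python
import random


def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def transpose(m):
    return [list(col) for col in zip(*m)]


def frobenius_error(v, w, h):
    """Squared Frobenius norm of V - WH."""
    wh = matmul(w, h)
    return sum((v[i][j] - wh[i][j]) ** 2
               for i in range(len(v)) for j in range(len(v[0])))


def nmf(v, rank, iterations=200, seed=0, eps=1e-9):
    """Factorize nonnegative V (n x m) into W (n x rank) and H (rank x m)."""
    rng = random.Random(seed)
    n, m = len(v), len(v[0])
    w = [[rng.random() + 0.1 for _ in range(rank)] for _ in range(n)]
    h = [[rng.random() + 0.1 for _ in range(m)] for _ in range(rank)]
    for _ in range(iterations):
        # H <- H * (W^T V) / (W^T W H), element-wise.
        wt = transpose(w)
        num, den = matmul(wt, v), matmul(matmul(wt, w), h)
        h = [[h[i][j] * num[i][j] / (den[i][j] + eps) for j in range(m)]
             for i in range(rank)]
        # W <- W * (V H^T) / (W H H^T), element-wise.
        ht = transpose(h)
        num, den = matmul(v, ht), matmul(w, matmul(h, ht))
        w = [[w[i][j] * num[i][j] / (den[i][j] + eps) for j in range(rank)]
             for i in range(n)]
    return w, h


# Hypothetical rating-like matrix with two blocks of co-rated items.
ratings = [[5, 4, 0, 0], [4, 5, 0, 0], [0, 0, 3, 4], [0, 0, 4, 3]]
w0, h0 = nmf(ratings, rank=2, iterations=0)  # random starting point
w, h = nmf(ratings, rank=2)                  # after 200 updates
```

Both factors stay nonnegative because each update multiplies a positive entry by a nonnegative ratio, which is what makes the factors interpretable as additive parts.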
Joint community and anomaly tracking in dynamic networks
Most real-world networks exhibit community structure, a phenomenon
characterized by the existence of node clusters whose intra-cluster edge
connectivity is stronger than the edge connectivity between nodes belonging to
different clusters. In addition to facilitating a better understanding of network
behavior, community detection finds many practical applications in diverse
settings. Communities in online social networks are indicative of shared
functional roles, or affiliation to a common socio-economic status, the
knowledge of which is vital for targeted advertisement. In buyer-seller
networks, community detection facilitates better product recommendations.
Unfortunately, the reliability of community assignments is hindered by anomalous
user behavior often observed as unfair self-promotion, or "fake"
highly-connected accounts created to promote fraud. The present paper advocates
a novel approach for jointly tracking communities while detecting such
anomalous nodes in time-varying networks. By postulating edge creation as the
result of mutual community participation by node pairs, a dynamic factor model
with anomalous memberships captured through a sparse outlier matrix is put
forth. Efficient tracking algorithms suitable for both online and decentralized
operation are developed. Experiments conducted on both synthetic and real
network time series successfully unveil underlying communities and anomalous
nodes.
Comment: 13 pages
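The notion of community structure used in the abstract above, denser connectivity inside clusters than between them, can be checked on a toy graph. The sketch below compares intra-community and inter-community edge densities for a given assignment; the graph and the labels are hypothetical, and this is only a diagnostic, not the tracking algorithm from the paper.

```python
from itertools import combinations


def edge_densities(edges, community):
    """Return (intra, inter) edge densities of an undirected graph.

    `community` maps every node to its community label. Density is the
    fraction of possible node pairs of that kind that are actually linked.
    """
    edge_set = {frozenset(edge) for edge in edges}
    counts = {"intra": [0, 0], "inter": [0, 0]}  # [edges present, possible pairs]
    for u, v in combinations(sorted(community), 2):
        kind = "intra" if community[u] == community[v] else "inter"
        counts[kind][1] += 1
        if frozenset((u, v)) in edge_set:
            counts[kind][0] += 1

    def density(present, possible):
        return present / possible if possible else 0.0

    return density(*counts["intra"]), density(*counts["inter"])


# Two triangles joined by a single bridge edge (2, 3).
edges = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]
community = {0: "A", 1: "A", 2: "A", 3: "B", 4: "B", 5: "B"}
intra, inter = edge_densities(edges, community)
# intra is 1.0 (all 6 within-community pairs linked); inter is 1/9.
```

A node whose links ignore this pattern, for example one connected to most nodes in every community, is the kind of anomaly the abstract describes.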