
2,285 research outputs found

N2VSCDNNR: A Local Recommender System Based on Node2vec and Rich Information Network

Author: Chen Jinyin
Fan Lu
Lin Xiang
Wu Yangyang
Xuan Qi
Yu Shanqing
Zheng Haibin
Publication date: 12/04/2019

Recommender systems are becoming increasingly important in our daily lives. However, traditional recommendation methods are challenged by data sparsity and efficiency, as the numbers of users, items, and user-item interactions in many real-world applications grow rapidly. In this work, we propose a novel clustering recommender system based on node2vec and a rich information network, named N2VSCDNNR, to address these challenges. In particular, we use a bipartite network to construct the user-item network and represent the interactions among users (or items) by the corresponding one-mode projection network. To alleviate the data sparsity problem, we enrich the network structure according to user and item categories and construct the one-mode projection category network. Then, still considering data sparsity in the network, we employ node2vec to capture the complex latent relationships among users (or items) from the corresponding one-mode projection category network. Moreover, to address the dependency on parameter settings and the information loss problem in clustering methods, we cluster users and items separately with a novel spectral clustering method based on dynamic nearest neighbors (DNN), combined with a novel method for automatically determining the cluster number (ADCN) that determines the cluster centers using a normal-distribution-based method. After clustering, we propose a two-phase personalized recommendation scheme to recommend items to each user. A series of experiments validates the outstanding performance of N2VSCDNNR over several advanced embedding-based and side-information-based recommendation algorithms. Meanwhile, N2VSCDNNR appears to have lower time complexity than the baseline methods in online recommendation, indicating its potential for wide application in large-scale systems.

arXiv.org e-Print Archive
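The one-mode projection step mentioned in the N2VSCDNNR abstract can be illustrated with a minimal sketch. It assumes the simplest weighting (two users are linked with weight equal to the number of items they share), which may differ from the paper's own projection and category enrichment; the function name `project_users` and the toy data are hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def project_users(interactions):
    """Project a bipartite user-item network onto its user side (hypothetical helper).

    interactions: iterable of (user, item) edges.
    Returns a dict mapping an unordered user pair (u, v), u < v, to the
    number of items both users interacted with (assumed weighting).
    """
    # Group users by item; a set ignores duplicate (user, item) edges.
    users_by_item = defaultdict(set)
    for user, item in interactions:
        users_by_item[item].add(user)

    # Every pair of users sharing an item gets an edge; the weight counts shared items.
    weights = defaultdict(int)
    for users in users_by_item.values():
        for u, v in combinations(sorted(users), 2):
            weights[(u, v)] += 1
    return dict(weights)


edges = [("alice", "book"), ("bob", "book"), ("alice", "film"),
         ("bob", "film"), ("carol", "film")]
print(project_users(edges))
```

Swapping the roles of users and items in the input gives the item-side projection in the same way.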

Structural and Functional Discovery in Dynamic Networks with Non-negative Matrix Factorization

Author: Mankad Shawn
Michailidis George
Publication venue: 'American Physical Society (APS)'
Publication date: 30/05/2013

Time series of graphs are increasingly prevalent in modern data and pose unique challenges to visual exploration and pattern extraction. This paper describes the development and application of matrix factorizations for exploration and time-varying community detection in time-evolving graph sequences. The matrix factorization model allows the user to home in on and display interesting underlying structure and its evolution over time. The methods scale to weighted networks with a large number of time points or nodes and can accommodate sudden changes to graph topology. Our techniques are demonstrated with several dynamic graph series from both synthetic and real-world data, including citation and trade networks. These examples illustrate how users can steer the techniques and combine them with existing methods to discover and display meaningful patterns in sizable graphs over many time points. Comment: 16 pages, 17 figures

arXiv.org e-Print Archive
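As a rough illustration of the kind of factorization this entry builds on, the sketch below applies basic nonnegative matrix factorization (X ≈ WH, Lee-Seung multiplicative updates for squared error) to a single adjacency snapshot and reads each node's dominant factor as a soft community label. This is the textbook NMF, not the authors' model; the toy graph and all function names are hypothetical, and factorizing each snapshot in turn is only one simple, assumed way to follow communities over time.

```python
import random


def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def transpose(a):
    """Transpose a matrix stored as a list of rows."""
    return [list(row) for row in zip(*a)]


def frobenius_error(x, w, h):
    """Squared Frobenius norm of X - WH."""
    wh = matmul(w, h)
    return sum((x[i][j] - wh[i][j]) ** 2
               for i in range(len(x)) for j in range(len(x[0])))


def nmf(x, rank, iters=200, seed=0, eps=1e-9):
    """Basic NMF (X ~ WH) with Lee-Seung multiplicative updates."""
    rng = random.Random(seed)
    n, m = len(x), len(x[0])
    w = [[rng.random() for _ in range(rank)] for _ in range(n)]
    h = [[rng.random() for _ in range(m)] for _ in range(rank)]
    for _ in range(iters):
        # H <- H * (W^T X) / (W^T W H), elementwise
        wt = transpose(w)
        num, den = matmul(wt, x), matmul(matmul(wt, w), h)
        h = [[h[r][j] * num[r][j] / (den[r][j] + eps) for j in range(m)]
             for r in range(rank)]
        # W <- W * (X H^T) / (W H H^T), elementwise
        ht = transpose(h)
        num, den = matmul(x, ht), matmul(w, matmul(h, ht))
        w = [[w[i][r] * num[i][r] / (den[i][r] + eps) for r in range(rank)]
             for i in range(n)]
    return w, h


# Toy adjacency snapshot: two 3-node communities, self-loops included.
x = [[1, 1, 1, 0, 0, 0],
     [1, 1, 1, 0, 0, 0],
     [1, 1, 1, 0, 0, 0],
     [0, 0, 0, 1, 1, 1],
     [0, 0, 0, 1, 1, 1],
     [0, 0, 0, 1, 1, 1]]
w, h = nmf(x, rank=2)
# Each node's dominant factor acts as a soft community label.
labels = [row.index(max(row)) for row in w]
print(labels, round(frobenius_error(x, w, h), 4))
```

Multiplicative updates keep every entry nonnegative as long as the initialization is positive, which is why they are a common default for this kind of exploratory factorization.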

Network-based Distance Metric with Application to Discover Disease Subtypes in Cancer

Author: Chen Ping
Ding Wei
Qiang Jipeng
Quackenbush John
Publication date: 28/02/2017

While cancer was once thought of as a single monolithic disease affecting a specific organ site, we now understand that there are many subtypes of cancer defined by unique patterns of gene mutations. These gene mutational data, which can be obtained more reliably than gene expression data, help to determine how the subtypes develop, evolve, and respond to therapies. Unlike the dense, continuous-valued gene expression data used by most existing cancer subtype discovery algorithms, somatic mutational data are extremely sparse and heterogeneous: fewer than 0.5% of the 20,000 human protein-coding genes are mutated (recorded as discrete values 1/0), and identical mutated genes are rarely shared by cancer patients. Our focus is to search for cancer subtypes in extremely sparse, high-dimensional binary (1/0) gene mutational data using unsupervised learning. We propose a new network-based distance metric. We project each patient's mutational profile onto the gene network structure and measure the distance between two patients using the similarity between genes and between the patients' gene vertices in the network. Experimental results on synthetic and real-world data show that our approach outperforms the top competitors in cancer subtype discovery. Furthermore, our approach can identify cancer subtypes in real cancer data that cannot be detected by other clustering algorithms.

arXiv.org e-Print Archive
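The abstract's key idea, that two patients with no mutated genes in common can still be compared through the gene network, can be shown with a small sketch. The distance used here (each mutated gene matched to the nearest mutated gene of the other patient by shortest-path hops, averaged in both directions) is an assumed stand-in, not the paper's metric; the toy network, gene names, `unreachable` cap, and function names are all hypothetical.

```python
from collections import deque


def bfs_distances(graph, source):
    """Hop distances from source to every reachable gene in an undirected graph."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for nbr in graph.get(node, ()):
            if nbr not in dist:
                dist[nbr] = dist[node] + 1
                queue.append(nbr)
    return dist


def patient_distance(graph, genes_a, genes_b, unreachable=10):
    """Symmetric average nearest-gene distance between two mutated-gene sets.

    Assumes each patient has at least one mutated gene. `unreachable` is an
    assumed penalty for genes with no path to the other patient's genes.
    """
    def directed(src, dst):
        total = 0
        for g in src:
            d = bfs_distances(graph, g)
            total += min((d.get(t, unreachable) for t in dst), default=unreachable)
        return total / len(src)

    return (directed(genes_a, genes_b) + directed(genes_b, genes_a)) / 2


# Hypothetical gene network: a simple path g1 - g2 - g3 - g4.
gene_graph = {"g1": ["g2"], "g2": ["g1", "g3"],
              "g3": ["g2", "g4"], "g4": ["g3"]}
print(patient_distance(gene_graph, {"g1"}, {"g2"}))
print(patient_distance(gene_graph, {"g1"}, {"g4"}))
```

With plain binary vectors, both pairs of patients above would look equally far apart (no shared genes); the network-aware distance separates them.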

Simultaneous Dimension Reduction and Clustering via the NMF-EM Algorithm

Author: Alquier Pierre
Carel Léna
Publication date: 05/06/2018

Mixture models are among the most popular tools for clustering. However, when the dimension and the number of clusters are large, estimating the clusters becomes challenging, as does interpreting them. Restrictions on the parameters can be used to reduce the dimension; an example is the mixture of factor analyzers (MFA) for Gaussian mixtures. The extension of MFA to non-Gaussian mixtures is not straightforward. We propose a new constraint for the parameters of non-Gaussian mixture models: the $K$ component parameters are combinations of elements from a small dictionary of, say, $H$ elements, with $H \ll K$. Including a nonnegative matrix factorization (NMF) in the EM algorithm allows us to estimate the dictionary and the parameters of the mixture simultaneously. We propose the acronym NMF-EM for this algorithm, implemented in the R package nmfem. This original approach is motivated by passenger clustering from ticketing data: we apply NMF-EM to data from two Transdev public transport networks. In this case, the dictionary words are easily interpreted as typical slots in a timetable.

arXiv.org e-Print Archive
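The dictionary constraint in the NMF-EM entry can be made concrete: stacking the $K$ component parameter vectors as rows of a matrix gives $\Theta = W D$, where $W$ is $K \times H$ and $D$ is $H \times d$, which is exactly the shape of a nonnegative factorization. The sketch below assumes multinomial components and convex-combination weights (so each component stays a probability vector); the paper's exact parametrization may differ, and the timetable slots, dictionary values, and function name are hypothetical.

```python
def mixture_components(dictionary, weights):
    """Build K component parameter vectors from an H-element dictionary.

    dictionary: H rows, each a probability vector over d categories
                (hypothetical timetable slots).
    weights:    K rows of H nonnegative weights summing to 1 (assumed).
    Returns K rows with theta_k = sum_h weights[k][h] * dictionary[h].
    """
    d = len(dictionary[0])
    return [[sum(w_kh * dictionary[h][j] for h, w_kh in enumerate(w_k))
             for j in range(d)] for w_k in weights]


# H = 2 hypothetical dictionary words over 4 timetable slots.
dictionary = [[0.7, 0.1, 0.1, 0.1],   # "morning peak"
              [0.1, 0.1, 0.1, 0.7]]   # "evening peak"
# K = 3 mixture components, each a mix of the two words.
weights = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
theta = mixture_components(dictionary, weights)
print(theta)
```

Because convex combinations of probability vectors are probability vectors, every component remains a valid multinomial, while the model only needs $H$ dictionary vectors plus $K$ short weight vectors instead of $K$ full-length parameter vectors.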

Analysis of multiview legislative networks with structured matrix factorization: Does Twitter influence translate to the real world?

Author: Mankad Shawn
Michailidis George
Publication venue: 'Institute of Mathematical Statistics'
Publication date: 29/01/2016

The rise of social media platforms has fundamentally altered public discourse by providing easy-to-use and ubiquitous forums for the exchange of ideas and opinions. Elected officials often use such platforms to communicate with the broader public, disseminate information, and engage with their constituencies and other public officials. In this work, we investigate whether Twitter conversations between legislators reveal their real-world position and influence by analyzing multiple Twitter networks that feature different types of link relations between the Members of Parliament (MPs) in the United Kingdom, as well as an identical data set for politicians in Ireland. We develop and apply a matrix factorization technique that allows the analyst to emphasize nodes with contextual local network structures by specifying network statistics that guide the factorization solution. Leveraging only link relation data, we find that important politicians in Twitter networks are associated with real-world leadership positions, and that rankings from the proposed method are correlated with the number of future media headlines. Comment: Published at http://dx.doi.org/10.1214/15-AOAS858 in the Annals of Applied Statistics (http://www.imstat.org/aoas/) by the Institute of Mathematical Statistics (http://www.imstat.org)

arXiv.org e-Print Archive

Multi-Level Network Embedding with Boosted Low-Rank Matrix Approximation

Author: Li Jundong
Liu Huan
Wu Liang
Publication date: 26/08/2018

As opposed to manual feature engineering, which is tedious and difficult to scale, network representation learning has attracted a surge of research interest because it automates feature learning on graphs. The learned low-dimensional node vector representation is generalizable and eases knowledge discovery on graphs by allowing various off-the-shelf machine learning tools to be applied directly. Recent research has shown that network embedding approaches from the past decade either explicitly factorize a carefully designed matrix to obtain the low-dimensional node vector representation or are closely related to implicit matrix factorization, with the fundamental assumption that the factorized node connectivity matrix is low-rank. Nonetheless, the global low-rank assumption does not necessarily hold, especially when the factorized matrix encodes complex node interactions, and the resulting single low-rank embedding matrix is insufficient to capture all the observed connectivity patterns. We therefore propose BoostNE, a novel multi-level network embedding framework that learns multiple network embedding representations of different granularity, from coarse to fine, without imposing the prevalent global low-rank assumption. BoostNE is also in line with the successful gradient boosting method in ensemble learning, in that multiple weak embeddings combine into a stronger and more effective one. We assess the effectiveness of the BoostNE framework by comparing it with existing state-of-the-art network embedding methods on various datasets, and the experimental results corroborate its superiority.

arXiv.org e-Print Archive
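The boosting idea behind BoostNE (fit a weak low-rank approximation, then fit the next one to what is left over) can be sketched in a few lines. For simplicity this sketch fits a rank-1 approximation to each residual with power iteration and concatenates the per-level coordinates into the node embedding; BoostNE's own factorization and residual handling may differ, and the toy graph and function names are hypothetical.

```python
import random


def rank1(a, iters=100, seed=0):
    """Leading singular triple (sigma, u, v) of matrix a, via power iteration."""
    rng = random.Random(seed)
    n, m = len(a), len(a[0])
    v = [rng.random() + 0.1 for _ in range(m)]
    u = [0.0] * n
    for _ in range(iters):
        u = [sum(a[i][j] * v[j] for j in range(m)) for i in range(n)]
        norm_u = sum(x * x for x in u) ** 0.5 or 1.0
        u = [x / norm_u for x in u]
        v = [sum(a[i][j] * u[i] for i in range(n)) for j in range(m)]
        norm_v = sum(x * x for x in v) ** 0.5 or 1.0
        v = [x / norm_v for x in v]
    # With v computed last, sigma = u^T a v equals ||a^T u|| (nonnegative).
    sigma = sum(u[i] * a[i][j] * v[j] for i in range(n) for j in range(m))
    return sigma, u, v


def boosted_embedding(a, levels):
    """Fit `levels` successive rank-1 approximations, each to the previous residual."""
    n, m = len(a), len(a[0])
    residual = [row[:] for row in a]
    per_level, errors = [], []
    for level in range(levels):
        sigma, u, v = rank1(residual, seed=level)
        # Each level contributes one coordinate per node (clamped against round-off).
        per_level.append([max(sigma, 0.0) ** 0.5 * x for x in u])
        residual = [[residual[i][j] - sigma * u[i] * v[j] for j in range(m)]
                    for i in range(n)]
        errors.append(sum(x * x for row in residual for x in row))
    # Concatenate the per-level coordinates into one vector per node.
    return [list(coords) for coords in zip(*per_level)], errors


adj = [[0, 1, 1, 0],
       [1, 0, 1, 0],
       [1, 1, 0, 1],
       [0, 0, 1, 0]]
emb, errors = boosted_embedding(adj, levels=3)
print(errors)  # squared residual norm after each level
```

Subtracting sigma * u * v^T with unit vectors u, v and sigma = u^T R v lowers the squared residual norm by exactly sigma squared, so each level can only reduce the remaining error, mirroring the "weak learners add up" intuition.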

Clustered Multitask Nonnegative Matrix Factorization for Spectral Unmixing of Hyperspectral Data

Author: Khoshsokhan Sara
Rajabi Roozbeh
Zayyani Hadi
Publication date: 16/05/2019

In this paper, a new algorithm based on a clustered multitask network is proposed to solve the spectral unmixing problem in hyperspectral imagery. In the proposed algorithm, each pixel in the hyperspectral image is considered a node in the clustered network, and the nodes are clustered using the fuzzy c-means method. A diffusion least-mean-squares (LMS) strategy is used to optimize the proposed cost function. To evaluate the proposed method, experiments are conducted on synthetic and real datasets. Simulation results based on spectral angle distance, abundance angle distance, and reconstruction error metrics illustrate the advantage of the proposed algorithm compared with other methods. Comment: one column, 22 pages, 12 figures, journal. arXiv admin note: substantial text overlap with arXiv:1902.07593, arXiv:1812.1078

arXiv.org e-Print Archive
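The entry clusters the network nodes (pixels) with fuzzy c-means before the diffusion LMS step. The sketch below covers only a standard fuzzy c-means loop, alternating weighted-mean center updates with the usual membership update; the 2-D feature vectors stand in for pixel spectra and are hypothetical, as is the default fuzzifier m = 2.

```python
import random


def fuzzy_c_means(points, c, m=2.0, iters=100, seed=0, eps=1e-12):
    """Standard fuzzy c-means; returns (centers, memberships).

    points: list of feature vectors (hypothetical stand-ins for pixel spectra).
    memberships[i][k] is the degree to which point i belongs to cluster k;
    each row sums to 1.
    """
    rng = random.Random(seed)
    n, dim = len(points), len(points[0])
    # Random initial memberships, normalized per point.
    u = []
    for _ in range(n):
        row = [rng.random() + eps for _ in range(c)]
        total = sum(row)
        u.append([x / total for x in row])
    centers = []
    for _ in range(iters):
        # Centers are membership^m weighted means of the points.
        centers = []
        for k in range(c):
            wts = [u[i][k] ** m for i in range(n)]
            tot = sum(wts)
            centers.append([sum(wts[i] * points[i][d] for i in range(n)) / tot
                            for d in range(dim)])
        # Membership update: u_ik = 1 / sum_j (dist_ik / dist_ij)^(2 / (m - 1)).
        for i in range(n):
            dists = [max(sum((points[i][d] - centers[k][d]) ** 2
                             for d in range(dim)) ** 0.5, eps)
                     for k in range(c)]
            u[i] = [1.0 / sum((dists[k] / dists[j]) ** (2.0 / (m - 1.0))
                              for j in range(c))
                    for k in range(c)]
    return centers, u


pts = [[0, 0], [0, 1], [1, 0], [10, 10], [10, 11], [11, 10]]
centers, u = fuzzy_c_means(pts, c=2)
print([row.index(max(row)) for row in u])
```

Unlike k-means, every point keeps a graded membership in every cluster, which is what lets a clustered multitask network weight its neighbors softly rather than with hard cluster boundaries.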

Time-Series Analysis via Low-Rank Matrix Factorization Applied to Infant-Sleep Data

Author: Brooks Hayley
Cheng Mark
Fernandez-Granda Carlos
Heeger David J.
Liu Sheng
Mackey Wayne
Tabak Esteban G.
Publication date: 06/11/2019

We propose a nonparametric model for time series with missing data based on low-rank matrix factorization. The model expresses each instance in a set of time series as a linear combination of a small number of shared basis functions. Constraining the functions and the corresponding coefficients to be nonnegative yields an interpretable low-dimensional representation of the data. A time-smoothing regularization term ensures that the model captures meaningful trends in the data instead of overfitting short-term fluctuations. The low-dimensional representation makes it possible to detect outliers and cluster the time series according to the interpretable features extracted by the model, and also to perform forecasting via kernel regression. We apply our methodology to a large real-world dataset of infant-sleep data gathered by caregivers with a mobile-phone app. Our analysis automatically extracts daily-sleep patterns consistent with the existing literature. This allows us to compute sleep-development trends for the cohort, which characterize the emergence of circadian sleep and different napping habits. We apply our methodology to detect anomalous individuals, to cluster the cohort into groups with different sleeping tendencies, and to obtain improved predictions of future sleep behavior. Comment: Machine Learning for Health (ML4H) at NeurIPS 2019 - Extended Abstract

arXiv.org e-Print Archive
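To make the model in this entry concrete, the sketch evaluates one plausible objective: squared error over observed entries only (missing values are skipped) plus a penalty on first differences of each shared basis function, weighted by a hypothetical `lam`. The exact regularizer and weighting in the paper may differ; this is an assumed form, and the toy matrices are hypothetical.

```python
def smoothed_nmf_objective(x, w, h, lam):
    """Masked squared error plus a time-smoothing penalty (an assumed form).

    x:   n series x T time points; None marks a missing observation.
    w:   n x r nonnegative coefficients, one row per series.
    h:   r x T nonnegative basis functions shared by all series.
    lam: weight of the penalty on first differences of each basis function.
    """
    fit = 0.0
    for i, row in enumerate(x):
        for t, value in enumerate(row):
            if value is None:  # missing data contribute nothing to the fit term
                continue
            pred = sum(w[i][r] * h[r][t] for r in range(len(h)))
            fit += (value - pred) ** 2
    # Penalize jumps between consecutive time points of each basis function.
    smooth = sum((basis[t + 1] - basis[t]) ** 2
                 for basis in h for t in range(len(basis) - 1))
    return fit + lam * smooth


h = [[1, 1, 0, 0],      # basis 1: active early
     [0, 0, 1, 1]]      # basis 2: active late
w = [[1, 0], [0, 2], [1, 1]]
x = [[1, 1, 0, None],   # last observation missing
     [0, 0, 2, 2],
     [1, 1, 1, 1]]
print(smoothed_nmf_objective(x, w, h, lam=0.5))
```

Here the observed entries are reproduced exactly, so the whole objective comes from the smoothness term; a larger `lam` would push a solver toward flatter basis functions at the cost of fitting short-term fluctuations.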

Nonnegative Multi-level Network Factorization for Latent Factor Analysis

Author: Lu Jie
Luo Xiangfeng
Xuan Junyu
Zhang Guangquan
Publication date: 01/04/2015

Nonnegative Matrix Factorization (NMF) aims to factorize a matrix into two optimized nonnegative matrices and has been widely used for unsupervised learning tasks such as product recommendation based on a rating matrix. However, standard NMF overlooks networks among nodes of the same type, e.g., the social network between users. This leads to comparatively low recommendation accuracy, because such networks also reflect the nature of the nodes, such as the preferences of users in a social network. Moreover, social networks, as complex networks, have many different structures. Each structure is a composition of links between nodes and reflects the nature of the nodes, so retaining different network structures leads to differences in recommendation performance. To investigate the impact of these network structures on the factorization, this paper proposes four multi-level network factorization algorithms based on standard NMF, which integrate the vertical network (e.g., a rating matrix) with the structures of the horizontal network (e.g., a user social network). These algorithms are carefully designed, with corresponding convergence proofs, to retain four desired network structures. Experiments on synthetic data show that the proposed algorithms preserve the desired network structures as designed. Experiments on real-world data show that considering the horizontal networks improves the accuracy of document clustering and recommendation with standard NMF, and that the various structures perform differently on these two tasks. These results can be directly used in document clustering and recommendation systems.

arXiv.org e-Print Archive
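One common way to let a horizontal network (such as a user social network) shape an NMF of a rating matrix is to add a graph-regularization term to the loss. The sketch assumes the standard Laplacian form tr(UᵀLU), which is not necessarily any of the paper's four algorithms, and checks it against its equivalent pairwise form, ½ Σᵢⱼ Aᵢⱼ‖uᵢ − uⱼ‖². The social graph, factor values, and function names are hypothetical.

```python
def laplacian(adj):
    """Unnormalized graph Laplacian L = D - A of a symmetric adjacency matrix."""
    n = len(adj)
    return [[(sum(adj[i]) if i == j else 0) - adj[i][j] for j in range(n)]
            for i in range(n)]


def trace_penalty(adj, u):
    """tr(U^T L U) for latent user factors U (one row per user)."""
    lap = laplacian(adj)
    n, r = len(u), len(u[0])
    return sum(u[i][k] * lap[i][j] * u[j][k]
               for k in range(r) for i in range(n) for j in range(n))


def pairwise_penalty(adj, u):
    """Equivalent form: 0.5 * sum_ij A_ij * ||u_i - u_j||^2."""
    n = len(u)
    return 0.5 * sum(adj[i][j] * sum((a - b) ** 2 for a, b in zip(u[i], u[j]))
                     for i in range(n) for j in range(n))


# Hypothetical social network: users 0 - 1 - 2 on a path.
social = [[0, 1, 0],
          [1, 0, 1],
          [0, 1, 0]]
user_factors = [[1, 0], [1, 1], [0, 2]]
print(trace_penalty(social, user_factors), pairwise_penalty(social, user_factors))
```

The pairwise form shows what the term does: adding it to the factorization loss penalizes linked users whose latent factors differ, pulling socially connected users toward similar preference vectors.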

Joint community and anomaly tracking in dynamic networks

Author: Baingana Brian
Giannakis Georgios B.
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 25/06/2015

Most real-world networks exhibit community structure, a phenomenon characterized by the existence of node clusters whose intra-cluster edge connectivity is stronger than the edge connectivity between nodes belonging to different clusters. In addition to facilitating a better understanding of network behavior, community detection finds many practical applications in diverse settings. Communities in online social networks are indicative of shared functional roles or affiliation to a common socio-economic status, knowledge of which is vital for targeted advertisement. In buyer-seller networks, community detection facilitates better product recommendations. Unfortunately, the reliability of community assignments is hindered by anomalous user behavior, often observed as unfair self-promotion or as "fake" highly connected accounts created to promote fraud. The present paper advocates a novel approach for jointly tracking communities and detecting such anomalous nodes in time-varying networks. By postulating edge creation as the result of mutual community participation by node pairs, it puts forth a dynamic factor model in which anomalous memberships are captured through a sparse outlier matrix. Efficient tracking algorithms suitable for both online and decentralized operation are developed. Experiments conducted on both synthetic and real network time series successfully unveil underlying communities and anomalous nodes. Comment: 13 pages

arXiv.org e-Print Archive
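In sparse-outlier models like the one described in this entry, an L1 penalty on the outlier matrix typically leads to a soft-thresholding step: small residuals are treated as noise and set to zero, while large ones survive (shrunk) as outliers. The sketch shows only that generic proximal step on a hypothetical residual matrix and threshold `tau`; it is not the paper's tracking algorithm, and all names are hypothetical.

```python
def soft_threshold(x, tau):
    """Proximal operator of tau * |x|: shrinks x toward zero by tau."""
    if x > tau:
        return x - tau
    if x < -tau:
        return x + tau
    return 0.0


def sparse_outliers(residual, tau):
    """Entrywise soft-thresholding of a residual matrix (one row per node).

    residual[i][k]: hypothetical mismatch between node i's observed data and
    the community model. Entries with magnitude above tau survive as outliers.
    """
    return [[soft_threshold(x, tau) for x in row] for row in residual]


def anomalous_nodes(outliers):
    """Indices of nodes with at least one nonzero outlier entry."""
    return [i for i, row in enumerate(outliers) if any(x != 0.0 for x in row)]


residual = [[0.1, -0.2],
            [2.5, 0.3],
            [-0.05, 0.0],
            [0.4, -3.0]]
outliers = sparse_outliers(residual, tau=1.0)
print(outliers, anomalous_nodes(outliers))
```

The threshold controls the trade-off: raising `tau` flags fewer nodes, lowering it lets ordinary fluctuations register as anomalies.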