61,295 research outputs found
An Attention-based Collaboration Framework for Multi-View Network Representation Learning
Learning distributed node representations in networks has been attracting
increasing attention recently due to its effectiveness in a variety of
applications. Existing approaches usually study networks with a single type of
proximity between nodes, which defines a single view of a network. However, in
reality there usually exists multiple types of proximities between nodes,
yielding networks with multiple views. This paper studies learning node
representations for networks with multiple views, which aims to infer robust
node representations across different views. We propose a multi-view
representation learning approach, which promotes the collaboration of different
views and lets them vote for the robust representations. During the voting
process, an attention mechanism is introduced, which enables each node to focus
on the most informative views. Experimental results on real-world networks show
that the proposed approach outperforms existing state-of-the-art approaches for
network representation learning with a single view and other competitive
approaches with multiple views.Comment: CIKM 201
A Robust Unified Graph Model Based on Molecular Data Binning for Subtype Discovery in High-dimensional Spaces
Machine learning (ML) is a subfield of artificial intelligence (AI) that has already revolutionised the world around us. It is a widely employed process for discovering patterns and groups within datasets. It has a wide range of applications including disease subtyping, which aims to discover intrinsic subtypes of disease in large-scale unlabelled data. Whilst the groups discovered in multi-view high-dimensional data by ML algorithms are promising, their capacity to identify pertinent and meaningful groups is limited by the presence of data variability and outliers. Since outlier values represent potential but unlikely outcomes, they are statistically and philosophically fascinating.
Therefore, the primary aim of this thesis was to propose a robust approach that discovers meaningful groups while considering the presence of data variability and outliers in the data. To achieve this aim, a novel robust approach (ROMDEX) was developed that utilised the proposed intermediate graph models (IMGs) for robust computation of proximity between observations in the data. Finally, a robust multi-view graph-based clustering approach was developed based on ROMDEX that improved the discovery of meaningful groups that were hidden behind the noise in the data.
The proposed approach was validated on real-world, and synthetic data for disease subtyping. Additionally, the stability of the approach was assessed by evaluating its performance across different levels of noise in clustering data. The results were evaluated through Kaplan-Meier survival time analysis for disease subtyping. Also, the concordance index (CI) and normalised mutual information (NMI) are used to evaluate the predictive ability of the proposed clustering model. Additionally, the accuracy, Kappa statistic and rand index are computed to evaluate the clustering stability against various levels of Gaussian noise. The proposed approach outperformed the existing state-of-the-art approaches MRGC, PINS, SNF, Consensus Clustering, and Icluster+ on these datasets. The findings for all datasets were outstanding, demonstrating the predictive ability of the proposed unsupervised graph-based clustering approach
LATTE: Application Oriented Social Network Embedding
In recent years, many research works propose to embed the network structured
data into a low-dimensional feature space, where each node is represented as a
feature vector. However, due to the detachment of embedding process with
external tasks, the learned embedding results by most existing embedding models
can be ineffective for application tasks with specific objectives, e.g.,
community detection or information diffusion. In this paper, we propose study
the application oriented heterogeneous social network embedding problem.
Significantly different from the existing works, besides the network structure
preservation, the problem should also incorporate the objectives of external
applications in the objective function. To resolve the problem, in this paper,
we propose a novel network embedding framework, namely the "appLicAtion
orienTed neTwork Embedding" (Latte) model. In Latte, the heterogeneous network
structure can be applied to compute the node "diffusive proximity" scores,
which capture both local and global network structures. Based on these computed
scores, Latte learns the network representation feature vectors by extending
the autoencoder model model to the heterogeneous network scenario, which can
also effectively unite the objectives of network embedding and external
application tasks. Extensive experiments have been done on real-world
heterogeneous social network datasets, and the experimental results have
demonstrated the outstanding performance of Latte in learning the
representation vectors for specific application tasks.Comment: 11 Pages, 12 Figures, 1 Tabl
Attributed Network Embedding for Learning in a Dynamic Environment
Network embedding leverages the node proximity manifested to learn a
low-dimensional node vector representation for each node in the network. The
learned embeddings could advance various learning tasks such as node
classification, network clustering, and link prediction. Most, if not all, of
the existing works, are overwhelmingly performed in the context of plain and
static networks. Nonetheless, in reality, network structure often evolves over
time with addition/deletion of links and nodes. Also, a vast majority of
real-world networks are associated with a rich set of node attributes, and
their attribute values are also naturally changing, with the emerging of new
content patterns and the fading of old content patterns. These changing
characteristics motivate us to seek an effective embedding representation to
capture network and attribute evolving patterns, which is of fundamental
importance for learning in a dynamic environment. To our best knowledge, we are
the first to tackle this problem with the following two challenges: (1) the
inherently correlated network and node attributes could be noisy and
incomplete, it necessitates a robust consensus representation to capture their
individual properties and correlations; (2) the embedding learning needs to be
performed in an online fashion to adapt to the changes accordingly. In this
paper, we tackle this problem by proposing a novel dynamic attributed network
embedding framework - DANE. In particular, DANE first provides an offline
method for a consensus embedding and then leverages matrix perturbation theory
to maintain the freshness of the end embedding results in an online manner. We
perform extensive experiments on both synthetic and real attributed networks to
corroborate the effectiveness and efficiency of the proposed framework.Comment: 10 page
Multi-Scale Link Prediction
The automated analysis of social networks has become an important problem due
to the proliferation of social networks, such as LiveJournal, Flickr and
Facebook. The scale of these social networks is massive and continues to grow
rapidly. An important problem in social network analysis is proximity
estimation that infers the closeness of different users. Link prediction, in
turn, is an important application of proximity estimation. However, many
methods for computing proximity measures have high computational complexity and
are thus prohibitive for large-scale link prediction problems. One way to
address this problem is to estimate proximity measures via low-rank
approximation. However, a single low-rank approximation may not be sufficient
to represent the behavior of the entire network. In this paper, we propose
Multi-Scale Link Prediction (MSLP), a framework for link prediction, which can
handle massive networks. The basis idea of MSLP is to construct low rank
approximations of the network at multiple scales in an efficient manner. Based
on this approach, MSLP combines predictions at multiple scales to make robust
and accurate predictions. Experimental results on real-life datasets with more
than a million nodes show the superior performance and scalability of our
method.Comment: 20 pages, 10 figure
A Proximity-Aware Hierarchical Clustering of Faces
In this paper, we propose an unsupervised face clustering algorithm called
"Proximity-Aware Hierarchical Clustering" (PAHC) that exploits the local
structure of deep representations. In the proposed method, a similarity measure
between deep features is computed by evaluating linear SVM margins. SVMs are
trained using nearest neighbors of sample data, and thus do not require any
external training data. Clusters are then formed by thresholding the similarity
scores. We evaluate the clustering performance using three challenging
unconstrained face datasets, including Celebrity in Frontal-Profile (CFP),
IARPA JANUS Benchmark A (IJB-A), and JANUS Challenge Set 3 (JANUS CS3)
datasets. Experimental results demonstrate that the proposed approach can
achieve significant improvements over state-of-the-art methods. Moreover, we
also show that the proposed clustering algorithm can be applied to curate a set
of large-scale and noisy training dataset while maintaining sufficient amount
of images and their variations due to nuisance factors. The face verification
performance on JANUS CS3 improves significantly by finetuning a DCNN model with
the curated MS-Celeb-1M dataset which contains over three million face images
- …