Single-cell RNA-seq data analysis using graph autoencoders and graph attention networks
With the development of high-throughput sequencing technology, the scale of single-cell RNA sequencing (scRNA-seq) data has surged. These data are typically high-dimensional and highly sparse, with substantial dropout noise. Gene imputation and cell clustering analysis of scRNA-seq data are therefore increasingly important. Statistical and traditional machine learning methods are inefficient, and their accuracy needs improvement. Conventional deep learning methods cannot directly process non-Euclidean data such as cell graphs. In this study, we developed scGAEGAT, a multi-modal model for scRNA-seq analysis that combines graph autoencoders and graph attention networks within a graph neural network framework. Cosine similarity, median L1 distance, and root-mean-squared error were used to compare the gene imputation performance of scGAEGAT with that of other methods. Adjusted mutual information, normalized mutual information, completeness score, and Silhouette coefficient were used to compare their cell clustering performance. Experimental results on four scRNA-seq data sets with gold-standard cell labels demonstrated promising performance of the scGAEGAT model in gene imputation and cell clustering prediction.
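As a rough illustration of the three imputation metrics named above, the Python sketch below computes them over a set of held-out entries. It assumes, as is common in imputation benchmarks but not stated in the abstract, that some observed values were masked before imputation and that the metrics compare the true masked values with the imputed ones. `imputation_metrics` is a hypothetical helper, not part of scGAEGAT.

```python
import math
from statistics import median


def imputation_metrics(true_vals, imputed_vals):
    """Compare held-out true expression values with their imputed values.

    Assumption: both sequences hold the values of the same masked entries,
    in the same order (one float per masked matrix entry).
    """
    if len(true_vals) != len(imputed_vals) or not true_vals:
        raise ValueError("inputs must be non-empty and of equal length")
    # Cosine similarity between the true and imputed value vectors.
    dot = sum(t * p for t, p in zip(true_vals, imputed_vals))
    norm_t = math.sqrt(sum(t * t for t in true_vals))
    norm_p = math.sqrt(sum(p * p for p in imputed_vals))
    cosine = dot / (norm_t * norm_p) if norm_t and norm_p else 0.0
    # Per-entry absolute errors feed both median L1 distance and RMSE.
    abs_err = [abs(t - p) for t, p in zip(true_vals, imputed_vals)]
    median_l1 = median(abs_err)
    rmse = math.sqrt(sum(e * e for e in abs_err) / len(abs_err))
    return {"cosine": cosine, "median_l1": median_l1, "rmse": rmse}


metrics = imputation_metrics([1.0, 2.0, 3.0], [1.0, 2.0, 5.0])
print(metrics)
```

Higher cosine similarity and lower median L1 distance and RMSE indicate better imputation. Whether the paper aggregates per cell, per gene, or over all masked entries is not specified here.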
Enhancing Graph Collaborative Filtering via Uniformly Co-Clustered Intent Modeling
Graph-based collaborative filtering has emerged as a powerful paradigm for
delivering personalized recommendations. Despite their demonstrated
effectiveness, these methods often neglect the underlying intents of users,
which constitute a pivotal facet of comprehensive user interests. Consequently,
a series of approaches have arisen to tackle this limitation by introducing
independent intent representations. However, these approaches fail to capture
the intricate relationships between intents of different users and the
compatibility between user intents and item properties.
To remedy the above issues, we propose a novel method, named uniformly
co-clustered intent modeling. Specifically, we devise a uniformly contrastive
intent modeling module to bring together the embeddings of users with similar
intents and items with similar properties. This module aims to model the
nuanced relations between intents of different users and properties of
different items, especially those unreachable to each other on the user-item
graph. To model the compatibility between user intents and item properties, we
design the user-item co-clustering module, maximizing the mutual information of
co-clusters of users and items. This approach is substantiated through
theoretical validation, establishing its efficacy in modeling compatibility to
enhance the mutual information between user and item representations.
Comprehensive experiments on various real-world datasets verify the
effectiveness of the proposed framework. Comment: In submission
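The abstract does not state the co-clustering objective. As a hedged illustration only, the sketch below computes the mutual information between user clusters and item clusters in the style of information-theoretic co-clustering. It assumes soft cluster assignments and that each observed interaction (u, i) adds the product of the user's and the item's assignment probabilities to the joint distribution. The function name and this construction are assumptions, not the paper's module.

```python
import math


def cocluster_mutual_information(interactions, user_assign, item_assign):
    """Mutual information I(user cluster; item cluster) over observed interactions.

    interactions: list of (user, item) pairs from the user-item graph.
    user_assign / item_assign: dict mapping each user / item to a list of
    soft cluster probabilities that sums to 1 (hypothetical inputs).
    """
    n_user_clusters = len(next(iter(user_assign.values())))
    n_item_clusters = len(next(iter(item_assign.values())))
    joint = [[0.0] * n_item_clusters for _ in range(n_user_clusters)]
    # Each interaction spreads one unit of mass over (user cluster, item cluster) pairs.
    for u, i in interactions:
        qu, qi = user_assign[u], item_assign[i]
        for k in range(n_user_clusters):
            for c in range(n_item_clusters):
                joint[k][c] += qu[k] * qi[c]
    total = sum(map(sum, joint))
    joint = [[v / total for v in row] for row in joint]
    # Marginals of the joint co-cluster distribution.
    p_user = [sum(row) for row in joint]
    p_item = [sum(joint[k][c] for k in range(n_user_clusters)) for c in range(n_item_clusters)]
    mi = 0.0
    for k in range(n_user_clusters):
        for c in range(n_item_clusters):
            if joint[k][c] > 0:
                mi += joint[k][c] * math.log(joint[k][c] / (p_user[k] * p_item[c]))
    return mi
```

Maximizing this quantity pushes interacting users and items toward matching cluster pairs; uniform assignments give zero mutual information, while perfectly aligned hard co-clusters give the maximum.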
A Robust Unified Graph Model Based on Molecular Data Binning for Subtype Discovery in High-dimensional Spaces
Machine learning (ML) is a subfield of artificial intelligence (AI) that has already revolutionised the world around us. It is a widely employed process for discovering patterns and groups within datasets. It has a wide range of applications including disease subtyping, which aims to discover intrinsic subtypes of disease in large-scale unlabelled data. Whilst the groups discovered in multi-view high-dimensional data by ML algorithms are promising, their capacity to identify pertinent and meaningful groups is limited by the presence of data variability and outliers. Since outlier values represent potential but unlikely outcomes, they are statistically and philosophically fascinating.
Therefore, the primary aim of this thesis was to propose a robust approach that discovers meaningful groups while considering the presence of data variability and outliers in the data. To achieve this aim, a novel robust approach (ROMDEX) was developed that utilised the proposed intermediate graph models (IMGs) for robust computation of proximity between observations in the data. Finally, a robust multi-view graph-based clustering approach was developed based on ROMDEX that improved the discovery of meaningful groups that were hidden behind the noise in the data.
The proposed approach was validated on real-world and synthetic data for disease subtyping. The stability of the approach was also assessed by evaluating its performance across different levels of noise in the clustering data. Disease-subtyping results were evaluated through Kaplan-Meier survival time analysis, and the concordance index (CI) and normalised mutual information (NMI) were used to evaluate the predictive ability of the proposed clustering model. Accuracy, the Kappa statistic, and the Rand index were computed to evaluate clustering stability against various levels of Gaussian noise. The proposed approach outperformed the existing state-of-the-art approaches MRGC, PINS, SNF, Consensus Clustering, and iCluster+ on these datasets. The results on all datasets were outstanding, demonstrating the predictive ability of the proposed unsupervised graph-based clustering approach.
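For readers unfamiliar with the survival evaluation mentioned above, the standard Kaplan-Meier (product-limit) estimator can be sketched as follows. Run separately on the patients assigned to each discovered subtype, it yields one survival curve per subtype; clearly separated curves suggest subtypes with different prognoses. The helper name `kaplan_meier` is hypothetical, and the thesis's own tooling is not stated.

```python
def kaplan_meier(times, events):
    """Product-limit estimate of S(t) at each distinct event time.

    times: follow-up time per patient.
    events: 1 if the event (e.g. death) was observed at that time,
            0 if the patient was censored (still event-free when last seen).
    Returns a list of (t, S(t)) pairs, one per time with at least one event.
    """
    data = sorted(zip(times, events))
    n_at_risk = len(data)
    survival = 1.0
    curve = []
    idx = 0
    while idx < len(data):
        t = data[idx][0]
        deaths = censored = 0
        # Group all patients sharing this time point.
        while idx < len(data) and data[idx][0] == t:
            if data[idx][1]:
                deaths += 1
            else:
                censored += 1
            idx += 1
        if deaths:
            # Conditional probability of surviving past t, given survival up to t.
            survival *= 1 - deaths / n_at_risk
            curve.append((t, survival))
        # Both deaths and censored patients leave the risk set after t.
        n_at_risk -= deaths + censored
    return curve


print(kaplan_meier([1, 2, 2, 3, 4], [1, 1, 0, 1, 0]))
```

In this toy input the curve drops to roughly 0.8, 0.6, and 0.3 at times 1, 2, and 3; the patient censored at time 4 adds no step.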
Analysis of Network Clustering Algorithms and Cluster Quality Metrics at Scale
Notions of community quality underlie network clustering. While studies
of network clustering are increasingly common, a precise understanding
of the relationship between different cluster quality metrics is lacking. In
this paper, we examine the relationship between stand-alone cluster quality
metrics and information recovery metrics through a rigorous analysis of four
widely-used network clustering algorithms -- Louvain, Infomap, label
propagation, and smart local moving. We consider the stand-alone quality
metrics of modularity, conductance, and coverage, and we consider the
information recovery metrics of adjusted Rand score, normalized mutual
information, and a variant of normalized mutual information used in previous
work. Our study includes both synthetic graphs and empirical data sets of sizes
varying from 1,000 to 1,000,000 nodes.
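A minimal sketch of the three stand-alone quality metrics for an unweighted, undirected graph, using common textbook definitions. Modularity sums, over clusters, the fraction of edges inside the cluster minus the squared fraction of total degree the cluster holds. Coverage is the fraction of all edges that fall inside clusters. The conductance of a cluster is the number of edges leaving it divided by the smaller of its volume and the volume of the rest of the graph. The paper may use variants (for example, in how per-cluster conductance is aggregated), so these definitions and the helper name are assumptions.

```python
from collections import defaultdict


def quality_metrics(edges, labels):
    """Modularity, coverage, and per-cluster conductance of a partition.

    edges: list of (u, v) pairs; each undirected edge listed once, no self-loops.
    labels: dict mapping node -> cluster id.
    """
    m = len(edges)
    intra = defaultdict(int)   # edges with both endpoints in the cluster
    cut = defaultdict(int)     # edges with exactly one endpoint in the cluster
    volume = defaultdict(int)  # sum of degrees of the cluster's nodes
    for u, v in edges:
        cu, cv = labels[u], labels[v]
        volume[cu] += 1
        volume[cv] += 1
        if cu == cv:
            intra[cu] += 1
        else:
            cut[cu] += 1
            cut[cv] += 1
    total_volume = 2 * m
    modularity = sum(intra[c] / m - (volume[c] / total_volume) ** 2 for c in volume)
    coverage = sum(intra.values()) / m
    conductance = {}
    for c in volume:
        denom = min(volume[c], total_volume - volume[c])
        conductance[c] = cut[c] / denom if denom else 0.0
    return modularity, coverage, conductance


# Two triangles joined by a single bridge edge (2, 3).
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
labels = {0: "A", 1: "A", 2: "A", 3: "B", 4: "B", 5: "B"}
print(quality_metrics(edges, labels))
```

None of these metrics looks at ground-truth labels, which is why they can disagree with information recovery metrics.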
We find significant differences among the results of the different cluster
quality metrics. For example, clustering algorithms can return a value of 0.4
out of 1 on modularity but score 0 out of 1 on information recovery. We find
conductance, though imperfect, to be the stand-alone quality metric that best
indicates performance on information recovery metrics. Our study shows that the
variant of normalized mutual information used in previous work cannot be
assumed to differ only slightly from traditional normalized mutual information.
Smart local moving is the best performing algorithm in our study, but
discrepancies between cluster evaluation metrics prevent us from declaring it
absolutely superior. Louvain performed better than Infomap in nearly all the
tests in our study, contradicting the results of previous work in which Infomap
was superior to Louvain. We find that although label propagation performs
poorly when clusters are less clearly defined, it scales efficiently and
accurately to large graphs with well-defined clusters.
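The two standard information recovery metrics can both be computed from the contingency table of ground-truth and predicted labels. The sketch below uses arithmetic-mean normalization for NMI (one of several normalizations in use) and does not reproduce the NMI variant from previous work, which the abstract does not define. The function names are hypothetical.

```python
import math
from collections import Counter


def _pairs(n):
    # Number of unordered pairs among n items.
    return n * (n - 1) // 2


def adjusted_rand(true, pred):
    """Adjusted Rand index between two labelings of the same nodes."""
    n = len(true)
    contingency = Counter(zip(true, pred))
    sum_ij = sum(_pairs(c) for c in contingency.values())
    sum_a = sum(_pairs(c) for c in Counter(true).values())
    sum_b = sum(_pairs(c) for c in Counter(pred).values())
    expected = sum_a * sum_b / _pairs(n)
    max_index = (sum_a + sum_b) / 2
    if max_index == expected:  # degenerate case, e.g. both labelings trivial
        return 1.0
    return (sum_ij - expected) / (max_index - expected)


def nmi(true, pred):
    """Normalized mutual information with arithmetic-mean normalization."""
    n = len(true)
    contingency = Counter(zip(true, pred))
    a, b = Counter(true), Counter(pred)
    mi = sum(c / n * math.log(c * n / (a[x] * b[y])) for (x, y), c in contingency.items())
    h_a = -sum(c / n * math.log(c / n) for c in a.values())
    h_b = -sum(c / n * math.log(c / n) for c in b.values())
    if h_a == 0 and h_b == 0:  # both labelings put everything in one cluster
        return 1.0
    return mi / ((h_a + h_b) / 2)


truth = [0, 0, 0, 1, 1, 1]
found = [0, 0, 1, 1, 2, 2]
print(adjusted_rand(truth, found), nmi(truth, found))
```

Both scores are invariant to renaming cluster ids, so a prediction that matches the truth up to relabeling scores 1.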
Co-Clustering Network-Constrained Trajectory Data
Clustering of moving-object trajectories has recently attracted growing
interest from both the data mining and machine learning communities. However,
this problem has mainly been studied in the setting where moving objects move
freely in Euclidean space. In this paper, we study the problem of
clustering trajectories of vehicles whose movement is restricted by the
underlying road network. We model relations between these trajectories and road
segments as a bipartite graph and cluster its vertices. We
demonstrate our approaches on synthetic data and show how they could be useful in
inferring knowledge about the flow dynamics and the behavior of the drivers
using the road network.
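The bipartite modelling step can be sketched as below. The sketch assumes that each trajectory has already been map-matched to an ordered list of road-segment ids and takes edge weights to be traversal counts; both are assumptions, since the abstract does not specify the input format or the weighting. The clustering step itself is not shown because the abstract does not name the algorithm. The resulting adjacency could be passed to any bipartite co-clustering method. `build_bipartite` is a hypothetical helper.

```python
from collections import Counter, defaultdict


def build_bipartite(trajectories):
    """Weighted bipartite graph between trajectories and road segments.

    trajectories: dict mapping trajectory id -> ordered list of road-segment
    ids traversed (assumed already map-matched).
    Edge weight = number of times the trajectory traverses that segment.
    Returns adjacency maps for both vertex sets.
    """
    traj_to_seg = {}
    seg_to_traj = defaultdict(dict)
    for tid, segments in trajectories.items():
        counts = Counter(segments)
        traj_to_seg[tid] = dict(counts)
        # Mirror each edge on the segment side of the graph.
        for seg, weight in counts.items():
            seg_to_traj[seg][tid] = weight
    return traj_to_seg, dict(seg_to_traj)


t2s, s2t = build_bipartite({"t1": ["s1", "s2", "s1"], "t2": ["s2", "s3"]})
print(t2s)
print(s2t)
```

Keeping both adjacency maps lets a co-clustering method group trajectories and road segments simultaneously rather than clustering only one side.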