Network-Adjusted Covariates for Community Detection
Community detection is a crucial task in network analysis that can be
significantly improved by incorporating subject-level information, i.e.,
covariates. However, current methods often struggle with selecting tuning
parameters and with analyzing low-degree nodes. In this paper, we introduce a
novel method that addresses these challenges by constructing network-adjusted
covariates, which combine the network connections and the covariates using a
node-specific weight determined by each node's degree. Spectral clustering on
the network-adjusted covariates exactly recovers community labels under
certain conditions, and the procedure is tuning-free and computationally
efficient. We present novel theoretical results on the strong consistency of
our method under degree-corrected stochastic blockmodels with covariates, even
in the presence of mis-specification and sparse communities with bounded
degrees. Additionally, we establish a general lower bound for the community
detection problem when both network and covariates are present, which shows
that our method is optimal up to a constant factor. Our method outperforms
existing approaches in simulations and on a LastFM app user network, and it
provides interpretable community structures in a statistics publication
citation network where … of nodes are isolated.
Comment: 48 pages
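The two-step recipe described above (build network-adjusted covariates, then run spectral clustering on them) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation. The form Y = A X + diag(α) X and the weight α_i = mean degree / (d_i + 1) are hypothetical choices. They only mimic the idea that low-degree (including isolated) nodes lean more on their own covariates. The helper names are also illustrative.

```python
import numpy as np


def network_adjusted_covariates(A, X, alpha=None):
    """Hypothetical NAC construction: Y = A @ X + diag(alpha) @ X.

    A: (n, n) symmetric 0/1 adjacency matrix; X: (n, p) covariates.
    The default weight (an assumption, not the paper's formula) shrinks as
    the degree grows, so an isolated node keeps only alpha_i * X_i.
    """
    degrees = A.sum(axis=1)
    if alpha is None:
        alpha = degrees.mean() / (degrees + 1.0)  # hypothetical degree-based weight
    return A @ X + alpha[:, None] * X


def kmeans(points, k, n_iter=100):
    # Lloyd's algorithm with deterministic farthest-point initialization.
    centers = [points[0]]
    for _ in range(1, k):
        d2 = np.min([((points - c) ** 2).sum(axis=1) for c in centers], axis=0)
        centers.append(points[np.argmax(d2)])
    centers = np.array(centers)
    for _ in range(n_iter):
        d2 = ((points[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d2.argmin(axis=1)
        new_centers = np.array([
            points[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels


def spectral_cluster(Y, k):
    # k-means on the rows of the top-k left singular vectors of Y.
    U, _, _ = np.linalg.svd(Y, full_matrices=False)
    return kmeans(U[:, :k], k)
```

For two disconnected 5-node cliques whose covariates are (1, 0) and (0, 1), `spectral_cluster(network_adjusted_covariates(A, X), 2)` places each clique in its own cluster. The sketch involves no tuning parameter beyond the number of communities, which echoes the tuning-free claim.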
Covariate-Assisted Community Detection on Sparse Networks
Community detection is an important problem when processing network data.
Traditionally, this is done by exploiting the connections between nodes, but
connections can be too sparse to detect communities in many real datasets. Node
covariates can be used to assist community detection; see Binkiewicz et al.
(2017); Weng and Feng (2022); Yan and Sarkar (2021); Yang et al. (2013).
However, combining covariates with network connections is challenging,
because covariates may be high-dimensional and inconsistent with community
labels. To study the relationship between covariates and communities, we
propose the degree-corrected stochastic block model with node covariates
(DCSBM-NC). It allows degree heterogeneity among communities and inconsistency
between community labels and covariate labels. Based on DCSBM-NC, we design the
adjusted neighbor-covariate (ANC) data matrix, which leverages covariate
information to assist community detection. We then propose the
covariate-assisted spectral clustering on ratios of singular vectors (CA-SCORE)
method, applied to the ANC matrix. We prove that CA-SCORE successfully recovers
community labels when 1) the network is relatively dense; 2) the covariate
class labels match the community labels; or 3) the data are a mixture of cases
1) and 2). CA-SCORE performs well on synthetic and real datasets. The algorithm
is implemented in the R (R Core Team, 2021) package CASCORE.
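The "ratios of singular vectors" step can be sketched on a generic input matrix M, which stands in for the ANC matrix. The ANC construction itself is not reproduced here. This is an illustrative sketch, not the CASCORE package. The function names and the log(n) truncation of extreme ratios are assumptions. Dividing each singular vector entrywise by the leading one cancels node-specific degree parameters, which is why the method tolerates degree heterogeneity.

```python
import numpy as np


def kmeans(points, k, n_iter=100):
    # Lloyd's algorithm with deterministic farthest-point initialization.
    centers = [points[0]]
    for _ in range(1, k):
        d2 = np.min([((points - c) ** 2).sum(axis=1) for c in centers], axis=0)
        centers.append(points[np.argmax(d2)])
    centers = np.array(centers)
    for _ in range(n_iter):
        d2 = ((points[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d2.argmin(axis=1)
        new_centers = np.array([
            points[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels


def score(M, k):
    """k-means on entrywise ratios of the top-k left singular vectors of M.

    M is a stand-in for the ANC matrix (hypothetical input).
    """
    U, _, _ = np.linalg.svd(M, full_matrices=False)
    u1 = U[:, 0]
    # Ratios u_2/u_1, ..., u_k/u_1 remove the degree parameter of each node.
    R = U[:, 1:k] / u1[:, None]
    threshold = np.log(M.shape[0])  # assumed truncation of extreme ratios
    R = np.clip(R, -threshold, threshold)
    return kmeans(R, k)
```

Consider an expected degree-corrected block matrix `np.outer(theta, theta) * B[z][:, z]` with heterogeneous `theta`. For it, the ratios are constant within each community, so `score` separates the communities even though raw singular-vector rows differ in scale.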
Graph matching beyond perfectly-overlapping Erdős–Rényi random graphs
Graph matching is a fruitful area in terms of both algorithms and theory. Given two graphs G1 = (V1, E1) and G2 = (V2, E2), where V1 and V2 are identical or largely overlap under an unknown permutation π∗, graph matching seeks to recover the correct mapping π∗. In this paper, we exploit degree information, which had previously been used only for matching noiseless graphs and perfectly overlapping Erdős–Rényi random graphs. We study graph matching for partially overlapping graphs and stochastic block models, which are more useful for tackling real-life problems. We propose the edge-exploited degree profile graph matching method and two refined variations. We conduct a thorough analysis of our proposed methods' performance in a range of challenging scenarios, including a coauthorship data set and a zebrafish neuron activity data set. Numerical results show that our methods outperform state-of-the-art methods. The algorithms are implemented in the R (A language and environment for statistical computing, R Foundation for Statistical Computing, Vienna, 2020) package GMPro (GMPro: graph matching with degree profiles, 2020).
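The basic degree-profile idea behind this line of work can be sketched as follows. Each vertex is summarized by the empirical distribution of its neighbors' degrees. Vertices across the two graphs are then paired by the closest profiles. This is a simplified illustration, not the edge-exploited method or the GMPro API. The Wasserstein-1 distance and the greedy pairing (in place of an optimal assignment) are assumed choices, all function names are hypothetical, and the sketch assumes no isolated vertices.

```python
import numpy as np


def degree_profiles(A):
    # Sorted neighbor degrees of every vertex (assumes no isolated vertices).
    deg = A.sum(axis=1).astype(int)
    return [np.sort(deg[A[i] > 0]) for i in range(A.shape[0])], int(deg.max())


def profile_cdf(profile, max_deg):
    # Empirical CDF of the neighbor degrees evaluated at 0, 1, ..., max_deg.
    grid = np.arange(max_deg + 1)
    return np.searchsorted(profile, grid, side="right") / len(profile)


def w1_distance(p, q, max_deg):
    # For integer-valued distributions, W1 = sum_t |F_p(t) - F_q(t)|.
    return np.abs(profile_cdf(p, max_deg) - profile_cdf(q, max_deg)).sum()


def match_by_degree_profiles(A1, A2):
    prof1, m1 = degree_profiles(A1)
    prof2, m2 = degree_profiles(A2)
    max_deg = max(m1, m2)
    n2 = len(prof2)
    dist = np.array([[w1_distance(p, q, max_deg) for q in prof2] for p in prof1])
    # Greedy assignment: repeatedly take the closest pair of unmatched vertices.
    matching, used1, used2 = {}, set(), set()
    for flat in np.argsort(dist, axis=None, kind="stable"):
        i, j = divmod(int(flat), n2)
        if i not in used1 and j not in used2:
            matching[i] = j
            used1.add(i)
            used2.add(j)
    return matching
```

When G2 is an exact relabeled copy of G1 and every vertex has a distinct degree profile, the zero-distance pairs are exactly the true correspondences, so the greedy step recovers π∗. With noise or partial overlap, profiles only approximately agree. That is the harder setting the abstract targets.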
Development of a Personalized Pharmacologic Treatment Repository for Bronchial Asthma Based on the 2018 Guideline for the Diagnosis and Management of Bronchial Asthma in Primary Care (Practice Edition)
Bronchial asthma is a chronic inflammatory disease with high heterogeneity, polygenic inheritance, complex etiology, and many complications. The effectiveness of prevention and treatment for a bronchial asthma patient often depends on whether the patient has received personalized health management. To align with international standards of bronchial asthma management, the updates in the 2018 Guideline for the Diagnosis and Management of Bronchial Asthma in Primary Care (Practice Edition, hereinafter referred to as the 2018 Guideline) incorporate early intervention and optimized medication regimens and emphasize a standardized management approach. To promote personalized pharmacologic management of bronchial asthma in primary care, pharmacists have used information technology to develop a search engine that collects medication information related to bronchial asthma and integrates "pre-judgment, early warning and prediction" functions. The engine follows the pharmacologic treatment path "initial treatment, long-term treatment, degradation principle" put forward in the 2018 Guideline and is based on "one factory, one drug, one specification" individualized instructions. It provides online pre-, mid- and post-diagnosis pharmaceutical services for physicians, as well as personalized pharmacologic monitoring and management services for bronchial asthma patients in the community.
Bayesian Variational Inference in Keyword Identification and Multiple Instance Classification
This dissertation investigates (1) Variational Bayesian Semi-supervised Keyword Extraction and (2) Variational Bayesian Multimodal Multiple Instance Classification.
The expansion of textual data from sources such as online product reviews and scholarly publications on scientific discoveries has created a demand for extracting succinct yet comprehensive information. As a result, considerable effort has been devoted in recent years to developing novel methodologies for keyword extraction. Although many methods have been proposed to extract keywords automatically in both unsupervised and fully supervised settings, how to make effective use of partially observed keywords, such as author-specified keywords, remains under-explored. In Chapter 1, we propose a novel variational Bayesian semi-supervised (VBSS) keyword extraction approach, built on a recent Bayesian semi-supervised (BSS) technique that uses the information from a small set of known keywords to identify previously undetected ones. Our proposed VBSS method greatly enhances the computational efficiency of BSS via mean-field variational inference coupled with data augmentation, which yields closed-form solutions at each step of the optimization process. Further, our numerical results show that VBSS offers enhanced accuracy for long texts and improved control over false discovery rates compared with a range of state-of-the-art keyword extraction methods.
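Mean-field variational inference combined with data augmentation yields closed-form updates at every step. A standard generic example is Bayesian logistic regression with Pólya-Gamma latent variables, sketched below. It is not the VBSS model, and the function name, Gaussian prior, and stopping rule are assumptions. With q(β) = N(m, S) and q(ω_i) = PG(1, c_i), each coordinate update has an explicit formula, so no inner numerical optimization is needed.

```python
import numpy as np


def vb_logistic_pg(X, y, prior_var=10.0, n_iter=100, tol=1e-8):
    """Mean-field VI for Bayesian logistic regression with Polya-Gamma augmentation.

    Model (illustrative): y_i ~ Bernoulli(sigmoid(x_i^T beta)),
    beta ~ N(0, prior_var * I). Variational family: q(beta) = N(m, S),
    q(omega_i) = PG(1, c_i). Returns the variational mean and covariance.
    """
    n, p = X.shape
    kappa = np.asarray(y, dtype=float) - 0.5
    prior_prec = np.eye(p) / prior_var
    c = np.ones(n)
    m = np.zeros(p)
    for _ in range(n_iter):
        # E[omega_i] under PG(1, c_i) equals tanh(c_i / 2) / (2 c_i).
        e_omega = np.tanh(c / 2.0) / (2.0 * c)
        # Closed-form Gaussian update for q(beta).
        S = np.linalg.inv(X.T @ (e_omega[:, None] * X) + prior_prec)
        m_new = S @ (X.T @ kappa)
        # Closed-form update for q(omega_i): c_i^2 = x_i^T (S + m m^T) x_i.
        c = np.sqrt(np.einsum("ij,jk,ik->i", X, S + np.outer(m_new, m_new), X))
        converged = np.max(np.abs(m_new - m)) < tol
        m = m_new
        if converged:
            break
    return m, S
```

Each iteration costs one p × p solve, which illustrates why this style of augmentation is much cheaper than sampling-based inference.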
In Chapter 2, we apply mean-field variational inference to multiple instance learning (MIL). In MIL, objects are represented by bags of instances. All instances share the same feature set, but each has its own feature values. MIL aims to train models that predict bag-level outcomes from these instances, making it a weakly supervised approach because instance-level labels are unavailable. While MIL methods for binary classification are abundant, they often cannot identify which specific instances drive bag labels and offer little interpretability. Xiong et al. (2024) introduced MICProB, a Bayesian multiple instance classification (MIC) algorithm that addresses these issues. However, MICProB is computationally intensive and best suited for unimodal instances. To overcome these limitations, we propose a novel variational Bayesian multimodal MIC (vMMIC) algorithm. vMMIC handles diverse instance types and significantly improves computational efficiency through Bayesian variational inference combined with data augmentation. We benchmark vMMIC against MICProB and many other MIC approaches on both simulated and real-world data. The results demonstrate vMMIC's superior performance, computational efficiency, and interpretability.
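The bag/instance structure of MIL can be made concrete with a small sketch. It uses the common "standard MIL assumption" that a bag is positive if at least one instance is positive. This is not MICProB or vMMIC. The logistic instance model, the noisy-OR aggregation, and the weights `w`, `b` are hypothetical and simply assumed to be given. The sketch also shows how instance-level probabilities can point to the instance that most likely drives a bag's label.

```python
import numpy as np


def bag_probabilities(bags, w, b):
    """Noisy-OR bag probabilities under the standard MIL assumption.

    bags: list of (n_instances, p) arrays sharing the same p features.
    w, b: hypothetical instance-level logistic weights (assumed given).
    P(bag positive) = 1 - prod_j (1 - p_j), where p_j is instance j's probability.
    """
    probs, drivers = [], []
    for instances in bags:
        p_inst = 1.0 / (1.0 + np.exp(-(instances @ w + b)))
        probs.append(1.0 - np.prod(1.0 - p_inst))
        # Index of the instance with the highest positive probability.
        drivers.append(int(np.argmax(p_inst)))
    return np.array(probs), drivers
```

For example, a bag containing a single strongly positive instance gets a bag probability near 1, and `drivers` reports that instance's index. This is the kind of instance-level interpretability the paragraph describes.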
TurboID screening of ApxI toxin interactants identifies host proteins involved in Actinobacillus pleuropneumoniae-induced apoptosis of immortalized porcine alveolar macrophages
Actinobacillus pleuropneumoniae (APP) is a gram-negative pathogenic bacterium responsible for porcine contagious pleuropneumonia (PCP), which can cause porcine necrotizing and hemorrhagic pleuropneumonia. The Actinobacillus pleuropneumoniae RTX toxin (Apx) is an APP virulence factor. APP secretes a total of four Apx toxins, among which ApxI exhibits strong hemolytic activity and cytotoxicity, causing lysis of porcine erythrocytes and apoptosis of porcine alveolar macrophages. However, the protein interaction network between this toxin and host cells is still poorly understood. When fused to a protein of interest, TurboID biotinylates nearby endogenous proteins, allowing specific interactors and local proteomes to be labeled. We applied the TurboID enzyme-catalyzed proximity labeling method to identify and study host proteins in immortalized porcine alveolar macrophage (iPAM) cells that interact with the APP exotoxin ApxI. His-tagged TurboID-ApxIA and TurboID recombinant proteins were expressed and purified. By mass spectrometry, 318 unique interacting proteins were identified in the TurboID-ApxIA-treated group. Among them, only one membrane protein, caveolin-1 (CAV1), was identified. A co-immunoprecipitation assay confirmed that CAV1 can interact with ApxIA. In addition, overexpression and RNA interference experiments revealed that CAV1 was involved in ApxI toxin-induced apoptosis of iPAM cells. This study provides first-hand information about the proteome of iPAM cells interacting with the APP ApxI toxin through the TurboID proximity labeling system and identifies a new host membrane protein involved in this interaction. These results lay a theoretical foundation for the clinical treatment of PCP.
A Novel Splicing Mutation Leading to Wiskott-Aldrich Syndrome from a Family
Wiskott-Aldrich syndrome (WAS) is a rare X-linked recessive genetic disease characterized by clinical symptoms such as eczema, thrombocytopenia with small platelets, immune deficiency, and susceptibility to autoimmune diseases and malignant tumors. The disease is caused by mutations in the WAS gene, which encodes the WAS protein (WASP). The location and type of WAS mutations and the expression level of WASP are strongly correlated with patients' clinical manifestations. We found a novel mutation in the WAS gene (c.931+5G>C) that affected splicing, producing three abnormal mRNA transcripts and resulting in an abnormally truncated WASP. This mutation reduced, but did not eliminate, the normal WASP population, causing X-linked thrombocytopenia (XLT) with mild clinical manifestations. Our findings reveal the pathogenic mechanism of this mutation.