339 research outputs found

Massively Parallel Algorithms for the Stochastic Block Model

Author: Li Zelin
Peng Pan
Zhu Xianbin
Publication venue: LIPIcs - Leibniz International Proceedings in Informatics. 31st Annual European Symposium on Algorithms (ESA 2023)
Publication date: 01/01/2023

Learning the community structure of a large-scale graph is a fundamental problem in machine learning, computer science and statistics. Among others, the Stochastic Block Model (SBM) serves as a canonical model for community detection and clustering, and the Massively Parallel Computation (MPC) model is a mathematical abstraction of real-world parallel computing systems, which provides a powerful computational framework for handling large-scale datasets. We study the problem of exactly recovering the communities in a graph generated from the SBM in the MPC model. Specifically, given kn vertices that are partitioned into k equal-sized clusters (i.e., each has size n), a graph on these kn vertices is randomly generated such that each pair of vertices is connected with probability p if they are in the same cluster and with probability q if not, where p > q > 0. We give MPC algorithms for the SBM in the (very general) s-space MPC model, where each machine is guaranteed to have memory s = Ω(log n). Under the condition that (p-q)/√p ≥ Ω̃(k^{1/2} n^{-1/2+1/(2(r-1))}) for any integer r ∈ [3, O(log n)], our first algorithm exactly recovers all the k clusters in O(kr log_s n) rounds using Õ(m) total space, or in O(r log_s n) rounds using Õ(km) total space. If (p-q)/√p ≥ Ω̃(k^{3/4} n^{-1/4}), our second algorithm achieves O(log_s n) rounds and Õ(m) total space complexity. Both algorithms significantly improve upon a recent result of Cohen-Addad et al. [PODC'22], who gave algorithms that only work in the sublinear space MPC model, where each machine has local memory s = O(n^δ) for some constant δ > 0, with a much stronger condition on p, q, k. Our algorithms are based on collecting the r-step neighborhood of each vertex and comparing the difference of some statistical information generated from the local neighborhoods for each pair of vertices. To implement the clustering algorithms in parallel, we present efficient approaches for implementing some basic graph operations in the s-space MPC model.

Dagstuhl Research Online Publication Server
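
The generative model described in the abstract above (kn vertices, k equal-sized clusters, edge probability p within a cluster and q across clusters) can be illustrated with a minimal sequential sketch. This is not the authors' MPC algorithm and does not attempt recovery; it only samples a graph from the SBM. The function name `sample_sbm` and its parameters are hypothetical, chosen for illustration.

```python
import random
from itertools import combinations


def sample_sbm(k, n, p, q, seed=None):
    """Sample a graph from the SBM with k clusters of size n each.

    Hypothetical helper: vertices are 0..k*n-1 and vertex v is placed in
    cluster v // n. Each pair is joined independently with probability p
    (same cluster) or q (different clusters).
    """
    rng = random.Random(seed)
    labels = [v // n for v in range(k * n)]
    edges = []
    for u, v in combinations(range(k * n), 2):
        prob = p if labels[u] == labels[v] else q
        if rng.random() < prob:
            edges.append((u, v))
    return labels, edges
```

The quadratic loop over all pairs is only practical for small graphs; it is meant to make the model concrete, not to reflect how large instances would be generated or processed.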

Massively Parallel Algorithms for the Stochastic Block Model

Author: Li Zelin
Peng Pan
Zhu Xianbin
Publication date: 14/08/2023

Learning the community structure of a large-scale graph is a fundamental problem in machine learning, computer science and statistics. We study the problem of exactly recovering the communities in a graph generated from the Stochastic Block Model (SBM) in the Massively Parallel Computation (MPC) model. Specifically, given $kn$ vertices that are partitioned into $k$ equal-sized clusters (i.e., each has size $n$), a graph on these $kn$ vertices is randomly generated such that each pair of vertices is connected with probability $p$ if they are in the same cluster and with probability $q$ if not, where $p > q > 0$. We give MPC algorithms for the SBM in the (very general) $s$-space MPC model, where each machine has memory $s=\Omega(\log n)$. Under the condition that $\frac{p-q}{\sqrt{p}}\geq \tilde{\Omega}(k^{\frac12}n^{-\frac12+\frac{1}{2(r-1)}})$ for any integer $r\in [3,O(\log n)]$, our first algorithm exactly recovers all the $k$ clusters in $O(kr\log_s n)$ rounds using $\tilde{O}(m)$ total space, or in $O(r\log_s n)$ rounds using $\tilde{O}(km)$ total space. If $\frac{p-q}{\sqrt{p}}\geq \tilde{\Omega}(k^{\frac34}n^{-\frac14})$, our second algorithm achieves $O(\log_s n)$ rounds and $\tilde{O}(m)$ total space complexity. Both algorithms significantly improve upon a recent result of Cohen-Addad et al. [PODC'22], who gave algorithms that only work in the sublinear space MPC model, where each machine has local memory $s=O(n^{\delta})$ for some constant $\delta>0$, with a much stronger condition on $p,q,k$. Our algorithms are based on collecting the $r$-step neighborhood of each vertex and comparing the difference of some statistical information generated from the local neighborhoods for each pair of vertices. To implement the clustering algorithms in parallel, we present efficient approaches for implementing some basic graph operations in the $s$-space MPC model.

arXiv.org e-Print Archive

Security enhancement for NOMA-UAV networks

Author: Chen Yunfei
Li Yanxin
Lu Weidang
Wang Jingjing
Wang Xianbin
Zhang Shun
Zhao Nan
Publication venue: Institute of Electrical and Electronics Engineers (IEEE)
Publication date: 01/04/2020

Owing to their distinctive merits, non-orthogonal multiple access (NOMA) techniques have been utilized in unmanned aerial vehicle (UAV) enabled wireless base stations to provide effective coverage for terrestrial users. However, the security of NOMA-UAV systems remains a challenge due to the line-of-sight air-to-ground channels and the higher transmission power of weaker users in NOMA. In this paper, we propose two schemes to guarantee secure transmission in UAV-NOMA networks. When only one user requires secure transmission, we derive the hovering position for the UAV and the power allocation that meet the rate threshold of the secure user while maximizing the sum rate of the remaining users. This effectively disrupts eavesdropping on the secure user. When multiple users require secure transmission, we further take advantage of beamforming via multiple antennas at the UAV to guarantee their secure transmission. Because this problem is non-convex, we convert it into a convex one and solve it iteratively using second-order cone programming. Finally, simulation results are provided to show the effectiveness of the proposed schemes.

Warwick Research Archives Portal Repository

JOINTLY MODELING CONTINUOUS AND BINARY OUTCOMES FOR BOOLEAN OUTCOMES: AN APPLICATION TO MODELING HYPERTENSION

Author: Caffo Brian S.
Li Xianbin
Stuart Elizabeth
Publication venue: Collection of Biostatistics Research Archive
Publication date: 19/02/2008

Binary outcomes defined by logical (Boolean) "and" or "or" operations on original continuous and discrete outcomes arise commonly in medical diagnoses and epidemiological research. In this manuscript, we consider applying the "or" operator to two continuous variables above a threshold and a binary variable, a setting that occurs frequently in the modeling of hypertension. Rather than modeling the resulting composite outcome defined by the logical operator, we present a method that models the original outcomes, thus utilizing all information in the data, yet continues to yield conclusions on the composite scale. A stratified propensity score adjustment is proposed to account for confounding variables. A Mantel-Haenszel style combination of strata-specific odds ratios is proposed to evaluate a risk factor. The benefits of the proposed approach include easy handling of missing data and the ability to estimate the correlations between the original outcomes. We emphasize that the model retains the ability to evaluate odds ratios on the simpler and more easily interpreted composite scale. The approach is evaluated by Monte Carlo simulations. An example of the analysis of the impact of sleep disordered breathing on a standard composite hypertension measure, based on blood pressure measurements and medication usage, is included.

Collection Of Biostatistics Research Archive

ON THE POTENTIAL FOR ILL-LOGIC WITH LOGICALLY DEFINED OUTCOMES

Author: Caffo Brian S.
Li Xianbin
Scharfstein Daniel O.
Publication venue: Collection of Biostatistics Research Archive
Publication date: 07/07/2006

Logically defined outcomes are commonly used in medical diagnoses and epidemiological research. When missing values in the original outcomes exist, the method of handling the missingness can have unintended consequences, even if the original outcomes are missing completely at random. Complicating the issue is that the default behaviors of standard statistical packages yield different results. In this paper, we consider two binary original outcomes, which are missing completely at random. For estimating the prevalence of a logically defined "or" outcome, we discuss the properties of four estimators: the complete case estimator, the all-available case estimator, the maximum likelihood estimator (MLE), and a moment-based estimator. With the exception of the all-available case estimator, the estimators are consistent. A simulation study is conducted to evaluate the finite sample performance of the four estimators, and an analysis of hypertension data from the Sleep Heart Health Study is presented.

Collection Of Biostatistics Research Archive
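
The contrast between the complete case and all-available case estimators in the abstract above can be sketched with a small simulation. The definitions used here are assumptions, since the abstract does not spell them out: "complete case" is taken to mean using only subjects with both original outcomes observed, and "all-available case" is taken to mean using every subject whose "or" value can be determined from what is observed (so an observed 1 with a missing partner counts as 1, mirroring three-valued logic in some statistical software). All function names are hypothetical.

```python
import random


def or_with_missing(y1, y2):
    """Three-valued 'or'; None marks a missing value."""
    if y1 == 1 or y2 == 1:
        return 1          # determined even if the other value is missing
    if y1 is None or y2 is None:
        return None       # cannot be determined
    return 0


def complete_case(pairs):
    """Prevalence of 'or' among subjects with both outcomes observed."""
    vals = [1 if (a or b) else 0
            for a, b in pairs if a is not None and b is not None]
    return sum(vals) / len(vals)


def all_available_case(pairs):
    """Prevalence of 'or' among subjects whose 'or' value is determined."""
    vals = [o for o in (or_with_missing(a, b) for a, b in pairs)
            if o is not None]
    return sum(vals) / len(vals)


def simulate(n, p1, p2, miss, seed):
    """Independent binary outcomes, each missing completely at random."""
    rng = random.Random(seed)

    def draw(p):
        y = 1 if rng.random() < p else 0
        return None if rng.random() < miss else y

    return [(draw(p1), draw(p2)) for _ in range(n)]
```

Under these assumed definitions, calling `complete_case(simulate(20000, 0.3, 0.2, 0.3, seed=1))` should land close to the true prevalence 1 - 0.7 × 0.8 = 0.44, while `all_available_case` on the same data tends to overestimate it, because subjects with one observed 0 and one missing value are dropped while those with one observed 1 are kept.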