7 research outputs found
How Many Communities Are There?
Stochastic blockmodels and variants thereof are among the most widely used
approaches to community detection for social networks and relational data. A
stochastic blockmodel partitions the nodes of a network into disjoint sets,
called communities. The approach is inherently related to clustering with
mixture models; and raises a similar model selection problem for the number of
communities. The Bayesian information criterion (BIC) is a popular solution,
however, for stochastic blockmodels, the conditional independence assumption
given the communities of the endpoints among different edges is usually
violated in practice. In this regard, we propose composite likelihood BIC
(CL-BIC) to select the number of communities, and we show it is robust against
possible misspecifications in the underlying stochastic blockmodel assumptions.
We derive the requisite methodology and illustrate the approach using both
simulated and real data. Supplementary materials containing the relevant
computer code are available online.Comment: 26 pages, 3 figure
SIS: An R Package for Sure Independence Screening in Ultrahigh-Dimensional Statistical Models
We revisit sure independence screening procedures for variable selection in generalized linear models and the Cox proportional hazards model. Through the publicly available R package SIS, we provide a unified environment to carry out variable selection using iterative sure independence screening (ISIS) and all of its variants. For the regularization steps in the ISIS recruiting process, available penalties include the LASSO, SCAD, and MCP while the implemented variants for the screening steps are sample splitting, data-driven thresholding, and combinations thereof. Performance of these feature selection techniques is investigated by means of real and simulated data sets, where we find considerable improvements in terms of model selection and computational time between our algorithms and traditional penalized pseudo-likelihood methods applied directly to the full set of covariates
Antimicrobial resistance among migrants in Europe: a systematic review and meta-analysis
BACKGROUND: Rates of antimicrobial resistance (AMR) are rising globally and there is concern that increased migration is contributing to the burden of antibiotic resistance in Europe. However, the effect of migration on the burden of AMR in Europe has not yet been comprehensively examined. Therefore, we did a systematic review and meta-analysis to identify and synthesise data for AMR carriage or infection in migrants to Europe to examine differences in patterns of AMR across migrant groups and in different settings. METHODS: For this systematic review and meta-analysis, we searched MEDLINE, Embase, PubMed, and Scopus with no language restrictions from Jan 1, 2000, to Jan 18, 2017, for primary data from observational studies reporting antibacterial resistance in common bacterial pathogens among migrants to 21 European Union-15 and European Economic Area countries. To be eligible for inclusion, studies had to report data on carriage or infection with laboratory-confirmed antibiotic-resistant organisms in migrant populations. We extracted data from eligible studies and assessed quality using piloted, standardised forms. We did not examine drug resistance in tuberculosis and excluded articles solely reporting on this parameter. We also excluded articles in which migrant status was determined by ethnicity, country of birth of participants' parents, or was not defined, and articles in which data were not disaggregated by migrant status. Outcomes were carriage of or infection with antibiotic-resistant organisms. We used random-effects models to calculate the pooled prevalence of each outcome. The study protocol is registered with PROSPERO, number CRD42016043681. FINDINGS: We identified 2274 articles, of which 23 observational studies reporting on antibiotic resistance in 2319 migrants were included. The pooled prevalence of any AMR carriage or AMR infection in migrants was 25·4% (95% CI 19·1-31·8; I2 =98%), including meticillin-resistant Staphylococcus aureus (7·8%, 4·8-10·7; I2 =92%) and antibiotic-resistant Gram-negative bacteria (27·2%, 17·6-36·8; I2 =94%). The pooled prevalence of any AMR carriage or infection was higher in refugees and asylum seekers (33·0%, 18·3-47·6; I2 =98%) than in other migrant groups (6·6%, 1·8-11·3; I2 =92%). The pooled prevalence of antibiotic-resistant organisms was slightly higher in high-migrant community settings (33·1%, 11·1-55·1; I2 =96%) than in migrants in hospitals (24·3%, 16·1-32·6; I2 =98%). We did not find evidence of high rates of transmission of AMR from migrant to host populations. INTERPRETATION: Migrants are exposed to conditions favouring the emergence of drug resistance during transit and in host countries in Europe. Increased antibiotic resistance among refugees and asylum seekers and in high-migrant community settings (such as refugee camps and detention facilities) highlights the need for improved living conditions, access to health care, and initiatives to facilitate detection of and appropriate high-quality treatment for antibiotic-resistant infections during transit and in host countries. Protocols for the prevention and control of infection and for antibiotic surveillance need to be integrated in all aspects of health care, which should be accessible for all migrant groups, and should target determinants of AMR before, during, and after migration. FUNDING: UK National Institute for Health Research Imperial Biomedical Research Centre, Imperial College Healthcare Charity, the Wellcome Trust, and UK National Institute for Health Research Health Protection Research Unit in Healthcare-associated Infections and Antimictobial Resistance at Imperial College London
Surgical site infection after gastrointestinal surgery in high-income, middle-income, and low-income countries: a prospective, international, multicentre cohort study
Background: Surgical site infection (SSI) is one of the most common infections associated with health care, but its importance as a global health priority is not fully understood. We quantified the burden of SSI after gastrointestinal surgery in countries in all parts of the world.
Methods: This international, prospective, multicentre cohort study included consecutive patients undergoing elective or emergency gastrointestinal resection within 2-week time periods at any health-care facility in any country. Countries with participating centres were stratified into high-income, middle-income, and low-income groups according to the UN's Human Development Index (HDI). Data variables from the GlobalSurg 1 study and other studies that have been found to affect the likelihood of SSI were entered into risk adjustment models. The primary outcome measure was the 30-day SSI incidence (defined by US Centers for Disease Control and Prevention criteria for superficial and deep incisional SSI). Relationships with explanatory variables were examined using Bayesian multilevel logistic regression models. This trial is registered with ClinicalTrials.gov, number NCT02662231.
Findings: Between Jan 4, 2016, and July 31, 2016, 13 265 records were submitted for analysis. 12 539 patients from 343 hospitals in 66 countries were included. 7339 (58·5%) patient were from high-HDI countries (193 hospitals in 30 countries), 3918 (31·2%) patients were from middle-HDI countries (82 hospitals in 18 countries), and 1282 (10·2%) patients were from low-HDI countries (68 hospitals in 18 countries). In total, 1538 (12·3%) patients had SSI within 30 days of surgery. The incidence of SSI varied between countries with high (691 [9·4%] of 7339 patients), middle (549 [14·0%] of 3918 patients), and low (298 [23·2%] of 1282) HDI (p < 0·001). The highest SSI incidence in each HDI group was after dirty surgery (102 [17·8%] of 574 patients in high-HDI countries; 74 [31·4%] of 236 patients in middle-HDI countries; 72 [39·8%] of 181 patients in low-HDI countries). Following risk factor adjustment, patients in low-HDI countries were at greatest risk of SSI (adjusted odds ratio 1·60, 95% credible interval 1·05–2·37; p=0·030). 132 (21·6%) of 610 patients with an SSI and a microbiology culture result had an infection that was resistant to the prophylactic antibiotic used. Resistant infections were detected in 49 (16·6%) of 295 patients in high-HDI countries, in 37 (19·8%) of 187 patients in middle-HDI countries, and in 46 (35·9%) of 128 patients in low-HDI countries (p < 0·001).
Interpretation: Countries with a low HDI carry a disproportionately greater burden of SSI than countries with a middle or high HDI and might have higher rates of antibiotic resistance. In view of WHO recommendations on SSI prevention that highlight the absence of high-quality interventional research, urgent, pragmatic, randomised trials based in LMICs are needed to assess measures aiming to reduce this preventable complication
Recommended from our members
Advances in Model Selection Techniques with Applications to Statistical Network Analysis and Recommender Systems
This dissertation focuses on developing novel model selection techniques, the process by which a statistician selects one of a number of competing models of varying dimensions, under an array of different statistical assumptions on observed data. Traditionally, two main reasons have been advocated by researchers for performing model selection strategies over classical maximum likelihood estimates (MLEs). The first reason is prediction accuracy, where by shrinking or setting to zero some model parameters, one sacrifices the unbiasedness of MLEs for a reduced variance, which in turn leads to an overall improvement in predictive performance. The second reason relates to interpretability of the selected models in the presence of a large number of predictors, where in order to obtain a parsimonious representation exhibiting the relationship between the response and covariates, we are willing to sacrifice some of the smaller details brought in by spurious predictors.
In the first part of this work, we revisit the family of variable selection techniques known as sure independence screening procedures for generalized linear models and the Cox proportional hazards model. After clever combination of some of its most powerful variants, we propose new extensions based on the idea of sample splitting, data-driven thresholding, and combinations thereof. A publicly available package developed in the R statistical software demonstrates considerable improvements in terms of model selection and competitive computational time between our enhanced variable selection procedures and traditional penalized likelihood methods applied directly to the full set of covariates.
Next, we develop model selection techniques within the framework of statistical network analysis for two frequent problems arising in the context of stochastic blockmodels: community number selection and change-point detection. In the second part of this work, we propose a composite likelihood based approach for selecting the number of communities in stochastic blockmodels and its variants, with robustness consideration against possible misspecifications in the underlying conditional independence assumptions of the stochastic blockmodel. Several simulation studies, as well as two real data examples, demonstrate the superiority of our composite likelihood approach when compared to the traditional Bayesian Information Criterion or variational Bayes solutions. In the third part of this thesis, we extend our analysis on static network data to the case of dynamic stochastic blockmodels, where our model selection task is the segmentation of a time-varying network into temporal and spatial components by means of a change-point detection hypothesis testing problem. We propose a corresponding test statistic based on the idea of data aggregation across the different temporal layers through kernel-weighted adjacency matrices computed before and after each candidate change-point, and illustrate our approach on synthetic data and the Enron email corpus.
The matrix completion problem consists in the recovery of a low-rank data matrix based on a small sampling of its entries. In the final part of this dissertation, we extend prior work on nuclear norm regularization methods for matrix completion by incorporating a continuum of penalty functions between the convex nuclear norm and nonconvex rank functions. We propose an algorithmic framework for computing a family of nonconvex penalized matrix completion problems with warm-starts, and present a systematic study of the resulting spectral thresholding operators. We demonstrate that our proposed nonconvex regularization framework leads to improved model selection properties in terms of finding low-rank solutions with better predictive performance on a wide range of synthetic data and the famous Netflix data recommender system
Recommended from our members
Advances in Model Selection Techniques with Applications to Statistical Network Analysis and Recommender Systems
This dissertation focuses on developing novel model selection techniques, the process by which a statistician selects one of a number of competing models of varying dimensions, under an array of different statistical assumptions on observed data. Traditionally, two main reasons have been advocated by researchers for performing model selection strategies over classical maximum likelihood estimates (MLEs). The first reason is prediction accuracy, where by shrinking or setting to zero some model parameters, one sacrifices the unbiasedness of MLEs for a reduced variance, which in turn leads to an overall improvement in predictive performance. The second reason relates to interpretability of the selected models in the presence of a large number of predictors, where in order to obtain a parsimonious representation exhibiting the relationship between the response and covariates, we are willing to sacrifice some of the smaller details brought in by spurious predictors.
In the first part of this work, we revisit the family of variable selection techniques known as sure independence screening procedures for generalized linear models and the Cox proportional hazards model. After clever combination of some of its most powerful variants, we propose new extensions based on the idea of sample splitting, data-driven thresholding, and combinations thereof. A publicly available package developed in the R statistical software demonstrates considerable improvements in terms of model selection and competitive computational time between our enhanced variable selection procedures and traditional penalized likelihood methods applied directly to the full set of covariates.
Next, we develop model selection techniques within the framework of statistical network analysis for two frequent problems arising in the context of stochastic blockmodels: community number selection and change-point detection. In the second part of this work, we propose a composite likelihood based approach for selecting the number of communities in stochastic blockmodels and its variants, with robustness consideration against possible misspecifications in the underlying conditional independence assumptions of the stochastic blockmodel. Several simulation studies, as well as two real data examples, demonstrate the superiority of our composite likelihood approach when compared to the traditional Bayesian Information Criterion or variational Bayes solutions. In the third part of this thesis, we extend our analysis on static network data to the case of dynamic stochastic blockmodels, where our model selection task is the segmentation of a time-varying network into temporal and spatial components by means of a change-point detection hypothesis testing problem. We propose a corresponding test statistic based on the idea of data aggregation across the different temporal layers through kernel-weighted adjacency matrices computed before and after each candidate change-point, and illustrate our approach on synthetic data and the Enron email corpus.
The matrix completion problem consists in the recovery of a low-rank data matrix based on a small sampling of its entries. In the final part of this dissertation, we extend prior work on nuclear norm regularization methods for matrix completion by incorporating a continuum of penalty functions between the convex nuclear norm and nonconvex rank functions. We propose an algorithmic framework for computing a family of nonconvex penalized matrix completion problems with warm-starts, and present a systematic study of the resulting spectral thresholding operators. We demonstrate that our proposed nonconvex regularization framework leads to improved model selection properties in terms of finding low-rank solutions with better predictive performance on a wide range of synthetic data and the famous Netflix data recommender system