Modeling and Analysing Respondent Driven Sampling as a Counting Process
Respondent-driven sampling (RDS) is an approach to sampling design and
analysis which utilizes the networks of social relationships that connect
members of the target population, using chain-referral methods to facilitate
sampling. RDS typically leads to biased sampling, favoring participants with
many acquaintances. Naive estimates, such as the sample average, which are
uncorrected for the sampling bias, will themselves be biased. To compensate for
this bias, current methodology suggests inverse-degree weighting, where the
"degree" is the number of acquaintances. This stems from the fundamental RDS
assumption that the probability of sampling an individual is proportional to
their degree. Since this assumption is tenuous at best, we propose to incorporate
the additional information encapsulated in the time of recruitment into a
model-based inference framework for RDS. This information is typically
collected by researchers, but ignored. We adapt methods developed for inference
in epidemic processes to estimate the population size, degree counts and
frequencies. While providing valuable information in themselves, these
quantities ultimately serve to debias other estimators, such as a disease's
prevalence. A fundamental advantage of our approach is that, being model-based,
it makes all assumptions of the data-generating process explicit. This enables
verification of the assumptions, maximum likelihood estimation, extension with
covariates, and model selection. We develop asymptotic theory, proving
consistency and asymptotic normality properties. We further compare these
estimators to the standard inverse-degree weighting through simulations, and
using real-world data. In both cases we find our estimators to outperform
current methods. The likelihood problem in the model we present is convex, and
thus efficiently solvable. We implement these estimators in an R package,
chords, available on CRAN.
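As a rough illustration of the inverse-degree weighting that the abstract above describes as current practice, the following Python sketch computes a weighted prevalence estimate. It is not the model-based estimator proposed in the paper and does not use the chords package; the function name and the toy data are hypothetical.

```python
from typing import Sequence


def inverse_degree_weighted_mean(values: Sequence[float],
                                 degrees: Sequence[int]) -> float:
    """Weighted mean in which each respondent gets weight 1/degree.

    Assumes the sampling probability is proportional to degree, which is
    the assumption the abstract calls tenuous.
    """
    if len(values) != len(degrees):
        raise ValueError("values and degrees must have the same length")
    if any(d <= 0 for d in degrees):
        raise ValueError("every degree must be positive")
    weights = [1.0 / d for d in degrees]
    return sum(w * v for w, v in zip(weights, values)) / sum(weights)


# Hypothetical sample: two well-connected infected respondents and
# two poorly connected uninfected respondents.
infected = [1, 1, 0, 0]
degrees = [10, 10, 2, 2]

naive = sum(infected) / len(infected)
weighted = inverse_degree_weighted_mean(infected, degrees)
print(f"naive: {naive:.3f}, inverse-degree weighted: {weighted:.3f}")
```

In this toy sample the high-degree respondents are down-weighted, so the weighted estimate falls below the naive sample average.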
Evolutionary Interactions between N-Linked Glycosylation Sites in the HIV-1 Envelope
The addition of asparagine (N)-linked polysaccharide chains (i.e., glycans) to the gp120 and gp41 glycoproteins of human immunodeficiency virus type 1 (HIV-1) envelope is not only required for correct protein folding, but also may provide protection against neutralizing antibodies as a “glycan shield.” As a result, strong host-specific selection is frequently associated with codon positions where nonsynonymous substitutions can create or disrupt potential N-linked glycosylation sites (PNGSs). Moreover, empirical data suggest that the individual contribution of PNGSs to the neutralization sensitivity or infectivity of HIV-1 may be critically dependent on the presence or absence of other PNGSs in the envelope sequence. Here we evaluate how glycan–glycan interactions have shaped the evolution of HIV-1 envelope sequences by analyzing the distribution of PNGSs in a large-sequence alignment. Using a “covarion”-type phylogenetic model, we find that the rates at which individual PNGSs are gained or lost vary significantly over time, suggesting that the selective advantage of having a PNGS may depend on the presence or absence of other PNGSs in the sequence. Consequently, we identify specific interactions between PNGSs in the alignment using a new paired-character phylogenetic model of evolution, and a Bayesian graphical model. Despite the fundamental differences between these two methods, several interactions are jointly identified by both. Mapping these interactions onto a structural model of HIV-1 gp120 reveals that negative (exclusive) interactions occur significantly more often between colocalized glycans, while positive (inclusive) interactions are restricted to more distant glycans. Our results imply that the adaptive repertoire of alternative configurations in the HIV-1 glycan shield is limited by functional interactions between the N-linked glycans. 
This represents a potential vulnerability of rapidly evolving HIV-1 populations that may provide useful glycan-based targets for neutralizing antibodies.
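For readers unfamiliar with how PNGSs are located in a sequence, the following Python sketch scans a protein string for the standard N-linked glycosylation sequon, N-X-S/T with X being any residue except proline. The abstract does not spell out the motif or the authors' procedure, so the function name, the example sequence, and the assumption of an ungapped amino-acid string are illustrative only.

```python
import re
from typing import List

# Sequon N-X-[S/T], X != P. The lookahead makes the match zero-width so
# that overlapping sequons (e.g. "NNTS") are all reported.
SEQUON = re.compile(r"(?=N[^P][ST])")


def find_pngs(protein: str) -> List[int]:
    """Return 0-based start positions of potential N-linked glycosylation sites.

    Assumes an ungapped amino-acid sequence; alignment gap characters
    would be treated as ordinary residues by this simple pattern.
    """
    return [m.start() for m in SEQUON.finditer(protein.upper())]


# Hypothetical peptide: "NGT" at 1, "NPS" at 5 (excluded, X is proline),
# "NNT" at 8 and "NTS" at 9 (overlapping).
example = "MNGTANPSNNTS"
sites = find_pngs(example)
print(sites)
```

Comparing the site lists of many aligned sequences gives the presence/absence data on which analyses of gain, loss, and co-occurrence of PNGSs are based.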
An Evolutionary-Network Model Reveals Stratified Interactions in the V3 Loop of the HIV-1 Envelope
The third variable loop (V3) of the human immunodeficiency virus type 1 (HIV-1) envelope is a principal determinant of antibody neutralization and progression to AIDS. Although it is undoubtedly an important target for vaccine research, extensive genetic variation in V3 remains an obstacle to the development of an effective vaccine. Comparative methods that exploit the abundance of sequence data can detect interactions between residues of rapidly evolving proteins such as the HIV-1 envelope, revealing biological constraints on their variability. However, previous studies have relied implicitly on two biologically unrealistic assumptions: (1) that founder effects in the evolutionary history of the sequences can be ignored; and (2) that statistical associations between residues occur exclusively in pairs. We show that comparative methods that neglect the evolutionary history of extant sequences are susceptible to a high rate of false positives (20%–40%). Therefore, we propose a new method to detect interactions that relaxes both of these assumptions. First, we reconstruct the evolutionary history of extant sequences by maximum likelihood, shifting focus from extant sequence variation to the underlying substitution events. Second, we analyze the joint distribution of substitution events among positions in the sequence as a Bayesian graphical model, in which each branch in the phylogeny is a unit of observation. We perform extensive validation of our models using both simulations and a control case of known interactions in HIV-1 protease, and apply this method to detect interactions within V3 from a sample of 1,154 HIV-1 envelope sequences. Our method greatly reduces the number of false positives due to founder effects, while capturing several higher-order interactions among V3 residues. By mapping these interactions to a structural model of the V3 loop, we find that the loop is stratified into distinct evolutionary clusters. 
We extend our model to detect interactions between the V3 and C4 domains of the HIV-1 envelope, and account for the uncertainty in mapping substitutions to the tree with a parametric bootstrap.
Adaptation to Different Human Populations by HIV-1 Revealed by Codon-Based Analyses
Several codon-based methods are available for detecting adaptive evolution in protein-coding sequences, but to date none specifically identify sites that are selected differentially in two populations, although such comparisons between populations have been historically useful in identifying the action of natural selection. We have developed two fixed-effects maximum likelihood methods: one for identifying codon positions showing selection patterns that persist in a population and another for detecting whether selection is operating differentially on individual codons of a gene sampled from two different populations. Applying these methods to two HIV populations infecting genetically distinct human hosts, we have found that few of the positively selected amino acid sites persist in the population; the other changes are detected only at the tips of the phylogenetic tree and appear deleterious in the long term. Additionally, we have identified seven amino acid sites in protease and reverse transcriptase that are selected differentially in the two samples, demonstrating specific population-level adaptation of HIV to human populations.
Fast hierarchical Bayesian analysis of population structure
We present fastbaps, a fast solution to the genetic clustering problem. Fastbaps rapidly identifies an approximate fit to a Dirichlet process mixture model (DPM) for clustering multilocus genotype data. Our efficient model-based clustering approach is able to cluster datasets 10-100 times larger than the existing model-based methods, which we demonstrate by analyzing an alignment of over 110 000 sequences of HIV-1 pol genes. We also provide a method for rapidly partitioning an existing hierarchy in order to maximize the DPM model marginal likelihood, allowing us to split phylogenetic trees into clades and sub-clades using a population genomic model. Extensive tests on simulated data as well as a diverse set of real bacterial and viral datasets show that fastbaps provides comparable or improved solutions to previous model-based methods, while being significantly faster. The method is made freely available under an open source MIT licence as an easy-to-use R package at https://github.com/gtonkinhill/fastbaps.
Transmitted Drug Resistance in the CFAR Network of Integrated Clinical Systems Cohort: Prevalence and Effects on Pre-Therapy CD4 and Viral Load
Human immunodeficiency virus type 1 (HIV-1) genomes often carry one or more mutations associated with drug resistance upon transmission into a therapy-naïve individual. We assessed the prevalence and clinical significance of transmitted drug resistance (TDR) in chronically-infected therapy-naïve patients enrolled in a multi-center cohort in North America. Pre-therapy clinical significance was quantified by plasma viral load (pVL) and CD4+ cell count (CD4) at baseline. Naïve bulk sequences of HIV-1 protease and reverse transcriptase (RT) were screened for resistance mutations as defined by the World Health Organization surveillance list. The overall prevalence of TDR was 14.2%. We used a Bayesian network to identify co-transmission of TDR mutations in clusters associated with specific drugs or drug classes. Aggregate effects of mutations by drug class were estimated by fitting linear models of pVL and CD4 on weighted sums over TDR mutations according to the Stanford HIV Database algorithm. Transmitted resistance to both classes of reverse transcriptase inhibitors was significantly associated with lower CD4, but had opposing effects on pVL. In contrast, position-specific analyses of TDR mutations revealed substantial effects on CD4 and pVL at several residue positions that were being masked in the aggregate analyses, and significant interaction effects as well. Residue positions in RT with predominant effects on CD4 or pVL (D67 and M184) were re-evaluated in causal models using an inverse probability-weighting scheme to address the problem of confounding by other mutations and demographic or risk factors. We found that causal effect estimates of mutations M184V/I ( pVL) and D67N/G ( and pVL) were compensated by K103N/S and K219Q/E/N/R. 
As TDR becomes an increasing dilemma in this modern era of highly-active antiretroviral therapy, these results have immediate significance for the clinical management of HIV-1 infections and our understanding of the ongoing adaptation of HIV-1 to human populations.
Does the Reading of Different Orthographies Produce Distinct Brain Activity Patterns? An ERP Study
Orthographies vary in the degree of transparency of spelling-sound correspondence. These range from shallow orthographies with transparent grapheme-phoneme relations, to deep orthographies, in which these relations are opaque. Only a few studies have examined whether orthographic depth is reflected in brain activity. In these studies a between-language design was applied, making it difficult to isolate the aspect of orthographic depth. In the present work this question was examined using a within-subject-and-language investigation. The participants were speakers of Hebrew, as they are skilled in reading two forms of script transcribing the same oral language. One form is the shallow pointed script (with diacritics), and the other is the deep unpointed script (without diacritics). Event-related potentials (ERPs) were recorded while skilled readers carried out a lexical decision task in the two forms of script. A visual non-orthographic task controlled for the visual difference between the scripts (resulting from the addition of diacritics to the pointed script only). At an early visual-perceptual stage of processing (∼165 ms after target onset), the pointed script evoked larger amplitudes with longer latencies than the unpointed script at occipital-temporal sites. However, these effects were not restricted to orthographic processing, and may therefore have reflected, at least in part, the visual load imposed by the diacritics. Nevertheless, the results implied that distinct orthographic processing may have also contributed to these effects. At later stages (∼340 ms after target onset) the unpointed script elicited larger amplitudes than the pointed one with earlier latencies. As this latency has been linked to orthographic-linguistic processing and to the classification of stimuli, it is suggested that these differences are associated with distinct lexical processing of a shallow and a deep orthography.
Producing polished prokaryotic pangenomes with the Panaroo pipeline
Population-level comparisons of prokaryotic genomes must take into account the substantial differences in gene content resulting from horizontal gene transfer, gene duplication and gene loss. However, the automated annotation of prokaryotic genomes is imperfect, and errors due to fragmented assemblies, contamination, diverse gene families and mis-assemblies accumulate over the population, leading to profound consequences when analysing the set of all genes found in a species. Here, we introduce Panaroo, a graph-based pangenome clustering tool that is able to account for many of the sources of error introduced during the annotation of prokaryotic genome assemblies. Panaroo is available at https://github.com/gtonkinhill/panaroo.