On the Distribution of Genetic Variation in Ecological Communities
Biodiversity in ecological communities is structured hierarchically across spatial and temporal scales. Many open questions remain as to how this structure accumulates. For example, what are the relative contributions of dispersal versus in situ speciation? How important is stochastic drift relative to deterministic processes? To date, these questions have been investigated by isolated disciplines (e.g. macroecology, comparative phylogeography, macroevolution) using tools and data that tend to focus on only one axis of community-scale data (e.g. phylogenies, relative abundances, and/or trait information). Yet we know that there are feedbacks among processes that respond on short, medium, and long timescales (local changes in abundance, accumulation of population genetic variation, and speciation, respectively). The focus of my work is therefore threefold: first, to develop a model of the distribution of genetic variation in ecological communities; second, to construct a multi-scale model of the accumulation of biodiversity in ecological communities that jointly models three axes of data responding on ecological, population genetic, and phylogenetic timescales; and third, to combine abiotic variables with community-scale genetic data in a machine learning framework to predict the distribution of genetic variation across the landscape. First, I will present a modelling approach that merges Hubbell's neutral theory with neutral population genetic theory to construct a joint model of species abundance and genetic diversity. This model simulates joint distributions of abundance and genetic variation assuming both ecological and population genetic neutrality, and captures both equilibrium and non-equilibrium dynamics. These simulations can be used for a variety of applications, including estimating the shape of the abundance distribution using only a sample of community-scale genetic data.
Next, I will present a model that extends the double neutral model to incorporate non-neutral processes (such as ecological interactions) and a speciation process. The goal of this work is to integrate abundance and trait data with phylogenies and population genetic data in a unified framework for testing community assembly models and estimating ecological parameters from observed community data. One result of this work is the finding that genetic diversity is distributed more uniformly than abundance in ecological communities. Another key insight is that community-scale genetic data record community history on a population genetic timescale, complementing both the ecological information in sampled abundance data and the deep-time community history recorded in phylogenies. Finally, I will describe a machine learning framework that integrates community-scale genetic data and abiotic (climatic and environmental) variables to predict genetic diversity across the landscape. I demonstrate this method using densely sampled abundances and community-scale sequence data collected from 10 decapod crustacean communities distributed throughout the Coral Triangle. The observed distributions of abundance and genetic diversity in these communities largely agree with model predictions, in that abundance distributions showed higher dominance than distributions of genetic diversity. The machine learning inference procedure identified mean annual sea surface temperature and the proximity of the sampling site to deep water as key factors contributing to the shape and magnitude of community-scale genetic diversity. As community-scale genetic data become easier and cheaper to obtain, hierarchical models of biodiversity accumulation that account for feedbacks across timescales become increasingly important for making accurate inferences about community history from these data.
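The ecological half of the joint model described above builds on Hubbell's zero-sum neutral dynamics. The Python sketch below illustrates only that forward-time process, under simplifying assumptions: a fixed local community size `J`, a fixed immigration probability `m`, and a metacommunity given as a dictionary of relative abundances. The function name and parameters are hypothetical, the population genetic (coalescent) half of the model is omitted, and this is not the thesis's own simulation code.

```python
import random
from collections import Counter


def neutral_local_community(metacommunity, J, m, steps, rng):
    """Hypothetical sketch of Hubbell-style zero-sum neutral drift.

    metacommunity: dict mapping species name -> relative abundance (weight).
    J: fixed number of individuals in the local community (zero-sum, J >= 2).
    m: probability that a vacancy is filled by an immigrant.
    steps: number of death-replacement events to simulate.
    """
    species = list(metacommunity)
    weights = [metacommunity[s] for s in species]
    # Seed the local community with a random draw from the metacommunity.
    local = rng.choices(species, weights=weights, k=J)
    for _ in range(steps):
        # Neutrality: every individual is equally likely to die.
        dead = rng.randrange(J)
        if rng.random() < m:
            # Immigration: replacement drawn from metacommunity abundances.
            local[dead] = rng.choices(species, weights=weights, k=1)[0]
        else:
            # Local birth: offspring of one of the other J - 1 individuals.
            parent = rng.randrange(J - 1)
            if parent >= dead:
                parent += 1
            local[dead] = local[parent]
    # Species abundance counts in the local community.
    return Counter(local)


if __name__ == "__main__":
    rng = random.Random(42)
    meta = {f"sp{i}": 1.0 / (i + 1) for i in range(50)}
    community = neutral_local_community(meta, J=500, m=0.01, steps=50_000, rng=rng)
    print(community.most_common(5))
```

With `m = 0` the local community eventually drifts to monodominance; larger values of `m` keep more species present locally by continually resupplying them from the metacommunity.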
Strategies for improving approximate Bayesian computation tests for synchronous diversification
Background: Estimating the variability in isolation times across co-distributed taxon pairs that may have experienced the same allopatric isolating mechanism is a core goal of comparative phylogeography. The use of hierarchical Approximate Bayesian Computation (ABC) and coalescent models to infer temporal dynamics of lineage co-diversification has been a contentious topic in recent years. Key issues that remain unresolved include the choice of an appropriate prior on the number of co-divergence events (Ψ), as well as the optimal strategies for data summarization.
Methods: Through simulation-based cross validation we explore the impact of the strategy for sorting summary statistics and the choice of prior on Ψ on the estimation of co-divergence variability. We also introduce a new setting (β) that can potentially improve estimation of Ψ by enforcing a minimal temporal difference between pulses of co-divergence. We apply this new method to three empirical datasets: one dataset each of co-distributed taxon pairs of Panamanian frogs and freshwater fishes, and a large set of Neotropical butterfly sister-taxon pairs.
Results: We demonstrate that the choice of prior on Ψ has little impact on inference, but that sorting summary statistics yields substantially more reliable estimates of co-divergence variability despite violations of assumptions about exchangeability. We find the implementation of β improves estimation of Ψ, with improvement being most dramatic given larger numbers of taxon pairs. We find equivocal support for synchronous co-divergence for both of the Panamanian groups, but we find considerable support for asynchronous divergence among the Neotropical butterflies.
Conclusions: Our simulation experiments demonstrate that using sorted summary statistics results in improved estimates of the variability in divergence times, whereas the choice of hyperprior on Ψ has negligible effect. Additionally, we demonstrate that estimating the number of pulses of co-divergence across co-distributed taxon pairs is improved by applying a flexible buffering regime over divergence times. This improves the correlation between Ψ and the true variability in isolation times and allows for more meaningful interpretation of this hyperparameter. This will allow for more accurate identification of the number of temporally distinct pulses of co-divergence that generated the diversification pattern of a given regional assemblage of sister-taxon pairs.
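The two ideas above, sorting per-pair summary statistics and enforcing a minimum spacing β between co-divergence pulses, can be illustrated with a toy rejection-ABC sketch in Python. Everything here is an assumption for illustration: the uniform prior on Ψ is one of several possible choices, the "statistic" `2 * tau` plus Gaussian noise is a placeholder for coalescent-simulated summaries, the standard deviation of isolation times stands in for "variability in isolation times", and the pulse-spacing rule is one possible reading of β rather than the authors' implementation. All function names are hypothetical.

```python
import random
import statistics


def draw_pulse_times(n_pulses, t_max, beta, rng):
    """Draw n_pulses times in [0, t_max], each at least beta apart (uniform on that set)."""
    slack = t_max - beta * (n_pulses - 1)
    if slack < 0:
        raise ValueError("beta too large for this many pulses within t_max")
    base = sorted(rng.uniform(0, slack) for _ in range(n_pulses))
    # Shifting the i-th sorted point by i * beta enforces the minimum gap.
    return [b + i * beta for i, b in enumerate(base)]


def simulate_dataset(n_pairs, t_max, beta, rng):
    """Toy generative model: Psi ~ Uniform{1..n_pairs}, taxon pairs assigned to pulses."""
    psi = rng.randint(1, n_pairs)
    pulses = draw_pulse_times(psi, t_max, beta, rng)
    # Ensure every pulse is used by at least one taxon pair.
    assignment = list(range(psi)) + [rng.randrange(psi) for _ in range(n_pairs - psi)]
    rng.shuffle(assignment)
    taus = [pulses[k] for k in assignment]
    # Placeholder for a coalescent simulation: one noisy statistic per pair.
    stats = [2.0 * tau + rng.gauss(0.0, 0.1) for tau in taus]
    return psi, taus, stats


def summarize(stats, sort=True):
    """Sorting discards the arbitrary order of exchangeable taxon pairs."""
    return sorted(stats) if sort else list(stats)


def abc_rejection(observed, n_pairs, t_max, beta, n_sims, n_keep, rng, sort=True):
    """Rejection ABC: keep the n_keep simulations closest to the observed summaries."""
    obs = summarize(observed, sort)
    draws = []
    for _ in range(n_sims):
        psi, taus, stats = simulate_dataset(n_pairs, t_max, beta, rng)
        sim = summarize(stats, sort)
        dist = sum((a - b) ** 2 for a, b in zip(obs, sim)) ** 0.5
        draws.append((dist, psi, statistics.pstdev(taus)))
    draws.sort(key=lambda d: d[0])
    # Accepted (Psi, spread of isolation times) pairs approximate the posterior.
    return [(psi, spread) for _, psi, spread in draws[:n_keep]]


if __name__ == "__main__":
    _, true_taus, observed = simulate_dataset(8, 10.0, 1.0, random.Random(7))
    posterior = abc_rejection(observed, 8, 10.0, 1.0, n_sims=5000, n_keep=100,
                              rng=random.Random(8))
    print("posterior mean Psi:", statistics.mean(p for p, _ in posterior))
```

Passing `sort=False` to `abc_rejection` compares statistics in their arbitrary simulated order, which makes it easy to contrast the two summarization strategies on the same toy model.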
High-throughput sequencing for community analysis: the promise of DNA barcoding to uncover diversity, relatedness, abundances and interactions in spider communities
Large-scale studies of community ecology are highly desirable but often difficult to accomplish because of the considerable investment of time, labor, and money required to characterize richness, abundance, relatedness, and interactions. Nonetheless, such large-scale perspectives are necessary for understanding the composition, dynamics, and resilience of biological communities. Small invertebrates play a central role in ecosystems, occupying critical positions in the food web and performing a broad variety of ecological functions. However, it has been particularly difficult to adequately characterize communities of these animals because of their exceptionally high diversity and abundance. Spiders in particular fulfill key roles as both predators and prey in terrestrial food webs and are hence an important focus of ecological studies. In recent years, large-scale community analyses have benefited tremendously from advances in DNA barcoding technology. High-throughput sequencing (HTS), particularly DNA metabarcoding, enables community-wide analyses of diversity and interactions at unprecedented scales and at a fraction of the previous cost. Here, we review the current state of the application of these technologies to the analysis of spider communities. We discuss amplicon-based DNA barcoding and metabarcoding for the analysis of community diversity, and molecular gut content analysis for assessing predator-prey relationships. We also highlight applications of third-generation sequencing technology for long-read and portable DNA barcoding. We then address the development of theoretical frameworks for community-level studies, and finally highlight critical gaps and future directions for DNA analysis of spider communities.
Behavioural response to song and genetic divergence in two subspecies of white-crowned sparrows (Zonotrichia leucophrys)
© 2016 John Wiley & Sons Ltd. Divergence in sexual signals may drive reproductive isolation between lineages, but behavioural barriers can weaken in contact zones. Here, we investigate the role of song as a behavioural and genetic barrier in a contact zone between two subspecies of white-crowned sparrows (Zonotrichia leucophrys). We used a reduced genomic data set to assess population structure and infer the history underlying divergence, gene flow and hybridization. We also measured divergence in song and tested behavioural responses to song using playback experiments within and outside the contact zone. We found that the subspecies form distinct genetic clusters, and demographic inference supported a model of secondary contact. Song phenotype, particularly the length of the first note (a whistle), was a significant predictor of genetic subspecies identity and of genetic distance along the hybrid zone, suggesting a close link between song and genetic divergence in this system. Individuals from both parental and admixed localities responded significantly more strongly to the song of their own subspecies than to that of the other subspecies, supporting song as a behavioural barrier. Putative parental and admixed individuals did not differ significantly in the strength of their discrimination between own and other songs; however, individuals from admixed localities tended to discriminate less strongly, and this difference in discrimination strength was explained by song dissimilarity as well as genetic distance. Therefore, we find that song acts as a reproductive isolating mechanism that is potentially weakening in the contact zone between the subspecies. Our findings also support the hypothesis that intraspecific song variation can reduce gene flow between populations.
RADseq as a valuable tool for plants with large genomes: a case study in cycads
Full genome sequencing of organisms with large and complex genomes is intractable and cost-ineffective under most research budgets. Cycads (Cycadales) represent one of the oldest lineages of extant seed plants and, partly due to their age, have extremely large genomes of up to ~60 Gbp. Restriction site-associated DNA sequencing (RADseq) offers an approach to finding genome-wide informative markers and has proven effective in both model and nonmodel organisms. We tested the application of RADseq using ezRAD across all 10 genera of the Cycadales, including an example data set of Cycas calcicola comprising 72 samples from natural populations. Using previously available plastid and mitochondrial genomes as references, we mapped reads and recovered plastid and mitochondrial genome regions, as well as nuclear markers, for all of the genera. De novo assembly generated up to 138,407 high-depth clusters and up to 1,705 phylogenetically informative loci for the genera, and 4,421 loci for the example assembly of C. calcicola. The number of loci recovered by de novo assembly was lower than in previous RADseq studies, yet still sufficient for downstream analysis. However, the number of markers could be increased by relaxing our assembly parameters, especially for the C. calcicola data set. Our results demonstrate the successful application of RADseq across the Cycadales to generate a large number of markers for all genomic compartments, despite the large number of plastids present in a typical plant cell. Our protocol was modified for application to cycads and other organisms with large genomes, and yields many informative genome-wide markers.
Coming of age for COI metabarcoding of whole organism community DNA: towards bioinformatic harmonisation
Metabarcoding of DNA extracted from community samples of whole organisms (whole organism community DNA, wocDNA) is increasingly being applied to terrestrial, marine and freshwater metazoan communities to provide rapid, accurate and high-resolution data for novel molecular ecology research. The growth of this field has been accompanied by considerable development, building on microbial metabarcoding methods, of appropriate and efficient sampling and laboratory protocols for whole organism metazoan communities. However, considerably less attention has been paid to ensuring that bioinformatic methods are adapted and applied comprehensively in wocDNA metabarcoding. In this study we examined over 600 papers and identified 111 studies that performed COI metabarcoding of wocDNA. We then systematically reviewed the bioinformatic methods employed by these studies to identify the state of the art. Our results show that the increasing use of wocDNA COI metabarcoding for metazoan diversity is characterised by a clear absence of bioinformatic harmonisation, and temporal trends show little change in this situation. The reviewed literature showed (i) high heterogeneity across pipelines, tasks and tools used, (ii) limited or no adaptation of bioinformatic procedures to the nature of the COI fragment, and (iii) a worrying underreporting of tasks, software and parameters. Based on these findings, we propose a set of recommendations that we think the metabarcoding community should consider to ensure that bioinformatic methods are appropriate, comprehensive and comparable. We believe that adhering to these recommendations will improve the long-term integrative potential of wocDNA COI metabarcoding for biodiversity science.
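One way a pipeline can adapt to the nature of the COI fragment is to exploit the fact that COI is protein-coding: sequencing errors and nuclear pseudogenes often introduce frameshifts or in-frame stop codons. The Python sketch below shows such a check under stated assumptions. It uses NCBI translation table 5 (the invertebrate mitochondrial code, in which only TAA and TAG are stops), examines only the forward strand, and treats `expected_length` as a user-supplied amplicon length. The function names and parameters are hypothetical and are not taken from any of the reviewed pipelines.

```python
# Stop codons in NCBI translation table 5 (invertebrate mitochondrial code).
# Under this code TGA encodes tryptophan rather than a stop.
INVERT_MITO_STOPS = frozenset({"TAA", "TAG"})


def open_frames(seq, stops=INVERT_MITO_STOPS):
    """Return the forward reading frames (0, 1, 2) with no in-frame stop codon."""
    seq = seq.upper()
    frames = []
    for frame in range(3):
        # Complete codons only; a trailing partial codon is ignored.
        codons = (seq[i:i + 3] for i in range(frame, len(seq) - 2, 3))
        if not any(codon in stops for codon in codons):
            frames.append(frame)
    return frames


def passes_coi_filter(seq, expected_length, stops=INVERT_MITO_STOPS):
    """Hypothetical COI-aware filter for a single amplicon sequence."""
    # Length changes that are not multiples of 3 would shift the reading frame.
    if (len(seq) - expected_length) % 3 != 0:
        return False
    # A genuine protein-coding fragment should have at least one stop-free frame.
    return bool(open_frames(seq, stops))
```

A real workflow would also choose the genetic code appropriate to the target taxa; the table is passed as the `stops` argument for that reason.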
An integrated model of population genetics and community ecology
Aim: Quantifying abundance distributions is critical for understanding both how communities assemble and how community structure varies through time and space, yet estimating abundances requires considerable investment in fieldwork. Community-level population genetic data potentially offer a powerful way to indirectly infer richness, abundance and the history of accumulation of biodiversity within a community. Here we introduce a joint model linking neutral community assembly and comparative phylogeography that generates community-level richness, abundance and genetic variation under a neutral model, capturing both equilibrium and non-equilibrium dynamics.
Location: Global.
Methods: Our model combines a forward-time individual-based community assembly process with a rescaled backward-time neutral coalescent model of multi-taxa population genetics. We explore general dynamics of genetic and abundance-based summary statistics and use approximate Bayesian computation (ABC) to estimate parameters underlying the model of island community assembly. Finally, we demonstrate two applications of the model using community-scale mtDNA sequence data and densely sampled abundances of an arachnid community on La Réunion. First, we use genetic data alone to estimate a summary of the abundance distribution, ground-truthing this against the observed abundances. Then, we jointly use the observed genetic data and abundances to estimate the proximity of the community to equilibrium.
Results: Simulation experiments of our ABC procedure demonstrate that coupling abundance with genetic data leads to improved accuracy and precision of model parameter estimates compared with using abundance-only data. We further demonstrate reasonable precision and accuracy in estimating a metric underlying the shape of the abundance distribution, temporal progress towards local equilibrium and several key parameters of the community assembly process. For the insular arachnid assemblage, we find that the joint distribution of genetic diversity and abundance approaches equilibrium expectations, and that the Shannon entropy of the observed abundances can be estimated using genetic data alone.
Main conclusions: The framework that we present unifies neutral community assembly and comparative phylogeography to characterize the community-level distribution of both abundance and genetic variation through time, providing a resource that should greatly enhance understanding of both the processes structuring ecological communities and the associated aggregate demographic histories.
Funding was provided by grants from FAPESP (BIOTA, 2013/50297‐0 to M.J.H. and AC Carnaval), NASA through the Dimensions of Biodiversity Program (DOB 1343578) and the National Science Foundation (DEB‐1253710 to M.J.H.). This work would not have been possible without help from the City University of New York High Performance Computing Center, with support from the National Science Foundation (CNS‐0855217 and CNS‐0958379).
Peer reviewed.
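The abundance summary estimated above, the Shannon entropy of the observed abundances, can be computed directly from species counts. The Python sketch below assumes the natural logarithm and also reports exp(H), the Hill number of order 1 (an "effective number of equally common species"). The abstract does not state which log base or related diversity index the authors used, so both choices, and the function names, are assumptions for illustration.

```python
import math
from collections import Counter


def shannon_entropy(abundances):
    """Shannon entropy H = -sum(p_i * ln p_i) over a list of species abundances."""
    total = sum(abundances)
    if total <= 0:
        raise ValueError("abundances must sum to a positive number")
    # Species with zero abundance contribute nothing and are skipped.
    return -sum((n / total) * math.log(n / total) for n in abundances if n > 0)


def effective_species(abundances):
    """exp(H): the Hill number of order 1."""
    return math.exp(shannon_entropy(abundances))


if __name__ == "__main__":
    # Hypothetical sample of individuals labelled by species.
    sample = ["a"] * 50 + ["b"] * 30 + ["c"] * 15 + ["d"] * 5
    counts = list(Counter(sample).values())
    print(round(shannon_entropy(counts), 3), round(effective_species(counts), 3))
```

A perfectly even community of S species gives H = ln S and exp(H) = S; greater dominance by a few species lowers both values.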
Additional file 1 of Strategies for improving approximate Bayesian computation tests for synchronous diversification
Supplementary information. File contains Supporting Materials and Methods, Supplementary Tables S1 & S2, and Supplementary Figs. S1–S20. (DOCX 2273 kb)