89 research outputs found
Coupling geometry on binary bipartite networks: hypotheses testing on pattern geometry and nestedness
Upon a matrix representation of a binary bipartite network, via the
permutation invariance, a coupling geometry is computed to approximate the
minimum energy macrostate of a network's system. Such a macrostate is supposed
to constitute the intrinsic structures of the system, so that the coupling
geometry should be taken as information contents, or even the nonparametric
minimum sufficient statistics of the network data. Then pertinent null and
alternative hypotheses, such as nestedness, are to be formulated according to
the macrostate. That is, any efficient testing statistic needs to be a function
of this coupling geometry. These conceptual architectures and mechanisms are by
and large still missing from the community ecology literature, which has left
misconceptions prevalent in this research area. Here the algorithmically
computed coupling geometry is shown to consist of deterministic multiscale
block patterns, which are framed by two marginal ultrametric trees on row and
column axes, and stochastic uniform randomness within each block found on the
finest scale. Functionally a series of increasingly larger ensembles of matrix
mimicries is derived by conforming to the multiscale block configurations. Here
matrix mimicking is meant to be subject to constraints of row and column sums
sequences. Based on such a series of ensembles, a profile of distributions
becomes a natural device for checking the validity of testing statistics or
structural indexes. An energy based index is used for testing whether network
data indeed contains structural geometry. A new, block-based version of the
nestedness index is also proposed. Its validity is checked and compared with
that of the existing ones. A computing paradigm, called Data Mechanics, and its
application to one real data network are illustrated throughout the
developments and discussions in this paper.
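The abstract above describes matrix mimicries as binary matrices that keep the observed row and column sums. A minimal sketch of one standard way to generate such matrices is the checkerboard swap shown here. It is not claimed to be the authors' procedure, which additionally conforms to the multiscale block configurations; the function name `checkerboard_swap` and its parameters are illustrative assumptions.

```python
import random


def checkerboard_swap(matrix, n_attempts, seed=0):
    """Return a copy of a binary matrix randomized by checkerboard swaps.

    Each accepted swap flips a 2x2 submatrix of the form [[1, 0], [0, 1]]
    into [[0, 1], [1, 0]] (or the reverse), which leaves every row sum and
    every column sum unchanged. Assumes at least two rows and two columns.
    """
    rng = random.Random(seed)
    m = [row[:] for row in matrix]  # work on a copy, keep the input intact
    n_rows, n_cols = len(m), len(m[0])
    for _ in range(n_attempts):
        r1, r2 = rng.sample(range(n_rows), 2)
        c1, c2 = rng.sample(range(n_cols), 2)
        is_checkerboard = (
            m[r1][c1] == m[r2][c2]
            and m[r1][c2] == m[r2][c1]
            and m[r1][c1] != m[r1][c2]
        )
        if is_checkerboard:
            m[r1][c1], m[r1][c2] = m[r1][c2], m[r1][c1]
            m[r2][c1], m[r2][c2] = m[r2][c2], m[r2][c1]
    return m


# Hypothetical 3x4 presence/absence matrix, for illustration only.
original = [[1, 0, 1, 0], [0, 1, 0, 1], [1, 1, 0, 0]]
mimic = checkerboard_swap(original, n_attempts=200, seed=1)
```

Restricting the sampled rows and columns to a single block would be one way to respect a block configuration, but that extension is an assumption and is not shown.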
Design and applicability of DNA arrays and DNA barcodes in biodiversity monitoring
Background: The rapid and accurate identification of species is a critical component of large-scale biodiversity monitoring programs. DNA arrays (micro and macro) and DNA barcodes are two molecular approaches that have recently garnered much attention. Here, we compare these two platforms for identification of an important group, the mammals. Results: Our analyses, based on the two commonly used mitochondrial genes cytochrome c oxidase I (the standard DNA barcode for animal species) and cytochrome b (a common species-level marker), suggest that both arrays and barcodes are capable of discriminating mammalian species with high accuracy. We used three different datasets of mammalian species, comprising different sampling strategies. For DNA arrays we designed three probes for each species to address intraspecific variation. As for DNA barcoding, our analyses show that both cytochrome c oxidase I and cytochrome b genes, and even smaller fragments of them (mini-barcodes), can successfully discriminate species in a wide variety of specimens. Conclusion: This study showed that DNA arrays and DNA barcodes are valuable molecular methods for biodiversity monitoring programs. Both approaches were capable of discriminating among mammalian species in our test assemblages. However, because designing DNA arrays requires advance knowledge of target sequences, the use of this approach could be limited in large-scale monitoring programs where unknown haplotypes might be encountered. DNA barcodes, by contrast, are sequencing-based and therefore could provide more flexibility in large-scale studies.
BBCA: Improving the Scalability of *BEAST Using Random Binning
Species tree estimation can be challenging in the presence of gene tree conflict due to incomplete lineage sorting (ILS), which can occur when the time between speciation events is short relative to the population size. Of the many methods that have been developed to estimate species trees in the presence of ILS, *BEAST, a Bayesian method that co-estimates the species tree and gene trees given sequence alignments on multiple loci, has generally been shown to have the best accuracy. However, *BEAST is extremely computationally intensive so that it cannot be used with large numbers of loci; hence, *BEAST is not suitable for genome-scale analyses. Results: We present BBCA (boosted binned coalescent-based analysis), a method that can be used with *BEAST (and other such co-estimation methods) to improve scalability. BBCA partitions the loci randomly into subsets, uses *BEAST on each subset to co-estimate the gene trees and species tree for the subset, and then combines the newly estimated gene trees together using MP-EST, a popular coalescent-based summary method. We compare time-restricted versions of BBCA and *BEAST on simulated datasets, and show that BBCA is at least as accurate as *BEAST, and achieves better convergence rates for large numbers of loci. Conclusions: Phylogenomic analysis using *BEAST is currently limited to datasets with a small number of loci, and analyses with even just 100 loci can be computationally challenging. BBCA uses a very simple divide-and-conquer approach that makes it possible to use *BEAST on datasets containing hundreds of loci. This study shows that BBCA provides excellent accuracy and is highly scalable. Grant Agency of the Czech Republic P501-10-0208; Academy of Sciences of the Czech Republic AVOZ50040507, AVOZ50040702, MSMT LC0604; Ministry of Innovation and Science of Spain, MICINN CGL2007-64839-C02/BOS; CSIC (Superior Council of Scientific Investigations); José Castillejo Grant from the MICINN of the Spanish Government. Computer Science
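The random binning step described in the BBCA abstract can be sketched as follows. This is a minimal illustration that only partitions locus identifiers into random subsets; running *BEAST on each subset and combining the resulting gene trees with MP-EST are external steps and are not shown. The function name `random_bins`, the `bin_size` parameter, and the locus names are assumptions made for this sketch, since the abstract does not state a bin size.

```python
import random


def random_bins(loci, bin_size, seed=0):
    """Partition loci into random, disjoint subsets of at most bin_size each."""
    rng = random.Random(seed)
    shuffled = list(loci)
    rng.shuffle(shuffled)
    # Consecutive slices of a shuffled list form a uniformly random partition.
    return [shuffled[i:i + bin_size] for i in range(0, len(shuffled), bin_size)]


# Hypothetical locus names, for illustration only.
loci = [f"locus_{i:03d}" for i in range(100)]
bins = random_bins(loci, bin_size=25, seed=42)
```

Each bin would then be analysed independently, which is what makes the divide-and-conquer approach straightforward to parallelize.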
Disk Covering Methods Improve Phylogenomic Analyses
Motivation: With the rapid growth rate of newly sequenced genomes, species tree inference from multiple genes has become a basic bioinformatics task in comparative and evolutionary biology. However, accurate species tree estimation is difficult in the presence of gene tree discordance, which is often due to incomplete lineage sorting (ILS), modelled by the multi-species coalescent. Several highly accurate coalescent-based species tree estimation methods have been developed over the last decade, including MP-EST. However, the running time for MP-EST increases rapidly as the number of species grows. Results: We present divide-and-conquer techniques that improve the scalability of MP-EST so that it can run efficiently on large datasets. Surprisingly, this technique also improves the accuracy of species trees estimated by MP-EST, as our study shows on a collection of simulated and biological datasets. NSF DEB 0733029, DBI 1062335. Computer Science
Bone-Associated Gene Evolution and the Origin of Flight in Birds
Background
Bones have been subjected to considerable selective pressure throughout vertebrate evolution, for example during the adaptations associated with the development of powered flight. Powered flight evolved independently in two extant clades of vertebrates, birds and bats. While this trait provided advantages such as aerial foraging, escape from predators, and long-distance travel, it also imposed great challenges, particularly for bone structure.
Results
We performed comparative genomic analyses of 89 bone-associated genes from 47 avian genomes (including 45 new), 39 mammalian, and 20 reptilian genomes, and demonstrate that birds, after correcting for multiple testing, have an almost two-fold increase in the number of bone-associated genes with evidence of positive selection (~52.8 %) compared with mammals (~30.3 %). Most of the positive-selected genes in birds are linked with bone regulation and remodeling and thirteen have been linked with functional pathways relevant to powered flight, including bone metabolism, bone fusion, muscle development and hyperglycemia levels. Genes encoding proteins involved in bone resorption, such as TPP1, had a high number of sites under Darwinian selection in birds.
Conclusions
Patterns of positive selection observed in bird ossification genes suggest that there was a period of intense selective pressure to improve flight efficiency that was closely linked with constraints on body size.
Investigating the relative influence of genetic drift and natural selection in shaping patterns of population structure in Delphinids (Delphinus delphis; Tursiops spp.)
Speciation models relying on geographic barriers to limit gene flow gather widespread consensus, but are insufficient to explain diversification in highly mobile marine organisms. Adaptation to different environments has been suggested as an alternative driver for differentiation, particularly in cetaceans. In this study, patterns of population structure at neutral and functional markers were investigated for both common (Delphinus delphis) and bottlenose dolphin (Tursiops spp.), chosen due to high levels of morphological and ecological variation within each genus. Candidate functional markers were selected by investigating signals of positive selection in both mammals and cetaceans.
No population structure was found in the European common dolphin for neutral microsatellite loci, in contrast to what is observed in other sympatric cetacean species. The previously described differentiation of the Eastern Mediterranean Sea population probably results from a recent human-mediated bottleneck. Functional markers showed almost complete uniformity, suggesting purifying selection. One non-synonymous mutation in β-casein and the DQβ1 locus were exceptions, with patterns of population differentiation possibly the result of differences in local selective pressures.
Additionally, large mitogenomic sequences were used to investigate the worldwide phylogeography of several ecotypes/species within the genus Tursiops, with a recent biogeographical calibration point being used to calculate divergence times. Good node resolution with high statistical support was achieved, with good separation between most ecotypes in their own lineages. However, the results give no support for a monophyletic Tursiops. Divergence times are clustered in specific geological periods characterized by climatic fluctuations from cold to warmer periods.
Common and bottlenose dolphins exhibit contrasting patterns of population structure in an environment containing few geographical barriers. This difference is speculated to be related to different feeding ecologies and social structures, although data on these are still limited. Although selection can be detected in the genomes of cetaceans at both the species and population level, current patterns of differentiation are thought to occur mainly due to drift.
Weighted Statistical Binning: enabling statistically consistent genome-scale phylogenetic analyses
Because biological processes can make different loci have different
evolutionary histories, species tree estimation requires multiple loci from
across the genome. While many processes can result in discord between gene
trees and species trees, incomplete lineage sorting (ILS), modeled by the
multi-species coalescent, is considered to be a dominant cause for gene tree
heterogeneity. Coalescent-based methods have been developed to estimate species
trees, many of which operate by combining estimated gene trees, and so are
called summary methods. Because summary methods are generally fast, they have
become very popular techniques for estimating species trees from multiple loci.
However, recent studies have established that summary methods can have reduced
accuracy in the presence of gene tree estimation error, and also that many
biological datasets have substantial gene tree estimation error, so that
summary methods may not be highly accurate on biologically realistic
conditions. Mirarab et al. (Science 2014) presented the statistical binning
technique to improve gene tree estimation in multi-locus analyses, and showed
that it improved the accuracy of MP-EST, one of the most popular
coalescent-based summary methods. Statistical binning, which uses a simple
statistical test for combinability and then uses the larger sets of genes to
re-calculate gene trees, has good empirical performance, but using statistical
binning within a phylogenomics pipeline does not have the desirable property of
being statistically consistent. We show that weighting the recalculated gene
trees by the bin sizes makes statistical binning statistically consistent under
the multispecies coalescent, and maintains the good empirical performance.
Thus, "weighted statistical binning" enables highly accurate genome-scale
species tree estimation, and is also statistically consistent under the
multi-species coalescent model. Comment: (1) In Press, PLoS ON
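The weighting idea in the abstract above, where each recalculated gene tree counts in proportion to the size of its bin, can be sketched by repeating each bin's tree once per gene in the bin before passing the list to a summary method. This is a hedged illustration of that one step, not the authors' pipeline; the function name `weight_by_bin_size`, the gene names, and the Newick strings are hypothetical.

```python
def weight_by_bin_size(bin_trees):
    """Repeat each bin's tree once per gene assigned to that bin.

    bin_trees is a list of (newick_tree, genes_in_bin) pairs. The returned
    list has one tree per original gene, so larger bins carry more weight
    when the list is handed to a summary method.
    """
    weighted = []
    for tree, genes in bin_trees:
        weighted.extend([tree] * len(genes))
    return weighted


# Hypothetical bins and trees, for illustration only.
bin_trees = [
    ("((A,B),(C,D));", ["gene1", "gene2", "gene3"]),
    ("((A,C),(B,D));", ["gene4"]),
]
weighted_trees = weight_by_bin_size(bin_trees)
```

Without this step, the two bins above would contribute one tree each, so the single-gene bin would count as much as the three-gene bin.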
Molecular evolution of bovine Toll-like receptor 2 suggests substitutions of functional relevance
Background: There is accumulating evidence that polymorphism in Toll-like receptor (TLR) genes might be associated with disease resistance or susceptibility traits in livestock. Polymorphic sites affecting TLR function should exhibit signatures of positive selection, identified as a high ratio of non-synonymous to synonymous nucleotide substitutions (ω). Phylogeny-based models of codon substitution based on estimates of ω for each amino acid position can therefore offer a valuable tool to predict sites of functional relevance. We have used this approach to identify such polymorphic sites within the bovine TLR2 genes from ten Bos indicus and Bos taurus cattle breeds. By analysing TLR2 gene phylogeny in a set of mammalian species and a subset of ruminant species we have estimated the selective pressure on individual sites and domains and identified polymorphisms at sites of putative functional importance. Results: The ω values were highest in the mammalian TLR2 domains thought to be responsible for ligand binding and lowest in regions responsible for heterodimerisation with other TLR-related molecules. Several positively-selected sites were detected in or around ligand-binding domains. However, a comparison of the ruminant subset of TLR2 sequences with the whole mammalian set of sequences revealed that there has been less selective pressure among ruminants than in mammals as a whole. This suggests that there have been functional changes during ruminant evolution. Twenty newly-discovered non-synonymous polymorphic sites were identified in cattle. Three of them were localised at positions shaped by positive selection in the ruminant dataset (Leu227Phe, His305Pro, His326Gln) and in domains involved in the recognition of ligands. His326Gln is of particular interest as it consists of an exchange of differentially-charged amino acids at a position which has previously been shown to be crucial for ligand binding in human TLR2. Conclusion: Within bovine TLR2, polymorphisms at amino acid positions 227, 305 and 326 map to functionally important sites of TLR2 and should be considered as candidate SNPs for immune-related traits in cattle. A final proof of their functional relevance requires further studies to determine their functional effect on the immune response after stimulation with relevant ligands and/or their association with immune-related traits in animals.