89 research outputs found

    Coupling geometry on binary bipartite networks: hypotheses testing on pattern geometry and nestedness

    Given a matrix representation of a binary bipartite network, a coupling geometry is computed, via permutation invariance, to approximate the minimum energy macrostate of the network's system. Such a macrostate is supposed to constitute the intrinsic structures of the system, so that the coupling geometry should be taken as the information content, or even the nonparametric minimal sufficient statistics, of the network data. Pertinent null and alternative hypotheses, such as nestedness, are then to be formulated according to the macrostate. That is, any efficient testing statistic needs to be a function of this coupling geometry. These conceptual architectures and mechanisms are by and large still missing from the community ecology literature, which has left misconceptions prevalent in this research area. Here the algorithmically computed coupling geometry is shown to consist of deterministic multiscale block patterns, which are framed by two marginal ultrametric trees on the row and column axes, and stochastic uniform randomness within each block found on the finest scale. Functionally, a series of increasingly larger ensembles of matrix mimicries is derived by conforming to the multiscale block configurations. Here matrix mimicking is meant to be subject to the constraints of the row and column sum sequences. Based on such a series of ensembles, a profile of distributions becomes a natural device for checking the validity of testing statistics or structural indexes. An energy-based index is used for testing whether network data indeed contain structural geometry. A new block-based nestedness index is also proposed. Its validity is checked and compared with that of the existing ones. A computing paradigm, called Data Mechanics, and its application to one real data network are illustrated throughout the developments and discussions in this paper.

    Design and applicability of DNA arrays and DNA barcodes in biodiversity monitoring

    Background: The rapid and accurate identification of species is a critical component of large-scale biodiversity monitoring programs. DNA arrays (micro and macro) and DNA barcodes are two molecular approaches that have recently garnered much attention. Here, we compare these two platforms for identification of an important group, the mammals. Results: Our analyses, based on the two commonly used mitochondrial genes cytochrome c oxidase I (the standard DNA barcode for animal species) and cytochrome b (a common species-level marker), suggest that both arrays and barcodes are capable of discriminating mammalian species with high accuracy. We used three different datasets of mammalian species, comprising different sampling strategies. For DNA arrays we designed three probes for each species to address intraspecific variation. As for DNA barcoding, our analyses show that both cytochrome c oxidase I and cytochrome b genes, and even smaller fragments of them (mini-barcodes), can successfully discriminate species in a wide variety of specimens. Conclusion: This study showed that DNA arrays and DNA barcodes are valuable molecular methods for biodiversity monitoring programs. Both approaches were capable of discriminating among mammalian species in our test assemblages. However, because designing DNA arrays requires advance knowledge of target sequences, the use of this approach could be limited in large-scale monitoring programs where unknown haplotypes might be encountered. DNA barcodes, by contrast, are sequencing-based and therefore could provide more flexibility in large-scale studies.

    BBCA: Improving the Scalability of *BEAST Using Random Binning

    Species tree estimation can be challenging in the presence of gene tree conflict due to incomplete lineage sorting (ILS), which can occur when the time between speciation events is short relative to the population size. Of the many methods that have been developed to estimate species trees in the presence of ILS, *BEAST, a Bayesian method that co-estimates the species tree and gene trees given sequence alignments on multiple loci, has generally been shown to have the best accuracy. However, *BEAST is extremely computationally intensive so that it cannot be used with large numbers of loci; hence, *BEAST is not suitable for genome-scale analyses. Results: We present BBCA (boosted binned coalescent-based analysis), a method that can be used with *BEAST (and other such co-estimation methods) to improve scalability. BBCA partitions the loci randomly into subsets, uses *BEAST on each subset to co-estimate the gene trees and species tree for the subset, and then combines the newly estimated gene trees together using MP-EST, a popular coalescent-based summary method. We compare time-restricted versions of BBCA and *BEAST on simulated datasets, and show that BBCA is at least as accurate as *BEAST, and achieves better convergence rates for large numbers of loci. Conclusions: Phylogenomic analysis using *BEAST is currently limited to datasets with a small number of loci, and analyses with even just 100 loci can be computationally challenging. BBCA uses a very simple divide-and-conquer approach that makes it possible to use *BEAST on datasets containing hundreds of loci. This study shows that BBCA provides excellent accuracy and is highly scalable. Grant Agency of the Czech Republic P501-10-0208; Academy of Sciences of the Czech Republic AVOZ50040507, AVOZ50040702, MSMT LC0604; Ministry of Innovation and Science of Spain, MICINN CGL2007-64839-C02/BOS; CSIC (Superior Council of Scientific Investigations); José Castillejo Grant from the MICINN of the Spanish Government. Computer Science
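
The random partitioning step described in this abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function name `random_bins`, the bin size of 25, and the fixed seed are assumptions made here for the example, and the later steps (running *BEAST on each bin and combining the gene trees with MP-EST) are not shown.

```python
import random


def random_bins(loci, bin_size, seed=0):
    """Shuffle the loci and split them into bins of at most `bin_size`.

    Hypothetical helper: the name, signature, and seed handling are
    illustrative assumptions, not part of BBCA's published code.
    """
    if bin_size < 1:
        raise ValueError("bin_size must be at least 1")
    shuffled = list(loci)
    # A private Random instance keeps the partition reproducible.
    random.Random(seed).shuffle(shuffled)
    # Consecutive slices of the shuffled list form the random bins.
    return [shuffled[i:i + bin_size] for i in range(0, len(shuffled), bin_size)]


# Example: 100 placeholder loci split into bins of 25 (an assumed size).
loci = [f"locus_{i}" for i in range(100)]
bins = random_bins(loci, bin_size=25)
```

Each bin would then be analysed separately by the co-estimation method, so the cost per run depends on the bin size and not on the total number of loci.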

    Disk Covering Methods Improve Phylogenomic Analyses

    Motivation: With the rapid growth rate of newly sequenced genomes, species tree inference from multiple genes has become a basic bioinformatics task in comparative and evolutionary biology. However, accurate species tree estimation is difficult in the presence of gene tree discordance, which is often due to incomplete lineage sorting (ILS), modelled by the multi-species coalescent. Several highly accurate coalescent-based species tree estimation methods have been developed over the last decade, including MP-EST. However, the running time for MP-EST increases rapidly as the number of species grows. Results: We present divide-and-conquer techniques that improve the scalability of MP-EST so that it can run efficiently on large datasets. Surprisingly, this technique also improves the accuracy of species trees estimated by MP-EST, as our study shows on a collection of simulated and biological datasets. NSF DEB 0733029, DBI 1062335. Computer Science

    Bone-Associated Gene Evolution and the Origin of Flight in Birds

    Background: Bones have been subjected to considerable selective pressure throughout vertebrate evolution, such as occurred during the adaptations associated with the development of powered flight. Powered flight evolved independently in two extant clades of vertebrates, birds and bats. While this trait provided advantages such as in aerial foraging habits, escape from predators or long-distance travels, it also imposed great challenges, namely in the bone structure. Results: We performed comparative genomic analyses of 89 bone-associated genes from 47 avian genomes (including 45 new), 39 mammalian, and 20 reptilian genomes, and demonstrate that birds, after correcting for multiple testing, have an almost two-fold increase in the number of bone-associated genes with evidence of positive selection (~52.8 %) compared with mammals (~30.3 %). Most of the positively selected genes in birds are linked with bone regulation and remodeling, and thirteen have been linked with functional pathways relevant to powered flight, including bone metabolism, bone fusion, muscle development and hyperglycemia levels. Genes encoding proteins involved in bone resorption, such as TPP1, had a high number of sites under Darwinian selection in birds. Conclusions: Patterns of positive selection observed in bird ossification genes suggest that there was a period of intense selective pressure to improve flight efficiency that was closely linked with constraints on body size.

    Investigating the relative influence of genetic drift and natural selection in shaping patterns of population structure in Delphinids (Delphinus delphis; Tursiops spp.)

    Speciation models relying on geographic barriers to limit gene flow gather widespread consensus, but are insufficient to explain diversification in highly mobile marine organisms. Adaptation to different environments has been suggested as an alternative driver for differentiation, particularly in cetaceans. In this study, patterns of population structure at neutral and functional markers were investigated for both common (Delphinus delphis) and bottlenose dolphins (Tursiops spp.), chosen due to high levels of morphological and ecological variation within each genus. Candidate functional markers were selected by investigating signals of positive selection in both mammals and cetaceans. No population structure was found in the European common dolphin for neutral microsatellite loci, in contrast to what is observed in other sympatric cetacean species. The previously described differentiation of the Eastern Mediterranean Sea population probably results from a recent human-mediated bottleneck. Functional markers showed almost complete uniformity, suggesting purifying selection. One non-synonymous mutation in β-casein and the DQβ1 locus were exceptions, with patterns of population differentiation possibly the result of differences in local selective pressures. Additionally, large mitogenomic sequences were used to investigate the worldwide phylogeography of several ecotypes/species within the genus Tursiops, with a recent biogeographical calibration point being used to calculate divergence times. Good node resolution with high statistical support was achieved, with good separation between most ecotypes in their own lineages. However, the results give no support for a monophyletic Tursiops. Divergence times are clustered in specific geological periods characterized by climatic fluctuations from cold to warmer periods. Common and bottlenose dolphins exhibit contrasting patterns of population structure in an environment containing few geographical barriers. This difference is speculated to be related to different feeding ecologies and social structures, although data on these are still limited. Although selection can be detected in the genomes of cetaceans at both the species and population level, current patterns of differentiation are thought to occur mainly due to drift.

    Weighted Statistical Binning: enabling statistically consistent genome-scale phylogenetic analyses

    Because biological processes can make different loci have different evolutionary histories, species tree estimation requires multiple loci from across the genome. While many processes can result in discord between gene trees and species trees, incomplete lineage sorting (ILS), modeled by the multi-species coalescent, is considered to be a dominant cause for gene tree heterogeneity. Coalescent-based methods have been developed to estimate species trees, many of which operate by combining estimated gene trees, and so are called summary methods. Because summary methods are generally fast, they have become very popular techniques for estimating species trees from multiple loci. However, recent studies have established that summary methods can have reduced accuracy in the presence of gene tree estimation error, and also that many biological datasets have substantial gene tree estimation error, so that summary methods may not be highly accurate under biologically realistic conditions. Mirarab et al. (Science 2014) presented the statistical binning technique to improve gene tree estimation in multi-locus analyses, and showed that it improved the accuracy of MP-EST, one of the most popular coalescent-based summary methods. Statistical binning, which uses a simple statistical test for combinability and then uses the larger sets of genes to re-calculate gene trees, has good empirical performance, but using statistical binning within a phylogenomics pipeline does not have the desirable property of being statistically consistent. We show that weighting the recalculated gene trees by the bin sizes makes statistical binning statistically consistent under the multi-species coalescent, and maintains the good empirical performance. Thus, "weighted statistical binning" enables highly accurate genome-scale species tree estimation, and is also statistically consistent under the multi-species coalescent model. Comment: (1) In Press, PLoS ON
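
One simple way to picture "weighting the recalculated gene trees by the bin sizes" is to repeat each bin's tree once per gene in that bin before passing the list to a summary method. The sketch below assumes this replication-based reading; the function name `weight_by_bin_size`, the gene labels, and the Newick strings are hypothetical placeholders, and the binning test and tree estimation themselves are not shown.

```python
def weight_by_bin_size(bins, bin_trees):
    """Repeat each bin's recalculated tree once per gene in the bin.

    Hypothetical helper for illustration only: `bins` is a list of
    gene-name lists and `bin_trees` holds one Newick string per bin.
    """
    if len(bins) != len(bin_trees):
        raise ValueError("need exactly one tree per bin")
    weighted = []
    for genes, tree in zip(bins, bin_trees):
        # A bin of k genes contributes k copies of its tree.
        weighted.extend([tree] * len(genes))
    return weighted


# Placeholder input: one bin of three genes and one bin of a single gene.
bins = [["g1", "g2", "g3"], ["g4"]]
bin_trees = ["((A,B),C);", "((A,C),B);"]
weighted = weight_by_bin_size(bins, bin_trees)
```

With this weighting, the summary method sees as many input trees as there were original genes, so a large bin counts for more than a small one.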

    Molecular evolution of bovine Toll-like receptor 2 suggests substitutions of functional relevance

    Background: There is accumulating evidence that polymorphism in Toll-like receptor (TLR) genes might be associated with disease resistance or susceptibility traits in livestock. Polymorphic sites affecting TLR function should exhibit signatures of positive selection, identified as a high ratio of non-synonymous to synonymous nucleotide substitutions (ω). Phylogeny-based models of codon substitution, based on estimates of ω for each amino acid position, can therefore offer a valuable tool to predict sites of functional relevance. We have used this approach to identify such polymorphic sites within the bovine TLR2 genes from ten Bos indicus and Bos taurus cattle breeds. By analysing TLR2 gene phylogeny in a set of mammalian species and a subset of ruminant species we have estimated the selective pressure on individual sites and domains and identified polymorphisms at sites of putative functional importance. Results: The ω values were highest in the mammalian TLR2 domains thought to be responsible for ligand binding and lowest in regions responsible for heterodimerisation with other TLR-related molecules. Several positively selected sites were detected in or around ligand-binding domains. However, a comparison of the ruminant subset of TLR2 sequences with the whole mammalian set of sequences revealed that there has been less selective pressure among ruminants than in mammals as a whole. This suggests that there have been functional changes during ruminant evolution. Twenty newly discovered non-synonymous polymorphic sites were identified in cattle. Three of them were localised at positions shaped by positive selection in the ruminant dataset (Leu227Phe, His305Pro, His326Gln) and in domains involved in the recognition of ligands. His326Gln is of particular interest as it consists of an exchange of differentially charged amino acids at a position which has previously been shown to be crucial for ligand binding in human TLR2. Conclusion: Within bovine TLR2, polymorphisms at amino acid positions 227, 305 and 326 map to functionally important sites of TLR2 and should be considered as candidate SNPs for immune-related traits in cattle. A final proof of their functional relevance requires further studies to determine their functional effect on the immune response after stimulation with relevant ligands and/or their association with immune-related traits in animals.