10 research outputs found

    The power of phylogenetic approaches to detect horizontally transferred genes

    Get PDF
    BACKGROUND: Horizontal gene transfer plays an important role in evolution because it sometimes allows recipient lineages to adapt to new ecological niches. High genes transfer frequencies were inferred for prokaryotic and early eukaryotic evolution. Does horizontal gene transfer also impact phylogenetic reconstruction of the evolutionary history of genomes and organisms? The answer to this question depends at least in part on the actual gene transfer frequencies and on the ability to weed out transferred genes from further analyses. Are the detected transfers mainly false positives, or are they the tip of an iceberg of many transfer events most of which go undetected by current methods? RESULTS: Phylogenetic detection methods appear to be the method of choice to infer gene transfers, especially for ancient transfers and those followed by orthologous replacement. Here we explore how well some of these methods perform using in silico transfers between the terminal branches of a gamma proteobacterial, genome based phylogeny. For the experiments performed here on average the AU test at a 5% significance level detects 90.3% of the transfers and 91% of the exchanges as significant. Using the Robinson-Foulds distance only 57.7% of the exchanges and 60% of the donations were identified as significant. Analyses using bipartition spectra appeared most successful in our test case. The power of detection was on average 97% using a 70% cut-off and 94.2% with 90% cut-off for identifying conflicting bipartitions, while the rate of false positives was below 4.2% and 2.1% for the two cut-offs, respectively. For all methods the detection rates improved when more intervening branches separated donor and recipient. CONCLUSION: Rates of detected transfers should not be mistaken for the actual transfer rates; most analyses of gene transfers remain anecdotal. The method and significance level to identify potential gene transfer events represent a trade-off between the frequency of erroneous identification (false positives) and the power to detect actual transfer events

    BranchClust: a phylogenetic algorithm for selecting gene families

    Get PDF
    BACKGROUND: Automated methods for assembling families of orthologous genes include those based on sequence similarity scores and those based on phylogenetic approaches. The first are easy to automate but usually they do not distinguish between paralogs and orthologs or have restriction on the number of taxa. Phylogenetic methods often are based on reconciliation of a gene tree with a known rooted species tree; a limitation of this approach, especially in case of prokaryotes, is that the species tree is often unknown, and that from the analyses of single gene families the branching order between related organisms frequently is unresolved. RESULTS: Here we describe an algorithm for the automated selection of orthologous genes that recognizes orthologous genes from different species in a phylogenetic tree for any number of taxa. The algorithm is capable of distinguishing complete (containing all taxa) and incomplete (not containing all taxa) families and recognizes in- and outparalogs. The BranchClust algorithm is implemented in Perl with the use of the BioPerl module for parsing trees and is freely available at . CONCLUSION: BranchClust outperforms the Reciprocal Best Blast hit method in selecting more sets of putatively orthologous genes. In the test cases examined, the correctness of the selected families and of the identified in- and outparalogs was confirmed by inspection of the pertinent phylogenetic trees

    A Computational Framework Discovers New Copy Number Variants with Functional Importance

    Get PDF
    Structural variants which cause changes in copy numbers constitute an important component of genomic variability. They account for 0.7% of genomic differences in two individual genomes, of which copy number variants (CNVs) are the largest component. A recent population-based CNV study revealed the need of better characterization of CNVs, especially the small ones (<500 bp).We propose a three step computational framework (Identification of germline Changes in Copy Number or IgC2N) to discover and genotype germline CNVs. First, we detect candidate CNV loci by combining information across multiple samples without imposing restrictions to the number of coverage markers or to the variant size. Secondly, we fine tune the detection of rare variants and infer the putative copy number classes for each locus. Last, for each variant we combine the relative distance between consecutive copy number classes with genetic information in a novel attempt to estimate the reference model bias. This computational approach is applied to genome-wide data from 1250 HapMap individuals. Novel variants were discovered and characterized in terms of size, minor allele frequency, type of polymorphism (gains, losses or both), and mechanism of formation. Using data generated for a subset of individuals by a 42 million marker platform, we validated the majority of the variants with the highest validation rate (66.7%) was for variants of size larger than 1 kb. Finally, we queried transcriptomic data from 129 individuals determined by RNA-sequencing as further validation and to assess the functional role of the new variants. We investigated the possible enrichment for variant's regulatory effect and found that smaller variants (<1 Kb) are more likely to regulate gene transcript than larger variants (p-value = 2.04e-08). Our results support the validity of the computational framework to detect novel variants relevant to disease susceptibility studies and provide evidence of the importance of genetic variants in regulatory network studies

    Unsupervised Learning in Spectral Genome Analysis

    No full text
    The tree representation as a model for organismal evolution has been in use since before Darwin. However, with the recent unprecedented access to biomolecular data it has been discovered that, especially in the microbial world, individual genes making up the genome of an organism give rise to different and sometimes conflicting evolutionary tree topologies. This discovery calls into question the notion of a single evolutionary tree for an organism and gives rise to the notion of an evolutionary consensus tree based on the evolutionary patterns of the majority of genes in a genome embedded in a network of gene histories. Here we discuss an approach to the analysis of genomic data of multiple genomes using bipartition spectral analysis and unsupervised learning. An interesting observation is that genes within genomes that have evolutionary tree topologies that are in significant conflict with the evolutionary consensus tree of an organism point to possible horizontal gene transfer events which often delineate significant evolutionary events. 1

    Hidden chromosome symmetry: in silico transformation reveals symmetry in 2D DNA walk trajectories of 671 chromosomes.

    Get PDF
    Maps of 2D DNA walk of 671 examined chromosomes show composition complexity change from symmetrical half-turn in bacteria to pseudo-random trajectories in archaea, fungi and humans. In silico transformation of gene order and strand position returns most of the analyzed chromosomes to a symmetrical bacterial-like state with one transition point. The transformed chromosomal sequences also reveal remarkable segmental compositional symmetry between regions from different strands located equidistantly from the transition point. Despite extensive chromosome rearrangement the relation of gene numbers on opposite strands for chromosomes of different taxa varies in narrow limits around unity with Pearson coefficient r = 0.98. Similar relation is observed for total genes' length (r = 0.86) and cumulative GC (r = 0.95) and AT (r = 0.97) skews. This is also true for human coding sequences (CDS), which comprise only several percent of the entire chromosome length. We found that frequency distributions of the length of gene clusters, continuously located on the same strand, have close values for both strands. Eukaryotic gene distribution is believed to be non-random. Contribution of different subsystems to the noted symmetries and distributions, and evolutionary aspects of symmetry are discussed

    Inferring Horizontal Gene Transfer

    No full text
    corecore