3 research outputs found

    Improving Comparative Genomic Studies:Definitions and Algorithms for Syntenic Blocks

    Get PDF
    Comparative genomics aims to understand the structure of genomes and the function of various genomic fragments, by transferring knowledge gained from well studied genomes, to the new object of study. Rapid and inexpensive high-throughput sequencing is making available more and more complete genome sequences. Despite the significant scientific advance, we still lack good models for the evolution of the genomic architecture, therefore analyzing these genomes presents formidable challenges. Early approaches used pairwise comparisons, but today researchers are attempting to leverage the larger potential of multiway comparisons. Current approaches are based on the identification of so called syntenic blocks: blocks of sequence that exhibit conserved features across the genomes under study. Syntenic blocks are in many ways analogous to genesâ -in many cases, the markers are used to constructing them are genes. Like genes they can exist in multiple copies, in which case we could define analogs of orthology and paralogy. However, whereas genes are studied at the sequence level, syntenic blocks are too large for that level of detail - it is their structure and function as a unit that makes them valuable for genome level comparative studies. Syntenic blocks are required for complex computations to scale to the billions of nucleotides present in many genomes; they enable comparisons across broad ranges of genomes because they filter outmuch of the individual variability; they highlight candidate regions for in-depth studies; and they facilitate whole-genome comparisons through visualization tools. The identification of such blocks is the first step in comparative studies, yet its effect on final results has not been well studied, nor has any formalization of syntenic blocks been proposed. Tools for the identification of syntenic blocks yield quite different results, thereby preventing a systematic assessment of the next steps in an analysis. Current tools do not include measurable quality objectives and thus cannot be benchmarked against themselves. Comparisons among tools have also been neglected - what few results are given use superficial measures unrelated to quality or consistency. In this thesis we address two major challenges, and present: (i) a theoretical model as well as an experimental basis for comparing syntenic blocks and thus also for improving the design of tools for the identification of syntenic blocks; (ii) a prototype model that serves as a basis for implementing effective synteny mining tools. We offer an overview of the milestones present in literature, on the development of concepts and tool related to synteny; we illustrate the application of the model and the measures by applying them to syntenic blocks produced by different contemporary tools on publicly available data sets. We have taken the first step towards a formal approach to the construction of syntenic blocks by developing a simple quality criterion based on sound evolutionary principles. Our experiments demonstrate widely divergent results among these tools, throwing into question the robustness of the basic approach in comparative genomics. Our findings highlight the need for a well founded, systematic approach to the decomposition of genomes into syntenic blocks and motivate the second part of the work - starting from the proposed model, we extend the concept with data dependent features and constraints, in order to test the concept on cases of interest

    Gene family assignment-free comparative genomics

    Get PDF
    Dörr D, Thévenin A, Stoye J. Gene family assignment-free comparative genomics. BMC Bioinformatics. 2012;13(Suppl 19: Proc. of RECOMB-CG 2012): S3

    Gene family assignment-free comparative genomics

    No full text
    Abstract Background The comparison of relative gene orders between two genomes offers deep insights into functional correlations of genes and the evolutionary relationships between the corresponding organisms. Methods for gene order analyses often require prior knowledge of homologies between all genes of the genomic dataset. Since such information is hard to obtain, it is common to predict homologous groups based on sequence similarity. These hypothetical groups of homologous genes are called gene families. Results This manuscript promotes a new branch of gene order studies in which prior assignment of gene families is not required. As a case study, we present a new similarity measure between pairs of genomes that is related to the breakpoint distance. We propose an exact and a heuristic algorithm for its computation. We evaluate our methods on a dataset comprising 12 γ-proteobacteria from the literature. Conclusions In evaluating our algorithms, we show that the exact algorithm is suitable for computations on small genomes. Moreover, the results of our heuristic are close to those of the exact algorithm. In general, we demonstrate that gene order studies can be improved by direct, gene family assignment-free comparisons.</p
    corecore