1 research outputs found

    Linkage disequilibrium maps to guide contig ordering for genome assembly

    No full text
    Motivation: Efforts to establish reference genome sequences by \textit{de novo} sequence assembly have to address the difficulty of linking relatively short sequence contigs to form much larger chromosome assemblies. Efficient strategies are required to span gaps and establish contig order and relative orientation. We consider here the use of linkage disequilibrium (LD) maps of sequenced contigs and the utility of LD for ordering, orienting and positioning linked sequences. LD maps are readily constructed from population data and have at least an order of magnitude higher resolution than linkage maps providing the potential to resolve difficult areas in assemblies. We empirically evaluate a linkage disequilibrium map-based method using single nucleotide polymorphism genotype data in a ~216 kilobase region of human 6p21.3 from which three shorter contigs are formed.Results: LD map length is most informative about the correct order and orientation and is suggested by the shortest LD map where the residual error variance is close to one. For regions in strong LD this method may be less informative for correcting inverted contigs than for identifying correct contig orders. For positioning two contigs in linkage disequilibrium with each other the inter-contig distances may be roughly estimated by this method. Availability: The LDMAP program is written in C for a linux platform and is available at https://www.soton.ac.uk/genomicinformatics/research/ld.page<br/
    corecore