2 research outputs found

    LTC: a novel algorithm to improve the efficiency of contig assembly for physical mapping in complex genomes

    Abstract

    Background: Physical maps are the substrate of genome sequencing and map-based cloning, and their construction relies on the accurate assembly of BAC clones into large contigs that are then anchored to genetic maps with molecular markers. High Information Content Fingerprinting has become the method of choice for large and repetitive genomes such as those of maize, barley, and wheat. However, the high level of repeated DNA present in these genomes requires the application of very stringent criteria to ensure a reliable assembly with the FingerPrinted Contig (FPC) software, which often results in short contig lengths (3-5 clones before merging) as well as an unreliable assembly in some difficult regions. Difficulties can originate from a non-linear topological structure of clone overlaps, low power of clone ordering algorithms, and the absence of tools to identify sources of gaps in Minimal Tiling Paths (MTPs).

    Results: To address these problems, we propose a novel approach that: (i) reduces the rate of false connections and Q-clones by using a new cutoff calculation method; (ii) obtains reliable clusters robust to the exclusion of a single clone or clone overlap; (iii) explores the topological contig structure by considering contigs as networks of clones connected by significant overlaps; (iv) performs iterative clone clustering combined with ordering and order verification using re-sampling methods; and (v) uses global optimization methods for clone ordering and Band Map construction. The elements of this new analytical framework, called Linear Topological Contig (LTC), were applied to datasets previously used for the construction of the physical map of wheat chromosome 3B with FPC. The performance of LTC and FPC was also compared on simulated BAC libraries based on the known genome sequences for chromosome 1 of rice and chromosome 1 of maize.

    Conclusions: The results show that, compared to other methods, LTC enables the construction of highly reliable and longer contigs (5-12 clones before merging), the detection of "weak" connections in contigs and their "repair", and the elongation of contigs obtained by other assembly methods.
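The abstract describes contigs as networks of clones joined by significant overlaps and asks for clusters that survive the removal of any single clone or overlap. One way to read that requirement, offered here only as an illustrative sketch and not as the LTC implementation, is in graph terms: a clone whose removal disconnects the network is an articulation point, and an overlap whose removal disconnects it is a bridge. The function names and the toy clone labels below are hypothetical.

```python
from collections import defaultdict


def build_overlap_graph(overlaps):
    """Build an undirected adjacency map from (clone_a, clone_b) overlap pairs."""
    graph = defaultdict(set)
    for a, b in overlaps:
        graph[a].add(b)
        graph[b].add(a)
    return graph


def weak_points(graph):
    """Return (cut_clones, bridge_overlaps) using a depth-first low-link pass.

    cut_clones: clones whose removal disconnects their cluster.
    bridge_overlaps: overlaps (as frozensets of two clones) whose removal does.
    Recursive, so intended for small illustrative graphs only.
    """
    disc, low = {}, {}
    cut_clones, bridges = set(), set()
    counter = 0

    def dfs(node, parent):
        nonlocal counter
        disc[node] = low[node] = counter
        counter += 1
        children = 0
        for nxt in graph[node]:
            if nxt == parent:
                continue
            if nxt in disc:
                # Back edge: an alternative route around this node.
                low[node] = min(low[node], disc[nxt])
            else:
                children += 1
                dfs(nxt, node)
                low[node] = min(low[node], low[nxt])
                if low[nxt] > disc[node]:
                    bridges.add(frozenset((node, nxt)))
                if parent is not None and low[nxt] >= disc[node]:
                    cut_clones.add(node)
        if parent is None and children > 1:
            cut_clones.add(node)

    for start in list(graph):
        if start not in disc:
            dfs(start, None)
    return cut_clones, bridges


# Toy network (hypothetical clone names): two triangles joined by one overlap.
overlaps = [("A", "B"), ("B", "C"), ("A", "C"),
            ("C", "D"),
            ("D", "E"), ("E", "F"), ("D", "F")]
cut, weak = weak_points(build_overlap_graph(overlaps))
print(sorted(cut))                      # ['C', 'D']
print([sorted(pair) for pair in weak])  # [['C', 'D']]
```

In this toy case the single C-D overlap is the only "weak" connection: each triangle tolerates the loss of any one clone or overlap, while the link between them does not.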

    Mining for Significant Information from Unstructured and Structured Biological Data and Its Applications

    Massive amounts of biological data are being accumulated in science. Searching for significant, meaningful information and patterns in different types of data is necessary for gaining knowledge from these large amounts of data available to users. However, data mining techniques do not normally deal with significance. Integrating data mining techniques with standard statistical procedures provides a way of mining statistically significant, interesting information from both structured and unstructured data. In this dissertation, different algorithms for mining significant biological information from both unstructured and structured data are proposed. A weighted-density-based approach is presented for mining item data from unstructured textual representations. Different algorithms in the area of radiation hybrid mapping are developed for mining significant information from structured binary data. The proposed algorithms have different applications to the ordering problem in radiation hybrid mapping, including identifying unreliable markers and building solid framework maps. The effectiveness of the proposed algorithms in improving map stability is demonstrated. Map stability is determined based on resampling analysis. The proposed algorithms deal effectively and efficiently with multidimensional data and also reduce computational cost dramatically. Evaluation shows that the proposed algorithms outperform comparative methods in terms of both accuracy and computational cost.
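The abstract states that map stability is determined by resampling analysis but gives no procedure. The sketch below shows one generic way such a check could look, under loud assumptions: markers are binary retention vectors across hybrids, the ordering step is a simple nearest-neighbour chain on Hamming distance (a stand-in, not the dissertation's algorithm), and stability is the fraction of bootstrap replicates in which each adjacent marker pair of the reference order reappears. All function names and the toy data are hypothetical.

```python
import random


def hamming(u, v):
    """Number of hybrids in which two markers differ."""
    return sum(a != b for a, b in zip(u, v))


def greedy_order(vectors):
    """Chain markers by nearest neighbour, starting from the first marker.

    A deliberately simple stand-in for a real map-ordering method.
    """
    names = list(vectors)
    order = [names[0]]
    remaining = set(names[1:])
    while remaining:
        last = vectors[order[-1]]
        # sorted() makes tie-breaking deterministic.
        nxt = min(sorted(remaining), key=lambda m: hamming(last, vectors[m]))
        order.append(nxt)
        remaining.remove(nxt)
    return order


def adjacencies(order):
    """Unordered pairs of neighbouring markers in an order."""
    return {frozenset(pair) for pair in zip(order, order[1:])}


def adjacency_support(vectors, n_boot=200, seed=0):
    """Bootstrap hybrids (columns) and report how often each reference adjacency recurs."""
    rng = random.Random(seed)
    n_hybrids = len(next(iter(vectors.values())))
    reference = adjacencies(greedy_order(vectors))
    hits = dict.fromkeys(reference, 0)
    for _ in range(n_boot):
        cols = [rng.randrange(n_hybrids) for _ in range(n_hybrids)]
        sample = {m: [v[c] for c in cols] for m, v in vectors.items()}
        found = adjacencies(greedy_order(sample))
        for pair in reference:
            hits[pair] += pair in found
    return {pair: count / n_boot for pair, count in hits.items()}


# Toy retention data (hypothetical): 4 markers scored in 8 hybrids.
markers = {
    "m1": [1, 1, 0, 0, 1, 0, 1, 0],
    "m2": [1, 1, 0, 0, 1, 0, 0, 0],
    "m3": [1, 0, 0, 1, 1, 0, 0, 0],
    "m4": [0, 0, 1, 1, 1, 0, 0, 1],
}
for pair, support in adjacency_support(markers).items():
    print(sorted(pair), round(support, 2))
```

Under this reading, adjacencies with low support would point to markers whose placement is unreliable, while a set of markers with consistently high support could serve as a candidate framework.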