
    Partial DNA Assembly: A Rate-Distortion Perspective

    Earlier formulations of the DNA assembly problem were all in the context of perfect assembly; i.e., given a set of reads from a long genome sequence, is it possible to perfectly reconstruct the original sequence? In practice, however, the read data are very often not rich enough to permit unambiguous reconstruction of the original sequence. A natural generalization of the perfect-assembly formulation to such cases would be a rate-distortion framework, but partial assemblies are usually represented as an assembly graph, which makes defining a distortion measure challenging. In this work, we introduce a distortion function for assembly graphs that can be understood as the logarithm of the number of Eulerian cycles in the assembly graph, each of which corresponds to a candidate assembly that could have generated the observed reads. We also introduce an algorithm for constructing an assembly graph and analyze its performance on real genomes. Comment: To be published at ISIT-2016. 11 pages, 10 figures.
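
    A minimal sketch of the distortion idea, assuming the assembly graph is a de Bruijn-style multigraph built from every k-mer of a circular genome (i.e. perfect, error-free coverage) and that parallel edges count as distinct. It counts Eulerian circuits with the BEST theorem, ec(G) = t_w(G) * prod_v (outdeg(v) - 1)!, where t_w(G) is the number of spanning arborescences oriented toward a root w. The names build_graph, count_eulerian_circuits and distortion_bits are hypothetical, the base-2 logarithm is an arbitrary choice, and the paper's own graph construction and counting convention may differ.

```python
from collections import defaultdict
from fractions import Fraction
from math import factorial, log2


def build_graph(genome: str, k: int) -> dict:
    """Edge multiplicities of a de Bruijn-style graph: one edge per k-mer of a
    circular genome, from its (k-1)-prefix to its (k-1)-suffix."""
    wrapped = genome + genome[: k - 1]
    edges = defaultdict(int)
    for i in range(len(genome)):
        kmer = wrapped[i : i + k]
        edges[(kmer[:-1], kmer[1:])] += 1
    return dict(edges)


def determinant(matrix) -> Fraction:
    """Exact determinant by Gaussian elimination over the rationals."""
    m = [[Fraction(x) for x in row] for row in matrix]
    det = Fraction(1)
    for col in range(len(m)):
        pivot = next((r for r in range(col, len(m)) if m[r][col] != 0), None)
        if pivot is None:
            return Fraction(0)
        if pivot != col:
            m[col], m[pivot] = m[pivot], m[col]
            det = -det
        det *= m[col][col]
        for r in range(col + 1, len(m)):
            factor = m[r][col] / m[col][col]
            for c in range(col, len(m)):
                m[r][c] -= factor * m[col][c]
    return det


def count_eulerian_circuits(edges: dict) -> int:
    """BEST theorem. Assumes the graph is connected and balanced (in-degree equals
    out-degree at every node); parallel edges are treated as distinguishable."""
    nodes = sorted({u for u, _ in edges} | {v for _, v in edges})
    index = {node: i for i, node in enumerate(nodes)}
    outdeg = [0] * len(nodes)
    laplacian = [[0] * len(nodes) for _ in nodes]
    for (u, v), mult in edges.items():
        i, j = index[u], index[v]
        outdeg[i] += mult
        laplacian[i][i] += mult  # for a self-loop (i == j) these two lines cancel
        laplacian[i][j] -= mult
    # Arborescences oriented toward root node 0: delete that row and column.
    trees = determinant([row[1:] for row in laplacian[1:]])
    count = int(trees)
    for d in outdeg:
        count *= factorial(d - 1)
    return count


def distortion_bits(edges: dict) -> float:
    """log2 of the number of candidate assemblies (Eulerian circuits)."""
    return log2(count_eulerian_circuits(edges))


print(count_eulerian_circuits(build_graph("ACGT", 2)))    # 1
print(count_eulerian_circuits(build_graph("ACAGAT", 2)))  # 2
print(distortion_bits(build_graph("ACAGAT", 2)))          # 1.0
```

    For the circular sequence ACAGAT with k = 2, node A is entered and left three times, so two genomes (ACAGAT and ACATAG) explain the same 2-mers, giving two circuits and a distortion of 1 bit.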

    Multi-objective discrete particle swarm optimisation algorithm for integrated assembly sequence planning and assembly line balancing

    In assembly optimisation, assembly sequence planning and assembly line balancing have been studied extensively because both are directly linked to assembly efficiency, which influences final assembly costs. Both problems are NP-hard and are usually solved separately. Integrating them is attractive because it offers a larger search space that leads to better solution quality, reduces the planning error rate and shortens a product's time-to-market. To optimise integrated assembly sequence planning and assembly line balancing, this work proposes a multi-objective discrete particle swarm optimisation algorithm that uses discrete procedures to update particle positions and velocities when searching for Pareto-optimal solutions. A computational experiment with 51 test problems at different difficulty levels was used to compare the performance of the multi-objective discrete particle swarm optimisation algorithm with that of existing algorithms. A statistical test of the results indicates that the proposed algorithm presents a significant improvement in the quality of the solution set with respect to the Pareto-optimal set.
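
    The Pareto bookkeeping that any multi-objective particle swarm optimiser needs can be sketched as follows. This is only an illustration, assuming every objective is minimised and each particle is represented by its objective vector; it does not reproduce the paper's discrete position and velocity updates, and the names dominates, pareto_front and update_archive (and the objective values) are hypothetical.

```python
from typing import List, Sequence, Tuple

Objectives = Tuple[float, ...]


def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """True if `a` is no worse than `b` on every (minimised) objective
    and strictly better on at least one."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def pareto_front(points: List[Objectives]) -> List[Objectives]:
    """Non-dominated subset of `points`, in input order."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


def update_archive(archive: List[Objectives], candidate: Objectives) -> List[Objectives]:
    """External-archive update used in many MOPSO variants: reject a dominated or
    duplicate candidate, otherwise add it and evict the members it dominates."""
    if any(a == candidate or dominates(a, candidate) for a in archive):
        return archive
    return [a for a in archive if not dominates(candidate, a)] + [candidate]


# Hypothetical two-objective values for five particles.
swarm = [(3, 5), (4, 4), (5, 5), (2, 7), (4, 6)]
archive: List[Objectives] = []
for position in swarm:
    archive = update_archive(archive, position)
print(pareto_front(swarm))  # [(3, 5), (4, 4), (2, 7)]
print(archive)              # [(3, 5), (4, 4), (2, 7)]
```

    Keeping the archive separate from the swarm lets particles draw their social guide from the current non-dominated set, which is one common way such algorithms approach the Pareto-optimal set.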

    Targeted Assembly of Short Sequence Reads

    As next-generation sequence (NGS) production continues to increase, analysis is becoming a significant bottleneck. However, when information is required only for specific sequence variants, it is not necessary to assemble or align whole-genome data sets in their entirety. Instead, NGS data sets can be mined for the presence of sequence variants of interest by localized assembly, which is faster, easier and more accurate than whole-genome assembly or alignment. We present TASR, a streamlined assembler that interrogates very large NGS data sets for the presence of specific variants by considering only reads within the sequence space of target sequences supplied by the user. The NGS data set is searched for reads that exactly match any of the possible short words within the target sequence, and these reads are then assembled stringently to generate a consensus of the target and flanking sequence. Typically, variants of a particular locus are provided as different target sequences, and the presence of a variant in the interrogated data set is revealed by a successful assembly. However, TASR can also be used to find unknown sequences that flank a given target. We demonstrate that TASR is useful for finding or confirming genomic mutations, polymorphisms, and fusion and integration events. Targeted assembly is a powerful method for interrogating large data sets for the presence of sequence variants of interest, and TASR is a fast, flexible and easy-to-use tool for targeted assembly.
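
    The two steps described above, recruiting reads by exact word matches and then extending into flanking sequence, can be sketched as below. This is a simplified illustration, not TASR's implementation: the word length k, seed length m, min_support threshold, reverse-complement handling and greedy majority-vote extension are all assumptions, and the function names are hypothetical.

```python
from collections import Counter

COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def revcomp(seq: str) -> str:
    """Reverse complement of a DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def recruit_reads(reads, target: str, k: int) -> list:
    """Keep reads sharing an exact k-mer with the target, oriented on the strand
    that matched (the forward strand is tried first)."""
    words = {target[i : i + k] for i in range(len(target) - k + 1)}
    recruited = []
    for read in reads:
        for strand in (read, revcomp(read)):
            if any(strand[i : i + k] in words for i in range(len(strand) - k + 1)):
                recruited.append(strand)
                break
    return recruited


def extend_right(contig: str, reads, m: int, min_support: int = 2, max_steps: int = 1000) -> str:
    """Greedy 3' extension: find the contig's last m bases in the recruited reads and
    append the most common following base while at least `min_support` reads agree."""
    for _ in range(max_steps):
        seed = contig[-m:]
        votes = Counter()
        for read in reads:
            pos = read.find(seed)
            while pos != -1 and pos + m < len(read):
                votes[read[pos + m]] += 1
                pos = read.find(seed, pos + 1)
        if not votes:
            break
        base, support = votes.most_common(1)[0]
        if support < min_support:
            break
        contig += base
    return contig


reads = ["ACGTTGCA", "GTTGCATG", "TGCATGCC", "CATGCCAT", "TGCCATAG", "CCATAGGC", "GGGGGGGG"]
hits = recruit_reads(reads, target="GTTGCA", k=4)
print(hits)                                              # ['ACGTTGCA', 'GTTGCATG', 'TGCATGCC']
print(extend_right("GTTGCA", hits, m=4))                 # GTTGCATG
print(extend_right("GTTGCA", hits, m=4, min_support=1))  # GTTGCATGCC
```

    Raising min_support makes the extension more stringent: with two supporting reads required, the contig stops as soon as only one recruited read covers the next base.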

    An assembly oriented design framework for product structure engineering and assembly sequence planning

    The paper describes a novel framework for an assembly-oriented design (AOD) approach, proposed as a new functional product lifecycle management (PLM) strategy, that considers the product design and assembly sequence planning phases concurrently. Integrating product lifecycle concerns into the product development process has received much attention over the last two decades, especially at the detailed design stage. The main objective of the research is to bring assembly sequence definition into the preliminary design stages by introducing and applying assembly process knowledge, thereby providing assembly context knowledge that supports a lifecycle-oriented product development process, particularly for product structuring. The proposed framework is built around a novel algorithm based on a mathematical model that integrates boundary conditions related to design for assembly (DFA) rules, engineering decisions about the assembly sequence, and the product structure definition. The framework has been implemented in a new system called PEGASUS, which serves as an AOD module for a PLM system. A case study applying the framework to a catalytic-converter and diesel particulate filter sub-system of an exhaust system from an industrial automotive supplier illustrates the efficiency of the proposed AOD methodology.

    A clone-free, single molecule map of the domestic cow (Bos taurus) genome.

    Background: The cattle (Bos taurus) genome was originally selected for sequencing because of its economic importance and its unique biology as a model organism for understanding other ruminants and mammals. Currently, there are two cattle genome sequence assemblies (UMD3.1 and Btau4.6), produced by groups using dissimilar assembly algorithms and complemented by genetic and physical map resources. However, past comparisons between these assemblies revealed substantial differences. These discordances have created ambiguities when using reference sequence data, affecting genomic studies in cattle and motivating the construction of a new optical map resource, BtOM1.0, to guide comparisons and improvements to the current sequence builds. Accordingly, our comprehensive comparisons of BtOM1.0 against the UMD3.1 and Btau4.6 sequence builds tabulate large- to intermediate-scale discordances requiring resolution. Results: The optical map BtOM1.0, spanning the B. taurus genome (Hereford breed, L1 Dominette 01449), was assembled from an optical map dataset of 2,973,315 single-molecule optical maps (Rmaps; 1 Rmap = 1 restriction-mapped DNA molecule; 439X raw coverage before assembly) generated by the Optical Mapping System. The BamHI map spans 2,575.30 Mb and comprises 78 optical contigs assembled by a combination of iterative (using the reference sequence UMD3.1) and de novo assembly techniques. BtOM1.0 is a high-resolution physical map with an average restriction fragment size of 8.91 kb. Comparisons of BtOM1.0 with UMD3.1 and Btau4.6 revealed far more discordances in Btau4.6 (7,463) than in UMD3.1 (4,754). Overall, Btau4.6 presented almost double the number of discordances of UMD3.1 across most of the six categories of sequence-vs.-map discrepancy: COMPLEX (misassembly), DELs (extraneous sequences), INSs (missing sequences), ITs (inverted/translocated sequences), ECs (extra restriction cuts) and MCs (missing restriction cuts). Conclusion: Alignments of UMD3.1 and Btau4.6 to BtOM1.0 reveal discordances commensurate with previous reports and affirm the NCBI's current designation of UMD3.1 as the "reference assembly" and Btau4.6 as the "alternate assembly." When used as a comprehensive and largely independent guide, the cattle genome optical map BtOM1.0 will greatly assist improvements to existing sequence builds and later serve as an accurate physical scaffold for studies of the comparative genomics of cattle breeds.
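
    Comparing an optical map with a sequence build starts from the restriction fragments the sequence predicts. The sketch below, assuming a plain linear sequence string and exact site matching, digests a sequence in silico with BamHI (recognition site GGATCC, top-strand cut G^GATCC) and reports fragment lengths and their mean. Real sequence-to-map alignment must also model sizing error and missing or extra cuts (the MC and EC categories above), which this omits; the function names are hypothetical.

```python
import re

BAMHI_SITE = "GGATCC"   # BamHI recognition sequence
BAMHI_CUT_OFFSET = 1    # top-strand cut is G^GATCC, one base into the site


def bamhi_fragments(seq: str) -> list:
    """Fragment lengths (bp) from an in silico BamHI digest of a linear sequence."""
    seq = seq.upper()
    cuts = [m.start() + BAMHI_CUT_OFFSET for m in re.finditer(BAMHI_SITE, seq)]
    bounds = [0] + cuts + [len(seq)]
    return [end - start for start, end in zip(bounds, bounds[1:])]


def mean_fragment_kb(fragments: list) -> float:
    """Average fragment size in kilobases."""
    return sum(fragments) / len(fragments) / 1000


# Toy sequence with two BamHI sites.
toy = "AAAA" + "GGATCC" + "TTTTTTTT" + "GGATCC" + "AA"
print(bamhi_fragments(toy))  # [5, 14, 7]
print(mean_fragment_kb(bamhi_fragments(toy)))
```

    Because GGATCC cannot overlap itself, non-overlapping matches from re.finditer find every site.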

    Combining Heuristics in Assembly Sequence Planning

    Assembly Sequence Planning is tackled by modelling and solving a planning problem that considers the execution of the plan in a system with multiple assembly machines. The objective of the plan is to minimize the total assembly time (makespan). To meet this objective, the model takes into account the durations and resources of the assembly tasks, configuration changes in the machines, and the transport of intermediate subassemblies between workstations. To solve the problem, different heuristics have been defined from two relaxed models of it: one considering only the precedence constraints among tasks, and the other considering only the use of shared resources. From these basic heuristics, further heuristics have been defined that combine both types of information, and this refinement produces substantial improvements over the initial heuristics. Funding: Ministerio de Ciencia y Tecnología, DPI2003-07146-C02-0
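
    The two relaxations can each be turned into a makespan estimate, as in the minimal sketch below. It assumes every task runs on exactly one machine that processes one task at a time, and it ignores the configuration changes and transport times the full model includes; taking the maximum of the two bounds is an illustrative combination, not necessarily the authors' refinement, and the task and machine names are hypothetical.

```python
from collections import defaultdict
from graphlib import TopologicalSorter  # Python 3.9+


def precedence_bound(durations, preds):
    """Relaxation 1: ignore shared resources; the makespan is then the
    critical-path length of the precedence graph."""
    finish = {}
    graph = {task: preds.get(task, ()) for task in durations}
    for task in TopologicalSorter(graph).static_order():
        start = max((finish[p] for p in preds.get(task, ())), default=0)
        finish[task] = start + durations[task]
    return max(finish.values(), default=0)


def resource_bound(durations, machine_of):
    """Relaxation 2: ignore precedence; each machine must still process
    all of its tasks one after another."""
    load = defaultdict(int)
    for task, duration in durations.items():
        load[machine_of[task]] += duration
    return max(load.values(), default=0)


def makespan_lower_bound(durations, preds, machine_of):
    """Both relaxations under-estimate the true makespan, so their maximum does too."""
    return max(precedence_bound(durations, preds), resource_bound(durations, machine_of))


# Hypothetical tasks: c needs subassemblies a and b; d follows c.
durations = {"a": 3, "b": 2, "c": 4, "d": 1}
preds = {"c": ["a", "b"], "d": ["c"]}
machine_of = {"a": "M1", "b": "M1", "c": "M2", "d": "M1"}
print(precedence_bound(durations, preds))                  # 8
print(resource_bound(durations, machine_of))               # 6
print(makespan_lower_bound(durations, preds, machine_of))  # 8
```

    In this toy instance a and b share M1, so any real schedule takes at least 10 time units; each relaxation on its own misses part of that interaction, which is why combining both kinds of information helps.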

    Improving Phrap-Based Assembly of the Rat Using “Reliable” Overlaps

    The assembly methods used for whole-genome shotgun (WGS) data have a major impact on the quality of the resulting draft genomes. We present a novel algorithm to generate a set of “reliable” overlaps based on identifying repeat k-mers. To demonstrate the benefits of using reliable overlaps, we have created a version of the Phrap assembly program that uses only overlaps from a specific list. We call this version PhrapUMD. Integrating PhrapUMD and our “reliable-overlap” algorithm with the Baylor College of Medicine assembler, Atlas, we assemble the BACs from the Rattus norvegicus genome project. Starting with the same data as the Nov. 2002 Atlas assembly, we compare our results and the Atlas assembly to the 4.3 Mb of rat sequence in the 21 BACs that have been finished. Our version of the draft assembly of the 21 BACs increases the coverage of finished sequence from 93.4% to 96.3%, while simultaneously reducing the base error rate from 4.5 to 1.1 errors per 10,000 bases. There are a number of ways of assessing the relative merits of assemblies when the finished sequence is available. If one views the overall quality of an assembly as inversely proportional to the product of the error rate and the amount of sequence missed, then the assembly presented here is seven times better. The UMD Overlapper with options for reliable overlaps is available from the authors at http://www.genome.umd.edu. We also provide the changes to the Phrap source code that enable it to use only the reliable overlaps.
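
    The abstract does not spell out how repeat k-mers are identified, so the following is only a sketch under stated assumptions: k-mers are counted on the forward strand across all reads, any k-mer seen more than max_count times is flagged as a repeat, and two reads become a candidate overlap only if they share a non-repeat k-mer. In practice max_count would be set relative to the expected coverage and candidate pairs would still need to be aligned; the function name and toy reads are hypothetical.

```python
from collections import Counter, defaultdict
from itertools import combinations


def reliable_overlap_pairs(reads, k: int, max_count: int) -> set:
    """Candidate overlaps between reads sharing at least one k-mer whose total count
    across all reads is at most `max_count`; higher-count k-mers are treated as
    repeats and ignored."""
    counts = Counter(r[i : i + k] for r in reads for i in range(len(r) - k + 1))
    readers = defaultdict(set)  # non-repeat k-mer -> indices of reads containing it
    for idx, read in enumerate(reads):
        for i in range(len(read) - k + 1):
            word = read[i : i + k]
            if counts[word] <= max_count:
                readers[word].add(idx)
    pairs = set()
    for ids in readers.values():
        pairs.update(combinations(sorted(ids), 2))
    return pairs


# Toy reads from two loci that both contain the repeat TTT.
reads = ["ACGTTT", "GTTTCA", "GGATTT", "ATTTGG"]
print(sorted(reliable_overlap_pairs(reads, k=3, max_count=2)))  # [(0, 1), (2, 3)]
print(len(reliable_overlap_pairs(reads, k=3, max_count=10)))    # 6
```

    Without the repeat filter, the shared k-mer TTT links all six read pairs, including the four spurious cross-locus ones. The quality claim can also be checked from the reported figures: sequence missed falls from 100 - 93.4 = 6.6% to 100 - 96.3 = 3.7%, so the ratio of error-rate-times-missed products is (4.5 * 6.6) / (1.1 * 3.7), roughly 7.3.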