14,655 research outputs found

    The Parallelism Motifs of Genomic Data Analysis

    Get PDF
    Genomic data sets are growing dramatically as the cost of sequencing continues to decline and small sequencing devices become available. Enormous community databases store and share this data with the research community, but some of these genomic data analysis problems require large scale computational platforms to meet both the memory and computational requirements. These applications differ from scientific simulations that dominate the workload on high end parallel systems today and place different requirements on programming support, software libraries, and parallel architectural design. For example, they involve irregular communication patterns such as asynchronous updates to shared data structures. We consider several problems in high performance genomics analysis, including alignment, profiling, clustering, and assembly for both single genomes and metagenomes. We identify some of the common computational patterns or motifs that help inform parallelization strategies and compare our motifs to some of the established lists, arguing that at least two key patterns, sorting and hashing, are missing

    Optimal Assembly for High Throughput Shotgun Sequencing

    Get PDF
    We present a framework for the design of optimal assembly algorithms for shotgun sequencing under the criterion of complete reconstruction. We derive a lower bound on the read length and the coverage depth required for reconstruction in terms of the repeat statistics of the genome. Building on earlier works, we design a de Brujin graph based assembly algorithm which can achieve very close to the lower bound for repeat statistics of a wide range of sequenced genomes, including the GAGE datasets. The results are based on a set of necessary and sufficient conditions on the DNA sequence and the reads for reconstruction. The conditions can be viewed as the shotgun sequencing analogue of Ukkonen-Pevzner's necessary and sufficient conditions for Sequencing by Hybridization.Comment: 26 pages, 18 figure

    Four payment models for the multi-mode resource constrained project scheduling problem with discounted cash flows

    Get PDF
    In this paper, the multi-mode resource constrained project scheduling problem with discounted cash flows is considered. The objective is the maximization of the net present value of all cash flows. Time value of money is taken into consideration, and cash in- and outflows are associated with activities and/or events. The resources can be of renewable, nonrenewable, and doubly constrained resource types. Four payment models are considered: Lump sum payment at the terminal event, payments at prespecified event nodes, payments at prespecified time points and progress payments. For finding solutions to problems proposed, a genetic algorithm (GA) approach is employed, which uses a special crossover operator that can exploit the multi-component nature of the problem. The models are investigated at the hand of an example problem. Sensitivity analyses are performed over the mark up and the discount rate. A set of 93 problems from literature are solved under the four different payment models and resource type combinations with the GA approach employed resulting in satisfactory computation times. The GA approach is compared with a domain specific heuristic for the lump sum payment case with renewable resources and is shown to outperform it

    NEH-based heuristics for the permutation flowshop scheduling problem to minimize total tardiness

    Get PDF
    Since Johnson׳s seminal paper in 1954, scheduling jobs in a permutation flowshop has been receiving the attention of hundreds of practitioners and researchers, being one of the most studied topics in the Operations Research literature. Among the different objectives that can be considered, minimising the total tardiness (i.e. the sum of the surplus of the completion time of each job over its due date) is regarded as a key objective for manufacturing companies, as it entails the fulfilment of the due dates committed to customers. Since this problem is known to be NP-hard, most research has focused on proposing approximate procedures to solve it in reasonable computation times. Particularly, several constructive heuristics have been proposed, with NEHedd being the most efficient one, serving also to provide an initial solution for more elaborate approximate procedures. In this paper, we first analyse in detail the decision problem depending on the generation of the due dates of the jobs, and discuss the similarities with different related decision problems. In addition, for the most characteristic tardiness scenario, the analysis shows that a huge number of ties appear during the construction of the solutions done by the NEHedd heuristic, and that wisely breaking the ties greatly influences the quality of the final solution. Since no tie-breaking mechanism has been designed for this heuristic up to now, we propose several mechanisms that are exhaustively tested. The results show that some of them outperform the original NEHedd by about 25% while keeping the same computational requirements.Ministerio de Ciencia e Innovación DPI2010-15573/DPIMinisterio de Ciencia e Innovación DPI2013-44461-P/DP
    corecore