The Parallelism Motifs of Genomic Data Analysis
Genomic data sets are growing dramatically as the cost of sequencing
continues to decline and small sequencing devices become available. Enormous
community databases store and share these data with researchers, but some
genomic data analysis problems require large-scale computational platforms to
meet both the memory and computational requirements. These
applications differ from scientific simulations that dominate the workload on
high end parallel systems today and place different requirements on programming
support, software libraries, and parallel architectural design. For example,
they involve irregular communication patterns such as asynchronous updates to
shared data structures. We consider several problems in high performance
genomics analysis, including alignment, profiling, clustering, and assembly for
both single genomes and metagenomes. We identify some of the common
computational patterns or motifs that help inform parallelization strategies
and compare our motifs to some of the established lists, arguing that at least
two key patterns, sorting and hashing, are missing.
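The hashing motif mentioned above can be illustrated with a toy k-mer counter (a hypothetical sketch, not the paper's code): each k-mer is a key in a shared hash table, and in a distributed setting each update would become an asynchronous remote increment on the process owning that key.

```python
from collections import defaultdict

def count_kmers(reads, k):
    """Count k-mer occurrences across a set of reads.

    A single in-process dict stands in for the distributed hash table
    that a parallel genomics pipeline would update asynchronously.
    """
    counts = defaultdict(int)
    for read in reads:
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return dict(counts)

reads = ["ACGTAC", "CGTACG"]
print(count_kmers(reads, 3))
```

The irregular, data-dependent key distribution is what makes this pattern a poor fit for the regular communication of traditional simulation workloads.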
Optimal Assembly for High Throughput Shotgun Sequencing
We present a framework for the design of optimal assembly algorithms for
shotgun sequencing under the criterion of complete reconstruction. We derive a
lower bound on the read length and the coverage depth required for
reconstruction in terms of the repeat statistics of the genome. Building on
earlier works, we design a de Bruijn graph based assembly algorithm which can
achieve very close to the lower bound for repeat statistics of a wide range of
sequenced genomes, including the GAGE datasets. The results are based on a set
of necessary and sufficient conditions on the DNA sequence and the reads for
reconstruction. The conditions can be viewed as the shotgun sequencing analogue
of Ukkonen-Pevzner's necessary and sufficient conditions for Sequencing by
Hybridization.
Comment: 26 pages, 18 figures
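The de Bruijn graph underlying this style of assembly can be sketched in a few lines (a toy illustration under simplifying assumptions, not the paper's algorithm): nodes are (k-1)-mers and each k-mer in a read contributes an edge between its prefix and suffix.

```python
from collections import defaultdict

def de_bruijn_graph(reads, k):
    """Build a de Bruijn graph from reads.

    Nodes are (k-1)-mers; each k-mer adds an edge prefix -> suffix.
    The paper's conditions characterize when traversing such a graph
    reconstructs the genome uniquely.
    """
    edges = defaultdict(list)
    for read in reads:
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            edges[kmer[:-1]].append(kmer[1:])
    return dict(edges)

print(de_bruijn_graph(["ACGTA"], 3))
```

Repeats in the genome show up as branching nodes, which is why the read length and coverage bounds are stated in terms of repeat statistics.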
Four payment models for the multi-mode resource constrained project scheduling problem with discounted cash flows
In this paper, the multi-mode resource-constrained project scheduling problem with discounted cash flows is considered. The objective is the maximization of the net present value of all cash flows. The time value of money is taken into consideration, and cash inflows and outflows are associated with activities and/or events. The resources can be of renewable, nonrenewable, and doubly constrained types. Four payment models are considered: lump-sum payment at the terminal event, payments at prespecified event nodes, payments at prespecified time points, and progress payments. To solve the proposed problems, a
genetic algorithm (GA) approach is employed, which uses a special crossover operator that exploits the multi-component nature of the problem. The models are investigated by means of an example problem. Sensitivity analyses are performed over the markup and the discount rate. A set of 93 problems from the literature is solved under the four payment models and resource-type combinations, with the GA achieving satisfactory computation times. For the lump-sum payment case with renewable resources, the GA approach is compared with a domain-specific heuristic and is shown to outperform it.
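The net present value objective that the GA maximizes can be written as a one-line discounting sum (a minimal sketch with hypothetical cash flows; the actual schedules and amounts are decision variables in the paper's models):

```python
def npv(cash_flows, rate):
    """Net present value of (period, amount) cash flows at a
    per-period discount rate."""
    return sum(amount / (1 + rate) ** t for t, amount in cash_flows)

# Hypothetical cash flows: outflows during execution, then payment(s).
lump_sum = [(2, -100.0), (5, -100.0), (10, 350.0)]   # single terminal payment
progress = [(2, -100.0), (3, 120.0), (5, -100.0), (6, 120.0), (10, 110.0)]
print(round(npv(lump_sum, 0.05), 2))
print(round(npv(progress, 0.05), 2))
```

Because discounting penalizes late inflows, the four payment models generally induce different optimal activity schedules for the same project network.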
NEH-based heuristics for the permutation flowshop scheduling problem to minimize total tardiness
Since Johnson's seminal paper in 1954, scheduling jobs in a permutation flowshop has been receiving the attention of hundreds of practitioners and researchers, being one of the most studied topics in the Operations Research literature. Among the different objectives that can be considered, minimising the total tardiness (i.e. the sum of the surplus of the completion time of each job over its due date) is regarded as a key objective for manufacturing companies, as it entails the fulfilment of the due dates committed to customers. Since this problem is known to be NP-hard, most research has focused on proposing approximate procedures to solve it in reasonable computation times. Particularly, several constructive heuristics have been proposed, with NEHedd being the most efficient one, serving also to provide an initial solution for more elaborate approximate procedures. In this paper, we first analyse in detail the decision problem depending on the generation of the due dates of the jobs, and discuss the similarities with different related decision problems. In addition, for the most characteristic tardiness scenario, the analysis shows that a huge number of ties appear during the construction of the solutions done by the NEHedd heuristic, and that wisely breaking the ties greatly influences the quality of the final solution. Since no tie-breaking mechanism has been designed for this heuristic up to now, we propose several mechanisms that are exhaustively tested. The results show that some of them outperform the original NEHedd by about 25% while keeping the same computational requirements.
Ministerio de Ciencia e Innovación DPI2010-15573/DPI
Ministerio de Ciencia e Innovación DPI2013-44461-P/DP
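The total tardiness objective defined above can be evaluated for a given job permutation with a standard flowshop completion-time recursion (a generic sketch, not the paper's NEHedd implementation or its tie-breaking rules):

```python
def total_tardiness(sequence, proc_times, due_dates):
    """Total tardiness of a job permutation in a permutation flowshop.

    proc_times[j][m] is the processing time of job j on machine m.
    Tardiness of job j = max(0, completion time on last machine - due date).
    """
    num_machines = len(proc_times[0])
    completion = [0.0] * num_machines  # previous job's completion per machine
    tardiness = 0.0
    for j in sequence:
        prev = 0.0  # current job's completion on the previous machine
        for m in range(num_machines):
            # A job starts on machine m when both the machine is free and
            # the job has finished on machine m-1.
            completion[m] = max(completion[m], prev) + proc_times[j][m]
            prev = completion[m]
        tardiness += max(0.0, completion[-1] - due_dates[j])
    return tardiness
```

Insertion heuristics such as NEHedd repeatedly evaluate this objective for candidate positions of each job, which is where ties between equally good positions arise.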
TCR Convergence in Individuals Treated With Immune Checkpoint Inhibition for Cancer.
Tumor antigen-driven selection may expand T cells having T cell receptors (TCRs) of shared antigen specificity but different amino acid or nucleotide sequence in a process known as TCR convergence. Substitution sequencing errors introduced by TCRβ (TCRB) repertoire sequencing may create artifacts resembling TCR convergence. Given the anticipated differences in substitution error rates across different next-generation sequencing platforms, the choice of platform could be consequential. To test this, we performed TCRB sequencing on the same peripheral blood mononuclear cells (PBMC) from individuals with cancer receiving anti-CTLA-4 or anti-PD-1 using an Illumina-based approach (Sequenta) and an Ion Torrent-based approach (Oncomine TCRB-LR). While both approaches found similar TCR diversity, clonality, and clonal overlap, we found that Illumina-based sequencing resulted in higher TCR convergence than with the Ion Torrent approach. To build upon this initial observation, we conducted a systematic comparison of Illumina-based TCRB sequencing assays, including those employing molecular barcodes, with the Oncomine assay, revealing differences in the frequency of convergent events, purportedly artifactual rearrangements, and sensitivity of detection. Finally, we applied the Ion Torrent-based approach to evaluate clonality and convergence in a cohort of individuals receiving anti-CTLA-4 blockade for cancer. We found that clonality and convergence independently predicted response and could be combined to improve the accuracy of a logistic regression classifier. These results demonstrate the importance of the sequencing platform in assessing TCRB convergence.
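The convergence concept described above amounts to grouping nucleotide rearrangements by the amino acid sequence they encode and flagging groups with more than one distinct nucleotide sequence. A toy sketch (hypothetical sequences, and a codon table deliberately truncated to cover only them):

```python
from collections import defaultdict

# Minimal codon table covering only the toy sequences below.
CODONS = {"TGT": "C", "GCT": "A", "GCC": "A", "AGC": "S", "TCT": "S", "TTT": "F"}

def translate(nt):
    """Translate an in-frame nucleotide sequence using the toy table."""
    return "".join(CODONS[nt[i:i + 3]] for i in range(0, len(nt), 3))

def convergent_groups(nt_sequences):
    """Group rearrangements by amino acid sequence; return groups encoded
    by more than one distinct nucleotide sequence (candidate convergence).
    Sequencing substitution errors can inflate these groups artifactually."""
    groups = defaultdict(set)
    for nt in nt_sequences:
        groups[translate(nt)].add(nt)
    return {aa: nts for aa, nts in groups.items() if len(nts) > 1}

seqs = ["TGTGCTAGC", "TGTGCCTCT", "TGTTTTAGC"]
print(convergent_groups(seqs))
```

Because a single substitution error can create a spurious new nucleotide sequence encoding the same amino acids, platforms with higher substitution error rates will report more of these groups, which is the artifact the study quantifies.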