14,655 research outputs found

The Parallelism Motifs of Genomic Data Analysis

Author: Awan Muaaz
Azad Ariful
Brock Benjamin
Buluc Aydin
Egan Rob
Ekanayake Saliya
Ellis Marquita
Georganas Evangelos
Guidi Giulia
Hofmeyr Steven
Oliker Leonid
Selvitopi Oguz
Teodoropol Cristina
Yelick Katherine
Publication venue: 'The Royal Society'
Publication date: 20/01/2020

Genomic data sets are growing dramatically as the cost of sequencing continues to decline and small sequencing devices become available. Enormous community databases store and share this data with the research community, but some of these genomic data analysis problems require large-scale computational platforms to meet both the memory and computational requirements. These applications differ from the scientific simulations that dominate the workload on high-end parallel systems today and place different requirements on programming support, software libraries, and parallel architectural design. For example, they involve irregular communication patterns such as asynchronous updates to shared data structures. We consider several problems in high-performance genomics analysis, including alignment, profiling, clustering, and assembly for both single genomes and metagenomes. We identify some of the common computational patterns or motifs that help inform parallelization strategies and compare our motifs to some of the established lists, arguing that at least two key patterns, sorting and hashing, are missing.

arXiv.org e-Print Archive

eScholarship - University of California

Optimal Assembly for High Throughput Shotgun Sequencing

Author: Bresler Guy
Bresler Ma'ayan
Tse David
Publication date: 18/02/2013

We present a framework for the design of optimal assembly algorithms for shotgun sequencing under the criterion of complete reconstruction. We derive a lower bound on the read length and the coverage depth required for reconstruction in terms of the repeat statistics of the genome. Building on earlier works, we design a de Bruijn graph based assembly algorithm which can achieve very close to the lower bound for repeat statistics of a wide range of sequenced genomes, including the GAGE datasets. The results are based on a set of necessary and sufficient conditions on the DNA sequence and the reads for reconstruction. The conditions can be viewed as the shotgun sequencing analogue of Ukkonen-Pevzner's necessary and sufficient conditions for Sequencing by Hybridization. Comment: 26 pages, 18 figures

arXiv.org e-Print Archive

PubMed Central

eScholarship - University of California

Four payment models for the multi-mode resource constrained project scheduling problem with discounted cash flows

Author: Şahin Şule
Sivrikaya-Şerifoğlu Funda
Ulusoy Gündüz
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2001

In this paper, the multi-mode resource constrained project scheduling problem with discounted cash flows is considered. The objective is the maximization of the net present value of all cash flows. Time value of money is taken into consideration, and cash in- and outflows are associated with activities and/or events. The resources can be of renewable, nonrenewable, and doubly constrained resource types. Four payment models are considered: lump sum payment at the terminal event, payments at prespecified event nodes, payments at prespecified time points, and progress payments. To find solutions to the proposed problems, a genetic algorithm (GA) approach is employed, which uses a special crossover operator that can exploit the multi-component nature of the problem. The models are investigated by means of an example problem. Sensitivity analyses are performed over the markup and the discount rate. A set of 93 problems from the literature is solved under the four different payment models and resource type combinations with the GA approach, resulting in satisfactory computation times. The GA approach is compared with a domain-specific heuristic for the lump sum payment case with renewable resources and is shown to outperform it.
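
The net present value objective mentioned in the abstract above can be illustrated with a minimal sketch. It assumes discrete per-period compounding and a made-up three-event cash flow; the function name `net_present_value`, the rate, and the amounts are hypothetical and are not taken from the paper, which may use a different discounting convention.

```python
from typing import Iterable, Tuple


def net_present_value(cash_flows: Iterable[Tuple[float, float]], rate: float) -> float:
    """Sum of cash flows discounted back to time 0.

    cash_flows: (time, amount) pairs; negative amounts are outflows.
    rate: discount rate per period (assumed discrete compounding).
    """
    return sum(amount / (1.0 + rate) ** t for t, amount in cash_flows)


# Hypothetical project: one outflow of 100 at time 0, total payment of 120.
rate = 0.10

# Payment spread over two events (loosely, a progress-payment style schedule).
spread = [(0, -100.0), (1, 60.0), (2, 60.0)]

# Same total paid as a single lump sum at the terminal event (time 2).
lump_sum = [(0, -100.0), (2, 120.0)]

npv_spread = net_present_value(spread, rate)
npv_lump = net_present_value(lump_sum, rate)

# Earlier receipts are discounted less, so the spread schedule is worth more.
print(f"spread payments NPV: {npv_spread:.4f}")
print(f"lump sum NPV:        {npv_lump:.4f}")
```

Under these assumptions the same total payment yields a higher net present value when received earlier, which is why the choice of payment model changes the value of a given schedule.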

Sabanci University Research Database

NEH-based heuristics for the permutation flowshop scheduling problem to minimize total tardiness

Author: Fernández-Viagas Escudero Víctor
Framiñán Torres José Manuel
Publication venue: 'Elsevier BV'
Publication date: 01/01/2015

Since Johnson's seminal paper in 1954, scheduling jobs in a permutation flowshop has been receiving the attention of hundreds of practitioners and researchers, being one of the most studied topics in the Operations Research literature. Among the different objectives that can be considered, minimising the total tardiness (i.e. the sum of the surplus of the completion time of each job over its due date) is regarded as a key objective for manufacturing companies, as it entails the fulfilment of the due dates committed to customers. Since this problem is known to be NP-hard, most research has focused on proposing approximate procedures to solve it in reasonable computation times. Particularly, several constructive heuristics have been proposed, with NEHedd being the most efficient one, serving also to provide an initial solution for more elaborate approximate procedures. In this paper, we first analyse in detail the decision problem depending on the generation of the due dates of the jobs, and discuss the similarities with different related decision problems. In addition, for the most characteristic tardiness scenario, the analysis shows that a huge number of ties appear during the construction of the solutions done by the NEHedd heuristic, and that wisely breaking the ties greatly influences the quality of the final solution. Since no tie-breaking mechanism has been designed for this heuristic up to now, we propose several mechanisms that are exhaustively tested. The results show that some of them outperform the original NEHedd by about 25% while keeping the same computational requirements. Funding: Ministerio de Ciencia e Innovación DPI2010-15573/DPI; Ministerio de Ciencia e Innovación DPI2013-44461-P/DP
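
The total tardiness objective defined in the abstract above can be sketched as follows. This is an illustrative evaluation of a given job permutation only, not the NEHedd heuristic or any tie-breaking mechanism from the paper; the function names and the small instance (three jobs, two machines) are hypothetical.

```python
from typing import Dict, Sequence


def completion_times(perm: Sequence[int], proc: Sequence[Sequence[int]]) -> Dict[int, int]:
    """Completion time of each job on the last machine of a permutation flowshop.

    proc[j][m] is the processing time of job j on machine m; every job
    visits the machines in the same order and in the same job sequence.
    """
    n_machines = len(proc[0])
    prev = [0] * n_machines  # completion times of the previous job per machine
    done = {}
    for j in perm:
        cur = [0] * n_machines
        for m in range(n_machines):
            # A job starts on machine m once the machine is free and the
            # job has finished on machine m - 1.
            ready = cur[m - 1] if m > 0 else 0
            cur[m] = max(prev[m], ready) + proc[j][m]
        prev = cur
        done[j] = cur[-1]
    return done


def total_tardiness(perm: Sequence[int], proc: Sequence[Sequence[int]], due: Sequence[int]) -> int:
    """Sum over jobs of max(0, completion time - due date)."""
    done = completion_times(perm, proc)
    return sum(max(0, done[j] - due[j]) for j in perm)


# Hypothetical instance: 3 jobs, 2 machines.
proc = [[3, 2], [1, 4], [2, 1]]
due = [5, 6, 7]

print(total_tardiness([0, 1, 2], proc, due))  # 6
print(total_tardiness([1, 0, 2], proc, due))  # 3
```

Even on this tiny instance, two orderings of the same jobs give different total tardiness, which is the quantity a constructive heuristic tries to keep small as it builds the sequence.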

idUS. Depósito de Investigación Universidad de Sevilla

TCR Convergence in Individuals Treated With Immune Checkpoint Inhibition for Cancer.

Author: Conroy Jeffrey
Fong Lawrence
Looney Timothy John
Lowman Geoffrey
Morrison Carl
Oh David
Topacio-Hall Denise
Zhang Li
Publication venue: eScholarship, University of California
Publication date: 01/01/2019

Tumor antigen-driven selection may expand T cells having T cell receptors (TCRs) of shared antigen specificity but different amino acid or nucleotide sequence in a process known as TCR convergence. Substitution sequencing errors introduced by TCRβ (TCRB) repertoire sequencing may create artifacts resembling TCR convergence. Given the anticipated differences in substitution error rates across different next-generation sequencing platforms, the choice of platform could be consequential. To test this, we performed TCRB sequencing on the same peripheral blood mononuclear cells (PBMC) from individuals with cancer receiving anti-CTLA-4 or anti-PD-1 using an Illumina-based approach (Sequenta) and an Ion Torrent-based approach (Oncomine TCRB-LR). While both approaches found similar TCR diversity, clonality, and clonal overlap, we found that Illumina-based sequencing resulted in higher TCR convergence than the Ion Torrent approach. To build upon this initial observation, we conducted a systematic comparison of Illumina-based TCRB sequencing assays, including those employing molecular barcodes, with the Oncomine assay, revealing differences in the frequency of convergent events, purportedly artifactual rearrangements, and sensitivity of detection. Finally, we applied the Ion Torrent-based approach to evaluate clonality and convergence in a cohort of individuals receiving anti-CTLA-4 blockade for cancer. We found that clonality and convergence independently predicted response and could be combined to improve the accuracy of a logistic regression classifier. These results demonstrate the importance of the sequencing platform in assessing TCRB convergence.

eScholarship - University of California