Restricted Dynamic Programming Heuristic for Precedence Constrained Bottleneck Generalized TSP
We develop a restricted dynamic programming heuristic for a complicated traveling salesman problem: a) cities are grouped into clusters, resp. Generalized TSP; b) precedence constraints are imposed on the order of visiting the clusters, resp. Precedence Constrained TSP; c) the costs of moving to the next cluster and doing the required job inside one are aggregated in a minimax manner, resp. Bottleneck TSP; d) all the costs may depend on the sequence of previously visited clusters, resp. Sequence-Dependent TSP or Time-Dependent TSP. Such a multiplicity of constraints complicates the use of mixed integer linear programming, while dynamic programming (DP) benefits from them; the latter may be supplemented with a branch-and-bound strategy, which necessitates a "DP-compliant" heuristic. The proposed heuristic always yields a feasible solution, which is not always the case with heuristics, and its precision may be tuned until it becomes the exact DP.
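The restricted-DP idea described in this abstract can be sketched as a layered search that keeps only a bounded number of the most promising states per layer. The toy instance, state encoding, and beam width below are illustrative assumptions, not the authors' implementation; job costs inside clusters are omitted for brevity, keeping only move costs under the bottleneck (minimax) objective.

```python
def restricted_dp(clusters, cost, precedes, width):
    """Restricted DP heuristic for a precedence-constrained bottleneck
    generalized TSP (illustrative sketch).
    clusters: list of city lists; cost[(u, v)]: move cost between cities;
    precedes: set of (a, b) meaning cluster a must precede cluster b;
    width: states kept per layer (keeping all states recovers the exact DP)."""
    n = len(clusters)

    def feasible(visited, c):
        # cluster c may be entered only after all of its predecessors
        return all(a in visited for (a, b) in precedes if b == c)

    # DP state: (frozenset of visited clusters, last city) -> (bottleneck, route)
    layer = {(frozenset([c]), city): (0, [city])
             for c in range(n) if feasible(frozenset(), c)
             for city in clusters[c]}
    for _ in range(n - 1):
        nxt = {}
        for (visited, last), (bn, route) in layer.items():
            for c in range(n):
                if c in visited or not feasible(visited, c):
                    continue
                for city in clusters[c]:
                    cand = (max(bn, cost[(last, city)]), route + [city])
                    key = (visited | {c}, city)
                    if key not in nxt or cand[0] < nxt[key][0]:
                        nxt[key] = cand
        # restriction step: keep only the `width` lowest-bottleneck states
        layer = dict(sorted(nxt.items(), key=lambda kv: kv[1][0])[:width])
    return min(layer.values())

# Toy instance: three singleton clusters, cluster 0 must precede cluster 1.
clusters = [["a"], ["b"], ["c"]]
cost = {("a", "b"): 2, ("b", "a"): 2, ("a", "c"): 5,
        ("c", "a"): 5, ("b", "c"): 1, ("c", "b"): 1}
precedes = {(0, 1)}
best = restricted_dp(clusters, cost, precedes, width=4)
print(best)  # (bottleneck cost, city route)
```

Because every surviving state is a feasible partial tour, the heuristic always terminates with a feasible solution, and widening the beam trades running time for precision, as the abstract notes.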
Scientific Computing Meets Big Data Technology: An Astronomy Use Case
Scientific analyses commonly compose multiple single-process programs into a
dataflow. An end-to-end dataflow of single-process programs is known as a
many-task application. Typically, tools from the HPC software stack are used to
parallelize these analyses. In this work, we investigate an alternate approach
that uses Apache Spark -- a modern big data platform -- to parallelize
many-task applications. We present Kira, a flexible and distributed astronomy
image processing toolkit using Apache Spark. We then use the Kira toolkit to
implement a Source Extractor application for astronomy images, called Kira SE.
With Kira SE as the use case, we study the programming flexibility, dataflow
richness, scheduling capacity and performance of Apache Spark running on the
EC2 cloud. By exploiting data locality, Kira SE achieves a 2.5x speedup over an
equivalent C program when analyzing a 1TB dataset using 512 cores on the Amazon
EC2 cloud. Furthermore, we show that by leveraging software originally designed
for big data infrastructure, Kira SE achieves competitive performance to the C
implementation running on the NERSC Edison supercomputer. Our experience with
Kira indicates that emerging Big Data platforms such as Apache Spark are a
performant alternative for many-task scientific applications.
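The many-task pattern Kira exploits is one independent "source extraction" task per input image, executed by a data-parallel map. Kira implements that map with Apache Spark RDDs across a cluster; in this self-contained sketch a thread pool stands in for Spark, and the toy extractor and image records are illustrative assumptions.

```python
from concurrent.futures import ThreadPoolExecutor

def extract_sources(image):
    """Toy stand-in for a source-extraction task: report pixel values above
    a threshold as detected 'sources'."""
    name, pixels = image
    return name, [p for p in pixels if p > 100]

# Illustrative inputs: (file name, flattened pixel values)
images = [("img0.fits", [3, 250, 17]),
          ("img1.fits", [120, 5]),
          ("img2.fits", [9])]

# Data-parallel map over independent tasks; with Spark this would be
# roughly sc.parallelize(images).map(extract_sources).collect()
with ThreadPoolExecutor(max_workers=4) as pool:
    catalog = dict(pool.map(extract_sources, images))
print(catalog)
```

Because the tasks share no state, the same map scales from a local pool to a Spark cluster, which is what lets a big data platform schedule a many-task application.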
Many-Task Computing and Blue Waters
This report discusses many-task computing (MTC) generically and in the
context of the proposed Blue Waters systems, which is planned to be the largest
NSF-funded supercomputer when it begins production use in 2012. The aim of this
report is to inform the BW project about MTC, including understanding aspects
of MTC applications that can be used to characterize the domain and
understanding the implications of these aspects to middleware and policies.
Many MTC applications do not neatly fit the stereotypes of high-performance
computing (HPC) or high-throughput computing (HTC) applications. Like HTC
applications, by definition MTC applications are structured as graphs of
discrete tasks, with explicit input and output dependencies forming the graph
edges. However, MTC applications have significant features that distinguish
them from typical HTC applications. In particular, different engineering
constraints for hardware and software must be met in order to support these
applications. HTC applications have traditionally run on platforms such as
grids and clusters, through either workflow systems or parallel programming
systems. MTC applications, in contrast, will often demand a short time to
solution, may be communication intensive or data intensive, and may comprise
very short tasks. Therefore, hardware and software for MTC must be engineered
to support the additional communication and I/O and must minimize task dispatch
overheads. The hardware of large-scale HPC systems, with its high degree of
parallelism and support for intensive communication, is well suited for MTC
applications. However, HPC systems often lack a dynamic resource-provisioning
feature, are not ideal for task communication via the file system, and have an
I/O system that is not optimized for MTC-style applications. Hence, additional
software support is likely to be required to gain full benefit from the HPC
hardware.
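The structure the report attributes to MTC applications, a graph of discrete tasks whose edges are explicit input/output dependencies, can be sketched with a topological scheduler: a task is dispatched only once everything it consumes has been produced. The task names and dict-based graph below are illustrative assumptions.

```python
from graphlib import TopologicalSorter

# deps[t] = set of tasks whose outputs t consumes (the graph edges)
deps = {
    "preprocess": set(),
    "simulate_a": {"preprocess"},
    "simulate_b": {"preprocess"},
    "reduce":     {"simulate_a", "simulate_b"},
}

def run(task):
    pass  # toy task body; a real MTC task is a discrete program invocation

ts = TopologicalSorter(deps)
ts.prepare()
order = []
while ts.is_active():
    ready = ts.get_ready()   # tasks whose dependencies are all satisfied
    for task in ready:       # an MTC scheduler would dispatch these in parallel
        run(task)
        order.append(task)
        ts.done(task)
print(order)
```

The `ready` batches are where MTC differs from classic HTC in practice: with many very short tasks per batch, dispatch overhead and I/O dominate, which is the engineering constraint the report highlights.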
PseudoFuN: Deriving functional potentials of pseudogenes from integrative relationships with genes and microRNAs across 32 cancers
BACKGROUND:
Long thought to be "relics" of evolution, pseudogenes have only recently attracted medical interest for their regulatory roles in cancer. Often, these regulatory roles are a direct by-product of their close sequence homology to protein-coding genes. Novel pseudogene-gene (PGG) functional associations can be identified through the integration of biomedical data, such as sequence homology, functional pathways, gene expression, pseudogene expression, and microRNA expression. However, not all of this information has been integrated, and almost all previous pseudogene studies relied on 1:1 pseudogene-parent gene relationships without leveraging other homologous genes/pseudogenes.
RESULTS:
We produce PGG families that expand beyond the current 1:1 paradigm. First, we construct expansive PGG databases by (i) CUDAlign graphics processing unit (GPU) accelerated local alignment of all pseudogenes to gene families (totaling 1.6 billion individual local alignments and >40,000 GPU hours) and (ii) BLAST-based assignment of pseudogenes to gene families. Second, we create an open-source web application (PseudoFuN [Pseudogene Functional Networks]) to search for integrative functional relationships of sequence homology, microRNA expression, gene expression, pseudogene expression, and gene ontology. We produce four "flavors" of CUDAlign-based databases (>462,000,000 PGG pairwise alignments and 133,770 PGG families) that can be queried and downloaded using PseudoFuN. These databases are consistent with previous 1:1 PGG annotations and are also much more powerful, including millions of de novo PGG associations. For example, we find multiple known (e.g., miR-20a-PTEN-PTENP1) and novel (e.g., miR-375-SOX15-PPP4R1L) microRNA-gene-pseudogene associations in prostate cancer. PseudoFuN provides a "one stop shop" for identifying and visualizing thousands of potential regulatory relationships related to pseudogenes in The Cancer Genome Atlas cancers.
CONCLUSIONS:
Thousands of new PGG associations can be explored in the context of microRNA-gene-pseudogene co-expression and differential expression with a simple-to-use online tool by bioinformaticians and oncologists alike.
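The family-assignment step described in this abstract can be sketched as scoring each pseudogene against a representative sequence of each gene family with local alignment and assigning it to the best-scoring family. A tiny score-only Smith-Waterman routine stands in for CUDAlign/BLAST below; the sequences are invented for illustration (only the gene/pseudogene names come from the abstract).

```python
def sw_score(a, b, match=2, mismatch=-1, gap=-2):
    """Best local alignment score (Smith-Waterman, linear-space, score only)."""
    prev = [0] * (len(b) + 1)
    best = 0
    for ca in a:
        cur = [0]
        for j, cb in enumerate(b, 1):
            s = max(0,
                    prev[j - 1] + (match if ca == cb else mismatch),
                    prev[j] + gap,       # gap in b
                    cur[j - 1] + gap)    # gap in a
            cur.append(s)
            best = max(best, s)
        prev = cur
    return best

# Illustrative (made-up) representative sequences per gene family
families = {"PTEN": "ACGTGCA", "SOX15": "TTGACGG"}
pseudogenes = {"PTENP1": "ACGTGTA", "PPP4R1L": "TTGAAGG"}

# Assign each pseudogene to its highest-scoring family
assignments = {pg: max(families, key=lambda fam: sw_score(seq, families[fam]))
               for pg, seq in pseudogenes.items()}
print(assignments)
```

Scaling this all-against-all scoring to 1.6 billion alignments is what motivates the GPU acceleration the abstract reports.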
Automated data processing architecture for the Gemini Planet Imager Exoplanet Survey
The Gemini Planet Imager Exoplanet Survey (GPIES) is a multi-year direct
imaging survey of 600 stars to discover and characterize young Jovian
exoplanets and their environments. We have developed an automated data
architecture to process and index all data related to the survey uniformly. An
automated and flexible data processing framework, which we term the Data
Cruncher, combines multiple data reduction pipelines together to process all
spectroscopic, polarimetric, and calibration data taken with GPIES. With no
human intervention, fully reduced and calibrated data products are available
less than an hour after the data are taken to expedite follow-up on potential
objects of interest. The Data Cruncher can run on a supercomputer to reprocess
all GPIES data in a single day as improvements are made to our data reduction
pipelines. A backend MySQL database indexes all files, which are synced to the
cloud, and a front-end web server allows for easy browsing of all files
associated with GPIES. To help observers, quicklook displays show reduced data
as they are processed in real-time, and chatbots on Slack post observing
information as well as reduced data products. Together, the GPIES automated
data processing architecture reduces our workload, provides real-time data
reduction, optimizes our observing strategy, and maintains a homogeneously
reduced dataset to study planet occurrence and instrument performance.
Comment: 21 pages, 3 figures, accepted in JATI
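The automated loop this abstract describes, raw file in, chained reduction steps, database indexing, and a notification out, can be sketched minimally as follows. Here sqlite3 stands in for the MySQL backend and print() for the Slack chatbot and quicklook display; the file names and reduction steps are illustrative assumptions, not the actual GPIES pipelines.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE products (raw TEXT, product TEXT, status TEXT)")

def dark_subtract(name):
    return name + ".darksub"

def extract_cube(name):
    return name + ".cube"

PIPELINE = [dark_subtract, extract_cube]  # reduction steps, applied in order

def crunch(raw):
    """Run one raw file through the reduction chain, index the product,
    and notify observers (the Data Cruncher's role, per the abstract)."""
    product = raw
    for step in PIPELINE:
        product = step(product)
    db.execute("INSERT INTO products VALUES (?, ?, ?)",
               (raw, product, "reduced"))
    print(f"[quicklook] {raw} -> {product}")  # chatbot/quicklook stand-in

for raw in ["raw0001.fits", "raw0002.fits"]:
    crunch(raw)

rows = db.execute("SELECT raw, product FROM products").fetchall()
print(rows)
```

Swapping `PIPELINE` for improved reduction code and replaying every indexed raw file is the reprocessing-in-a-day workflow the abstract describes, just distributed across a supercomputer.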