Extending Science Gateway Frameworks to Support Big Data Applications in the Cloud
Cloud computing offers the massive scalability and elasticity required by many scientific and commercial applications. Combining the computational and data handling capabilities of clouds with parallel processing also has the potential to tackle Big Data problems efficiently. Science gateway frameworks and workflow systems enable application developers to implement complex applications and make these available for end-users via simple graphical user interfaces. The integration of such frameworks with Big Data processing tools on the cloud opens new opportunities for application developers. This paper investigates how workflow systems and science gateways can be extended with Big Data processing capabilities. A generic approach based on infrastructure-aware workflows is suggested and a proof of concept is implemented based on the WS-PGRADE/gUSE science gateway framework and its integration with the Hadoop parallel data processing solution based on the MapReduce paradigm in the cloud. The provided analysis demonstrates that the methods described to integrate Big Data processing with workflows and science gateways work well in different cloud infrastructures and application scenarios, and can be used to create massively parallel applications for scientific analysis of Big Data.
Three geographically separate domestications of Asian rice
Domesticated rice (Oryza sativa L.) accompanied the dawn of Asian civilization(1) and has become one of the world's staple crops. From archaeological and genetic evidence, various contradictory scenarios for the origin of different varieties of cultivated rice have been proposed, the most recent based on a single domestication(2,3). By examining the footprints of selection in the genomes of different cultivated rice types, we show that there were three independent domestications in different parts of Asia. We identify wild populations in southern China and the Yangtze valley as the source of the japonica gene pool, and populations in Indochina and the Brahmaputra valley as the source of the indica gene pool. We reveal a hitherto unrecognized origin for the aus variety in central India or Bangladesh. We also conclude that aromatic rice is a result of a hybridization between japonica and aus, and that the tropical and temperate versions of japonica are later adaptations of one crop. Our conclusions are in accord with archaeological evidence that suggests widespread origins of rice cultivation(1,4). We therefore anticipate that our results will stimulate a more productive collaboration between genetic and archaeological studies of rice domestication, and guide utilization of genetic resources in breeding programmes aimed at crop improvement. European Research Council [339941]
GENIE: a software package for gene-gene interaction analysis in genetic association studies using multiple GPU or CPU cores
Background: Gene-gene interaction in genetic association studies is computationally intensive when a large number of SNPs are involved. Most of the latest Central Processing Units (CPUs) have multiple cores, whereas Graphics Processing Units (GPUs) also have hundreds of cores and have been recently used to implement faster scientific software. However, currently there are no genetic analysis software packages that allow users to fully utilize the computing power of these multi-core devices for genetic interaction analysis for binary traits. Findings: Here we present a novel software package GENIE, which utilizes the power of multiple GPU or CPU processor cores to parallelize the interaction analysis. GENIE reads an entire genetic association study dataset into memory and partitions the dataset into fragments with non-overlapping sets of SNPs. For each fragment, GENIE analyzes: 1) the interaction of SNPs within it in parallel, and 2) the interaction between the SNPs of the current fragment and other fragments in parallel. We tested GENIE on a large-scale candidate gene study on high-density lipoprotein cholesterol. Using an NVIDIA Tesla C1060 graphics card, the GPU mode of GENIE achieves a speedup of 27 times over its single-core CPU mode run. Conclusions: GENIE is open-source, economical, user-friendly, and scalable. Since the computing power and memory capacity of graphics cards are increasing rapidly while their cost is going down, we anticipate that GENIE will achieve greater speedups with faster GPU cards. Documentation, source code, and precompiled binaries can be downloaded from http://www.cceb.upenn.edu/~mli/software/GENIE/.
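The fragment scheme described in this abstract can be illustrated with a small serial sketch. This is not GENIE's code: the function names `partition` and `pair_tasks`, the fragment size, and the choice to pair each fragment only with later fragments (so that no SNP pair is visited twice) are assumptions made for illustration. In a parallel setting, each group of pairs would be handed to a separate core or GPU kernel.

```python
from itertools import combinations, product


def partition(n_snps, fragment_size):
    """Split SNP indices 0..n_snps-1 into consecutive, non-overlapping fragments."""
    return [list(range(start, min(start + fragment_size, n_snps)))
            for start in range(0, n_snps, fragment_size)]


def pair_tasks(fragments):
    """Yield every SNP pair exactly once, fragment by fragment."""
    for i, frag in enumerate(fragments):
        # 1) pairs of SNPs inside the current fragment
        yield from combinations(frag, 2)
        # 2) pairs between the current fragment and each later fragment
        #    (assumption: pairing with later fragments only avoids duplicates)
        for other in fragments[i + 1:]:
            yield from product(frag, other)


fragments = partition(10, 4)          # hypothetical: 10 SNPs, fragments of 4
pairs = list(pair_tasks(fragments))
```

With 10 SNPs the sketch produces all 10 × 9 / 2 = 45 unordered pairs, which is the property that lets the work be split across fragments without losing or repeating any interaction test.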
MrsRF: an efficient MapReduce algorithm for analyzing large collections of evolutionary trees
Background: MapReduce is a parallel framework that has been used effectively to design large-scale parallel applications for large computing clusters. In this paper, we evaluate the viability of the MapReduce framework for designing phylogenetic applications. The problem of interest is generating the all-to-all Robinson-Foulds distance matrix, which has many applications for visualizing and clustering large collections of evolutionary trees. We introduce MrsRF (MapReduce Speeds up RF), a multi-core algorithm to generate a t × t Robinson-Foulds distance matrix between t trees using the MapReduce paradigm. Results: We studied the performance of our MrsRF algorithm on two large biological tree sets consisting of 20,000 trees of 150 taxa each and 33,306 trees of 567 taxa each. Our experiments show that MrsRF is a scalable approach reaching a speedup of over 18 on 32 total cores. Our results also show that achieving top speedup on a multi-core cluster requires different cluster configurations. Finally, we show how to use an RF matrix to summarize collections of phylogenetic trees visually. Conclusion: Our results show that MapReduce is a promising paradigm for developing multi-core phylogenetic applications. The results also demonstrate that different multi-core configurations must be tested in order to obtain optimum performance. We conclude that RF matrices play a critical role in developing techniques to summarize large collections of trees.
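For readers unfamiliar with the t × t Robinson-Foulds matrix mentioned in this abstract, the following is a minimal serial sketch, not the MrsRF algorithm itself and with no MapReduce parallelism. It assumes trees are given as nested tuples of taxon names over the same taxa, and uses the common definition of the RF distance as the number of non-trivial bipartitions found in one tree but not the other. The names `bipartitions` and `rf_matrix` and the three example trees are hypothetical.

```python
def bipartitions(tree, taxa):
    """Return the non-trivial bipartitions (splits) of a nested-tuple tree."""
    taxa = frozenset(taxa)
    ref = min(taxa)          # reference taxon used to pick a canonical side
    splits = set()

    def leaves(node):
        if isinstance(node, str):
            return frozenset([node])
        below = frozenset().union(*(leaves(child) for child in node))
        # Store the side of the split that does not contain the reference taxon
        side = below if ref not in below else taxa - below
        if 1 < len(side) < len(taxa) - 1:   # skip trivial splits
            splits.add(side)
        return below

    leaves(tree)
    return splits


def rf_matrix(trees, taxa):
    """Compute the all-to-all RF distance matrix for a list of trees."""
    splits = [bipartitions(t, taxa) for t in trees]
    return [[len(a ^ b) for b in splits] for a in splits]


taxa = {"A", "B", "C", "D", "E"}
trees = [
    (("A", "B"), ("C", ("D", "E"))),
    (("A", "C"), ("B", ("D", "E"))),
    (("A", "D"), ("B", ("C", "E"))),
]
matrix = rf_matrix(trees, taxa)
```

The matrix is symmetric with a zero diagonal, so only t(t − 1)/2 entries need computing. Because each entry depends on just two trees, the entries can be computed independently of one another, which is what makes the problem a natural fit for a parallel framework.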
Region of hadron-quark mixed phase in hybrid stars
Hadron--quark mixed phase is expected in a wide region of the inner structure
of hybrid stars. However, we show that the hadron--quark mixed phase should be
restricted to a narrower region because of the charge screening effect. The
narrow region of the mixed phase seems to explain physical phenomena of neutron
stars such as the strong magnetic field and glitch phenomena, and it would give
a new cooling curve for the neutron star. Comment: to be published in Physical Review
Real-time digital pathogen surveillance - the time is now
It is time to shake up public health surveillance. New technologies for sequencing, aided by friction-free approaches to data sharing, could have an impact on public health efforts.
CloVR: A virtual machine for automated and portable sequence analysis from the desktop using cloud computing
Next-generation sequencing technologies have decentralized sequence acquisition, increasing the demand for new bioinformatics tools that are easy to use, portable across multiple platforms, and scalable for high-throughput applications. Cloud computing platforms provide on-demand access to computing infrastructure over the Internet and can be used in combination with custom-built virtual machines to distribute pre-packaged, pre-configured software. We describe the Cloud Virtual Resource, CloVR, a new desktop application for push-button automated sequence analysis that can utilize cloud computing resources. CloVR is implemented as a single portable virtual machine (VM) that provides several automated analysis pipelines for microbial genomics, including 16S, whole genome and metagenome sequence analysis. The CloVR VM runs on a personal computer, utilizes local computer resources and requires minimal installation, addressing key challenges in deploying bioinformatics workflows. In addition, CloVR supports use of remote cloud computing resources to improve performance for large-scale sequence processing. In a case study, we demonstrate the use of CloVR to automatically process next-generation sequencing data on multiple cloud computing platforms. The CloVR VM and associated architecture lower the barrier of entry for utilizing complex analysis protocols on both local single- and multi-core computers and cloud systems for high-throughput data processing. https://doi.org/10.1186/1471-2105-12-35
Re-Assembly of the Genome of Francisella tularensis Subsp. holarctica OSU18
Francisella tularensis is a highly infectious human intracellular pathogen that is the causative agent of tularemia. It occurs in several major subtypes, including the live vaccine strain holarctica (type B). F. tularensis is classified as a category A biodefense agent in part because a relatively small number of organisms can cause severe illness. Three complete genomes of subspecies holarctica have been sequenced and deposited in public archives, of which OSU18 was the first and the only strain for which a scientific publication has appeared [1]. We re-assembled the OSU18 strain using both de novo and comparative assembly techniques, and found that the published sequence has two large inversion mis-assemblies. We generated a corrected assembly of the entire genome along with detailed information on the placement of individual reads within the assembly. This assembly will provide a more accurate basis for future comparative studies of this pathogen.