480 research outputs found

Metassembler: merging and optimizing de novo genome assemblies

Author: A Rahman
Alejandro Hernandez Wences
AV Zimin
B Langmead
D Earl
D Marbach
ER Mardis
G Parra
J Nijkamp
KR Bradnam
LM Soto-Jimenez
M Hunt
M Pop
MC Schatz
Michael C. Schatz
R Vicedomini
RJ Roberts
S Kurtz
SL Salzberg
Publication venue: 'Springer Science and Business Media LLC'

Extending Science Gateway Frameworks to Support Big Data Applications in the Cloud

Author: B Ludascher
Carlos Blanco
D Churches
Gabor Terstyanszky
J Dean
L Li
MC Schatz
P Kacsuk
Shashank Gugnani
T Oinn
Tamas Kiss
X Fei
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2016

Cloud computing offers massive scalability and elasticity required by many scientific and commercial applications. Combining the computational and data handling capabilities of clouds with parallel processing also has the potential to tackle Big Data problems efficiently. Science gateway frameworks and workflow systems enable application developers to implement complex applications and make these available for end-users via simple graphical user interfaces. The integration of such frameworks with Big Data processing tools on the cloud opens new opportunities for application developers. This paper investigates how workflow systems and science gateways can be extended with Big Data processing capabilities. A generic approach based on infrastructure-aware workflows is suggested and a proof of concept is implemented based on the WS-PGRADE/gUSE science gateway framework and its integration with the Hadoop parallel data processing solution based on the MapReduce paradigm in the cloud. The provided analysis demonstrates that the methods described to integrate Big Data processing with workflows and science gateways work well in different cloud infrastructures and application scenarios, and can be used to create massively parallel applications for scientific analysis of Big Data.

Crossref

UCrea

Springer - Publisher Connector

WestminsterResearch

CloudMan as a platform for tool, data, and analysis distribution

Author: B Langmead
Brad Chapman
E Afgan
Enis Afgan
J Goecks
James Taylor
M DePristo
MC Schatz
Publication venue: 'Springer Science and Business Media LLC'

Crossref

Three geographically separate domestications of Asian rice

Author: AJ Garris
B-R Lu
BL Gross
C Li
C Vitte
C-C Yang
DQ Fuller
H-I Oka
J Ma
J Molina
J Yu
JP Londo
K Zhao
L Tan
L-Z Gao
MC Schatz
N Patterson
Q Zhu
RG Allaby
S Hutter
SA Goff
WP Maddison
X Huang
Z He
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/11/2015

Domesticated rice (Oryza sativa L.) accompanied the dawn of Asian civilization(1) and has become one of the world's staple crops. From archaeological and genetic evidence, various contradictory scenarios for the origin of different varieties of cultivated rice have been proposed, the most recent based on a single domestication(2,3). By examining the footprints of selection in the genomes of different cultivated rice types, we show that there were three independent domestications in different parts of Asia. We identify wild populations in southern China and the Yangtze valley as the source of the japonica gene pool, and populations in Indochina and the Brahmaputra valley as the source of the indica gene pool. We reveal a hitherto unrecognized origin for the aus variety in central India or Bangladesh. We also conclude that aromatic rice is a result of a hybridization between japonica and aus, and that the tropical and temperate versions of japonica are later adaptations of one crop. Our conclusions are in accord with archaeological evidence that suggests widespread origins of rice cultivation(1,4). We therefore anticipate that our results will stimulate a more productive collaboration between genetic and archaeological studies of rice domestication, and guide utilization of genetic resources in breeding programmes aimed at crop improvement. Funding: European Research Council [339941].

Crossref

PubMed Central

Sapientia

The University of Manchester - Institutional Repository

FigShare

GENIE: a software package for gene-gene interaction analysis in genetic association studies using multiple GPU or CPU cores

Author: BJ Keating
H Zhou
HJ Cordell
J He
J Marchini
JE Stone
Kai Wang
L Dematte
MC Schatz
Mingyao Li
NA Davis
S Purcell
Satish Chikkagoudar
T Schupbach
VW Lee
Publication venue: BioMed Central
Publication date: 01/05/2011

Background: Gene-gene interaction analysis in genetic association studies is computationally intensive when a large number of SNPs are involved. Most of the latest Central Processing Units (CPUs) have multiple cores, whereas Graphics Processing Units (GPUs) have hundreds of cores and have recently been used to implement faster scientific software. However, there are currently no genetic analysis software packages that allow users to fully utilize the computing power of these multi-core devices for genetic interaction analysis for binary traits. Findings: Here we present a novel software package, GENIE, which utilizes the power of multiple GPU or CPU processor cores to parallelize the interaction analysis. GENIE reads an entire genetic association study dataset into memory and partitions the dataset into fragments with non-overlapping sets of SNPs. For each fragment, GENIE analyzes: 1) the interaction of SNPs within it in parallel, and 2) the interaction between the SNPs of the current fragment and other fragments in parallel. We tested GENIE on a large-scale candidate gene study on high-density lipoprotein cholesterol. Using an NVIDIA Tesla C1060 graphics card, the GPU mode of GENIE achieves a speedup of 27 times over its single-core CPU mode run. Conclusions: GENIE is open-source, economical, user-friendly, and scalable. Since the computing power and memory capacity of graphics cards are increasing rapidly while their cost is going down, we anticipate that GENIE will achieve greater speedups with faster GPU cards. Documentation, source code, and precompiled binaries can be downloaded from http://www.cceb.upenn.edu/~mli/software/GENIE/.
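
The fragment scheme described in the abstract can be illustrated with a small sketch. This is not GENIE's code: the function names `partition_snps` and `pair_tasks`, the fragment size, and the choice to pair each fragment only with later fragments (so that no SNP pair is visited twice) are assumptions made for illustration. No interaction statistic or GPU dispatch is shown; each yielded work unit is simply a list of SNP index pairs that could be handed to a parallel worker.

```python
from itertools import combinations, product


def partition_snps(n_snps, fragment_size):
    """Split SNP indices 0..n_snps-1 into consecutive, non-overlapping fragments."""
    return [
        list(range(start, min(start + fragment_size, n_snps)))
        for start in range(0, n_snps, fragment_size)
    ]


def pair_tasks(fragments):
    """Yield (kind, i, j, pairs) work units covering every SNP pair exactly once.

    Assumption: "other fragments" is read as "later fragments" to avoid
    analysing the same pair from both sides.
    """
    for i, frag in enumerate(fragments):
        # 1) interactions among SNPs inside fragment i
        yield ("within", i, i, list(combinations(frag, 2)))
        # 2) interactions between fragment i and each later fragment j
        for j in range(i + 1, len(fragments)):
            yield ("between", i, j, list(product(frag, fragments[j])))


# Hypothetical toy input: 10 SNPs split into fragments of at most 4.
fragments = partition_snps(n_snps=10, fragment_size=4)
all_pairs = [pair for _, _, _, pairs in pair_tasks(fragments) for pair in pairs]
```

Because the work units are disjoint, they can be evaluated independently, which is the property that makes this kind of partitioning suitable for multi-core execution.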

Crossref

Directory of Open Access Journals

PubMed Central

MrsRF: an efficient MapReduce algorithm for analyzing large collections of evolutionary trees

Author: C Ranger
C Stockham
DE Soltis
DF Robinson
DM Hillis
E Gabriel
J Dean
LA Lewis
MC Schatz
SJ Sul
Suzanne J Matthews
Tiffani L Williams
Publication venue: BioMed Central
Publication date: 01/01/2010

Background: MapReduce is a parallel framework that has been used effectively to design large-scale parallel applications for large computing clusters. In this paper, we evaluate the viability of the MapReduce framework for designing phylogenetic applications. The problem of interest is generating the all-to-all Robinson-Foulds distance matrix, which has many applications for visualizing and clustering large collections of evolutionary trees. We introduce MrsRF (MapReduce Speeds up RF), a multi-core algorithm to generate a t × t Robinson-Foulds distance matrix between t trees using the MapReduce paradigm. Results: We studied the performance of our MrsRF algorithm on two large biological tree sets consisting of 20,000 trees of 150 taxa each and 33,306 trees of 567 taxa each. Our experiments show that MrsRF is a scalable approach reaching a speedup of over 18 on 32 total cores. Our results also show that achieving top speedup on a multi-core cluster requires different cluster configurations. Finally, we show how to use an RF matrix to summarize collections of phylogenetic trees visually. Conclusion: Our results show that MapReduce is a promising paradigm for developing multi-core phylogenetic applications. The results also demonstrate that different multi-core configurations must be tested in order to obtain optimum performance. We conclude that RF matrices play a critical role in developing techniques to summarize large collections of trees.
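
For readers unfamiliar with the quantity being computed, the following is a minimal sequential sketch of a t × t Robinson-Foulds matrix. It is not the MrsRF algorithm and uses no MapReduce. It assumes each tree has already been reduced to its set of non-trivial bipartitions, with each bipartition encoded as the set of taxa on the side that does not contain a fixed reference taxon (here `A`), and it uses the unnormalized definition (size of the symmetric difference; some authors halve this value). The three 5-taxon trees are hypothetical.

```python
def rf_distance(splits_a, splits_b):
    """Unnormalized Robinson-Foulds distance: bipartitions found in exactly one tree."""
    return len(splits_a ^ splits_b)


def rf_matrix(trees):
    """Build the symmetric t x t matrix of pairwise RF distances."""
    t = len(trees)
    matrix = [[0] * t for _ in range(t)]
    for i in range(t):
        for j in range(i + 1, t):
            d = rf_distance(trees[i], trees[j])
            matrix[i][j] = matrix[j][i] = d
    return matrix


# Hypothetical unrooted trees on taxa A-E; each split lists the side without A.
tree1 = {frozenset("CDE"), frozenset("DE")}  # ((A,B),C,(D,E))
tree2 = {frozenset("BDE"), frozenset("DE")}  # ((A,C),B,(D,E))
tree3 = {frozenset("BCE"), frozenset("BC")}  # ((A,D),E,(B,C))

matrix = rf_matrix([tree1, tree2, tree3])
```

Each matrix entry depends only on two trees, so the t(t-1)/2 comparisons are independent of one another, which is what makes the problem amenable to a parallel formulation.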

Crossref

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

Texas A&M Repository

Region of hadron-quark mixed phase in hybrid stars

Author: A Muendel
A Winther
CA Bertulani
F Ajzenberg-Selove
F Hammache
G Baur
H Esbensen
H Schatz
H Utsunomiya
J Görres
JH Kelley
JJ Cowan
JR Oppenheimer
Jv Schwarzenberg
K Hencken
K Ieki
K-H Schmidt
KM Nollett
LD Landau
M Zinser
MC Abreu
N Iwasa
S Shimoura
S Typel
T Aumann
T Kobayashi
T Motobayashi
T Nakamura
T Rauscher
VS Melezhik
Publication date: 18/08/2000

A hadron-quark mixed phase is expected in a wide region of the inner structure of hybrid stars. However, we show that the hadron-quark mixed phase should be restricted to a narrower region because of the charge screening effect. The narrow region of the mixed phase seems to explain physical phenomena of neutron stars such as the strong magnetic field and glitch phenomena, and it would give a new cooling curve for the neutron star. Comment: to be published in Physical Review.

arXiv.org e-Print Archive

Crossref

Juelich Shared Electronic Resources

CERN Document Server

Real-time digital pathogen surveillance - the time is now

Author: A Marí Saéz
Andrew Rambaut
C Fraser
H Rohde
J Quick
Jennifer Gardy
MC Schatz
Nicholas J. Loman
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2015

It is time to shake up public health surveillance. New technologies for sequencing, aided by friction-free approaches to data sharing, could have an impact on public health efforts.

Crossref

Springer - Publisher Connector

University of Birmingham Research Portal

PubMed Central

Edinburgh Research Explorer

CloVR: A virtual machine for automated and portable sequence analysis from the desktop using cloud computing

Author: A Bateman
A Tridgell
Aaron Gussman
AC Stewart
AL Delcher
B Langmead
BE Suzek
C Hemmerich
C Rapier
Cesar Arze
D Field
D Hull
David R Riley
DL Wheeler
DR Zerbino
E Afgan
EE Schadt
F Meyer
J Dean
J Goecks
J Orvis
J White
James R White
JD Selengut
JG Caporaso
JP Mesirov
JR Cole
JR Miller
JR White
JT Dudley
K Galens
K Keahey
K Lagesen
Kevin Galens
LD Stein
M Reich
Mahesh Vangala
Malcolm Matalka
MC Schatz
O Trelles
Owen White
PD Schloss
RC Edgar
RK Aziz
RL Tatusov
S Angiuoli
Samuel V Angiuoli
SD Kahn
SF Altschul
SR Eddy
TM Lowe
W Florian Fricke
Publication venue: BioMed Central
Publication date: 01/01/2011

Next-generation sequencing technologies have decentralized sequence acquisition, increasing the demand for new bioinformatics tools that are easy to use, portable across multiple platforms, and scalable for high-throughput applications. Cloud computing platforms provide on-demand access to computing infrastructure over the Internet and can be used in combination with custom-built virtual machines to distribute pre-packaged, pre-configured software. We describe the Cloud Virtual Resource, CloVR, a new desktop application for push-button automated sequence analysis that can utilize cloud computing resources. CloVR is implemented as a single portable virtual machine (VM) that provides several automated analysis pipelines for microbial genomics, including 16S, whole genome and metagenome sequence analysis. The CloVR VM runs on a personal computer, utilizes local computer resources and requires minimal installation, addressing key challenges in deploying bioinformatics workflows. In addition, CloVR supports use of remote cloud computing resources to improve performance for large-scale sequence processing. In a case study, we demonstrate the use of CloVR to automatically process next-generation sequencing data on multiple cloud computing platforms. The CloVR VM and associated architecture lower the barrier of entry for utilizing complex analysis protocols on both local single- and multi-core computers and cloud systems for high-throughput data processing. https://doi.org/10.1186/1471-2105-12-35

Crossref

Springer - Publisher Connector

PubMed Central

Digital Repository at the University of Maryland

Re-Assembly of the Genome of Francisella tularensis Subsp. holarctica OSU18

Author: A Johansson
AM Phillippy
D Gordon
Daniela Puiu
DT Dennis
EW Myers
JF Petrosino
JR White
L Rohmer
M Enserink
M Pop
Matthew W. Hahn
MC Schatz
P Havlak
S Kurtz
SL Salzberg
Steven L. Salzberg
Publication venue: Public Library of Science
Publication date: 17/10/2008

Francisella tularensis is a highly infectious human intracellular pathogen that is the causative agent of tularemia. It occurs in several major subtypes, including the live vaccine strain holarctica (type B). F. tularensis is classified as a category A biodefense agent in part because a relatively small number of organisms can cause severe illness. Three complete genomes of subspecies holarctica have been sequenced and deposited in public archives, of which OSU18 was the first and the only strain for which a scientific publication has appeared [1]. We re-assembled the OSU18 strain using both de novo and comparative assembly techniques, and found that the published sequence has two large inversion mis-assemblies. We generated a corrected assembly of the entire genome along with detailed information on the placement of individual reads within the assembly. This assembly will provide a more accurate basis for future comparative studies of this pathogen.

Public Library of Science (PLOS)

Crossref

Directory of Open Access Journals

PubMed Central