STORMSeq: An Open-Source, User-Friendly Pipeline for Processing Personal Genomics Data in the Cloud
The increasing public availability of personal complete genome sequencing data has ushered in an era of democratized genomics. However, read mapping and variant calling software is constantly improving, and individuals with personal genomic data may prefer to customize and update their variant calls. Here, we describe STORMSeq (Scalable Tools for Open-Source Read Mapping), a graphical interface cloud computing solution that does not require a parallel computing environment or extensive technical experience. This customizable and modular system performs read mapping, read cleaning, and variant calling and annotation. At present, STORMSeq costs approximately $30 and takes 3–8 days to process a whole genome sequence. We provide this open-access and open-source resource as a user-friendly interface in Amazon EC2.
Using Molecular Features of Xenobiotics to Predict Hepatic Gene Expression Response
Despite recent advances in molecular medicine and rational drug design, many drugs still fail because toxic effects arise at the cellular and tissue level. In order to better understand these effects, cellular assays can generate high-throughput measurements of gene expression changes induced by small molecules. However, our understanding of how the chemical features of small molecules influence gene expression is very limited. Therefore, we investigated the extent to which chemical features of small molecules can reliably be associated with significant changes in gene expression. Specifically, we analyzed the gene expression response of rat liver cells to 170 different drugs and searched for genes whose expression could be related to chemical features alone. Surprisingly, we can predict the up-regulation of 87 genes (increased expression of at least 1.5 times compared to controls). We show an average cross-validation predictive area under the receiver operating characteristic curve (AUROC) of 0.7 or greater for each of these 87 genes. We applied our method to an external data set of rat liver gene expression response to a novel drug and achieved an AUROC of 0.7. We also validated our approach by predicting up-regulation of Cytochrome P450 1A2 (CYP1A2) in three drugs known to induce CYP1A2 that were not in our data set. Finally, a detailed analysis of the CYP1A2 predictor allowed us to identify which fragments made significant contributions to the predictive scores.
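As a rough illustration of the two quantities mentioned in this abstract, the sketch below labels a gene as up-regulated when its expression is at least 1.5 times the control, and computes AUROC as the probability that a randomly chosen positive example receives a higher score than a randomly chosen negative one. The function names, the toy expression values, and the toy predictor scores are all hypothetical; this is not the authors' code, and the abstract does not say how their AUROC was computed.

```python
def is_upregulated(treated, control, threshold=1.5):
    """Return True when treated expression is at least `threshold` times control.

    The 1.5 default follows the cutoff quoted in the abstract; everything
    else about this helper is an assumption made for illustration.
    """
    return treated / control >= threshold


def auroc(labels, scores):
    """Compute AUROC by comparing every positive score with every negative score.

    Each positive/negative pair counts 1 if the positive scores higher,
    0.5 on a tie, and 0 otherwise; the result is the mean over all pairs.
    """
    pos = [s for label, s in zip(labels, scores) if label]
    neg = [s for label, s in zip(labels, scores) if not label]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy (treated, control) expression pairs for one gene across four drugs.
expression_pairs = [(3.0, 1.0), (1.2, 1.0), (2.0, 1.0), (0.9, 1.0)]
labels = [is_upregulated(t, c) for t, c in expression_pairs]

# Hypothetical predictor scores derived from chemical features of each drug.
scores = [0.9, 0.4, 0.35, 0.2]

print(auroc(labels, scores))
```

An AUROC of 0.5 corresponds to chance-level ranking and 1.0 to perfect ranking, which gives some context for the 0.7 threshold reported above.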
Bioinformatics challenges for personalized medicine
Motivation: Widespread availability of low-cost, full genome sequencing will introduce new challenges for bioinformatics
Approximate costs for STORMSeq.
Note that these costs are approximate and may depend on a number of factors related to the input files.
Overview of the STORMSeq system.
The user uploads short reads to Amazon S3 and starts a webserver on Amazon EC2, which controls the mapping and variant calling pipeline. Progress can be monitored on the webserver and results are uploaded to persistent storage on Amazon S3.
Sample output.
STORMSeq provides basic visualization for summary statistics, such as (A) genome-wide SNP density and (B) size distribution of short indels.