Shape Analysis of High-throughput Genomics Data
RNA sequencing refers to the use of
next-generation sequencing technologies to characterize
the identity and abundance of target RNA species in a biological sample
of interest.
The recent improvement and reduction in the cost of next-generation
sequencing technologies have been
paralleled by the development of statistical methodologies to analyze the
data they produce.
Coupled with the reduction in cost is the increase in the complexity
of experiments.
Some of the old challenges still remain.
For example, normalization is now more important than ever.
Some of the crude assumptions made in the early stages of RNA sequencing
data analysis were necessary since the technology was new and untested,
the number of replicates was small, and the experiments were relatively
simple.
One of the many uses of RNA sequencing experiments is the
identification of genes whose abundance levels are significantly different
across various biological conditions of interest.
Several methods have been developed to answer this question.
Some of these newly developed methods are based on the assumption
that the data observed or a transformation of the data are relatively symmetric
with light tails, usually summarized by assuming a Gaussian random component.
This assumption is very difficult to assess for small sample sizes
(e.g., sample sizes in the range of 4 to 30).
In this dissertation, we utilize L-moments statistics as the basis for
normalization, exploratory data analysis, the assessment of distributional assumptions,
and the hypothesis testing of high-throughput transcriptomic data.
In particular, we introduce a new normalization method for high-throughput
transcriptomic data that is a modification of quantile normalization.
We use L-moments ratios for assessing the shape
(skewness and kurtosis statistics) of high-throughput transcriptome data.
Based on these statistics, we propose a test for assessing whether
the shapes of the observed samples differ across biological conditions.
We also illustrate the utility of this framework to characterize
the robustness of distributional assumptions made by statistical methods
for differential expression.
We apply this framework to RNA-seq data and find that methods for differential expression
analysis based on the simple t-test, using L-moments statistics as weights, are robust.
Finally, we provide an algorithm based on L-moments ratios for identifying genes with
distributions that are markedly different from the majority in the data.
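The L-moments ratios mentioned above (L-skewness and L-kurtosis) can be illustrated with a minimal sketch. It uses the standard unbiased sample estimators built from probability-weighted moments, not the dissertation's own code, and the function name `sample_l_moments` is an assumption made for this example.

```python
def sample_l_moments(values):
    """Return (l1, l2, t3, t4): L-location, L-scale, L-skewness, L-kurtosis.

    Hypothetical helper for illustration. Uses the standard unbiased
    probability-weighted-moment estimators b0..b3 on the sorted sample.
    """
    x = sorted(values)
    n = len(x)
    if n < 4:
        raise ValueError("need at least 4 observations for the first four L-moments")

    b = [0.0, 0.0, 0.0, 0.0]
    for i, xi in enumerate(x):  # i is the 0-based rank of xi
        w = 1.0  # weight for b0
        for r in range(4):
            b[r] += w * xi
            if r < 3:
                # weight for b_{r+1}: product of (i - k) / (n - 1 - k), k = 0..r
                w *= (i - r) / (n - 1 - r)
    b0, b1, b2, b3 = (v / n for v in b)

    l1 = b0
    l2 = 2 * b1 - b0
    l3 = 6 * b2 - 6 * b1 + b0
    l4 = 20 * b3 - 30 * b2 + 12 * b1 - b0
    if l2 == 0:
        raise ValueError("L-scale is zero; ratios are undefined for a constant sample")
    return l1, l2, l3 / l2, l4 / l2


# A symmetric sample has L-skewness of zero; a right-skewed one is positive.
symmetric = sample_l_moments([1, 2, 3, 4, 5])
skewed = sample_l_moments([1, 1, 2, 2, 3, 10])
```

Because each ratio is divided by the L-scale, both lie between -1 and 1, which makes them comparable across genes with very different expression levels.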
DNA Methylation Patterns in Cord Blood of Neonates Across Gestational Age: Association With Cell-Type Proportions
Background: A statistical methodology is available to estimate the proportion of cell types (cellular heterogeneity) in adult whole blood specimens used in epigenome-wide association studies (EWAS). However, there is no methodology to estimate the proportion of cell types in umbilical cord blood (also a heterogeneous tissue) used in EWAS.
Objectives: The objectives of this study were to determine whether differences in DNA methylation (DNAm) patterns in umbilical cord blood are the result of blood cell type proportion changes that typically occur across gestational age and to demonstrate the effect of cell type proportion confounding by comparing preterm infants exposed and not exposed to antenatal steroids.
Methods: We obtained DNAm profiles of cord blood using the Illumina HumanMethylation27k BeadChip array for 385 neonates from the Boston Birth Cohort. We estimated cell type proportions for six cell types using the deconvolution method developed by Houseman et al. (2012).
Results: The cell type proportion estimates segregated into two groups that were significantly different by gestational age, indicating that gestational age was associated with cell type proportion. Among infants exposed to antenatal steroids, the number of differentially methylated CpGs dropped from 127 to 1 after controlling for cell type proportion.
Discussion: EWAS utilizing cord blood are confounded by cell type proportion. Careful study design, including correction for cell type proportion, and careful interpretation of the results of EWAS using cord blood are critical.
Buruli Ulcer in Ghana: Results of a National Case Search
A national search for cases of Buruli ulcer in Ghana identified 5,619 patients, with 6,332 clinical lesions at various stages. The overall crude national prevalence rate of active lesions was 20.7 per 100,000, but the rate was 150.8 per 100,000 in the most disease-endemic district. The case search demonstrated widespread disease and gross underreporting compared with the routine reporting system. The epidemiologic information gathered will contribute to the design of control programs for Buruli ulcer.
Simultaneous transcriptional profiling of Leishmania major and its murine macrophage host cell reveals insights into host-pathogen interactions
Parasites of the genus Leishmania are the causative agents of leishmaniasis, a group of diseases that range in manifestations from skin lesions to fatal visceral disease. The life cycle of Leishmania parasites is split between its insect vector and its mammalian host, where it resides primarily inside of macrophages. Once intracellular, Leishmania parasites must evade or deactivate the host's innate and adaptive immune responses in order to survive and replicate. We performed transcriptome profiling using RNA-seq to simultaneously identify global changes in murine macrophage and L. major gene expression as the parasite entered and persisted within murine macrophages during the first 72 h of an infection. Differential gene expression, pathway, and gene ontology analyses enabled us to identify modulations in host and parasite responses during an infection. The most substantial and dynamic gene expression responses by both macrophage and parasite were observed during early infection. Murine genes related to both pro- and anti-inflammatory immune responses and glycolysis were substantially upregulated and genes related to lipid metabolism, biogenesis, and Fc gamma receptor-mediated phagocytosis were downregulated. Upregulated parasite genes included those aimed at mitigating the effects of an oxidative response by the host immune system while downregulated genes were related to translation, cell signaling, fatty acid biosynthesis, and flagellum structure. The gene expression patterns identified in this work yield signatures that characterize multiple developmental stages of L. major parasites and the coordinated response of Leishmania-infected macrophages in the real-time setting of a dual biological system. 
This comprehensive dataset offers a clearer and more sensitive picture of the interplay between host and parasite during intracellular infection, providing additional insights into how pathogens are able to evade host defenses and modulate the biological functions of the cell in order to survive in the mammalian environment.
https://doi.org/10.1186/s12864-015-2237-
Analysis and correction of compositional bias in sparse sequencing count data
Background: Count data derived from high-throughput deoxyribonucleic acid (DNA) sequencing are frequently used in quantitative molecular assays. Due to properties inherent to the sequencing process, unnormalized count data are compositional, measuring relative and not absolute abundances of the assayed features. This compositional bias confounds inference of absolute abundances. Commonly used count data normalization approaches such as library size scaling, rarefaction, and subsampling cannot correct for compositional bias or for any other relevant technical bias that is uncorrelated with library size.
Results: We demonstrate that existing techniques for estimating compositional bias fail with sparse metagenomic 16S count data and propose an empirical Bayes normalization approach to overcome this problem. In addition, we clarify the assumptions underlying frequently used scaling normalization methods in light of compositional bias, including scaling methods that were not designed directly to address it.
Conclusions: Compositional bias, induced by the sequencing machine, confounds inferences of absolute abundances. We present a normalization technique for compositional bias correction in sparse sequencing count data, and demonstrate its improved performance in metagenomic 16S survey data. Based on the distribution of technical bias estimates arising from several publicly available large-scale 16S count datasets, we argue that detailed experiments specifically addressing the influence of compositional bias in metagenomics are needed.
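The compositional bias described above can be illustrated with a small deterministic sketch. It is not the empirical Bayes method proposed in the abstract. The abundances, the sequencing depth, and all function names are made up for the example, and read counts are idealised as exactly proportional to absolute abundance.

```python
def expected_counts(absolute, depth):
    """Idealised read counts: a fixed number of reads (depth) is shared
    among features in proportion to their absolute abundance."""
    total = sum(absolute)
    return [depth * a / total for a in absolute]


def library_size_scale(counts):
    """Library size scaling: divide each count by the sample's total count."""
    total = sum(counts)
    if total == 0:
        raise ValueError("sample has no reads")
    return [c / total for c in counts]


def fold_changes(reference, other):
    """Per-feature ratio other / reference."""
    return [o / r for r, o in zip(reference, other)]


# Hypothetical absolute abundances: only the last feature truly changes (7x).
control_abs = [100, 100, 100, 100]
treated_abs = [100, 100, 100, 700]
depth = 10_000  # same sequencing depth in both samples (assumption)

control = library_size_scale(expected_counts(control_abs, depth))
treated = library_size_scale(expected_counts(treated_abs, depth))

true_fc = fold_changes(control_abs, treated_abs)  # unchanged features stay at 1
apparent_fc = fold_changes(control, treated)      # unchanged features appear to drop

# The distortion is one shared multiplicative factor, the ratio of total
# absolute abundances, which library size scaling alone cannot recover.
compositional_factor = sum(control_abs) / sum(treated_abs)
corrected_fc = [fc / compositional_factor for fc in apparent_fc]
```

In this toy case the three unchanged features appear to fall to about 0.4 of their control level after library size scaling, and dividing by the shared factor restores the true fold changes. In real data that factor is unknown and has to be estimated, which is the difficulty the abstract points to for sparse counts.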