23 research outputs found

    Resolution of large and small differences in gene expression using models for the Bayesian analysis of gene expression levels and spotted DNA microarrays

    BACKGROUND: The detection of small yet statistically significant differences in gene expression in spotted DNA microarray studies is an ongoing challenge. Meeting this challenge requires careful examination of the performance of a range of statistical models, as well as empirical examination of the effect of replication on the power to resolve these differences. RESULTS: New models are derived and software is developed for the analysis of microarray ratio data. These models incorporate multiplicative small error terms, and error standard deviations that are proportional to expression level. The fastest and most powerful method incorporates additive small error terms and error standard deviations proportional to expression level. Data from four studies are profiled for the degree to which they reveal statistically significant differences in gene expression. The gene expression level at which there is an empirical 50% probability of a significant call is presented as a summary statistic for the power to detect small differences in gene expression. CONCLUSIONS: Understanding the resolution of difference in gene expression that is detectable as significant is a vital component of experimental design and evaluation. These small differences in gene expression level are readily detected with a Bayesian analysis of gene expression level that has additive error terms and constrains samples to have a common error coefficient of variation. The power to detect small differences in a study may then be determined by logistic regression
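
The summary statistic described above (the expression difference at which a significant call has a 50% probability) can be illustrated with a small logistic-regression sketch. This is not the authors' software: the predictor (log2 fold difference), the toy counts, and the function names `fit_logistic` and `fifty_percent_point` are assumptions made for illustration only.

```python
import numpy as np

def fit_logistic(x, y, n_iter=50):
    """Fit p(call) = 1 / (1 + exp(-(b0 + b1 * x))) by Newton-Raphson."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    X = np.column_stack([np.ones_like(x), x])  # intercept + predictor
    beta = np.zeros(2)
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-X @ beta))
        weights = p * (1.0 - p)
        gradient = X.T @ (y - p)
        hessian = X.T @ (X * weights[:, None])
        beta = beta + np.linalg.solve(hessian, gradient)
    return beta

def fifty_percent_point(beta):
    """Predictor value at which the fitted probability equals 0.5."""
    return -beta[0] / beta[1]

# Hypothetical toy data: 10 comparisons at each log2 fold difference,
# with a made-up number of them called significant.
levels = [0.2, 0.4, 0.6, 0.8, 1.0]
n_significant = [1, 3, 5, 7, 9]
fold = np.repeat(levels, 10)
called = np.concatenate([[1] * k + [0] * (10 - k) for k in n_significant])

beta = fit_logistic(fold, called)
print("log2 fold difference with a 50% chance of a significant call:",
      fifty_percent_point(beta))
```

A smaller 50% point indicates that a study resolves smaller expression differences; comparing this value across data sets with different replication is the kind of use the abstract suggests.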

    Missing value estimation for DNA microarray gene expression data by Support Vector Regression imputation and orthogonal coding scheme

    BACKGROUND: Gene expression profiling has become a useful biological resource in recent years, and it plays an important role in a broad range of areas in biology. The raw gene expression data, usually in the form of a large matrix, may contain missing values. Downstream analysis methods that require a complete input matrix are thus not applicable. Several methods have been developed to solve this problem, such as K-nearest-neighbor imputation and Bayesian principal components analysis imputation. In this paper, we introduce a novel imputation approach based on the Support Vector Regression (SVR) method. The proposed approach utilizes an orthogonal coding input scheme, which makes use of multiple missing values in one row of a gene expression profile and imputes the missing value into a much higher dimensional space, to obtain better performance. RESULTS: A comparative study of our method with the previously developed methods is presented for the estimation of missing values on six gene expression data sets. Among the three input-vector coding schemes we tried, the orthogonal input coding scheme obtains the best estimation results, with the minimum Normalized Root Mean Squared Error (NRMSE). The results also demonstrate that the SVR method has powerful estimation ability on different kinds of data sets, with relatively small NRMSE. CONCLUSION: The SVR imputation method performs better than, or at least comparably with, the previously developed methods in the present research. The outstanding estimation ability of this method is partly due to its use of the most missing-value information, achieved by incorporating the orthogonal input coding scheme. In addition, the solid theoretical foundation of the SVR method, together with the orthogonal input coding scheme, also helps estimation performance. The promising estimation ability demonstrated in the results section suggests that the proposed approach provides a proper solution to the missing value estimation problem. The source code of the SVR method is available from for non-commercial use
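
The evaluation described in this abstract can be sketched as: hide known entries, impute them, and score the estimates with NRMSE. The abstract does not define the normalization, so dividing the RMSE by the standard deviation of the true values is an assumption here. The imputer is a plain row-mean baseline standing in for SVR (it is not the proposed method), and the matrix is made up.

```python
import numpy as np

def nrmse(imputed, truth):
    """RMSE divided by the standard deviation of the true values (assumed definition)."""
    imputed = np.asarray(imputed, dtype=float)
    truth = np.asarray(truth, dtype=float)
    rmse = np.sqrt(np.mean((imputed - truth) ** 2))
    return rmse / np.std(truth)

def row_mean_impute(matrix):
    """Baseline imputer: replace each NaN with the mean of its row's observed values."""
    filled = matrix.copy()
    row_means = np.nanmean(matrix, axis=1)
    rows, cols = np.where(np.isnan(matrix))
    filled[rows, cols] = row_means[rows]
    return filled

# Hypothetical complete expression matrix (genes x arrays).
complete = np.array([[1.0, 2.0, 3.0, 4.0],
                     [2.0, 2.5, 3.5, 4.0],
                     [0.5, 1.5, 2.5, 3.5]])

# Hide three known entries so the estimates can be scored against the truth.
mask = np.zeros_like(complete, dtype=bool)
mask[0, 1] = mask[1, 3] = mask[2, 0] = True
observed = complete.copy()
observed[mask] = np.nan

filled = row_mean_impute(observed)
score = nrmse(filled[mask], complete[mask])
print("NRMSE of the row-mean baseline:", score)
```

Any imputer with the same input and output shape could replace `row_mean_impute`, which makes methods comparable on the same hidden entries.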

    A two-sample Bayesian t-test for microarray data

    BACKGROUND: Determining whether a gene is differentially expressed in two different samples remains an important statistical problem. Prior work in this area has featured the use of t-tests with pooled estimates of the sample variance based on similarly expressed genes. These methods do not display consistent behavior across the entire range of pooling and can be biased when the prior hyperparameters are specified heuristically. RESULTS: A two-sample Bayesian t-test is proposed for use in determining whether a gene is differentially expressed in two different samples. The test method is an extension of earlier work that made use of point estimates for the variance. The method proposed here explicitly calculates in analytic form the marginal distribution for the difference in the mean expression of two samples, obviating the need for point estimates of the variance without recourse to posterior simulation. The prior distribution involves a single hyperparameter that can be calculated in a statistically rigorous manner, making clear the connection between the prior degrees of freedom and prior variance. CONCLUSION: The test is easy to understand and implement and application to both real and simulated data shows that the method has equal or greater power compared to the previous method and demonstrates consistent Type I error rates. The test is generally applicable outside the microarray field to any situation where prior information about the variance is available and is not limited to cases where estimates of the variance are based on many similar observations

    Molecular evolution of sex-biased genes in the Drosophila ananassae subgroup

    BACKGROUND: Genes with sex-biased expression often show rapid molecular evolution between species. Previous population genetic and comparative genomic studies of Drosophila melanogaster and D. simulans revealed that male-biased genes have especially high rates of adaptive evolution. To test if this is also the case for other lineages within the melanogaster group, we investigated gene expression in D. ananassae, a species that occurs in structured populations in tropical and subtropical regions. We used custom-made microarrays and published microarray data to characterize the sex-biased expression of 129 D. ananassae genes whose D. melanogaster orthologs had been classified previously as male-biased, female-biased, or unbiased in their expression and had been studied extensively at the population-genetic level. For 43 of these genes we surveyed DNA sequence polymorphism in a natural population of D. ananassae and determined divergence to the sister species D. atripex and D. phaeopleura. RESULTS: Sex-biased expression is generally conserved between D. melanogaster and D. ananassae, with the majority of genes exhibiting the same bias in the two species. However, about one-third of the genes have either gained or lost sex-biased expression in one of the species and a small proportion of genes (~4%) have changed bias from one sex to the other. The male-biased genes of D. ananassae show evidence of positive selection acting at the protein level. However, the signal of adaptive protein evolution for male-biased genes is not as strong in D. ananassae as it is in D. melanogaster and is limited to genes with conserved male-biased expression in both species. Within D. ananassae, a significant signal of adaptive evolution is also detected for female-biased and unbiased genes. CONCLUSIONS: Our findings extend previous observations of widespread adaptive protein evolution to an independent Drosophila lineage, the D. ananassae subgroup. However, the rate of adaptive evolution is not greater for male-biased genes than for female-biased or unbiased genes, which suggests that there are differences in sex-biased gene evolution between the two lineages.

    The Transcriptional Response of Drosophila melanogaster to Infection with the Sigma Virus (Rhabdoviridae)

    Bacterial and fungal infections induce a potent immune response in Drosophila melanogaster, but it is unclear whether viral infections induce an antiviral immune response. Using microarrays, we examined the changes in gene expression in Drosophila that occur in response to infection with the sigma virus, a negative-stranded RNA virus (Rhabdoviridae) that occurs in wild populations of D. melanogaster. We detected many changes in gene expression in infected flies, but found no evidence for the activation of the Toll, IMD or Jak-STAT pathways, which control immune responses against bacteria and fungi. We identified a number of functional categories of genes, including serine proteases, ribosomal proteins and chorion proteins that were overrepresented among the differentially expressed genes. We also found that the sigma virus alters the expression of many more genes in males than in females. These data suggest that either Drosophila do not mount an immune response against the sigma virus, or that the immune response is not controlled by known immune pathways. If the latter is true, the genes that we identified as differentially expressed after infection are promising candidates for controlling the host's response to the sigma virus

    Gene duplication in an African cichlid adaptive radiation

    BACKGROUND: Gene duplication is a source of evolutionary innovation and can contribute to the divergence of lineages; however, the relative importance of this process remains to be determined. The explosive divergence of the African cichlid adaptive radiations provides both a model for studying the general role of gene duplication in the divergence of lineages and also an exciting foray into the identification of genomic features that underlie the dramatic phenotypic and ecological diversification in this particular lineage. We present the first genome-wide study of gene duplication in African cichlid fishes, identifying gene duplicates in three species belonging to the Lake Malawi adaptive radiation (Metriaclima estherae, Protomelas similis, Rhamphochromis “chilingali”) and one closely related species from a non-radiated riverine lineage (Astatotilapia tweddlei). RESULTS: Using Astatotilapia burtoni as reference, microarray comparative genomic hybridization analysis of 5689 genes reveals 134 duplicated genes among the four cichlid species tested. Between 51 and 55 genes were identified as duplicated in each of the three species from the Lake Malawi radiation, representing a 38%–49% increase in number of duplicated genes relative to the non-radiated lineage (37 genes). Duplicated genes include several that are involved in immune response, ATP metabolism and detoxification. CONCLUSIONS: These results contribute to our understanding of the abundance and type of gene duplicates present in cichlid fish lineages. The duplicated genes identified in this study provide candidates for the analysis of functional relevance with regard to phenotype and divergence. 
Comparative sequence analysis of gene duplicates can address the role of positive selection and adaptive evolution by gene duplication, while further study across the phylogenetic range of cichlid radiations (and more generally in other adaptive radiations) will determine whether the patterns of gene duplication seen in this study consistently accompany rapid radiation.
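
The 38%–49% range quoted in the results follows directly from the reported counts (51 to 55 duplicated genes per Lake Malawi species against 37 in the non-radiated lineage). A short check, where the helper name `percent_increase` is only illustrative:

```python
def percent_increase(count, baseline):
    """Relative increase of count over baseline, in percent."""
    return 100.0 * (count - baseline) / baseline

baseline = 37                           # duplicated genes in the riverine lineage
low = percent_increase(51, baseline)    # fewest duplicates among radiation species
high = percent_increase(55, baseline)   # most duplicates among radiation species
print(f"{low:.1f}% to {high:.1f}%")
```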

    Use of genomic DNA control features and predicted operon structure in microarray data analysis: ArrayLeaRNA – a Bayesian approach

    BACKGROUND: Microarrays are widely used for the study of gene expression; however, deciding whether observed differences in expression are significant remains a challenge. RESULTS: A computing tool (ArrayLeaRNA) has been developed for gene expression analysis. It implements a Bayesian approach which is based on the Gumbel distribution and uses printed genomic DNA control features for normalization and for estimation of the parameters of the Bayesian model and prior knowledge from predicted operon structure. The method is compared with two other approaches: the classical LOWESS normalization followed by a two-fold cut-off criterion and the OpWise method (Price, et al. 2006. BMC Bioinformatics. 7, 19), a published Bayesian approach also using predicted operon structure. The three methods were compared on experimental datasets with prior knowledge of gene expression. With ArrayLeaRNA, data normalization is carried out according to the genomic features which reflect the results of equally transcribed genes; also the statistical significance of the difference in expression is based on the variability of the equally transcribed genes. The operon information helps the classification of genes with low confidence measurements. ArrayLeaRNA is implemented in Visual Basic and freely available as an Excel add-in at http://www.ifr.ac.uk/safety/ArrayLeaRNA/ CONCLUSION: We have introduced a novel Bayesian model and demonstrated that it is a robust method for analysing microarray expression profiles. ArrayLeaRNA showed a considerable improvement in data normalization, in the estimation of the experimental variability intrinsic to each hybridization and in the establishment of a clear boundary between non-changing and differentially expressed genes. The method is applicable to data derived from hybridizations of labelled cDNA samples as well as from hybridizations of labelled cDNA with genomic DNA and can be used for the analysis of datasets where differentially regulated genes predominate.
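
The two-fold cut-off criterion used as a comparison method in this abstract amounts to flagging genes whose normalized log2 ratio is at least 1 in absolute value. The sketch below is a simplification: it substitutes global median centering for LOWESS normalization to stay self-contained, and the intensities and the function name `two_fold_calls` are invented for illustration. It is unrelated to the ArrayLeaRNA implementation.

```python
import numpy as np

def two_fold_calls(red, green):
    """Median-center log2(red/green) and flag ratios of at least two-fold."""
    red = np.asarray(red, dtype=float)
    green = np.asarray(green, dtype=float)
    log_ratio = np.log2(red / green)
    centered = log_ratio - np.median(log_ratio)  # stand-in for LOWESS normalization
    return centered, np.abs(centered) >= 1.0     # |log2 ratio| >= 1 means >= 2-fold

# Hypothetical channel intensities for five features.
red = [100, 220, 400, 50, 190]
green = [100, 200, 100, 200, 210]
centered, calls = two_fold_calls(red, green)
print(calls)
```

A fixed cut-off of this kind ignores measurement variability, which is the limitation that model-based approaches such as the one in this abstract aim to address.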

    Using comparative genomic hybridization to survey genomic sequence divergence across species: a proof-of-concept from Drosophila

    BACKGROUND: Genome-wide analysis of sequence divergence among species offers profound insights into the evolutionary processes that shape lineages. When full-genome sequencing is not feasible for a broad comparative study, we propose the use of array-based comparative genomic hybridization (aCGH) in order to identify orthologous genes with high sequence divergence. Here we discuss experimental design, statistical power, success rate, sources of variation and potential confounding factors. We used a spotted PCR product microarray platform from Drosophila melanogaster to assess sequence divergence on a gene-by-gene basis in three fully sequenced heterologous species (D. sechellia, D. simulans, and D. yakuba). Because complete genome assemblies are available for these species, this study presents a powerful test for the use of aCGH as a tool to measure sequence divergence. RESULTS: We found a consistent and linear relationship between hybridization ratio and sequence divergence of the sample to the platform species. At higher levels of sequence divergence (< 92% sequence identity to D. melanogaster) ~84% of features had significantly less hybridization to the array in the heterologous species than the platform species, and thus could be identified as "diverged". At lower levels of divergence (≥ 97% identity), only 13% of genes were identified as diverged. While ~40% of the variation in hybridization ratio can be accounted for by variation in sequence identity of the heterologous sample relative to D. melanogaster, other individual characteristics of the DNA sequences, such as GC content, also contribute to variation in hybridization ratio, as does technical variation. CONCLUSIONS: Here we demonstrate that aCGH can accurately be used as a proxy to estimate genome-wide divergence, thus providing an efficient way to evaluate how evolutionary processes and genomic architecture can shape species diversity in non-model systems. Given the increased number of species for which microarray platforms are available, comparative studies can be conducted for many interesting lineages in order to identify highly diverged genes that may be the target of natural selection.
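
The statement that about 40% of the variation in hybridization ratio is accounted for by sequence identity corresponds to the R² of a linear fit. A minimal sketch of that calculation follows; the data points and the function name `linear_r_squared` are hypothetical and do not come from the study.

```python
import numpy as np

def linear_r_squared(x, y):
    """Least-squares line y = slope * x + intercept and its R^2."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    slope, intercept = np.polyfit(x, y, 1)
    residuals = y - (slope * x + intercept)
    ss_res = np.sum(residuals ** 2)        # variation left unexplained by the line
    ss_tot = np.sum((y - y.mean()) ** 2)   # total variation in y
    return slope, intercept, 1.0 - ss_res / ss_tot

# Made-up example: percent identity to the platform species vs. log2 hybridization ratio.
identity = [88, 90, 92, 94, 96, 98, 99]
log_ratio = [-1.4, -0.6, -1.0, -0.3, -0.7, 0.1, -0.2]
slope, intercept, r2 = linear_r_squared(identity, log_ratio)
print("slope:", slope, "R^2:", r2)
```

The fraction of variation not captured by the fit (1 − R²) is where the other factors named in the abstract, such as GC content and technical variation, would enter.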