
eScholarship - University of California

Missing value estimation for DNA microarray gene expression data by Support Vector Regression imputation and orthogonal coding scheme

Author: Feng Huanqing
Jiang Zhaohui
Li Ao
Wang Xian
Publication venue: BioMed Central
Publication date: 01/01/2006

BACKGROUND: Gene expression profiling has become a useful biological resource in recent years and plays an important role in a broad range of areas in biology. Raw gene expression data, usually in the form of a large matrix, may contain missing values, so downstream analysis methods that require a complete matrix as input cannot be applied directly. Several methods have been developed to solve this problem, such as K-nearest-neighbor imputation and Bayesian principal component analysis imputation. In this paper, we introduce a novel imputation approach based on Support Vector Regression (SVR). The proposed approach uses an orthogonal input coding scheme, which makes use of multiple missing values in one row of a gene expression profile and imputes the missing value in a much higher-dimensional space, to obtain better performance. RESULTS: We present a comparative study of our method against previously developed methods for estimating missing values on six gene expression data sets. Among the three input-vector coding schemes we tried, the orthogonal input coding scheme gives the best estimates, with the minimum Normalized Root Mean Squared Error (NRMSE). The results also demonstrate that the SVR method has strong estimation ability on different kinds of data sets, with relatively small NRMSE. CONCLUSION: The SVR imputation method performs better than, or at least comparably to, the previously developed methods in the present research. Its estimation ability is partly due to the orthogonal input coding scheme, which makes the most use of the missing-value information. In addition, the solid theoretical foundation of SVR, together with the orthogonal input coding scheme, contributes to estimation performance.
The promising estimation ability demonstrated in the results section suggests that the proposed approach provides a proper solution to the missing value estimation problem. The source code of the SVR method is available from for non-commercial use
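
The abstract compares methods by NRMSE but does not define it. A minimal sketch follows, assuming one definition that is common in the imputation literature: the root mean squared error between imputed and true values, divided by the standard deviation of the true values. The function name `nrmse` and the sample numbers are illustrative and are not taken from the paper.

```python
import math
import statistics


def nrmse(estimated, true):
    """RMSE between estimates and true values, divided by the
    population standard deviation of the true values.

    Assumption: this normalization is one common convention; the
    abstract itself does not state which one the authors used.
    """
    if len(estimated) != len(true) or len(true) < 2:
        raise ValueError("need two equal-length sequences with at least 2 values")
    mse = sum((e - t) ** 2 for e, t in zip(estimated, true)) / len(true)
    return math.sqrt(mse) / statistics.pstdev(true)


# Hypothetical held-out entries and a naive "fill with the mean" imputation.
true_values = [1.0, 2.0, 3.0, 4.0]
mean_guess = [2.5, 2.5, 2.5, 2.5]
print(nrmse(mean_guess, true_values))   # baseline imputation
print(nrmse(true_values, true_values))  # perfect imputation
```

Under this definition, filling every missing entry with the mean of the true values scores 1, so a useful imputation method should score well below 1.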

Harvard University - DASH


Y Chromosome Mediates Ribosomal DNA Silencing and Modulates the Chromatin State in Drosophila

Author: Eickbush Thomas H.
Hartl Daniel L.
Martinsen Lene
Sackton Timothy
Silva Bernardo Lemos
Zhou Jun
Publication venue: 'Proceedings of the National Academy of Sciences'
Publication date: 16/09/2014

Although the Drosophila Y chromosome is degenerated, heterochromatic, and contains few genes, increasing evidence suggests that it plays an important role in regulating the expression of numerous autosomal and X-linked genes. Here we use 15 Y chromosomes originating from a single founder 550 generations ago to study the role of the Y chromosome in regulating rRNA gene transcription, position-effect variegation (PEV), and the link among rDNA copy number, global gene expression, and chromatin regulation. Based on patterns of rRNA gene transcription indicated by transcription of the retrotransposon R2 that specifically inserts into the 28S rRNA gene, we show that X-linked rDNA is silenced in males. The silencing of X-linked rDNA expression by the Y chromosome is consistent across populations and independent of genetic background. These Y chromosomes also vary more than threefold in rDNA locus size and cause dramatically different levels of PEV suppression. The degree of suppression is negatively associated with the number and fraction of rDNA units without transposon insertions, but not with total rDNA locus size. Gene expression profiling revealed hundreds of differentially expressed genes among these Y chromosome introgression lines, as well as a divergent global gene expression pattern between the low-PEV and high-PEV flies. Our findings suggest that the Y chromosome is involved in diverse phenomena related to transcriptional regulation including X-linked rDNA silencing and suppression of PEV phenotype. These results further expand our understanding of the role of the Y chromosome in modulating global gene expression, and suggest a link with modifications of the chromatin state.

A critical assessment of cross-species detection of gene duplicates using comparative genomic hybridization

Author: Machado Heather E
Renn Suzy CP
Publication venue: BioMed Central
Publication date: 01/01/2010

A two-sample Bayesian t-test for microarray data

Author: Dimmic Matthew W
Fox Richard J
Publication venue: BioMed Central
Publication date: 01/03/2006

BACKGROUND: Determining whether a gene is differentially expressed in two different samples remains an important statistical problem. Prior work in this area has featured the use of t-tests with pooled estimates of the sample variance based on similarly expressed genes. These methods do not display consistent behavior across the entire range of pooling and can be biased when the prior hyperparameters are specified heuristically. RESULTS: A two-sample Bayesian t-test is proposed for use in determining whether a gene is differentially expressed in two different samples. The test method is an extension of earlier work that made use of point estimates for the variance. The method proposed here explicitly calculates in analytic form the marginal distribution for the difference in the mean expression of two samples, obviating the need for point estimates of the variance without recourse to posterior simulation. The prior distribution involves a single hyperparameter that can be calculated in a statistically rigorous manner, making clear the connection between the prior degrees of freedom and prior variance. CONCLUSION: The test is easy to understand and implement, and application to both real and simulated data shows that the method has equal or greater power compared to the previous method and demonstrates consistent Type I error rates. The test is generally applicable outside the microarray field to any situation where prior information about the variance is available and is not limited to cases where estimates of the variance are based on many similar observations.
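
To make the contrast in this abstract concrete, the sketch below illustrates the older style of test that it describes as relying on point estimates of the variance. It is not the analytic marginal distribution proposed in the paper. The shrinkage formula, the function names, and the parameters `prior_var` and `prior_df` are assumptions chosen for illustration.

```python
import math
import statistics


def shrunk_variance(sample, prior_var, prior_df):
    """Point estimate of the variance that blends the sample variance
    with a prior variance, weighted by their degrees of freedom.

    Assumption: a generic weighted average, used only to show the idea.
    """
    n = len(sample)
    s2 = statistics.variance(sample)  # unbiased sample variance, n - 1 df
    return (prior_df * prior_var + (n - 1) * s2) / (prior_df + n - 1)


def regularized_t(sample_a, sample_b, prior_var, prior_df):
    """Two-sample t-like statistic computed with the shrunk variances."""
    var_a = shrunk_variance(sample_a, prior_var, prior_df)
    var_b = shrunk_variance(sample_b, prior_var, prior_df)
    std_err = math.sqrt(var_a / len(sample_a) + var_b / len(sample_b))
    return (statistics.mean(sample_a) - statistics.mean(sample_b)) / std_err


# Hypothetical log-expression values for one gene in two conditions.
a = [1.0, 2.0, 3.0]
b = [2.0, 3.0, 4.0]
print(regularized_t(a, b, prior_var=1.0, prior_df=0))  # no prior weight
print(regularized_t(a, b, prior_var=4.0, prior_df=2))  # larger prior variance
```

With `prior_df=0` the statistic reduces to the ordinary unpooled two-sample t statistic. Raising `prior_df` pulls each variance toward `prior_var`, which is the kind of heuristic hyperparameter choice the abstract says can introduce bias.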

Molecular evolution of sex-biased genes in the Drosophila ananassae subgroup

Author: Baines John F
Grath Sonja
Parsch John
Publication venue: BioMed Central
Publication date: 01/01/2009

Background: Genes with sex-biased expression often show rapid molecular evolution between species. Previous population genetic and comparative genomic studies of Drosophila melanogaster and D. simulans revealed that male-biased genes have especially high rates of adaptive evolution. To test if this is also the case for other lineages within the melanogaster group, we investigated gene expression in D. ananassae, a species that occurs in structured populations in tropical and subtropical regions. We used custom-made microarrays and published microarray data to characterize the sex-biased expression of 129 D. ananassae genes whose D. melanogaster orthologs had been classified previously as male-biased, female-biased, or unbiased in their expression and had been studied extensively at the population-genetic level. For 43 of these genes we surveyed DNA sequence polymorphism in a natural population of D. ananassae and determined divergence to the sister species D. atripex and D. phaeopleura. Results: Sex-biased expression is generally conserved between D. melanogaster and D. ananassae, with the majority of genes exhibiting the same bias in the two species. However, about one-third of the genes have either gained or lost sex-biased expression in one of the species and a small proportion of genes (~4%) have changed bias from one sex to the other. The male-biased genes of D. ananassae show evidence of positive selection acting at the protein level. However, the signal of adaptive protein evolution for male-biased genes is not as strong in D. ananassae as it is in D. melanogaster and is limited to genes with conserved male-biased expression in both species. Within D. ananassae, a significant signal of adaptive evolution is also detected for female-biased and unbiased genes.
Conclusions: Our findings extend previous observations of widespread adaptive protein evolution to an independent Drosophila lineage, the D. ananassae subgroup. However, the rate of adaptive evolution is not greater for male-biased genes than for female-biased or unbiased genes, which suggests that there are differences in sex-biased gene evolution between the two lineages.

Public Library of Science (PLOS)

Open Access LMU

MPG.PuRe

The Transcriptional Response of Drosophila melanogaster to Infection with the Sigma Virus (Rhabdoviridae)

Author: A Avila
A Fleuriet
AC Spradling
B Lemaitre
C Dostert
CW Tsai
D Contamine
D Galiana-Arnoux
DA Baker
DJ Obbard
DL Cox-Foster
E De Gregorio
E-D Ammar
EB Lewis
F Gnad
F Weber
F Wyers
Francis M. Jiggins
G Brun
GDD Hurst
J Bangham
J Rainer
J Reimand
J Townsend
JA Carpenter
Jennifer Carpenter
John F. Baines
John Parsch
JP Townsend
Julia Roller
K Bourtzis
K Roxstrom-Lindquist
KA McKean
M Boutros
P Flicek
RA Zambon
S Cherry
S Hutter
Sarah S. Saminadin-Peter
Stefan Bereswill
Stephan Hutter
Sánchez-Martínez Jesús Genaro
V Bischoff
XH Wang
Z Kambris
ZW Whitlow
Publication venue: Public Library of Science
Publication date: 01/01/2009

Bacterial and fungal infections induce a potent immune response in Drosophila melanogaster, but it is unclear whether viral infections induce an antiviral immune response. Using microarrays, we examined the changes in gene expression in Drosophila that occur in response to infection with the sigma virus, a negative-stranded RNA virus (Rhabdoviridae) that occurs in wild populations of D. melanogaster. We detected many changes in gene expression in infected flies, but found no evidence for the activation of the Toll, IMD or Jak-STAT pathways, which control immune responses against bacteria and fungi. We identified a number of functional categories of genes, including serine proteases, ribosomal proteins and chorion proteins, that were overrepresented among the differentially expressed genes. We also found that the sigma virus alters the expression of many more genes in males than in females. These data suggest that either Drosophila do not mount an immune response against the sigma virus, or that the immune response is not controlled by known immune pathways. If the latter is true, the genes that we identified as differentially expressed after infection are promising candidates for controlling the host's response to the sigma virus.
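
The abstract reports functional categories that were overrepresented among differentially expressed genes but does not say which test was used. One standard way to score overrepresentation is a one-sided hypergeometric test, sketched here as an assumption. The function name and all counts are hypothetical.

```python
from math import comb


def enrichment_p_value(total_genes, category_genes, de_genes, de_in_category):
    """Probability of seeing at least `de_in_category` category members
    among `de_genes` genes drawn without replacement from `total_genes`,
    of which `category_genes` belong to the category.
    """
    upper = min(category_genes, de_genes)
    denominator = comb(total_genes, de_genes)
    tail = sum(
        comb(category_genes, k) * comb(total_genes - category_genes, de_genes - k)
        for k in range(de_in_category, upper + 1)
    )
    return tail / denominator


# Hypothetical counts: 10,000 genes on the array, 200 in the category,
# 500 differentially expressed, 25 of those in the category.
print(enrichment_p_value(10_000, 200, 500, 25))
```

A small p-value suggests the category appears among the differentially expressed genes more often than chance alone would predict. Testing many categories at once would also need a multiple-testing correction, which this sketch omits.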

CiteSeerX

Edinburgh Research Explorer

Open Access LMU

MPG.PuRe

Gene duplication in an African cichlid adaptive radiation

Author: Christian RL Reilly
David H Lunt
Domino A Joyce
Ginger Jui
Heather E Machado
Suzy CP Renn
Publication venue: Springer Nature
Publication date: 26/02/2014

BACKGROUND: Gene duplication is a source of evolutionary innovation and can contribute to the divergence of lineages; however, the relative importance of this process remains to be determined. The explosive divergence of the African cichlid adaptive radiations provides both a model for studying the general role of gene duplication in the divergence of lineages and also an exciting foray into the identification of genomic features that underlie the dramatic phenotypic and ecological diversification in this particular lineage. We present the first genome-wide study of gene duplication in African cichlid fishes, identifying gene duplicates in three species belonging to the Lake Malawi adaptive radiation (Metriaclima estherae, Protomelas similis, Rhamphochromis “chilingali”) and one closely related species from a non-radiated riverine lineage (Astatotilapia tweddlei). RESULTS: Using Astatotilapia burtoni as reference, microarray comparative genomic hybridization analysis of 5689 genes reveals 134 duplicated genes among the four cichlid species tested. Between 51 and 55 genes were identified as duplicated in each of the three species from the Lake Malawi radiation, representing a 38%–49% increase in number of duplicated genes relative to the non-radiated lineage (37 genes). Duplicated genes include several that are involved in immune response, ATP metabolism and detoxification. CONCLUSIONS: These results contribute to our understanding of the abundance and type of gene duplicates present in cichlid fish lineages. The duplicated genes identified in this study provide candidates for the analysis of functional relevance with regard to phenotype and divergence. 
Comparative sequence analysis of gene duplicates can address the role of positive selection and adaptive evolution by gene duplication, while further study across the phylogenetic range of cichlid radiations (and more generally in other adaptive radiations) will determine whether the patterns of gene duplication seen in this study consistently accompany rapid radiation.
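
The 38%–49% range quoted in the results can be checked from the counts in the abstract: 51 to 55 duplicated genes in each Lake Malawi species against 37 in the non-radiated lineage. The helper name below is illustrative.

```python
def percent_increase(count, baseline):
    """Relative increase of `count` over `baseline`, in percent."""
    return 100.0 * (count - baseline) / baseline


baseline = 37  # duplicated genes in the non-radiated riverine lineage
for count in (51, 55):  # lowest and highest counts in the Lake Malawi species
    print(count, round(percent_increase(count, baseline)))
```

The two ends of the range round to 38 and 49, matching the figures in the abstract.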

Use of genomic DNA control features and predicted operon structure in microarray data analysis: ArrayLeaRNA – a Bayesian approach

Author: A Dagkessamanskaia
C Lanczos
C Pin
Carmen Pin
EJ Alm
GK Smyth
I Lonnstedt
JL DeRisi
JP Townsend
K Holmes
M Abramowitz
MA Newton
Mark Reuter
MF Anjum
MK Kerr
ML Mohedano
MN Price
P Baldi
P Luu
PW Mielke
R Gottardo
RD Wolfinger
RJ Fox
S Eriksson
W Cleveland
Publication venue: BioMed Central
Publication date: 01/01/2007

Background: Microarrays are widely used for the study of gene expression; however, deciding whether observed differences in expression are significant remains a challenge. Results: A computing tool (ArrayLeaRNA) has been developed for gene expression analysis. It implements a Bayesian approach which is based on the Gumbel distribution and uses printed genomic DNA control features for normalization and for estimation of the parameters of the Bayesian model, together with prior knowledge from predicted operon structure. The method is compared with two other approaches: the classical LOWESS normalization followed by a two-fold cut-off criterion, and the OpWise method (Price, et al. 2006. BMC Bioinformatics. 7, 19), a published Bayesian approach also using predicted operon structure. The three methods were compared on experimental datasets with prior knowledge of gene expression. With ArrayLeaRNA, data normalization is carried out according to the genomic features, which reflect the results of equally transcribed genes; the statistical significance of the difference in expression is likewise based on the variability of the equally transcribed genes. The operon information helps the classification of genes with low-confidence measurements. ArrayLeaRNA is implemented in Visual Basic and freely available as an Excel add-in at http://www.ifr.ac.uk/safety/ArrayLeaRNA/. Conclusion: We have introduced a novel Bayesian model and demonstrated that it is a robust method for analysing microarray expression profiles. ArrayLeaRNA showed a considerable improvement in data normalization, in the estimation of the experimental variability intrinsic to each hybridization and in the establishment of a clear boundary between non-changing and differentially expressed genes.
The method is applicable to data derived from hybridizations of labelled cDNA samples as well as from hybridizations of labelled cDNA with genomic DNA and can be used for the analysis of datasets where differentially regulated genes predominate.
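
For comparison, the classical baseline named in this abstract, a two-fold cut-off applied after normalization, can be sketched in a few lines. This is only the baseline rule, not ArrayLeaRNA's Bayesian model. The sketch assumes the ratios have already been normalized (the LOWESS step is not shown), and the gene names and ratios are hypothetical.

```python
import math


def two_fold_calls(normalized_ratios):
    """Label each gene by a two-fold cut-off on its expression ratio.

    A ratio of 2 or more (log2 >= 1) is called "up", a ratio of 0.5 or
    less (log2 <= -1) is called "down", anything else "unchanged".
    """
    calls = {}
    for gene, ratio in normalized_ratios.items():
        log2_ratio = math.log2(ratio)
        if log2_ratio >= 1.0:
            calls[gene] = "up"
        elif log2_ratio <= -1.0:
            calls[gene] = "down"
        else:
            calls[gene] = "unchanged"
    return calls


# Hypothetical normalized sample/control ratios.
print(two_fold_calls({"geneA": 2.5, "geneB": 0.4, "geneC": 1.3}))
```

A fixed threshold like this ignores how variable each measurement is, which is the limitation that variability-based approaches such as the one in the abstract aim to address.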

Using comparative genomic hybridization to survey genomic sequence divergence across species: a proof-of-concept from Drosophila

Author: Hofmann Hans A
Jones Albyn
Kulathinal Rob J
Machado Heather E
Renn Suzy CP
Soneji Kosha
Publication venue: BioMed Central
Publication date: 01/01/2010

Background: Genome-wide analysis of sequence divergence among species offers profound insights into the evolutionary processes that shape lineages. When full-genome sequencing is not feasible for a broad comparative study, we propose the use of array-based comparative genomic hybridization (aCGH) in order to identify orthologous genes with high sequence divergence. Here we discuss experimental design, statistical power, success rate, sources of variation and potential confounding factors. We used a spotted PCR product microarray platform from Drosophila melanogaster to assess sequence divergence on a gene-by-gene basis in three fully sequenced heterologous species (D. sechellia, D. simulans, and D. yakuba). Because complete genome assemblies are available for these species this study presents a powerful test for the use of aCGH as a tool to measure sequence divergence. Results: We found a consistent and linear relationship between hybridization ratio and sequence divergence of the sample to the platform species. At higher levels of sequence divergence (< 92% sequence identity to D. melanogaster) ~84% of features had significantly less hybridization to the array in the heterologous species than the platform species, and thus could be identified as "diverged". At lower levels of divergence (≥ 97% identity), only 13% of genes were identified as diverged. While ~40% of the variation in hybridization ratio can be accounted for by variation in sequence identity of the heterologous sample relative to D. melanogaster, other individual characteristics of the DNA sequences, such as GC content, also contribute to variation in hybridization ratio, as does technical variation.
Conclusions: Here we demonstrate that aCGH can accurately be used as a proxy to estimate genome-wide divergence, thus providing an efficient way to evaluate how evolutionary processes and genomic architecture can shape species diversity in non-model systems. Given the increased number of species for which microarray platforms are available, comparative studies can be conducted for many interesting lineages in order to identify highly diverged genes that may be the target of natural selection.
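
The statement that about 40% of the variation in hybridization ratio is accounted for by sequence identity reads naturally as the R² of a linear fit, although the abstract does not name the statistic. Under that assumption, the sketch below shows how such a figure is computed with ordinary least squares. The data points are invented for illustration.

```python
def linear_fit(x, y):
    """Ordinary least-squares line y = intercept + slope * x,
    plus R^2, the fraction of variance in y explained by x.
    """
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    ss_total = sum((yi - mean_y) ** 2 for yi in y)
    ss_resid = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    r_squared = 1.0 - ss_resid / ss_total
    return slope, intercept, r_squared


# Hypothetical per-gene values: fraction sequence identity to the platform
# species, and log2 hybridization ratio (heterologous / platform).
identity = [0.88, 0.90, 0.93, 0.95, 0.97, 0.99]
log_ratio = [-1.1, -0.4, -0.9, -0.2, -0.5, 0.0]
slope, intercept, r_squared = linear_fit(identity, log_ratio)
print(f"slope={slope:.2f} intercept={intercept:.2f} R^2={r_squared:.2f}")
```

An R² near 0.4 would mean the remaining 60% of the variation comes from other sources, such as the GC content and technical variation mentioned in the abstract.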

Boston University Institutional Repository (OpenBU)