4,853 research outputs found

    A multivariate prediction model for microarray cross-hybridization

    Get PDF
    BACKGROUND: Expression microarray analysis is one of the most popular molecular diagnostic techniques in the post-genomic era. However, this technique faces the fundamental problem of potential cross-hybridization. This is a pervasive problem for both oligonucleotide and cDNA microarrays; it is considered particularly problematic for the latter. No comprehensive multivariate predictive modeling has been performed to understand how multiple variables contribute to (cross-) hybridization. RESULTS: We propose a systematic search strategy using multiple multivariate models [multiple linear regressions, regression trees, and artificial neural network analyses (ANNs)] to select an effective set of predictors for hybridization. We validate this approach on a set of DNA microarrays with cytochrome p450 family genes. The performance of our multiple multivariate models is compared with that of a recently proposed third-order polynomial regression method that uses percent identity as the sole predictor. All multivariate models agree that the 'most contiguous base pairs between probe and target sequences,' rather than percent identity, is the best univariate predictor. The predictive power is improved by inclusion of additional nonlinear effects, in particular target GC content, when regression trees or ANNs are used. CONCLUSION: A systematic multivariate approach is provided to assess the importance of multiple sequence features for hybridization and of relationships among these features. This approach can easily be applied to larger datasets. This will allow future developments of generalized hybridization models that will be able to correct for false-positive cross-hybridization signals in expression experiments

    A multivariate prediction model for microarray cross-hybridization

    Get PDF
    © 2006 Chen et al; licensee BioMed Central Ltd.The electronic version of this article is the complete one and can be found online at: http://www.biomedcentral.com/1471-2105/7/101doi:10.1186/1471-2105-7-101Background: Expression microarray analysis is one of the most popular molecular diagnostic techniques in the post-genomic era. However, this technique faces the fundamental problem of potential cross-hybridization. This is a pervasive problem for both oligonucleotide and cDNA microarrays; it is considered particularly problematic for the latter. No comprehensive multivariate predictive modeling has been performed to understand how multiple variables contribute to (cross-) hybridization. Results: We propose a systematic search strategy using multiple multivariate models [multiple linear regressions, regression trees, and artificial neural network analyses (ANNs)] to select an effective set of predictors for hybridization. We validate this approach on a set of DNA microarrays with cytochrome p450 family genes. The performance of our multiple multivariate models is compared with that of a recently proposed third-order polynomial regression method that uses percent identity as the sole predictor. All multivariate models agree that the 'most contiguous base pairs between probe and target sequences,' rather than percent identity, is the best univariate predictor. The predictive power is improved by inclusion of additional nonlinear effects, in particular target GC content, when regression trees or ANNs are used. Conclusion: A systematic multivariate approach is provided to assess the importance of multiple sequence features for hybridization and of relationships among these features. This approach can easily be applied to larger datasets. This will allow future developments of generalized hybridization models that will be able to correct for false-positive cross-hybridization signals in expression experiments

    Diverse correlation structures in gene expression data and their utility in improving statistical inference

    Full text link
    It is well known that correlations in microarray data represent a serious nuisance deteriorating the performance of gene selection procedures. This paper is intended to demonstrate that the correlation structure of microarray data provides a rich source of useful information. We discuss distinct correlation substructures revealed in microarray gene expression data by an appropriate ordering of genes. These substructures include stochastic proportionality of expression signals in a large percentage of all gene pairs, negative correlations hidden in ordered gene triples, and a long sequence of weakly dependent random variables associated with ordered pairs of genes. The reported striking regularities are of general biological interest and they also have far-reaching implications for theory and practice of statistical methods of microarray data analysis. We illustrate the latter point with a method for testing differential expression of nonoverlapping gene pairs. While designed for testing a different null hypothesis, this method provides an order of magnitude more accurate control of type 1 error rate compared to conventional methods of individual gene expression profiling. In addition, this method is robust to the technical noise. Quantitative inference of the correlation structure has the potential to extend the analysis of microarray data far beyond currently practiced methods.Comment: Published in at http://dx.doi.org/10.1214/07-AOAS120 the Annals of Applied Statistics (http://www.imstat.org/aoas/) by the Institute of Mathematical Statistics (http://www.imstat.org

    Multivariate classification of gene expression microarray data

    Get PDF
    L'expressiódels gens obtinguts de l'anàliside microarrays s'utilitza en molts casos, per classificar les cèllules. En aquestatesi, unaversióprobabilística del mètodeDiscriminant Partial Least Squares (p-DPLS)s'utilitza per classificar les mostres de les expressions delsseus gens. p-DPLS esbasa en la regla de Bayes de la probabilitat a posteriori. Aquestsclassificadorssónforaçats a classficarsempre.Per superaraquestalimitaciós'haimplementatl'opció de rebuig.Aquestaopciópermetrebutjarlesmostresamb alt riscd'errors de classificació (és a dir, mostresambigüesi outliers).Aquestaopció de rebuigcombinacriterisbasats en els residuals x, el leverage ielsvalorspredits. A més,esdesenvolupa un mètode de selecció de variables per triarels gens mésrellevants, jaque la majoriadels gens analitzatsamb un microarraysónirrellevants per al propòsit particular de classificacióI podenconfondre el classificador. Finalment, el DPLSs'estenen a la classificació multi-classemitjançant la combinació de PLS ambl'anàlisidiscriminant lineal

    Open-target sparse sensing of biological agents using DNA microarray

    Get PDF
    Background Current biosensors are designed to target and react to specific nucleic acid sequences or structural epitopes. These 'target-specific' platforms require creation of new physical capture reagents when new organisms are targeted. An 'open-target' approach to DNA microarray biosensing is proposed and substantiated using laboratory generated data. The microarray consisted of 12,900 25 bp oligonucleotide capture probes derived from a statistical model trained on randomly selected genomic segments of pathogenic prokaryotic organisms. Open-target detection of organisms was accomplished using a reference library of hybridization patterns for three test organisms whose DNA sequences were not included in the design of the microarray probes. Results A multivariate mathematical model based on the partial least squares regression (PLSR) was developed to detect the presence of three test organisms in mixed samples. When all 12,900 probes were used, the model correctly detected the signature of three test organisms in all mixed samples (mean(R2)) = 0.76, CI = 0.95), with a 6% false positive rate. A sampling algorithm was then developed to sparsely sample the probe space for a minimal number of probes required to capture the hybridization imprints of the test organisms. The PLSR detection model was capable of correctly identifying the presence of the three test organisms in all mixed samples using only 47 probes (mean(R2)) = 0.77, CI = 0.95) with nearly 100% specificity. Conclusions We conceived an 'open-target' approach to biosensing, and hypothesized that a relatively small, non-specifically designed, DNA microarray is capable of identifying the presence of multiple organisms in mixed samples. Coupled with a mathematical model applied to laboratory generated data, and sparse sampling of capture probes, the prototype microarray platform was able to capture the signature of each organism in all mixed samples with high sensitivity and specificity. It was demonstrated that this new approach to biosensing closely follows the principles of sparse sensing.Mitre Corporatio

    Computational Models for Transplant Biomarker Discovery.

    Get PDF
    Translational medicine offers a rich promise for improved diagnostics and drug discovery for biomedical research in the field of transplantation, where continued unmet diagnostic and therapeutic needs persist. Current advent of genomics and proteomics profiling called "omics" provides new resources to develop novel biomarkers for clinical routine. Establishing such a marker system heavily depends on appropriate applications of computational algorithms and software, which are basically based on mathematical theories and models. Understanding these theories would help to apply appropriate algorithms to ensure biomarker systems successful. Here, we review the key advances in theories and mathematical models relevant to transplant biomarker developments. Advantages and limitations inherent inside these models are discussed. The principles of key -computational approaches for selecting efficiently the best subset of biomarkers from high--dimensional omics data are highlighted. Prediction models are also introduced, and the integration of multi-microarray data is also discussed. Appreciating these key advances would help to accelerate the development of clinically reliable biomarker systems

    DNA polymerase B deficiency is linked to aggressive breast cancer: a comprehensive analysis of gene copy number, mRNA and protein expression in multiple cohorts

    Get PDF
    Short arm of chromosome 8 is a hot spot for chromosomal breaks, losses and amplifications in breast cancer. Although such genetic changes may have phenotypic consequences, the identity of candidate gene(s) remains to be clearly defined. Pol β gene is localized to chromosome 8p12 - p11 and encodes a key DNA base excision repair protein. Pol β may be a tumour suppressor and involved in breast cancer pathogenesis. We conducted the first and the largest study to comprehensively evaluate pol β in breast cancer. We investigated pol β gene copy number changes in two cohorts (n=128 & n=1952), pol β mRNA expression in two cohorts (n=249 & n=1952) and pol β protein expression in two cohorts (n=1406 & n=252). Artificial neural network analysis for pol β interacting genes was performed in 249 tumours. For mechanistic insights, pol β gene copy number changes, mRNA and protein levels were investigated together in 1 28 tumours and validated in 1952 tumours. Low pol β mRNA expression as well as low pol β protein expression was associated high grade, lymph node positivity, pleomorphism, triple negative, basal - like phenotypes and poor survival (ps<0.001). In oestrogen receptor (ER) positive sub - group that received tamoxifen, low pol β protein remains associated with aggressive phenotype and poor survival (ps<0.001). Artificial neural network analysis revealed ER as a top pol β interacting gene. Mechanistically, there was strong positive correlation between pol β gene copy number changes and pol β mRNA expression (p<0.0000001) and between pol β mRNA and pol β protein expression (p<0.0000001). This is the first study to provide evidence that pol β deficiency is linked to aggressive breast cancer and may have prognostic and predictive significance in patients
    corecore