    Errors in CGAP xProfiler and cDNA DGED: the importance of library parsing and gene selection algorithms

    <p>Abstract</p> <p>Background</p> <p>The Cancer Genome Anatomy Project (CGAP) xProfiler and cDNA Digital Gene Expression Displayer (DGED) were made available to the scientific community over a decade ago and have since been widely used to find genes that are differentially expressed between cancer and normal tissues. The tissue types are usually chosen according to the ontology hierarchy developed by NCBI. The xProfiler uses an internally available flat-file database to determine the presence or absence of genes in the chosen libraries, while cDNA DGED uses the publicly available UniGene Expression and Gene relational databases to count the sequences found for each gene in the selected libraries.</p> <p>Results</p> <p>We discovered that the CGAP approach often includes libraries from dependent or irrelevant tissues: on average, one third of the selected libraries were incorrect, and for some tissue searches no correct libraries were selected at all. We also discovered that the CGAP approach reported genes from outside the selected libraries and could omit genes found within them. Other errors include incorrect estimation of the significance values and inaccurate settings for the library size cut-off values. We advocate a revised approach to finding libraries associated with tissues, in which libraries from dependent or irrelevant tissues are not included in the final library pool. We also revised the method for determining the presence or absence of a gene by searching the UniGene relational database, revised the calculation of statistical significance and corrected the library cut-off filter.</p> <p>Conclusion</p> <p>Our results justify re-evaluation of all previously reported results where NCBI CGAP expression data and tools were used.</p>
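The abstract does not spell out the revised presence/absence rule. As a minimal sketch only, assuming per-library EST counts for each UniGene cluster have already been retrieved (the dictionary layout, the function names and the cut-off value are hypothetical, not the authors' implementation), a library-size cut-off followed by a presence test restricted to the selected pool could look like this:

```python
from typing import Dict, List, Set

# Hypothetical layout: library ID -> {UniGene cluster ID -> EST count}
Counts = Dict[str, Dict[str, int]]


def library_size(counts: Dict[str, int]) -> int:
    """Total number of ESTs in one library."""
    return sum(counts.values())


def select_libraries(libs: Counts, min_size: int) -> List[str]:
    """Keep libraries whose total EST count reaches the (assumed) size cut-off."""
    return sorted(lib for lib, c in libs.items() if library_size(c) >= min_size)


def genes_present(libs: Counts, selected: List[str]) -> Set[str]:
    """A cluster counts as present if at least one EST is seen in a selected library.

    Clusters are collected only from the selected libraries, so nothing
    from outside the pool can be reported.
    """
    present: Set[str] = set()
    for lib in selected:
        present.update(g for g, n in libs[lib].items() if n > 0)
    return present


if __name__ == "__main__":
    libs = {
        "libA": {"Hs.1": 3, "Hs.2": 0, "Hs.3": 1},
        "libB": {"Hs.2": 5, "Hs.4": 2},
        "libC": {"Hs.5": 1},  # too small, dropped by the cut-off
    }
    chosen = select_libraries(libs, min_size=4)
    print(chosen)                               # ['libA', 'libB']
    print(sorted(genes_present(libs, chosen)))  # ['Hs.1', 'Hs.2', 'Hs.3', 'Hs.4']
```

Because clusters are gathered only from libraries that pass the cut-off, a filter of this kind cannot report genes from outside the selected pool, which is one of the CGAP problems described in the abstract.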

    Correlation of the EST expression matrix with individual EST libraries from related tissues.

    <p>Pearson product-moment correlation coefficients (vertical axes) calculated for each of the individual EST libraries and the EST expression matrix (Supplementary <a href="http://www.plosone.org/article/info:doi/10.1371/journal.pone.0032966#pone.0032966.s005" target="_blank">Dataset S4</a>). <b>A:</b> Brain EST libraries; these include one cerebellum and one cerebrum EST library. The presumed mixed-tissue brain library showing positive correlation with the pituitary gland is “NIH_MGC_181”. <b>B:</b> Peripheral nervous system libraries showing a degree of positive correlation with brain libraries. <b>C:</b> Heart libraries showing a degree of positive correlation with muscle libraries. <b>D:</b> Muscle libraries showing a degree of positive correlation with heart libraries. See Supplementary <a href="http://www.plosone.org/article/info:doi/10.1371/journal.pone.0032966#pone.0032966.s006" target="_blank">Dataset S5</a> for the libraries' IDs.</p>
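The captions rely on Pearson product-moment correlation between one library and each tissue column of the expression matrix. A minimal sketch, assuming the library and every column are count vectors over the same UniGene clusters in the same order (the function names, dictionary layout and toy numbers are illustrative):

```python
import math
from typing import Dict, List, Optional


def pearson(x: List[float], y: List[float]) -> Optional[float]:
    """Pearson product-moment correlation; None if either vector is constant."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("vectors must have equal length of at least 2")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return None  # r is undefined when a vector has zero variance
    return sxy / math.sqrt(sxx * syy)


def correlate_library(library: List[float],
                      matrix: Dict[str, List[float]]) -> Dict[str, Optional[float]]:
    """Correlate one library with every tissue column of an expression matrix.

    The library and each column must list the same UniGene clusters in the
    same order.
    """
    return {tissue: pearson(library, column) for tissue, column in matrix.items()}


if __name__ == "__main__":
    # Toy matrix with three clusters and two tissue columns (made-up numbers).
    matrix = {"heart": [10.0, 2.0, 0.0], "testis": [0.0, 1.0, 9.0]}
    print(correlate_library([8.0, 3.0, 1.0], matrix))
```

Pearson's r is invariant to positive rescaling, so if a tissue's column were built from a single library (an assumption here), correlating that library with its own column would give exactly 1.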

    Correlation of the EST matrix with individual libraries from matching tissues showing no inter-tissue correlation.

    <p>Pearson product-moment correlation coefficients (vertical axes) calculated for each of the individual EST libraries and the EST expression matrix (Supplementary <a href="http://www.plosone.org/article/info:doi/10.1371/journal.pone.0032966#pone.0032966.s005" target="_blank">Dataset S4</a>). <b>A:</b> Placental libraries. <b>B:</b> Lung libraries. <b>C:</b> Pancreatic libraries. <b>D:</b> Retinal libraries. <b>E:</b> Testis libraries. See Supplementary <a href="http://www.plosone.org/article/info:doi/10.1371/journal.pone.0032966#pone.0032966.s006" target="_blank">Dataset S5</a> for the libraries' IDs.</p>

    Intra-tissue and inter-tissue correlations.

    <p>Correlation coefficients calculated for all of the 113 EST libraries (Supplementary <a href="http://www.plosone.org/article/info:doi/10.1371/journal.pone.0032966#pone.0032966.s006" target="_blank">Dataset S5</a>) against our EST expression matrix (Supplementary <a href="http://www.plosone.org/article/info:doi/10.1371/journal.pone.0032966#pone.0032966.s005" target="_blank">Dataset S4</a>). The data also include the tissues detailed previously in <a href="http://www.plosone.org/article/info:doi/10.1371/journal.pone.0032966#pone-0032966-g001" target="_blank">Figures 1</a>–<a href="http://www.plosone.org/article/info:doi/10.1371/journal.pone.0032966#pone-0032966-g003" target="_blank">3</a>. <b>A:</b> Positive correlations between all expected matching libraries, e.g. all individual “Adipose” libraries vs. the “Adipose” expression matrix (Supplementary <a href="http://www.plosone.org/article/info:doi/10.1371/journal.pone.0032966#pone.0032966.s005" target="_blank">Dataset S4</a>), etc. A correlation value of “1” is shown for tissues where only one EST library was available. <b>B:</b> Correlations for all expected non-matching libraries, e.g. all available “Adipose” libraries vs. all but the “Adipose” expression arrays from our EST matrix (Supplementary <a href="http://www.plosone.org/article/info:doi/10.1371/journal.pone.0032966#pone.0032966.s005" target="_blank">Dataset S4</a>), etc. The presumed mixed-tissue brain library “NIH_MGC_181” was excluded from the calculations. <b>C:</b> Correlations for all expected related tissues, e.g. all available individual “Brain” libraries vs. the “Peripheral nervous system” expression matrix, etc. <b>D:</b> All expected positive correlations from all matching libraries, as in panel A (left box plot); correlations from all related tissues, as in panel C (middle box plot); and all expected correlations from non-matching tissues, as in panel B (right box plot). In all panels the boxes are drawn from the first to the third quartile. The plots also show the minimum, median (thick line) and maximum correlation values recorded.</p>
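The box plots summarise each group of correlation coefficients by its minimum, first quartile, median, third quartile and maximum. A minimal sketch of that five-number summary per relationship category, assuming the coefficients have already been grouped into lists (the category labels and function names are illustrative). It uses Python's `statistics.quantiles` with `method="inclusive"`; plotting packages may interpolate quartiles slightly differently.

```python
import statistics
from typing import Dict, List, Tuple

# Illustrative category labels mirroring panels A-C of the figure.
CATEGORIES = ("matching", "related", "non-matching")

Summary = Tuple[float, float, float, float, float]


def five_number_summary(values: List[float]) -> Summary:
    """Minimum, first quartile, median, third quartile and maximum."""
    if not values:
        raise ValueError("need at least one value")
    data = sorted(values)
    if len(data) == 1:
        v = data[0]
        return (v, v, v, v, v)  # a single coefficient collapses the box
    q1, median, q3 = statistics.quantiles(data, n=4, method="inclusive")
    return (data[0], q1, median, q3, data[-1])


def summarise(correlations: Dict[str, List[float]]) -> Dict[str, Summary]:
    """Summarise coefficients grouped by library/tissue relationship.

    Categories that are missing or empty are skipped.
    """
    return {cat: five_number_summary(correlations[cat])
            for cat in CATEGORIES if correlations.get(cat)}
```

For example, `summarise({"matching": [0.8, 0.9, 0.95], "non-matching": [-0.1, 0.05]})` returns one summary tuple for each of the two non-empty groups, which map directly onto the whiskers, box edges and median line of a box plot.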

    Correlation of the EST expression matrix with tissues for which one or two libraries were available.

    <p>Pearson product-moment correlation coefficients (vertical axes) calculated for each of the individual EST libraries and the EST expression matrix (Supplementary <a href="http://www.plosone.org/article/info:doi/10.1371/journal.pone.0032966#pone.0032966.s005" target="_blank">Dataset S4</a>). <b>A:</b> “Soares_pineal_gland_N3HPG” library (dark bars), “Pineal gland II” (lighter bars). <b>B:</b> “Small intestine I” EST library. <b>C:</b> “NCI_CGAP_Br7” library from mammary gland. <b>D:</b> “Thyroid” EST library.</p>

    Correlation of the EST expression matrix with normalised EST libraries.

    <p>Pearson correlation coefficients (vertical axes) calculated between the individual normalised EST libraries, two model libraries and the EST expression matrix (Supplementary <a href="http://www.plosone.org/article/info:doi/10.1371/journal.pone.0032966#pone.0032966.s005" target="_blank">Dataset S4</a>). <b>A:</b> Normalised placenta library NIH_MGC_148. <b>B and C:</b> Normalised lung libraries “UI-CF-EC1” and “UI-CF-FN0” respectively. <b>D:</b> Normalised thymus library Soares_thymus_NHFTh. <b>E:</b> Artificial “normalised” EST matrix where all the expression levels are set to “1” (shown in blue). <b>F:</b> Artificial “random” EST matrix where all the expression levels are randomly assigned (shown in red).</p>

    Correlation of the EST matrix with individual libraries from uncharacterised or poorly defined tissue preparations.

    <p>Pearson correlation coefficients (vertical axes) calculated between the individual EST libraries and the EST expression matrix (Supplementary <a href="http://www.plosone.org/article/info:doi/10.1371/journal.pone.0032966#pone.0032966.s005" target="_blank">Dataset S4</a>). <b>A:</b> “Uncharacterised” library NCI_CGAP_HN5 derived from gum tissue. <b>B:</b> “Uncharacterised” Stratagene endothelial cell 937223 library. <b>C and D:</b> Pooled libraries NIH_MGC_184 and NCI_CGAP_HN20 respectively. <b>E:</b> “Embryo, 8 week I” library. <b>F:</b> “Embryo, 12 week II” library.</p>

    The Use of EST Expression Matrixes for the Quality Control of Gene Expression Data

    EST expression profiling provides an attractive tool for studying differential gene expression, but the origins of cDNA libraries and the quality of EST data are not always known or reported. Libraries may originate from pooled or mixed tissues, and EST clustering, EST counts, library annotations and analysis algorithms may contain errors. Traditional data analysis methods, including research into tissue-specific gene expression, assume EST counts to be correct and libraries to be correctly annotated, which is not always the case. A method capable of assessing the quality of expression data based on that data alone would therefore be invaluable for judging the quality of EST data and their suitability for mRNA expression analysis. Here we report an approach to the selection of a small generic subset of 244 UniGene clusters suitable for identifying the tissue of origin of EST libraries and for quality control of the expression data using EST expression information alone. We created a small expression matrix of UniGene IDs using two rounds of selection followed by two rounds of optimisation. Our selection procedures differ from traditional approaches to finding "tissue-specific" genes, and our matrix yields consistently high positive correlation values for libraries with confirmed tissues of origin. It can be applied to tissue typing and quality control of libraries as small as a few hundred ESTs in total. Furthermore, we can detect correlations between related tissues, e.g. brain and peripheral nervous tissue, or heart and muscle tissue, and identify the tissue of origin for a few libraries of uncharacterised tissue identity. It was possible to confirm tissue identity for some libraries that were derived from cancer tissues or have been normalised. Tissue matching is strongly affected by cancer progression or library normalisation, and our approach may potentially be applied to elucidating the stage of normalisation in normalised libraries or to cancer staging.