
    Biasogram: visualization of confounding technical bias in gene expression data.

    Gene expression profiles of clinical cohorts can be used to identify genes that are correlated with a clinical variable of interest, such as patient outcome or response to a particular drug. However, expression measurements are susceptible to technical bias caused by variation in extraneous factors such as RNA quality and array hybridization conditions. If such technical bias is correlated with the clinical variable of interest, the likelihood of identifying false positive genes is increased. Here we describe a method to visualize an expression matrix as a projection of all genes onto a plane defined by a clinical variable and a technical nuisance variable. The resulting plot indicates the extent to which each gene is correlated with the clinical variable or the technical variable. We demonstrate this method by applying it to three clinical trial microarray data sets, one of which identified genes that may have been driven by a confounding technical variable. This approach can be used as a quality control step to identify data sets that are likely to yield false positive results.
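
The projection described in this abstract can be illustrated with a minimal sketch. It assumes that each gene's coordinates are its Pearson correlations with the clinical and the technical variable; the abstract does not specify the projection, so the published method may differ. The function name `gene_correlations` and all simulated data are hypothetical.

```python
import numpy as np

def gene_correlations(expr, covariate):
    """Pearson correlation of each gene (row of expr) with a per-sample covariate."""
    expr_c = expr - expr.mean(axis=1, keepdims=True)   # centre each gene
    cov_c = covariate - covariate.mean()               # centre the covariate
    num = expr_c @ cov_c
    den = np.linalg.norm(expr_c, axis=1) * np.linalg.norm(cov_c)
    return num / den

# Simulated data (assumption): 200 genes, 40 samples.
rng = np.random.default_rng(0)
n_genes, n_samples = 200, 40
clinical = rng.normal(size=n_samples)
# The technical variable is deliberately correlated with the clinical one.
technical = 0.6 * clinical + rng.normal(size=n_samples)
expr = rng.normal(size=(n_genes, n_samples))
expr[:20] += 1.5 * technical   # the first 20 genes are driven by the technical variable

# One (x, y) point per gene: x = correlation with the clinical variable,
# y = correlation with the technical variable.
coords = np.column_stack([
    gene_correlations(expr, clinical),
    gene_correlations(expr, technical),
])
```

Plotting `coords` as a scatter plot gives one point per gene. In this simulation the technically driven genes sit far from the origin along the technical axis, and because the two variables are correlated they also drift along the clinical axis, which is the false positive risk the abstract describes.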

    Clustering-based approaches to SAGE data mining

    Serial analysis of gene expression (SAGE) is one of the most powerful tools for global gene expression profiling. It has led to several biological discoveries and biomedical applications, such as the prediction of new gene functions and the identification of biomarkers in human cancer research. Clustering techniques have become fundamental approaches in these applications. This paper reviews relevant clustering techniques specifically designed for this type of data. It emphasises current limitations and opportunities in this area for supporting biologically meaningful data mining and visualisation.

    Methodological Deficits in Diagnostic Research Using ‘-Omics’ Technologies: Evaluation of the QUADOMICS Tool and Quality of Recently Published Studies

    Background: QUADOMICS is an adaptation of QUADAS (a quality assessment tool for use in systematic reviews of diagnostic accuracy studies), which takes into account the particular challenges presented by '-omics'-based technologies. Our primary objective was to evaluate the applicability and consistency of QUADOMICS. Subsequently, we used the tool to evaluate and describe the methodological quality of a sample of recently published studies. Methodology/Principal Findings: Forty-five '-omics'-based diagnostic studies were identified by a systematic search of PubMed using suitable MeSH terms ("Genomics", "Sensitivity and specificity", "Diagnosis"). Three investigators independently assessed the quality of the articles using QUADOMICS and met to compare observations and generate a consensus. Consistency and applicability were assessed by comparing each reviewer's original rating with the consensus. Methodological quality was described using the consensus rating. Agreement was above 80% for all three reviewers. Four items presented difficulties with application, mostly due to the lack of a clearly defined gold standard. Methodological quality of our sample was poor; studies met roughly half of the applied criteria (mean ± SD, 54.7 ± 18.4%). Few studies were carried out in a population that mirrored the clinical situation in which the test would be used in practice (6, 13.3%); none described patient recruitment sufficiently; and fewer than half described clinical and physiological factors that might influence the biomarker profile (20, 44.4%). Conclusions: The QUADOMICS tool can consistently be applied to diagnostic '-omics' studies presently published in biomedical journals. A substantial proportion of reports in this research field fail to address design issues that are fundamental to making inferences relevant for patient care.
    © 2010 Parker et al. This work was supported by the Spanish Agency for Health Technology Assessment, Exp PI06/90311, Instituto de Salud Carlos III and CIBER en Epidemiología y Salud Pública (CIBERESP) in Spain. Peer reviewed.

    Comparison of normalisation methods for surface-enhanced laser desorption and ionisation (SELDI) time-of-flight (TOF) mass spectrometry data

    Background: Mass spectrometry for biological data analysis is an active field of research, providing an efficient way of high-throughput proteome screening. A popular variant of mass spectrometry is SELDI, which is often used to measure sample populations with the goal of developing (clinical) classifiers. Unfortunately, not only is the data resulting from such measurements quite noisy, variance between replicate measurements of the same sample can be high as well. Normalisation of spectra can greatly reduce the effect of this technical variance and further improve the quality and interpretability of the data. However, it is unclear which normalisation method yields the most informative result. Results: In this paper, we describe the first systematic comparison of a wide range of normalisation methods, using two objectives that should be met by a good method. These objectives are minimisation of inter-spectra variance and maximisation of signal with respect to class separation. The former is assessed using an estimation of the coefficient of variation, the latter using the classification performance of three types of classifiers on real-world datasets representing two-class diagnostic problems. To obtain a maximally robust evaluation of a normalisation method, both objectives are evaluated over multiple datasets and multiple configurations of baseline correction and peak detection methods. Results are assessed for statistical significance and visualised to reveal the performance of each normalisation method, in particular with respect to using no normalisation. The normalisation methods described have been implemented in the freely available MASDA R-package. Conclusion: In the general case, normalisation of mass spectra is beneficial to the quality of data. The majority of methods we compared performed significantly better than the case in which no normalisation was used. We have shown that normalisation methods that scale spectra by a factor based on the dispersion (e.g., standard deviation) of the data clearly outperform those where a factor based on the central location (e.g., mean) is used. Additional improvements in performance are obtained when these factors are estimated locally, using a sliding window within spectra, instead of globally, over full spectra. The underperforming category of methods using a globally estimated factor based on the central location of the data includes the method used by the majority of SELDI users.
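
The distinction drawn here between global and local scaling factors, and between location-based and dispersion-based factors, can be sketched as follows. This is an illustrative sketch only and not the MASDA implementation: the function names, the window length of 51 points and the reflect padding at the spectrum edges are all assumptions.

```python
import numpy as np

def normalise_global(spectrum, stat=np.std):
    """Divide the whole spectrum by one factor estimated over the full spectrum."""
    return spectrum / stat(spectrum)

def normalise_local(spectrum, window=51, stat=np.std):
    """Divide each point by a factor estimated in a sliding window centred on it.

    `window` must be odd and smaller than the spectrum length; the edges are
    handled by reflecting the spectrum (an assumption of this sketch).
    """
    half = window // 2
    padded = np.pad(spectrum, half, mode="reflect")
    windows = np.lib.stride_tricks.sliding_window_view(padded, window)
    factors = stat(windows, axis=1)   # one factor per original data point
    return spectrum / factors

# Simulated positive-valued spectrum (assumption, for illustration only).
rng = np.random.default_rng(1)
spectrum = rng.gamma(shape=2.0, scale=5.0, size=500)

by_global_mean = normalise_global(spectrum, stat=np.mean)  # central location
by_global_sd = normalise_global(spectrum, stat=np.std)     # dispersion
by_local_sd = normalise_local(spectrum, window=51)         # local dispersion
```

Passing `stat=np.mean` or `stat=np.std` switches between the two categories of factor compared in the abstract. A window with zero dispersion would cause a division by zero, so real data would need a guard that this sketch omits.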

    An integrative multi-platform analysis for discovering biomarkers of osteosarcoma

    Background: SELDI-TOF-MS (Surface Enhanced Laser Desorption/Ionization-Time of Flight-Mass Spectrometry) has become an attractive approach for cancer biomarker discovery due to its ability to resolve low mass proteins and its high-throughput capability. However, the analytes from mass spectrometry are described only by their mass-to-charge ratio (m/z) values without further identification and annotation. To discover potential biomarkers for early diagnosis of osteosarcoma, we designed an integrative workflow combining data sets from both SELDI-TOF-MS and gene microarray analysis. Methods: After extracting the information for potential biomarkers from SELDI data and microarray analysis, their associations were further inferred by link-test to identify biomarkers that could likely be used for diagnosis. Immunoblot analysis was then performed to examine whether the expression of the putative biomarkers was indeed altered in serum from patients with osteosarcoma. Results: Six differentially expressed protein peaks with strong statistical significance were detected by SELDI-TOF-MS. Four of the proteins were up-regulated and two were down-regulated. Microarray analysis showed that, compared with an osteoblastic cell line, the expression of 653 genes changed more than 2-fold in three osteosarcoma cell lines: expression of 310 genes was increased and expression of the other 343 genes was decreased. The two sets of biomarker candidates were combined by the link-test statistics, indicating that 13 genes were potential biomarkers for early diagnosis of osteosarcoma. Among these genes, cytochrome c1 (CYC-1) was selected for further experimental validation. Conclusion: Link-test on datasets from both SELDI-TOF-MS and microarray high-throughput analysis can accelerate the identification of tumor biomarkers. The result confirmed that CYC-1 may be a promising biomarker for early diagnosis of osteosarcoma.

    Modeling SAGE tag formation and its effects on data interpretation within a Bayesian framework

    Background: Serial Analysis of Gene Expression (SAGE) is a high-throughput method for inferring mRNA expression levels from the experimentally generated sequence based tags. Standard analyses of SAGE data, however, ignore the fact that the probability of generating an observable tag varies across genes and between experiments. As a consequence, these analyses result in biased estimators and posterior probability intervals for gene expression levels in the transcriptome. Results: Using the yeast Saccharomyces cerevisiae as an example, we introduce a new Bayesian method of data analysis which is based on a model of SAGE tag formation. Our approach incorporates the variation in the probability of tag formation into the interpretation of SAGE data and allows us to derive exact joint and approximate marginal posterior distributions for the mRNA frequency of genes detectable using SAGE. Our analysis of these distributions indicates that the frequency of a gene in the tag pool is influenced by its mRNA frequency, the cleavage efficiency of the anchoring enzyme (AE), and the number of informative and uninformative AE cleavage sites within its mRNA. Conclusion: With a mechanistic, model based approach for SAGE data analysis, we find that inter-genic variation in SAGE tag formation is large. However, this variation can be estimated and, importantly, accounted for using the methods we develop here. As a result, SAGE based estimates of mRNA frequencies can be adjusted to remove the bias introduced by the SAGE tag formation process.

    Normalization in MALDI-TOF imaging datasets of proteins: practical considerations

    Normalization is critically important for the proper interpretation of matrix-assisted laser desorption/ionization (MALDI) imaging datasets. The effects of the commonly used normalization techniques based on total ion count (TIC) or vector norm normalization are significant, and they are frequently beneficial. In certain cases, however, these normalization algorithms may produce misleading results and possibly lead to wrong conclusions, e.g., regarding potential biomarker distributions. This is typical for tissues in which signals of prominent abundance are present in confined areas, such as insulin in the pancreas or β-amyloid peptides in the brain. In this work, we investigated whether normalization can be improved if dominant signals are excluded from the calculation. Because manual interaction with the data (e.g., defining the abundant signals) is not desired for routine analysis, we investigated two alternatives: normalization on the spectral noise level or on the median of signal intensities in the spectrum. Normalization on the median and the noise level was found to be significantly more robust against artifact generation compared to normalization on the TIC. Therefore, we propose to include these normalization methods in the standard “toolbox” of MALDI imaging for reliable results under conditions of automation.
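
The artifact described in this abstract can be reproduced with a toy example. The sketch below assumes two spectra with identical background intensities, one of which also contains a single dominant peak; the intensity values and function names are hypothetical. Noise-level normalization is not shown, because the abstract does not say how the noise level is estimated.

```python
import numpy as np

def normalise_tic(spectrum):
    """Divide by the total ion count (the sum of all intensities)."""
    return spectrum / spectrum.sum()

def normalise_median(spectrum):
    """Divide by the median intensity, which a single dominant peak barely moves."""
    return spectrum / np.median(spectrum)

# Two pixels with the same background signal (assumed values).
background = np.full(100, 10.0)
with_peak = background.copy()
with_peak[50] = 5000.0   # one dominant, spatially confined signal

# Ratio of the same background channel between the two pixels after
# normalization; a value of 1.0 means the background is left undistorted.
ratio_tic = normalise_tic(with_peak)[0] / normalise_tic(background)[0]
ratio_median = normalise_median(with_peak)[0] / normalise_median(background)[0]
```

In this toy case TIC normalization shrinks the background of the peak-containing pixel to roughly one sixth of its value in the other pixel, although the underlying signal is identical. Median normalization leaves the ratio at 1.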

    Changes in the serum proteome associated with the development of hepatocellular carcinoma in hepatitis C-related cirrhosis

    Early diagnosis of hepatocellular carcinoma (HCC) is the key to the delivery of effective therapies. The conventional serological diagnostic test, estimation of serum alpha-fetoprotein (AFP), lacks both sensitivity and specificity as a screening tool, and improved tests are needed to complement ultrasound scanning, the major modality for surveillance of groups at high risk of HCC. We have analysed the serum proteome of 182 patients with hepatitis C-induced liver cirrhosis (77 with HCC) by surface-enhanced laser desorption/ionisation time-of-flight mass spectrometry (SELDI). The patients were split into a training set (84 non-HCC, 60 HCC) and a ‘blind’ test set (21 non-HCC, 17 HCC). Neural networks developed on the training set were able to classify the blind test set with 94% sensitivity (95% CI 73–99%) and 86% specificity (95% CI 65–95%). Two of the SELDI peaks (23/23.5 kDa) were elevated by an average of 50% in the serum of HCC patients (P<0.001) and were identified as κ and λ immunoglobulin light chains. This approach may permit identification of several individual proteins, which, in combination, may offer a novel way to diagnose HCC.
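
The reported sensitivity and specificity can be related to the test set sizes with a short calculation. The sketch below rests on two assumptions that the abstract does not state: that the percentages correspond to 16 of 17 HCC cases and 18 of 21 non-HCC cases classified correctly, and that a Wilson score interval is a reasonable choice of confidence interval. The function name `wilson_interval` is hypothetical.

```python
from math import sqrt
from statistics import NormalDist

def wilson_interval(successes, n, confidence=0.95):
    """Wilson score confidence interval for a binomial proportion."""
    z = NormalDist().inv_cdf(0.5 + confidence / 2.0)   # about 1.96 for 95%
    p = successes / n
    denom = 1.0 + z * z / n
    centre = (p + z * z / (2.0 * n)) / denom
    half = z * sqrt(p * (1.0 - p) / n + z * z / (4.0 * n * n)) / denom
    return centre - half, centre + half

# Assumed counts, inferred from the reported percentages and test set sizes.
sens_lo, sens_hi = wilson_interval(16, 17)   # sensitivity: 16/17 is about 94%
spec_lo, spec_hi = wilson_interval(18, 21)   # specificity: 18/21 is about 86%

print(f"sensitivity 95% CI: {sens_lo:.1%} to {sens_hi:.1%}")
print(f"specificity 95% CI: {spec_lo:.1%} to {spec_hi:.1%}")
```

Under these assumed counts, the Wilson intervals round to the 73–99% and 65–95% ranges quoted in the abstract. This is consistent with that reading of the numbers, but it does not confirm which interval method the authors used.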