10 research outputs found

    Gene expression reliability estimation through cluster-based analysis

    Gene expression fundamentally controls the structure and function of cells and underlies the versatility and adaptability of any organism. Gene expression is measured on images generated by optical inspection of microarray devices, which allow the simultaneous analysis of thousands of genes. The images produced by these devices are used to calculate mRNA expression levels in order to draw diagnostic information related to human disease. Quality measures are mandatory in gene classification and in diagnostic decision-making. However, microarrays are characterized by imperfections due to sample contamination, scratches, precipitation, or imperfect gridding and spot detection. Automatic and efficient quality measurement of microarrays is needed in order to discriminate faulty gene expression levels. In this paper we present a new method for estimating the quality and reliability of the data from a microarray analysis. The efficiency of the proposed approach in terms of gene expression classification has been demonstrated through a supervised clustering analysis performed on a set of three different histological samples related to lymphoma cancer disease.

    Segmentation and intensity estimation for microarray images with saturated pixels

    Background: Microarray image analysis processes scanned digital images of hybridized arrays to produce the input spot-level data for downstream analysis, so it can have a potentially large impact on those and subsequent analyses. Signal saturation is an optical effect that occurs when some pixel values for highly expressed genes or peptides exceed the upper detection threshold of the scanner software (2^16 - 1 = 65,535 for 16-bit images). In practice, spots with a sizable number of saturated pixels are often flagged and discarded. Alternatively, the saturated values are used without adjustments for estimating spot intensities. The resulting expression data tend to be biased downwards and can distort high-level analysis that relies on these data. Hence, it is crucial to effectively correct for signal saturation. Results: We developed a flexible mixture model-based segmentation and spot intensity estimation procedure that accounts for saturated pixels by incorporating a censored component in the mixture model. As demonstrated with biological data and simulation, our method extends the dynamic range of expression data beyond the saturation threshold and is effective in correcting saturation-induced bias when the lost information is not tremendous. We further illustrate the impact of image processing on downstream classification, showing that the proposed method can increase diagnostic accuracy using data from a lymphoma cancer diagnosis study. Conclusions: The presented method adjusts for signal saturation at the segmentation stage that identifies a pixel as part of the foreground, background or other. The cluster membership of a pixel can be altered versus treating saturated values as truly observed. Thus, the resulting spot intensity estimates may be more accurate than those obtained from existing methods that correct for saturation based on already segmented data. As a model-based segmentation method, our procedure is able to identify inner holes, fuzzy edges and blank spots that are common in microarray images. The approach is independent of microarray platform and applicable to both single- and dual-channel microarrays.
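
    The saturation effect described in this abstract can be illustrated with a minimal Python sketch. It is not the authors' censored mixture model; it only shows how clipping at the 16-bit ceiling biases a naive spot mean downwards. The function names and the example pixel values are assumptions made for illustration.

```python
SATURATION_LEVEL = 2**16 - 1  # 65,535, the 16-bit ceiling quoted in the abstract


def clip_to_scanner_range(true_values):
    """Simulate a scanner that cannot report values above the ceiling."""
    return [min(v, SATURATION_LEVEL) for v in true_values]


def saturated_fraction(pixels):
    """Fraction of pixels in a spot that sit at (or above) the ceiling."""
    if not pixels:
        raise ValueError("spot has no pixels")
    n_saturated = sum(1 for p in pixels if p >= SATURATION_LEVEL)
    return n_saturated / len(pixels)


def naive_spot_mean(pixels):
    """Spot intensity as a plain mean, with no saturation correction."""
    return sum(pixels) / len(pixels)


# Hypothetical "true" intensities for one bright spot (illustrative values only).
true_spot = [60000, 64000, 70000, 80000]
observed_spot = clip_to_scanner_range(true_spot)

print(saturated_fraction(observed_spot))   # share of clipped pixels
print(naive_spot_mean(true_spot))          # mean of the unclipped values
print(naive_spot_mean(observed_spot))      # lower, because of clipping
```

    A spot-flagging rule would compare `saturated_fraction` against a cutoff; the abstract does not state one, so none is assumed here.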

    Identification and correction of previously unreported spatial phenomena using raw Illumina BeadArray data

    Background: A key stage for all microarray analyses is the extraction of feature-intensities from an image. If this step goes wrong, then subsequent preprocessing and processing stages will stand little chance of rectifying the matter. Illumina employ random construction of their BeadArrays, making feature-intensity extraction even more important for the Illumina platform than for other technologies. In this paper we show that using raw Illumina data it is possible to identify, control, and perhaps correct for a range of spatial-related phenomena that affect feature-intensity extraction. Results: We note that feature intensities can be unnaturally high when in the proximity of a number of phenomena relating either to the images themselves or to the layout of the beads on an array. Additionally we note that beads neighbour beads of the same type more often than one might expect, which may cause concern in some models of hybridization. We highlight issues in the identification of a bead's location, and in particular how this both affects and is affected by its intensity. Finally we show that beads can be wrongly identified in the image on either a local or array-wide scale, with obvious implications for data quality. Conclusions: The image processing issues identified will often pass unnoticed by an analysis of the standard data returned from an experiment. We detail some simple diagnostics that can be implemented to identify problems of this nature, and outline approaches to correcting for such problems. These approaches require access to the raw data from the arrays, not just the summarized data usually returned, making the acquisition of such raw data highly desirable.

    A robust measure of correlation between two genes on a microarray

    Background: The underlying goal of microarray experiments is to identify gene expression patterns across different experimental conditions. Genes that are contained in a particular pathway or that respond similarly to experimental conditions could be co-expressed and show similar patterns of expression on a microarray. Using any of a variety of clustering methods or gene network analyses we can partition genes of interest into groups, clusters, or modules based on measures of similarity. Typically, Pearson correlation is used to measure distance (or similarity) before implementing a clustering algorithm. Pearson correlation is, however, quite susceptible to outliers, an unfortunate characteristic when dealing with microarray data (well known to be typically quite noisy). Results: We propose a resistant similarity metric based on Tukey's biweight estimate of multivariate scale and location. The resistant metric is simply the correlation obtained from a resistant covariance matrix of scale. We give results which demonstrate that our correlation metric is much more resistant than the Pearson correlation while being more efficient than other nonparametric measures of correlation (e.g., Spearman correlation). Additionally, our method gives a systematic gene flagging procedure which is useful when dealing with large amounts of noisy data. Conclusion: When dealing with microarray data, which are known to be quite noisy, robust methods should be used. Specifically, robust distances, including the biweight correlation, should be used in clustering and gene network analysis.
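
    The outlier sensitivity of Pearson correlation mentioned in this abstract can be seen in a short Python sketch. It does not implement the biweight correlation the authors propose; it only contrasts Pearson with the rank-based Spearman correlation that the abstract names as a nonparametric alternative. The two expression profiles are made-up values for illustration.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def ranks(values):
    """1-based ranks, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


# Hypothetical profiles of two co-expressed genes across eight conditions.
gene_a = [1, 2, 3, 4, 5, 6, 7, 8]
gene_b = [1.1, 2.0, 2.9, 4.2, 5.1, 5.9, 7.0, 8.1]
# Same profile with a single corrupted measurement in the last condition.
gene_b_outlier = gene_b[:-1] + [-40.0]

print(pearson(gene_a, gene_b))           # close to 1
print(pearson(gene_a, gene_b_outlier))   # negative: one outlier flips the sign
print(spearman(gene_a, gene_b_outlier))  # stays positive
```

    A single bad spot is enough to turn a near-perfect Pearson correlation negative here, while the rank-based measure remains positive, which is the motivation the abstract gives for resistant similarity metrics.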

    Robust Microarray Image Processing


    Integrating bioinformatics and physiology to describe genetic effects in complex polygenic diseases

    Type 2 diabetes mellitus (T2DM) results from interaction between genetic and environmental factors. The worldwide prevalence of T2DM is increasing rapidly due to reduction in physical activity, increase in dietary intake, and the aging of the population. This thesis has focused on dissecting the genetic contribution in T2DM using large-scale genomic approaches with a particular emphasis on analysis of gene transcripts in different tissues, predominantly muscle. In paper I, we identified TXNIP as a gene whose expression is powerfully suppressed by insulin yet stimulated by glucose. In healthy individuals, its expression was inversely correlated to total body measures of glucose uptake. Forced expression of TXNIP in cultured adipocytes significantly reduced glucose uptake, while silencing with RNA interference in adipocytes and in skeletal muscle enhanced glucose uptake, confirming that the gene product is also a regulator of glucose uptake. TXNIP expression is consistently elevated in the muscle of pre-diabetics and diabetics, although in a panel of 4,450 Scandinavian individuals, we found no evidence for association between common genetic variation in the TXNIP gene and T2DM. TXNIP regulates both insulin-dependent and insulin-independent pathways of glucose uptake in human skeletal muscle. Combined with recent studies that have implicated TXNIP in pancreatic β-cell glucose toxicity, our data suggest that TXNIP might play a key role in defective glucose homeostasis preceding overt T2DM. In paper II, we investigated molecular mechanisms associated with insulin sensitivity in skeletal muscle by relating global skeletal muscle gene expression to physiological measures of insulin sensitivity. We identified 70 genes positively and 110 genes inversely correlated with insulin sensitivity in human skeletal muscle.
Most notably, genes involved in the mammalian target-of-rapamycin signaling pathway were positively correlated with insulin sensitivity, whereas genes encoding extracellular matrix structural constituents, such as those in the extracellular matrix-receptor, cell communication, and focal adhesion pathways, were inversely correlated with it. More specifically, expression of CPT1B was positively and that of LEO1 inversely correlated with insulin sensitivity, a finding which was replicated in an independent study of 9 non-diabetic men. These data suggest that a high capacity of fat oxidation in mitochondria is reflected by a high expression of CPT1B, which is a marker of insulin sensitivity. In paper III, we investigated molecular mechanisms associated with maximal oxygen uptake (VO2max) and type 1 fibers in human skeletal muscle. We identified 66 genes positively and 83 genes inversely correlated with VO2max and 171 genes positively and 217 genes inversely correlated with percentage of type 1 fibers in human skeletal muscle. Genes involved in oxidative phosphorylation (OXPHOS) showed high expression in individuals with high VO2max, whereas the opposite was not the case in individuals with low VO2max. Instead, genes such as AHNAK and BCL6 were associated with low VO2max. Also, expression of the OXPHOS genes NDUFB5 and ATP5C1 increased with exercise training and decreased with aging. In contrast, expression of AHNAK in skeletal muscle decreased with exercise training and increased with aging. These findings indicate that VO2max closely reflects expression of OXPHOS genes, particularly that of NDUFB5 and ATP5C1 in skeletal muscle, and high expression of these genes suggests good muscle fitness. In contrast, a high expression of AHNAK was associated with a low VO2max and poor muscle fitness.
In paper IV, we combined results from the Diabetes Genetics Initiative (DGI) and the Wellcome Trust Case Control Consortium (WTCCC) genome-wide association (GWA) studies with genome-wide expression profiling in pancreas, adipose tissue, liver, and skeletal muscle in patients with or without T2DM or animal models thereof to identify novel T2DM susceptibility loci. We identified 453 single nucleotide polymorphisms (SNPs) associated with T2DM with P < 0.01 in at least one of the GWA studies and 150 genes that were located in vicinity of these SNPs. Out of these 150 genes, we identified 41 genes differentially expressed using publicly available gene expression profiling data. Most notably, we were able to identify four genes namely IGF2BP2, CDKAL1, TSPAN8, and NOTCH2 for which SNPs located in vicinity of these genes have shown association with T2DM in different populations. In addition, we identified a SNP (rs27582) in the CAST gene which was associated with future risk of T2DM (odds ratio (OR) = 1.10, 95% CI: 1.00-1.20, P < 0.05) in a prospective study of 16,061 Swedish individuals followed for more than 25 years; this association was stronger in lean individuals (OR = 1.19, 95% CI: 1.03-1.36, P = 0.024). Moreover in the Botnia Prospective Study (BPS) involving 2,770 individuals followed for more than 7 years, carriers of the A-allele were more insulin resistant than carriers of the G-allele as indicated by higher fasting insulin concentrations (regression coefficient (β) = 0.048, P = 0.017) and higher HOMA-IR index (β = 0.044, P = 0.025) as well as lower insulin sensitivity index during OGTT (β = -0.039, P = 0.039) at follow-up. In conclusion, using gene expression in different tissues from patients with T2DM and animal models is a powerful tool for prioritizing SNPs from GWA studies for replication studies. We thereby identified association of a variant (rs27582) in the CAST gene with T2DM and insulin resistance.

    Combinatorial image analysis of DNA microarray features

    Motivation: DNA and protein microarrays have become an established leading-edge technology for large-scale analysis of gene and protein content and activity. Contact-printed microarrays have emerged as a relatively simple and cost-effective method of choice, but their reliability is especially susceptible to the quality of pixel information obtained from digital scans of spotted features in the microarray image. Results: We address the statistical computation requirements for optimizing data acquisition and processing of digital scans. We consider the use of median filters to reduce noise levels in images and top-hat filters to correct for trends in background values. We also consider, as alternative estimators of spot intensity, discs of fixed radius, proportions of histograms and k-means clustering, either with or without a square-root intensity transformation and background subtraction. We identify, using combinatoric procedures, the filter and estimator parameters that are optimal for achieving consistency among the replicates of a gene on each microarray. Our results, using test data from microarrays of HCMV, indicate that a highly effective approach for improving reliability and quality of microarray data is to apply a 21 by 21 top-hat filter, then estimate spot intensity as the mean of the largest 20% of pixel values in the target region, after a square-root transformation, corrected for background by subtracting the mean of the smallest 70% of pixel values.
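
    The estimator recommended at the end of this abstract can be sketched in a few lines of Python. This is a minimal reading of the description, not the authors' code: the 21 by 21 top-hat filter is assumed to have been applied already and is omitted, the function name and the rounding of the 20% and 70% pixel counts are assumptions, and the pixel values are invented for illustration.

```python
import math


def spot_intensity(pixels, top_frac=0.20, bottom_frac=0.70):
    """Square-root transform, then mean of the largest 20% of pixel values
    minus the mean of the smallest 70% (used as the background estimate).

    Assumes non-negative pixel values from an already filtered target region.
    """
    if not pixels:
        raise ValueError("target region has no pixels")
    values = sorted(math.sqrt(p) for p in pixels)
    n = len(values)
    # How fractional counts are rounded is an assumption; the abstract
    # does not say. At least one pixel is always used on each side.
    n_top = max(1, round(n * top_frac))
    n_bottom = max(1, round(n * bottom_frac))
    foreground = sum(values[-n_top:]) / n_top
    background = sum(values[:n_bottom]) / n_bottom
    return foreground - background


# Hypothetical 10-pixel target region: 7 background pixels, 1 edge, 2 bright.
region = [100] * 7 + [400] + [900] * 2
print(spot_intensity(region))  # 20.0 on the square-root scale
```

    In the example the two brightest pixels give a foreground of 30 and the seven dimmest a background of 10 after the square-root transformation, so the estimate is 20.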
