112 research outputs found

    Improving the scaling normalization for high-density oligonucleotide GeneChip expression microarrays

    BACKGROUND: Normalization is an important step for microarray data analysis to minimize biological and technical variations. Choosing a suitable approach can be critical. The default method in GeneChip expression microarray uses a constant factor, the scaling factor (SF), for every gene on an array. The SF is obtained from a trimmed average signal of the array after excluding the 2% of the probe sets with the highest and the lowest values. RESULTS: Among the 76 U34A GeneChip experiments, the total signals on each array showed 25.8% variation in terms of the coefficient of variation, although all microarrays were hybridized with the same amount of biotin-labeled cRNA. The 2% of the probe sets with the highest signals that were normally excluded from SF calculation accounted for 34% to 54% of the total signals (40.7% ± 4.4%, mean ± sd). In comparison with normalization factors obtained from the median signal or from the mean of the log transformed signal, SF showed the greatest variation. The normalization factors obtained from log transformed signals showed the least variation. CONCLUSIONS: Eliminating 40% of the signal data during SF calculation failed to show any benefit. Normalization factors obtained with log transformed signals performed the best. Thus, it is suggested to use the mean of the logarithm transformed data for normalization, rather than the arithmetic mean of signals in GeneChip gene expression microarrays.
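
    The three normalization factors compared in this abstract can be sketched as follows. This is a minimal illustration, not the authors' code: the function names, the target intensity of 500, and the reading of "2%" as 2% trimmed from each tail are all assumptions made for the example.

```python
import math
import statistics


def trimmed_mean(signals, trim_fraction=0.02):
    """Mean after dropping the lowest and highest `trim_fraction` of values."""
    ordered = sorted(signals)
    k = int(len(ordered) * trim_fraction)  # number of values cut from each tail
    kept = ordered[k:len(ordered) - k]
    return statistics.fmean(kept)


def scaling_factor(signals, target=500.0, trim_fraction=0.02):
    """Trimmed-mean factor; `target` is a hypothetical target intensity."""
    return target / trimmed_mean(signals, trim_fraction)


def median_factor(signals, target=500.0):
    """Factor based on the median signal of the array."""
    return target / statistics.median(signals)


def log_mean_factor(signals, target=500.0):
    """Factor based on the mean of log signals (i.e. the geometric mean)."""
    mean_log = statistics.fmean([math.log(s) for s in signals])
    return target / math.exp(mean_log)


def coefficient_of_variation(values):
    """Sample standard deviation divided by the mean."""
    return statistics.stdev(values) / statistics.fmean(values)
```

    To compare the methods in the spirit of the abstract, one would compute each factor once per array and then call `coefficient_of_variation` on the resulting list of factors for each method. All signals must be positive for the log-based factor.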

    A comparative analysis of transcription factor expression during metazoan embryonic development

    During embryonic development, a complex organism is formed from a single starting cell. These processes of growth and differentiation are driven by large transcriptional changes, which follow the expression and activity of transcription factors (TFs). This study sought to compare TF expression during embryonic development in a diverse group of metazoan animals: representatives of vertebrates (Danio rerio, Xenopus tropicalis), a chordate (Ciona intestinalis) and invertebrate phyla such as insects (Drosophila melanogaster, Anopheles gambiae) and nematodes (Caenorhabditis elegans) were sampled. The different species showed overall very similar TF expression patterns, with TF expression increasing during the initial stages of development. C2H2 zinc finger TFs were over-represented and Homeobox TFs were under-represented in the early stages in all species. We further clustered TFs for each species based on their quantitative temporal expression profiles. This showed very similar TF expression trends in development in vertebrate and insect species. However, analysis of the expression of orthologous pairs between more closely related species showed that expression of most individual TFs is not conserved, following the general model of duplication and diversification. The degree of similarity of TF expression between Xenopus tropicalis and Danio rerio followed the hourglass model, with the greatest similarity occurring during the early tailbud stage in Xenopus tropicalis and the late segmentation stage in Danio rerio. However, for Drosophila melanogaster and Anopheles gambiae there were two periods of high TF transcriptome similarity, one during the Arthropod phylotypic stage at 8-10 hours into Drosophila development and the other later at 16-18 hours into Drosophila development. Comment: ~10 pages, 50 references, 6+3 figures and 5 tables

    QServer: A Biclustering Server for Prediction and Assessment of Co-Expressed Gene Clusters

    BACKGROUND: Biclustering is a powerful technique for identification of co-expressed gene groups under any (unspecified) substantial subset of given experimental conditions, which can be used for elucidation of transcriptionally co-regulated genes. RESULTS: We have previously developed a biclustering algorithm, QUBIC, which can solve more general biclustering problems than previous biclustering algorithms. To fully utilize the analysis power the algorithm provides, we have developed a web server, QServer, for prediction, computational validation and analyses of co-expressed gene clusters. Specifically, the QServer has the following capabilities in addition to biclustering by QUBIC: (i) prediction and assessment of conserved cis regulatory motifs in promoter sequences of the predicted co-expressed genes; (ii) functional enrichment analyses of the predicted co-expressed gene clusters using Gene Ontology (GO) terms, and (iii) visualization capabilities in support of interactive biclustering analyses. QServer supports the biclustering and functional analysis for a wide range of organisms, including human, mouse, Arabidopsis, bacteria and archaea, whose underlying genome database will be continuously updated. CONCLUSION: We believe that QServer provides an easy-to-use and highly effective platform useful for hypothesis formulation and testing related to transcription co-regulation

    Haemogenic endocardium contributes to transient definitive haematopoiesis.

    Haematopoietic cells arise from spatiotemporally restricted domains in the developing embryo. Although studies of non-mammalian animal and in vitro embryonic stem cell models suggest a close relationship among cardiac, endocardial and haematopoietic lineages, it remains unknown whether the mammalian heart tube serves as a haemogenic organ akin to the dorsal aorta. Here we examine the haemogenic activity of the developing endocardium. Mouse heart explants generate myeloid and erythroid colonies in the absence of circulation. Haemogenic activity arises from a subset of endocardial cells in the outflow cushion and atria earlier than in the aorta-gonad-mesonephros region, and is transient and definitive in nature. Interestingly, key cardiac transcription factors, Nkx2-5 and Isl1, are expressed in and required for the haemogenic population of the endocardium. Together, these data suggest that a subset of endocardial/endothelial cells serve as a de novo source for transient definitive haematopoietic progenitors

    Graphical technique for identifying a monotonic variance stabilizing transformation for absolute gene intensity signals

    BACKGROUND: The usefulness of log2 transformation for cDNA microarray data has led to its widespread application to Affymetrix data. For Affymetrix data, where absolute intensities are indicative of number of transcripts, there is a systematic relationship between variance and magnitude of measurements. Application of the log2 transformation expands the scale of genes with low intensities while compressing the scale of genes with higher intensities thus reversing the mean by variance relationship. The usefulness of these transformations needs to be examined. RESULTS: Using an Affymetrix GeneChip® dataset, problems associated with applying the log2 transformation to absolute intensity data are demonstrated. Use of the spread-versus-level plot to identify an appropriate variance stabilizing transformation is presented. For the data presented, the spread-versus-level plot identified a power transformation that successfully stabilized the variance of probe set summaries. CONCLUSION: The spread-versus-level plot is helpful to identify transformations for variance stabilization. This is robust against outliers and avoids assumption of models and maximizations
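
    The idea of reading a power transformation off a spread-versus-level plot can be sketched as below. The abstract does not give the authors' exact procedure, so this follows the textbook rule as an assumption: plot log(spread) against log(level) for each group of replicate measurements, fit a line, and take the power as 1 minus its slope (a power near 0 points to the log transformation). The function names and the choice of median and interquartile range are illustrative.

```python
import math
import statistics


def spread_level_points(groups):
    """Return (log median, log interquartile range) for each group of values."""
    points = []
    for values in groups:
        q1, _, q3 = statistics.quantiles(values, n=4)
        level = statistics.median(values)
        points.append((math.log(level), math.log(q3 - q1)))
    return points


def suggested_power(groups):
    """Least-squares slope of the spread-versus-level points, mapped to 1 - slope."""
    points = spread_level_points(groups)
    mean_x = statistics.fmean([x for x, _ in points])
    mean_y = statistics.fmean([y for _, y in points])
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    slope = sxy / sxx  # needs at least two groups with different levels
    return 1.0 - slope
```

    Each group needs positive values and a nonzero interquartile range. When spread grows in proportion to level the slope is about 1 and the suggested power is about 0; when spread is unrelated to level the suggested power is about 1, meaning no transformation is needed.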

    Exploiting the noise: improving biomarkers with ensembles of data analysis methodologies.

    Background: The advent of personalized medicine requires robust, reproducible biomarkers that indicate which treatment will maximize therapeutic benefit while minimizing side effects and costs. Numerous molecular signatures have been developed over the past decade to fill this need, but their validation and up-take into clinical settings has been poor. Here, we investigate the technical reasons underlying reported failures in biomarker validation for non-small cell lung cancer (NSCLC). Methods: We evaluated two published prognostic multi-gene biomarkers for NSCLC in an independent 442-patient dataset. We then systematically assessed how technical factors influenced validation success. Results: Both biomarkers validated successfully (biomarker #1: hazard ratio (HR) 1.63, 95% confidence interval (CI) 1.21 to 2.19, P = 0.001; biomarker #2: HR 1.42, 95% CI 1.03 to 1.96, P = 0.030). Further, despite being underpowered for stage-specific analyses, both biomarkers successfully stratified stage II patients and biomarker #1 also stratified stage IB patients. We then systematically evaluated reasons for reported validation failures and find they can be directly attributed to technical challenges in data analysis. By examining 24 separate pre-processing techniques we show that minor alterations in pre-processing can change a successful prognostic biomarker (HR 1.85, 95% CI 1.37 to 2.50, P < 0.001) into one indistinguishable from random chance (HR 1.15, 95% CI 0.86 to 1.54, P = 0.348). Finally, we develop a new method, based on ensembles of analysis methodologies, to exploit this technical variability to improve biomarker robustness and to provide an independent confidence metric. Conclusions: Biomarkers comprise a fundamental component of personalized medicine. We first validated two NSCLC prognostic biomarkers in an independent patient cohort. Power analyses demonstrate that even this large, 442-patient cohort is under-powered for stage-specific analyses. We then use these results to discover an unexpected sensitivity of validation to subtle data analysis decisions. Finally, we develop a novel algorithmic approach to exploit this sensitivity to improve biomarker robustness

    Identification of biomarkers for ischemic cardiomyopathy based on microarray data analysis

    Background: The aim of this study was to explore the biomarkers and potential mechanism underlying ischemic cardiomyopathy (ICM). Methods: Using the GSE42955 Affymetrix microarray data accessible from the Gene Expression Omnibus database, the differentially expressed genes between 12 ICM tissue samples and 5 normal controls were identified. To investigate the function changes in the course of disease progression, Gene Ontology (GO) annotation and Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway analysis were performed on the differentially expressed genes, followed by analysis of the protein–protein interaction (PPI) network and modules. Results: A total of 50 up-regulated and 179 down-regulated genes were identified. The biological processes of immune response, response to virus, and cell adhesion molecules (CAMs) were significantly altered by the differentially expressed genes. The PPI network revealed certain hub nodes such as CXCL10, IRF1, STAT1, IFIT2, and IFIT3. Conclusions: Candidate biomarker genes such as CXCL10, IRF1, STAT1, IFIT2, and IFIT3 may be suitable therapeutic targets for ICM. Further study of the CAMs pathway and immune response biological processes will be helpful in understanding the pathogenesis of ICM

    Lymphoid priming in human bone marrow begins before expression of CD10 with upregulation of L-selectin.

    Expression of the cell-surface antigen CD10 has long been used to define the lymphoid commitment of human cells. Here we report a unique lymphoid-primed population in human bone marrow that was generated from hematopoietic stem cells (HSCs) before onset of the expression of CD10 and commitment to the B cell lineage. We identified this subset by high expression of the homing molecule L-selectin (CD62L). CD10(-)CD62L(hi) progenitors had full lymphoid and monocytic potential but lacked erythroid potential. Gene-expression profiling placed the CD10(-)CD62L(hi) population at an intermediate stage of differentiation between HSCs and lineage-negative (Lin(-)) CD34(+)CD10(+) progenitors. CD62L was expressed on immature thymocytes, and its ligands were expressed at the cortico-medullary junction of the thymus, which suggested a possible role for this molecule in homing to the thymus. Our studies identify the earliest stage of lymphoid priming in human bone marrow

    Drug-Induced Regulation of Target Expression

    Drug perturbations of human cells lead to complex responses upon target binding. One of the known mechanisms is a (positive or negative) feedback loop that adjusts the expression level of the respective target protein. To quantify this mechanism systems-wide in an unbiased way, drug-induced differential expression of drug target mRNA was examined in three cell lines using the Connectivity Map. To overcome various biases in this valuable resource, we have developed a computational normalization and scoring procedure that is applicable to gene expression recording upon heterogeneous drug treatments. In 1290 drug-target relations, corresponding to 466 drugs acting on 167 drug targets studied, 8% of the targets are subject to regulation at the mRNA level. We confirmed systematically that in particular G-protein coupled receptors, when serving as known targets, are regulated upon drug treatment. We further newly identified drug-induced differential regulation of Lanosterol 14-alpha demethylase, Endoplasmin, DNA topoisomerase 2-alpha and Calmodulin 1. The feedback regulation in these and other targets is likely to be relevant for the success or failure of the molecular intervention