58 research outputs found

    A single-array preprocessing method for estimating full-resolution raw copy numbers from all Affymetrix genotyping arrays including GenomeWideSNP 5 & 6

    Motivation: High-resolution copy-number (CN) analysis has in recent years gained much attention, not only for the purpose of identifying CN aberrations associated with a certain phenotype, but also for identifying CN polymorphisms. In order for such studies to be successful and cost effective, the statistical methods have to be optimized. We propose a single-array preprocessing method for estimating full-resolution total CNs. It is applicable to all Affymetrix genotyping arrays, including the recent ones that also contain non-polymorphic probes. A reference signal is only needed at the last step when calculating relative CNs. Results: As with our method for earlier generations of arrays, this one controls for allelic crosstalk, probe affinities and PCR fragment-length effects. Additionally, it also corrects for probe sequence effects and co-hybridization of fragments digested by multiple enzymes that takes place on the latest chips. We compare our method with Affymetrix's CN5 method and the dChip method by assessing how well they differentiate between various CN states at the full resolution and various amounts of smoothing. Although CRMA v2 is a single-array method, we observe that it performs as well as or better than alternative methods that use data from all arrays for their preprocessing. This shows that it is possible to do online analysis in large-scale projects where additional arrays are introduced over time. Availability: A bounded-memory implementation that can process any number of arrays is available in the open source R package aroma.affymetrix. Contact: [email protected] Supplementary information: Supplementary data are available at Bioinformatics online.

    Relationship between estrogen receptor α location and gene induction reveals the importance of downstream sites and cofactors

    BACKGROUND: To understand cancer-related modifications to transcriptional programs requires detailed knowledge about the activation of signal-transduction pathways and gene expression programs. To investigate the mechanisms of target gene regulation by human estrogen receptor alpha (hERalpha), we combine extensive location and expression datasets with genomic sequence analysis. In particular, we study the influence of patterns of DNA occupancy by hERalpha on expression phenotypes. RESULTS: We find that strong ChIP-chip sites co-localize with strong hERalpha consensus sites and detect nucleotide bias near hERalpha sites. The localization of ChIP-chip sites relative to annotated genes shows that weak sites are enriched near transcription start sites, while stronger sites show no positional bias. Assessing the relationship between binding configurations and expression phenotypes, we find binding sites downstream of the transcription start site (TSS) to be equally good or better predictors of hERalpha-mediated expression as upstream sites. The study of FOX and SP1 cofactor sites near hERalpha ChIP sites shows that induced genes frequently have FOX or SP1 sites. Finally we integrate these multiple datasets to define a high confidence set of primary hERalpha target genes. CONCLUSION: Our results support the model of long-range interactions of hERalpha with the promoter-bound cofactor SP1 residing at the promoter of hERalpha target genes. FOX motifs co-occur with hERalpha motifs along responsive genes. Importantly we show that the spatial arrangement of sites near the start sites and within the full transcript is important in determining response to estrogen signaling

    Low E2F1 transcript levels are a strong determinant of favorable breast cancer outcome

    INTRODUCTION: We investigated whether mRNA levels of E2F1, a key transcription factor involved in proliferation, differentiation and apoptosis, could be used as a surrogate marker for the determination of breast cancer outcome. METHODS: E2F1 and other proliferation markers were measured by quantitative RT-PCR in 317 primary breast cancer patients from the Stiftung Tumorbank Basel. Correlations to one another as well as to the estrogen receptor and ERBB2 status and clinical outcome were investigated. Results were validated and further compared with expression-based prognostic profiles using The Netherlands Cancer Institute microarray data set reported by Fan and colleagues. RESULTS: E2F1 mRNA expression levels correlated strongly with the expression of other proliferation markers, and low values were mainly found in estrogen receptor-positive and ERBB2-negative phenotypes. Patients with low E2F1-expressing tumors were associated with favorable outcome (hazard ratio = 4.3 (95% confidence interval = 1.8-9.9), P = 0.001). These results were consistent in univariate and multivariate Cox analyses, and were successfully validated in The Netherlands Cancer Institute data set. Furthermore, E2F1 expression levels correlated well with the 70-gene signature displaying the ability of selecting a common subset of patients at good prognosis. Breast cancer patients' outcome was comparably predictable by E2F1 levels, by the 70-gene signature, by the intrinsic subtype gene classification, by the wound response signature and by the recurrence score. CONCLUSION: Assessment of E2F1 at the mRNA level in primary breast cancer is a strong determinant of breast cancer patient outcome. E2F1 expression identified patients at low risk of metastasis irrespective of the estrogen receptor and ERBB2 status, and demonstrated similar prognostic performance to different gene expression-based predictors

    Test of Four Colon Cancer Risk-Scores in Formalin Fixed Paraffin Embedded Microarray Gene Expression Data

    Background: Prognosis prediction for resected primary colon cancer is based on the T-stage Node Metastasis (TNM) staging system. We investigated if four well-documented gene expression risk scores can improve patient stratification. Methods: Microarray-based versions of risk-scores were applied to a large independent cohort of 688 stage II/III tumors from the PETACC-3 trial. Prognostic value for relapse-free survival (RFS), survival after relapse (SAR), and overall survival (OS) was assessed by regression analysis. Improvement over a reference prognostic model was assessed with the area under curve (AUC) of receiver operating characteristic (ROC) curves. All statistical tests were two-sided, except the AUC increase. Results: All four risk scores (RSs) showed a statistically significant association (single-test, P < .0167) with OS or RFS in univariate models, but with HRs below 1.38 per interquartile range. Three scores were predictors of shorter RFS, one of shorter SAR. Each RS could only marginally improve an RFS or OS model with the known factors T-stage, N-stage, and microsatellite instability (MSI) status (AUC gains < 0.025 units). The pairwise interscore discordance was never high (maximal Spearman correlation = 0.563). A combined score showed a trend to higher prognostic value and higher AUC increase for OS (HR = 1.74, 95% confidence interval [CI] = 1.44 to 2.10, P < .001, AUC from 0.6918 to 0.7321) and RFS (HR = 1.56, 95% CI = 1.33 to 1.84, P < .001, AUC from 0.6723 to 0.6945) than any single score. Conclusions: The four tested gene expression-based risk scores provide prognostic information but contribute only marginally to improving models based on established risk factors. A combination of the risk scores might provide more robust information. Predictors of RFS and SAR might need to be different.

    Selecting control genes for RT-QPCR using public microarray data

    Background: Gene expression analysis has emerged as a major biological research area, with real-time quantitative reverse transcription PCR (RT-QPCR) being one of the most accurate and widely used techniques for expression profiling of selected genes. In order to obtain results that are comparable across assays, a stable normalization strategy is required. In general, the normalization of PCR measurements between different samples uses one to several control genes (e.g. housekeeping genes), from which a baseline reference level is constructed. Thus, the choice of the control genes is of utmost importance, yet there is not a generally accepted standard technique for screening a large number of candidates and identifying the best ones. Results: We propose a novel approach for scoring and ranking candidate genes for their suitability as control genes. Our approach relies on publicly available microarray data and allows the combination of multiple data sets originating from different platforms and/or representing different pathologies. The use of microarray data allows the screening of tens of thousands of genes, producing very comprehensive lists of candidates. We also provide two lists of candidate control genes: one which is breast cancer-specific and one with more general applicability. Two genes from the breast cancer list which had not been previously used as control genes are identified and validated by RT-QPCR. Open source R functions are available at http://www.isrec.isb-sib.ch/~vpopovic/research/ Conclusion: We proposed a new method for identifying candidate control genes for RT-QPCR which was able to rank thousands of genes according to some predefined suitability criteria and we applied it to the case of breast cancer. We also empirically showed that translating the results from microarray to PCR platform was achievable.

    Identifying synergistic regulation involving c-Myc and sp1 in human tissues

    Combinatorial gene regulation largely contributes to phenotypic versatility in higher eukaryotes. Genome-wide chromatin immuno-precipitation (ChIP) combined with expression profiling can dissect regulatory circuits around transcriptional regulators. Here, we integrate tiling array measurements of DNA-binding sites for c-Myc, sp1, TFIID and modified histones with a tissue expression atlas to establish the functional correspondence between physical binding, promoter activity and transcriptional regulation. For this we develop SLM, a methodology to map c-Myc and sp1-binding sites and then classify sites as sp1-only, c-Myc-only or dual. Dual sites show several distinct features compared to the single regulator sites: specifically, they exhibit overall higher degree of conservation between human and rodents, stronger correlation with TFIID-bound promoters, and preference for permissive chromatin state. By applying regression models to an expression atlas we identified a functionally distinct signature for strong dual c-Myc/sp1 sites. Namely, the correlation with c-Myc expression in promoters harboring dual-sites is increased for stronger sp1 sites by strong sp1 binding and the effect is largest in proliferating tissues. Our approach shows how integrated functional analyses can uncover tissue-specific and combinatorial regulatory dependencies in mammals

    Meta-analysis of gene expression profiles in breast cancer: toward a unified understanding of breast cancer subtyping and prognosis signatures

    INTRODUCTION: Breast cancer subtyping and prognosis have been studied extensively by gene expression profiling, resulting in disparate signatures with little overlap in their constituent genes. Although a previous study demonstrated a prognostic concordance among gene expression signatures, it was limited to only one dataset and did not fully elucidate how the different genes were related to one another nor did it examine the contribution of well-known biological processes of breast cancer tumorigenesis to their prognostic performance. METHOD: To address the above issues and to further validate these initial findings, we performed the largest meta-analysis of publicly available breast cancer gene expression and clinical data, which are comprised of 2,833 breast tumors. Gene coexpression modules of three key biological processes in breast cancer (namely, proliferation, estrogen receptor [ER], and HER2 signaling) were used to dissect the role of constituent genes of nine prognostic signatures. RESULTS: Using a meta-analytical approach, we consolidated the signatures associated with ER signaling, ERBB2 amplification, and proliferation. Previously published expression-based nomenclature of breast cancer 'intrinsic' subtypes can be mapped to the three modules, namely, the ER-/HER2- (basal-like), the HER2+ (HER2-like), and the low- and high-proliferation ER+/HER2- subtypes (luminal A and B). We showed that all nine prognostic signatures exhibited a similar prognostic performance in the entire dataset. Their prognostic abilities are due mostly to the detection of proliferation activity. Although ER- status (basal-like) and ERBB2+ expression status correspond to bad outcome, they seem to act through elevated expression of proliferation genes and thus contain only indirect information about prognosis. Clinical variables measuring the extent of tumor progression, such as tumor size and nodal status, still add independent prognostic information to proliferation genes. 
    CONCLUSION: This meta-analysis unifies various results of previous gene expression studies in breast cancer. It reveals connections between traditional prognostic factors, expression-based subtyping, and prognostic signatures, highlighting the important role of proliferation in breast cancer prognosis.

    The SIB Swiss Institute of Bioinformatics' resources: focus on curated databases

    The SIB Swiss Institute of Bioinformatics (www.isb-sib.ch) provides world-class bioinformatics databases, software tools, services and training to the international life science community in academia and industry. These solutions allow life scientists to turn the exponentially growing amount of data into knowledge. Here, we provide an overview of SIB's resources and competence areas, with a strong focus on curated databases and SIB's most popular and widely used resources. In particular, SIB's Bioinformatics resource portal ExPASy features over 150 resources, including UniProtKB/Swiss-Prot, ENZYME, PROSITE, neXtProt, STRING, UniCarbKB, SugarBindDB, SwissRegulon, EPD, arrayMap, Bgee, SWISS-MODEL Repository, OMA, OrthoDB and other databases, which are briefly described in this article