
    Indexing Strategies for Rapid Searches of Short Words in Genome Sequences

    Searching for matches between large collections of short (14–30 nucleotides) words and sequence databases comprising full genomes or transcriptomes is a common task in biological sequence analysis. We investigated the performance of simple indexing strategies for handling such tasks and developed two programs, fetchGWI and tagger, that index either the database or the query set. Either strategy outperforms megablast for searches with more than 10,000 probes. FetchGWI is shown to be a versatile tool for rapidly searching multiple genomes; in most cases, its performance is limited by the speed of filesystem access. We have made publicly available a Web interface for searching the human, mouse, and several other genomes and transcriptomes with oligonucleotide queries.
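
To make the idea of indexing the database concrete, here is a minimal Python sketch of a hash-based k-mer index over a genome string, used to find exact occurrences of a short probe. It illustrates the general strategy under simplifying assumptions (exact matches only, forward strand only, whole genome held in memory as one string) and is not the fetchGWI or tagger implementation; the function names are hypothetical.

```python
from collections import defaultdict


def build_kmer_index(genome: str, k: int) -> dict[str, list[int]]:
    """Map every k-mer of the genome to the list of its start positions."""
    index: dict[str, list[int]] = defaultdict(list)
    for i in range(len(genome) - k + 1):
        index[genome[i:i + k]].append(i)
    return dict(index)


def find_exact_matches(probe: str, genome: str,
                       index: dict[str, list[int]], k: int) -> list[int]:
    """Return start positions where `probe` occurs exactly on the forward strand.

    The first k bases of the probe act as a seed: the index yields candidate
    positions, and each candidate is verified against the full probe.
    """
    if len(probe) < k:
        raise ValueError("probe must be at least k bases long")
    return [pos for pos in index.get(probe[:k], [])
            if genome[pos:pos + len(probe)] == probe]


genome = "ACGTACGTTAGCACGTACGA"
idx = build_kmer_index(genome, k=4)
print(find_exact_matches("ACGTACG", genome, idx, k=4))  # [0, 12]
```

The alternative strategy, indexing the query set, inverts the roles: the dictionary is built from probe seeds and the genome is streamed past it once, which tends to pay off when there are many probes.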

    Consistency Analysis of Redundant Probe Sets on Affymetrix Three-Prime Expression Arrays and Applications to Differential mRNA Processing

    Affymetrix three-prime expression microarrays contain thousands of redundant probe sets that interrogate different regions of the same gene. Differential expression analysis methods rarely consider probe redundancy, which can lead to inaccurate inference about overall gene expression or cause investigators to overlook potentially valuable information about differential regulation of variant mRNA products. We investigated the behaviour and consistency of redundant probe sets in a publicly available data set containing samples from mouse brain amygdala and hippocampus, and asked how applying filtering methods to the data affected the consistency of results obtained from redundant probe sets. A genome-based filter that screens and groups probe sets according to their overlapping genomic alignments significantly improved redundant probe set consistency. Screening based on qualitative Present-Absent calls from MAS5 also improved consistency. However, even after applying these filters, many redundant probe sets showed significant fold-change differences relative to each other, suggesting differential regulation of alternative transcript production. Visual inspection of these loci using an interactive genome visualization tool (igb.bioviz.org) exposed thirty putative examples of differential regulation of alternative splicing or polyadenylation across brain regions in mouse. This work demonstrates how P/A-call and genome-based filtering can improve consistency among redundant probe sets while at the same time exposing possible differential regulation of RNA processing pathways across sample types.
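
The genome-based filter groups probe sets whose alignments overlap. The abstract does not give the filter's details, so the following is only a sketch of one plausible grouping step: a sweep over alignments sorted by chromosome and start, assuming each probe set has a single alignment given as a half-open interval and ignoring strand. The `Alignment` type and `group_by_overlap` function are hypothetical.

```python
from typing import NamedTuple


class Alignment(NamedTuple):
    probe_set: str
    chrom: str
    start: int  # 0-based, inclusive
    end: int    # 0-based, exclusive


def group_by_overlap(alignments: list[Alignment]) -> list[list[str]]:
    """Cluster probe sets whose genomic alignments overlap on the same chromosome.

    Alignments are swept in (chromosome, start) order; an alignment joins the
    current group if it starts before the group's rightmost end.
    """
    groups: list[list[str]] = []
    current_chrom = None
    current_end = -1
    for aln in sorted(alignments, key=lambda a: (a.chrom, a.start)):
        if aln.chrom == current_chrom and aln.start < current_end:
            groups[-1].append(aln.probe_set)
            current_end = max(current_end, aln.end)
        else:
            groups.append([aln.probe_set])
            current_chrom, current_end = aln.chrom, aln.end
    return groups


alignments = [
    Alignment("ps_1", "chr1", 100, 600),
    Alignment("ps_2", "chr1", 500, 900),
    Alignment("ps_3", "chr1", 1000, 1400),
    Alignment("ps_4", "chr2", 100, 300),
]
print(group_by_overlap(alignments))  # [['ps_1', 'ps_2'], ['ps_3'], ['ps_4']]
```

Probe sets that land in the same group are the redundant ones whose fold changes can then be compared with each other.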

    Establishment of the epithelial-specific transcriptome of normal and malignant human breast cells based on MPSS and array expression data

    INTRODUCTION: Diverse microarray and sequencing technologies have been widely used to characterise the molecular changes in malignant epithelial cells in breast cancers. Such gene expression studies to identify markers and targets in tumour cells are, however, compromised by the cellular heterogeneity of solid breast tumours and by the lack of appropriate counterparts representing normal breast epithelial cells. METHODS: Malignant neoplastic epithelial cells from primary breast cancers, and luminal and myoepithelial cells from normal human breast tissue, were isolated by immunomagnetic separation. Pools of RNA from highly enriched preparations of these cell types were subjected to expression profiling using massively parallel signature sequencing (MPSS) and four different genome-wide microarray platforms. Functionally related transcripts of the differential tumour epithelial transcriptome were used for gene set enrichment analysis to identify enrichment of luminal and myoepithelial type genes. Clinical pathological validation of a small number of genes was performed on tissue microarrays. RESULTS: MPSS identified 6,553 differentially expressed genes between the pool of normal luminal cells and that of primary tumours substantially enriched for epithelial cells, of which 98% were represented and 60% were confirmed by microarray profiling. A further 4,149 transcripts showed significant expression-level changes between these two samples that were detected only by microarray technology, resulting in a combined differential tumour epithelial transcriptome of 8,051 genes. Microarray gene signatures identified comprehensive lists of 907 and 955 transcripts whose expression differed between luminal epithelial cells and myoepithelial cells, respectively. Functional annotation and gene set enrichment analysis highlighted a group of genes related to skeletal development that were associated with the myoepithelial/basal cells and upregulated in the tumour sample. 
One of the most highly overexpressed genes in this category, that encoding periostin, was analysed immunohistochemically on breast cancer tissue microarrays, and its expression in neoplastic cells correlated with poor outcome in a cohort of poor-prognosis estrogen receptor-positive tumours. CONCLUSION: Using highly enriched cell populations in combination with multiplatform gene expression profiling, we established a comprehensive analysis of the molecular changes between normal and malignant breast tissue. This study provides a basis for the identification of novel and potentially important targets for diagnosis, prognosis and therapy in breast cancer.

    A robust method for estimating gene expression states using Affymetrix microarray probe level data

    BACKGROUND: Microarray technology is a high-throughput method for measuring the expression levels of thousands of genes simultaneously. A major disadvantage of microarray data is that the observed intensities include non-specific binding. The Affymetrix GeneChip includes a mismatch (MM) probe intended to measure non-specific binding, but opinions differ on the usefulness of MM measures. Notably, not all observed intensities are associated with expressed genes: many are associated with unexpressed genes, whose measured values are mere noise arising from non-specific binding, cross-hybridization, or stray signals. The implicit assumption that all genes are expressed leads to poor performance of microarray data analyses. We assume two functional states of a gene, expressed or unexpressed, and propose a robust method to estimate gene expression states using the order relationship between perfect match (PM) and MM measures. RESULTS: An indicator, the 'probability of a gene being expressed', was obtained from the number of probe pairs within a probe set in which the PM measure exceeds the MM measure. We examined the validity of the proposed indicator using Human Genome U95 data sets provided by Affymetrix. The usefulness of the 'probability of a gene being expressed' is illustrated through an exploration of candidate genes involved in neuroblastoma prognosis. We identified candidate genes whose expression states (unexpressed or expressed) differed between the two outcomes, and this result was subsequently confirmed by quantitative RT-PCR. CONCLUSION: The proposed qualitative evaluation, the 'probability of a gene being expressed', is a useful indicator for improving microarray data analysis and helps reduce the number of false discoveries. 
Expression states (expressed or unexpressed) correspond to the most fundamental gene functions, 'On' and 'Off', and can lead to biologically meaningful results.
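
The abstract does not spell out how the count of probe pairs with PM > MM is turned into a probability. As a hedged illustration only, the sketch below counts those pairs and scores the count with a one-sided sign test, which is a swapped-in technique and not the authors' estimator: if a gene were unexpressed and PM and MM behaved interchangeably, each pair would show PM > MM with probability 0.5. The function name `pm_mm_evidence` is hypothetical.

```python
from math import comb


def pm_mm_evidence(pm: list[float], mm: list[float]) -> tuple[int, float]:
    """Count probe pairs with PM > MM and return a one-sided sign-test p-value.

    Null hypothesis (an illustrative assumption): each pair shows PM > MM with
    probability 0.5. Ties count as "not exceeding".
    """
    if len(pm) != len(mm) or not pm:
        raise ValueError("pm and mm must be non-empty and of equal length")
    n = len(pm)
    k = sum(1 for p, m in zip(pm, mm) if p > m)
    # Upper tail P(X >= k) for X ~ Binomial(n, 0.5).
    p_value = sum(comb(n, i) for i in range(k, n + 1)) / 2 ** n
    return k, p_value


pm = [120.0, 340.0, 80.0, 500.0]
mm = [100.0, 150.0, 95.0, 200.0]
print(pm_mm_evidence(pm, mm))  # (3, 0.3125)
```

A small p-value means more pairs favour PM than chance alone would suggest; real probe sets have more pairs than this toy example, which makes the test far more informative.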

    Transcriptome profile analysis of flowering molecular processes of early flowering trifoliate orange mutant and the wild-type [Poncirus trifoliata (L.) Raf.] by massively parallel signature sequencing

    BACKGROUND: After several years in the juvenile phase, trees undergo a flowering transition to become mature (florally competent) trees. This transition depends on the balanced expression of a complex network of genes that is regulated by both endogenous and environmental factors. However, relatively little is known about the molecular processes regulating the flowering transition in woody plants compared with herbaceous plants. RESULTS: Comparative transcript profiling of spring shoots after self-pruning was performed by massively parallel signature sequencing (MPSS) on a spontaneously early flowering trifoliate orange mutant (precocious trifoliate orange, Poncirus trifoliata) with a short juvenile phase and on the wild-type (WT) tree. A total of 16,564,500 and 16,235,952 high-quality reads were obtained for the WT and the mutant (MT), respectively. Interpretation of the MPSS signatures revealed that the total number of transcribed genes was larger in the MT (31,468) than in the WT (29,864), suggesting that newly initiated transcription occurs in the MT. Further comparison of the transcripts revealed that 2,735 genes had a more than twofold expression difference in the MT compared with the WT. In addition, we identified 110 citrus flowering-time genes homologous with known elements of flowering-time pathways through sequencing and bioinformatics analysis. These genes are highly conserved in citrus and other species, suggesting that the functions of the related proteins in controlling reproductive development may be conserved as well. CONCLUSION: Our results provide a foundation for comparative gene expression studies between WT and precocious trifoliate orange. Additionally, a number of candidate genes required for the early flowering process of precocious trifoliate orange were identified. These results provide new insight into the molecular processes regulating flowering time in citrus.
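
A "more than twofold expression difference" is usually tested on counts normalized for library size. The abstract does not state the normalization, so the sketch below assumes counts per million, a pseudocount of 1 to avoid division by zero, and a threshold of |log2(MT/WT)| > 1; the function `twofold_changes` and the toy gene names are hypothetical.

```python
import math


def twofold_changes(wt: dict[str, int], mt: dict[str, int],
                    wt_total: int, mt_total: int,
                    pseudo: float = 1.0) -> dict[str, float]:
    """Return {gene: log2(MT/WT)} for genes with a more than twofold difference.

    Tag counts are scaled to counts per million of their library, and a
    pseudocount is added so genes seen in only one library are handled.
    """
    changes: dict[str, float] = {}
    for gene in sorted(wt.keys() | mt.keys()):
        wt_cpm = wt.get(gene, 0) / wt_total * 1e6 + pseudo
        mt_cpm = mt.get(gene, 0) / mt_total * 1e6 + pseudo
        lfc = math.log2(mt_cpm / wt_cpm)
        if abs(lfc) > 1.0:  # twofold up or down
            changes[gene] = lfc
    return changes


wt = {"geneA": 10, "geneB": 100, "geneC": 50}
mt = {"geneA": 40, "geneB": 100, "geneD": 5}
print(sorted(twofold_changes(wt, mt, 1_000_000, 1_000_000)))  # ['geneA', 'geneC', 'geneD']
```

In practice the library sizes would be the read totals reported above, and a statistical test would normally accompany the fold-change cut-off.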

    Application of affymetrix array and massively parallel signature sequencing for identification of genes involved in prostate cancer progression

    BACKGROUND: Affymetrix GeneChip arrays and massively parallel signature sequencing (MPSS) are two high-throughput methodologies used to profile transcriptomes. Each method has certain strengths and weaknesses; however, no comparison has been made between the data derived from Affymetrix arrays and MPSS. In this study, two lineage-related prostate cancer cell lines, LNCaP and C4-2, were used for transcriptome analysis with the aim of identifying genes associated with prostate cancer progression. METHODS: Affymetrix GeneChip array and MPSS analyses were performed. Data were analyzed with GeneSpring 6.2 and in-house Perl scripts. Expression array results were verified by RT-PCR. RESULTS: Comparison of the data revealed that each technology detected genes that the other did not. In LNCaP, 3,180 genes were detected only by Affymetrix and 1,169 genes only by MPSS. Similarly, in C4-2, 4,121 genes were detected only by Affymetrix and 1,014 genes only by MPSS. Analysis of the combined transcriptomes identified 66 genes unique to LNCaP cells and 33 genes unique to C4-2 cells. Expression analysis of these genes in prostate cancer specimens showed that CA1 was highly expressed in bone metastasis but not in primary tumor, and that EPHA7 was expressed in normal prostate and primary tumor but not in bone metastasis. CONCLUSION: Our data indicate that transcriptome profiling with a single methodology will not fully assess the expression of all genes in a cell line. A combination of transcription profiling technologies such as DNA arrays and MPSS provides a more robust means to assess the expression profile of an RNA sample. Finally, genes that were differentially expressed in the cell lines were also differentially expressed in primary prostate cancer and its metastases.
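
The platform comparison and the "combined transcriptome" map naturally onto set operations. The sketch below assumes each platform's output has already been reduced to a set of detected gene identifiers (the detection calling and probe-to-gene mapping, which the abstract does not describe, are omitted), and it treats a gene as unique to a cell line if it is in that line's combined set but absent from the other's. Gene identifiers and the `compare_platforms` function are hypothetical.

```python
def compare_platforms(affy: set[str], mpss: set[str]) -> dict[str, set[str]]:
    """Partition the genes detected by two profiling platforms."""
    return {
        "affy_only": affy - mpss,   # detected by the array alone
        "mpss_only": mpss - affy,   # detected by MPSS alone
        "both": affy & mpss,
        "combined": affy | mpss,    # the combined transcriptome
    }


lncap = compare_platforms(affy={"g1", "g2", "g3"}, mpss={"g2", "g4"})
c42 = compare_platforms(affy={"g1", "g2"}, mpss={"g2", "g5"})
unique_to_lncap = lncap["combined"] - c42["combined"]
unique_to_c42 = c42["combined"] - lncap["combined"]
print(sorted(lncap["affy_only"]), sorted(lncap["mpss_only"]))  # ['g1', 'g3'] ['g4']
print(sorted(unique_to_lncap), sorted(unique_to_c42))  # ['g3', 'g4'] ['g5']
```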

    On the Wegener granulomatosis associated region on chromosome 6p21.3

    BACKGROUND: Wegener granulomatosis (WG) belongs to the heterogeneous group of systemic vasculitides. The multifactorial pathophysiology of WG is thought to result from as yet unknown environmental influence(s) acting on a background of genetic predisposition. The presence of anti-neutrophil cytoplasmic antibodies (ANCA) in patients' plasma and the genetic involvement of the human leukocyte antigen system reflect an autoimmune background of the disease. A previous population-based association study revealed strong associations of WG with markers located in the major histocompatibility complex class II (MHC II) region, in the vicinity of the human leukocyte antigen (HLA)-DPB1 and retinoid X receptor B (RXRB) loci. To define the involvement of the 6p21.3 region in WG in more detail, that study was expanded here to the 3.6-megabase segment encompassing this region on chromosome 6. The RXRB gene was analysed, as was a splice-site variation of the butyrophilin-like 2 (BTNL2) gene, which is also located within this region. The latter polymorphism was evaluated because it appears to be an HLA-independent susceptibility factor in another granulomatous disorder, sarcoidosis. METHODS: 150–180 German WG patients and a corresponding cohort of healthy controls (n = 100–261) were used in a two-step study. A panel of 94 microsatellites was designed for the initial step using a DNA pooling approach. Markers with significantly differing allele frequencies between patient and control pools were individually genotyped. The RXRB gene was analysed for single strand conformation polymorphisms (SSCP) and restriction fragment length polymorphisms (RFLP). The splice-site polymorphism in the BTNL2 gene was also investigated by RFLP analysis. 
RESULTS: A previously investigated microsatellite (#1.0.3.7; UCSC Santa Cruz genome browser, May 2004 freeze, localisation chr6:31257596-34999883), which was used as a positive control, remained associated throughout the whole two-step approach. However, no further evidence of association was found for any other microsatellite marker in the investigated region. Analysis of the RXRB gene, located in the WG-associated region, revealed associations for two variants (rs10548957, allelic p = 0.02; rs6531, allelic p = 5.20 × 10⁻⁵, OR = 1.88). Several alleles of markers located between HLA-DPB1, SNP rs6531 and microsatellite 1.0.3.7 showed linkage disequilibrium with r² values exceeding 0.10. No significant difference was found in our WG cohort for the sarcoidosis-associated splice-site variant (rs2076530, allelic p = 0.80). CONCLUSION: Since a microsatellite flanking the RXRB gene and two intragenic polymorphisms are significantly associated with WG on chromosome 6p21.3, further investigations should focus on extensive fine-mapping of this region with a dense set of additional markers such as SNPs. This strategy may reveal deeper insights into the genetic contributions of this region to the pathogenesis of WG.
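
The results report allelic p-values and an odds ratio (OR) without naming the test. A common choice for such figures is Pearson's chi-square test on a 2×2 table of allele counts (each diploid individual contributes two alleles), with the OR computed from the same table. The sketch below shows that calculation on hypothetical counts; it is not the authors' analysis and applies no continuity correction.

```python
import math


def allelic_association(case_a: int, case_b: int,
                        ctrl_a: int, ctrl_b: int) -> tuple[float, float]:
    """Allelic odds ratio and Pearson chi-square p-value (1 df).

    Arguments are allele counts: allele A and allele B in cases and controls.
    """
    if min(case_a, case_b, ctrl_a, ctrl_b) <= 0:
        raise ValueError("all four allele counts must be positive")
    table = [[case_a, case_b], [ctrl_a, ctrl_b]]
    total = case_a + case_b + ctrl_a + ctrl_b
    row_sums = [case_a + case_b, ctrl_a + ctrl_b]
    col_sums = [case_a + ctrl_a, case_b + ctrl_b]
    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_sums[i] * col_sums[j] / total
            chi2 += (table[i][j] - expected) ** 2 / expected
    # Survival function of chi-square with 1 df: P(X > x) = erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    # Odds of allele A in cases divided by odds of allele A in controls.
    odds_ratio = (case_a * ctrl_b) / (case_b * ctrl_a)
    return odds_ratio, p_value


odds_ratio, p = allelic_association(case_a=60, case_b=40, ctrl_a=40, ctrl_b=60)
print(round(odds_ratio, 2), round(p, 4))  # 2.25 0.0047
```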

    A Type 2C Protein Phosphatase FgPtc3 Is Involved in Cell Wall Integrity, Lipid Metabolism, and Virulence in Fusarium graminearum

    Type 2C protein phosphatases (PP2Cs) play important roles in regulating many biological processes in eukaryotes. Currently, little is known about the functions of PP2Cs in filamentous fungi. The causal agent of wheat head blight, Fusarium graminearum, contains seven putative PP2C genes: FgPTC1, -3, -5, -5R, -6, -7 and -7R. To investigate the roles of these PP2Cs, we constructed deletion mutants for all seven PP2C genes in this study. The FgPTC3 deletion mutant (ΔFgPtc3-8) exhibited reduced aerial hyphae formation and deoxynivalenol (DON) production, but increased production of conidia. The mutant showed increased resistance to osmotic stress and cell wall-damaging agents on potato dextrose agar (PDA) plates. Pathogenicity assays showed that ΔFgPtc3-8 is unable to infect flowering wheat heads. All of the defects were restored when ΔFgPtc3-8 was complemented with the wild-type FgPTC3 gene. Additionally, FgPTC3 partially rescued the growth defect of a yeast PTC1 deletion mutant under various stress conditions. Ultrastructural and histochemical analyses showed that conidia of ΔFgPtc3-8 contained an unusually high number of large lipid droplets. Furthermore, the mutant accumulated a higher basal level of glycerol than the wild-type progenitor. Quantitative real-time PCR assays showed that basal expression of FgOS2, FgSLT2 and FgMKK1 in the mutant was significantly higher than in the wild-type strain. Serial analysis of gene expression in ΔFgPtc3-8 revealed that FgPTC3 is associated with various metabolic pathways. In contrast to the FgPTC3 mutant, the deletion mutants of FgPTC1, FgPTC5, FgPTC5R, FgPTC6, FgPTC7 and FgPTC7R did not show aberrant phenotypic features when grown on PDA medium or inoculated on wheat heads. These results indicate that FgPtc3 is the key PP2C, playing a critical role in a variety of cellular and biological functions, including cell wall integrity, lipid and secondary metabolism, and virulence in F. graminearum.

    Alternative splicing enriched cDNA libraries identify breast cancer-associated transcripts

    BACKGROUND: Alternative splicing (AS) is a central mechanism in the generation of genomic complexity and a major contributor to transcriptome and proteome diversity. Alterations of the splicing process can lead to deregulation of crucial cellular processes and have been associated with a large spectrum of human diseases. Cancer-associated transcripts are potential molecular markers; they may contribute to the development of more accurate diagnostic and prognostic methods and may also serve as therapeutic targets. Alternative splicing-enriched cDNA libraries have been used to explore the variability generated by alternative splicing. In this study, by combining heteroduplex trapping with RNA amplification, we developed a powerful approach that enables transcriptome-wide exploration of the AS repertoire, in order to identify AS variants associated with breast tumor cells modulated by ERBB2 (HER-2/neu) oncogene expression. RESULTS: The human breast cell line C5.2 and a pool of 5 ERBB2-overexpressing breast tumor samples were used independently to construct two AS-enriched libraries. In total, 2,048 partial cDNA sequences were obtained, revealing 214 alternative splicing sequence-enriched tags (ASSETs). A subset of 79 multi-exon ASSETs was compared with public databases, revealing 138 different AS events. RT-PCR validation had a high success rate (94.5%), and 2 novel AS events were identified. The influence of ERBB2-mediated expression on AS regulation was evaluated by capillary electrophoresis and probe-ligation approaches in two mammary cell lines (Hb4a and C5.2) expressing different levels of ERBB2. 
The relative expression balance between AS variants from 3 genes was differentially modulated by ERBB2 in this model system. CONCLUSIONS: In this study, we presented a method for exploring AS from any RNA source in a transcriptome-wide format, which can be easily adapted to next-generation sequencers. We identified AS transcripts that were differentially modulated by ERBB2-mediated expression and that can be tested as molecular markers for breast cancer. Such a methodology will be useful for fully deciphering the cancer cell transcriptome diversity resulting from AS and for finding more precise molecular markers.

    RNA-Seq Analyses Generate Comprehensive Transcriptomic Landscape and Reveal Complex Transcript Patterns in Hepatocellular Carcinoma

    RNA-seq is a powerful tool for comprehensive characterization of the whole transcriptome at both the gene and exon levels, with a unique ability to identify novel splicing variants. To date, RNA-seq analysis of HBV-related hepatocellular carcinoma (HCC) has not been reported. In this study, we performed transcriptome analyses of 10 matched pairs of cancerous and non-cancerous tissues from HCC patients on the Solexa/Illumina GAII platform. On average, about 21.6 million sequencing reads and 10.6 million aligned reads were obtained for the sample sequenced on each lane, sufficient to identify more than 50% of all annotated genes in each sample. Furthermore, we identified 1,378 significantly differentially expressed genes (DEGs) and 24,338 differentially expressed exons (DEEs). Comprehensive functional analyses indicated that cell growth-related, metabolism-related and immune-related pathways were most significantly enriched among DEGs, pointing to a complex mechanism of HCC carcinogenesis. Positional gene enrichment analysis showed that DEGs were most significantly enriched at chromosome 8q21.3–24.3. The most interesting findings came from the exon-level analysis, in which we characterized three major patterns of expression changes between gene and exon levels, implying a much more complex landscape of transcript-specific differential expression in HCC. Finally, we identified a novel, highly up-regulated exon-exon junction in the ATAD2 gene in HCC tissues. Overall, to the best of our knowledge, our study represents the most comprehensive characterization of the HBV-related HCC transcriptome, including exon-level expression changes and novel splicing variants; it illustrates the power of RNA-seq and provides important clues for understanding the molecular mechanisms of HCC pathogenesis at a system-wide level.