
    PLANdbAffy: probe-level annotation database for Affymetrix expression microarrays

    Standard Affymetrix technology evaluates gene expression by measuring the intensity of mRNA hybridization with a panel of 25-mer oligonucleotide probes and summarizing the probe signal intensities with a robust averaging method. However, in many cases the signal intensity of a probe does not correlate with gene expression. This can be due to hybridization of the probe to a transcript of another gene, mapping of the probe to an intron, alternative splicing, single nucleotide polymorphisms (SNPs), and other causes. We have developed a database, PLANdbAffy (available at http://affymetrix2.bioinf.fbb.msu.ru), that contains the results of aligning the probe sequences from five Affymetrix expression microarrays to the human genome. We identified the probes that match transcript-coding regions in the correct orientation. For each such probe alignment region, we determined the mRNA and EST sequences that contain the probe sequence. The textual part of the database interface summarizes the sequences that cover the probe alignment region and the SNPs located inside it. The graphical part is implemented as custom tracks for the UCSC Genome Browser, which lets users draw on all the data the UCSC browser offers.
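
    To illustrate what matching a probe to a transcript "in the correct orientation" involves, here is a minimal Python sketch that checks for exact full-length matches on both strands. It is not PLANdbAffy's pipeline, which aligned probes to the whole genome; the orientation convention and the example probe sequence are assumptions.

```python
# Sketch: classify how a probe sequence matches a transcript sequence.
# Assumption: the probe is written in the same orientation as the mRNA, so a
# "sense" match is the correct orientation; reverse-complement the probe first
# if your probe annotation uses the opposite convention.

COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def match_orientation(probe: str, transcript: str) -> str:
    """Return 'sense', 'antisense', 'both' or 'none' for exact full-length matches."""
    probe, transcript = probe.upper(), transcript.upper()
    sense = probe in transcript
    antisense = reverse_complement(probe) in transcript
    if sense and antisense:
        return "both"
    if sense:
        return "sense"
    if antisense:
        return "antisense"
    return "none"


def sense_match_offsets(probe: str, transcript: str) -> list[int]:
    """0-based start offsets of every exact sense match, overlapping hits included."""
    probe, transcript = probe.upper(), transcript.upper()
    offsets = []
    start = transcript.find(probe)
    while start != -1:
        offsets.append(start)
        start = transcript.find(probe, start + 1)
    return offsets


probe = "ACGTTGCAACGTAGCTAGCTAGGCA"  # hypothetical 25-mer
print(match_orientation(probe, "GGG" + probe + "CCC"))  # sense
print(match_orientation(probe, "AAA" + reverse_complement(probe) + "AAA"))  # antisense
```

    A probe classified as "antisense" or "none" against every transcript of its annotated gene would be a candidate for exclusion from the probe set summary.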

    Integrative methods for reconstruction of dynamic networks in chondrogenesis

    The application of human mesenchymal stem cells is a promising approach in regenerative medicine. Specific stimulation can give rise to chondrocytes, osteocytes or adipocytes. Investigating the biological processes that drive the observed cellular differentiation is essential for efficiently generating specific tissues for therapeutic purposes. Upon treatment with diverse stimuli, gene expression levels of cultivated human mesenchymal stem cells were monitored in time series microarray experiments for all three lineages. Gene network inference is a common approach to identifying the regulatory dependencies among a set of investigated genes. This thesis applies the NetGenerator V2.0 tool, which can handle multiple time series datasets that investigate the effects of multiple external stimuli. The underlying model is a system of linear ordinary differential equations whose parameters are optimised to reproduce the given time series datasets. Several procedures in the inference process were adapted in this new version to allow the integration of multiple datasets. Network inference was applied to in silico network examples as well as to multi-experiment microarray data of mesenchymal stem cells. The resulting chondrogenesis model was evaluated on several features, including its fit to the data, the total number of connections, the proportion of connections supported by prior knowledge, and its stability under a resampling procedure. Altogether, NetGenerator V2.0 provided an automatic and efficient way to integrate experimental datasets and to enhance the interpretability and reliability of the resulting network. In a second chondrogenesis model, miRNA and mRNA time series data were integrated for network inference. One hypothesis of this model was verified experimentally, demonstrating the negative effect of miR-524-5p on downstream genes.
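
    A linear ODE network model can be written as dx/dt = A x + B u(t), where x holds the gene expression levels, u(t) the external stimuli, and A and B the interaction parameters to be optimised. The Python sketch below assumes forward Euler integration and a plain sum-of-squares objective rather than NetGenerator's actual numerics, and shows how several time series that share A and B but differ in their stimulus can be scored jointly.

```python
# Sketch of a linear ODE network model dx/dt = A x + B u(t), evaluated over
# several experiments that share A and B but differ in their stimulus u(t).
# Forward Euler and a sum-of-squares objective are simplifying assumptions;
# NetGenerator's own integrator, objective and optimiser are not reproduced here.
from typing import Callable

Matrix = list[list[float]]
Vector = list[float]
Stimulus = Callable[[float], Vector]


def simulate(A: Matrix, B: Matrix, u: Stimulus, x0: Vector,
             t_end: float, dt: float) -> list[Vector]:
    """Integrate dx/dt = A x + B u(t) with forward Euler; return the state at every step."""
    x = list(x0)
    trajectory = [list(x)]
    for step in range(round(t_end / dt)):
        uk = u(step * dt)
        dxdt = [sum(a_ij * x_j for a_ij, x_j in zip(A[i], x)) +
                sum(b_im * u_m for b_im, u_m in zip(B[i], uk))
                for i in range(len(x))]
        x = [x_i + dt * d_i for x_i, d_i in zip(x, dxdt)]
        trajectory.append(list(x))
    return trajectory


def multi_experiment_sse(A: Matrix, B: Matrix,
                         experiments: list[tuple[Stimulus, Vector, dict[float, Vector]]],
                         dt: float) -> float:
    """Sum of squared errors over all experiments.

    Each experiment is (stimulus u, initial state x0, {time point: measured vector}).
    Time points are assumed to be multiples of dt.
    """
    total = 0.0
    for u, x0, observations in experiments:
        trajectory = simulate(A, B, u, x0, max(observations), dt)
        for t, measured in observations.items():
            predicted = trajectory[round(t / dt)]
            total += sum((p - m) ** 2 for p, m in zip(predicted, measured))
    return total
```

    An optimiser would then adjust the entries of A and B to minimise `multi_experiment_sse`; restricting the search to sparse networks or favouring connections supported by prior knowledge are natural extensions not shown here.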

    Jetset: selecting the optimal microarray probe set to represent a gene

    Background: Interpretation of gene expression microarrays requires a mapping from probe set to gene. On many Affymetrix gene expression microarrays, a given gene may be detected by multiple probe sets, which may deliver inconsistent or even contradictory measurements. Therefore, obtaining an unambiguous expression estimate for a pre-specified gene can be a nontrivial but essential task. Results: We developed scoring methods to assess each probe set for specificity, splice isoform coverage, and robustness against transcript degradation. We used these scores to select a single representative probe set for each gene, thus creating a simple one-to-one mapping between gene and probe set. To test this method, we evaluated concordance between protein measurements and gene expression values, and between sets of genes whose expression is known to be correlated. For both test cases, we identified genes that were nominally detected by multiple probe sets, and we found that the probe set chosen by our method showed stronger concordance. Conclusions: This method provides a simple, unambiguous mapping to allow assessment of the expression levels of specific genes of interest.
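
    Selecting one representative probe set per gene reduces to an argmax over per-probe-set scores. In the Python sketch below, the score scales, the multiplicative combination and the probe set IDs are illustrative assumptions, not the scoring formulas defined by Jetset.

```python
# Sketch: pick one probe set per gene from per-probe-set quality scores.
# The three criteria follow the abstract; their 0..1 scales and the way they
# are combined are assumptions for illustration.
from dataclasses import dataclass


@dataclass(frozen=True)
class ProbeSetScores:
    probe_set: str
    gene: str
    specificity: float  # 1.0 = hybridizes only to the intended gene
    coverage: float     # fraction of the gene's splice isoforms detected
    robustness: float   # resistance to transcript degradation

    @property
    def overall(self) -> float:
        # Assumption: multiply the criteria, so a probe set that fails any one
        # of them cannot win on the strength of the others.
        return self.specificity * self.coverage * self.robustness


def best_probe_sets(scores: list[ProbeSetScores]) -> dict[str, str]:
    """Return a one-to-one gene -> probe set mapping by highest overall score."""
    best: dict[str, ProbeSetScores] = {}
    for s in scores:
        current = best.get(s.gene)
        # On equal scores, prefer the lexicographically smaller probe set ID
        # so the result does not depend on input order.
        if current is None or (s.overall, current.probe_set) > (current.overall, s.probe_set):
            best[s.gene] = s
    return {gene: s.probe_set for gene, s in best.items()}
```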

    A Signature Inferred from Drosophila Mitotic Genes Predicts Survival of Breast Cancer Patients

    Introduction: The classification of breast cancer patients into risk groups provides a powerful tool for identifying patients who will benefit from aggressive systemic therapy. The analysis of microarray data has generated several gene expression signatures that improve diagnosis and allow risk assessment. There is also evidence that cell proliferation-related genes have high predictive power within these signatures. Methods: We therefore constructed a gene expression signature (the DM signature) using the human orthologues of 108 Drosophila melanogaster genes required either for the maintenance of chromosome integrity (36 genes) or for mitotic division (72 genes). Results: The DM signature has minimal overlap with existing signatures and is highly predictive of survival in five large breast cancer datasets. In addition, we show that the DM signature outperforms many widely used breast cancer signatures in predictive power and performs comparably to other proliferation-based signatures. For most genes of the DM signature, increased expression is negatively correlated with patient survival. The genes that contribute most to the predictive power of the DM signature are those involved in cytokinesis. Conclusion: This finding highlights cytokinesis as an important marker in breast cancer prognosis and as a possible target…
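
    The abstract does not say how the DM signature is turned into a per-patient prediction. A common, simple approach is sketched below under stated assumptions: z-score each gene across patients, average the z-scores of the signature genes, and split patients at the median score. Treating higher scores as higher risk follows the observation that increased expression of most DM genes is associated with poorer survival; the gene names in any usage are hypothetical.

```python
# Sketch: summarize a gene signature into one risk score per patient.
# Assumptions (not taken from the DM signature paper): genes are z-scored
# across patients, the score is the mean z-score of the signature genes,
# higher scores mean higher risk, and patients are split at the median.
import statistics


def zscore_rows(expr: dict[str, list[float]]) -> dict[str, list[float]]:
    """Standardize each gene's expression across patients (mean 0, population SD 1)."""
    z = {}
    for gene, values in expr.items():
        mu = statistics.mean(values)
        sd = statistics.pstdev(values)
        z[gene] = [(v - mu) / sd if sd > 0 else 0.0 for v in values]
    return z


def signature_scores(expr: dict[str, list[float]], signature: list[str]) -> list[float]:
    """Mean z-score over the signature genes present in the data, per patient."""
    z = zscore_rows(expr)
    genes = [g for g in signature if g in z]
    if not genes:
        raise ValueError("none of the signature genes are in the expression data")
    n_patients = len(z[genes[0]])
    return [statistics.mean(z[g][i] for g in genes) for i in range(n_patients)]


def risk_groups(scores: list[float]) -> list[str]:
    """Label patients above the median score 'high' risk and the rest 'low'."""
    cutoff = statistics.median(scores)
    return ["high" if s > cutoff else "low" for s in scores]
```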

    Studies on the relationships between oligonucleotide probe properties and hybridization signal intensities

    Microarray technology is a commonly used tool in biomedical research for assessing global gene expression, surveying DNA sequence variations, and studying alternative gene splicing. Given the wide range of applications of this technology, a comprehensive understanding of its underlying mechanisms is important. The focus of this work is on the contributions of microarray probe properties (probe secondary structure, ΔGss; probe-target binding energy, ΔG; probe-target mismatch) to the signal intensity. The benefits of incorporating these properties into, or ignoring them in, microarray probe design and selection, as well as microarray data preprocessing and analysis, are reported. Four related studies are described in this thesis. In the first, probe secondary structure was found to account for up to 3% of all variation on Affymetrix microarrays. In the second, a dinucleotide affinity model was developed and found to enhance the detection of differentially expressed genes when implemented as a background correction procedure in GeneChip preprocessing algorithms. This model is consistent with physical models of probe-target binding affinity, which depends on nearest-neighbor stacking interactions in addition to base pairing. The remaining studies examine the importance of incorporating biophysical factors into both the design and the analysis of microarrays. ‘Percent bound’, predicted by equilibrium models of hybridization, is a useful factor in predicting and assessing the behavior of long oligonucleotide probes. However, a universal, probe-property-independent three-parameter Langmuir model was also tested and shown to be as effective as, or more effective than, complex, computationally expensive models developed for microarray target concentration estimation. This simple, platform-independent model can equal or even outperform models that explicitly incorporate probe properties, such as the model incorporating probe percent bound developed in Chapter Three. This suggests that with a “spiked-in” concentration series targeting as few as 5–10 genes, reliable estimation of target concentration can be achieved for the entire microarray.
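
    The abstract does not give the parametrization of the three-parameter Langmuir model; a common form is I(c) = b + a·c/(k + c), with background b, saturation amplitude a and half-saturation concentration k. The Python sketch below assumes that form and shows how, once a, k and b have been calibrated (for example on a spiked-in concentration series), an observed intensity can be inverted into a target concentration estimate. The parameter values in any usage are hypothetical.

```python
# Sketch of a three-parameter Langmuir isotherm and its inverse.
# Assumed form: I(c) = b + a * c / (k + c). Solving for c gives
# c = k * (I - b) / (a - (I - b)).
from dataclasses import dataclass


@dataclass(frozen=True)
class Langmuir:
    a: float  # saturation intensity above background
    k: float  # concentration giving half-maximal specific signal
    b: float  # background (non-specific) intensity

    def intensity(self, c: float) -> float:
        """Predicted probe intensity at target concentration c."""
        return self.b + self.a * c / (self.k + c)

    def concentration(self, intensity: float) -> float:
        """Invert the isotherm; only defined for b < intensity < b + a."""
        specific = intensity - self.b
        if not 0 < specific < self.a:
            raise ValueError("intensity outside the invertible range (b, b + a)")
        return self.k * specific / (self.a - specific)
```

    Fitting a, k and b to spike-in data needs a nonlinear least-squares routine such as `scipy.optimize.curve_fit`; that step is omitted here. The inverse is only defined strictly between the background b and the saturation level b + a, so intensities outside that range raise an error instead of returning a meaningless concentration.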

    Disease Gene Characterization through Large-Scale Co-Expression Analysis

    In the post-genome era, a major goal of biology is to identify specific roles for individual genes. We report a new genomic tool for gene characterization, the UCLA Gene Expression Tool (UGET). Celsius, the largest co-normalized dataset of Affymetrix gene expression microarrays, was used to calculate the correlation between all possible gene pairs on all platforms and to generate stored indexes in a web-searchable format. The size of Celsius makes UGET a powerful gene characterization tool. Starting from a small seed list of known cartilage-selective genes, UGET extended the list by identifying 32 new highly cartilage-selective genes. Of these, 10 were tested and 7 were validated by qPCR, including the novel cartilage-specific genes SDK2 and FLJ41170. In addition, we retrospectively tested how well UGET and other gene expression-based prioritization tools identify disease-causing genes within known linkage intervals. We first demonstrated this utility with UGET using genetically heterogeneous disorders such as Joubert syndrome, microcephaly, neuropsychiatric disorders and type 2 limb girdle muscular dystrophy (LGMD2), and then compared UGET with other gene expression-based prioritization programs, which use small but discrete and well-annotated datasets. Finally, we observed significantly higher correlation between genes in disease networks associated with similar complex or Mendelian disorders. UGET is an invaluable resource for geneticists, permitting the rapid inclusion of expression criteria for anything from one to hundreds of genes in genomic intervals linked to disease. By using thousands of arrays, UGET annotates and prioritizes genes better than other tools, especially for rare tissue disorders or complex multi-tissue biological processes. This information can be critical in prioritizing candidate genes for sequence analysis.
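
    Extending a seed list by co-expression can be illustrated by ranking every other gene by its average correlation with the seed genes. UGET precomputes correlations for all gene pairs over Celsius; this Python sketch instead computes Pearson correlations on the fly over a small hypothetical matrix (gene mapped to expression values across arrays), and the mean-correlation ranking rule is an assumption.

```python
# Sketch of guilt-by-association ranking: score each non-seed gene by its mean
# Pearson correlation with a seed list across arrays. The ranking rule is an
# assumption for illustration; UGET's own scoring may differ.
import math


def pearson(x: list[float], y: list[float]) -> float:
    """Pearson correlation; returns 0.0 if either profile is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)


def rank_candidates(expr: dict[str, list[float]],
                    seeds: list[str]) -> list[tuple[str, float]]:
    """Rank non-seed genes by mean correlation with the seed genes, best first."""
    seed_set = {g for g in seeds if g in expr}
    if not seed_set:
        raise ValueError("no seed genes found in the expression matrix")
    ranked = []
    for gene, profile in expr.items():
        if gene in seed_set:
            continue
        mean_r = sum(pearson(profile, expr[s]) for s in seed_set) / len(seed_set)
        ranked.append((gene, mean_r))
    ranked.sort(key=lambda item: item[1], reverse=True)
    return ranked
```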

    A conserved BDNF, glutamate- and GABA-enriched gene module related to human depression identified by coexpression meta-analysis and DNA variant genome-wide association studies

    Large-scale gene expression (transcriptome) analysis and genome-wide association studies (GWAS) for single nucleotide polymorphisms have generated a considerable amount of gene- and disease-related information, but heterogeneity and various sources of noise have limited the discovery of disease mechanisms. As systematic dataset integration is becoming essential, we developed methods and performed meta-clustering of gene coexpression links in 11 transcriptome studies of postmortem brains from human subjects with major depressive disorder (MDD) and non-psychiatric control subjects. We next tested the top 50 meta-analyzed coexpression modules for enrichment in genes independently identified by GWAS for various sets of disorders. One coexpression module of 88 genes was consistently and significantly associated with GWAS for MDD, other neuropsychiatric disorders and brain functions, and for medical illnesses with elevated clinical risk of depression, but not for other diseases. In support of the superior discriminative power of this novel approach, we observed no significant enrichment for GWAS-related genes in coexpression modules extracted from single studies or in meta-modules built from gene expression data of non-psychiatric control subjects. Genes in the identified module encode proteins implicated in neuronal signaling and structure, including metabotropic glutamate receptors (GRM1, GRM7), GABA receptors (GABRA2, GABRA4), and neurotrophic and development-related proteins [BDNF, reelin (RELN), Ephrin receptors (EPHA3, EPHA5)]. These results are consistent with the current understanding of the molecular mechanisms of MDD and provide a set of putative interacting molecular partners, potentially reflecting components of a functional module across cells and biological pathways that are synchronously recruited in MDD, other brain disorders and MDD-related illnesses.
Collectively, this study demonstrates the importance of integrating transcriptome data, gene coexpression modules and GWAS results for providing novel and complementary approaches to investigate the molecular pathology of MDD and other complex brain disorders. © 2014 Chang et al
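
    The abstract does not name the statistic used to test modules for GWAS enrichment. A one-sided hypergeometric test is one common choice for asking whether a module contains more GWAS-implicated genes than expected by chance; the Python sketch below computes its exact tail probability from set sizes using integer arithmetic.

```python
# Sketch: one-sided hypergeometric test for over-representation of
# GWAS-implicated genes in a coexpression module. Using this particular test
# is an assumption; the study's own enrichment statistic is not stated.
from math import comb


def enrichment_p_value(universe: int, gwas_genes: int,
                       module_size: int, overlap: int) -> float:
    """P(X >= overlap) for X ~ Hypergeometric(population=universe,
    successes=gwas_genes, draws=module_size)."""
    if not (0 <= gwas_genes <= universe and 0 <= module_size <= universe):
        raise ValueError("inconsistent set sizes")
    # Sum the exact probability mass of every overlap at least as large as observed.
    tail = sum(comb(gwas_genes, k) * comb(universe - gwas_genes, module_size - k)
               for k in range(max(overlap, 0), min(gwas_genes, module_size) + 1))
    return tail / comb(universe, module_size)
```

    When 50 modules are tested, the resulting p-values would still need a multiple-testing correction before a module is called significant.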

    Analysis and Interpretation of Variance in Gene Expression Data

    This dissertation brings together four studies under the heading “Analysis and Interpretation of Variance in Gene Expression Data”. First, the concept of “technical variance” is distinguished from that of “biological variance”. In microarray gene expression analysis, technical variance denotes the traditionally high measurement error. The reasons for this error, however, appear to be manifold. Particularly controversial is the effect of cross-hybridization, that is, non-specific binding of RNA fragments to the probes of the array. Some researchers consider this effect the main source of error, while others regard it as negligible. The first two studies show that cross-hybridization is indeed largely responsible for the measurement error in microarray experiments. At the same time, they provide tools for dealing with non-specific binding: a set of new Chip Definition Files and a guide to designing new microarrays. Variance that is based on actual biological differences is called biological variance. In the evaluation of a gene expression experiment, an analysis of dispersion parameters identifies candidate marker transcripts that a conventional mean-based analysis does not find. Mapping these transcripts onto KEGG pathways rules out that they are false-positive hits. The fourth study performs a similarity analysis using correlation coefficients. Evaluation with Kendall's rank correlation yields hypotheses about the functional pathway involved in induced defense in plants.
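
    Kendall's rank correlation compares every pair of samples and counts concordant and discordant orderings, which makes it robust to outliers and to monotone but nonlinear relationships. The Python sketch below implements the tau-b variant, which adjusts for ties; which variant the dissertation used is not stated, so this choice is an assumption.

```python
# Sketch: Kendall's tau-b in O(n^2).
# tau_b = (P - Q) / sqrt((P + Q + T) * (P + Q + U)), where P and Q count
# concordant and discordant pairs, T pairs tied only in x, U pairs tied only in y.
import math


def kendall_tau_b(x: list[float], y: list[float]) -> float:
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equally long sequences with at least 2 values")
    concordant = discordant = ties_x = ties_y = 0
    n = len(x)
    for i in range(n):
        for j in range(i + 1, n):
            dx = x[i] - x[j]
            dy = y[i] - y[j]
            if dx == 0 and dy == 0:
                continue  # tied in both variables: counted in neither term
            if dx == 0:
                ties_x += 1
            elif dy == 0:
                ties_y += 1
            elif (dx > 0) == (dy > 0):
                concordant += 1
            else:
                discordant += 1
    denom = math.sqrt((concordant + discordant + ties_x) *
                      (concordant + discordant + ties_y))
    return (concordant - discordant) / denom if denom else 0.0
```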

    G-stack modulated probe intensities on expression arrays - sequence corrections and signal calibration

    Background: The brightness of the probe spots on expression microarrays is intended to measure the abundance of specific mRNA targets. Probes with runs of at least three guanines (G) in their sequence show abnormally high intensities, which reflect probe effects rather than target concentrations. This G-bias requires correction prior to downstream expression analysis. Results: Longer runs of three or more consecutive G along the probe sequence, and in particular triple degenerated G at its solution end (the (GGG)₁ effect), are associated with exceptionally large probe intensities on GeneChip expression arrays. This intensity bias is related to non-specific hybridization and affects both perfect match and mismatch probes. The (GGG)₁ effect tends to increase gradually in later GeneChip generations. It was found for DNA/RNA as well as for DNA/DNA probe/target hybridization chemistries. Amplification of sample RNA using T7 primers is associated with strong positive amplitudes of the G-bias, whereas alternative amplification protocols using random primers give rise to much smaller and partly even negative amplitudes. We applied position-dependent sensitivity models to analyze probe intensities in the context of all possible short sequence motifs of one to four adjacent nucleotides along the 25-mer probe sequence. Most of the longer motifs are adequately described by a nearest-neighbor (NN) model. In contrast, runs of degenerated guanines require explicit consideration of next-nearest neighbors (GGG terms). Preprocessing methods such as vsn, RMA, dChip, MAS5 and gcRMA do not sufficiently remove the G-bias from the data. Conclusions: Position- and motif-dependent sensitivity models account for sequence effects on oligonucleotide probe intensities. We propose a position-dependent NN+GGG hybrid model to correct the intensity bias associated with probes containing poly-G motifs. It is implemented as a single-chip calibration algorithm for GeneChips that can be applied as a pre-correction step prior to standard preprocessing.
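
    A position-dependent NN+GGG model treats a probe's log sensitivity as a sum of terms: one for the adjacent-nucleotide pair at each position, plus an extra term for each GGG triple at each position. The Python sketch below shows that additive structure and a simple log-scale correction. The weights would have to be estimated from data, and both the convention that the first character of the string is position 1 at the solution end and the plain subtraction are assumptions rather than the published calibration algorithm.

```python
# Sketch of an additive position-dependent NN+GGG sensitivity model.
# Assumption: index 0 of the sequence string is position 1 (the solution end);
# reverse the string first if your probe sequences are stored the other way.
from collections.abc import Mapping


def nn_terms(probe: str) -> list[tuple[int, str]]:
    """(position, dinucleotide) pairs, positions 1-based from the solution end."""
    probe = probe.upper()
    return [(k + 1, probe[k:k + 2]) for k in range(len(probe) - 1)]


def ggg_positions(probe: str) -> list[int]:
    """1-based start positions of every GGG triple (overlapping: GGGG gives two)."""
    probe = probe.upper()
    return [k + 1 for k in range(len(probe) - 2) if probe[k:k + 3] == "GGG"]


def has_ggg1(probe: str) -> bool:
    """True if a GGG triple starts at position 1, i.e. the (GGG)1 motif."""
    return 1 in ggg_positions(probe)


def predicted_log_sensitivity(probe: str,
                              nn_weights: Mapping[tuple[int, str], float],
                              ggg_weights: Mapping[int, float]) -> float:
    """Sum of position-specific NN and GGG terms; missing keys contribute 0."""
    nn = sum(nn_weights.get(term, 0.0) for term in nn_terms(probe))
    ggg = sum(ggg_weights.get(k, 0.0) for k in ggg_positions(probe))
    return nn + ggg


def corrected_log_intensity(log_intensity: float, probe: str,
                            nn_weights: Mapping[tuple[int, str], float],
                            ggg_weights: Mapping[int, float]) -> float:
    """Remove the predicted sequence effect from an observed log intensity."""
    return log_intensity - predicted_log_sensitivity(probe, nn_weights, ggg_weights)
```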

    AffyMAPSDetector: a software tool to characterize Affymetrix GeneChip™ expression arrays with respect to SNPs

    Background: Affymetrix gene expression arrays incorporate paired perfect match (PM) and mismatch (MM) probes to distinguish true signals from those arising from cross-hybridization events. An MM signal often shows greater intensity than the corresponding PM signal; we propose that one underlying cause is the presence of allelic variants arising from single nucleotide polymorphisms (SNPs). To annotate and characterize SNP contributions to anomalous probe binding behavior, we have developed a software tool called AffyMAPSDetector. Results: AffyMAPSDetector can be used to describe any Affymetrix expression GeneChip™ with respect to SNPs. When AffyMAPSDetector was run on GeneChip™ HG-U95Av2 against dbSNP-build-123, we found 7,286 probes (belonging to 2,582 probesets) containing SNPs, of which 325 contained at least one SNP at position 13. Against dbSNP-build-126, 8,758 probes (belonging to 3,002 probesets) contained SNPs, of which 409 contained at least one SNP at position 13. Because the MM probe differs from the PM probe only at this central position, the MM probe can sometimes be the transcript complement, depending on the expressed allele. This information was used to characterize probe measurements reported in a published, well-replicated lung adenocarcinoma study. The total intensity distributions showed that the SNP-containing probes had a larger negative mean intensity difference (PM-MM) and a greater range of this difference than probes without SNPs. In the sample replicates, SNP-containing probes with reproducible intensity ratios were identified, allowing the selection of SNP probesets that yielded unique sample signatures. At the gene expression level, using the (MM-PM) value for SNP-containing probes resulted in different Presence/Absence calls for some genes. Such a change in gene status clearly has the potential to influence downstream clustering and classification results. Conclusion: Output from this tool characterizes SNP-containing probes on GeneChip™ microarrays, thus improving our understanding of factors contributing to expression measurements. The pattern of SNP binding examined so far indicates distinct behavior of the SNP-containing probes and has the potential to help identify new SNPs. Knowing which probes contain SNPs provides flexibility in deciding whether to include or exclude them from gene expression intensity calculations; selected sets of SNP-containing probes produce sample-unique signatures. AffyMAPSDetector information is available at http://www.binf.gmu.edu/weller/BMC_bioinformatics/AffyMapsDetector/index.html
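
    The position-13 reasoning can be made concrete. On Affymetrix expression arrays the MM probe is the PM sequence with its central (13th) base replaced by its complement, so a SNP at that position can make the MM probe the perfect complement of the expressed allele. The Python sketch below builds the MM sequence and locates SNPs within a probe from genomic coordinates; the coordinate convention (1-based, probe and SNP on the same strand and assembly) is an assumption, and the helper names are hypothetical rather than AffyMAPSDetector's own.

```python
# Sketch: build an MM probe from its PM probe and find SNPs inside a probe.
# Assumption: probe start and SNP positions are 1-based coordinates on the
# same strand and genome assembly.
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}
PROBE_LENGTH = 25
MM_POSITION = 13  # 1-based central base that differs between PM and MM


def mismatch_probe(pm: str) -> str:
    """Return the PM sequence with its 13th base replaced by its complement."""
    pm = pm.upper()
    if len(pm) != PROBE_LENGTH:
        raise ValueError("Affymetrix expression probes are 25-mers")
    i = MM_POSITION - 1
    return pm[:i] + COMPLEMENT[pm[i]] + pm[i + 1:]


def snp_offsets(probe_start: int, snp_positions: list[int]) -> list[int]:
    """1-based positions within the probe of the SNPs that fall inside it."""
    probe_end = probe_start + PROBE_LENGTH - 1
    return sorted(p - probe_start + 1 for p in snp_positions
                  if probe_start <= p <= probe_end)


def snp_at_mm_position(probe_start: int, snp_positions: list[int]) -> bool:
    """True if any SNP sits at the central base interrogated by the MM probe."""
    return MM_POSITION in snp_offsets(probe_start, snp_positions)
```

    Because 13 is the middle of a 25-mer, a SNP lands at position 13 regardless of which strand the probe maps to (25 − 13 + 1 = 13); offsets at other positions would need to be mirrored for minus-strand probes.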