31 research outputs found

    Inconsistencies over time in 5% of NetAffx probe-to-gene annotations

    BACKGROUND: DNA microarray probes are designed to match particular mRNA transcripts, often based on expressed sequences such as ESTs or cDNAs, which are frequently incomplete. As a result, the relations between probes and genes can change as the sequence data are updated. However, the reported results of microarray analyses are frequently given just as lists of genes, without any reference to the underlying probes. RESULTS: We show for a particular commercial microarray design that the number of probes associated with some genes changes over time. These changes concern approximately 5% of the probe sets across the history of annotation releases over a two-year span. CONCLUSION: We recommend reporting probe set identifiers when publishing microarray results, and submitting those analyses to public microarray databases, to ensure that the interpretation of the data is updated with the latest set of annotations.
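
As a rough illustration of the kind of comparison this abstract describes, the sketch below counts the probe sets whose associated genes differ between two annotation releases. It is not the authors' procedure: the probe set identifiers, gene names, and both function names are hypothetical, and each release is assumed to be already loaded as a mapping from probe set ID to a set of gene symbols.

```python
def changed_probe_sets(old_release, new_release):
    """Return the probe set IDs whose gene sets differ between two releases.

    Each release is a dict mapping a probe set ID to a set of gene symbols.
    A probe set missing from one release is treated as having no genes there.
    """
    all_ids = set(old_release) | set(new_release)
    return {
        probe_id
        for probe_id in all_ids
        if old_release.get(probe_id, set()) != new_release.get(probe_id, set())
    }


def fraction_changed(old_release, new_release):
    """Fraction of all probe sets (seen in either release) that changed."""
    all_ids = set(old_release) | set(new_release)
    if not all_ids:
        return 0.0
    return len(changed_probe_sets(old_release, new_release)) / len(all_ids)


# Toy data: identifiers and gene names are invented for illustration only.
release_a = {
    "1000_at": {"GENE_A"},
    "1001_at": {"GENE_B"},
    "1002_at": {"GENE_C"},
    "1003_at": {"GENE_D"},
}
release_b = {
    "1000_at": {"GENE_A"},
    "1001_at": {"GENE_B", "GENE_E"},  # gained a second gene association
    "1002_at": {"GENE_C"},
    "1003_at": {"GENE_D"},
}

changed = changed_probe_sets(release_a, release_b)
ratio = fraction_changed(release_a, release_b)
```

Keying the comparison on probe set identifiers rather than gene names is what makes such a diff possible, which is the point of the abstract's recommendation.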

    K2D2: Estimation of protein secondary structure from circular dichroism spectra

    BACKGROUND: Circular dichroism spectroscopy is a widely used technique to analyze the secondary structure of proteins in solution. Predictive methods use the circular dichroism spectra from proteins of known tertiary structure to assess the secondary structure contents of a protein with unknown structure given its circular dichroism spectrum. RESULTS: We developed K2D2, a method with an associated web server to estimate protein secondary structure from circular dichroism spectra. The method uses a self-organized map of spectra from proteins with known structure to deduce a map of protein secondary structure that is used to do the predictions. CONCLUSION: The K2D2 server is publicly accessible at http://www.ogic.ca/projects/k2d2/. It accepts as input a circular dichroism spectrum and outputs the estimated secondary structure content (alpha-helix and beta-strand) of the corresponding protein, as well as an estimated measure of error.

    Linking genes to diseases: it's all in the data

    Genome-wide association analyses on large patient cohorts are generating large sets of candidate disease genes. This is coupled with the availability of ever-increasing genomic databases and a rapidly expanding repository of biomedical literature. Computational approaches to disease-gene association attempt to harness these data sources to identify the most likely disease gene candidates for further empirical analysis by translational researchers, resulting in efficient identification of genes of diagnostic, prognostic and therapeutic value. Existing computational methods analyze gene structure and sequence, functional annotation of candidate genes, characteristics of known disease genes, gene regulatory networks, protein-protein interactions, data from animal models and disease phenotype. To date, a few studies have successfully applied computational analysis of clinical phenotype data for specific diseases and shown genetic associations. In the near future, computational strategies will be facilitated by improved integration of clinical and computational research, and by increased availability of clinical phenotype data in a format accessible to computational approaches

    Amplification of the Gene Ontology annotation of Affymetrix probe sets

    BACKGROUND: The annotations of Affymetrix DNA microarray probe sets with Gene Ontology terms are carefully selected for correctness. This results in very accurate but incomplete annotations, which is not always desirable for microarray experiment evaluation. RESULTS: Here we present a protocol to amplify the set of Gene Ontology annotations associated with Affymetrix DNA microarray probe sets using information from related databases. CONCLUSION: Predicted novel annotations and the evidence producing them can be accessed at Probe2GO: . Scripts are available on demand.

    Information extraction from full text scientific articles: Where are the keywords?

    BACKGROUND: To date, many of the methods for extracting biological information from scientific articles are restricted to the abstract of the article. However, full-text articles in electronic form, which offer larger sources of data, are now available. Several questions arise as to whether the effort of scanning full-text articles is worthwhile, and whether the information that can be extracted from the different sections of an article is relevant. RESULTS: In this work we addressed those questions, showing that the keyword content of the different sections of a standard scientific article (abstract, introduction, methods, results, and discussion) is very heterogeneous. CONCLUSIONS: Although the abstract contains the best ratio of keywords to total words, other sections of the article may be a better source of biologically relevant data.
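
The ratio of keywords to total words mentioned in this abstract can be sketched as follows. This is only an illustration under stated assumptions, not the authors' pipeline: the tokenization rule, the function name `keyword_ratio`, the keyword list, and the example section texts are all hypothetical.

```python
import re


def keyword_ratio(text, keywords):
    """Return the fraction of words in `text` that belong to `keywords`.

    Words are lowercased runs of letters and digits; `keywords` must be a
    set of lowercase single-word terms (an assumption of this sketch).
    """
    words = re.findall(r"[a-z0-9]+", text.lower())
    if not words:
        return 0.0
    hits = sum(1 for word in words if word in keywords)
    return hits / len(words)


# Invented example sections and keyword list, for illustration only.
sections = {
    "abstract": "Oct4 regulates pluripotency genes",
    "methods": "Cells were grown and RNA was extracted for microarray analysis",
}
keywords = {"oct4", "pluripotency", "rna", "microarray"}

ratios = {name: keyword_ratio(text, keywords) for name, text in sections.items()}
```

In this toy case both sections contain two keywords, but the shorter abstract has the higher ratio. That is the distinction the abstract draws: a section can have a lower keyword density and still hold as much or more relevant material in absolute terms.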

    Oct4 Targets Regulatory Nodes to Modulate Stem Cell Function

    Stem cells are characterized by two defining features: the ability to self-renew and the ability to differentiate into highly specialized cell types. The POU homeodomain transcription factor Oct4 (Pou5f1) is an essential mediator of the embryonic stem cell state and has been implicated in lineage-specific differentiation, adult stem cell identity, and cancer. Recent descriptions of the regulatory networks that maintain ‘ES’ have highlighted a dual role for Oct4 in the transcriptional activation of genes required to maintain self-renewal and pluripotency while concomitantly repressing genes that facilitate lineage-specific differentiation. However, the molecular mechanism by which Oct4 mediates differential activation or repression at these loci, to either maintain stem cell identity or facilitate the emergence of alternate transcriptional programs required for the realization of lineage, remains to be elucidated. To further investigate Oct4 function, we employed gene expression profiling together with a robust statistical analysis to identify genes highly correlated to Oct4. Gene Ontology analysis to categorize overrepresented genes has led to the identification of themes that may prove essential to stem cell identity, including chromatin structure, nuclear architecture, cell cycle control, DNA repair, and apoptosis. Our experiments have identified previously unappreciated roles for Oct4 in, firstly, regulating chromatin structure in a state consistent with self-renewal and pluripotency and, secondly, facilitating the expression of genes that keep the cell poised to respond to cues that lead to differentiation. Together, these data define the mechanism by which Oct4 orchestrates cellular regulatory pathways to enforce the stem cell state and provide important insight into stem cell function and cancer.

    Systematic Association of Genes to Phenotypes by Genome and Literature Mining

    One of the major challenges of functional genomics is to unravel the connection between genotype and phenotype. So far no global analysis has attempted to explore those connections in the light of the large phenotypic variability seen in nature. Here, we use an unsupervised, systematic approach for associating genes and phenotypic characteristics that combines literature mining with comparative genome analysis. We first mine the MEDLINE literature database for terms that reflect phenotypic similarities of species. Subsequently we predict the likely genomic determinants: genes specifically present in the respective genomes. In a global analysis involving 92 prokaryotic genomes we retrieve 323 clusters containing a total of 2,700 significant gene–phenotype associations. Some clusters contain mostly known relationships, such as genes involved in motility or plant degradation, often with additional hypothetical proteins associated with those phenotypes. Other clusters comprise unexpected associations; for example, a group of terms related to food and spoilage is linked to genes predicted to be involved in bacterial food poisoning. Among the clusters, we observe an enrichment of pathogenicity-related associations, suggesting that the approach reveals many novel genes likely to play a role in infectious diseases

    Recent developments in StemBase: a tool to study gene expression in human and murine stem cells

    BACKGROUND: Currently one of the largest online repositories for human and mouse stem cell gene expression data, StemBase was first designed as a simple web-interface to DNA microarray data generated by the Canadian Stem Cell Network to facilitate the discovery of gene functions relevant to stem cell control and differentiation. FINDINGS: Since its creation, StemBase has grown in both size and scope into a system with analysis tools that examine either the whole database at once, or slices of data, based on tissue type, cell type or gene of interest. As of September 1, 2008, StemBase contains gene expression data (microarray and Serial Analysis of Gene Expression) from 210 stem cell samples in 60 different experiments. CONCLUSION: StemBase can be used to study gene expression in human and murine stem cells and is available at http://www.stembase.ca.

    Gene function in early mouse embryonic stem cell differentiation

    BACKGROUND: Little is known about the genes that drive embryonic stem cell differentiation. However, such knowledge is necessary if we are to exploit the therapeutic potential of stem cells. To uncover the genetic determinants of mouse embryonic stem cell (mESC) differentiation, we have generated and analyzed 11-point time-series of DNA microarray data for three biologically equivalent but genetically distinct mESC lines (R1, J1, and V6.5) undergoing undirected differentiation into embryoid bodies (EBs) over a period of two weeks. RESULTS: We identified the initial 12 hour period as reflecting the early stages of mESC differentiation and studied probe sets showing consistent changes of gene expression in that period. Gene function analysis indicated significant up-regulation of genes related to regulation of transcription and mRNA splicing, and down-regulation of genes related to intracellular signaling. Phylogenetic analysis indicated that the genes showing the largest expression changes were more likely to have originated in metazoans. The probe sets with the most consistent gene changes in the three cell lines represented 24 down-regulated and 12 up-regulated genes, all with closely related human homologues. Whereas some of these genes are known to be involved in embryonic developmental processes (e.g. Klf4, Otx2, Smn1, Socs3, Tagln, Tdgf1), our analysis points to others (such as transcription factor Phf21a, extracellular matrix related Lama1 and Cyr61, or endoplasmic reticulum related Sc4mol and Scd2) that have not been previously related to mESC function. The majority of identified functions were related to transcriptional regulation, intracellular signaling, and cytoskeleton. Genes involved in other cellular functions important in ESC differentiation such as chromatin remodeling and transmembrane receptors were not observed in this set. 
    CONCLUSION: Our analysis profiles gene expression at a very early stage of mESC differentiation for the first time, and identifies a functional and phylogenetic signature for the genes involved. The data generated constitute a valuable resource for further studies. All DNA microarray data used in this study are available in the StemBase database of stem cell gene expression data [1] and in NCBI's GEO database.

    Computational disease gene identification: a concert of methods prioritizes type 2 diabetes and obesity candidate genes

    Genome-wide experimental methods to identify disease genes, such as linkage analysis and association studies, generate increasingly large candidate gene sets for which comprehensive empirical analysis is impractical. Computational methods employ data from a variety of sources to identify the most likely candidate disease genes from these gene sets. Here, we review seven independent computational disease gene prioritization methods, and then apply them in concert to the analysis of 9556 positional candidate genes for type 2 diabetes (T2D) and the related trait obesity. We generate and analyse a list of nine primary candidate genes for T2D and five for obesity. Two genes, LPL and BCKDHA, are common to these two sets. We also present a set of secondary candidates for T2D (94 genes) and for obesity (116 genes), with 58 genes common to both diseases.