26 research outputs found

    MLTrends: Graphing MEDLINE term usage over time

    The MEDLINE database of medical literature is routinely used by researchers and doctors to find articles pertaining to their area of interest. Insight into historical changes in research areas and use of scientific language may be gained by chronological analysis of the 18 million records currently in the database; however, such analysis is generally complex and time consuming. The authors’ MLTrends web application graphs term usage in MEDLINE over time, allowing the determination of emergence dates for biomedical terms and historical variations in term usage intensity. Terms considered are individual words or quoted phrases, which may be combined using Boolean operators. For multiple terms simultaneously, MLTrends can plot the number of records in MEDLINE per year whose titles or abstracts match each queried term. The MEDLINE database is stored and indexed on the MLTrends server, allowing queries to be completed and graphs generated in less than one second. Queries may be performed on all titles and/or abstracts in MEDLINE and can include stop words. The resulting graphs may be normalized by total publications or words per year to facilitate term usage comparison between years. This makes MLTrends a powerful tool for rapid graphical evaluation of the evolution of biomedical research and language. MLTrends may be used at: http://www.ogic.ca/mltrend
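
    The per-year counting and normalization described in this abstract can be illustrated with a minimal sketch. It is not the MLTrends implementation: the `yearly_usage` function, the `SAMPLE_RECORDS` data, and the case-insensitive substring matching are all assumptions made for illustration, and Boolean operators are not handled.

```python
from collections import Counter

# Hypothetical toy records: (publication year, title or abstract text).
SAMPLE_RECORDS = [
    (1990, "PCR amplification of DNA"),
    (1990, "Insulin therapy outcomes"),
    (1991, "PCR in clinical diagnostics"),
    (1991, "Real-time PCR methods"),
    (1991, "Cardiac imaging review"),
]


def yearly_usage(records, term):
    """Return {year: (matching records, fraction of that year's records)}.

    Matching is a plain case-insensitive substring test (an assumption);
    the fraction normalizes by total publications in the same year.
    """
    totals = Counter()
    hits = Counter()
    needle = term.lower()
    for year, text in records:
        totals[year] += 1
        if needle in text.lower():
            hits[year] += 1
    return {year: (hits[year], hits[year] / totals[year]) for year in sorted(totals)}


if __name__ == "__main__":
    for year, (count, fraction) in yearly_usage(SAMPLE_RECORDS, "pcr").items():
        print(year, count, round(fraction, 3))
```

    Normalizing by the yearly total is what makes years with very different publication volumes comparable.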

    A General Model of Codon Bias Due to GC Mutational Bias

    BACKGROUND: In spite of extensive research on the effect of mutation and selection on codon usage, a general model of codon usage bias due to mutational bias has been lacking. Because most amino acids allow synonymous GC content changing substitutions in the third codon position, the overall GC bias of a genome or genomic region is highly correlated with GC3, a measure of third position GC content. For individual amino acids as well, usage of G/C-ending codons generally increases with increasing GC bias and decreases with increasing AT bias. Arginine and leucine, amino acids that allow GC-changing synonymous substitutions in the first and third codon positions, have codons which may be expected to show different usage patterns. PRINCIPAL FINDINGS: In analyzing codon usage bias in hundreds of prokaryotic and plant genomes and in human genes, we find that two G-ending codons, AGG (arginine) and TTG (leucine), unlike all other G/C-ending codons, show overall usage that decreases with increasing GC bias, contrary to the usual expectation that G/C-ending codon usage should increase with increasing genomic GC bias. Moreover, the usage of some codons appears nonlinear, even nonmonotone, as a function of GC bias. To explain these observations, we propose a continuous-time Markov chain model of GC-biased synonymous substitution. This model correctly predicts the qualitative usage patterns of all codons, including nonlinear codon usage in isoleucine, arginine and leucine. The model accounts for 72%, 64% and 52% of the observed variability of codon usage in prokaryotes, plants and humans, respectively. When codons are grouped based on common GC content, 87%, 80% and 68% of the variation in usage is explained for prokaryotes, plants and humans, respectively. CONCLUSIONS: The model clarifies the sometimes-counterintuitive effects that GC mutational bias can have on codon usage, quantifies the influence of GC mutational bias and provides a natural null model relative to which other influences on codon bias may be measured
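
    To give a feel for how a continuous-time Markov chain over synonymous codons can yield nonmonotone usage, here is a toy sketch for arginine. It is not the paper's model: the rate rule (each single-nucleotide synonymous change occurs at a rate that depends only on whether the new nucleotide is G/C or A/T), the `gc_bias` parameter, and all function names are assumptions chosen for illustration.

```python
# Toy CTMC over the six arginine codons (an illustrative assumption,
# not the published model or its parameterization).
ARG = ["CGT", "CGC", "CGA", "CGG", "AGA", "AGG"]


def target_rate(nucleotide, gc_bias):
    """Assumed rate of mutating *to* a nucleotide, given GC bias in (0, 1)."""
    return gc_bias / 2 if nucleotide in "GC" else (1 - gc_bias) / 2


def rate_matrix(codons, gc_bias):
    """Generator matrix allowing only single-nucleotide synonymous changes."""
    n = len(codons)
    q = [[0.0] * n for _ in range(n)]
    for i, a in enumerate(codons):
        for j, b in enumerate(codons):
            diffs = [k for k in range(3) if a[k] != b[k]]
            if len(diffs) == 1:
                q[i][j] = target_rate(b[diffs[0]], gc_bias)
        q[i][i] = -sum(q[i])  # diagonal makes each row sum to zero
    return q


def stationary(codons, gc_bias, steps=5000):
    """Stationary codon frequencies via iteration of the uniformized chain."""
    q = rate_matrix(codons, gc_bias)
    n = len(codons)
    scale = 1.1 * max(-q[i][i] for i in range(n))  # exceeds every exit rate
    pi = [1.0 / n] * n
    for _ in range(steps):
        pi = [pi[j] + sum(pi[i] * q[i][j] for i in range(n)) / scale
              for j in range(n)]
    return dict(zip(codons, pi))


if __name__ == "__main__":
    for g in (0.2, 0.4, 0.6, 0.8):
        freqs = stationary(ARG, g)
        print(g, round(freqs["AGG"], 4), round(freqs["CGC"], 4))
```

    Under these assumptions the AGG frequency rises and then falls as `gc_bias` grows, while CGC rises throughout, because AGG competes with the more GC-rich CGG and CGC codons. This only mirrors the qualitative pattern the abstract reports; it makes no claim about the fitted percentages.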

    Taxonomic colouring of phylogenetic trees of protein sequences

    BACKGROUND: Phylogenetic analyses of protein families are used to define the evolutionary relationships between homologous proteins. The interpretation of protein-sequence phylogenetic trees requires the examination of the taxonomic properties of the species associated with those sequences. However, there is no online tool to facilitate this interpretation, for example, by automatically attaching taxonomic information to the nodes of a tree, or by interactively colouring the branches of a tree according to any combination of taxonomic divisions. This is especially problematic if the tree contains on the order of hundreds of sequences, which, given the accelerated increase in the size of the protein sequence databases, is a situation that is becoming common. RESULTS: We have developed PhyloView, a web-based tool for colouring phylogenetic trees according to arbitrary taxonomic properties of the species represented in a protein sequence phylogenetic tree. Provided that the tree contains SwissProt, SpTrembl, or GenBank protein identifiers, the tool retrieves the taxonomic information from the corresponding database. A colour picker displays a summary of the findings and allows the user to associate colours with the leaves of the tree according to any number of taxonomic partitions. Then, the colours are propagated to the branches of the tree. CONCLUSION: PhyloView can be used at . A tutorial, the software with documentation, and GPL licensed source code, can be accessed at the same web address
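
    The step of propagating leaf colours to branches can be sketched as follows. The abstract does not state the propagation rule, so the one used here (an internal branch takes a colour only when all of its children share it, otherwise a neutral colour) is an assumption, as are the nested-tuple tree encoding, the `colour_tree` name, and the placeholder leaf identifiers.

```python
def colour_tree(node, leaf_colour, mixed="black"):
    """Annotate a tree with colours, working from the leaves upward.

    `node` is a leaf name (str) or a tuple of child nodes. An internal
    node inherits a colour only if every child has that same colour
    (assumed rule); otherwise it gets the neutral `mixed` colour.
    """
    if isinstance(node, str):
        return {"name": node, "colour": leaf_colour.get(node, mixed), "children": []}
    children = [colour_tree(child, leaf_colour, mixed) for child in node]
    colours = {child["colour"] for child in children}
    colour = colours.pop() if len(colours) == 1 else mixed
    return {"name": None, "colour": colour, "children": children}


# Hypothetical example: P1..P4 stand in for protein identifiers, and the
# colours stand in for a user-chosen taxonomic partition.
EXAMPLE_TREE = (("P1", "P2"), ("P3", "P4"))
EXAMPLE_COLOURS = {"P1": "red", "P2": "red", "P3": "blue", "P4": "red"}

if __name__ == "__main__":
    annotated = colour_tree(EXAMPLE_TREE, EXAMPLE_COLOURS)
    print(annotated["colour"], [c["colour"] for c in annotated["children"]])
```

    A rule of this kind makes taxonomically uniform clades stand out as single-colour subtrees, which is useful when a tree holds hundreds of sequences.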

    ChIP on SNP-chip for genome-wide analysis of human histone H4 hyperacetylation

    BACKGROUND: SNP microarrays are designed to genotype Single Nucleotide Polymorphisms (SNPs). These microarrays report hybridization of DNA fragments and therefore can be used for the purpose of detecting genomic fragments. RESULTS: Here, we demonstrate that a SNP microarray can be effectively used in this way to perform chromatin immunoprecipitation (ChIP) on chip as an alternative to tiling microarrays. We illustrate this novel application by mapping whole genome histone H4 hyperacetylation in human myoblasts and myotubes. We detect clusters of hyperacetylated histone H4, often spanning across up to 300 kilobases of genomic sequence. Using complementary genome-wide analyses of gene expression by DNA microarray we demonstrate that these clusters of hyperacetylated histone H4 tend to be associated with expressed genes. CONCLUSION: The use of a SNP array for a ChIP-on-chip application (ChIP on SNP-chip) will be of great value to laboratories whose interest is the determination of general rules regarding the relationship of specific chromatin modifications to transcriptional status throughout the genome and to examine the asymmetric modification of chromatin at heterozygous loci.

    Gene function in early mouse embryonic stem cell differentiation

    BACKGROUND: Little is known about the genes that drive embryonic stem cell differentiation. However, such knowledge is necessary if we are to exploit the therapeutic potential of stem cells. To uncover the genetic determinants of mouse embryonic stem cell (mESC) differentiation, we have generated and analyzed 11-point time-series of DNA microarray data for three biologically equivalent but genetically distinct mESC lines (R1, J1, and V6.5) undergoing undirected differentiation into embryoid bodies (EBs) over a period of two weeks. RESULTS: We identified the initial 12 hour period as reflecting the early stages of mESC differentiation and studied probe sets showing consistent changes of gene expression in that period. Gene function analysis indicated significant up-regulation of genes related to regulation of transcription and mRNA splicing, and down-regulation of genes related to intracellular signaling. Phylogenetic analysis indicated that the genes showing the largest expression changes were more likely to have originated in metazoans. The probe sets with the most consistent gene changes in the three cell lines represented 24 down-regulated and 12 up-regulated genes, all with closely related human homologues. Whereas some of these genes are known to be involved in embryonic developmental processes (e.g. Klf4, Otx2, Smn1, Socs3, Tagln, Tdgf1), our analysis points to others (such as transcription factor Phf21a, extracellular matrix related Lama1 and Cyr61, or endoplasmic reticulum related Sc4mol and Scd2) that have not been previously related to mESC function. The majority of identified functions were related to transcriptional regulation, intracellular signaling, and cytoskeleton. Genes involved in other cellular functions important in ESC differentiation such as chromatin remodeling and transmembrane receptors were not observed in this set. 
    CONCLUSION: Our analysis profiles gene expression at a very early stage of mESC differentiation for the first time, and identifies a functional and phylogenetic signature for the genes involved. The data generated constitute a valuable resource for further studies. All DNA microarray data used in this study are available in the StemBase database of stem cell gene expression data [1] and in the NCBI's GEO database

    Recent developments in StemBase: a tool to study gene expression in human and murine stem cells

    BACKGROUND: Currently one of the largest online repositories for human and mouse stem cell gene expression data, StemBase was first designed as a simple web-interface to DNA microarray data generated by the Canadian Stem Cell Network to facilitate the discovery of gene functions relevant to stem cell control and differentiation. FINDINGS: Since its creation, StemBase has grown in both size and scope into a system with analysis tools that examine either the whole database at once, or slices of data, based on tissue type, cell type or gene of interest. As of September 1, 2008, StemBase contains gene expression data (microarray and Serial Analysis of Gene Expression) from 210 stem cell samples in 60 different experiments. CONCLUSION: StemBase can be used to study gene expression in human and murine stem cells and is available at http://www.stembase.ca.

    Detection of Alpha-Rod Protein Repeats Using a Neural Network and Application to Huntingtin

    A growing number of solved protein structures display an elongated structural domain, denoted here as alpha-rod, composed of stacked pairs of anti-parallel alpha-helices. Alpha-rods are flexible and expose a large surface, which makes them suitable for protein interaction. Although alpha-rods most likely originated by tandem duplication of a two-helix unit, detecting them by sequence similarity between repeats works poorly. Here, we show that alpha-rod repeats can be detected using a neural network. The network detects more repeats than are identified by domain databases using multiple profiles, with a low level of false positives (<10%). We identify alpha-rod repeats in approximately 0.4% of proteins in eukaryotic genomes. We then investigate the results for all human proteins, identifying alpha-rod repeats for the first time in six protein families, including proteins STAG1-3, SERAC1, and PSMD1-2 & 5. We also characterize a short version of these repeats in eight protein families of Archaeal, Bacterial, and Fungal species. Finally, we demonstrate the utility of these predictions in directing experimental work to demarcate three alpha-rods in huntingtin, a protein mutated in Huntington's disease. Using yeast two-hybrid analysis and an immunoprecipitation technique, we show that the huntingtin fragments containing alpha-rods associate with each other. This is the first definition of domains in huntingtin and the first validation of predicted interactions between fragments of huntingtin, which sets up directions toward functional characterization of this protein. An implementation of the repeat detection algorithm is available as a Web server with a simple graphical output: http://www.ogic.ca/projects/ard. This can be further visualized using BiasViz, a graphic tool for representation of multiple sequence alignments

    BIDCHIPS: bias decomposition and removal from ChIP-seq data clarifies true binding signal and its functional correlates

    BACKGROUND: Unraveling transcriptional regulatory networks is a central problem in molecular biology and, in this quest, chromatin immunoprecipitation and sequencing (ChIP-seq) technology has given us the unprecedented ability to identify sites of protein-DNA binding and histone modification genome-wide. However, multiple systemic and procedural biases hinder harnessing the full potential of this technology. Previous studies have addressed this problem, but a thorough characterization of different, interacting biases on ChIP-seq signals is still lacking. RESULTS: Here, we present a novel framework where the genome-wide ChIP-seq signal is viewed as being quantifiably influenced by different, measurable sources of bias, which can then be computationally subtracted away. We use a compendium of 123 human ENCODE ChIP-seq datasets to build regression models that tell us how much of a ChIP-seq signal can be attributed to mappability, GC-content, chromatin accessibility, and factors represented in input DNA and IgG controls. When we use the model to separate out these non-binding influences from the ChIP-seq signal, we obtain a purified signal that associates better with TF-DNA-binding motifs than do other measures of peak significance. We also carry out a multiscale analysis that reveals how ChIP-seq signal biases differ across different scales. Finally, we investigate previously reported associations between gene expression and ChIP-seq signals at transcription start sites. We show that our model can be used to discriminate ChIP-seq signals that are truly related to gene expression from those that are merely correlated by virtue of bias, in particular chromatin accessibility bias, which shows up in ChIP-seq signals and also relates to gene expression. CONCLUSIONS: Our study provides new insights into the behavior of ChIP-seq signal biases and proposes a novel mitigation framework that improves results compared to existing techniques. With ChIP-seq now being the central technology for studying transcriptional regulation, it is crucial to accurately characterize, quantify, and adjust for the genome-wide effects of biases affecting ChIP-seq. Our study also emphasizes that properly accounting for confounders in ChIP-seq data is of paramount importance for obtaining biologically accurate insights into the workings of the complex regulatory mechanisms in living organisms. R and MATLAB packages implementing the framework can be obtained from http://www.perkinslab.ca/Software.html
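
    The idea of regressing a signal on a measurable bias and subtracting the fitted part can be sketched in a few lines. This is a deliberately reduced illustration, not the BIDCHIPS packages: it uses a single covariate and ordinary least squares, whereas the abstract describes models with several covariates, and the names `fit_line` and `remove_bias` as well as the toy numbers are assumptions.

```python
def fit_line(x, y):
    """Ordinary least squares for y ~ intercept + slope * x."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def remove_bias(signal, covariate):
    """Return the residual signal after subtracting the fitted bias term."""
    intercept, slope = fit_line(covariate, signal)
    return [s - (intercept + slope * c) for s, c in zip(signal, covariate)]


# Hypothetical toy data: per-bin GC content and a signal that is mostly a
# linear function of GC content, plus one bin with extra enrichment.
GC_CONTENT = [0.3, 0.4, 0.5, 0.6, 0.7]
RAW_SIGNAL = [1 + 2 * gc for gc in GC_CONTENT]
RAW_SIGNAL[2] += 5.0  # the bin with signal not explained by GC content

if __name__ == "__main__":
    print([round(r, 3) for r in remove_bias(RAW_SIGNAL, GC_CONTENT)])
```

    In this toy setting the residual is largest in the bin whose signal the covariate cannot explain, which is the intuition behind treating the residual as a purified signal.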