17 research outputs found

    A functional genomics catalogue of activated transcription factors during pathogenesis of pneumococcal disease

    Background: Association analysis is an alternative to conventional family-based methods for locating genes or quantitative trait loci (QTL), and it provides relatively high resolution in defining the genome position of a gene or QTL. Seed protein and oil concentration are quantitative traits determined by the interaction among many genes with small to moderate genetic effects and by their interaction with the environment. In this study, a genome-wide association study (GWAS) was performed to identify QTL controlling seed protein and oil concentration in 298 soybean germplasm accessions exhibiting a wide range of seed protein and oil content. Results: A total of 55,159 single nucleotide polymorphisms (SNPs) were genotyped using various methods, including Illumina Infinium and GoldenGate assays, and 31,954 markers with minor allele frequency >0.10 were used to estimate linkage disequilibrium (LD) in heterochromatic and euchromatic regions. In euchromatic regions, the mean LD (r²) rapidly declined to 0.2 within 360 Kbp, whereas the mean LD declined to 0.2 at 9,600 Kbp in heterochromatic regions. The GWAS results identified 40 SNPs in 17 different genomic regions significantly associated with seed protein. Of these, the five SNPs with the highest associations and seven adjacent SNPs were located in the 27.6-30.0 Mbp region of Gm20. A major seed protein QTL has previously been mapped to the same location, and potential candidate genes have recently been identified in this region. The GWAS results also detected 25 SNPs in 13 different genomic regions associated with seed oil. Of these markers, seven SNPs had a significant association with both protein and oil. Conclusions: This research indicated that GWAS not only identified most of the previously reported QTL controlling seed protein and oil, but also resulted in narrower genomic regions than the regions reported as containing these QTL. The narrower GWAS-defined genome regions will allow more precise marker-assisted allele selection and will expedite positional cloning of the causal gene(s).
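
The abstract above reports linkage disequilibrium as r² between SNP pairs. As a minimal sketch of what that statistic measures, the snippet below computes r² for two biallelic SNPs from phased haplotypes coded 0/1. The function name `ld_r2` and the input layout are illustrative assumptions; the abstract does not say which software or estimator the authors used.

```python
def ld_r2(haplotypes):
    """Return r^2 between two biallelic SNPs.

    `haplotypes` is a list of (a, b) pairs, one per haplotype, where a and b
    are the alleles (0 or 1) at the first and second SNP. This input format
    is an assumption made for illustration.
    """
    n = len(haplotypes)
    if n == 0:
        raise ValueError("at least one haplotype is required")
    p_a = sum(a for a, _ in haplotypes) / n           # frequency of allele 1 at SNP A
    p_b = sum(b for _, b in haplotypes) / n           # frequency of allele 1 at SNP B
    p_ab = sum(1 for a, b in haplotypes if a == 1 and b == 1) / n
    d = p_ab - p_a * p_b                              # disequilibrium coefficient D
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denom == 0:
        raise ValueError("r^2 is undefined when a SNP is monomorphic")
    return d * d / denom


# Hypothetical toy data: alleles always co-occur, then occur independently.
linked = [(1, 1), (1, 1), (0, 0), (0, 0)]
unlinked = [(1, 1), (1, 0), (0, 1), (0, 0)]
print(ld_r2(linked), ld_r2(unlinked))
```

Averaging such pairwise values within bins of physical distance gives the kind of LD decay curve the abstract summarises (mean r² falling to 0.2 within a given distance).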

    Comparative GO: a web application for comparative Gene Ontology and Gene Ontology-based gene selection in bacteria

    Extent: 8p. The primary means of classifying new functions for genes and proteins relies on Gene Ontology (GO), which defines genes/proteins using a controlled vocabulary in terms of their Molecular Function, Biological Process and Cellular Component. The challenge is to present this information so that researchers can compare and discover patterns in multiple datasets using visually comprehensible and user-friendly statistical reports. Importantly, while there are many GO resources available for eukaryotes, none are suitable for simultaneous graphical and statistical comparison between multiple datasets. In addition, none of them provides comprehensive resources for bacteria. Using Streptococcus pneumoniae as a model, we identified and collected GO resources, including genes, proteins, taxonomy and GO relationships, from NCBI, UniProt and GO organisations. We then designed database tables in a PostgreSQL database server and developed a Java application to extract data from the source files and load them into the database automatically. We developed a PHP web application based on the Model-View-Controller architecture, and used a specific data structure as well as current and novel algorithms to estimate GO graph parameters. We designed different navigation and visualization methods on the graphs and integrated these into graphical reports. This tool is particularly significant when comparing GO groups between multiple samples (including those of pathogenic bacteria) from different sources simultaneously. Comparing GO protein distribution among up- or down-regulated genes from different samples can improve understanding of biological pathways and mechanism(s) of infection. It can also aid in the discovery of genes associated with specific function(s) for investigation as novel vaccine or therapeutic targets.
    Mario Fruzangohar, Esmaeil Ebrahimie, Abiodun D. Ogunniyi, Layla K. Mahdi, James C. Paton, David L. Adelso
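
The comparison of GO distributions across samples described above can be pictured with a small sketch. It computes, for each GO term, the fraction of genes annotated with that term in each sample and lays the results side by side. The function names, gene identifiers and annotations are hypothetical, and this is not the algorithm or data structure used by the Comparative GO application, which the abstract does not detail.

```python
from collections import Counter


def go_distribution(gene_to_terms, genes):
    """Fraction of `genes` annotated with each GO term."""
    counts = Counter()
    for gene in genes:
        # Use a set so a term listed twice for one gene is counted once.
        for term in set(gene_to_terms.get(gene, ())):
            counts[term] += 1
    total = len(genes)
    if total == 0:
        return {}
    return {term: n / total for term, n in counts.items()}


def compare_samples(gene_to_terms, samples):
    """Return {term: {sample_name: fraction}} for a side-by-side comparison."""
    dists = {name: go_distribution(gene_to_terms, genes)
             for name, genes in samples.items()}
    terms = sorted({term for dist in dists.values() for term in dist})
    return {term: {name: dists[name].get(term, 0.0) for name in samples}
            for term in terms}


# Hypothetical annotations and gene lists, for illustration only.
annotations = {
    "g1": ["DNA binding", "membrane"],
    "g2": ["transport"],
    "g3": ["transport", "membrane"],
    "g4": ["DNA binding"],
}
samples = {"up": ["g1", "g4"], "down": ["g2", "g3"]}
for term, row in compare_samples(annotations, samples).items():
    print(term, row)
```

A real comparison would also need a statistical test for each term; which test is appropriate is left open here because the abstract does not name one.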

    Biomedical literature mining.

    Thousands of biomedical articles are published every year, containing many newly discovered biological interactions and functions. Manually reading and classifying this information is a difficult and laborious task. Literature mining provides mechanisms and tools to automate the process of extracting biological relationships, storing them in biological databases, and finally analysing and presenting them in a biologically meaningful way. In the first stage of literature mining, articles are parsed and segmented, and sentences are separated, tokenized and finally annotated with part-of-speech (POS) tags. POS tagging is the most challenging part because the training corpus is small relative to the large number of biological names, which limits the lexicon. There are a number of solutions to this problem, including extending the lexicon manually or using character features of the word, but there is no empirical comparison between the different solutions. We therefore developed a complete set of tools, including an article parser, segmenter, sentence detector, sentence tokeniser, POS tagger and noun phrase detector, using Java and PostgreSQL. We tailored these tools for biomedical texts, empirically compared them with other tools, and demonstrated increased efficiency of our tools compared to others. Once biological relationships are extracted, they are ready to be stored in databases to be used and shared by others. There is a wide range of databases that store annotation data related to genes, proteins and other biological entities. Among them, the Gene Ontology annotation database is the key database, connecting all the other biological entities through a standard vocabulary. Gene Ontology (GO) is a controlled vocabulary for annotating proteins based on their molecular function, biological process and cellular component. There are a number of public databases that provide data regarding GO and GO-protein relationships.
    We collected all relevant data from several public databases and built our specialized, updatable GO database on the PostgreSQL platform. GO classification in a particular sample of genes (up/down regulated) or in the whole genome of a species can reveal the biological mechanisms related to its activity. Moreover, comparing the GO classification of a species under different biological conditions can elucidate its biological pathways, which can result in the discovery of novel genes to be used in therapies. We developed a web server using the PHP MVC framework, connected to our specialized GO database. In this web server we developed novel visual and statistical methods to perform GO comparisons among multiple samples and genomes. We also included transcriptome-based gene expression levels in GO analysis, resulting in novel, meaningful biological reports. This also made it possible to compare whole-genome gene expression across multiple biological conditions. Furthermore, we devised a method to dynamically construct and visualize GO regulatory networks for any gene set sample. Such a network can reveal regulatory relationships between genes, helping to explain the correlated expression of genes. The topology of such a network classifies genes based on their connections, and can be used as a new method to detect important genes based on their function as well as their connectivity in the network. We demonstrated the efficiency of our developed methods in our web server with several case studies using previously published transcriptome data.
    Thesis (Ph.D.) -- University of Adelaide, School of Molecular and Biomedical Science, 201
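
One of the options the thesis abstract mentions for tagging words missing from the lexicon is to use character features of the word. A minimal sketch of that idea, assuming a suffix lookup learned from a small tagged word list, is shown below. The function names, the training words and the Penn Treebank style tags are illustrative assumptions and are not taken from the thesis.

```python
from collections import Counter, defaultdict


def build_suffix_table(tagged_words, max_len=4):
    """Count how often each word-final suffix (1..max_len chars) carries each tag."""
    table = defaultdict(Counter)
    for word, tag in tagged_words:
        w = word.lower()
        for k in range(1, min(max_len, len(w)) + 1):
            table[w[-k:]][tag] += 1
    return table


def guess_tag(word, table, max_len=4, default="NN"):
    """Guess a tag for an unknown word from its longest suffix seen in training."""
    w = word.lower()
    for k in range(min(max_len, len(w)), 0, -1):
        counts = table.get(w[-k:])
        if counts:
            # Most frequent tag observed for this suffix.
            return counts.most_common(1)[0][0]
    return default


# Hypothetical training lexicon, for illustration only.
training = [
    ("kinase", "NN"), ("protease", "NN"),
    ("phosphorylated", "VBN"), ("activated", "VBN"),
    ("binding", "VBG"), ("signaling", "VBG"),
]
table = build_suffix_table(training)
for unknown in ["methylated", "lipase"]:
    print(unknown, guess_tag(unknown, table))
```

Trying the longest suffix first favours the most specific evidence available, and the `default` tag is only used when no suffix of the word was seen in training.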

    Snapshot of GSEA enrichment result related to biological process detected in AD.


    Snapshot of GSEA enrichment result related to molecular function detected in AD.


    Suffix versus tags for each suffix in the suffix database table.


    Statistics in all 15 samples for each parameter set.


    Words incorrectly tagged with all methods.
