
    GO-WORDS: An Entropic Approach to Semantic Decomposition of Gene Ontology Terms

    The Gene Ontology (GO) has a large and growing number of terms that constitute its vocabulary. An entropy-based approach is presented to automate the characterization of the compositional semantics of GO terms. The motivation is to extend the machine-readability of GO and to offer insights for the continued maintenance and growth of GO. A prototype implementation illustrates the benefits of the approach.

    Finding disease similarity based on implicit semantic similarity

    Genomics has contributed to a growing collection of gene–function and gene–disease annotations that can be exploited by informatics to study similarity between diseases. This can yield insight into disease etiology, reveal common pathophysiology, and/or suggest treatments that can be transferred from one disease to another. Estimating disease similarity solely on the basis of shared genes can be misleading, as variable combinations of genes may be associated with similar diseases, especially complex diseases. This deficiency can potentially be overcome by looking for common biological processes rather than only explicit gene matches between diseases. Using semantic similarity between biological processes to estimate disease similarity could enhance the identification and characterization of similar diseases. We present functions to measure similarity between terms in an ontology, and between entities annotated with terms drawn from the ontology, based on both co-occurrence and information content. The similarity measure is shown to outperform other measures used to detect similarity. A manually curated dataset with known disease similarities was used as a benchmark to compare disease-similarity estimates based on gene-based and Gene Ontology (GO) process-based comparisons. Detecting disease similarity from semantic similarity between GO Processes (Recall=55%, Precision=60%) performed better than using exact matches between GO Processes (Recall=29%, Precision=58%) or gene overlap (Recall=88%, Precision=16%). On an external test set, the GO-Process-based disease similarity scores show a statistically significant Pearson correlation (0.73) with numeric scores provided by medical residents. GO Processes associated with similar diseases were found to be significantly regulated in gene expression microarray datasets of related diseases.
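    The abstract does not give the exact formulas, so the following is only a minimal sketch of a standard information-content (Resnik-style) term similarity, with a best-match-average aggregation for comparing annotated entities. It is not the authors' co-occurrence-based function; the toy ontology, term names, and annotation counts are hypothetical.

```python
import math

# Hypothetical toy ontology (child -> parents); these are not real GO terms.
PARENTS = {
    "process": set(),
    "metabolism": {"process"},
    "signaling": {"process"},
    "lipid_metabolism": {"metabolism"},
    "glucose_metabolism": {"metabolism"},
    "insulin_signaling": {"signaling"},
}
ROOT = "process"


def ancestors(term):
    """Return the term itself plus every ancestor reachable through PARENTS."""
    seen = {term}
    stack = [term]
    while stack:
        for parent in PARENTS[stack.pop()]:
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def information_content(annotation_counts):
    """IC(t) = -ln p(t); p(t) counts annotations to t or any of its descendants."""
    propagated = {t: 0 for t in PARENTS}
    for term, n in annotation_counts.items():
        for a in ancestors(term):
            propagated[a] += n
    total = propagated[ROOT]  # the root collects every annotation
    return {t: -math.log(c / total) for t, c in propagated.items() if c > 0}


def resnik(t1, t2, ic):
    """Resnik similarity: IC of the most informative common ancestor."""
    return max(ic.get(t, 0.0) for t in ancestors(t1) & ancestors(t2))


def best_match_average(terms_a, terms_b, ic):
    """Symmetric entity similarity: mean of each term's best match in the other set."""
    best = [max(resnik(a, b, ic) for b in terms_b) for a in terms_a]
    best += [max(resnik(b, a, ic) for a in terms_a) for b in terms_b]
    return sum(best) / len(best)


ic = information_content({"lipid_metabolism": 3, "glucose_metabolism": 5, "insulin_signaling": 2})
print(resnik("lipid_metabolism", "glucose_metabolism", ic))  # IC of "metabolism"
```

    Terms that share only the root score zero, while terms sharing a rarely annotated ancestor score high; this is the property that lets process-level similarity catch related diseases that share no genes.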

    An Intelligent Online System for Enhanced Recruitment of Patients for Clinical Research

    Computational Infrastructure and Informatics Poster Session. The recruitment and retention of subjects for clinical research has been identified as one of the bottlenecks in the development of new drugs and treatments by the healthcare industry. The Kansas City Area Life Sciences Institute has been instrumental in bringing together the Midwest Psychiatric Research Group and researchers from the School of Computing and Engineering at the University of Missouri-Kansas City to address this problem. The resulting academic-corporate partnership has been funded by a 2-year Small Business Innovation Research Grant of $518,298 awarded by the National Institute of Mental Health at the National Institutes of Health. The project develops and employs a novel internet-based system to enhance the voluntary enrollment of research subjects in studies conducted by Clinical Research Organizations. The system will proactively engage patients and caregivers who want to be informed about clinical trials that may be relevant to their specific diagnoses, disease states, and other characteristics. An important goal of the project is to facilitate accurate matches between the requirements of a clinical research study and the profiles of research volunteers. To achieve this, state-of-the-art knowledge representation and search techniques are being employed. Phase I of the project focuses on a system for recruitment into clinical research trials on “Generalized Anxiety Disorder,” with eventual expansion to volunteers for studies on other mental health disorders.

    Tandem machine learning for the identification of genes regulated by transcription factors

    BACKGROUND: The identification of promoter regions that are regulated by a given transcription factor has traditionally relied upon the identification and distribution of binding sites recognized by the factor. In this study, we have developed a tandem machine learning approach for the identification of regulatory target genes based on these parameters and on the corresponding binding-site information contents, which measure the affinities of the factor for these cognate elements. RESULTS: This method has been validated using models of DNA binding sites recognized by the xenobiotic-sensitive nuclear receptor PXR/RXRα, for target genes within the human genome. An information theory-based weight matrix was first derived and refined from known PXR/RXRα binding sites. The promoter regions of candidate genes were scanned with the weight matrix. A novel information density-based clustering algorithm was then used to identify clusters of information-rich sites. Finally, transformed data representing metrics of location, strength, and clustering of binding sites were used to classify promoter regions with an ensemble approach involving neural networks, decision trees, and Naïve Bayesian classification. The method was evaluated on a set of 24 known target genes and 288 genes known not to be regulated by PXR/RXRα. We report an average accuracy (proportion of correctly classified promoter regions) of 71%, sensitivity of 73%, and specificity of 70%, based on multiple cross-validation and the leave-one-out strategy. On a test set of 13 genes, 10 were correctly classified. CONCLUSION: We have developed a machine learning approach for the successful detection of gene targets for transcription factors with high accuracy. The method has been validated for the transcription factor PXR/RXRα and has the potential to be extended to other transcription factors.
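    As a rough illustration of the weight-matrix step only, the sketch below builds an information-theory-based matrix in bits (2 + log2 f, in the spirit of individual-information weight matrices) from aligned sites and scans one strand of a sequence. The pseudocount, threshold, and toy sites are assumptions; the sketch omits small-sample correction, reverse-strand scanning, and the clustering and ensemble-classification stages described above.

```python
import math

BASES = "ACGT"


def weight_matrix(sites, pseudocount=0.5):
    """Per-position weights in bits, w[l][b] = 2 + log2 f(b, l), from aligned sites.

    The pseudocount (an assumption) keeps unseen bases finite; no small-sample
    correction is applied.
    """
    matrix = []
    for pos in range(len(sites[0])):
        column = [site[pos] for site in sites]
        total = len(column) + 4 * pseudocount
        matrix.append({b: 2 + math.log2((column.count(b) + pseudocount) / total)
                       for b in BASES})
    return matrix


def scan(sequence, matrix, threshold=0.0):
    """Score every window on the given strand; return (offset, bits) above threshold."""
    width = len(matrix)
    hits = []
    for start in range(len(sequence) - width + 1):
        window = sequence[start:start + width]
        if any(b not in BASES for b in window):
            continue  # skip windows with N or other ambiguity codes
        score = sum(matrix[i][b] for i, b in enumerate(window))
        if score > threshold:
            hits.append((start, score))
    return hits


# Hypothetical toy sites; real PXR/RXRα models are longer and built from curated data.
sites = ["AGGTCA", "AGGTCA", "AGTTCA", "AGGTCA"]
matrix = weight_matrix(sites)
hits = scan("TTTAGGTCATTNAGGTCA", matrix)
print(hits)
```

    Per-site scores in bits like these are the kind of "strength" feature that the later clustering and classification stages consume.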

    An informatics search for the low-molecular weight chromium-binding peptide

    BACKGROUND: The amino acid composition of a low-molecular-weight chromium-binding peptide (LMWCr), isolated from bovine liver, is reportedly E:G:C:D::4:2:2:2, though its sequence has not been determined. There is some controversy surrounding the exact biochemical forms and the action of Cr(III) in biological systems; the topic has been the subject of many experimental reports and continues to be investigated. Clarification of Cr-protein interactions will further the understanding of Cr(III) biochemistry and provide a basis for novel therapies based on metallocomplexes or small molecules. RESULTS: A genomic search of the non-redundant database for all possible decapeptides of the reported composition yields three exact matches: EDGEECDCGE, DGEECDCGEE, and CEGGCEEDDE. The first two sequences are found in ADAM 19 (A Disintegrin and Metalloproteinase domain 19) proteins in human and mouse; the last is found in a protein kinase in rice (Oryza sativa). A broader search for pentameric sequences (assuming a disulfide dimer) corresponding to the stoichiometric ratio E:D:G:C::2:1:1:1, within the set of human proteins and the set of proteins in, or related to, the insulin signaling pathway, yields a match at an acidic region in the α-subunit of the insulin receptor (-EECGD-, residues 175–184). A synthetic peptide derived from this sequence binds chromium(III) and forms a metal-peptide complex whose properties match those reported for isolated LMWCr and Cr(III)-containing peptide fractions. CONCLUSION: The search for an acidic decameric sequence indicates that LMWCr may not be a contiguous sequence. The identification of a distinct pentameric sequence in a significant insulin-signaling pathway protein suggests a possible identity for the LMWCr peptide. This identification clarifies directions for further investigation of LMWCr peptide fractions, chromium bio-coordination chemistry, and a possible role in the insulin signaling pathway. Implications for models of chromium action in the insulin-signaling pathway are discussed.
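    The composition search can be sketched without enumerating every decapeptide: a sliding window whose residue counts equal E4G2C2D2 matches exactly the same sequences as testing all 10!/(4!·2!·2!·2!) = 18,900 arrangements. This is an illustrative approach under that assumption, not necessarily how the authors queried the non-redundant database; the toy protein sequence is hypothetical.

```python
from collections import Counter
from math import factorial

# Reported LMWCr composition E:G:C:D = 4:2:2:2 (a decapeptide).
TARGET = Counter({"E": 4, "G": 2, "C": 2, "D": 2})


def count_arrangements(composition):
    """Distinct sequences with exactly this composition: n! / (k1! k2! ...)."""
    result = factorial(sum(composition.values()))
    for k in composition.values():
        result //= factorial(k)
    return result


def composition_matches(protein, composition):
    """Yield (offset, peptide) for each window whose residue counts equal `composition`."""
    width = sum(composition.values())
    for start in range(len(protein) - width + 1):
        window = protein[start:start + width]
        if Counter(window) == composition:
            yield start, window


print(count_arrangements(TARGET))  # 18900 candidate decapeptides
# Hypothetical toy sequence containing one of the reported matches.
print(list(composition_matches("AAEDGEECDCGEKK", TARGET)))  # [(2, 'EDGEECDCGE')]
```

    The same function covers the broader pentamer search by passing `Counter({"E": 2, "D": 1, "G": 1, "C": 1})` as the composition.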

    Exposure to NO2, CO, and PM2.5 is linked to regional DNA methylation differences in asthma.

    Background: DNA methylation of CpG sites on genetic loci has been linked to increased risk of asthma in children exposed to elevated ambient air pollutants (AAPs). Further identification of specific CpG sites, and of the pollutants associated with methylation of these sites in immune cells, could improve our understanding of asthma pathophysiology. In this study, we sought to identify specific CpG sites in genes that could be associated with asthma regulation (Foxp3 and IL10) and to identify the AAPs for which exposure prior to the blood draw is linked to methylation levels at these sites. We recruited subjects from Fresno, California, an area known for high levels of AAPs. Blood samples and questionnaire responses were obtained (n = 188), and in a subset of subjects (n = 33), repeat samples were collected 2 years later. Average AAP measures were obtained for 1, 15, 30, 90, 180, and 365 days prior to each blood draw to estimate the short-term vs. long-term effects of AAP exposure. Results: Asthma was significantly associated with higher methylation of differentially methylated regions (DMRs) in the Foxp3 promoter region (p = 0.030) and the IL10 intronic region (p = 0.026). Additionally, at the 90-day time period (90 days prior to the blood draw), Foxp3 methylation was positively associated with NO2, CO, and PM2.5 exposures (p = 0.001, p = 0.001, and p = 0.012, respectively). In the subset of subjects retested 2 years later (n = 33), the positive association between AAP exposure and methylation was sustained. There was also a negative correlation between average Foxp3 methylation of the promoter region and activated Treg levels (p = 0.039), and a positive correlation between average IL10 methylation of region 3 of intron 4 and IL10 cytokine expression (p = 0.030). Conclusions: Short-term and long-term exposures to high levels of CO, NO2, and PM2.5 were associated with alterations in differentially methylated regions of Foxp3. IL10 methylation showed a similar trend. For any given individual, these changes tend to be sustained over time. In addition, asthma was associated with higher methylation of DMRs in Foxp3 and IL10.
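    Trailing exposure averages like the 1- to 365-day windows described above can be computed from a daily pollutant series as in this minimal sketch. Whether the draw day itself is included and how missing monitor days are handled are assumptions here (draw day excluded, missing days skipped); the data in the usage example are synthetic.

```python
from datetime import date, timedelta

# Averaging windows (days before the blood draw) reported in the abstract.
WINDOWS = (1, 15, 30, 90, 180, 365)


def trailing_means(daily, draw_date, windows=WINDOWS):
    """Mean of daily exposure over the w days preceding draw_date, for each w.

    Assumptions: the draw day itself is excluded, and days missing from
    `daily` are skipped rather than imputed (None if a window has no data).
    """
    means = {}
    for w in windows:
        days = (draw_date - timedelta(days=d) for d in range(1, w + 1))
        values = [daily[day] for day in days if day in daily]
        means[w] = sum(values) / len(values) if values else None
    return means


# Synthetic daily series: the value on day i is i (not real pollutant data).
start = date(2020, 1, 1)
daily = {start + timedelta(days=i): float(i) for i in range(400)}
print(trailing_means(daily, start + timedelta(days=400)))
```

    Computing every window from the same daily series keeps the short-term and long-term estimates directly comparable for each blood draw.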