72 research outputs found

    Identifying overrepresented concepts in gene lists from literature: a statistical approach based on Poisson mixture model

    Background: Large-scale genomic studies often identify large gene lists, for example, the genes sharing the same expression patterns. The interpretation of these gene lists is generally achieved by extracting concepts overrepresented in the gene lists. This analysis often depends on manual annotation of genes based on controlled vocabularies, in particular, Gene Ontology (GO). However, the annotation of genes is a labor-intensive process, and the vocabularies are generally incomplete, leaving some important biological domains inadequately covered. Results: We propose a statistical method that uses the primary literature, i.e. free-text, as the source to perform overrepresentation analysis. The method is based on a statistical framework of mixture model and addresses the methodological flaws in several existing programs. We implemented this method within a literature mining system, BeeSpace, taking advantage of its analysis environment and added features that facilitate the interactive analysis of gene sets. Through experimentation with several datasets, we showed that our program can effectively summarize the important conceptual themes of large gene sets, even when traditional GO-based analysis does not yield informative results. Conclusions: We conclude that the current work will provide biologists with a tool that effectively complements the existing ones for overrepresentation analysis from genomic experiments. Our program, Genelist Analyzer, is freely available at: http://workerbee.igb.uiuc.edu:8080/BeeSpace/Search.jsp
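To make the idea of literature-based overrepresentation concrete, here is a minimal sketch of a single-component Poisson tail test. It is a simplified stand-in and not the mixture model the abstract describes: it assumes a term occurs at a known background rate per token and asks how surprising the observed count is in the text associated with a gene list. The function names and the parameters `term_count`, `list_tokens`, and `background_rate` are hypothetical.

```python
import math


def poisson_upper_tail(k, lam):
    """Return P(X >= k) for X ~ Poisson(lam)."""
    if k <= 0:
        return 1.0
    # Accumulate P(X <= k - 1) term by term, then take the complement.
    term = math.exp(-lam)
    cdf = term
    for i in range(1, k):
        term *= lam / i
        cdf += term
    return max(0.0, 1.0 - cdf)


def overrepresentation_pvalue(term_count, list_tokens, background_rate):
    """P-value for seeing at least term_count occurrences of a term.

    Assumption (not from the abstract): under the null, the term count in
    the gene-list text is Poisson with mean background_rate * list_tokens.
    """
    expected = background_rate * list_tokens
    return poisson_upper_tail(term_count, expected)


# Example: a term seen 12 times in 10,000 tokens of gene-list text,
# against a background rate of 3 occurrences per 10,000 tokens.
p_value = overrepresentation_pvalue(12, 10_000, 3 / 10_000)
print(p_value)
```

A small p-value suggests the term is overrepresented relative to the background. Because the tail is computed as one minus a cumulative sum, very small tails lose precision, so a production version would sum the tail directly or work in log space.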

    Use the Spear as a Shield: A Novel Adversarial Example based Privacy-Preserving Technique against Membership Inference Attacks

    Membership inference attacks pose a serious threat to the privacy of the confidential training data of machine learning models. This paper proposes a novel adversarial example based privacy-preserving technique (AEPPT), which adds crafted adversarial perturbations to the prediction of the target model to mislead the adversary's membership inference model. The added perturbations do not affect the accuracy of the target model, but can prevent the adversary from inferring whether a specific data sample is in the training set of the target model. Since AEPPT only modifies the original output of the target model, the proposed method is general and does not require modifying or retraining the target model. Experimental results show that the proposed method can reduce the inference accuracy and precision of the membership inference model to 50%, which is close to a random guess. Further, the proposed AEPPT is also shown to be effective against adaptive attacks in which the adversary knows the defense mechanism. Compared with state-of-the-art defense methods, the proposed defense can significantly degrade the accuracy and precision of membership inference attacks to 50% (i.e., the same as a random guess) without affecting the performance and utility of the target model.
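The constraint at the heart of this abstract, changing the model's output vector without changing its predicted label, can be illustrated with a short sketch. This is not the AEPPT procedure: the paper crafts perturbations against the adversary's inference model, whereas the sketch below substitutes random noise for that step and only shows how the top-1 label can be kept fixed. The function name `perturb_prediction` and the parameters `strength` and `seed` are hypothetical.

```python
import random


def perturb_prediction(probs, strength=0.3, seed=None):
    """Return a perturbed probability vector with the same top-1 label.

    Assumption: random noise stands in for the crafted adversarial
    perturbation described in the abstract.
    """
    rng = random.Random(seed)
    top = max(range(len(probs)), key=lambda i: probs[i])

    # Draw a random distribution and blend it with the original output.
    noise = [rng.random() for _ in probs]
    total = sum(noise)
    noise = [n / total for n in noise]
    mixed = [(1 - strength) * p + strength * n for p, n in zip(probs, noise)]

    # If the blend moved the maximum, swap it back so the label is unchanged.
    new_top = max(range(len(mixed)), key=lambda i: mixed[i])
    if new_top != top:
        mixed[top], mixed[new_top] = mixed[new_top], mixed[top]
    return mixed


# Example: a confident prediction for class 1 is softened, but class 1
# remains the predicted label.
original = [0.05, 0.85, 0.10]
perturbed = perturb_prediction(original, strength=0.5, seed=7)
print(perturbed)
```

Because only the returned vector is altered, the target model itself needs no modification or retraining, which matches the property the abstract emphasizes.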

    BeeSpace Navigator: exploratory analysis of gene function using semantic indexing of biological literature

    With the rapid decrease in cost of genome sequencing, the classification of gene function is becoming a primary problem. Such classification has been performed by human curators who read biological literature to extract evidence. BeeSpace Navigator is a prototype software for exploratory analysis of gene function using biological literature. The software supports an automatic analogue of the curator process to extract functions, with a simple interface intended for all biologists. Since extraction is done on selected collections that are semantically indexed into conceptual spaces, the curation can be task specific. Biological literature containing references to gene lists from expression experiments can be analyzed to extract concepts that are computational equivalents of a classification such as Gene Ontology, yielding discriminating concepts that differentiate gene mentions from other mentions. The functions of individual genes can be summarized from sentences in biological literature, to produce results resembling a model organism database entry that is automatically computed. Statistical frequency analysis based on literature phrase extraction generates offline semantic indexes to support these gene function services. The website with BeeSpace Navigator is free and open to all; there is no login requirement at www.beespace.illinois.edu for version 4. Materials from the 2010 BeeSpace Software Training Workshop are available at www.beespace.illinois.edu/bstwmaterials.php

    Molecular Evolution and Stress and Phytohormone Responsiveness of SUT Genes in Gossypium hirsutum

    Sucrose transporters (SUTs) play key roles in allocating the translocation of assimilates from source to sink tissues. Although the characteristics and biological roles of SUTs have been intensively investigated in higher plants, this gene family has not been functionally characterized in cotton. In this study, we performed a comprehensive analysis of SUT genes in the tetraploid cotton Gossypium hirsutum. A total of 18 G. hirsutum SUT genes were identified and classified into three groups based on their evolutionary relationships. Up to eight SUT genes in G. hirsutum were placed in the dicot-specific SUT1 group, while four and six SUT genes were, respectively, clustered into SUT4 and SUT2 groups together with members from both dicot and monocot species. The G. hirsutum SUT genes within the same group displayed similar exon/intron characteristics, and homologous genes in G. hirsutum At and Dt subgenomes, G. arboreum, and G. raimondii exhibited one-to-one relationships. Additionally, the duplicated genes in the diploid and polyploid cotton species have evolved through purifying selection, suggesting the strong conservation of SUT loci in these species. Expression analysis in different tissues indicated that SUT genes might play significant roles in cotton fiber elongation. Moreover, analyses of cis-acting regulatory elements in promoter regions and expression profiling under different abiotic stress and exogenous phytohormone treatments implied that SUT genes, especially GhSUT6A/D, might participate in plant responses to diverse abiotic stresses and phytohormones. Our findings provide valuable information for future studies on the evolution and function of SUT genes in cotton

    Assessing the quality of primary healthcare in seven Chinese provinces with unannounced standardised patients: protocol of a cross-sectional survey.

    INTRODUCTION: Primary healthcare (PHC) serves as the cornerstone for the attainment of universal health coverage (UHC). Efforts to promote UHC should focus on the expansion of access and on healthcare quality. However, robust quality evidence has remained scarce in China. Common quality assessment methods such as chart abstraction, patient rating and clinical vignette use indirect information that may not represent real practice. This study will send standardised patients (SP or healthy person trained to consistently simulate the medical history, physical symptoms and emotional characteristics of a real patient) unannounced to PHC providers to collect quality information and represent real practice. METHODS AND ANALYSIS: 1981 SP-clinician visits will be made to a random sample of PHC providers across seven provinces in China. SP cases will be developed for 10 tracer conditions in PHC. Each case will include a standard script for the SP to use and a quality checklist that the SP will complete after the clinical visit to indicate diagnostic and treatment activities performed by the clinician. Patient-centredness will be assessed according to the Patient Perception of Patient-Centeredness Rating Scale by the SP. SP cases and the checklist will be developed through a standard protocol and assessed for content, face and criterion validity, and test-retest and inter-rater reliability before its full use. Various descriptive analyses will be performed for the survey results, such as a tabulation of quality scores across geographies and provider types. ETHICS AND DISSEMINATION: This study has been reviewed and approved by the Institutional Review Board of the School of Public Health of Sun Yat-sen University (#SYSU 2017-011). Results will be actively disseminated through print and social media, and SP tools will be made available for other researchers

    Finding Related Entities by Retrieving Relations: UIUC at TREC 2009 Entity Track

    Published or submitted for publication; not peer reviewed

    Evaluation of methods for relative comparison of retrieval systems based on clickthroughs

    The Cranfield evaluation method has some disadvantages, including its high labor cost and its inadequacy for evaluating interactive retrieval techniques. As a very promising alternative, automatic comparison of retrieval systems based on the observed clicking behavior of users has recently been studied. Several methods have been proposed, but there has so far been no systematic way to assess which strategy is better, making it difficult to choose a good method for real applications. In this paper, we propose a general way to evaluate these relative comparison methods with two measures: utility to users (UtU) and effectiveness of differentiation (EoD). We evaluate two state-of-the-art methods by systematically simulating different retrieval scenarios. Inspired by the weaknesses of these methods revealed through our evaluation, we further propose a novel method that considers the positions of clicked documents. Experimental results show that our new method performs better than the existing methods.