
    Improving Feature Representation of Natural Language Gene Functional Annotations Using Automatic Term Expansion

    Abstract—Despite increasing efforts to describe gene functions with controlled vocabularies, natural-language gene functional annotations remain the most readily available and the most widely used by biologists, and large-scale intelligent analysis of these data is of great importance in the post-genome era. Although the vector space model (VSM) with TF*IDF feature representation is widely adopted for text document analysis, it has significant limitations when applied to these data, primarily because functional annotations are highly concise and highly noisy. To improve the TF*IDF feature representation, this paper proposes two automatic term expansion (ATE) methods based on query expansion (QE) from information retrieval (IR) theory. The effectiveness of ATE was examined by applying it to the measurement of pattern proximity between gene functional annotations. Our comparative results show that, on this data type, ATE is effective in retrieving genes functionally correlated with a randomly chosen query gene, and that it produces more accurate measurements of pattern similarity with reference to the genes' biological functions.
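The abstract names the building blocks (TF*IDF vectors in a VSM, cosine-style pattern proximity, and term expansion borrowed from query expansion) but does not spell out the two ATE methods. The sketch below is therefore only an illustration under stated assumptions: it uses Rocchio-style pseudo-relevance feedback as a stand-in expansion technique, and the annotation strings, the helper names (`tfidf_vectors`, `expand`) and the parameters `k` and `beta` are hypothetical, not taken from the paper.

```python
import math
from collections import Counter


def tokenize(text):
    # Lowercase, whitespace split, trim simple punctuation. Real annotations
    # would also need stop-word removal and stemming (assumption, not the paper's).
    cleaned = (t.strip(".,;:()").lower() for t in text.split())
    return [t for t in cleaned if t]


def tfidf_vectors(docs):
    """Return one sparse TF*IDF vector (dict: term -> weight) per document."""
    tokenized = [tokenize(d) for d in docs]
    n = len(docs)
    df = Counter(term for toks in tokenized for term in set(toks))
    idf = {t: math.log(n / df[t]) for t in df}
    return [{t: c * idf[t] for t, c in Counter(toks).items()} for toks in tokenized]


def cosine(u, v):
    """Cosine similarity of two sparse vectors; 0.0 if either is all zeros."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0


def expand(vectors, i, k=2, beta=0.5):
    """Hypothetical Rocchio-style pseudo-relevance-feedback expansion of vector i.

    Adds beta times the centroid of the k most similar other vectors, so a
    terse annotation picks up terms from functionally similar ones.
    k and beta are illustrative values, not parameters from the paper.
    """
    sims = sorted(
        ((cosine(vectors[i], v), j) for j, v in enumerate(vectors) if j != i),
        reverse=True,
    )
    neighbours = [vectors[j] for s, j in sims[:k] if s > 0]
    expanded = dict(vectors[i])
    for v in neighbours:
        for t, w in v.items():
            expanded[t] = expanded.get(t, 0.0) + beta * w / len(neighbours)
    return expanded


# Hypothetical toy annotations, not data from the paper.
docs = [
    "DNA repair protein",
    "protein involved in DNA mismatch repair",
    "mismatch repair endonuclease",
    "ribosomal protein L7",
]
vecs = tfidf_vectors(docs)
before = cosine(vecs[0], vecs[2])
after = cosine(expand(vecs, 0, k=1), vecs[2])
print(f"similarity to annotation 2: {before:.3f} before expansion, {after:.3f} after")
```

On these toy strings the expanded query vector scores higher against the mismatch-repair annotation than the unexpanded one, because it borrows "mismatch" from its nearest neighbour; it also imports noise terms such as "in" and "involved", which is why a practical expansion step would filter terms before adding them.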