
5 research outputs found

Composition to Structure: Statistical Mechanics for Glass Modeling

Author: Bødker Mikkel Sandfeld
Publication venue: Aalborg University
Publication date: 01/01/2021

VBN

Methods for Meta-Analysis Based on Individual Patient Data Using Simulation

Author: Yamaguchi Yusuke
ヤマグチユウスケ
山口祐介

Osaka University Knowledge Archive

Application of improved automated text mining to transcriptome datasets

Author: Leong Hui Sun
Publication date: 01/01/2009

A major challenge in microarray data analysis is the functional interpretation of gene lists. A common approach is over-representation analysis (ORA), which uses the hypergeometric test (or its variants) to evaluate whether a particular functionally defined group of genes is represented in a gene list more often than expected by chance. Existing applications of ORA have been largely limited to controlled vocabularies such as Gene Ontology (GO) terms and KEGG pathways. This work therefore aims to determine whether ORA can be applied to the wider mining of free text. Initial explorations using the classical hypergeometric distribution to analyse tokens from PubMed abstracts revealed a hitherto unexpected feature: gene lists derived from typical microarray experiments tend to have more annotation (PubMed abstracts) associated with them than would be expected by chance. This bias, a result of patterns of research activity within the biomedical community, is a major problem for the classical hypergeometric test-based ORA approach, which cannot account for it. The negative effect of annotation bias is a marked over-representation of many common (and likely uninformative) terms, interspersed with terms that appear to convey real biological insight. Several solutions have been developed to address this issue. The first is based on a permutation test, but this nonparametric approach is hampered by being computationally intensive. Two computationally tractable approaches were subsequently developed, based on the detection of outliers and on the extended hypergeometric distribution. The performance of the proposed text-based ORA approaches was demonstrated on a wide range of published datasets covering different species.
A comparison with existing tools that use GO terms suggests that mining PubMed abstracts can reveal additional biological insight that may not be obtainable by mining pre-defined ontologies alone.
EThOS - Electronic Theses Online Service, United Kingdom

OpenGrey Repository
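
The classical hypergeometric test that the abstract above takes as its starting point can be sketched as follows. This is a minimal illustration, not the thesis's implementation: the function name `ora_p_value`, its parameters, and the numbers in the usage lines are all assumptions made for the example. It does not correct for the annotation bias the abstract describes.

```python
from math import comb


def ora_p_value(population: int, annotated: int, list_size: int, hits: int) -> float:
    """Upper-tail hypergeometric probability P(X >= hits).

    population : total number of genes considered (hypothetical parameter name)
    annotated  : genes in the population associated with the term
    list_size  : number of genes in the gene list under study
    hits       : genes in the list that are associated with the term
    """
    if not (0 <= annotated <= population and 0 <= list_size <= population):
        raise ValueError("annotated and list_size must lie in [0, population]")

    # The overlap can never exceed either the list size or the annotated set.
    upper = min(annotated, list_size)
    total = comb(population, list_size)

    # Sum the probability mass for every overlap at least as large as observed.
    # math.comb returns 0 when k > n, so impossible overlaps contribute nothing.
    tail = sum(
        comb(annotated, k) * comb(population - annotated, list_size - k)
        for k in range(max(hits, 0), upper + 1)
    )
    return tail / total


# Illustrative numbers only: 20000 genes, 200 linked to a term,
# a list of 100 genes of which 5 are linked to the term.
p = ora_p_value(population=20000, annotated=200, list_size=100, hits=5)
print(f"P(X >= 5) = {p:.3g}")
```

A small p-value suggests the term is over-represented in the list under the classical model. As the abstract notes, this model assumes the list is an unbiased draw from the population, which is the assumption that annotation bias violates.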

Application of improved automated text mining to transcriptome datasets

Author: Leong Hui Sun

Online Research @ Cardiff

Sampling Methods for Wallenius' and Fisher's Noncentral Hypergeometric Distributions

Author: Fog Agner
Publication venue: Informa UK Limited
Publication date: 01/01/2008

Online Research Database In Technology