28,490 research outputs found
Evaluation of linear classifiers on articles containing pharmacokinetic evidence of drug-drug interactions
Background. Drug-drug interaction (DDI) is a major cause of morbidity and
mortality. [...] Biomedical literature mining can aid DDI research by
extracting relevant DDI signals from either the published literature or large
clinical databases. However, though drug interaction is an ideal area for
translational research, the inclusion of literature mining methodologies in DDI
workflows is still very preliminary. One area that can benefit from literature
mining is the automatic identification of a large number of potential DDIs,
whose pharmacological mechanisms and clinical significance can then be studied
via in vitro pharmacology and in populo pharmaco-epidemiology. Experiments. We
implemented a set of classifiers for identifying published articles relevant to
experimental pharmacokinetic DDI evidence. These documents are important for
identifying causal mechanisms behind putative drug-drug interactions, an
important step in the extraction of large numbers of potential DDIs. We
evaluate performance of several linear classifiers on PubMed abstracts, under
different feature transformation and dimensionality reduction methods. In
addition, we investigate the performance benefits of including various
publicly-available named entity recognition features, as well as a set of
internally-developed pharmacokinetic dictionaries. Results. We found that
several classifiers performed well in distinguishing relevant and irrelevant
abstracts. We found that the combination of unigram and bigram textual features
gave better performance than unigram features alone, and also that
normalization transforms that adjusted for feature frequency and document
length improved classification. For some classifiers, such as linear
discriminant analysis (LDA), proper dimensionality reduction had a large impact
on performance. Finally, the inclusion of NER features and dictionaries was
found not to help classification.Comment: Pacific Symposium on Biocomputing, 201
Conceptual biology, hypothesis discovery, and text mining: Swanson's legacy
Innovative biomedical librarians and information specialists who want to expand their roles as expert searchers need to know about profound changes in biology and parallel trends in text mining. In recent years, conceptual biology has emerged as a complement to empirical biology. This is partly in response to the availability of massive digital resources such as the network of databases for molecular biologists at the National Center for Biotechnology Information. Developments in text mining and hypothesis discovery systems based on the early work of Swanson, a mathematician and information scientist, are coincident with the emergence of conceptual biology. Very little has been written to introduce biomedical digital librarians to these new trends. In this paper, background for data and text mining, as well as for knowledge discovery in databases (KDD) and in text (KDT) is presented, then a brief review of Swanson's ideas, followed by a discussion of recent approaches to hypothesis discovery and testing. 'Testing' in the context of text mining involves partially automated methods for finding evidence in the literature to support hypothetical relationships. Concluding remarks follow regarding (a) the limits of current strategies for evaluation of hypothesis discovery systems and (b) the role of literature-based discovery in concert with empirical research. Report of an informatics-driven literature review for biomarkers of systemic lupus erythematosus is mentioned. Swanson's vision of the hidden value in the literature of science and, by extension, in biomedical digital databases, is still remarkably generative for information scientists, biologists, and physicians. © 2006Bekhuis; licensee BioMed Central Ltd
Temporal Topic Analysis with Endogenous and Exogenous Processes
We consider the problem of modeling temporal textual data taking endogenous
and exogenous processes into account. Such text documents arise in real world
applications, including job advertisements and economic news articles, which
are influenced by the fluctuations of the general economy. We propose a
hierarchical Bayesian topic model which imposes a "group-correlated"
hierarchical structure on the evolution of topics over time incorporating both
processes, and show that this model can be estimated from Markov chain Monte
Carlo sampling methods. We further demonstrate that this model captures the
intrinsic relationships between the topic distribution and the time-dependent
factors, and compare its performance with latent Dirichlet allocation (LDA) and
two other related models. The model is applied to two collections of documents
to illustrate its empirical performance: online job advertisements from
DirectEmployers Association and journalists' postings on BusinessInsider.com
- …