Conceptual biology, hypothesis discovery, and text mining: Swanson's legacy
Innovative biomedical librarians and information specialists who want to expand their roles as expert searchers need to know about profound changes in biology and parallel trends in text mining. In recent years, conceptual biology has emerged as a complement to empirical biology. This is partly in response to the availability of massive digital resources such as the network of databases for molecular biologists at the National Center for Biotechnology Information. Developments in text mining and hypothesis discovery systems based on the early work of Swanson, a mathematician and information scientist, are coincident with the emergence of conceptual biology. Very little has been written to introduce biomedical digital librarians to these new trends. In this paper, background on data and text mining, as well as on knowledge discovery in databases (KDD) and in text (KDT), is presented, followed by a brief review of Swanson's ideas and a discussion of recent approaches to hypothesis discovery and testing. 'Testing' in the context of text mining involves partially automated methods for finding evidence in the literature to support hypothetical relationships. Concluding remarks follow regarding (a) the limits of current strategies for evaluation of hypothesis discovery systems and (b) the role of literature-based discovery in concert with empirical research. A report of an informatics-driven literature review for biomarkers of systemic lupus erythematosus is mentioned. Swanson's vision of the hidden value in the literature of science and, by extension, in biomedical digital databases, is still remarkably generative for information scientists, biologists, and physicians. © 2006 Bekhuis; licensee BioMed Central Ltd.
Mining the Web for Medical Hypothesis: A Proof-of-Concept System
As the prevalence of blogs, discussion forums, and online news services continues to grow, so too does the portion of this Web content that relates to health and medicine. We propose that everyday, medically-oriented Web content is a valuable and viable data source for medical hypothesis generation and testing, despite its being noisy. In this paper, we present a proof-of-concept system supporting this notion. We construct a corpus comprising news articles relating to the drugs Vioxx, Naproxen and Ibuprofen, published between 1998 and 2002. Using this corpus, we show that there was a significant link between Vioxx and the concept "Myocardial Infarction" well before the drug was withdrawn from the market in 2004. Indeed, within the Vioxx-related content, the concept ranks amongst the top 3.3% in terms of importance. When compared with the Naproxen and Ibuprofen control literatures, the term occurs significantly more frequently in the Vioxx-related content.
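The corpus-comparison idea above can be sketched in a few lines: measure how often a concept term occurs, relative to corpus size, in the target collection versus a control collection. This is a toy illustration with hypothetical two-sentence "corpora", not the paper's actual data or ranking method:

```python
from collections import Counter
import re

def term_rate(docs, term):
    # Relative frequency of `term` among all word tokens in a document collection
    tokens = [t for d in docs for t in re.findall(r"[a-z]+", d.lower())]
    return tokens.count(term.lower()) / len(tokens)

# Hypothetical stand-ins for the drug-specific news corpora
vioxx_docs = ["vioxx linked to infarction risk in study",
              "infarction reports rise among vioxx users"]
naproxen_docs = ["naproxen relieves arthritis pain in trial",
                 "naproxen dosing guidance updated"]

# The concept occurs more often in the target corpus than in the control
print(term_rate(vioxx_docs, "infarction") > term_rate(naproxen_docs, "infarction"))  # True
```

A real system would use concept recognition rather than raw string matching and a statistical test for the frequency difference, but the comparison against a control corpus is the core of the approach.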
A Linked Data Approach to Sharing Workflows and Workflow Results
A bioinformatics analysis pipeline is often highly elaborate, due to the inherent complexity of biological systems and the variety and size of datasets. A digital equivalent of the "Materials and Methods" section in wet laboratory publications would be highly beneficial to bioinformatics, for evaluating evidence and examining data across related experiments, while introducing the potential to find associated resources and integrate them as data and services. We present initial steps towards preserving bioinformatics "materials and methods" by exploiting the workflow paradigm for capturing the design of a data analysis pipeline, and RDF to link the workflow, its component services, run-time provenance, and a personalized biological interpretation of the results. An example shows the reproduction of the unique graph of an analysis procedure, its results, provenance, and personal interpretation of a text mining experiment. It links data from Taverna, myExperiment.org, BioCatalogue.org, and ConceptWiki.org. The approach is relatively "light-weight" and unobtrusive to bioinformatics users.
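The linking described above boils down to an RDF graph: subject-predicate-object triples connecting a workflow, its run, its services, and a human interpretation. A minimal in-memory sketch, where all identifiers (ex:..., prov:...) are hypothetical stand-ins rather than the actual URIs used by Taverna or myExperiment.org:

```python
# Tiny triple set standing in for the RDF graph linking a workflow,
# one enactment (run), its result, and a personal interpretation.
triples = {
    ("ex:run42",     "prov:wasEnactmentOf", "ex:workflow1"),
    ("ex:workflow1", "ex:hasService",       "ex:textMiningService"),
    ("ex:run42",     "prov:generated",      "ex:resultGraph"),
    ("ex:note7",     "ex:interprets",       "ex:resultGraph"),
    ("ex:note7",     "ex:author",           "ex:biologist1"),
}

def objects(subject, predicate):
    """All objects linked from `subject` via `predicate`."""
    return {o for s, p, o in triples if s == subject and p == predicate}

# Follow links from a run back to the services its workflow invoked
workflow = objects("ex:run42", "prov:wasEnactmentOf").pop()
print(objects(workflow, "ex:hasService"))  # {'ex:textMiningService'}
```

In practice one would use an RDF library and published vocabularies (e.g. W3C PROV), but the point is the same: once the pipeline, its provenance, and its interpretation live in one linked graph, queries can traverse from any node to the others.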
Deer Herd Management Using the Internet: A Comparative Study of California Targeted By Data Mining the Internet
An ongoing project to investigate the use of the internet as an information source for decision support identified the decline of the California deer population as a significant issue. Using Google Alerts, an automated keyword search tool, text and numerical data were collected from a daily internet search and categorized by region and topic to allow for identification of information trends. This simple data mining approach determined that California is one of only four states that do not currently report total, finalized deer harvest (kill) data online, and that it is the only state that has reduced the amount of information made available over the internet in recent years. Contradictory information identified by the internet data mining prompted the analysis described in this paper, which indicates that the graphical information presented on the California Fish and Wildlife website significantly understates the severity of the deer population decline over the past 50 years. This paper presents a survey of how states use the internet in their deer management programs and an estimate of the California deer population over the last 100 years. It demonstrates how any organization can use the internet for data collection and discovery.
HypTrails: A Bayesian Approach for Comparing Hypotheses About Human Trails on the Web
When users interact with the Web today, they leave sequential digital trails on a massive scale. Examples of such human trails include Web navigation, sequences of online restaurant reviews, or online music playlists. Understanding the factors that drive the production of these trails can be useful for, e.g., improving underlying network structures, predicting user clicks or enhancing recommendations. In this work, we present a general approach called HypTrails for comparing a set of hypotheses about human trails on the Web, where hypotheses represent beliefs about transitions between states. Our approach utilizes Markov chain models with Bayesian inference. The main idea is to incorporate hypotheses as informative Dirichlet priors and to leverage the sensitivity of Bayes factors on the prior for comparing hypotheses with each other. For eliciting Dirichlet priors from hypotheses, we present an adaptation of the so-called (trial) roulette method. We demonstrate the general mechanics and applicability of HypTrails by performing experiments with (i) synthetic trails for which we control the mechanisms that have produced them and (ii) empirical trails stemming from different domains including website navigation, business reviews and online music played. Our work expands the repertoire of methods available for studying human trails on the Web.
Comment: Published in the proceedings of WWW'1
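The core mechanic described above can be sketched directly: express each hypothesis as Dirichlet pseudo-counts over a state's outgoing transitions (with equal total concentration for a fair comparison), then compare hypotheses by the marginal likelihood of the observed transition counts, i.e. a Bayes factor. This is a minimal single-state sketch with invented counts, not the paper's implementation or its prior-elicitation (roulette) step:

```python
from math import lgamma

def log_evidence(counts, prior):
    # Dirichlet-multinomial marginal likelihood for one state's outgoing
    # transitions: log p(counts | prior) = log B(counts + prior) - log B(prior),
    # where B is the multivariate Beta function.
    def log_beta(alphas):
        return sum(lgamma(a) for a in alphas) - lgamma(sum(alphas))
    return log_beta([c + a for c, a in zip(counts, prior)]) - log_beta(prior)

# Observed transition counts out of one state (e.g. clicks to 3 target pages)
counts = [40, 5, 5]

# Two hypotheses as Dirichlet pseudo-counts; both sum to the same
# concentration (10) so the comparison is fair.
h_popular = [8.0, 1.0, 1.0]          # most traffic goes to the first page
h_uniform = [10 / 3, 10 / 3, 10 / 3]  # traffic is spread evenly

log_bayes_factor = log_evidence(counts, h_popular) - log_evidence(counts, h_uniform)
print(log_bayes_factor > 0)  # True: the data favor the "popular page" hypothesis
```

HypTrails proper does this per state across the whole chain and sweeps the prior concentration to rank hypotheses at different strengths of belief; the single-state Bayes factor above is the building block.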
- …