    Realizing live sequence charts in SystemVerilog.

    The design of an embedded control system starts with an investigation of the properties and behaviors of the process evolving within its environment, and an analysis of the requirements for its safety performance. In the early stages, system requirements are often specified as scenarios of behavior using sequence charts for different use cases. This specification must be precise, intuitive and expressive enough to capture the different aspects of embedded control systems. As a rich and useful extension of classical message sequence charts, live sequence charts (LSC) provide a broad collection of constructs for specifying both possible and mandatory behaviors, making them well suited to the design of embedded control systems. However, realizing a high-level design model in executable program code effectively and correctly is not a trivial task. This paper tackles that challenge by providing a mapping algorithm to automatically synthesize SystemVerilog programs from given LSC specifications.

    Dynamic summarization of bibliographic-based data

    BACKGROUND: Traditional information retrieval techniques typically return excessive output when directed at large bibliographic databases. Natural Language Processing applications strive to extract salient content from the excessive data. Semantic MEDLINE, a National Library of Medicine (NLM) natural language processing application, highlights relevant information in PubMed data. However, Semantic MEDLINE implements manually coded schemas, accommodating few information needs. Currently, there are only five such schemas, while many more would be needed to realistically accommodate all potential users. The aim of this project was to develop and evaluate a statistical algorithm that automatically identifies relevant bibliographic data; the new algorithm could be incorporated into a dynamic schema to accommodate various information needs in Semantic MEDLINE, and eliminate the need for multiple schemas. METHODS: We developed a flexible algorithm named Combo that combines three statistical metrics, the Kullback-Leibler Divergence (KLD), Riloff's RlogF metric (RlogF), and a new metric called PredScal, to automatically identify salient data in bibliographic text. We downloaded citations from a PubMed search query addressing the genetic etiology of bladder cancer. The citations were processed with SemRep, an NLM rule-based application that produces semantic predications. SemRep output was processed by Combo, in addition to the standard Semantic MEDLINE genetics schema, and independently by the two individual KLD and RlogF metrics. We evaluated each summarization method using an existing reference standard within the task-based context of genetic database curation. RESULTS: Combo asserted 74 genetic entities implicated in bladder cancer development, whereas the traditional schema asserted 10 genetic entities; the KLD and RlogF metrics individually asserted 77 and 69 genetic entities, respectively. Combo achieved 61% recall and 81% precision, with an F-score of 0.69. The traditional schema achieved 23% recall and 100% precision, with an F-score of 0.37. The KLD metric achieved 61% recall and 70% precision, with an F-score of 0.65. The RlogF metric achieved 61% recall and 72% precision, with an F-score of 0.66. CONCLUSIONS: Semantic MEDLINE summarization using the new Combo algorithm outperformed a conventional summarization schema in a genetic database curation task. It potentially could streamline information acquisition for other needs without having to hand-build multiple saliency schemas.
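
    The F-scores quoted above follow from the standard harmonic mean of precision and recall. A minimal sketch, assuming the conventional F1 definition; the helper and the figures in the dictionary are illustrative, not part of Semantic MEDLINE or the Combo implementation:

```python
def f_score(precision: float, recall: float) -> float:
    """F1: harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)

# Reproduce the reported figures for each summarization method.
methods = {
    "Combo":              (0.81, 0.61),  # (precision, recall)
    "traditional schema": (1.00, 0.23),
    "KLD":                (0.70, 0.61),
    "RlogF":              (0.72, 0.61),
}
for name, (p, r) in methods.items():
    print(f"{name}: F = {f_score(p, r):.2f}")
# Prints 0.70, 0.37, 0.65, 0.66; the Combo value differs from the reported
# 0.69 only because the published precision/recall are themselves rounded.
```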

    Clustering cliques for graph-based summarization of the biomedical research literature

    BACKGROUND: Graph-based notions are increasingly used in biomedical data mining and knowledge discovery tasks. In this paper, we present a clique-clustering method to automatically summarize graphs of semantic predications produced from PubMed citations (titles and abstracts). RESULTS: SemRep was used to extract semantic predications from the citations returned by a PubMed search. Cliques were identified from frequently occurring predications with highly connected arguments, filtered by degree centrality. Themes contained in the summary were identified with a hierarchical clustering algorithm based on common arguments shared among cliques. The validity of the clusters in the resulting summaries was compared to the Silhouette-generated baseline for cohesion, separation and overall validity. The theme labels were also compared to a reference standard produced with major MeSH headings. CONCLUSIONS: For the 11 topics in the testing data set, the overall validity of clusters from the system summary was 10% better than the baseline (43% versus 33%). When compared to the reference standard from MeSH headings, recall, precision and F-score were 0.64, 0.65, and 0.65, respectively.
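
    As a rough illustration of the clustering step, the sketch below groups cliques by the overlap of their argument sets using off-the-shelf hierarchical clustering. The cliques, the Jaccard distance, and the cut-off threshold are assumptions for illustration, not the paper's exact implementation:

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster
from scipy.spatial.distance import squareform

# Hypothetical cliques, each represented by the set of arguments it connects.
cliques = [
    {"FGFR3", "bladder carcinoma", "mutation"},
    {"FGFR3", "TP53", "bladder carcinoma"},
    {"smoking", "risk factor", "bladder carcinoma"},
    {"smoking", "carcinogen exposure"},
]

def jaccard_distance(a: set, b: set) -> float:
    """Distance based on how few arguments two cliques share."""
    return 1.0 - len(a & b) / len(a | b)

n = len(cliques)
dist = np.array([[jaccard_distance(cliques[i], cliques[j]) for j in range(n)]
                 for i in range(n)])

# Average-linkage hierarchical clustering on the condensed distance matrix.
tree = linkage(squareform(dist, checks=False), method="average")
labels = fcluster(tree, t=0.8, criterion="distance")
print(labels)  # cliques sharing many arguments fall into the same theme cluster
```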

    Supporting systematic reviews using LDA-based document representations

    BACKGROUND: Identifying relevant studies for inclusion in a systematic review (i.e. screening) is a complex, laborious and expensive task. Recently, a number of studies have shown that the use of machine learning and text mining methods to automatically identify relevant studies has the potential to drastically decrease the workload involved in the screening phase. The vast majority of these machine learning methods exploit the same underlying principle, i.e. a study is modelled as a bag-of-words (BOW). METHODS: We explore the use of topic modelling methods to derive a more informative representation of studies. We apply Latent Dirichlet allocation (LDA), an unsupervised topic modelling approach, to automatically identify topics in a collection of studies. We then represent each study as a distribution over LDA topics. Additionally, we enrich topics derived using LDA with multi-word terms identified by an automatic term recognition (ATR) tool. For evaluation purposes, we carry out automatic identification of relevant studies using support vector machine (SVM)-based classifiers that employ both our novel topic-based representation and the BOW representation. RESULTS: Our results show that the SVM classifier is able to identify a greater number of relevant studies when using the LDA representation than when using the BOW representation. These observations hold for two systematic reviews from the clinical domain and three reviews from the social science domain. CONCLUSIONS: A topic-based feature representation of documents outperforms the BOW representation when applied to the task of automatic citation screening. The proposed term-enriched topics are more informative and less ambiguous to systematic reviewers. ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (doi:10.1186/s13643-015-0117-0) contains supplementary material, which is available to authorized users.
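
    A minimal sketch of the general pipeline described above, using scikit-learn's LDA implementation to turn each study into a topic distribution that is then fed to an SVM classifier. The toy abstracts, labels, and hyperparameters (number of topics, kernel) are assumptions for illustration, not the authors' configuration:

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation
from sklearn.svm import SVC
from sklearn.pipeline import make_pipeline

# Toy corpus of study abstracts with screening labels (1 = relevant).
abstracts = [
    "randomised controlled trial of statin therapy in adults",
    "qualitative interview study of nursing staff experiences",
    "meta-analysis of statin trials and cardiovascular outcomes",
    "survey of social care provision in rural communities",
]
labels = [1, 0, 1, 0]

# Bag-of-words counts -> LDA topic distributions -> linear SVM.
pipeline = make_pipeline(
    CountVectorizer(stop_words="english"),
    LatentDirichletAllocation(n_components=2, random_state=0),
    SVC(kernel="linear"),
)
pipeline.fit(abstracts, labels)
print(pipeline.predict(["cohort study of statin use and mortality"]))
```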

    Constructing a semantic predication gold standard from the biomedical literature

    BACKGROUND: Semantic relations increasingly underpin biomedical text mining and knowledge discovery applications. The success of such practical applications crucially depends on the quality of extracted relations, which can be assessed against a gold standard reference. Most such references in biomedical text mining focus on narrow subdomains and adopt different semantic representations, rendering them difficult to use for benchmarking independently developed relation extraction systems. In this article, we present a multi-phase gold standard annotation study, in which we annotated 500 sentences randomly selected from MEDLINE abstracts on a wide range of biomedical topics with 1371 semantic predications. The UMLS Metathesaurus served as the main source for conceptual information and the UMLS Semantic Network for relational information. We measured interannotator agreement and analyzed the annotations closely to identify some of the challenges in annotating biomedical text with relations based on an ontology or a terminology. RESULTS: We obtain fair to moderate interannotator agreement in the practice phase (0.378-0.475). With improved guidelines and additional semantic equivalence criteria, the agreement increases by 12% (0.415 to 0.536) in the main annotation phase. In addition, we find that agreement increases to 0.688 when the agreement calculation is limited to those predications that are based only on the explicitly provided UMLS concepts and relations. CONCLUSIONS: While interannotator agreement in the practice phase confirms that conceptual annotation is a challenging task, the increasing agreement in the main annotation phase shows that an acceptable level of agreement can be achieved over multiple iterations, by setting stricter guidelines and establishing semantic equivalence criteria. Mapping text to ontological concepts emerges as the main challenge in conceptual annotation. Annotating predications involving biomolecular entities and processes is particularly challenging. While the resulting gold standard is mainly intended to serve as a test collection for our semantic interpreter, we believe that the lessons learned are applicable generally.
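
    The abstract does not specify which agreement statistic was used, so purely as an illustration the sketch below computes a Cohen's kappa-style chance-corrected agreement over hypothetical per-predication decisions by two annotators; the labels are invented and the choice of kappa is an assumption, not the study's method:

```python
from sklearn.metrics import cohen_kappa_score

# Hypothetical per-candidate decisions by two annotators
# (the predicate each assigned, or "none" for no predication).
annotator_a = ["TREATS", "CAUSES", "none", "TREATS", "INHIBITS", "none"]
annotator_b = ["TREATS", "ASSOCIATED_WITH", "none", "TREATS", "INHIBITS", "CAUSES"]

# Chance-corrected agreement between the two label sequences.
print(f"kappa = {cohen_kappa_score(annotator_a, annotator_b):.3f}")
```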

    Molecular Gastronomy in Spain

    Beyond the overwhelming international success of Ferran Adrià, Spain has been one of the countries most actively involved in molecular gastronomy as a scientific discipline, but also in the use of ingredients, technologies, and equipment from the scientific and technological universe in the culinary area. Nowadays, this is a well-established discipline in Spain, with a number of research groups covering related topics, several companies commercializing appliances and additives worldwide, and renowned international chefs and many restaurants and companies committed to collaborating with scientists to face the future of Spanish gastronomy.
    The authors would like to thank the Ministerio de Ciencia e Innovación (Spain) for funding the Collaborative Network “INDAGA” (AGL2007-28589-E/ALI; AGL2009-05765-E), which enabled their collaboration.
    García Segovia, P.; Garrido, MD.; Vercet Tormo, A.; Arboleya, JC.; Fiszman Dal Santo, S.; Martínez Monzó, J.; Laguarda, S.... (2014). Molecular Gastronomy in Spain. Journal of Culinary Science and Technology, 12(4), 279-293. https://doi.org/10.1080/15428052.2014.914813

    Subtle changes in the flavour and texture of a drink enhance expectations of satiety

    Background: The consumption of liquid calories has been implicated in the development of obesity and weight gain. Energy-containing drinks are often reported to have a weak satiety value: one explanation for this is that, because of their fluid texture, they are not expected to have much nutritional value. It is important to consider what features of these drinks can be manipulated to enhance their expected satiety value. Two studies investigated the perception of subtle changes in a drink’s viscosity, and the extent to which thick texture and creamy flavour contribute to the generation of satiety expectations. Participants in the first study rated the sensory characteristics of 16 fruit yogurt drinks of increasing viscosity. In study two, a new set of participants evaluated eight versions of the fruit yogurt drink, which varied in thick texture, creamy flavour and energy content, for sensory and hedonic characteristics and satiety expectations. Results: In study one, participants were able to perceive small changes in drink viscosity that were strongly related to the actual viscosity of the drinks. In study two, the thick versions of the drink were expected to be more filling and have a greater expected satiety value, independent of the drink’s actual energy content. A creamy flavour enhanced the extent to which the drink was expected to be filling, but did not affect its expected satiety. Conclusions: These results indicate that subtle manipulations of texture and creamy flavour can increase expectations that a fruit yogurt drink will be filling and suppress hunger, irrespective of the drink’s energy content. A thicker texture enhanced expectations of satiety to a greater extent than a creamier flavour, and may be one way to improve the anticipated satiating value of energy-containing beverages.

    A critical review of PASBio's argument structures for biomedical verbs

    BACKGROUND: Propositional representations of biomedical knowledge are a critical component of most aspects of semantic mining in biomedicine. However, the proper set of propositions has yet to be determined. Recently, the PASBio project proposed a set of propositions and argument structures for biomedical verbs. This initial set of representations presents an opportunity for evaluating the suitability of predicate-argument structures as a scheme for representing verbal semantics in the biomedical domain. Here, we quantitatively evaluate several dimensions of the initial PASBio propositional structure repository. RESULTS: We propose a number of metrics and heuristics related to arity, role labelling, argument realization, and corpus coverage for evaluating large-scale predicate-argument structure proposals. We evaluate the metrics and heuristics by applying them to PASBio 1.0. CONCLUSION: PASBio demonstrates the suitability of predicate-argument structures for representing aspects of the semantics of biomedical verbs. Metrics related to theta-criterion violations and to the distribution of arguments are able to detect flaws in semantic representations, given a set of predicate-argument structures and a relatively small corpus annotated with them.
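
    To give a flavour of arity and argument-realization metrics of this kind, the sketch below tallies how many argument slots each predicate frame declares and how often each slot is actually realized in an annotated corpus. The frames, slot names, and annotations are hypothetical, not PASBio data or the paper's exact metrics:

```python
from collections import Counter

# Hypothetical predicate-argument frames: predicate -> declared argument slots.
frames = {
    "express": ["Arg0_agent", "Arg1_entity_expressed", "Arg2_location"],
    "mutate":  ["Arg1_entity_mutated", "Arg2_resulting_variant"],
    "inhibit": ["Arg0_inhibitor", "Arg1_entity_inhibited"],
}

# Hypothetical corpus annotations: (predicate, realized argument slots).
annotations = [
    ("express", ["Arg0_agent", "Arg1_entity_expressed"]),
    ("express", ["Arg1_entity_expressed"]),
    ("inhibit", ["Arg0_inhibitor", "Arg1_entity_inhibited"]),
]

# Report each frame's arity and how often each declared slot is realized.
for pred, slots in frames.items():
    realized = Counter(slot for p, args in annotations if p == pred for slot in args)
    coverage = {slot: realized.get(slot, 0) for slot in slots}
    print(f"{pred}: arity={len(slots)}, realizations={coverage}")
```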

    Automation of a problem list using natural language processing

    BACKGROUND: The medical problem list is an important part of the electronic medical record in development in our institution. To serve the functions it is designed for, the problem list has to be as accurate and timely as possible. However, the current problem list is usually incomplete and inaccurate, and is often totally unused. To alleviate this issue, we are building an environment where the problem list can be easily and effectively maintained. METHODS: For this project, 80 medical problems were selected for their frequency of use in our future clinical field of evaluation (cardiovascular). We have developed an Automated Problem List system composed of two main components: a background and a foreground application. The background application uses Natural Language Processing (NLP) to harvest potential problem list entries from the list of 80 targeted problems detected in the multiple free-text electronic documents available in our electronic medical record. These proposed medical problems drive the foreground application designed for management of the problem list. Within this application, the extracted problems are proposed to the physicians for addition to the official problem list. RESULTS: The set of 80 targeted medical problems selected for this project covered about 5% of all possible diagnoses coded in ICD-9-CM in our study population (cardiovascular adult inpatients), but about 64% of all instances of these coded diagnoses. The system contains algorithms that first detect document sections, then sentences within these sections, and finally potential problems within those sentences. The initial evaluation of the section and sentence detection algorithms demonstrated a sensitivity and positive predictive value of 100% when detecting sections, and a sensitivity of 89% and a positive predictive value of 94% when detecting sentences. CONCLUSION: The global aim of our project is to automate the process of creating and maintaining a problem list for hospitalized patients and thereby help to guarantee the timeliness, accuracy and completeness of this information.
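
    The evaluation figures quoted above reduce to standard counts of true positives, false negatives and false positives. A minimal sketch of that calculation; the counts are placeholders chosen only to match the reported percentages, not the study's data:

```python
def sensitivity(tp: int, fn: int) -> float:
    """Proportion of true sections/sentences that were detected (recall)."""
    return tp / (tp + fn)

def positive_predictive_value(tp: int, fp: int) -> float:
    """Proportion of detections that were correct (precision)."""
    return tp / (tp + fp)

# Placeholder counts for the sentence-detection evaluation.
tp, fn, fp = 89, 11, 6
print(f"sensitivity = {sensitivity(tp, fn):.2f}")         # 0.89
print(f"PPV = {positive_predictive_value(tp, fp):.2f}")   # 0.94
```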

    Effects of meal variety on expected satiation: evidence for a 'perceived volume' heuristic

    Meal variety has been shown to increase energy intake in humans by an average of 29%. Historically, research exploring the mechanism underlying this effect has focused on physiological and psychological processes that terminate a meal (e.g., sensory-specific satiety). We sought to explore whether meal variety stimulates intake by influencing pre-meal planning. We know that individuals use prior experience with a food to estimate the extent to which it will deliver fullness. These ‘expected satiation’ judgments may be straightforward when only one meal component needs to be considered, but it remains unclear how prospective satiation is estimated when a meal comprises multiple items. We hypothesised that people simplify the task by using a heuristic, or ‘cognitive shortcut’: specifically, as within-meal variety increases, expected satiation tends to be based on the perceived volume of the food(s) rather than on prior experience. In each trial, participants (N = 68) were shown a plate comprising six buffet food items. Across trials, the number of different foods varied from one to six. In separate tasks, participants estimated the combined expected satiation and the volume of the foods. When meal variety was high, judgments of perceived volume and expected satiation ‘converged’, consistent with a common underlying response strategy. By contrast, the low-variety meals produced dissociable responses, suggesting that judgments of expected satiation were not governed solely by perceived volume. This evidence for a ‘volume heuristic’ was especially clear in people who were less familiar with the meal items. Together, these results are important because they expose a novel process by which meal variety might increase food intake in humans.