352 research outputs found
What's unusual in online disease outbreak news?
Background: Accurate and timely detection of public health events of
international concern is necessary to help support risk assessment and response
and save lives. Novel event-based methods that use the World Wide Web as a
signal source offer potential to extend health surveillance into areas where
traditional indicator networks are lacking. In this paper we address the issue
of systematically evaluating online health news to support automatic alerting
using daily disease-country counts text mined from real world data using
BioCaster. For 18 data sets produced by BioCaster, we compare 5 aberration
detection algorithms (EARS C2, C3, W2, F-statistic and EWMA) for performance
against expert moderated ProMED-mail postings. Results: We report sensitivity,
specificity, positive predictive value (PPV), negative predictive value (NPV),
mean alerts/100 days and F1, at 95% confidence interval (CI) for 287
ProMED-mail postings on 18 outbreaks across 14 countries over a 366 day period.
Results indicate that W2 had the best F1 with a slight benefit for day of week
effect over C2. In drill down analysis we indicate issues arising from the
granular choice of country-level modeling, sudden drops in reporting due to day
of week effects and reporting bias. Automatic alerting has been implemented in
BioCaster available from http://born.nii.ac.jp. Conclusions: Online health news
alerts have the potential to enhance manual analytical methods by increasing
throughput, timeliness and detection rates. Systematic evaluation of health
news aberrations is necessary to push forward our understanding of the complex
relationship between news report volumes and case numbers and to select the
best performing features and algorithms
Storage of Natural Language Sentences in a Hopfield Network
This paper look at how the Hopfield neural network can be used to store and
recall patterns constructed from natural language sentences. As a pattern
recognition and storage tool, the Hopfield neural network has received much
attention. This attention however has been mainly in the field of statistical
physics due to the model's simple abstraction of spin glass systems. A
discussion is made of the differences, shown as bias and correlation, between
natural language sentence patterns and the randomly generated ones used in
previous experiments. Results are given for numerical simulations which show
the auto-associative competence of the network when trained with natural
language patterns.Comment: latex, 10 pages with 2 tex figures and a .bib file, uses nemlap.sty,
to appear in Proceedings of NeMLaP-
Towards cross-lingual alerting for bursty epidemic events
Background: Online news reports are increasingly becoming a source for event
based early warning systems that detect natural disasters. Harnessing the
massive volume of information available from multilingual newswire presents as
many challenges as opportunities due to the patterns of reporting complex
spatiotemporal events. Results: In this article we study the problem of
utilising correlated event reports across languages. We track the evolution of
16 disease outbreaks using 5 temporal aberration detection algorithms on
text-mined events classified according to disease and outbreak country. Using
ProMED reports as a silver standard, comparative analysis of news data for 13
languages over a 129 day trial period showed improved sensitivity, F1 and
timeliness across most models using cross-lingual events. We report a detailed
case study analysis for Cholera in Angola 2010 which highlights the challenges
faced in correlating news events with the silver standard. Conclusions: The
results show that automated health surveillance using multilingual text mining
has the potential to turn low value news into high value alerts if informed
choices are used to govern the selection of models and data sources. An
implementation of the C2 alerting algorithm using multilingual news is
available at the BioCaster portal http://born.nii.ac.jp/?page=globalroundup
Enhancing Twitter Data Analysis with Simple Semantic Filtering: Example in Tracking Influenza-Like Illnesses
Systems that exploit publicly available user generated content such as
Twitter messages have been successful in tracking seasonal influenza. We
developed a novel filtering method for Influenza-Like-Illnesses (ILI)-related
messages using 587 million messages from Twitter micro-blogs. We first filtered
messages based on syndrome keywords from the BioCaster Ontology, an extant
knowledge model of laymen's terms. We then filtered the messages according to
semantic features such as negation, hashtags, emoticons, humor and geography.
The data covered 36 weeks for the US 2009 influenza season from 30th August
2009 to 8th May 2010. Results showed that our system achieved the highest
Pearson correlation coefficient of 98.46% (p-value<2.2e-16), an improvement of
3.98% over the previous state-of-the-art method. The results indicate that
simple NLP-based enhancements to existing approaches to mine Twitter data can
increase the value of this inexpensive resource.Comment: 10 pages, 5 figures, IEEE HISB 2012 conference, Sept 27-28, 2012, La
Jolla, California, U
Synonym set extraction from the biomedical literature by lexical pattern discovery
<p>Abstract</p> <p>Background</p> <p>Although there are a large number of thesauri for the biomedical domain many of them lack coverage in terms and their variant forms. Automatic thesaurus construction based on patterns was first suggested by Hearst <abbrgrp><abbr bid="B1">1</abbr></abbrgrp>, but it is still not clear how to automatically construct such patterns for different semantic relations and domains. In particular it is not certain which patterns are useful for capturing synonymy. The assumption of extant resources such as parsers is also a limiting factor for many languages, so it is desirable to find patterns that do not use syntactical analysis. Finally to give a more consistent and applicable result it is desirable to use these patterns to form synonym sets in a sound way.</p> <p>Results</p> <p>We present a method that automatically generates regular expression patterns by expanding seed patterns in a heuristic search and then develops a feature vector based on the occurrence of term pairs in each developed pattern. This allows for a binary classifications of term pairs as synonymous or non-synonymous. We then model this result as a probability graph to find synonym sets, which is equivalent to the well-studied problem of finding an optimal set cover. We achieved 73.2% precision and 29.7% recall by our method, out-performing hand-made resources such as MeSH and Wikipedia.</p> <p>Conclusion</p> <p>We conclude that automatic methods can play a practical role in developing new thesauri or expanding on existing ones, and this can be done with only a small amount of training data and no need for resources such as parsers. We also concluded that the accuracy can be improved by grouping into synonym sets.</p
Analysis of syntactic and semantic features for fine-grained event-spatial understanding in outbreak news reports
<p>Abstract</p> <p>Background</p> <p>Previous studies have suggested that epidemiological reasoning needs a fine-grained modelling of events, especially their spatial and temporal attributes. While the temporal analysis of events has been intensively studied, far less attention has been paid to their spatial analysis. This article aims at filling the gap concerning automatic event-spatial attribute analysis in order to support health surveillance and epidemiological reasoning.</p> <p>Results</p> <p>In this work, we propose a methodology that provides a detailed analysis on each event reported in news articles to recover the most specific locations where it occurs. Various features for recognizing spatial attributes of the events were studied and incorporated into the models which were trained by several machine learning techniques. The best performance for spatial attribute recognition is very promising; 85.9% F-score (86.75% precision/85.1% recall).</p> <p>Conclusions</p> <p>We extended our work on event-spatial attribute recognition by focusing on machine learning techniques, which are CRF, SVM, and Decision tree. Our approach avoided the costly development of an external knowledge base by employing the feature sources that can be acquired locally from the analyzed document. The results showed that the CRF model performed the best. Our study indicated that the nearest location and previous event location are the most important features for the CRF and SVM model, while the location extracted from the verb's subject is the most important to the Decision tree model.</p
Normalising Medical Concepts in Social Media Texts by Learning Semantic Representation
Automatically recognising medical con- cepts mentioned in social media messages (e.g. tweets) enables several applications for enhancing health quality of people in a community, e.g. real-time monitoring of infectious diseases in population. How- ever, the discrepancy between the type of language used in social media and med- ical ontologies poses a major challenge. Existing studies deal with this challenge by employing techniques, such as lexi- cal term matching and statistical machine translation. In this work, we handle the medical concept normalisation at the se- mantic level. We investigate the use of neural networks to learn the transition be- tween laymanâs language used in social media messages and formal medical lan- guage used in the descriptions of medi- cal concepts in a standard ontology. We evaluate our approaches using three differ- ent datasets, where social media texts are extracted from Twitter messages and blog posts. Our experimental results show that our proposed approaches significantly and consistently outperform existing effective baselines, which achieved state-of-the-art performance on several medical concept normalisation tasks, by up to 44%
Contrastive Search Is What You Need For Neural Text Generation
Generating text with autoregressive language models (LMs) is of great
importance to many natural language processing (NLP) applications. Previous
solutions for this task often produce text that contains degenerative
expressions or lacks semantic consistency. Recently, Su et al. introduced a new
decoding method, contrastive search, based on the isotropic representation
space of the language model and obtained new state of the art on various
benchmarks. Additionally, Su et al. argued that the representations of
autoregressive LMs (e.g. GPT-2) are intrinsically anisotropic which is also
shared by previous studies. Therefore, to ensure the language model follows an
isotropic distribution, Su et al. proposed a contrastive learning scheme,
SimCTG, which calibrates the language model's representations through
additional training.
In this study, we first answer the question: "Are autoregressive LMs really
anisotropic?". To this end, we extensively evaluate the isotropy of LMs across
16 major languages. Surprisingly, we find that the anisotropic problem only
exists in the two specific English GPT-2-small/medium models. On the other
hand, all other evaluated LMs are naturally isotropic which is in contrast to
the conclusion drawn by previous studies. Based on our findings, we further
assess the contrastive search decoding method using off-the-shelf LMs on four
generation tasks across 16 languages. Our experimental results demonstrate that
contrastive search significantly outperforms previous decoding methods without
any additional training. More notably, on 12 out of the 16 evaluated languages,
contrastive search performs comparably with human-level performances as judged
by human evaluations. Our code and other related resources are publicly
available at https://github.com/yxuansu/Contrastive_Search_Is_What_You_Need.Comment: TMLR'2
- âŚ