14,640 research outputs found

    BioCaster: detecting public health rumors with a Web-based text mining system

    Get PDF
    Summary: BioCaster is an ontology-based text mining system for detecting and tracking the distribution of infectious disease outbreaks from linguistic signals on the Web. The system continuously analyzes documents reported from over 1700 RSS feeds, classifies them for topical relevance and plots them onto a Google map using geocoded information. The background knowledge for bridging the gap between Layman's terms and formal-coding systems is contained in the freely available BioCaster ontology which includes information in eight languages focused on the epidemiological role of pathogens as well as geographical locations with their latitudes/longitudes. The system consists of four main stages: topic classification, named entity recognition (NER), disease/location detection and event recognition. Higher order event analysis is used to detect more precisely specified warning signals that can then be notified to registered users via email alerts. Evaluation of the system for topic recognition and entity identification is conducted on a gold standard corpus of annotated news articles

    i-JEN: Visual interactive Malaysia crime news retrieval system

    Get PDF
    Supporting crime news investigation involves a mechanism to help monitor the current and past status of criminal events. We believe this could be well facilitated by focusing on the user interfaces and the event crime model aspects. In this paper we discuss on a development of Visual Interactive Malaysia Crime News Retrieval System (i-JEN) and describe the approach, user studies and planned, the system architecture and future plan. Our main objectives are to construct crime-based event; investigate the use of crime-based event in improving the classification and clustering; develop an interactive crime news retrieval system; visualize crime news in an effective and interactive way; integrate them into a usable and robust system and evaluate the usability and system performance. The system will serve as a news monitoring system which aims to automatically organize, retrieve and present the crime news in such a way as to support an effective monitoring, searching, and browsing for the target users groups of general public, news analysts and policemen or crime investigators. The study will contribute to the better understanding of the crime data consumption in the Malaysian context as well as the developed system with the visualisation features to address crime data and the eventual goal of combating the crimes

    What's unusual in online disease outbreak news?

    Get PDF
    Background: Accurate and timely detection of public health events of international concern is necessary to help support risk assessment and response and save lives. Novel event-based methods that use the World Wide Web as a signal source offer potential to extend health surveillance into areas where traditional indicator networks are lacking. In this paper we address the issue of systematically evaluating online health news to support automatic alerting using daily disease-country counts text mined from real world data using BioCaster. For 18 data sets produced by BioCaster, we compare 5 aberration detection algorithms (EARS C2, C3, W2, F-statistic and EWMA) for performance against expert moderated ProMED-mail postings. Results: We report sensitivity, specificity, positive predictive value (PPV), negative predictive value (NPV), mean alerts/100 days and F1, at 95% confidence interval (CI) for 287 ProMED-mail postings on 18 outbreaks across 14 countries over a 366 day period. Results indicate that W2 had the best F1 with a slight benefit for day of week effect over C2. In drill down analysis we indicate issues arising from the granular choice of country-level modeling, sudden drops in reporting due to day of week effects and reporting bias. Automatic alerting has been implemented in BioCaster available from http://born.nii.ac.jp. Conclusions: Online health news alerts have the potential to enhance manual analytical methods by increasing throughput, timeliness and detection rates. Systematic evaluation of health news aberrations is necessary to push forward our understanding of the complex relationship between news report volumes and case numbers and to select the best performing features and algorithms

    Towards cross-lingual alerting for bursty epidemic events

    Get PDF
    Background: Online news reports are increasingly becoming a source for event based early warning systems that detect natural disasters. Harnessing the massive volume of information available from multilingual newswire presents as many challenges as opportunities due to the patterns of reporting complex spatiotemporal events. Results: In this article we study the problem of utilising correlated event reports across languages. We track the evolution of 16 disease outbreaks using 5 temporal aberration detection algorithms on text-mined events classified according to disease and outbreak country. Using ProMED reports as a silver standard, comparative analysis of news data for 13 languages over a 129 day trial period showed improved sensitivity, F1 and timeliness across most models using cross-lingual events. We report a detailed case study analysis for Cholera in Angola 2010 which highlights the challenges faced in correlating news events with the silver standard. Conclusions: The results show that automated health surveillance using multilingual text mining has the potential to turn low value news into high value alerts if informed choices are used to govern the selection of models and data sources. An implementation of the C2 alerting algorithm using multilingual news is available at the BioCaster portal http://born.nii.ac.jp/?page=globalroundup

    High-level feature detection from video in TRECVid: a 5-year retrospective of achievements

    Get PDF
    Successful and effective content-based access to digital video requires fast, accurate and scalable methods to determine the video content automatically. A variety of contemporary approaches to this rely on text taken from speech within the video, or on matching one video frame against others using low-level characteristics like colour, texture, or shapes, or on determining and matching objects appearing within the video. Possibly the most important technique, however, is one which determines the presence or absence of a high-level or semantic feature, within a video clip or shot. By utilizing dozens, hundreds or even thousands of such semantic features we can support many kinds of content-based video navigation. Critically however, this depends on being able to determine whether each feature is or is not present in a video clip. The last 5 years have seen much progress in the development of techniques to determine the presence of semantic features within video. This progress can be tracked in the annual TRECVid benchmarking activity where dozens of research groups measure the effectiveness of their techniques on common data and using an open, metrics-based approach. In this chapter we summarise the work done on the TRECVid high-level feature task, showing the progress made year-on-year. This provides a fairly comprehensive statement on where the state-of-the-art is regarding this important task, not just for one research group or for one approach, but across the spectrum. We then use this past and on-going work as a basis for highlighting the trends that are emerging in this area, and the questions which remain to be addressed before we can achieve large-scale, fast and reliable high-level feature detection on video
    corecore