25 research outputs found

    Analysis of syntactic and semantic features for fine-grained event-spatial understanding in outbreak news reports

    Background: Previous studies have suggested that epidemiological reasoning needs fine-grained modelling of events, especially their spatial and temporal attributes. While the temporal analysis of events has been intensively studied, far less attention has been paid to their spatial analysis. This article aims to fill the gap in automatic event-spatial attribute analysis in order to support health surveillance and epidemiological reasoning. Results: We propose a methodology that analyses each event reported in news articles in detail to recover the most specific location where it occurs. Various features for recognizing the spatial attributes of events were studied and incorporated into models trained with several machine learning techniques. The best performance for spatial attribute recognition is promising: an F-score of 85.9% (86.75% precision, 85.1% recall). Conclusions: We extended our work on event-spatial attribute recognition by focusing on three machine learning techniques: CRF, SVM, and decision tree. Our approach avoided the costly development of an external knowledge base by employing feature sources that can be acquired locally from the analyzed document. The results showed that the CRF model performed best. Our study indicated that the nearest location and the previous event location are the most important features for the CRF and SVM models, while the location extracted from the verb's subject is the most important for the decision tree model.
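As a quick check of the figures reported in this abstract, the sketch below recomputes the F-score from the stated precision and recall. It assumes the reported score is the balanced F-score (F1, the harmonic mean of precision and recall); the abstract does not say which variant was used, and the function name `f_score` is only illustrative.

```python
def f_score(precision: float, recall: float) -> float:
    """Balanced F-score (F1): harmonic mean of precision and recall.

    Assumption: the abstract's "F-score" is F1; the variant is not stated.
    """
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Precision and recall as reported in the abstract, in percent.
best = f_score(86.75, 85.1)
print(f"{best:.1f}%")  # prints 85.9%
```

Under that assumption the reported precision and recall are consistent with the reported F-score.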

    Handbook for ICT Projects for Rural Areas

    This handbook identifies guidelines and fundamental requirements that can be of use to project managers and teams who are keen on initiating ICT projects in rural areas. Its contents are based on the authors' experiences of rolling out ICT projects in remote areas within Asia Pacific. The handbook accumulates ideas and experiences from SHARE projects, an initiative driven by the Telecommunication Technology Committee (TTC) Japan, in which four countries, namely Malaysia, Indonesia, Thailand and the Philippines, have rolled out various technology-based projects in remote and rural locations. The guidelines are presented as a narrative, organised according to the phases of development of a technology-enabled solution. The handbook takes into account the unique considerations involved in accommodating local needs and competencies in remote and rural communities.

    Structuring an event ontology for disease outbreak detection

    Background: This paper describes the design of an event ontology being developed for the machine understanding of infectious disease-related events reported in natural language text. The ontology is designed to support timely detection of disease outbreaks and rapid judgment of their alerting status by 1) bridging the gap between the layman's language used in disease outbreak reports and public health experts' deep knowledge, and 2) making multi-lingual information available. Construction and content: The ontology integrates a model of experts' knowledge for disease surveillance with sets of linguistic expressions that denote disease-related events and with formal definitions of events. Rather general event classes, which are suitable for language-oriented tasks such as recognition of event expressions, are placed at the upper level, and more specific events of interest to experts are placed at the lower level. Each class is related to other classes that represent participants of events, and is linked with multi-lingual synonym sets and axioms. Conclusions: We consider that the design of the event ontology and the methodology introduced in this paper are applicable to other domains that require integration of natural language information and machine support for experts who assess it. The first version of the ontology, with about 40 concepts, will be available in March 2008.

    Decision Support System for the Response to Infectious Disease Emergencies Based on WebGIS and Mobile Services in China

    Background: For years, emerging infectious diseases have appeared worldwide and threatened the health of people. The emergence and spread of an infectious-disease outbreak are usually unforeseen, and have the features of suddenness and uncertainty. Timely understanding of basic information in the field, and the collection and analysis of epidemiological information, is helpful in making rapid decisions and responding to an infectious-disease emergency. Therefore, it is necessary to have an unobstructed channel and convenient tool for the collection and analysis of epidemiologic information in the field. Methodology/Principal Findings: Baseline information for each county in mainland China was collected and a database was established by geo-coding information on a digital map of county boundaries throughout the country. Google Maps was used to display geographic information and to conduct calculations related to maps, and the 3G wireless network was used to transmit information collected in the field to the server. This study established a decision support system for the response to infectious-disease emergencies based on WebGIS and mobile services (DSSRIDE). The DSSRIDE provides functions including data collection, communication and analyses in real time, epidemiological detection, the provision of customized epidemiological questionnaires and guides for handling infectious disease emergencies, and the querying of professional knowledge in the field. These functions of the DSSRIDE could be helpful for epidemiological investigations in the field and the handling of infectious-disease emergencies. 
Conclusions/Significance: The DSSRIDE provides a geographic information platform based on the Google Maps application programming interface to display information on infectious disease emergencies, and transfers information between workers in the field and decision makers through wireless transmission based on personal computers, mobile phones and personal digital assistants. After two years of practice and application in infectious disease emergencies, the DSSRIDE is becoming a useful platform and tool for field investigations carried out by response sections and individuals. The system is suitable for use in developing countries and low-income districts.

    Document Zoning for Enhancing Spatial and Temporal Understanding in Web-based Health Surveillance Systems

    Public concern over the spread of infectious diseases such as avian H5N1 influenza and swine flu (H1N1) influenza A has underscored the importance of health surveillance systems for the speedy and precise detection of disease outbreaks. However, two key barriers faced by current web-based health surveillance systems are their inability to (a) understand complex geo-temporal attributes of events and (b) obtain the levels of geo-temporal recognition. In this thesis, I develop a novel framework, called spatiotemporal zoning, as a means to overcome these limitations. The objective of the spatiotemporal zoning scheme is to enable language technology software to partition text into segments based on the spatiotemporal characteristics of its content. Each segment, called a text zone, contains a set of events that occurred at the same geographical location in the same time frame. Associating the events reported in each text segment with the most specific spatial and temporal information available in news reports enables simple techniques to be employed for detecting specific outbreak locations, for example text classification to detect text segments that indicate outbreak situations. At the same time, false alarms about past outbreaks can be avoided by taking the temporal information about the events into consideration. I created a representative corpus in order to demonstrate that spatiotemporal zoning can be applied both automatically and manually to unrestricted text. The corpus consisted of 100 news articles from multiple news agencies reporting on various disease outbreaks in different parts of the world. To study the reliability of spatiotemporal zoning, an experiment was conducted in which three annotators were recruited to annotate the same set of documents according to the annotation guidelines, and the agreement between these annotators was then analyzed.
Several statistical measures, namely kappa, Krippendorff’s alpha (α), and the percentage agreement, were used to measure the agreement quantitatively. The results showed that kappa was more than 0.9 on average for event type and temporal attribute annotations, and only slightly lower for spatial attribute annotations. The task of spatiotemporal zoning can be separated into three main steps. (1) Document pre-processing: this step provides the basic elements for zone attribute analysis and was done automatically using natural language processing software. (2) Zone attribute annotation: each event-predicate is analyzed to recognize its class and its spatial and temporal attributes. (3) Zone boundary generation: this step is based on the attribute values of each event-predicate. Automatic zone attribute annotation was studied separately for each group of zone attributes, i.e., event type recognition, temporal attribute recognition, and spatial attribute recognition. To automatically classify event expressions, i.e. zone type recognition, Conditional Random Fields (CRFs) were used to incorporate various sets of text features into a classifier. To recognize spatial information, I experimented with several approaches, ranging from simple techniques such as the commonly used heuristic-based approach to the more sophisticated machine learning approach. I also explored various feature sets and feature encoding strategies in order to determine the best ones for recognizing spatial attributes. For temporal attribute recognition, I took a rule-based approach to recognizing an event's temporal information. One problem, however, is that in many cases the same event is mentioned repeatedly whereas the time of its occurrence is stated only once.
To improve the system's ability to recognize the temporal information, I employed a simple heuristic that helps to identify linguistic expressions referring to the same events. The studies above demonstrate that spatiotemporal zoning is reliable. Moreover, the results from automatic zone attribute recognition show that the scheme can be applied automatically with a reliable level of performance.
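The zone boundary generation step described in this abstract can be illustrated with a minimal sketch. It assumes that each event-predicate has already been annotated with a location and a time frame, and that a zone is a run of consecutive events sharing both values; the abstract does not spell out the actual boundary rule, and the `Event` class, `generate_zones` function, and sample data are all hypothetical.

```python
from dataclasses import dataclass
from itertools import groupby


@dataclass(frozen=True)
class Event:
    text: str        # event predicate as it appears in the article
    location: str    # most specific location recognised for the event
    time_frame: str  # coarse temporal label (hypothetical values below)


def generate_zones(events):
    """Group consecutive events with the same location and time frame.

    Assumption: a text zone is a contiguous run of events; this is an
    illustrative rule, not the thesis's documented algorithm.
    """
    zones = []
    for (location, time_frame), run in groupby(
        events, key=lambda e: (e.location, e.time_frame)
    ):
        zones.append(
            {
                "location": location,
                "time_frame": time_frame,
                "events": [e.text for e in run],
            }
        )
    return zones


# Hypothetical annotated events from one news article, in document order.
article = [
    Event("reported", "Bangkok", "present"),
    Event("hospitalised", "Bangkok", "present"),
    Event("died", "Chiang Mai", "past"),
    Event("confirmed", "Bangkok", "present"),
]

for zone in generate_zones(article):
    print(zone["location"], zone["time_frame"], zone["events"])
```

With this rule the sample article yields three zones, because the final Bangkok event is separated from the first two by an event in a different place and time. A classifier for outbreak situations could then be applied per zone, skipping zones whose time frame marks them as past.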

    Thai Named Entity Extraction by incorporating Maximum Entropy Model with Simple Heuristic Information

    Named entity (NE) extraction plays an important role in many NLP tasks, such as information extraction. In Thai, NE extraction is much more difficult because of two characteristics of the language: the lack of orthographic information to signal NEs, and the absence of boundary indicators between words. In this paper, we present a Thai NE extraction system that uses a Maximum Entropy model together with heuristic information and a dictionary. Our system is divided into three steps. The first step identifies the boundaries of candidate NEs composed of several words by using heuristic rules, a dictionary, and word co-occurrence statistics. The second step extracts NEs using the Maximum Entropy model. The final step extracts undiscovered NEs by matching the extracted NEs against the rest of the document. On Thai political news test data, the evaluation of the system shows that the F-measures for person, location, and organization names are 90.44%, 82.16% and 89.87%, respectively.

    A framework for enhancing spatial and temporal granularity in report-based health surveillance systems

    Background: Current public concern over the spread of infectious diseases has underscored the importance of health surveillance systems for the speedy detection of disease outbreaks. Several international report-based monitoring systems have been developed, including GPHIN, Argus, HealthMap, and BioCaster. A vital feature of these report-based systems is the geo-temporal encoding of outbreak-related textual data. Until now, automated systems have tended to use an ad-hoc strategy for processing geo-temporal information, normally involving the detection of locations that match pre-determined criteria, and the use of document publication dates as a proxy for disease event dates. Although these strategies appear to be effective enough for reporting events at the country and province levels, they may be less effective at discovering geo-temporal information at more detailed levels of granularity. In order to improve the capabilities of current Web-based health surveillance systems, we introduce the design for a novel scheme called spatiotemporal zoning. Method: The proposed scheme classifies news articles into zones according to the spatiotemporal characteristics of their content. In order to study the reliability of the annotation scheme, we analyzed the inter-annotator agreement among a group of human annotators for over 1000 reported events. The results were evaluated qualitatively and quantitatively, including kappa and percentage agreement. Results: The reliability evaluation of our scheme yielded very promising inter-annotator agreement, more than a 0.9 kappa and a 0.9 percentage agreement for event type annotation and temporal attributes annotation, respectively, with a slight degradation for the spatial attribute. However, for events indicating an outbreak situation, the annotators usually had inter-annotator agreements with the lowest granularity location. Conclusions: We developed and evaluated a novel spatiotemporal zoning annotation scheme.
The results of the scheme evaluation indicate that our annotated corpus and the proposed annotation scheme are reliable and could be effectively used for developing an automatic system. Given the current advances in natural language processing techniques, including the availability of language resources and tools, we believe that a reliable automatic spatiotemporal zoning system can be achieved. In the next stage of this work, we plan to develop an automatic zoning system and evaluate its usability within an operational health surveillance system.
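The two agreement measures named in this abstract can be sketched as follows. This is an illustration under stated assumptions: it uses Cohen's kappa for a single pair of annotators, whereas the abstract does not say which kappa variant was used or how scores were combined across annotators, and the labels in the example are invented.

```python
from collections import Counter


def percentage_agreement(a, b):
    """Fraction of items on which two annotators chose the same label."""
    if len(a) != len(b) or not a:
        raise ValueError("annotations must be non-empty and of equal length")
    return sum(x == y for x, y in zip(a, b)) / len(a)


def cohen_kappa(a, b):
    """Cohen's kappa for two annotators: (p_o - p_e) / (1 - p_e).

    Assumption: pairwise Cohen's kappa; the study's exact variant is not
    given in the abstract.
    """
    p_o = percentage_agreement(a, b)
    n = len(a)
    counts_a, counts_b = Counter(a), Counter(b)
    labels = counts_a.keys() | counts_b.keys()
    # Chance agreement from each annotator's own label distribution.
    p_e = sum(counts_a[label] * counts_b[label] for label in labels) / (n * n)
    if p_e == 1.0:
        return 1.0  # both annotators used one identical label throughout
    return (p_o - p_e) / (1 - p_e)


# Invented event-type labels from two annotators for four events.
annotator_1 = ["outbreak", "outbreak", "past", "past"]
annotator_2 = ["outbreak", "past", "past", "past"]

print(percentage_agreement(annotator_1, annotator_2))  # prints 0.75
print(cohen_kappa(annotator_1, annotator_2))           # prints 0.5
```

The example shows why both figures are worth reporting: the annotators agree on 75% of items, but half of that agreement is expected by chance given their label distributions, so kappa is only 0.5.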