2,438 research outputs found

    Data-Centric Epidemic Forecasting: A Survey

    The COVID-19 pandemic has brought forth the importance of epidemic forecasting for decision makers in multiple domains, ranging from public health to the economy as a whole. Although forecasting epidemic progression is frequently conceptualized as analogous to weather forecasting, it has some key differences and remains a non-trivial task. The spread of diseases is subject to multiple confounding factors spanning human behavior, pathogen dynamics, weather and environmental conditions. Research interest has been fueled by the increased availability of rich data sources capturing previously unobservable facets and by initiatives from government public health and funding agencies. This has resulted, in particular, in a spate of work on 'data-centered' solutions which have shown potential in enhancing our forecasting capabilities by leveraging non-traditional data sources as well as recent innovations in AI and machine learning. This survey delves into various data-driven methodological and practical advancements and introduces a conceptual framework to navigate through them. First, we enumerate the large number of epidemiological datasets and novel data streams that are relevant to epidemic forecasting, capturing various factors like symptomatic online surveys, retail and commerce, mobility, genomics data and more. Next, we discuss methods and modeling paradigms focusing on the recent data-driven statistical and deep-learning based methods as well as on the novel class of hybrid models that combine domain knowledge of mechanistic models with the effectiveness and flexibility of statistical approaches. We also discuss experiences and challenges that arise in real-world deployment of these forecasting systems including decision-making informed by forecasts. Finally, we highlight some challenges and open problems found across the forecasting pipeline.
    Comment: 67 pages, 12 figures

    Data mashups: potential contribution to decision support on climate change and health.

    notes: PMCID: PMC3945564. This is a freely-available open access publication. Please cite the published version which is available via the DOI link in this record.
    Linking environmental, socioeconomic and health datasets provides new insights into the potential associations between climate change and human health and wellbeing, and underpins the development of decision support tools that will promote resilience to climate change, and thus enable more effective adaptation. This paper outlines the challenges and opportunities presented by advances in data collection, storage, analysis, and access, particularly focusing on "data mashups". These data mashups are integrations of different types and sources of data, frequently using open application programming interfaces and data sources, to produce enriched results that were not necessarily the original reason for assembling the raw source data. As an illustration of this potential, this paper describes a recently funded initiative to create such a facility in the UK for use in decision support around climate change and health, and provides examples of suitable sources of data and the purposes to which they can be directed, particularly for policy makers and public health decision makers.
    UK Medical Research Council; UK Natural Environment Research Council; European Regional Development Fund Programme 2007 to 2013; European Social Fund Convergence Programme for Cornwall and the Isles of Scilly

    Decision Support System for the Response to Infectious Disease Emergencies Based on WebGIS and Mobile Services in China

    Background: For years, emerging infectious diseases have appeared worldwide and threatened the health of people. The emergence and spread of an infectious-disease outbreak are usually unforeseen, and have the features of suddenness and uncertainty. Timely understanding of basic information in the field, and the collection and analysis of epidemiological information, is helpful in making rapid decisions and responding to an infectious-disease emergency. Therefore, it is necessary to have an unobstructed channel and convenient tool for the collection and analysis of epidemiologic information in the field. Methodology/Principal Findings: Baseline information for each county in mainland China was collected and a database was established by geo-coding information on a digital map of county boundaries throughout the country. Google Maps was used to display geographic information and to conduct calculations related to maps, and the 3G wireless network was used to transmit information collected in the field to the server. This study established a decision support system for the response to infectious-disease emergencies based on WebGIS and mobile services (DSSRIDE). The DSSRIDE provides functions including data collection, communication and analyses in real time, epidemiological detection, the provision of customized epidemiological questionnaires and guides for handling infectious disease emergencies, and the querying of professional knowledge in the field. These functions of the DSSRIDE could be helpful for epidemiological investigations in the field and the handling of infectious-disease emergencies. 
Conclusions/Significance: The DSSRIDE provides a geographic information platform based on the Google Maps application programming interface to display information on infectious-disease emergencies, and transfers information between workers in the field and decision makers through wireless transmission based on personal computers, mobile phones and personal digital assistants. After two years of practical application in infectious-disease emergencies, the DSSRIDE is becoming a useful platform and tool for field investigations carried out by response sections and individuals. The system is suitable for use in developing countries and low-income districts.

    Development and Applications of Similarity Measures for Spatial-Temporal Event and Setting Sequences

    Similarity or distance measures between data objects are applied frequently in many fields or domains such as geography, environmental science, biology, economics, computer science, linguistics, logic, business analytics, and statistics, among others. One area where similarity measures are particularly important is in the analysis of spatiotemporal event sequences and associated environs or settings. This dissertation focuses on developing a framework of modeling, representation, and new similarity measure construction for sequences of spatiotemporal events and corresponding settings, which can be applied to different event data types and used in different areas of data science. The first core part of this dissertation presents a matrix-based spatiotemporal event sequence representation that unifies punctual and interval-based representation of events. This framework supports different event data types and provides support for data mining and sequence classification and clustering. The similarity measure is based on the modified Jaccard index with temporal order constraints and accommodates different event data types. This approach is demonstrated through simulated data examples and the performance of the similarity measures is evaluated with a k-nearest neighbor algorithm (k-NN) classification test on synthetic datasets. These similarity measures are incorporated into a clustering method and successfully demonstrate the usefulness in a case study analysis of event sequences extracted from space time series of a water quality monitoring system. This dissertation further proposes a new similarity measure for event setting sequences, which involve the space and time in which events occur. While similarity measures for spatiotemporal event sequences have been studied, the settings and setting sequences have not yet been considered. 
While modeling event setting sequences, spatial and temporal scales are considered to define the bounds of the setting and incorporate dynamic variables along with static variables. Using a matrix-based representation and an extended Jaccard index, new similarity measures are developed to allow for the use of all variable data types. With these similarity measures coupled with other multivariate statistical analysis approaches, results from a case study involving setting sequences and pollution event sequences associated with the same monitoring stations support the hypothesis that more similar spatial-temporal settings or setting sequences may generate more similar events or event sequences. To test the scalability of the STES similarity measure on a larger dataset and its application in a different field, this dissertation compares and contrasts the prospective space-time scan statistic with the STES similarity approach for identifying COVID-19 hotspots. The COVID-19 pandemic has highlighted the importance of detecting hotspots or clusters of COVID-19 to provide decision makers at various levels with better information for managing distribution of human and technical resources as the outbreak in the USA continues to grow. The prospective space-time scan statistic has been used to help identify emerging disease clusters, yet results from this approach can encounter strategic limitations imposed by the spatial constraints of the scanning window. The STES-based approach adapted for this pandemic context computes the similarity of evolving normalized COVID-19 daily cases by county and clusters these to identify counties with similarly evolving COVID-19 case histories. This dissertation analyzes the spread of COVID-19 within the continental US through four periods beginning from late January 2020 using the COVID-19 datasets maintained by the Center for Systems Science and Engineering (CSSE) at Johns Hopkins University. 
The results of the two approaches complement each other and, taken together, can aid in tracking the progression of the pandemic. Overall, the dissertation highlights the importance of developing similarity measures for analyzing spatiotemporal event sequences and associated settings, which can be applied to different event data types and used for data mining, sequence classification, and clustering.

    The Design of Interactive Visualizations and Analytics for Public Health Data

    Public health data plays a critical role in ensuring the health of the populace. Professionals use data as they engage in efforts to improve and protect the health of communities. For the public, data influences their ability to make health-related decisions. Health literacy, which is the ability of an individual to access, understand, and apply health data, is a key determinant of health. At present, people seeking to use public health data are confronted with a myriad of challenges some of which relate to the nature and structure of the data. Interactive visualizations are a category of computational tools that can support individuals as they seek to use public health data. With interactive visualizations, individuals can access underlying data, change how data is represented, manipulate various visual elements, and in certain tools control and perform analytic tasks. That being said, currently, in public health, simple visualizations, which fail to effectively support the exploration of large sets of data, are predominantly used. The goal of this dissertation is to demonstrate the benefit of sophisticated interactive visualizations and analytics. As improperly designed visualizations can negatively impact users’ discourse with data, there is a need for frameworks to help designers think systematically about design issues. Furthermore, there is a need to demonstrate how such frameworks can be utilized. This dissertation includes a process by which designers can create health visualizations. Using this process, five novel visualizations were designed to facilitate making sense of public health data. Three studies were conducted with the visualizations. The first study explores how computational models can be used to make sense of the discourse of health on a social media platform. The second study investigates the use of instructional materials to improve visualization literacy. 
Visualization literacy is important because even when visualizations are designed properly, there still exists a gap between how a tool works and users’ perceptions of how the tool should work. The last study examines the efficacy of visualizations to improve health literacy. Overall, this dissertation provides designers with a deeper understanding of how to systematically design health visualizations.

    Multi-Modal Embeddings for Isolating Cross-Platform Coordinated Information Campaigns on Social Media

    Coordinated multi-platform information operations are implemented in a variety of contexts on social media, including state-run disinformation campaigns, marketing strategies, and social activism. Characterized by the promotion of messages via multi-platform coordination, in which multiple user accounts, within a short time, post content advancing a shared informational agenda on multiple platforms, they contribute to an already confusing and manipulated information ecosystem. To make things worse, reliable datasets that contain ground truth information about such operations are virtually nonexistent. This paper presents a multi-modal approach that identifies the social media messages potentially engaged in a coordinated information campaign across multiple platforms. Our approach incorporates textual content, temporal information and the underlying network of users and messages posted to identify groups of messages with unusual coordination patterns across multiple social media platforms. We apply our approach to content posted on four platforms related to the Syrian Civil Defence organization known as the White Helmets: Twitter, Facebook, Reddit, and YouTube. Results show that our approach identifies social media posts that link to news YouTube channels with similar factuality scores, which is often an indication of coordinated operations.
    Comment: To appear in the 5th Multidisciplinary International Symposium on Disinformation in Open Online Media (MISDOOM 2023)

    Multiple-Aspect Analysis of Semantic Trajectories

    This open access book constitutes the refereed post-conference proceedings of the First International Workshop on Multiple-Aspect Analysis of Semantic Trajectories, MASTER 2019, held in conjunction with the 19th European Conference on Machine Learning and Knowledge Discovery in Databases, ECML PKDD 2019, in Würzburg, Germany, in September 2019. The 8 full papers presented were carefully reviewed and selected from 12 submissions. They represent an interesting mix of techniques to solve recurrent as well as new problems in the semantic trajectory domain, such as data representation models, data management systems, machine learning approaches for anomaly detection, and common pathways identification.