4,146 research outputs found

    Scalable Bayesian modeling, monitoring and analysis of dynamic network flow data

    Full text link
    Traffic flow count data in networks arise in many applications, such as automobile or aviation transportation, certain directed social network contexts, and Internet studies. Using an example of Internet browser traffic flow through site-segments of an international news website, we present Bayesian analyses of two linked classes of models which, in tandem, allow fast, scalable and interpretable Bayesian inference. We first develop flexible state-space models for streaming count data, able to adaptively characterize and quantify network dynamics efficiently in real-time. We then use these models as emulators of more structured, time-varying gravity models that allow formal dissection of network dynamics. This yields interpretable inferences on traffic flow characteristics, and on dynamics in interactions among network nodes. Bayesian monitoring theory defines a strategy for sequential model assessment and adaptation in cases when network flow data deviates from model-based predictions. Exploratory and sequential monitoring analyses of evolving traffic on a network of web site-segments in e-commerce demonstrate the utility of this coupled Bayesian emulation approach to analysis of streaming network count data.Comment: 29 pages, 16 figure

    What’s Happening Around the World? A Survey and Framework on Event Detection Techniques on Twitter

    Full text link
    © 2019, Springer Nature B.V. In the last few years, Twitter has become a popular platform for sharing opinions, experiences, news, and views in real-time. Twitter presents an interesting opportunity for detecting events happening around the world. The content (tweets) published on Twitter are short and pose diverse challenges for detecting and interpreting event-related information. This article provides insights into ongoing research and helps in understanding recent research trends and techniques used for event detection using Twitter data. We classify techniques and methodologies according to event types, orientation of content, event detection tasks, their evaluation, and common practices. We highlight the limitations of existing techniques and accordingly propose solutions to address the shortcomings. We propose a framework called EDoT based on the research trends, common practices, and techniques used for detecting events on Twitter. EDoT can serve as a guideline for developing event detection methods, especially for researchers who are new in this area. We also describe and compare data collection techniques, the effectiveness and shortcomings of various Twitter and non-Twitter-based features, and discuss various evaluation measures and benchmarking methodologies. Finally, we discuss the trends, limitations, and future directions for detecting events on Twitter

    Early detection of health changes in the elderly using in-home multi-sensor data streams

    Get PDF
    The rapid aging of the population worldwide requires increased attention from health care providers and the entire society. For the elderly to live independently, many health issues related to old age, such as frailty and risk of falling, need increased attention and monitoring. When monitoring daily routines for older adults, it is desirable to detect the early signs of health changes before serious health events, such as hospitalizations, happen, so that timely and adequate preventive care may be provided. By deploying multi-sensor systems in homes of the elderly, we can track trajectories of daily behaviors in a feature space defined using the sensor data. In this work, we investigate a methodology for learning data distribution from streaming data and tracking the evolution of the behavior trajectories over long periods (years) using high dimensional streaming clustering and provide very early indicators of changes in health. If we assume that habitual behaviors correspond to clusters in feature space and diseases produce a change in behavior, albeit not highly specific, tracking trajectory deviations can provide hints of early illness. Retrospectively, we visualize the streaming clustering results and track how the behavior clusters evolve in feature space with the help of two dimension-reduction algorithms, Principal Component Analysis (PCA) and t-distributed Stochastic Neighbor Embedding (t-SNE). Moreover, our tracking algorithm in the original high dimensional feature space generates early health warning alerts if a negative trend is detected in the behavior trajectory. We validated our algorithm on synthetic data, real-world data and tested it on a pilot dataset of four TigerPlace residents monitored with a collection of motion, bed, and depth sensors over ten years. We used the TigerPlace electronic health records (EHR) to understand the residents' behavior patterns and to evaluate and explain the health warnings generated by our algorithm. The results obtained on the TigerPlace dataset show that most of the warnings produced by our algorithm can be linked to health events documented in the EHR, providing strong support for a prospective deployment of the approach.Includes bibliographical references

    From Wearable Sensors to Smart Implants – Towards Pervasive and Personalised Healthcare

    No full text
    <p>Objective: This article discusses the evolution of pervasive healthcare from its inception for activity recognition using wearable sensors to the future of sensing implant deployment and data processing. Methods: We provide an overview of some of the past milestones and recent developments, categorised into different generations of pervasive sensing applications for health monitoring. This is followed by a review on recent technological advances that have allowed unobtrusive continuous sensing combined with diverse technologies to reshape the clinical workflow for both acute and chronic disease management. We discuss the opportunities of pervasive health monitoring through data linkages with other health informatics systems including the mining of health records, clinical trial databases, multi-omics data integration and social media. Conclusion: Technical advances have supported the evolution of the pervasive health paradigm towards preventative, predictive, personalised and participatory medicine. Significance: The sensing technologies discussed in this paper and their future evolution will play a key role in realising the goal of sustainable healthcare systems.</p> <p> </p

    The GC3 framework : grid density based clustering for classification of streaming data with concept drift.

    Get PDF
    Data mining is the process of discovering patterns in large sets of data. In recent years there has been a paradigm shift in how the data is viewed. Instead of considering the data as static and available in databases, data is now regarded as a stream as it continuously flows into the system. One of the challenges posed by the stream is its dynamic nature, which leads to a phenomenon known as Concept Drift. This causes a need for stream mining algorithms which are adaptive incremental learners capable of evolving and adjusting to the changes in the stream. Several models have been developed to deal with Concept Drift. These systems are discussed in this thesis and a new system, the GC3 framework is proposed. The GC3 framework leverages the advantages of the Gris Density based Clustering and the Ensemble based classifiers for streaming data, to be able to detect the cause of the drift and deal with it accordingly. In order to demonstrate the functionality and performance of the framework a synthetic data stream called the TJSS stream is developed, which embodies a variety of drift scenarios, and the model’s behavior is analyzed over time. Experimental evaluation with the synthetic stream and two real world datasets demonstrated high prediction capability of the proposed system with a small ensemble size and labeling ratio. Comparison of the methodology with a traditional static model with no drifts detection capability and with existing ensemble techniques for stream classification, showed promising results. Also, the analysis of data structures maintained by the framework provided interpretability into the dynamics of the drift over time. The experimentation analysis of the GC3 framework shows it to be promising for use in dynamic drifting environments where concepts can be incrementally learned in the presence of only partially labeled data

    Data-Centric Epidemic Forecasting: A Survey

    Full text link
    The COVID-19 pandemic has brought forth the importance of epidemic forecasting for decision makers in multiple domains, ranging from public health to the economy as a whole. While forecasting epidemic progression is frequently conceptualized as being analogous to weather forecasting, however it has some key differences and remains a non-trivial task. The spread of diseases is subject to multiple confounding factors spanning human behavior, pathogen dynamics, weather and environmental conditions. Research interest has been fueled by the increased availability of rich data sources capturing previously unobservable facets and also due to initiatives from government public health and funding agencies. This has resulted, in particular, in a spate of work on 'data-centered' solutions which have shown potential in enhancing our forecasting capabilities by leveraging non-traditional data sources as well as recent innovations in AI and machine learning. This survey delves into various data-driven methodological and practical advancements and introduces a conceptual framework to navigate through them. First, we enumerate the large number of epidemiological datasets and novel data streams that are relevant to epidemic forecasting, capturing various factors like symptomatic online surveys, retail and commerce, mobility, genomics data and more. Next, we discuss methods and modeling paradigms focusing on the recent data-driven statistical and deep-learning based methods as well as on the novel class of hybrid models that combine domain knowledge of mechanistic models with the effectiveness and flexibility of statistical approaches. We also discuss experiences and challenges that arise in real-world deployment of these forecasting systems including decision-making informed by forecasts. Finally, we highlight some challenges and open problems found across the forecasting pipeline.Comment: 67 pages, 12 figure

    A study assessing the characteristics of big data environments that predict high research impact: application of qualitative and quantitative methods

    Full text link
    BACKGROUND: Big data offers new opportunities to enhance healthcare practice. While researchers have shown increasing interest to use them, little is known about what drives research impact. We explored predictors of research impact, across three major sources of healthcare big data derived from the government and the private sector. METHODS: This study was based on a mixed methods approach. Using quantitative analysis, we first clustered peer-reviewed original research that used data from government sources derived through the Veterans Health Administration (VHA), and private sources of data from IBM MarketScan and Optum, using social network analysis. We analyzed a battery of research impact measures as a function of the data sources. Other main predictors were topic clusters and authors’ social influence. Additionally, we conducted key informant interviews (KII) with a purposive sample of high impact researchers who have knowledge of the data. We then compiled findings of KIIs into two case studies to provide a rich understanding of drivers of research impact. RESULTS: Analysis of 1,907 peer-reviewed publications using VHA, IBM MarketScan and Optum found that the overall research enterprise was highly dynamic and growing over time. With less than 4 years of observation, research productivity, use of machine learning (ML), natural language processing (NLP), and the Journal Impact Factor showed substantial growth. Studies that used ML and NLP, however, showed limited visibility. After adjustments, VHA studies had generally higher impact (10% and 27% higher annualized Google citation rates) compared to MarketScan and Optum (p<0.001 for both). Analysis of co-authorship networks showed that no single social actor, either a community of scientists or institutions, was dominating. Other key opportunities to achieve high impact based on KIIs include methodological innovations, under-studied populations and predictive modeling based on rich clinical data. CONCLUSIONS: Big data for purposes of research analytics has grown within the three data sources studied between 2013 and 2016. Despite important challenges, the research community is reacting favorably to the opportunities offered both by big data and advanced analytic methods. Big data may be a logical and cost-efficient choice to emulate research initiatives where RCTs are not possible

    Situation inference and context recognition for intelligent mobile sensing applications

    Get PDF
    The usage of smart devices is an integral element in our daily life. With the richness of data streaming from sensors embedded in these smart devices, the applications of ubiquitous computing are limitless for future intelligent systems. Situation inference is a non-trivial issue in the domain of ubiquitous computing research due to the challenges of mobile sensing in unrestricted environments. There are various advantages to having robust and intelligent situation inference from data streamed by mobile sensors. For instance, we would be able to gain a deeper understanding of human behaviours in certain situations via a mobile sensing paradigm. It can then be used to recommend resources or actions for enhanced cognitive augmentation, such as improved productivity and better human decision making. Sensor data can be streamed continuously from heterogeneous sources with different frequencies in a pervasive sensing environment (e.g., smart home). It is difficult and time-consuming to build a model that is capable of recognising multiple activities. These activities can be performed simultaneously with different granularities. We investigate the separability aspect of multiple activities in time-series data and develop OPTWIN as a technique to determine the optimal time window size to be used in a segmentation process. As a result, this novel technique reduces need for sensitivity analysis, which is an inherently time consuming task. To achieve an effective outcome, OPTWIN leverages multi-objective optimisation by minimising the impurity (the number of overlapped windows of human activity labels on one label space over time series data) while maximising class separability. The next issue is to effectively model and recognise multiple activities based on the user&#039;s contexts. Hence, an intelligent system should address the problem of multi-activity and context recognition prior to the situation inference process in mobile sensing applications. The performance of simultaneous recognition of human activities and contexts can be easily affected by the choices of modelling approaches to build an intelligent model. We investigate the associations of these activities and contexts at multiple levels of mobile sensing perspectives to reveal the dependency property in multi-context recognition problem. We design a Mobile Context Recognition System, which incorporates a Context-based Activity Recognition (CBAR) modelling approach to produce effective outcome from both multi-stage and multi-target inference processes to recognise human activities and their contexts simultaneously. Upon our empirical evaluation on real-world datasets, the CBAR modelling approach has significantly improved the overall accuracy of simultaneous inference on transportation mode and human activity of mobile users. The accuracy of activity and context recognition can also be influenced progressively by how reliable user annotations are. Essentially, reliable user annotation is required for activity and context recognition. These annotations are usually acquired during data capture in the world. We research the needs of reducing user burden effectively during mobile sensor data collection, through experience sampling of these annotations in-the-wild. To this end, we design CoAct-nnotate --- a technique that aims to improve the sampling of human activities and contexts by providing accurate annotation prediction and facilitates interactive user feedback acquisition for ubiquitous sensing. CoAct-nnotate incorporates a novel multi-view multi-instance learning mechanism to perform more accurate annotation prediction. It also includes a progressive learning process (i.e., model retraining based on co-training and active learning) to improve its predictive performance over time. Moving beyond context recognition of mobile users, human activities can be related to essential tasks that the users perform in daily life. Conversely, the boundaries between the types of tasks are inherently difficult to establish, as they can be defined differently from the individuals&#039; perspectives. Consequently, we investigate the implication of contextual signals for user tasks in mobile sensing applications. To define the boundary of tasks and hence recognise them, we incorporate such situation inference process (i.e., task recognition) into the proposed Intelligent Task Recognition (ITR) framework to learn users&#039; Cyber-Physical-Social activities from their mobile sensing data. By recognising the engaged tasks accurately at a given time via mobile sensing, an intelligent system can then offer proactive supports to its user to progress and complete their tasks. Finally, for robust and effective learning of mobile sensing data from heterogeneous sources (e.g., Internet-of-Things in a mobile crowdsensing scenario), we investigate the utility of sensor data in provisioning their storage and design QDaS --- an application agnostic framework for quality-driven data summarisation. This allows an effective data summarisation by performing density-based clustering on multivariate time series data from a selected source (i.e., data provider). Thus, the source selection process is determined by the measure of data quality. Nevertheless, this framework allows intelligent systems to retain comparable predictive results by its effective learning on the compact representations of mobile sensing data, while having a higher space saving ratio. This thesis contains novel contributions in terms of the techniques that can be employed for mobile situation inference and context recognition, especially in the domain of ubiquitous computing and intelligent assistive technologies. This research implements and extends the capabilities of machine learning techniques to solve real-world problems on multi-context recognition, mobile data summarisation and situation inference from mobile sensing. We firmly believe that the contributions in this research will help the future study to move forward in building more intelligent systems and applications
    • …
    corecore