24 research outputs found
Optimal Parameter Exploration for Online Change-Point Detection in Activity Monitoring Using Genetic Algorithms
In recent years, smartphones with built-in sensors have become popular devices for activity recognition. The sensors capture a large amount of data, containing meaningful events, in a short period of time. The change points in this data mark transitions between distinct events and can be used in various scenarios, such as identifying a change in a patient’s vital signs in the medical domain or requesting activity labels to generate real-world labeled activity datasets. Our work focuses on change-point detection to identify a transition from one activity to another. In this paper, we extend our previous work on the multivariate exponentially weighted moving average (MEWMA) algorithm by using a genetic algorithm (GA) to identify the optimal set of parameters for online change-point detection. The proposed technique maximizes accuracy and F_measure by optimizing the parameters of the MEWMA, which then identifies the exact location of the change point from an existing activity to a new one. Optimal parameter selection enables an algorithm to detect change points accurately and minimize false alarms. Results were evaluated on two real datasets of accelerometer data collected from a set of different activities performed by two users, with accuracy ranging from 99.4% to 99.8% and F_measure of up to 66.7%.
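To make the MEWMA idea concrete, here is a minimal sketch of an online MEWMA detector. It is not the authors' implementation: it assumes a known in-control mean and a diagonal covariance (independent sensor axes), and the function name and the default values of the smoothing weight `lam` and threshold `h` are hypothetical. These two parameters are the kind of values a genetic algorithm could search over.

```python
def mewma_first_alarm(samples, mean, var, lam=0.2, h=12.0):
    """Return the index of the first sample whose MEWMA statistic exceeds h.

    samples: iterable of equal-length sequences (e.g. accelerometer x, y, z).
    mean, var: assumed in-control mean and variance per axis (diagonal covariance).
    lam: smoothing weight in (0, 1]; h: alarm threshold. Both are tunable.
    Returns None if no alarm is raised.
    """
    z = [0.0] * len(mean)  # smoothed deviation from the in-control mean
    for t, x in enumerate(samples, start=1):
        # Exponentially weighted update of each axis.
        z = [lam * (xi - mi) + (1.0 - lam) * zi
             for xi, mi, zi in zip(x, mean, z)]
        # Exact variance factor of the smoothed vector at time t.
        scale = lam / (2.0 - lam) * (1.0 - (1.0 - lam) ** (2 * t))
        # T^2 statistic; with a diagonal covariance it is a weighted sum of squares.
        t2 = sum(zi * zi / (scale * vi) for zi, vi in zip(z, var))
        if t2 > h:
            return t - 1  # zero-based index of the alarming sample
    return None


# Synthetic stream: 20 in-control samples, then a shift of 3 on every axis.
stream = [[0.0, 0.0, 0.0]] * 20 + [[3.0, 3.0, 3.0]] * 10
alarm_index = mewma_first_alarm(stream, mean=[0.0] * 3, var=[1.0] * 3)
```

On this synthetic stream the shift starts at index 20 and the alarm is raised one sample later. A smaller `lam` smooths more and reacts more slowly, while a lower `h` reacts faster at the cost of more false alarms, which is the trade-off that parameter optimization addresses.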
Improving surveillance and prediction of emerging and re-emerging infectious diseases
Infectious diseases have been emerging at an unprecedented rate in recent years, as shown by the flu pandemic that originated in Mexico in 2009, the 2014 Ebola epidemic in West Africa, and the 2016-2017 expansion of Zika across the Americas. Such outbreaks were previously rare, so the resources and data needed to detect and predict their spread are lacking. This highlights the challenges in surveillance of emerging and re-emerging infectious diseases. In this dissertation, I focus on developing methods for early detection of such diseases and on assessing the predictive power of various models in the early phase of an epidemic. In Chapter 2, I developed a two-layer early detection framework that provides early warning of emerging epidemics based on the idea of anomaly detection. The framework can automatically evaluate and identify the data sources that achieve the best performance from the available data, such as data from the Internet and public health surveillance systems. I demonstrated the framework using historical influenza data in the US and found that the optimal combination of predictors includes data from Google search queries and Wikipedia page views. The optimized system is able to detect the onset of seasonal influenza outbreaks an average of 16.4 weeks in advance, and the second wave of the 2009 flu pandemic 5 weeks ahead. In Chapter 3, I extended the framework from Chapter 2 to distinguish large dengue outbreaks from small ones. The results show that the framework can tailor optimal combinations of predictors to different locations, and that an optimal combination for one location might not perform well for other locations. In Chapter 4, I investigated the contribution of different population structures to total epidemic incidence, peak intensity, and timing, and also explored the ability of models with different population structures to predict epidemic dynamics.
The results suggest that heterogeneous contact patterns and direct contacts dominate the evolution of epidemics, and that a homogeneous model cannot provide reliable predictions for an epidemic. In summary, my dissertation not only provides methodological frameworks for building early detection systems for emerging and re-emerging infectious diseases, but also gives insight into the effects of various models in predicting epidemics.
Cellular and Molecular Biology
Change Point Detection for Streaming Data Using Support Vector Methods
Sequential multiple change point detection concerns the identification of multiple points in time where the systematic behavior of a statistical process changes. A special case of this problem, called online anomaly detection, occurs when the goal is to detect the first change and then signal an alert to an analyst for further investigation. This dissertation concerns the use of methods based on kernel functions and support vectors to detect changes. A variety of support vector-based methods are considered, but the primary focus is Least Squares Support Vector Data Description (LS-SVDD). LS-SVDD constructs a hypersphere in a kernel space to bound a set of multivariate vectors using a closed-form solution. The mathematical tractability of LS-SVDD facilitates closed-form updates for its Lagrange multipliers. The update formulae cover both adding a block of observations to, and removing one from, an existing LS-SVDD description, so LS-SVDD can be constructed or updated sequentially, which makes it attractive for online problems with sequential data streams. LS-SVDD is applied to a variety of scenarios, including online anomaly detection and sequential multiple change point detection.
Contributions to statistical methods of process monitoring and adjustment
Ph.D. (Doctor of Philosophy)
A study of new and advanced control charts for two categories of time related processes
Ph.D. (Doctor of Philosophy)
Anomaly Detection in Time Series: Theoretical and Practical Improvements for Disease Outbreak Detection
The automatic collection and increasing availability of health data provide a new opportunity for techniques to monitor this information. By monitoring pre-diagnostic data sources, such as over-the-counter cough medicine sales or emergency room chief complaints of cough, there exists the potential to detect disease outbreaks earlier than traditional laboratory disease confirmation results. This research is particularly important for a modern, highly connected society, where the onset of a disease outbreak can be swift and deadly, whether caused by a naturally occurring global pandemic such as swine flu or a targeted act of bioterrorism. In this dissertation, we first describe the problem and current state of research in disease outbreak detection, then provide four main additions to the field.
First, we formalize a framework for analyzing health series data and detecting anomalies: using forecasting methods to predict the next day's value, subtracting the forecast to create residuals, and finally using detection algorithms on the residuals. The formalized framework indicates the link between the forecast accuracy of the forecast method and the performance of the detector, and can be used to quantify and analyze the performance of a variety of heuristic methods.
Second, we describe improvements for the forecasting of health data series. The application of weather as a predictor, cross-series covariates, and ensemble forecasting each provide improvements to forecasting health data.
Third, we describe improvements for detection. These include the use of multivariate statistics for anomaly detection and additional day-of-week preprocessing to aid detection. Most significantly, we also provide a new method, based on the CuScore, for optimizing detection when the impact of the disease outbreak is known. This method can provide an optimal detector for rapid detection, or for probability of detection within a certain timeframe.
Finally, we describe a method for improved comparison of detection methods. We provide tools to evaluate how well a simulated data set captures the characteristics of the authentic series, as well as time-lag heatmaps, a new way of visualizing daily detection rates or of displaying the comparison between two methods in a more informative way.
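The three-step framework described in this abstract (forecast the next day's value, subtract the forecast to form a residual, run a detector on the residuals) can be illustrated with a minimal sketch. The specific choices here are assumptions made for illustration, not the dissertation's methods: a trailing moving-average forecast, a one-sided CUSUM detector, a known residual standard deviation `sigma`, and hypothetical names and default parameters throughout.

```python
from statistics import mean


def residual_alarms(series, window=7, sigma=2.0, k=0.5, h=4.0):
    """Return the indices at which a CUSUM on forecast residuals raises an alarm.

    series: daily counts. window: days used by the moving-average forecast.
    sigma: assumed standard deviation of in-control residuals.
    k: CUSUM allowance; h: CUSUM alarm threshold (both in sigma units).
    """
    alarms = []
    cusum = 0.0
    for t in range(window, len(series)):
        # Step 1: forecast today's value from the previous `window` days.
        forecast = mean(series[t - window:t])
        # Step 2: residual = observed value minus forecast.
        residual = series[t] - forecast
        # Step 3: one-sided CUSUM accumulates standardized positive excess.
        cusum = max(0.0, cusum + residual / sigma - k)
        if cusum > h:
            alarms.append(t)
            cusum = 0.0  # restart the detector after signaling
    return alarms


# Synthetic series: two flat weeks, then a jump that lasts two days.
counts = [10] * 14 + [20] * 2
alarm_days = residual_alarms(counts)
```

Because the detector only sees residuals, a more accurate forecast yields smaller in-control residuals and lets the same threshold catch smaller outbreaks, which is the link between forecast accuracy and detector performance that the abstract points to.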