Detecting Irregular Patterns in IoT Streaming Data for Fall Detection
Detecting patterns in real-time streaming data has been an interesting and
challenging data analytics problem. With the proliferation of a variety of
sensor devices, real-time analytics of data from the Internet of Things (IoT)
to learn regular and irregular patterns has become an important machine
learning problem to enable predictive analytics for automated notification and
decision support. In this work, we address the problem of learning an irregular
human activity pattern, fall, from streaming IoT data from wearable sensors. We
present a deep neural network model for detecting falls based on accelerometer
data, which achieves 98.75 percent accuracy on an online physical activity monitoring
dataset called "MobiAct", published by Vavoulas et al. The initial
model was developed using IBM Watson Studio and later transferred and
deployed on IBM Cloud with the streaming analytics service supported by IBM
Streams for monitoring real-time IoT data. We also present the systems
architecture of the real-time fall detection framework that we intend to use
with mbientlabs wearable health monitoring sensors for real-time patient
monitoring at retirement homes or rehabilitation clinics. Comment: 7 pages
Incremental Principal Component Analysis Based Outliers Detection Methods for Spatiotemporal Data Streams
In this paper, we address outliers in spatiotemporal data streams obtained from sensors placed across geographically distributed locations. Outliers may appear in such sensor data for various reasons, such as instrumental error and environmental change. Real-time detection of these outliers is essential to prevent errors from propagating into subsequent analyses and results. Incremental Principal Component Analysis (IPCA) is one possible approach for detecting outliers in this type of spatiotemporal data stream. IPCA has been widely used in many real-time applications such as credit card fraud detection, pattern recognition, and image analysis. However, the suitability of IPCA for outlier detection in spatiotemporal data streams is unknown and needs to be investigated. To fill this research gap, this paper presents two new IPCA-based outlier detection methods and compares them with the existing IPCA-based outlier detection methods to assess their suitability for spatiotemporal sensor data streams.
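As a rough illustration of the general idea behind incremental PCA-based outlier scoring (not the methods proposed in the paper above), the sketch below maintains a running mean and a single principal direction, and scores each reading by its distance from that direction. It swaps in Oja's rule as a simple one-component incremental estimator. The class name `OjaOutlierScorer`, the helper `synthetic_stream`, the learning rate, and the synthetic data are all assumptions made for this example.

```python
import math
import random


class OjaOutlierScorer:
    """Tracks a running mean and one principal direction of a vector stream."""

    def __init__(self, dim, learning_rate=0.05):
        self.mean = [0.0] * dim
        self.count = 0
        # Arbitrary unit-length starting direction (an assumption of this sketch).
        self.direction = [1.0] + [0.0] * (dim - 1)
        self.learning_rate = learning_rate

    def _center(self, x):
        return [xi - mi for xi, mi in zip(x, self.mean)]

    def residual(self, x):
        """Distance from x to the line spanned by the current direction."""
        c = self._center(x)
        proj = sum(ci * wi for ci, wi in zip(c, self.direction))
        return math.sqrt(
            sum((ci - proj * wi) ** 2 for ci, wi in zip(c, self.direction))
        )

    def update(self, x):
        """Fold one observation into the mean and the direction (Oja's rule)."""
        self.count += 1
        self.mean = [mi + (xi - mi) / self.count for xi, mi in zip(x, self.mean)]
        c = self._center(x)
        y = sum(ci * wi for ci, wi in zip(c, self.direction))
        w = [
            wi + self.learning_rate * y * (ci - y * wi)
            for ci, wi in zip(c, self.direction)
        ]
        norm = math.sqrt(sum(wi * wi for wi in w))
        if norm > 0.0:
            self.direction = [wi / norm for wi in w]


def synthetic_stream(n, seed=0):
    """Hypothetical two-sensor stream whose readings move together."""
    rng = random.Random(seed)
    for _ in range(n):
        t = rng.uniform(-2.0, 2.0)
        yield (t + rng.gauss(0.0, 0.05), t + rng.gauss(0.0, 0.05))


scorer = OjaOutlierScorer(dim=2)
for reading in synthetic_stream(500):
    scorer.update(reading)
```

After this warm-up, a reading such as `(2.0, -2.0)`, where the two sensors disagree, receives a much larger `residual` than a reading such as `(1.0, 1.0)`. A practical detector would compare the residual against a threshold and would likely track more than one component.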
Towards Real-Time Detection and Tracking of Spatio-Temporal Features: Blob-Filaments in Fusion Plasma
A novel algorithm and implementation of real-time identification and tracking
of blob-filaments in fusion reactor data is presented. Similar spatio-temporal
features are important in many other applications, for example, ignition
kernels in combustion and tumor cells in a medical image. This work presents an
approach for extracting these features by dividing the overall task into three
steps: local identification of feature cells, grouping feature cells into
extended features, and tracking the movement of features through overlap in
space. Through our extensive work in parallelization, we demonstrate that this
approach can effectively make use of a large number of compute nodes to detect
and track blob-filaments in real time in fusion plasma. On a set of 30GB fusion
simulation data, we observed linear speedup on 1024 processes and completed
blob detection in less than three milliseconds using Edison, a Cray XC30 system
at NERSC. Comment: 14 pages, 40 figures
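The three steps named in the abstract above (identify feature cells, group them into extended features, track features through spatial overlap) can be sketched serially on a small 2D grid. This is only a minimal illustration under stated assumptions: a fixed threshold stands in for the local identification criterion, 4-connectivity is used for grouping, and the function names are hypothetical. It does not reproduce the parallel implementation described in the paper.

```python
from collections import deque


def find_feature_cells(frame, threshold):
    """Step 1: mark cells whose value exceeds a threshold (assumed criterion)."""
    return {
        (r, c)
        for r, row in enumerate(frame)
        for c, value in enumerate(row)
        if value > threshold
    }


def group_cells(cells):
    """Step 2: group 4-connected feature cells into blobs via breadth-first search."""
    unvisited = set(cells)
    blobs = []
    for seed in sorted(cells):  # sorted so the blob order is deterministic
        if seed not in unvisited:
            continue
        unvisited.remove(seed)
        blob = {seed}
        queue = deque([seed])
        while queue:
            r, c = queue.popleft()
            for neighbor in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
                if neighbor in unvisited:
                    unvisited.remove(neighbor)
                    blob.add(neighbor)
                    queue.append(neighbor)
        blobs.append(blob)
    return blobs


def track(previous_blobs, current_blobs):
    """Step 3: link blobs in consecutive frames that share at least one cell."""
    return [
        (i, j)
        for i, prev in enumerate(previous_blobs)
        for j, curr in enumerate(current_blobs)
        if prev & curr
    ]


frame_0 = [[0, 5, 5, 0, 0], [0, 5, 0, 0, 0], [0, 0, 0, 0, 7]]
frame_1 = [[0, 0, 5, 5, 0], [0, 0, 5, 0, 0], [0, 0, 0, 0, 0]]
blobs_0 = group_cells(find_feature_cells(frame_0, threshold=1))
blobs_1 = group_cells(find_feature_cells(frame_1, threshold=1))
links = track(blobs_0, blobs_1)  # [(0, 0)]: blob 0 moved one cell to the right
```

In this toy example the isolated cell in the first frame has no successor, so it produces no link. Overlap-based tracking like this assumes that features move less than their own extent between frames.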
Understanding Educational Vulnerability in the Context of Disasters Using Visualizations
BACKGROUND: Children are particularly vulnerable to the impact of natural disasters, yet limited scholarly attention has been placed on understanding their needs. The effect disasters may have on children’s educational attainment and achievement, otherwise known as educational vulnerability, is one of the least studied aspects of children’s disaster research. The use of visualizations using open access data repositories can facilitate researchers’ understanding of children’s educational vulnerability post-disaster.
AIMS: This paper illustrates how visuals can be used to address challenges that researchers may encounter when using educational datasets to evaluate disaster-related educational vulnerability. The challenges addressed include: (1) understanding data quality, (2) evaluating patterns within the data, and (3) evaluating for possible moderating variables.
DATA: This paper uses an example dataset containing educational data collected before and after Hurricane Ike’s landfall on the Texas Gulf Coast in 2008. The publicly available data originated from the Texas Education Agency (TEA) and were compiled into a historical dataset for the school years 2003-2011. Schools served as the primary unit of analysis (n = 464). Performance on the Texas Assessment of Knowledge and Skills (TAKS) served as a proxy for school academic functioning.
CONCLUSIONS: The use of visualizations serves as a valuable method to aid in the understanding of educational vulnerability in the context of disasters. Visuals can be used to evaluate accuracy during data exploration, identify patterns within the data, and stimulate new questions and hypotheses. Future research should focus on the use of longitudinal educational datasets, which will provide more detailed information regarding students’ educational vulnerability risks and needs.
Designing a streaming algorithm for outlier detection in data mining—an incremental approach
Designing an algorithm for detecting outliers over streaming data has become an important task in many common applications, arising in areas such as fraud detection, network analysis, and environment monitoring. Because real-time data may arrive in the form of streams rather than batches, properties such as concept drift, temporal context, transiency, and uncertainty need to be considered. In addition, data processing needs to be incremental, scalable, and able to run with limited memory. These requirements create big challenges for existing outlier detection algorithms in terms of their accuracy when they are implemented in an incremental fashion, especially in the streaming environment. To address these problems, we first propose C_KDE_WR, which uses a sliding window and a kernel function to process the streaming data online, and report results demonstrating high throughput on real-time streaming data, implemented in a CUDA framework on a Graphics Processing Unit (GPU). We also present another algorithm, C_LOF, based on a very popular and effective outlier detection algorithm called Local Outlier Factor (LOF), which unfortunately works only on batched data. Using a novel incremental approach that compensates for the high complexity of LOF, we show how to implement it in a streaming context and obtain results in a timely manner. Like C_KDE_WR, C_LOF employs a sliding window and a statistical summary to help make decisions based on the data in the current window. It also addresses all the challenges of streaming data that C_KDE_WR addresses. In addition, we report a comparative evaluation of the accuracy of C_KDE_WR against the state-of-the-art SOD_GPU using Precision, Recall, and F-score metrics. Furthermore, a t-test is performed to demonstrate the significance of the improvement.
We further report the test results of C_LOF under different parameter settings and plot ROC and PR curves, with their area under the curve (AUC) and Average Precision (AP) values calculated respectively. Experimental results show that C_LOF can overcome the masquerading problem, which often exists in outlier detection on streaming data. We provide a complexity analysis and report experimental results on the accuracy of both the C_KDE_WR and C_LOF algorithms in order to evaluate their effectiveness as well as their efficiency.
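To make the combination of a sliding window and a kernel function concrete, here is a minimal single-threaded sketch for a one-dimensional stream. It is not C_KDE_WR itself, which the abstract describes as a CUDA implementation on a GPU. The class name `SlidingWindowKDE`, the Gaussian kernel, and every parameter value are assumptions chosen for illustration.

```python
import math
from collections import deque


class SlidingWindowKDE:
    """Flags a value as an outlier when its kernel density estimate,
    computed over the most recent window of values, is below a threshold."""

    def __init__(self, window_size=20, bandwidth=0.5, min_density=0.05, min_points=5):
        self.window = deque(maxlen=window_size)  # oldest values fall out automatically
        self.bandwidth = bandwidth
        self.min_density = min_density
        self.min_points = min_points  # do not judge until the window has some data

    def density(self, x):
        """Gaussian kernel density estimate of x from the current window."""
        h = self.bandwidth
        norm = 1.0 / (len(self.window) * h * math.sqrt(2.0 * math.pi))
        return norm * sum(math.exp(-0.5 * ((x - v) / h) ** 2) for v in self.window)

    def process(self, x):
        """Score x against the current window, then add it to the window."""
        is_outlier = (
            len(self.window) >= self.min_points
            and self.density(x) < self.min_density
        )
        self.window.append(x)
        return is_outlier


# Hypothetical stream: readings near 10 with a single spike at index 30.
stream = [10.0 + 0.1 * (i % 5) for i in range(30)] + [25.0, 10.2]
detector = SlidingWindowKDE()
flagged = [i for i, x in enumerate(stream) if detector.process(x)]  # [30]
```

Each arrival costs time proportional to the window size, and memory is bounded by the window, which matches the incremental, limited-memory setting the abstract describes. This sketch also adds flagged values to the window, a design choice that lets it adapt to concept drift but allows outliers to dilute later estimates.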
Data Stream Clustering: A Review
The number of connected devices is steadily increasing, and these devices
continuously generate data streams. Real-time processing of data streams is
arousing interest despite many challenges. Clustering is one of the most
suitable methods for real-time data stream processing, because it can be
applied with less prior information about the data and it does not need labeled
instances. However, data stream clustering differs from traditional clustering
in many aspects and it has several challenging issues. Here, we provide
information regarding the concepts and common characteristics of data streams,
such as concept drift, data structures for data streams, time window models and
outlier detection. We comprehensively review recent data stream clustering
algorithms and analyze them in terms of the base clustering technique,
computational complexity and clustering accuracy. A comparison of these
algorithms is given along with still open problems. We indicate popular data
stream repositories and datasets, stream processing tools and platforms. Open
problems about data stream clustering are also discussed. Comment: Has been accepted for publication in Artificial Intelligence Review
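As one simple example of clustering without labeled instances in a single pass, the sketch below implements sequential (online) k-means, in which each arriving point nudges its nearest centroid toward itself. It is offered as a generic illustration and is not taken from the review. The class name `OnlineKMeans` and the seeding of centroids from the first k points are assumptions of this sketch.

```python
class OnlineKMeans:
    """Sequential k-means: one pass over the stream, O(k) memory for centroids."""

    def __init__(self, k):
        self.k = k
        self.centroids = []
        self.counts = []

    def update(self, x):
        """Assign x to its nearest centroid, move that centroid, return its index."""
        if len(self.centroids) < self.k:
            # Seed the centroids with the first k points (a simplifying assumption).
            self.centroids.append(list(x))
            self.counts.append(1)
            return len(self.centroids) - 1
        distances = [
            sum((xi - ci) ** 2 for xi, ci in zip(x, centroid))
            for centroid in self.centroids
        ]
        j = distances.index(min(distances))
        self.counts[j] += 1
        step = 1.0 / self.counts[j]  # running-mean update of the chosen centroid
        self.centroids[j] = [
            ci + step * (xi - ci) for xi, ci in zip(x, self.centroids[j])
        ]
        return j


model = OnlineKMeans(k=2)
points = [(0.0, 0.0), (10.0, 10.0), (0.0, 2.0), (10.0, 12.0), (2.0, 0.0)]
labels = [model.update(p) for p in points]  # [0, 1, 0, 1, 0]
```

Because the step size shrinks as counts grow, this version gives old and new points equal weight and so adapts poorly to concept drift. Replacing `step` with a small constant would weight recent points more heavily, which is closer in spirit to the damped time window models mentioned in the abstract.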