
    Extending Complex Event Processing for Advanced Applications

    Recently, numerous emerging applications, ranging from online financial transactions, RFID-based supply chain management, and traffic monitoring to real-time object monitoring, generate high-volume event streams. To meet the needs of processing event data streams in real time, Complex Event Processing (CEP) technology has been developed, with a focus on detecting occurrences of particular composite patterns of events. By analyzing and constructing several real-world CEP applications, we found that CEP needs to be extended with advanced services beyond detecting pattern queries. We summarize these emerging needs in three orthogonal directions. First, for applications which require access to both streaming and stored data, we need to provide clear semantics and efficient schedulers in the face of concurrent access and failures. Second, when a CEP system is deployed in a sensitive environment such as health care, we wish to mitigate possible privacy leaks. Third, when input events do not carry the identification of the object being monitored, we need to infer the probabilistic identification of events before feeding them to a CEP engine. This dissertation therefore discusses the construction of a framework for extending CEP to support these critical services. First, existing CEP technology is limited in its capability to react to opportunities and risks detected by pattern queries. We propose to tackle this unsolved problem by embedding active rule support within the CEP engine. The main challenge is to handle interactions between queries and reactions to queries in high-volume stream execution. We hence introduce a novel stream-oriented transactional model along with a family of stream transaction scheduling algorithms that ensure the correctness of concurrent stream execution. We then demonstrate the proposed technology by applying it to a real-world healthcare system and evaluate the stream transaction scheduling algorithms extensively using real-world workloads. Second, we are the first to study the privacy implications of CEP systems. Specifically, we consider how to suppress events on a stream to reduce the disclosure of sensitive patterns, while ensuring that nonsensitive patterns continue to be reported by the CEP engine. We formally define the problem of utility-maximizing event suppression for privacy preservation. We then design a suite of real-time solutions that eliminate private pattern matches while maximizing the overall utility. Our first solution optimally solves the problem at the event-type level. The second solution, at the event-instance level, further optimizes the event-type-level solution by exploiting runtime event distributions using advanced pattern-match cardinality estimation techniques. Our experimental evaluation over both real-world and synthetic event streams shows that our algorithms are effective in maximizing utility yet efficient enough to offer near-real-time system responsiveness. Third, we observe that in many real-world object monitoring applications where CEP technology is adopted, not all sensed events carry the identification of the object whose action they report on; we call these "non-ID-ed" events. Such non-ID-ed events prevent us from performing object-based analytics, such as tracking, alerting, and pattern matching. We propose a probabilistic inference framework to tackle this problem by inferring the missing object identification associated with an event. Specifically, as a foundation we design a time-varying graphical model to capture correspondences between sensed events and objects. Upon this model, we elaborate how to adapt the state-of-the-art forward-backward inference algorithm to continuously infer probabilistic identifications for non-ID-ed events. More importantly, we propose a suite of strategies for optimizing the performance of inference. Our experimental results, using large-volume streams from a real-world healthcare application, demonstrate the accuracy, efficiency, and scalability of the proposed technology.
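    The inference step described above can be pictured as a forward-backward pass over a small hidden-state model. The sketch below is only an illustration of that idea, not the dissertation's time-varying graphical model: the two-object scenario, transition matrix, and emission matrix are made-up placeholders.

```python
# Minimal, illustrative sketch of forward-backward smoothing used to assign
# probabilistic object identifications to non-ID-ed events. The transition and
# emission matrices and the two-object scenario are hypothetical placeholders;
# the dissertation's actual time-varying graphical model is richer than this.
import numpy as np

def forward_backward(init, trans, emit, observations):
    """Return P(object_id | all observations) for each time step."""
    n_steps, n_states = len(observations), len(init)
    alpha = np.zeros((n_steps, n_states))   # forward messages
    beta = np.zeros((n_steps, n_states))    # backward messages

    # Forward pass: filter the belief over object identities.
    alpha[0] = init * emit[:, observations[0]]
    alpha[0] /= alpha[0].sum()
    for t in range(1, n_steps):
        alpha[t] = (alpha[t - 1] @ trans) * emit[:, observations[t]]
        alpha[t] /= alpha[t].sum()

    # Backward pass: incorporate future evidence.
    beta[-1] = 1.0
    for t in range(n_steps - 2, -1, -1):
        beta[t] = trans @ (emit[:, observations[t + 1]] * beta[t + 1])
        beta[t] /= beta[t].sum()

    posterior = alpha * beta
    return posterior / posterior.sum(axis=1, keepdims=True)

# Two monitored objects, two observable event types (e.g. readings from two zones).
init = np.array([0.5, 0.5])
trans = np.array([[0.9, 0.1], [0.2, 0.8]])   # identity rarely "switches" between readings
emit = np.array([[0.8, 0.2], [0.3, 0.7]])    # how likely each object produces each event type
events = [0, 0, 1, 1]                        # a short non-ID-ed event stream
print(forward_backward(init, trans, emit, events))
```

    Each row of the printed posterior gives, for one non-ID-ed event, a probability distribution over the candidate object identifications that could then be handed to a CEP engine.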

    A method to detect and represent temporal patterns from time series data and its application for analysis of physiological data streams

    In critical care, complex systems and sensors continuously monitor patients' physiological features such as heart rate and respiratory rate, generating significant amounts of data every second. This results in more than 2 million records generated per patient in an hour. It is an immense challenge for anyone trying to utilize this data when making critical decisions about patient care. Temporal abstraction and data mining are two research fields that have tried to synthesize time-oriented data to detect hidden relationships that may exist in the data. Various researchers have looked at techniques for generating abstractions from clinical data. However, the variety and speed of the data streams generated often overwhelm current systems, which are not designed to handle such data. Other attempts have sought to understand the complexity in time series data using mining techniques; however, existing models are not designed to detect temporal relationships that might exist in time series data (Inibhunu & McGregor, 2016). To address this challenge, this thesis proposes a method that extends existing knowledge discovery frameworks to include components for detecting and representing temporal relationships in time series data. The developed method is instantiated within the knowledge discovery component of Artemis, a cloud-based platform for processing physiological data streams. This is a unique approach that utilizes pattern recognition principles to provide functions for (a) temporal representation of time series data with abstractions, (b) temporal pattern generation and quantification, (c) frequent pattern identification, and (d) building a classification system. This method is applied to a neonatal intensive care case study with the motivating problem that discovery of specific patterns from patient data could be crucial for making improved decisions within patient care. Another application is in chronic care, to detect temporal relationships in ambulatory patient data before the occurrence of an adverse event. The research premise is that discovery of hidden relationships and patterns in data would be valuable in building a classification system that automatically characterizes physiological data streams. Such characterization could aid in the detection of new normal and abnormal behaviors in patients who may have life-threatening conditions.
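    As a rough illustration of the temporal abstraction and frequent-pattern steps described above (not the thesis's Artemis implementation), the sketch below maps windows of a raw heart-rate stream to symbolic trends and counts short recurring trend sequences; the window size, change threshold, and support threshold are assumed values chosen for the example.

```python
# Illustrative sketch only: abstract a heart-rate stream into symbolic trends
# and count short temporal patterns. Window size, thresholds, and pattern
# length are assumptions made for the example.
from collections import Counter

def abstract_trends(values, window=5, delta=2.0):
    """Map each window of raw samples to a symbolic trend abstraction."""
    symbols = []
    for i in range(0, len(values) - window + 1, window):
        chunk = values[i:i + window]
        change = chunk[-1] - chunk[0]
        if change > delta:
            symbols.append("RISING")
        elif change < -delta:
            symbols.append("FALLING")
        else:
            symbols.append("STABLE")
    return symbols

def frequent_patterns(symbols, length=2, min_support=2):
    """Count consecutive trend sequences and keep those above a support threshold."""
    counts = Counter(tuple(symbols[i:i + length]) for i in range(len(symbols) - length + 1))
    return {p: c for p, c in counts.items() if c >= min_support}

heart_rate = [120, 122, 124, 126, 129,   # rising
              128, 126, 123, 121, 118,   # falling
              119, 121, 124, 127, 130,   # rising
              129, 127, 124, 121, 119]   # falling
trends = abstract_trends(heart_rate)
print(trends)                    # ['RISING', 'FALLING', 'RISING', 'FALLING']
print(frequent_patterns(trends)) # {('RISING', 'FALLING'): 2}
```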

    Application of Hierarchical Temporal Memory to Anomaly Detection of Vital Signs for Ambient Assisted Living

    This thesis presents the development of a framework for anomaly detection of vital signs in an Ambient Assisted Living (AAL) health monitoring scenario. It is driven by the spatiotemporal reasoning over vital signs that Cortical Learning Algorithms (CLA), based on Hierarchical Temporal Memory (HTM) theory, undertake in an AAL health monitoring scenario to detect anomalous data points preceding cardiac arrest. The thesis begins with a literature review of the existing Ambient Intelligence (AmI) paradigm, AAL technologies, and anomaly detection algorithms used in health monitoring scenarios. The review revealed the significance of temporal and spatial reasoning in vital signs monitoring, as the spatiotemporal patterns of vital signs provide a basis for detecting irregularities in the health status of elderly people. HTM theory has yet to be adequately deployed in an AAL health monitoring scenario; hence HTM theory and the network and core operations of the CLA are explored. Because the standard implementation of HTM theory comprises a single-level hierarchy, multiple vital signs, and specifically the correlations between them, are not sufficiently considered. This insufficiency is of particular significance considering that vital signs are correlated in time and space and are used in health monitoring applications for diagnosis and prognosis tasks. This research proposes a novel framework consisting of multi-level HTM networks. The lower level consists of four models allocated to the four vital signs, Systolic Blood Pressure (SBP), Diastolic Blood Pressure (DBP), Heart Rate (HR), and peripheral capillary oxygen saturation (SpO2), in order to learn the spatiotemporal patterns of each vital sign. Additionally, a higher level is introduced to learn spatiotemporal patterns of the anomalous data points detected from the four vital signs. The proposed hierarchical organisation improves the model's performance by using a semantically improved representation of the sensed data, because patterns learned at each level of the hierarchy are reused when combined in novel ways at higher levels. To investigate and evaluate the performance of the proposed framework, several data selection techniques are studied, and accordingly, the records of 247 elderly patients are extracted from the MIMIC-III clinical database. The performance of the proposed framework is evaluated and compared against several state-of-the-art anomaly detection algorithms using both online and traditional metrics. The proposed framework achieved an 83% NAB score, which outperforms the HTM and k-NN algorithms by 15%, the HBOS and INFLO SVD by 16%, and the k-NN PCA by 21%, while the SVM scored 34%. The results show that multiple HTM networks can achieve better performance when dealing with multi-dimensional data, i.e. data collected from more than one source/sensor.
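    A conceptual sketch of the two-level arrangement is shown below. It is not an HTM implementation: each lower-level network is replaced by a trivial running z-score detector so the example stays self-contained, and the fusion rule and alarm threshold at the higher level are assumptions made for illustration.

```python
# Conceptual sketch of the two-level arrangement only. The thesis uses HTM/CLA
# networks; here each lower-level model is a trivial running z-score detector,
# and the fusion rule and alarm threshold are assumptions, not the thesis's design.
import math

class SimpleAnomalyModel:
    """Stand-in for a lower-level network monitoring one vital sign."""
    def __init__(self):
        self.n, self.mean, self.m2 = 0, 0.0, 0.0

    def score(self, x):
        # Anomaly score in [0, 1] from a running z-score (Welford update).
        self.n += 1
        d = x - self.mean
        self.mean += d / self.n
        self.m2 += d * (x - self.mean)
        if self.n < 3:
            return 0.0
        std = math.sqrt(self.m2 / (self.n - 1)) or 1.0
        z = abs(x - self.mean) / std
        return min(z / 2.0, 1.0)

class TwoLevelMonitor:
    """Lower level: one model per vital sign. Higher level: fuses their scores."""
    def __init__(self, signs):
        self.models = {s: SimpleAnomalyModel() for s in signs}

    def step(self, reading):
        scores = {s: self.models[s].score(v) for s, v in reading.items()}
        combined = max(scores.values())   # assumed fusion rule
        return scores, combined > 0.7     # assumed alarm threshold

monitor = TwoLevelMonitor(["SBP", "DBP", "HR", "SpO2"])
stream = [{"SBP": 120, "DBP": 80, "HR": 72, "SpO2": 98},
          {"SBP": 122, "DBP": 79, "HR": 74, "SpO2": 97},
          {"SBP": 121, "DBP": 81, "HR": 73, "SpO2": 98},
          {"SBP": 85,  "DBP": 50, "HR": 140, "SpO2": 82}]   # deteriorating reading
for reading in stream:
    print(monitor.step(reading))
```

    In the actual framework each lower-level model is an HTM network learning the spatiotemporal patterns of its vital sign, and the higher level is itself a network learning over the anomalous data points reported by the lower level.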

    A dynamic visual analytics framework for complex temporal environments

    Introduction: Data streams are produced by sensors that sample an external system at a periodic interval. As the cost of developing sensors continues to fall, an increasing number of data stream acquisition systems have been deployed to take advantage of the volume and velocity of data streams. An overabundance of information in complex environments can lead to information overload, a state of exposure to overwhelming and excessive information. The use of visual analytics provides leverage over potential information overload challenges. Apart from automated online analysis, interactive visual tools provide significant leverage for human-driven trend analysis and pattern recognition. To facilitate analysis and knowledge discovery in the space of multidimensional big data, research is warranted into an online visual analytic framework that supports human-driven exploration and consumption of complex data streams. Method: A novel framework was developed, the temporal Tri-event parameter based Dynamic Visual Analytics (TDVA) framework. The TDVA framework was instantiated in two case studies, a hypothesis generation scenario and a cohort-based hypothesis testing scenario, and two evaluations with expert participants were conducted, one for each case study. The framework is demonstrated in a neonatal intensive care unit case study. The hypothesis generation phase of the pipeline is conducted through a multidimensional, in-depth single-subject study using PhysioEx, a novel visual analytic tool for physiologic data stream analysis. The cohort-based hypothesis testing component of the analytic pipeline is validated through CoRAD, a visual analytic tool for performing case-controlled studies. Results: The results of both evaluations show improved task performance and subjective satisfaction with the use of PhysioEx and CoRAD. Results from the evaluation of PhysioEx reveal insights about current limitations in supporting single-subject studies in complex environments, and areas for future research in that space. Results from CoRAD also support the need for additional research to explore complex multi-dimensional patterns across multiple observations. From an information systems perspective, the efficacy and feasibility of the TDVA framework are demonstrated by the instantiation and evaluation of PhysioEx and CoRAD. Conclusion: This research introduces the TDVA framework and provides results that validate the deployment of online dynamic visual analytics in complex environments. The TDVA framework was instantiated in two case studies derived from an environment where dynamic and complex data streams were available. The first instantiation enabled the end-user to rapidly extract information from complex data streams to conduct in-depth analysis. The second allowed the end-user to test emerging patterns across multiple observations. To both ends, this thesis provides knowledge that can be used to improve the visual analytic pipeline in dynamic and complex environments.

    Efficient Process Data Warehousing

    This dissertation presents a data processing architecture for efficient data warehousing from historical data sources. The present work makes three primary contributions. The first is the development of a generalized process data warehousing (PDW) architecture that includes multilayer data processing steps to transform raw data streams into useful information that facilitates data-driven decision making. The second is exploring the applicability of the proposed architecture to the case of sparse process data. We have tested the proposed approach in a medical monitoring system, which takes physiological data and predicts the clinical setting in which the data is most likely to be seen. We have performed a set of experiments with real clinical data (from Children’s Hospital of Pittsburgh) that demonstrate the high utility of the present approach. The third contribution is exploring the applicability of the proposed PDW architecture to the case of redundant process data. We have designed and developed a conflict-aware data fusion strategy for the efficient aggregation of historical data. We have elaborated a simulation-based study of the trade-offs between the data fusion solutions and data accuracy, and have also evaluated the solutions on a large-scale integrated framework (Tycho data) that includes historical data from heterogeneous sources in different subject areas. Finally, we propose and evaluate a state sequence recovery (SSR) framework, which integrates work from the two previous studies of sparse and redundant data. Our experimental results are based on several algorithms that have been developed and tested in different simulation set-up scenarios under both normal and exponential data distributions.
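    The sketch below gives a minimal, hypothetical flavour of conflict-aware fusion over redundant historical records. The resolution policy (prefer the most recent value, breaking ties by an assumed source-reliability weight) and all field names are placeholders for the example, not the dissertation's actual strategy.

```python
# Hypothetical illustration of conflict-aware fusion of redundant records.
# The resolution policy and the reliability weights are assumptions made for
# the example, not the dissertation's actual design.
from collections import defaultdict

RELIABILITY = {"lab_system": 0.9, "bedside_monitor": 0.7, "manual_entry": 0.5}

def fuse(records):
    """records: dicts with keys patient_id, attribute, value, timestamp, source."""
    grouped = defaultdict(list)
    for r in records:
        grouped[(r["patient_id"], r["attribute"])].append(r)

    fused = {}
    for key, candidates in grouped.items():
        values = {c["value"] for c in candidates}
        if len(values) == 1:              # redundant but consistent
            fused[key] = candidates[0]["value"]
            continue
        # Conflict: take the most recent value; break timestamp ties by source reliability.
        best = max(candidates,
                   key=lambda c: (c["timestamp"], RELIABILITY.get(c["source"], 0.0)))
        fused[key] = best["value"]
    return fused

records = [
    {"patient_id": 1, "attribute": "heart_rate", "value": 82, "timestamp": 10, "source": "bedside_monitor"},
    {"patient_id": 1, "attribute": "heart_rate", "value": 85, "timestamp": 12, "source": "manual_entry"},
    {"patient_id": 1, "attribute": "blood_type", "value": "A+", "timestamp": 5, "source": "lab_system"},
    {"patient_id": 1, "attribute": "blood_type", "value": "A+", "timestamp": 7, "source": "manual_entry"},
]
print(fuse(records))   # {(1, 'heart_rate'): 85, (1, 'blood_type'): 'A+'}
```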

    Data semantic enrichment for complex event processing over IoT Data Streams

    This thesis generalizes techniques for processing IoT data streams, semantically enriching data with contextual information, and performing complex event processing in IoT applications. A case study on ECG anomaly detection and signal classification was conducted to validate the knowledge foundation.
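    A minimal sketch of what such an enrichment step might look like is given below; the context registry, field names, and the threshold rule standing in for a CEP pattern are assumptions made purely for illustration.

```python
# Hedged, minimal sketch of a semantic-enrichment step before complex event
# processing. All names and the simple threshold rule are illustrative assumptions.
CONTEXT = {
    "ecg-sensor-17": {"patient_id": "P-042", "location": "ward-3", "unit": "bpm"},
}

def enrich(raw_event):
    """Attach contextual metadata to a raw sensor reading."""
    context = CONTEXT.get(raw_event["sensor_id"], {})
    return {**raw_event, **context}

def detect(enriched_events, threshold=120):
    """Toy stand-in for a CEP rule: flag enriched readings above a threshold."""
    return [e for e in enriched_events if e["value"] > threshold]

stream = [{"sensor_id": "ecg-sensor-17", "value": 88, "ts": 1},
          {"sensor_id": "ecg-sensor-17", "value": 132, "ts": 2}]
print(detect([enrich(e) for e in stream]))
```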

    Evaluating clinical variation in traumatic brain injury data

    Current methods of clinical guideline development face two large challenges: 1) there is often a long time lag between key results and their publication as recommended best practice, and 2) the measurement of adherence to those guidelines is often qualitative and difficult to standardise into a measurable impact. In an age of ever-increasing volumes of accurate data captured at the bedside in specialist intensive care units, this thesis explores the possibility of constructing a technology that can interpret that data and present the results as a quantitative and immediate measure of guideline adherence. Applied to the Traumatic Brain Injury (TBI) domain, and specifically to the management of intracranial pressure (ICP) and cerebral perfusion pressure (CPP), a framework is developed that makes use of process models to measure the adherence of clinicians to three specific TBI guidelines. By combining models constructed from physiological and treatment ICU data with those constructed from guideline text, a distance is calculated between the two, and patterns of guideline adherence are inferred from this distance. The framework has been developed into an online application capable of producing adherence output on most standardised ICU datasets. This application has been applied to the Brain-IT and MIMIC-III repositories and evaluated on the Philips ICCA bedside monitoring system. Patterns of guideline adherence are presented in a variety of ways, including minute-by-minute windowing, tables of non-adherence instances, statistical distributions of instances, and a severity chart summarising the impact of non-adherence in a single number.
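    The minute-by-minute windowing mentioned above can be illustrated with a deliberately simplified adherence check. The sketch below flags per-minute threshold violations only, whereas the thesis infers adherence from distances between process models built from ICU data and from guideline text; the ICP and CPP threshold values are illustrative assumptions.

```python
# Illustrative sketch only: a minute-by-minute adherence check against simple
# ICP/CPP thresholds. The thresholds and window logic are assumptions for the
# example, not the thesis's process-model-based method.
def adherence_windows(minutes, icp_limit=20.0, cpp_floor=60.0):
    """minutes: list of (minute, icp_mmHg, cpp_mmHg); returns per-minute adherence flags."""
    report = []
    for minute, icp, cpp in minutes:
        violations = []
        if icp > icp_limit:
            violations.append(f"ICP {icp} > {icp_limit}")
        if cpp < cpp_floor:
            violations.append(f"CPP {cpp} < {cpp_floor}")
        report.append((minute, not violations, violations))
    return report

data = [(0, 14.0, 72.0), (1, 23.5, 66.0), (2, 25.0, 55.0), (3, 18.0, 63.0)]
for minute, adherent, violations in adherence_windows(data):
    print(minute, "adherent" if adherent else "non-adherent", violations)
```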

    Data mart based research in heart surgery

    Arnrich B. Data mart based research in heart surgery. Bielefeld (Germany): Bielefeld University; 2006. The proposed data mart based information system has proven to be useful and effective in the particular application domain of clinical research in heart surgery. In contrast to common data warehouse systems, which are focused primarily on administrative, managerial, and executive decision making, the primary objective of the designed and implemented data mart was to provide an ongoing, consolidated, and stable research basis. Besides detail-oriented patient data, aggregated data are also incorporated in order to fulfill multiple purposes. Due to the chosen concept, this technique integrates current and historical data from all relevant data sources without imposing any considerable operational or liability contract risk on the existing hospital information systems (HIS). By this means, possible resistance from the persons in charge can be minimized and the project-specific goals effectively met. The challenges of isolated data sources, securing high data quality, data with partial redundancy and consistency, valuable legacy data in special file formats, and privacy protection regulations are met with the proposed data mart architecture. The applicability was demonstrated in several fields, including (i) permitting easy, comprehensive medical research, (ii) assessing preoperative risks of adverse surgical outcomes, (iii) gaining insights into historical performance changes, (iv) monitoring surgical results, (v) improving risk estimation, and (vi) generating new knowledge from observational studies. The data mart approach makes it possible to turn redundant data from the electronically available hospital data sources into valuable information. On the one hand, redundancies are used to detect inconsistencies within and across HIS. On the other hand, redundancies are used to derive attributes from several data sources which originally did not contain the desired semantic meaning. Appropriate verification tools help to inspect the extraction and transformation processes in order to ensure high data quality. Based on the verification data stored during data mart assembly, various aspects can be inspected on the basis of an individual case, a group, or a specific rule. Invalid values or inconsistencies must be corrected in the primary source databases by the health professionals. Because all modifications are automatically transferred to the data mart system in a subsequent cycle, a consolidated and stable research database is achieved throughout the system in a persistent manner. In the past, performing comprehensive observational studies at the Heart Institute Lahr had been extremely time consuming and therefore limited. Several attempts had already been made to extract and combine data from the electronically available data sources. Depending on the desired scientific task, the processes to extract and connect the data were often rebuilt and modified. Consequently, the semantics and the definitions of the research data changed from one study to the next. Additionally, it was very difficult to maintain an overview of all data variants and derived research data sets. With the implementation of the presented data mart system, the most time- and effort-consuming processes in conducting observational studies could be replaced, and the research basis remains stable and leads to reliable results.
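    A minimal, hypothetical sketch of the cross-source verification idea follows: the same attribute extracted from two source systems is compared during data mart assembly and mismatches are reported for correction in the primary databases. The source names, attribute, and tolerance are assumptions made for the example.

```python
# Hypothetical sketch of cross-source consistency verification during data
# mart assembly. Source names, field names, and tolerance are assumptions;
# the actual verification tools in the data mart are richer than this.
def verify_consistency(source_a, source_b, attribute, tolerance=0.0):
    """Compare one attribute for the same patient across two source extracts."""
    issues = []
    for patient_id, value_a in source_a.items():
        value_b = source_b.get(patient_id)
        if value_b is None:
            issues.append((patient_id, attribute, "missing in second source"))
        elif abs(value_a - value_b) > tolerance:
            issues.append((patient_id, attribute, f"{value_a} != {value_b}"))
    return issues

his_extract = {"P-001": 1.72, "P-002": 1.65, "P-003": 1.80}   # height (m) from HIS
or_extract = {"P-001": 1.72, "P-002": 1.71}                   # height (m) from OR documentation
print(verify_consistency(his_extract, or_extract, "height_m"))
# [('P-002', 'height_m', '1.65 != 1.71'), ('P-003', 'height_m', 'missing in second source')]
```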