4 research outputs found

    Modeling and analysis of disease and risk factors through learning Bayesian networks from observational data

    This paper focuses on identification of the relationships between a disease and its potential risk factors using Bayesian networks in an epidemiologic study, with the emphasis on integrating medical domain knowledge and statistical data analysis. An integrated approach is developed to identify the risk factors associated with patients' occupational histories and is demonstrated using real-world data. This approach includes several steps. First, raw data are preprocessed into a format that is acceptable to the learning algorithms of Bayesian networks. Some important considerations are discussed to address the uniqueness of the data and the challenges of the learning. Second, a Bayesian network is learned from the preprocessed data set by integrating medical domain knowledge and generic learning algorithms. Third, the relationships revealed by the Bayesian network are used for risk factor analysis, including identification of a group of people who share certain common characteristics and have a relatively high probability of developing the disease, and prediction of a person's risk of developing the disease given information on his/her occupational history. Copyright © 2007 John Wiley & Sons, Ltd. Peer Reviewed. http://deepblue.lib.umich.edu/bitstream/2027.42/58076/1/893_ftp.pd
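The risk-prediction step described in this abstract can be illustrated with the smallest possible network: a single edge from one risk factor to the disease. The sketch below is not the paper's method. It assumes the structure is already fixed, and the function name `fit_cpt`, the field names, and the toy records are all hypothetical, invented only for illustration.

```python
from collections import Counter


def fit_cpt(records, parent, child, alpha=1.0):
    """Estimate P(child is true | parent = value) for a one-edge network.

    Uses Laplace smoothing with pseudo-count `alpha` so that rarely seen
    parent values do not produce probabilities of exactly 0 or 1.
    """
    totals = Counter()
    positives = Counter()
    for record in records:
        totals[record[parent]] += 1
        if record[child]:
            positives[record[parent]] += 1
    # Two child outcomes (true/false), hence 2 * alpha in the denominator.
    return {
        value: (positives[value] + alpha) / (totals[value] + 2 * alpha)
        for value in totals
    }


# Toy records (invented, not from the paper): 4 exposed workers, 3 with the
# disease; 6 unexposed workers, 1 with the disease.
records = (
    [{"exposed": True, "disease": True}] * 3
    + [{"exposed": True, "disease": False}] * 1
    + [{"exposed": False, "disease": True}] * 1
    + [{"exposed": False, "disease": False}] * 5
)

risk = fit_cpt(records, parent="exposed", child="disease")
```

Here `risk[True]` is the smoothed estimate of the disease probability for an exposed person. A real network would have many variables, and its structure would be learned or supplied from domain knowledge as the abstract describes.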

    Online Adaptable Time Series Anomaly Detection with Discrete Wavelet Transforms and Multivariate Gaussian Distributions

    In this paper we present an unsupervised time series anomaly detection algorithm that is based on the discrete wavelet transform (DWT) and operates fully online. Given streaming data or time series, the algorithm iteratively computes the (causal and decimating) discrete wavelet transform. For individual frequency scales of the current DWT, the algorithm estimates the parameters of a multivariate Gaussian distribution. These parameters are adapted in an online fashion. Based on the multivariate Gaussian distributions, unusual patterns can then be detected across frequency scales, which in certain constellations indicate anomalous behavior. The algorithm is tested on a diverse set of 425 time series. A comparison to several other state-of-the-art online anomaly detectors shows that our algorithm can mostly produce results similar to the best algorithm on each dataset. It produces the highest average F1-score with one standard parameter setting. That is, it is more stable on high- and low-frequency anomalies than all other algorithms. We believe that the wavelet transform is an important ingredient to achieve this
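The pipeline in this abstract (causal decimating DWT, one Gaussian per frequency scale, online parameter adaptation) can be sketched roughly as follows. This is a simplified illustration, not the authors' algorithm: it assumes the Haar wavelet, replaces the multivariate Gaussian with an independent univariate Gaussian per scale, and uses exponential forgetting for adaptation. The class name and all parameters are hypothetical.

```python
import math


class OnlineHaarAnomalyDetector:
    """Sketch: causal Haar DWT with one running Gaussian per scale."""

    def __init__(self, levels=3, forgetting=0.99, threshold=4.0, warmup=8):
        self.levels = levels
        self.forgetting = forgetting    # weight kept by old statistics
        self.threshold = threshold      # z-score that counts as unusual
        self.warmup = warmup            # coefficients seen before scoring
        self.pending = [None] * levels  # unpaired input sample per level
        self.mean = [0.0] * levels
        self.var = [0.0] * levels
        self.count = [0] * levels

    def _score_and_adapt(self, level, d):
        n = self.count[level]
        unusual = False
        if n >= self.warmup:
            std = math.sqrt(self.var[level] + 1e-12)
            unusual = abs(d - self.mean[level]) / std > self.threshold
        # Plain running mean/variance during warm-up, then exponential
        # forgetting so the Gaussian keeps adapting to the stream.
        lam = self.forgetting if n >= self.warmup else n / (n + 1)
        delta = d - self.mean[level]
        self.mean[level] += (1 - lam) * delta
        self.var[level] = lam * (self.var[level] + (1 - lam) * delta * delta)
        self.count[level] = n + 1
        return unusual

    def update(self, x):
        """Feed one sample; return the levels whose new detail
        coefficient looks unusual under that level's Gaussian."""
        flagged = []
        a = x
        for level in range(self.levels):
            if self.pending[level] is None:
                # Decimation: wait for the second sample of the pair.
                self.pending[level] = a
                break
            prev, self.pending[level] = self.pending[level], None
            d = (prev - a) / math.sqrt(2)  # Haar detail coefficient
            a = (prev + a) / math.sqrt(2)  # approximation, passed upward
            if self._score_and_adapt(level, d):
                flagged.append(level)
        return flagged
```

Calling `update` once per incoming sample keeps the detector fully online: each call does a constant amount of work per level and uses only past samples. The abstract's detection across frequency scales would combine the per-level flags, which this sketch leaves to the caller.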

    Dynamic resource allocation for energy management in data centers

    In this dissertation we study the problem of allocating computational resources and managing applications in a data center to serve incoming requests in such a way that the energy usage, reliability and quality of service considerations are balanced. The problem is motivated by the growing energy consumption of data centers worldwide and their overall inefficiency. This work is focused on designing flexible and robust strategies to manage the resources in such a way that the system is able to meet the service agreements even when the load conditions change. As a first step, we study the control of a Markovian queueing system with a controllable number of servers and controllable service rates ($M/M_t/k_t$) to minimize effort and holding costs. We present structural properties of the optimal policy and suggest an algorithm to find well-performing policies even for large cases. Then we present a reactive/proactive approach, and a tailor-made wavelet-based forecasting procedure to determine the resource allocation in a single application setting; the method is tested by simulation with real web traces. The main feature of this method is its robustness and flexibility to meet QoS goals even when the traffic behavior changes. The system was tested by simulating a system with a time service factor QoS agreement. Finally, we consider the multi-application setting and develop a novel load consolidation strategy (of combining applications that are traditionally hosted on different servers) to reduce the server-load variability and the number of booting cycles in order to obtain a better capacity allocation
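The trade-off between effort and holding costs mentioned in this abstract can be illustrated with a much simpler static case. The sketch below is not the dissertation's dynamic policy: it assumes a stationary M/M/k queue with fixed arrival rate `lam` and service rate `mu`, uses the standard Erlang C formula for the mean number of jobs in the system, and picks a single server count by brute force. The function names and the linear cost model are assumptions made for illustration.

```python
import math


def erlang_c(k, a):
    """Probability that an arrival must wait in an M/M/k queue with
    offered load a = lam / mu, valid for a < k."""
    b = 1.0  # Erlang B blocking probability, built up recursively
    for n in range(1, k + 1):
        b = a * b / (n + a * b)
    rho = a / k
    return b / (1.0 - rho * (1.0 - b))


def mean_jobs_in_system(lam, mu, k):
    """Expected number of jobs (waiting plus in service); infinite
    when the queue is unstable."""
    a = lam / mu
    if a >= k:
        return math.inf
    rho = a / k
    return erlang_c(k, a) * rho / (1.0 - rho) + a


def best_server_count(lam, mu, effort_cost, holding_cost, k_max=100):
    """Return (k, cost) minimizing effort_cost * k + holding_cost * L
    over stable server counts up to k_max, or None if none is stable."""
    best = None
    for k in range(1, k_max + 1):
        jobs = mean_jobs_in_system(lam, mu, k)
        if math.isinf(jobs):
            continue  # too few servers to keep up with the load
        cost = effort_cost * k + holding_cost * jobs
        if best is None or cost < best[1]:
            best = (k, cost)
    return best
```

For example, `best_server_count(8.0, 1.0, 1.0, 1.0)` searches over server counts for an offered load of 8. Adding servers raises the effort cost but shrinks the queue, and the minimum balances the two.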