67 research outputs found

    Anomaly Detection and Exploratory Causal Analysis for SAP HANA

    Nowadays, the proper functioning of equipment, networks and systems is key to keeping a company's business running, because in the era of big data companies cannot avoid relying on information technology to support their business. However, technology is never infallible, and faults that give rise to sometimes critical situations may appear at any time. To detect and prevent failures, it is essential to have a good monitoring system that oversees the technology used by a company (hardware, networks and communications, operating systems or applications, among others) in order to analyze its operation and performance and to detect and alert about possible errors. The aim of this thesis is thus to advance the fields of anomaly detection and exploratory causal inference, two major research areas in monitoring systems, and to provide algorithms that are efficient with regard to usability, maintainability and scalability. The results of the analysis can serve as a starting point for root cause analysis of system performance issues, helping to avoid system failures or to minimize the time needed to resolve such issues in the future. Finally, the algorithms were applied to historical data from an SAP HANA database, and the results obtained in this thesis indicate that the tools succeeded in providing useful information for diagnosing the performance issues of the system.

    Transfer Entropy Estimation and Directional Coupling Change Detection in Biomedical Time Series

    Background: The detection of changes in the magnitude of directional coupling between two non-linear time series is a common subject of interest in the biomedical domain, including studies involving the respiratory chemoreflex system. Although transfer entropy is a useful tool for this purpose, no study to date has investigated how different transfer entropy estimation methods perform in typical biomedical applications featuring small sample sizes and the presence of outliers. Methods: With respect to detection of increased coupling strength, we compared three transfer entropy estimation techniques using both simulated time series and respiratory recordings from lambs. The following estimation methods were analyzed: fixed-binning with ranking, kernel density estimation (KDE), and the Darbellay-Vajda (D-V) adaptive partitioning algorithm extended to three dimensions. In the simulated experiment, sample size was varied from 50 to 200, while coupling strength was increased. In order to introduce outliers, the heavy-tailed Laplace distribution was utilized. In the lamb experiment, the objective was to detect increased respiratory-related chemosensitivity to O₂ and CO₂ induced by a drug, domperidone. Specifically, the separate influence of end-tidal PO₂ and PCO₂ on minute ventilation (V̇E) before and after administration of domperidone was analyzed. Results: In the simulation, KDE detected increased coupling strength at the lowest SNR among the three methods. In the lamb experiment, D-V partitioning resulted in the statistically strongest increase in transfer entropy post-domperidone for PO₂ → V̇E. In addition, D-V partitioning was the only method that could detect an increase in transfer entropy for PCO₂ → V̇E, in agreement with experimental findings.
    Conclusions: Transfer entropy is capable of detecting directional coupling changes in non-linear biomedical time series analysis featuring a small number of observations and the presence of outliers. The results of this study suggest that fixed-binning, even with ranking, is too primitive. Although there is no clear winner between KDE and D-V partitioning, the reader should note that KDE requires more computational time and more extensive parameter selection than D-V partitioning. We hope this study provides a guideline for selection of an appropriate transfer entropy estimation method.
    Funding: National Institutes of Health (U.S.) (Grants R01-EB001659, R01-HL73146, HL085188-01A2, HL090897-01A2, K24 HL093218-01A1; Cooperative Agreement U01-EB-008577; Training Grant T32-HL07901); American Heart Association (Grant 0840159N).
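
To make the simplest of the three estimators concrete, the following is a minimal sketch of a fixed-binning transfer entropy estimate with rank transformation. It is not the authors' implementation: the function names `rank_bin` and `transfer_entropy`, the history length of one sample, the choice of three bins, and the toy coupled series are all assumptions made for illustration.

```python
import math
import random
from collections import Counter


def rank_bin(series, n_bins):
    """Replace each value by its rank, then split the ranks into n_bins equal-count bins."""
    n = len(series)
    order = sorted(range(n), key=lambda i: series[i])
    bins = [0] * n
    for rank, idx in enumerate(order):
        bins[idx] = rank * n_bins // n
    return bins


def transfer_entropy(source, target, n_bins=3):
    """Plug-in estimate of TE(source -> target) in bits, using a history of one sample."""
    x = rank_bin(source, n_bins)
    y = rank_bin(target, n_bins)
    # Each triple is (y_next, y_now, x_now).
    triples = list(zip(y[1:], y[:-1], x[:-1]))
    n = len(triples)
    c_abc = Counter(triples)
    c_ab = Counter((a, b) for a, b, _ in triples)
    c_bc = Counter((b, c) for _, b, c in triples)
    c_b = Counter(b for _, b, _ in triples)
    te = 0.0
    for (a, b, c), k in c_abc.items():
        # p(a,b,c) * log2[ p(a,b,c) p(b) / (p(b,c) p(a,b)) ], with the 1/n factors cancelled.
        te += (k / n) * math.log2(k * c_b[b] / (c_bc[(b, c)] * c_ab[(a, b)]))
    return te


# Toy example (assumed, not from the study): y is driven by the previous value of x.
random.seed(0)
x = [random.gauss(0.0, 1.0) for _ in range(2000)]
y = [0.0] + [0.8 * x[t] + 0.2 * random.gauss(0.0, 1.0) for t in range(1999)]

te_xy = transfer_entropy(x, y)  # coupling direction x -> y
te_yx = transfer_entropy(y, x)  # reverse direction, expected to be near zero
print(f"TE(x->y) = {te_xy:.3f} bits, TE(y->x) = {te_yx:.3f} bits")
```

Ranking before binning makes the bin occupancy insensitive to outliers, which is the motivation the abstract gives for that variant. The plug-in estimate is still biased upward for small samples, so in practice it would be compared against surrogate data rather than read as an absolute value.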

    Dynamics of Information Diffusion and Social Sensing

    Statistical inference using social sensors is an area that has witnessed remarkable progress and is relevant in applications including localizing events for targeted advertising, marketing, localization of natural disasters and predicting sentiment of investors in financial markets. This chapter presents a tutorial description of four important aspects of sensing-based information diffusion in social networks from a communications/signal processing perspective. First, diffusion models for information exchange in large-scale social networks, together with social sensing via social media networks such as Twitter, are considered. Second, Bayesian social learning models and risk-averse social learning are considered, with applications in finance and online reputation systems. Third, the principle of revealed preferences arising in microeconomic theory is used to parse datasets to determine whether social sensors are utility maximizers and then to determine their utility functions. Finally, the interaction of social sensors with YouTube channel owners is studied using time series analysis methods. All four topics are explained in the context of actual experimental datasets from health networks, social media and psychological experiments. Also, algorithms are given that exploit the above models to infer underlying events based on social sensing. The overview, insights, models and algorithms presented in this chapter stem from recent developments in network science, economics and signal processing. At a deeper level, this chapter considers mean field dynamics of networks, risk-averse Bayesian social learning filtering and quickest change detection, data incest in decision making over a directed acyclic graph of social sensors, inverse optimization problems for utility function estimation (revealed preferences) and statistical modeling of interacting social sensors in YouTube social networks. Comment: arXiv admin note: text overlap with arXiv:1405.112

    Heterogeneous Sensor Signal Processing for Inference with Nonlinear Dependence

    Inferring events of interest by fusing data from multiple heterogeneous sources has been an interesting and important topic in recent years. Several issues related to inference using heterogeneous data with complex and nonlinear dependence are investigated in this dissertation. We apply copula theory to characterize the dependence among heterogeneous data. In centralized detection, where sensor observations are available at the fusion center (FC), we study copula-based fusion. We design detection algorithms based on sample-wise copula selection and a mixture-of-copulas model for different scenarios of the true dependence. The proposed approaches are theoretically justified and perform well when applied to fuse acoustic and seismic sensor data for personnel detection. Besides traditional sensors, access to the massive amount of social media data provides a unique opportunity for extracting information about unfolding events. We further study how sensor networks and social media complement each other in facilitating the data-to-decision making process. We propose a copula-based joint characterization of multiple dependent time series from sensors and social media. As a proof of concept, this model is applied to the fusion of Google Trends (GT) data and stock/flu data for prediction, where the stock/flu data serves as a surrogate for sensor data. In energy-constrained networks, local observations are compressed before they are transmitted to the FC. In these cases, conditional dependence and heterogeneity particularly complicate the system design. We consider the classification of discrete random signals in Wireless Sensor Networks (WSNs), where, for communication efficiency, only local decisions are transmitted. We derive the necessary conditions for the optimal decision rules at the sensors and the FC by introducing a hidden random variable. An iterative algorithm is designed to search for the optimal decision rules. Its convergence and asymptotic optimality are also proved. The performance of the proposed scheme is illustrated for the distributed Automatic Modulation Classification (AMC) problem. Censoring is another communication-efficient strategy, in which sensors transmit only informative observations to the FC and censor those deemed uninformative. We design detectors that take into account the spatial dependence among observations. Fusion rules for censored data are proposed for continuous and discrete local messages, respectively. Their computationally efficient counterparts, based on the key idea of injecting controlled noise at the FC before fusion, are also investigated. In this thesis, with heterogeneous and dependent sensor observations, we consider not only inference in parallel frameworks but also the problem of collaborative inference, where collaboration exists among local sensors. Each sensor forms a coalition with other sensors and shares information within the coalition to maximize its inference performance. The collaboration strategy is investigated under a communication constraint. To characterize the influence of inter-sensor dependence on inference performance, and thus on the collaboration strategy, we quantify the gain and loss in forming a coalition by introducing copula-based definitions of diversity gain and redundancy loss for both estimation and detection problems. A coalition formation game is proposed for the distributed inference problem, through which the information contained in the inter-sensor dependence is fully explored and utilized for improved inference performance.
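
As a rough illustration of how a copula separates dependence from the marginals of heterogeneous data, the sketch below fits a single Gaussian copula parameter from rank-based normal scores. This is a deliberately simpler stand-in for the sample-wise copula selection and mixture-of-copulas models named in the abstract. The helper names `pseudo_observations` and `gaussian_copula_rho` and the synthetic data are assumptions for illustration only.

```python
import math
import random
from statistics import NormalDist


def pseudo_observations(sample):
    """Map a sample into (0, 1) through its ranks: u_i = rank_i / (n + 1)."""
    n = len(sample)
    order = sorted(range(n), key=lambda i: sample[i])
    u = [0.0] * n
    for rank, idx in enumerate(order, start=1):
        u[idx] = rank / (n + 1)
    return u


def gaussian_copula_rho(a, b):
    """Estimate the Gaussian copula correlation from the normal scores of the ranks."""
    inv = NormalDist().inv_cdf
    za = [inv(u) for u in pseudo_observations(a)]
    zb = [inv(u) for u in pseudo_observations(b)]
    mean_a = sum(za) / len(za)
    mean_b = sum(zb) / len(zb)
    cov = sum((p - mean_a) * (q - mean_b) for p, q in zip(za, zb))
    var_a = sum((p - mean_a) ** 2 for p in za)
    var_b = sum((q - mean_b) ** 2 for q in zb)
    return cov / math.sqrt(var_a * var_b)


# Synthetic "heterogeneous" pair (assumed): two latent Gaussians with correlation 0.7,
# observed through different monotone transforms, so the marginals differ strongly.
random.seed(1)
true_rho = 0.7
sensor_a, sensor_b = [], []
for _ in range(2000):
    z1 = random.gauss(0.0, 1.0)
    z2 = true_rho * z1 + math.sqrt(1.0 - true_rho ** 2) * random.gauss(0.0, 1.0)
    sensor_a.append(math.exp(z1))  # log-normal marginal
    sensor_b.append(z2 ** 3)       # heavy-tailed, symmetric marginal

rho_hat = gaussian_copula_rho(sensor_a, sensor_b)
print(f"estimated copula correlation: {rho_hat:.3f}")
```

Because the estimate depends on the data only through ranks, it is unchanged by any increasing transform of either sensor's readings, which is what makes the approach attractive when the modalities have unrelated units and distributions.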

    Scalable Learning Adaptive to Unknown Dynamics and Graphs

    University of Minnesota Ph.D. dissertation. June 2019. Major: Electrical/Computer Engineering. Advisor: Georgios B. Giannakis. 1 computer file (PDF); xii, 174 pages.
    With the scale of information growing every day, the key challenges in machine learning include the high dimensionality and sheer volume of feature vectors that may consist of real and categorical data, as well as the speed and the typically streaming format of data acquisition that may also entail outliers and misses. The latter may be present, either unintentionally or intentionally, in order to cope with scalability, privacy, and adversarial behavior. These challenges provide ample opportunities for algorithmic and analytical innovations in online and nonlinear subspace learning approaches. Among the available nonlinear learning tools, those based on kernels have merits that are well documented. However, most rely on a preselected kernel, whose prudent choice presumes task-specific prior information that is generally not available. It is also known that kernel-based methods do not scale well with the size or dimensionality of the data at hand. Besides data science, the urgent need for scalable tools is also a core issue in network science, which has recently emerged as a means of collectively understanding the behavior of complex interconnected entities. The rich spectrum of application domains comprises communication, social, financial, gene-regulatory, brain, and power networks, to name a few. Prominent tasks in all network science applications are those of topology identification and inference of nodal processes evolving over graphs. Most contemporary graph-driven inference approaches rely on linear and static models that are simple and tractable, but also presume that the nodal processes are directly observable. To cope with these challenges, the present thesis first introduces a novel online categorical subspace learning approach to track the latent structure of categorical data "on the fly." Leveraging the random feature approximation, it then develops an adaptive online multi-kernel learning approach (termed AdaRaker), which accounts not only for data-driven learning of the kernel combination, but also for the unknown dynamics. Performance analysis is provided in terms of both static and dynamic regrets to quantify the novel learning function approximation. In addition, the thesis introduces a kernel-based topology identification approach that can even account for nonlinear dependencies among nodes and across time. To cope with nodal processes that may not be directly observable in certain applications, tensor-based algorithms that leverage piecewise stationary statistics of nodal processes are developed, and pertinent identifiability conditions are established. To facilitate real-time operation and inference of time-varying networks, an adaptive tensor-decomposition-based scheme is put forth to track the topologies of time-varying networks. Last but not least, the present thesis offers a unifying framework to deal with various learning tasks over possibly dynamic networks. These tasks include dimensionality reduction, classification, and clustering. Tests on both synthetic and real datasets from the aforementioned application domains are carried out to showcase the effectiveness of the novel algorithms throughout.
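
The random feature approximation mentioned in this abstract can be illustrated with random Fourier features for a Gaussian kernel. The sketch below shows only that building block, not AdaRaker itself. The function names, the bandwidth, the feature count and the two test points are assumptions chosen for illustration.

```python
import math
import random


def make_random_features(dim, n_features, bandwidth, rng):
    """Return a feature map z(.) such that z(x) . z(y) approximates a Gaussian kernel."""
    # Frequencies are drawn from the kernel's spectral density, phases uniformly on [0, 2*pi].
    freqs = [[rng.gauss(0.0, 1.0 / bandwidth) for _ in range(dim)]
             for _ in range(n_features)]
    phases = [rng.uniform(0.0, 2.0 * math.pi) for _ in range(n_features)]
    scale = math.sqrt(2.0 / n_features)

    def features(x):
        return [scale * math.cos(sum(w_j * x_j for w_j, x_j in zip(w, x)) + b)
                for w, b in zip(freqs, phases)]

    return features


def gaussian_kernel(x, y, bandwidth):
    """Exact Gaussian kernel exp(-||x - y||^2 / (2 * bandwidth^2))."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sq_dist / (2.0 * bandwidth ** 2))


rng = random.Random(0)
z = make_random_features(dim=2, n_features=5000, bandwidth=1.0, rng=rng)

x = [0.3, -0.5]
y = [0.1, 0.4]
approx = sum(p * q for p, q in zip(z(x), z(y)))  # inner product of random features
exact = gaussian_kernel(x, y, bandwidth=1.0)
print(f"random-feature estimate: {approx:.3f}, exact kernel: {exact:.3f}")
```

The practical appeal is that a kernel method becomes a linear model in a fixed-size feature vector, so the per-sample cost of an online update no longer grows with the number of samples seen so far.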