67 research outputs found
Anomaly Detection and Exploratory Causal Analysis for SAP HANA
Today, a company's business continuity depends on the reliable functioning of its equipment, networks and systems, because in the era of big data companies cannot avoid relying on information technology. However, technology is never infallible: faults that lead to sometimes critical situations may appear at any time. To detect and prevent failures, it is essential to have a good monitoring system that oversees the technology used by a company (hardware, networks and communications, operating systems or applications, among others) in order to analyze its operation and performance, and to detect and alert about possible errors. The aim of this thesis is thus to advance anomaly detection and exploratory causal inference, two major research areas in system monitoring, by providing algorithms that are efficient with regard to usability, maintainability and scalability. The results of the analysis can serve as a starting point for root cause analysis of system performance issues, helping to avoid system failures or to minimize the time needed to resolve such issues in the future. Finally, the algorithms were applied to historical data from an SAP HANA database, and the results indicate that the tools succeed in providing useful information for diagnosing the performance issues of the system.
Transfer Entropy Estimation and Directional Coupling Change Detection in Biomedical Time Series
Background: The detection of change in magnitude of directional coupling
between two non-linear time series is a common subject of interest in the
biomedical domain, including studies involving the respiratory chemoreflex system.
Although transfer entropy is a useful tool in this avenue, no study to date has
investigated how different transfer entropy estimation methods perform in typical
biomedical applications featuring small sample size and presence of outliers.
Methods: With respect to detection of increased coupling strength, we compared
three transfer entropy estimation techniques using both simulated time series and
respiratory recordings from lambs. The following estimation methods were analyzed:
fixed-binning with ranking, kernel density estimation (KDE), and the Darbellay-Vajda
(D-V) adaptive partitioning algorithm extended to three dimensions. In the simulated
experiment, sample size was varied from 50 to 200, while coupling strength was
increased. In order to introduce outliers, the heavy-tailed Laplace distribution was
utilized. In the lamb experiment, the objective was to detect increased respiratory-related chemosensitivity to O₂ and CO₂ induced by a drug, domperidone. Specifically, the separate influence of end-tidal PO₂ and PCO₂ on minute ventilation (V̇E) before and after administration of domperidone was analyzed.
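As a rough illustration of the simplest of the three estimators named above, the following Python sketch computes a plug-in transfer entropy using fixed binning on ranks. It is not the authors' implementation: the history length of one sample, the default of three equal-count bins, and the function names `rank_bin` and `transfer_entropy` are all assumptions made for this example.

```python
import math
import random
from collections import Counter


def rank_bin(series, n_bins):
    """Rank the values, then cut the ranks into n_bins equal-count bins."""
    n = len(series)
    order = sorted(range(n), key=lambda i: series[i])
    labels = [0] * n
    for rank, idx in enumerate(order):
        labels[idx] = rank * n_bins // n
    return labels


def transfer_entropy(x, y, n_bins=3):
    """Plug-in estimate of TE(X -> Y) in bits (assumed history length: 1)."""
    xb, yb = rank_bin(x, n_bins), rank_bin(y, n_bins)
    # Each triple is (y_next, y_past, x_past).
    triples = list(zip(yb[1:], yb[:-1], xb[:-1]))
    n = len(triples)
    c_full = Counter(triples)
    c_past = Counter((yp, xp) for _, yp, xp in triples)
    c_self = Counter((yn, yp) for yn, yp, _ in triples)
    c_ypast = Counter(yp for _, yp, _ in triples)
    te = 0.0
    for (yn, yp, xp), count in c_full.items():
        p_given_both = count / c_past[(yp, xp)]        # p(y_next | y_past, x_past)
        p_given_self = c_self[(yn, yp)] / c_ypast[yp]  # p(y_next | y_past)
        te += (count / n) * math.log2(p_given_both / p_given_self)
    return te


if __name__ == "__main__":
    # Hypothetical toy data: y follows x with a one-sample delay.
    rng = random.Random(0)
    x = [rng.gauss(0, 1) for _ in range(200)]
    y = [0.0] + [x[t] + 0.5 * rng.gauss(0, 1) for t in range(199)]
    print("TE(x -> y):", transfer_entropy(x, y))
    print("TE(y -> x):", transfer_entropy(y, x))
```

Ranking before binning makes the bin edges insensitive to the magnitude of outliers, which is presumably why the study pairs the two; the plug-in estimate is still biased upward at small sample sizes.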
Results: In the simulation, KDE detected increased coupling strength at the lowest
SNR among the three methods. In the lamb experiment, D-V partitioning resulted in
the statistically strongest increase in transfer entropy post-domperidone for
PO₂ → V̇E. In addition, D-V partitioning was the only method that could detect an increase in transfer entropy for PCO₂ → V̇E, in agreement with experimental findings.
Conclusions: Transfer entropy is capable of detecting directional coupling changes
in non-linear biomedical time series analysis featuring a small number of
observations and presence of outliers. The results of this study suggest that fixed-binning, even with ranking, is too primitive, and although there is no clear winner
between KDE and D-V partitioning, the reader should note that KDE requires more
computational time and more extensive parameter selection than D-V partitioning. We
hope this study provides a guideline for selection of an appropriate transfer entropy
estimation method.
Funding: National Institutes of Health (U.S.) (Grants R01-EB001659, R01-HL73146, HL085188-01A2, HL090897-01A2, K24 HL093218-01A1; Cooperative Agreement U01-EB-008577; Training Grant T32-HL07901); American Heart Association (Grant 0840159N).
Entropy Analysis of Univariate Biomedical Signals: Review and Comparison of Methods
Dynamics of Information Diffusion and Social Sensing
Statistical inference using social sensors is an area that has witnessed
remarkable progress and is relevant in applications including localizing events
for targeted advertising, marketing, localization of natural disasters and
predicting sentiment of investors in financial markets. This chapter presents a
tutorial description of four important aspects of sensing-based information
diffusion in social networks from a communications/signal processing
perspective. First, diffusion models for information exchange in large scale
social networks together with social sensing via social media networks such as
Twitter are considered. Second, Bayesian social learning models and risk-averse
social learning are considered, with applications in finance and online
reputation systems. Third, the principle of revealed preferences arising in
micro-economics theory is used to parse datasets to determine if social sensors
are utility maximizers and then determine their utility functions. Finally, the
interaction of social sensors with YouTube channel owners is studied using time
series analysis methods. All four topics are explained in the context of actual
experimental datasets from health networks, social media and psychological
experiments. Also, algorithms are given that exploit the above models to infer
underlying events based on social sensing. The overview, insights, models and
algorithms presented in this chapter stem from recent developments in network
science, economics and signal processing. At a deeper level, this chapter
considers mean field dynamics of networks, risk averse Bayesian social learning
filtering and quickest change detection, data incest in decision making over a
directed acyclic graph of social sensors, inverse optimization problems for
utility function estimation (revealed preferences) and statistical modeling of
interacting social sensors in YouTube social networks.
Heterogeneous Sensor Signal Processing for Inference with Nonlinear Dependence
Inferring events of interest by fusing data from multiple heterogeneous sources has been an interesting and important topic in recent years. Several issues related to inference using heterogeneous data with complex and nonlinear dependence are investigated in this dissertation. We apply copula theory to characterize the dependence among heterogeneous data.
In centralized detection, where sensor observations are available at the fusion center (FC), we study copula-based fusion. We design detection algorithms based on sample-wise copula selection and mixture of copulas model in different scenarios of the true dependence. The proposed approaches are theoretically justified and perform well when applied to fuse acoustic and seismic sensor data for personnel detection. Besides traditional sensors, the access to the massive amount of social media data provides a unique opportunity for extracting information about unfolding events. We further study how sensor networks and social media complement each other in facilitating the data-to-decision making process. We propose a copula-based joint characterization of multiple dependent time series from sensors and social media. As a proof-of-concept, this model is applied to the fusion of Google Trends (GT) data and stock/flu data for prediction, where the stock/flu data serves as a surrogate for sensor data.
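To make the role of the copula concrete, here is a minimal Python sketch of how dependence between two heterogeneous signals can be separated from their marginal distributions. It fits only a single Gaussian copula via normal scores, which is a simplifying assumption; the sample-wise copula selection and mixture-of-copulas models described above are not reproduced. The function names `pseudo_observations` and `gaussian_copula_rho` are hypothetical.

```python
import math
import random
from statistics import NormalDist


def pseudo_observations(sample):
    """Map a sample to (0, 1) through its ranks: u_i = rank_i / (n + 1)."""
    n = len(sample)
    order = sorted(range(n), key=lambda i: sample[i])
    u = [0.0] * n
    for rank, idx in enumerate(order, start=1):
        u[idx] = rank / (n + 1)
    return u


def gaussian_copula_rho(a, b):
    """Estimate the Gaussian copula correlation from normal scores."""
    std_normal = NormalDist()
    za = [std_normal.inv_cdf(u) for u in pseudo_observations(a)]
    zb = [std_normal.inv_cdf(u) for u in pseudo_observations(b)]
    # Normal scores are symmetric about zero, so no centering is needed.
    num = sum(p * q for p, q in zip(za, zb))
    den = math.sqrt(sum(p * p for p in za) * sum(q * q for q in zb))
    return num / den


if __name__ == "__main__":
    # Hypothetical data: two very different marginals, one shared dependence.
    rng = random.Random(0)
    z1 = [rng.gauss(0, 1) for _ in range(1000)]
    z2 = [rng.gauss(0, 1) for _ in range(1000)]
    a = [math.exp(v) for v in z1]                              # log-normal margin
    b = [(0.8 * p + 0.6 * q) ** 3 for p, q in zip(z1, z2)]     # heavy-tailed margin
    print("estimated copula correlation:", gaussian_copula_rho(a, b))
```

Because the estimate depends only on ranks, it is unchanged by any strictly increasing transformation of either signal, which is the property that makes copulas attractive for fusing sensors with unlike marginals.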
In energy-constrained networks, local observations are compressed before they are transmitted to the FC. In these cases, conditional dependence and heterogeneity particularly complicate the system design. We consider the classification of discrete random signals in Wireless Sensor Networks (WSNs), where, for communication efficiency, only local decisions are transmitted. We derive the necessary conditions for the optimal decision rules at the sensors and the FC by introducing a hidden random variable. An iterative algorithm is designed to search for the optimal decision rules. Its convergence and asymptotic optimality are also proved. The performance of the proposed scheme is illustrated for the distributed Automatic Modulation Classification (AMC) problem. Censoring is another communication-efficient strategy, in which sensors transmit only informative observations to the FC and censor those deemed uninformative. We design detectors that take into account the spatial dependence among observations. Fusion rules for censored data are proposed with continuous and discrete local messages, respectively. Their computationally efficient counterparts, based on the key idea of injecting controlled noise at the FC before fusion, are also investigated.
In this thesis, with heterogeneous and dependent sensor observations, we consider not only inference in parallel frameworks but also the problem of collaborative inference, where collaboration exists among local sensors. Each sensor forms a coalition with other sensors and shares information within the coalition to maximize its inference performance. The collaboration strategy is investigated under a communication constraint. To characterize the influence of inter-sensor dependence on inference performance and thus on collaboration strategy, we quantify the gain and loss in forming a coalition by introducing copula-based definitions of diversity gain and redundancy loss for both estimation and detection problems. A coalition formation game is proposed for the distributed inference problem, through which the information contained in the inter-sensor dependence is fully explored and utilized for improved inference performance.
Scalable Learning Adaptive to Unknown Dynamics and Graphs
University of Minnesota Ph.D. dissertation. June 2019. Major: Electrical/Computer Engineering. Advisor: Georgios B. Giannakis. 1 computer file (PDF); xii, 174 pages.
With the scale of information growing every day, the key challenges in machine learning include the high-dimensionality and sheer volume of feature vectors that may consist of real and categorical data, as well as the speed and the typically streaming format of data acquisition that may also entail outliers and misses. The latter may be present, either unintentionally or intentionally, in order to cope with scalability, privacy, and adversarial behavior. These challenges provide ample opportunities for algorithmic and analytical innovations in online and nonlinear subspace learning approaches. Among the available nonlinear learning tools, those based on kernels have merits that are well documented. However, most rely on a preselected kernel, whose prudent choice presumes task-specific prior information that is generally not available. It is also known that kernel-based methods do not scale well with the size or dimensionality of the data at hand. Besides data science, the urgent need for scalable tools is a core issue also in network science that has recently emerged as a means of collectively understanding the behavior of complex interconnected entities. The rich spectrum of application domains comprises communication, social, financial, gene-regulatory, brain, and power networks, to name a few. Prominent tasks in all network science applications are those of topology identification and inference of nodal processes evolving over graphs. Most contemporary graph-driven inference approaches rely on linear and static models that are simple and tractable, but also presume that the nodal processes are directly observable. To cope with these challenges, the present thesis first introduces a novel online categorical subspace learning approach to track the latent structure of categorical data 'on the fly.'
Leveraging the random feature approximation, it then develops an adaptive online multi-kernel learning approach (termed AdaRaker), which accounts not only for data-driven learning of the kernel combination, but also for the unknown dynamics. Performance analysis is provided in terms of both static and dynamic regrets to quantify the novel learning function approximation. In addition, the thesis introduces a kernel-based topology identification approach that can even account for nonlinear dependencies among nodes and across time. To cope with nodal processes that may not be directly observable in certain applications, tensor-based algorithms that leverage piecewise stationary statistics of nodal processes are developed, and pertinent identifiability conditions are established. To facilitate real-time operation and inference of time-varying networks, an adaptive tensor decomposition based scheme is put forth to track the topologies of time-varying networks. Last but not least, the present thesis offers a unifying framework to deal with various learning tasks over possibly dynamic networks. These tasks include dimensionality reduction, classification, and clustering. Tests on both synthetic and real datasets from the aforementioned application domains are carried out to showcase the effectiveness of the novel algorithms throughout.
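The random feature idea mentioned above can be illustrated with a short Python sketch. It approximates Gaussian kernels with random Fourier features and combines several bandwidths online through multiplicative weights. This is only in the spirit of online multi-kernel learning and is not AdaRaker itself: the adaptation to unknown dynamics is omitted, and the scalar inputs, step sizes, and the names `RandomFeatureLearner` and `run_online` are assumptions made for the example.

```python
import math
import random


class RandomFeatureLearner:
    """Online least-squares learner on random Fourier features of one Gaussian kernel."""

    def __init__(self, sigma, n_features, rng):
        # Frequencies drawn from N(0, 1/sigma^2) match a Gaussian kernel of width sigma.
        self.omegas = [rng.gauss(0.0, 1.0 / sigma) for _ in range(n_features)]
        self.phases = [rng.uniform(0.0, 2.0 * math.pi) for _ in range(n_features)]
        self.theta = [0.0] * n_features
        self.scale = math.sqrt(2.0 / n_features)

    def features(self, x):
        return [self.scale * math.cos(w * x + b)
                for w, b in zip(self.omegas, self.phases)]

    def predict(self, x):
        return sum(t * z for t, z in zip(self.theta, self.features(x)))

    def update(self, x, y, step):
        """One gradient step on the squared error; returns the loss before the step."""
        z = self.features(x)
        err = sum(t * zi for t, zi in zip(self.theta, z)) - y
        self.theta = [t - step * err * zi for t, zi in zip(self.theta, z)]
        return err * err


def run_online(stream, sigmas, n_features=50, step=0.5, eta=1.0, seed=0):
    """Combine one learner per bandwidth with exponentially updated weights."""
    rng = random.Random(seed)
    learners = [RandomFeatureLearner(s, n_features, rng) for s in sigmas]
    weights = [1.0] * len(learners)
    sq_errors = []
    for x, y in stream:
        total = sum(weights)
        pred = sum(w * l.predict(x) for w, l in zip(weights, learners)) / total
        sq_errors.append((pred - y) ** 2)
        losses = [l.update(x, y, step) for l in learners]
        weights = [w * math.exp(-eta * loss) for w, loss in zip(weights, losses)]
        top = max(weights)
        weights = [w / top for w in weights]  # rescale to avoid underflow
    total = sum(weights)
    return sq_errors, [w / total for w in weights]


if __name__ == "__main__":
    # Hypothetical stream: noiseless samples of sin(3x) on [-1, 1].
    rng = random.Random(1)
    xs = [rng.uniform(-1.0, 1.0) for _ in range(1000)]
    stream = [(x, math.sin(3.0 * x)) for x in xs]
    errors, weights = run_online(stream, sigmas=[0.1, 1.0, 10.0])
    print("mean squared error, last 100 samples:", sum(errors[-100:]) / 100)
    print("final kernel weights:", weights)
```

Working in a fixed-size feature space keeps the per-sample cost independent of the number of samples seen so far, which is the scalability argument for random features; the weights indicate which bandwidth the data favor without fixing a kernel in advance.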