
    An exact $\sin\Theta$ formula for matrix perturbation analysis and its applications

    In this paper, we establish a useful set of formulae for the $\sin\Theta$ distance between the original and the perturbed singular subspaces. These formulae explicitly show how the perturbation of the original matrix propagates into singular vectors and singular subspaces, thus providing a direct way of analyzing them. Following this, we derive a collection of new results on SVD perturbation related problems, including a tighter bound on the $\ell_{2,\infty}$ norm of the singular vector perturbation errors under Gaussian noise, a new stability analysis of Principal Component Analysis, and an error bound on the singular value thresholding operator. For the latter two, we consider the most general rectangular matrices with full matrix rank.

    Loghub: A Large Collection of System Log Datasets towards Automated Log Analytics

    Logs have been widely adopted in software system development and maintenance because of the rich system runtime information they contain. In recent years, the increase in software size and complexity has led to rapid growth in the volume of logs. To handle these large volumes of logs efficiently and effectively, a line of research focuses on intelligent log analytics powered by AI (artificial intelligence) techniques. However, only a small fraction of these techniques have reached successful deployment in industry because of the lack of public log datasets and the necessary benchmarking on them. To fill this significant gap between academia and industry and to facilitate more research on AI-powered log analytics, we have collected and organized loghub, a large collection of log datasets. In particular, loghub provides 17 real-world log datasets collected from a wide range of systems, including distributed systems, supercomputers, operating systems, mobile systems, server applications, and standalone software. In this paper, we summarize the statistics of these datasets, introduce some practical log usage scenarios, and present a case study on anomaly detection to demonstrate how loghub facilitates research and practice in this field. At the time of writing, loghub datasets have been downloaded over 15,000 times by more than 380 organizations from both industry and academia. Comment: Dataset available at https://zenodo.org/record/322717