
    Online Spectral Clustering on Network Streams

    Graphs are an extremely useful representation of a wide variety of practical systems in data analysis. Recently, with the fast accumulation of stream data from various types of networks, significant research interest has arisen in spectral clustering for network streams (or evolving networks). Compared with the general spectral clustering problem, data analysis for this new type of problem may have additional requirements, such as short processing time, scalability in distributed computing environments, and temporal variation tracking. However, designing a spectral clustering method that satisfies these requirements takes non-trivial effort. There are three major challenges for the new algorithm design. The first challenge is online clustering computation. Most of the existing spectral methods on evolving networks are off-line methods that use standard eigensystem solvers such as the Lanczos method. They need to recompute solutions from scratch at each time point. The second challenge is the parallelization of algorithms. Parallelizing such algorithms is non-trivial since standard eigensolvers are iterative algorithms and the number of iterations cannot be predetermined. The third challenge is the very limited existing work. In addition, there exist multiple limitations in the existing method, such as computational inefficiency on large similarity changes, the lack of a sound theoretical basis, and the lack of an effective way to handle accumulated approximation errors and large data variations over time. In this thesis, we proposed a new online spectral graph clustering approach with a family of three novel spectrum approximation algorithms. Our algorithms incrementally update the eigenpairs in an online manner to improve the computational performance. Our approaches outperformed the existing method in computational efficiency and scalability while retaining competitive or even better clustering accuracy. 
We derived our spectrum approximation techniques GEPT and EEPT through formal theoretical analysis. The well-established matrix perturbation theory forms a solid theoretical foundation for our online clustering method. We equipped our clustering method with a new metric to track accumulated approximation errors and measure the short-term temporal variation. The metric not only provides a balance between computational efficiency and clustering accuracy, but also offers a useful tool to adapt the online algorithm to unexpected drastic noise. In addition, we discussed our preliminary work on approximate graph mining with evolutionary processes, non-stationary Bayesian Network structure learning from non-stationary time series data, and Bayesian Network structure learning with text priors imposed by non-parametric hierarchical topic modeling.
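The abstract does not spell out GEPT or EEPT, so the following is only a minimal sketch of the textbook first-order perturbation result that incremental eigenpair updates of this kind build on. It is not the thesis's algorithm. The function name `first_order_eigen_update` is hypothetical, and the sketch assumes a symmetric matrix with distinct eigenvalues and a full eigendecomposition.

```python
import numpy as np

def first_order_eigen_update(eigvals, eigvecs, delta):
    """Approximate the eigenpairs of A + delta from those of a symmetric A.

    Textbook first-order perturbation theory (an illustration only):
      lambda_i' ~= lambda_i + x_i^T delta x_i
      x_i'      ~= x_i + sum_{j != i} (x_j^T delta x_i) / (lambda_i - lambda_j) * x_j
    Assumes distinct eigenvalues and orthonormal eigenvectors in the columns
    of eigvecs.
    """
    # Project the perturbation onto the current eigenbasis:
    # M[j, i] = x_j^T delta x_i
    M = eigvecs.T @ delta @ eigvecs

    # First-order eigenvalue correction uses the diagonal of M.
    new_vals = eigvals + np.diag(M)

    # gaps[j, i] = lambda_i - lambda_j; an infinite diagonal drops the j == i term.
    gaps = eigvals[None, :] - eigvals[:, None]
    np.fill_diagonal(gaps, np.inf)
    coeffs = M / gaps

    # Column i of (eigvecs @ coeffs) is the first-order correction to x_i.
    new_vecs = eigvecs + eigvecs @ coeffs
    new_vecs /= np.linalg.norm(new_vecs, axis=0)
    return new_vals, new_vecs
```

Given `vals, vecs = np.linalg.eigh(A)` and a small symmetric change `delta`, calling `first_order_eigen_update(vals, vecs, delta)` avoids a fresh eigendecomposition. The approximation degrades as `delta` grows or as eigenvalue gaps shrink, which is one reason an accumulated-error metric like the one the abstract mentions is useful.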

    Sequential Gaussian Processes for Online Learning of Nonstationary Functions

    Many machine learning problems can be framed in the context of estimating functions, and often these are time-dependent functions that are estimated in real-time as observations arrive. Gaussian processes (GPs) are an attractive choice for modeling real-valued nonlinear functions due to their flexibility and uncertainty quantification. However, the typical GP regression model suffers from several drawbacks: i) Conventional GP inference scales $O(N^{3})$ with respect to the number of observations; ii) updating a GP model sequentially is not trivial; and iii) covariance kernels often enforce stationarity constraints on the function, while GPs with non-stationary covariance kernels are often intractable to use in practice. To overcome these issues, we propose an online sequential Monte Carlo algorithm to fit mixtures of GPs that capture non-stationary behavior while allowing for fast, distributed inference. By formulating hyperparameter optimization as a multi-armed bandit problem, we accelerate mixing for real time inference. Our approach empirically improves performance over state-of-the-art methods for online GP estimation in the context of prediction for simulated non-stationary data and hospital time series data.
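To make the cubic cost in drawback i) concrete, here is a minimal sketch of conventional exact GP regression with a stationary squared-exponential kernel. It is the baseline the abstract criticizes, not the proposed sequential Monte Carlo method. The function names, the kernel choice, and the default hyperparameters are assumptions made for illustration.

```python
import numpy as np

def rbf_kernel(a, b, length_scale=1.0, variance=1.0):
    """Stationary squared-exponential kernel between 1-D input arrays a and b."""
    sq_dist = (a[:, None] - b[None, :]) ** 2
    return variance * np.exp(-0.5 * sq_dist / length_scale**2)

def gp_posterior(x_train, y_train, x_test, noise=1e-2):
    """Exact GP posterior mean and variance of the latent function at x_test.

    `noise` is the observation noise variance (an assumed default).
    """
    n = len(x_train)
    K = rbf_kernel(x_train, x_train) + noise * np.eye(n)

    # Cholesky factorization of the N x N kernel matrix: this is the O(N^3)
    # step, and it has to be redone when the training set changes.
    L = np.linalg.cholesky(K)

    # alpha = K^{-1} y, obtained from two solves against the Cholesky factor.
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y_train))

    K_s = rbf_kernel(x_train, x_test)
    mean = K_s.T @ alpha

    # Posterior variance: k(x*, x*) - k_*^T K^{-1} k_*
    v = np.linalg.solve(L, K_s)
    var = np.diag(rbf_kernel(x_test, x_test)) - np.sum(v**2, axis=0)
    return mean, var
```

Because the kernel depends only on the distance between inputs, this model is stationary, and each new observation changes `K` and forces a new factorization unless a rank-one update is used. These are the issues listed as ii) and iii) above.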

    Algorithmic options for joint time-frequency analysis in structural dynamics applications

    The purpose of this paper is to present recent research efforts by the authors supporting the superiority of joint time-frequency analysis over the traditional Fourier transform in the study of non-stationary signals commonly encountered in the fields of earthquake engineering and structural dynamics. In this respect, three distinct signal processing techniques appropriate for the representation of signals in the time-frequency plane are considered. Namely, the harmonic wavelet transform, the adaptive chirplet decomposition, and the empirical mode decomposition are utilized to analyze certain seismic accelerograms and structural response records. Numerical examples associated with the inelastic dynamic response of a seismically excited 3-story benchmark steel-frame building are included to show how the mean instantaneous frequency, as derived by the aforementioned techniques, can be used as an indicator of global structural damage.

    Spectral analysis of stationary random bivariate signals

    A novel approach towards the spectral analysis of stationary random bivariate signals is proposed. Using the Quaternion Fourier Transform, we introduce a quaternion-valued spectral representation of random bivariate signals seen as complex-valued sequences. This makes possible the definition of a scalar quaternion-valued spectral density for bivariate signals. This spectral density can be meaningfully interpreted in terms of frequency-dependent polarization attributes. A natural decomposition of any random bivariate signal in terms of unpolarized and polarized components is introduced. Nonparametric spectral density estimation is investigated, and we introduce the polarization periodogram of a random bivariate signal. Numerical experiments support our theoretical analysis, illustrating the relevance of the approach on synthetic data. Comment: 11 pages, 3 figures