88 research outputs found

    Identification of Nonstandard Multifractional Brownian Motions under White Noise by Multiscale Local Variations of Its Sample Paths

    The Hurst exponent and variance are two quantities that often characterize real-life, high-frequency observations. Such real-life signals are generally measured in noisy environments. We develop a multiscale statistical method for simultaneously estimating a time-changing Hurst exponent H(t) and a variance parameter C in a multifractional Brownian motion model in the presence of white noise. The method is based on the asymptotic behavior of the local variation of the model's sample paths, applied at coarse scales. This work provides stable, simultaneous estimators of both parameters when independent white noise is present. We also discuss the accuracy of the estimators compared with a few selected methods and the stability of the computations with respect to the adapted wavelet filters.
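    The core of a local-variation estimator can be sketched in a few lines of Python. This is a minimal illustration of the general idea (log-log regression of mean squared increments against scale), not the paper's multiscale method; the dyadic scales and the Brownian-motion test signal are assumptions made for the sketch.

```python
import math
import random

def hurst_from_variations(path, scales):
    """Estimate H from the slope of log mean-squared increment vs. log
    scale: for (multi)fractional Brownian motion the mean squared
    increment at scale s behaves like C * s**(2H)."""
    log_s, log_v = [], []
    for s in scales:
        incs = [path[i + s] - path[i] for i in range(len(path) - s)]
        log_s.append(math.log(s))
        log_v.append(math.log(sum(d * d for d in incs) / len(incs)))
    mean_s = sum(log_s) / len(log_s)
    mean_v = sum(log_v) / len(log_v)
    slope = (sum((a - mean_s) * (b - mean_v) for a, b in zip(log_s, log_v))
             / sum((a - mean_s) ** 2 for a in log_s))
    return slope / 2.0          # slope = 2H

# Sanity check on ordinary Brownian motion, for which H = 0.5.
random.seed(0)
path, level = [], 0.0
for _ in range(20_000):
    level += random.gauss(0.0, 1.0)
    path.append(level)
estimate = hurst_from_variations(path, [1, 2, 4, 8, 16])
```

    On ordinary Brownian motion the estimate lands near the theoretical H = 0.5; a noise-robust estimator, as in the paper, additionally restricts the regression to coarse scales where the signal dominates the white noise.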

    Functional data mining with multiscale statistical procedures

    The Hurst exponent and variance are two quantities that often characterize real-life, high-frequency observations. We develop a method for simultaneously estimating a time-changing Hurst exponent H(t) and a constant scale (variance) parameter C in a multifractional Brownian motion model in the presence of white noise, based on the asymptotic behavior of the local variation of its sample paths. We also discuss the accuracy of the stable, simultaneous estimator compared with a few selected methods and the stability of computations that use adapted wavelet filters. Multifractals have become popular as flexible models for real-life, high-frequency data. We develop a method for testing whether high-frequency data are consistent with monofractality, using meaningful descriptors derived from a wavelet-generated multifractal spectrum. We discuss the theoretical properties of the descriptors, their computational implementation, their use in data mining, and their effectiveness in the context of simulations, an application to turbulence, and the analysis of coding/noncoding regions in DNA sequences. Wavelet thresholding is a simple and effective operation in the wavelet domain that selects a subset of wavelet coefficients from a noised signal. We propose selecting this subset in a semi-supervised fashion, utilizing a neighbor structure and a classification function appropriate for wavelet domains. The decision to include an unlabeled coefficient in the model depends not only on its magnitude but also on the labeled and unlabeled coefficients in its neighborhood. The theoretical properties of the method are discussed, and its performance is demonstrated on simulated examples.
    Ph.D. Committee Chair: Brani Vidakovic; Committee Members: Justin Romberg, Ming Yuan, Paul Kvam, Xiaoming Hu
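    The wavelet-thresholding operation described above can be illustrated with a one-level Haar transform and a plain hard threshold. This is a textbook sketch, not the semi-supervised, neighborhood-aware selection rule proposed in the thesis; the signal and threshold value are arbitrary.

```python
import math

SQRT2 = math.sqrt(2.0)

def haar_forward(x):
    """One level of the Haar wavelet transform (len(x) must be even)."""
    approx = [(x[i] + x[i + 1]) / SQRT2 for i in range(0, len(x), 2)]
    detail = [(x[i] - x[i + 1]) / SQRT2 for i in range(0, len(x), 2)]
    return approx, detail

def haar_inverse(approx, detail):
    """Exact inverse of haar_forward."""
    out = []
    for a, d in zip(approx, detail):
        out.extend([(a + d) / SQRT2, (a - d) / SQRT2])
    return out

def hard_threshold(coeffs, t):
    """Keep only coefficients whose magnitude exceeds t."""
    return [c if abs(c) > t else 0.0 for c in coeffs]

signal = [4.0, 2.0, 5.0, 5.0, 1.0, -1.0, 0.0, 2.0]
approx, detail = haar_forward(signal)
denoised = haar_inverse(approx, hard_threshold(detail, 1.0))
```

    A semi-supervised variant, as in the thesis, would replace the magnitude-only rule in `hard_threshold` with a decision that also inspects labeled and unlabeled coefficients in a neighborhood of each coefficient.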

    Anomaly Detection Using an Ensemble of Multi-Point LSTMs

    As technologies that store time-series data, such as smartwatches and smart factories, become common, we are collectively accumulating a great deal of time-series data. With this accumulation, time-series anomaly detection, which finds abnormal patterns in applications such as cyber-intrusion detection, fraud detection, social-network anomaly detection, and industrial anomaly detection, is growing in importance. In the past, time-series anomaly detection algorithms mainly focused on univariate data. As technology has advanced, however, time-series data have become more complex, and deep-learning-based anomaly detection techniques have been actively developed; currently, most industries rely on deep-learning algorithms to detect time-series anomalies. In this paper, we propose an anomaly detection algorithm built on an ensemble of multi-point LSTMs that can be used across three time-series domains. The model proceeds in three steps. In the first, model-selection step, models are trained within a user-specified range and the most suitable ones are automatically selected. In the second step, the output vectors collected from the M selected LSTMs are combined using stacking ensemble techniques. In the final step, anomalies are detected from the output vector of the second step. We conducted experiments comparing the proposed model with other state-of-the-art deep-learning models for time-series anomaly detection on three real-world datasets. Our method shows excellent accuracy, efficient execution time, and a good F1 score on all three datasets, although training the LSTM ensemble naturally requires more time.
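    The three-step pipeline (select models, stack their outputs, threshold the stacked score) can be sketched with cheap moving-average forecasters standing in for the LSTMs. Everything here, the sine test signal, the candidate windows, and the 1.5x threshold rule, is an illustrative assumption rather than the paper's configuration.

```python
import math

def ma_forecast(history, w):
    """One-step forecast from a moving average of the last w points
    (a lightweight stand-in for one LSTM in the ensemble)."""
    return sum(history[-w:]) / w

series = [math.sin(0.3 * i) for i in range(200)]
series[150] += 5.0                       # injected anomaly

candidate_windows = [2, 3, 5, 8]
calib = range(20, 140)                   # anomaly-free calibration span

# Step 1: model selection -- keep the M models with the lowest error.
def mse(w):
    errs = [(series[t] - ma_forecast(series[:t], w)) ** 2 for t in calib]
    return sum(errs) / len(errs)

selected = sorted(candidate_windows, key=mse)[:2]

# Step 2: stack the selected models' outputs into a single score.
def score(t):
    return sum(abs(series[t] - ma_forecast(series[:t], w))
               for w in selected) / len(selected)

# Step 3: flag points whose stacked score exceeds a calibrated threshold.
threshold = 1.5 * max(score(t) for t in calib)
anomalies = [t for t in range(20, len(series)) if score(t) > threshold]
```

    With the injected spike at index 150, the stacked score jumps well above the calibrated threshold there, and only points at or after the spike are flagged.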

    Informative Language Encoding by Variational Autoencoders Using Transformer

    In natural language processing (NLP), the Transformer is widely used and has reached the state of the art in numerous NLP tasks such as language modeling, summarization, and classification. The variational autoencoder (VAE), meanwhile, is an efficient generative model for representation learning that combines deep learning with statistical inference over encoded representations. However, using a VAE in natural language processing often brings practical difficulties such as posterior collapse, also known as Kullback–Leibler (KL) vanishing. To mitigate this problem, while taking advantage of parallel processing of language data, we propose a new language representation model that integrates two seemingly different deep learning models: a Transformer directly coupled with a variational autoencoder. We compare the proposed model with previous work, such as a VAE connected to a recurrent neural network (RNN). Our experiments with four real-life datasets show that implementation with KL annealing mitigates posterior collapse. The results also show that the proposed Transformer model outperforms RNN-based models in reconstruction and representation learning, and that its encoded representations are more informative than those of the other tested models.
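    The KL-annealing fix mentioned in the abstract amounts to scaling the KL term by a weight that ramps up during training. A linear schedule is one common choice; the schedule shape and warmup length below are assumptions, not necessarily those used in the paper.

```python
def kl_weight(step, warmup_steps):
    """Linear KL annealing: ramp the KL term's weight from 0 to 1
    over the first warmup_steps optimization steps."""
    return min(1.0, step / warmup_steps)

def vae_loss(reconstruction_loss, kl_divergence, step, warmup_steps=10_000):
    # Early in training the KL term is down-weighted, so the decoder
    # cannot satisfy the objective by simply ignoring the latent code
    # (the posterior-collapse failure mode mentioned above).
    return reconstruction_loss + kl_weight(step, warmup_steps) * kl_divergence
```

    Cyclical or sigmoid schedules are common alternatives; the essential property is only that the KL penalty starts near zero and grows as the encoder learns to use the latent space.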

    Selection of Support Vector Candidates Using Relative Support Distance for Sustainability in Large-Scale Support Vector Machines

    Support vector machines (SVMs) are well-known classifiers due to their superior classification performance. An SVM is defined by a hyperplane that separates two classes with the largest margin. Computing the hyperplane, however, requires solving a quadratic programming problem whose storage cost grows with the square of the number of training points and whose time complexity is, in general, proportional to its cube. It is therefore worth studying how to reduce the training time of SVMs without compromising performance, to prepare for sustainability in large-scale SVM problems. In this paper, we propose a novel data reduction method that shortens training time by combining decision trees with a relative support distance. We apply this new concept, the relative support distance, to select good support vector candidates in each partition generated by the decision trees. The selected candidates improve training speed on large-scale SVM problems. In experiments, we demonstrate that our approach significantly reduces training time while maintaining good classification performance compared with existing approaches.
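    The selection step can be sketched as follows, using distance to the opposite class's centroid as a crude stand-in for the paper's relative support distance; in the proposed method this selection would run inside each decision-tree partition rather than globally. All data and parameters here are illustrative.

```python
import math
import random

def centroid(points):
    """Component-wise mean of a list of points."""
    return tuple(sum(c) / len(points) for c in zip(*points))

def select_candidates(class_a, class_b, k):
    """Keep, from each class, the k points nearest the other class's
    centroid -- a crude proxy for points likely to become support
    vectors, since they sit closest to the class boundary."""
    ca, cb = centroid(class_a), centroid(class_b)
    keep_a = sorted(class_a, key=lambda p: math.dist(p, cb))[:k]
    keep_b = sorted(class_b, key=lambda p: math.dist(p, ca))[:k]
    return keep_a, keep_b

random.seed(1)
class_a = [(random.gauss(0, 1), random.gauss(0, 1)) for _ in range(200)]
class_b = [(random.gauss(3, 1), random.gauss(3, 1)) for _ in range(200)]
keep_a, keep_b = select_candidates(class_a, class_b, 20)
# An SVM trained on keep_a + keep_b sees 40 points instead of 400,
# shrinking the quadratic program accordingly.
```

    Because the quadratic program scales with the square (storage) and cube (time) of the sample count, a 10x reduction in candidates cuts memory by roughly 100x and training time by roughly 1000x in the worst case.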

    Air Pollution Prediction Using an Ensemble of Dynamic Transfer Models for Multivariate Time Series

    As we enter a new era of big data, analyzing large amounts of real-time data has become important; air quality data, for example, are streaming time series measured by several different sensors. Numerous time-series forecasting methods and deep-learning approaches based on neural networks have been applied to such data. However, they usually rely on a particular model under a stationarity condition, and there are few studies of real-time prediction for dynamic, massive multivariate data. Using the variety of independent variables included in the data is important for improving forecasting performance. In this paper, we propose a real-time prediction approach based on an ensemble method for multivariate time-series data. The suggested method can select multivariate time-series variables and incorporate autoregressive models that are updatable in real time. We verified the proposed model on simulated data and applied it to predicting air quality measured by five sensors and to predicting failures from real-time performance logs in server systems. The proposed method showed effective and stable performance on both short- and long-term air pollution prediction tasks. In addition, whereas traditional abnormality detection methods classify the present status of objects as either normal or abnormal from the provided data, the proposed method proactively predicts the expected statuses of objects from real-time data, enabling effective system management in cloud environments.
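    The "real-time updatable autoregressive" ingredient can be sketched as an ensemble of AR(1) models refit on sliding windows of different lengths; the window sizes and the test series are assumptions for illustration, not the paper's design.

```python
def ar1_coefficient(window):
    """Least-squares AR(1) coefficient fit on a window of observations."""
    num = sum(window[i] * window[i - 1] for i in range(1, len(window)))
    den = sum(window[i - 1] ** 2 for i in range(1, len(window)))
    return num / den

def ensemble_forecast(series, window_sizes=(10, 20, 30)):
    """Average one-step forecasts from AR(1) models refit on sliding
    windows of several lengths; refitting every step keeps each model
    updatable as new observations stream in."""
    forecasts = [ar1_coefficient(series[-w:]) * series[-1]
                 for w in window_sizes]
    return sum(forecasts) / len(forecasts)

# On an exactly autoregressive series x[t] = 0.8 * x[t-1], every window
# recovers the coefficient and the forecast is (numerically) exact.
series = [1.0]
for _ in range(59):
    series.append(series[-1] * 0.8)
prediction = ensemble_forecast(series)
```

    A full multivariate version would add exogenous regressors per sensor and a variable-selection step; the ensemble-of-windows idea is what lets the forecaster track nonstationary dynamics without a single fixed model.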

    Finding Effective Item Assignment Plans with Weighted Item Associations Using A Hybrid Genetic Algorithm

    By identifying useful relationships in massive datasets, association rule mining can provide new insights to decision-makers. Item assignment models based on associations between items are used to place items in retail or e-commerce environments to increase sales. However, existing models fail to combine these associations with item-specific information such as profit and purchasing frequency. To find effective assignments that use item-specific information, we propose a new hybrid genetic algorithm that incorporates a robust tabu search with a novel rectangular partially matched crossover, focusing on rectangular layouts. Interestingly, we show that our item assignment model is equivalent to the quadratic assignment problem, a well-known NP-hard problem. We demonstrate the effectiveness of the proposed algorithm on benchmark instances from QAPLIB and on synthetic databases that represent real-life retail situations, comparing it with other existing algorithms. We also show that the proposed crossover operator outperforms several existing ones in both fitness values and search times. The experimental results show that not only does the proposed item assignment model generate a more profitable assignment plan than the other tested models based on association alone, but it also obtains better solutions than the other tested algorithms.
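    The classical partially matched crossover (PMX) that the proposed rectangular variant generalizes can be sketched for one-dimensional permutations; the cut points and parents below are arbitrary, and the rectangular extension itself is not shown.

```python
def pmx(parent1, parent2, cut1, cut2):
    """Classical partially matched crossover: copy a segment from
    parent1, place parent2's conflicting genes via the mapping defined
    by the two segments, then fill remaining slots from parent2.
    The result is always a valid permutation."""
    n = len(parent1)
    child = [None] * n
    child[cut1:cut2] = parent1[cut1:cut2]
    for i in range(cut1, cut2):
        gene = parent2[i]
        if gene in child[cut1:cut2]:
            continue                      # already placed by the segment
        pos = i
        while cut1 <= pos < cut2:         # follow the mapping out of the cut
            pos = parent2.index(parent1[pos])
        child[pos] = gene
    for i in range(n):                    # fill what is left from parent2
        if child[i] is None:
            child[i] = parent2[i]
    return child

p1 = [1, 2, 3, 4, 5, 6, 7, 8]
p2 = [3, 7, 5, 1, 6, 8, 2, 4]
offspring = pmx(p1, p2, 3, 6)             # -> [3, 7, 8, 4, 5, 6, 2, 1]
```

    Preserving permutation validity is what makes PMX suitable for quadratic-assignment-style encodings, where each chromosome is an assignment of items to locations; a rectangular variant would cut a 2D block rather than a 1D segment.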