292 research outputs found

    Sparse modelling and estimation for nonstationary time series and high-dimensional data

    Sparse modelling has attracted great attention as an efficient way of handling statistical problems in high dimensions. This thesis considers sparse modelling and estimation in a selection of problems such as breakpoint detection in nonstationary time series, nonparametric regression using piecewise constant functions, and variable selection in high-dimensional linear regression. We first propose a method for detecting breakpoints in the second-order structure of piecewise stationary time series, assuming that those structural breakpoints are sufficiently scattered over time. Our choice of time series model is the locally stationary wavelet process (Nason et al., 2000), under which the entire second-order structure of a time series is described by wavelet-based local periodogram sequences. As the initial stage of breakpoint detection, we apply a binary segmentation procedure to wavelet periodogram sequences at each scale separately, which is followed by within-scale and across-scales post-processing steps. We show that the combined methodology achieves consistent estimation of the breakpoints in terms of their total number and locations, and investigate its practical performance using both simulated and real data. Next, we study the problem of nonparametric regression by means of piecewise constant functions, which are known to be flexible in approximating a wide range of function spaces. Among many approaches developed for this purpose, we focus on comparing two well-performing techniques, the taut string (Davies & Kovac, 2001) and the Unbalanced Haar (Fryzlewicz, 2007) methods. While the multiscale nature of the latter is easily observed, it is not so obvious that the former can also be interpreted as multiscale. We provide a unified, multiscale representation for both methods, which offers an insight into the relationship between them as well as suggesting some lessons that both methods can learn from each other. Lastly, we consider one of the most widely studied applications of sparse modelling and estimation: variable selection in high-dimensional linear regression. High dimensionality of the data brings in many complications, including (possibly spurious) non-negligible correlations among the variables, which may result in marginal correlation being unreliable as a measure of association between the variables and the response. We propose a new way of measuring the contribution of each variable to the response, which adaptively takes into account high correlations among the variables. A key ingredient of the proposed tilting procedure is hard-thresholding the sample correlations of the design matrix, which enables a data-driven switch between the use of marginal correlation and tilted correlation for each variable. We study the conditions under which this measure can discriminate between relevant and irrelevant variables, and thus be used as a tool for variable selection. In order to exploit these theoretical properties of tilted correlation, we construct an iterative variable screening algorithm and examine its practical performance in a comparative simulation study.
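
    The hard-thresholding switch described above can be illustrated with a short sketch. The snippet below is not the thesis's tilting procedure; it is a minimal stand-in written for this summary (the function name and the threshold `tau` are arbitrary) that uses marginal correlation for a variable whose sample correlations with the other columns all fall below the threshold, and otherwise correlates the variable with the response after projecting out the highly correlated columns.

```python
import numpy as np

def screen_by_tilted_correlation(X, y, tau=0.3):
    """Toy sketch of correlation screening with a hard-thresholding switch.

    For column j, if every off-diagonal sample correlation involving j stays
    below tau, the plain marginal correlation with y is used; otherwise y is
    first regressed on the highly correlated columns and the correlation is
    taken with the residual (a simple stand-in for tilting).
    """
    n, p = X.shape
    Xs = (X - X.mean(axis=0)) / X.std(axis=0)   # standardise the columns
    ys = (y - y.mean()) / y.std()
    C = (Xs.T @ Xs) / n                         # sample correlation matrix
    scores = np.empty(p)
    for j in range(p):
        heavy = np.where((np.abs(C[j]) > tau) & (np.arange(p) != j))[0]
        if heavy.size == 0:
            scores[j] = abs(Xs[:, j] @ ys) / n          # marginal correlation
        else:
            Z = Xs[:, heavy]
            coef = np.linalg.lstsq(Z, ys, rcond=None)[0]
            resid = ys - Z @ coef                       # remove confounding columns
            scores[j] = abs(Xs[:, j] @ resid) / n
    return scores
```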

    Studies of several curious probabilistic phenomena: unobservable tail exponents in random difference equations, and confusion between models of long-range dependence and changes in regime

    The dissertation is centered on two research topics. The first topic concerns reduction of bias in estimation of tail exponents in random difference equations (RDEs). The bias is due to deviations from the exact power-law tail, which are quantified by proving a weaker form of the so-called second-order regular variation of distribution tails of RDEs. In particular, the latter suggests that the distribution tails of RDEs have an explicitly known second-order power-law term. By taking this second-order term into account, a number of successful bias-reduced tail exponent estimators are proposed and examined. The second topic concerns the confusion between long-range dependent (LRD) time series and several nonstationary alternatives, such as changes in local mean level superimposed on short-range dependent series. Exploratory and informal tools based on the so-called unbalanced Haar transformation are first suggested and examined to assess the adequacy of LRD models in capturing changes in local mean in real time series. Second, formal statistical procedures are proposed to distinguish between LRD and alternative models, based on estimation of the LRD parameter in time series after removing changes in local mean level. Basic asymptotic properties of the tests are studied and applications to several real time series are also discussed.
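
    For context, the classical Hill estimator below is the standard starting point for estimating a power-law tail exponent from the k largest observations; the bias-reduced estimators studied in the dissertation refine estimators of this kind by accounting for the second-order term in the tail. The Pareto sample and the choice of k are purely illustrative.

```python
import numpy as np

def hill_estimator(x, k):
    """Classical Hill estimator of the tail index alpha from the k largest
    observations: 1/alpha_hat is the mean log-excess over the (k+1)-th order
    statistic."""
    x = np.sort(np.asarray(x, dtype=float))[::-1]      # descending order
    log_excess = np.log(x[:k]) - np.log(x[k])          # log(X_(i) / X_(k+1))
    return 1.0 / log_excess.mean()

# illustration on Pareto samples with true tail exponent 2
rng = np.random.default_rng(0)
sample = rng.pareto(2.0, size=100_000) + 1.0           # standard Pareto(alpha=2)
print(hill_estimator(sample, k=500))                   # close to 2
```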

    Detecting linear trend changes in data sequences

    We propose TrendSegment, a methodology for detecting multiple change-points corresponding to linear trend changes in one-dimensional data. A core ingredient of TrendSegment is a new Tail-Greedy Unbalanced Wavelet transform: a conditionally orthonormal, bottom-up transformation of the data through an adaptively constructed unbalanced wavelet basis, which results in a sparse representation of the data. Due to its bottom-up nature, this multiscale decomposition focuses on local features in its early stages and on global features later on, which enables the detection of both long and short linear trend segments at once. To reduce the computational complexity, the proposed method merges multiple regions in a single pass over the data. We show the consistency of the estimated number and locations of change-points. The practicality of our approach is demonstrated through simulations and two real data examples, involving Iceland temperature data and sea ice extent of the Arctic and the Antarctic. Our methodology is implemented in the R package trendsegmentR, available from CRAN.
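
    As a point of reference for what a "linear trend change" means here, the sketch below is not the TrendSegment algorithm (which works bottom-up through an unbalanced wavelet basis); it is a simple baseline that finds a single change-point by fitting separate straight lines to the left and right parts of the series and choosing the split minimising the total residual sum of squares. The function name and the minimum segment length are assumptions made for the example.

```python
import numpy as np

def single_trend_change(y, min_len=3):
    """Baseline single change-point finder for a change in linear trend."""
    y = np.asarray(y, dtype=float)
    n = y.size
    t = np.arange(n, dtype=float)

    def rss(a, b):
        # residual sum of squares of a straight line fitted to y[a:b]
        coef = np.polyfit(t[a:b], y[a:b], deg=1)
        return float(np.sum((y[a:b] - np.polyval(coef, t[a:b])) ** 2))

    # try every admissible split and keep the one with the smallest total RSS
    return min(range(min_len, n - min_len), key=lambda k: rss(0, k) + rss(k, n))

# toy example: the slope switches from +1 to -1 at index 50
y = np.concatenate([np.arange(50.0), 50.0 - np.arange(50.0)])
print(single_trend_change(y))   # 50
```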

    Condition Monitoring of a Belt-Based Transmission System for Comau Racer3 Robots

    This project has been developed in collaboration with Comau Robotics S.p.A. and its main goal is the development in China of a Health Monitoring Process using vibration analysis. The project is connected to the Cost Reduction activity carried out by the PD Cost Engineering Department in China, and is divided into two parts: 1. Data Acquisition; 2. Data Analysis. An automatic acquisition of the moni.log file is carried out and is discussed in Chapter 1. As far as the Data Analysis is concerned, a data-driven approach is considered and developed in the frequency domain through the FFT and in the time domain using the wavelet transform. In Chapter 2, the techniques used nowadays for signal analysis and vibration monitoring are reviewed in the time domain, frequency domain and time-frequency domain. In Chapter 3, the state of the art of condition monitoring of all the possible machinery parts is presented, based on the evaluation of the spectrum of the current and the speed. In Chapter 4, disturbances that are not related to a fault but belong to the normal behaviour of the system, acting on the measured forces, are evaluated; Motor Torque Ripple and Output Noise Resolution are disturbances dependent on velocity and are discussed in comparison to those related to the configuration of the robot. In Chapter 5, a particular case study is assigned: the noise problem due to the belt-based power transmission system of axis three of a Racer3 robot in an endurance test. The chapter presents the test plan, including all the simulations. In Chapter 6, all the results are shown, demonstrating how the vibration analysis carried out with an external sensor can be confirmed by looking at the spectral content of the speed and the current. In the last chapter, the final conclusions and a possible development of this thesis are presented, considering both a Model of Signal and a Model Based approach.
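
    The frequency-domain side of such a data-driven analysis typically starts from an amplitude spectrum of the measured vibration signal. The snippet below is a generic illustration of that step, not code from the thesis; the sampling rate and the toy 50 Hz component are assumptions made for the example.

```python
import numpy as np

def amplitude_spectrum(signal, fs):
    """One-sided amplitude spectrum of a signal sampled at fs Hz."""
    n = len(signal)
    spec = 2.0 * np.abs(np.fft.rfft(signal)) / n   # fold negative frequencies
    freqs = np.fft.rfftfreq(n, d=1.0 / fs)
    return freqs, spec

# toy example: a 50 Hz vibration component buried in noise, sampled at 1 kHz
fs = 1000.0
t = np.arange(0, 1, 1 / fs)
x = np.sin(2 * np.pi * 50 * t) + 0.5 * np.random.randn(t.size)
freqs, amp = amplitude_spectrum(x, fs)
print(freqs[np.argmax(amp[1:]) + 1])               # dominant frequency, ~50 Hz
```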

    An Introduction to Applications of Wavelet Benchmarking with Seasonal Adjustment

    Before adjustment, low and high frequency data sets from national accounts are frequently inconsistent. Benchmarking is the procedure used by economic agencies to make such data sets consistent. It typically involves adjusting the high frequency time series (e.g. quarterly data) so that they become consistent with the lower frequency version (e.g. annual data). Various methods have been developed to approach this problem of inconsistency between data sets. The paper introduces a new statistical procedure, namely wavelet benchmarking. Wavelet properties allow high and low frequency processes to be jointly analysed, and we show that benchmarking can be formulated and approached succinctly in the wavelet domain. Furthermore, the time and frequency localization properties of wavelets are ideal for handling more complicated benchmarking problems. The versatility of the procedure is demonstrated by using simulation studies where we provide evidence showing that it substantially outperforms currently used methods. Finally, we apply this novel method of wavelet benchmarking to official data from the UK's Office for National Statistics. Funded by the Engineering and Physical Sciences Research Council. This is the final version of the article; it first appeared from Wiley via https://doi.org/10.1111/rssa.1224
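
    For readers unfamiliar with benchmarking, the simplest possible baseline is the pro-rata adjustment sketched below: each year's quarterly values are rescaled so that they sum to the annual figure. This is not the paper's wavelet method (which performs the adjustment jointly in the wavelet domain), and the numbers are invented purely for illustration.

```python
import numpy as np

def prorate_benchmark(quarterly, annual):
    """Naive pro-rata benchmarking: rescale each year's four quarterly values
    so that they sum to the corresponding annual benchmark."""
    q = np.asarray(quarterly, dtype=float).reshape(len(annual), 4)
    factors = np.asarray(annual, dtype=float) / q.sum(axis=1)
    return (q * factors[:, None]).ravel()

# toy example: two years of quarterly data benchmarked to annual totals
print(prorate_benchmark([10, 12, 11, 9, 13, 14, 12, 11], [44, 55]))
```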

    Tail-greedy bottom-up data decompositions and fast multiple change-point detection

    This article proposes a ‘tail-greedy’, bottom-up transform for one-dimensional data, which results in a nonlinear but conditionally orthonormal, multiscale decomposition of the data with respect to an adaptively chosen Unbalanced Haar wavelet basis. The ‘tail-greediness’ of the decomposition algorithm, whereby multiple greedy steps are taken in a single pass through the data, both enables fast computation and makes the algorithm applicable in the problem of consistent estimation of the number and locations of multiple change-points in data. The resulting agglomerative change-point detection method avoids the disadvantages of the classical divisive binary segmentation, and offers very good practical performance. It is implemented in the R package breakfast, available from CRAN.
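
    To make the bottom-up idea concrete, the sketch below is a simplified, one-merge-per-pass variant written for this summary, not the breakfast implementation and not tail-greedy: it repeatedly merges the pair of neighbouring segments whose Unbalanced Haar detail coefficient is smallest in absolute value, and stops once every remaining coefficient exceeds a universal-style threshold; the surviving segment boundaries are the estimated change-points.

```python
import numpy as np

def bottom_up_changepoints(x, thresh=None):
    """Greedy bottom-up change-point sketch using Unbalanced Haar contrasts."""
    x = np.asarray(x, dtype=float)
    n = x.size
    if thresh is None:
        sigma = np.median(np.abs(np.diff(x))) / (np.sqrt(2.0) * 0.6745)
        thresh = sigma * np.sqrt(2.0 * np.log(n))      # universal-style threshold
    segs = [[i, 1, x[i]] for i in range(n)]            # each segment: [start, length, sum]

    def detail(a, b):
        # Unbalanced Haar contrast between the means of two adjacent segments
        n1, n2 = a[1], b[1]
        return np.sqrt(n1 * n2 / (n1 + n2)) * (a[2] / n1 - b[2] / n2)

    while len(segs) > 1:
        d = [abs(detail(segs[i], segs[i + 1])) for i in range(len(segs) - 1)]
        i = int(np.argmin(d))
        if d[i] > thresh:
            break                                      # every remaining boundary is significant
        left, right = segs[i], segs[i + 1]
        segs[i:i + 2] = [[left[0], left[1] + right[1], left[2] + right[2]]]
    return [s[0] for s in segs[1:]]                    # estimated change-point locations

# toy example: mean shifts from 0 to 5 at index 100
x = np.concatenate([np.zeros(100), 5 * np.ones(100)]) + np.random.randn(200)
print(bottom_up_changepoints(x))                       # typically [100]
```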

    Application of Wavelet Analysis in Power Systems


    Multiscale copy number alteration analysis using wavelets

    The need for multiscale modelling comes from the fact that it is rare for measured data to contain contributions at a single scale. For example, a typical signal from an experimental process may contain contributions from a variety of sources, such as noise and faults. These features usually occur with different localisation and at different locations in time and frequency. The same is true of copy number data from DNA sequencing. Identifying Copy Number Alteration (CNA) from a sample cell faces difficulties due to errors, different sizes of reads being recorded, infiltration from normal cells, and different sizes of test and normal genomes. Thus, a multiscale representation of the measurements offers efficient feature extraction or noise removal from a typical process signal. One of the powerful tools used to extract the multiscale characteristics of observed data is wavelets. Wavelets are mathematical expansions that are able to transform data from the time domain into different frequency levels. In this thesis, wavelets are used, first, to segment the CNA data into regions of equal copy number and, secondly, to extract useful information from the original data for a better prediction of tumour subtypes. For the first purpose, an approach called the TGUHm method is presented, which applies the tail-greedy unbalanced Haar (TGUH) wavelet transform to perform segmentation of CNA data. The 'unbalanced' characteristic of the TGUH approach gives the advantage that the data length does not have to be a power of two, as in the traditional discrete Haar wavelet method. An additional benefit is that it can address a problem that commonly arises in Haar wavelet estimation, where the estimator is more likely to detect jumps at dyadic locations which might not be the actual locations of the jumps/drops in the true underlying CNA pattern. The TGUHm method is then applied to the existing data-driven wavelet-Fisz methodology to deal with the heteroscedastic noise problem that we often find in CNA data. In practice, real CNA data deviate from the homoscedastic noise assumption and indicate some dependence of the variance on the mean value. The proposed method performs variance stabilisation to bring the problem into a homoscedastic model before applying a denoising procedure. The use of the unbalanced Haar wavelet also makes it possible to estimate short segments better than balanced Haar wavelet-based segmentation methods. Moreover, our simulation study indicates that the proposed methodology has substantial advantages in estimating both short and long altered segments in copy number data with heteroscedastic error variance. For the second purpose, a wavelet-based classification framework is proposed which employs the non-decimated Haar wavelet transform to extract localised differences and means of the original data at several scales. The wavelet transformation decomposes the original data into detail (localised difference) and scaling (localised mean) coefficients at different resolution levels. This makes it possible to discover hidden features or information which are difficult to find from the original data only. Each resolution level corresponds to a different length of wavelet basis and, by considering which levels are most useful in a model, the length of the region that is responsible for the prediction can be identified.
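
    The "localised differences and means at several scales" used as classification features can be sketched directly. The snippet below is a generic non-decimated Haar illustration written for this summary (the function name, number of levels and normalisation are assumptions, not the thesis's exact construction): at each scale it convolves the series with a Haar difference filter and an averaging filter of doubling length, without downsampling.

```python
import numpy as np

def nondecimated_haar_features(x, levels=3):
    """Per-scale localised differences (detail) and means (scaling) of x."""
    x = np.asarray(x, dtype=float)
    details, scalings = [], []
    for j in range(1, levels + 1):
        h = 2 ** (j - 1)                               # half-window length at scale j
        diff_filter = np.concatenate([np.ones(h), -np.ones(h)]) / np.sqrt(2.0 * h)
        mean_filter = np.ones(2 * h) / (2.0 * h)
        details.append(np.convolve(x, diff_filter, mode='valid'))
        scalings.append(np.convolve(x, mean_filter, mode='valid'))
    return details, scalings

# toy copy-number-like profile with one elevated segment
x = np.concatenate([np.ones(30), 3 * np.ones(20), np.ones(30)]) + 0.2 * np.random.randn(80)
d, s = nondecimated_haar_features(x, levels=3)
print([len(v) for v in d])                             # one coefficient per position, minus edges
```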

    Methods for change-point detection with additional interpretability

    The main purpose of this dissertation is to introduce and critically assess some novel statistical methods for change-point detection that help better understand the nature of processes underlying observable time series. First, we advocate the use of change-point detection for local trend estimation in financial return data and propose a new approach developed to capture the oscillatory behaviour of financial returns around piecewise-constant trend functions. At the core of the method is a data-adaptive, hierarchically-ordered basis of Unbalanced Haar vectors which decomposes the piecewise-constant trend underlying observed daily returns into a binary-tree structure of one-step constant functions. We illustrate how this framework can provide a new perspective for the interpretation of change-points in financial returns. Moreover, the approach yields a family of forecasting operators for financial return series which can be adjusted flexibly depending on the forecast horizon or the loss function. Second, we discuss change-point detection under model misspecification, focusing in particular on normally distributed data with changing mean and variance. We argue that ignoring the presence of changes in mean or variance when testing for changes in, respectively, variance or mean, can affect the application of statistical methods negatively. After illustrating the difficulties arising from this kind of model misspecification, we propose a new method to address them using sequential testing on intervals with varying length, and show in a simulation study how this approach compares to competitors in mixed-change situations. The third contribution of this thesis is a data-adaptive procedure to evaluate EEG data, which can improve the understanding of an epileptic seizure recording. This change-point detection method characterizes the evolution of frequency-specific energy as measured on the human scalp. It provides new insights into this high-dimensional, high-frequency data and has attractive computational and scalability features. In addition to contrasting our method with existing approaches, we analyse and interpret the method’s output in the application to a seizure data set.
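
    As background for the misspecification discussion, the standard CUSUM statistic for a single change in mean is sketched below; it is exactly the kind of statistic whose behaviour degrades when the variance also changes, which motivates the interval-based sequential tests proposed in the thesis. The code is a generic textbook version, not the thesis's procedure, and the simulated data are invented for the example.

```python
import numpy as np

def cusum_mean_change(x):
    """Return the most likely change location and the maximum CUSUM statistic
    C_k = |S_k - (k/n) * S_n| / sqrt(n) for a single change in mean."""
    x = np.asarray(x, dtype=float)
    n = x.size
    s = np.cumsum(x)
    k = np.arange(1, n)
    stat = np.abs(s[:-1] - (k / n) * s[-1]) / np.sqrt(n)
    return int(np.argmax(stat)) + 1, float(stat.max())

# toy example: the mean jumps from 0 to 1 at index 150, variance held constant
x = np.concatenate([np.random.randn(150), 1 + np.random.randn(100)])
print(cusum_mean_change(x))
```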

    Topics on Multiresolution Signal Processing and Bayesian Modeling with Applications in Bioinformatics

    Analysis of multi-resolution signals and time-series data has wide applications in biology, medicine, engineering, etc. In many cases, the large-scale (low-frequency) features of a signal, including basic descriptive statistics, trends and smoothed functional estimates, do not carry useful information about the phenomenon of interest. On the other hand, the study of small-scale (high-frequency) features that look like noise may be more informative, even though extracting such informative features is not always straightforward. In this dissertation we try to address some of the issues pertaining to high-frequency feature extraction and denoising of noisy signals. Another topic studied in this dissertation is the integration of genome data with transatlantic voyage data of enslaved people from Africa to determine the ancestral origin of Afro-Americans.
    Chapter 2. Assessment of Scaling by Auto-Correlation Shells. In this chapter, we utilize the Auto-Correlation (AC) Shell to propose a feature extraction method that can effectively capture small-scale information of a signal. The AC Shell is a redundant, shift-invariant and symmetric representation of the signal that is obtained by using the Auto-Correlation function of compactly supported wavelets. The small-scale features are extracted by computing the energy of AC Shell coefficients at different levels of decomposition, as well as the slope of the line fitted to these energy values using AC Shell spectra. We discuss the theoretical properties and verify them using extensive simulations. We compare the extracted features from AC Shell with those of wavelets in terms of bias, variance, and mean square error (MSE). The results indicate that the AC Shell features tend to have smaller variance, and hence are more reliable. Moreover, to show its effectiveness, we validate our feature extraction method in the context of classification to identify patients with ovarian cancer through the analysis of their blood mass spectrum. For this study, we use the features extracted by AC Shell spectra along with a support vector machine classifier to distinguish control from cancer cases.
    Chapter 3. Bayesian Binary Regressions in Wavelet-based Function Estimation. Wavelet shrinkage has been widely used in nonparametric statistics and signal processing for a variety of purposes including denoising noisy signals and images, dimension reduction, and variable/feature selection. Although the traditional wavelet shrinkage methods are effective and popular, they have one major drawback: the shrinkage process relies only on the information of the coefficient being thresholded, and the information contained in the neighboring coefficients is ignored. Similarly, the standard AC Shell denoising methods shrink the empirical coefficients independently, by comparing their magnitudes with a threshold value. The information of other coefficients has no influence on the behavior of a particular coefficient. However, due to the redundant representation of signals and coefficients obtained by AC Shells, the dependency of neighboring coefficients and the amount of shared information between them increases. Therefore, it is desirable to have a thresholding approach for AC Shell coefficients that considers the information of neighboring coefficients. In this chapter, we develop a new Bayesian denoising approach for AC Shell coefficients that integrates logistic regression, universal thresholding and Bayesian inference. We validate the proposed method using extensive simulations with various types of smooth and non-smooth signals. The results indicate that, for all signal types, including the neighboring coefficients improves the denoising process, resulting in lower MSEs. Moreover, we applied our proposed methodology to a case study of denoising Atomic Force Microscopy (AFM) signals measuring the adhesion strength between two materials at the nano-newton scale, to correctly identify the cantilever detachment point.
    Chapter 4. Bayesian Method in Combining Genetic and Historical Records of Transatlantic Slave Trade in the Americas. In the era between 1515 and 1865, more than 12 million people were enslaved and forced to move from Africa to North and Latin America. The shipping documents recorded the origin and disembarkation of enslaved people. Traditionally, genealogy studies have been done via the exploration of historical records, family trees and birth certificates. Due to recent advancements in the field of genetics, genealogy has been revolutionized and has become more accurate. Although these methods can provide continental differentiation, they have poor spatial resolution, which makes it hard to localize ancestry assignment, as these markers are distributed across different sub-continental regions. To overcome the foregoing drawbacks, in this chapter we propose a hybrid approach that combines the genetic marker results with the historical records of the transatlantic voyages of enslaved people. The addition of the journey data can provide substantially increased resolution in ancestry assignment, using a Bayesian modeling framework. The proposed Bayesian framework uses the voyage data from historical records available in the transatlantic slave trade database as prior probabilities and combines them with genetic markers of Afro-Americans, treated as the likelihood information, to estimate the posterior (updated) probabilities of their ancestry assignments to geographical regions in Africa. We apply the proposed methodology to 60 Afro-American individuals and show that the prior information has increased the assignment probabilities obtained by the posterior distributions for some of the regions.
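
    The prior-times-likelihood combination described in Chapter 4 reduces to a direct application of Bayes' rule over candidate regions. The sketch below is a minimal illustration of that update; the three regions and all probability values are invented for the example and do not come from the slave trade database or the genetic data.

```python
import numpy as np

def posterior_assignment(prior, likelihood):
    """Bayes-rule update: region priors (e.g. from voyage records) multiplied
    by genetic-marker likelihoods, renormalised to posterior probabilities."""
    prior = np.asarray(prior, dtype=float)
    likelihood = np.asarray(likelihood, dtype=float)
    unnorm = prior * likelihood
    return unnorm / unnorm.sum()

# hypothetical three-region example
print(posterior_assignment(prior=[0.5, 0.3, 0.2], likelihood=[0.2, 0.5, 0.3]))
```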