    Using robust FPCA to identify outliers in functional time series, with applications to the electricity market

    This study proposes two methods for detecting outliers in functional time series. Both methods take dependence in the data into account and are based on robust functional principal component analysis. One method seeks outliers in the series of projections on the first principal component. The other obtains uncontaminated forecasts for each data set and flags as outliers those observations whose residuals have an unusually high norm. A simulation study shows the performance of the proposed procedures and the need to take dependence in the time series into account. Finally, the usefulness of our methodology is illustrated on two real datasets from the electricity market: daily curves of electricity demand and price in mainland Spain for the year 2012.
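The residual-norm idea in the abstract above can be illustrated with a minimal sketch. It assumes the forecasts are already available (the robust FPCA forecasting step is not reproduced here), and the cut-off rule, median plus `k` scaled MADs, is an illustrative assumption rather than the threshold used in the study. The function names are hypothetical.

```python
import math
from statistics import median


def residual_norms(observed, forecasts):
    """Discretized L2 norm of each residual curve (observed minus forecast)."""
    return [
        math.sqrt(sum((o - f) ** 2 for o, f in zip(obs, fc)))
        for obs, fc in zip(observed, forecasts)
    ]


def flag_high_norm_residuals(observed, forecasts, k=3.0):
    """Return indices of curves whose residual norm is unusually high.

    Assumed rule (not from the paper): norm > median + k * 1.4826 * MAD.
    If the MAD is zero, any norm above the median is flagged.
    """
    norms = residual_norms(observed, forecasts)
    center = median(norms)
    mad = median([abs(v - center) for v in norms])
    threshold = center + k * 1.4826 * mad
    return [i for i, v in enumerate(norms) if v > threshold]
```

Each curve is a list of values on a common grid. For example, `flag_high_norm_residuals(daily_curves, daily_forecasts)` would return the positions of the days to inspect, where both arguments are hypothetical lists of equally long lists.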

    A Functional Data Analysis Approach for the Detection of Air Pollution Episodes and Outliers: A Case Study in Dublin, Ireland

    Ground-level concentrations of nitrogen oxides (NOx) can act as an indicator of air quality in the urban environment. In cities with relatively good air quality, and where NOx concentrations rarely exceed legal limits, adverse health effects on the population may still occur. Therefore, detecting small deviations in air quality and deriving methods of controlling air pollution are challenging. This study presents different data analytical methods which can be used to monitor and effectively evaluate policies or measures to reduce NOx emissions through the detection of pollution episodes and the removal of outliers. This method helps to identify the sources of pollution more effectively, and enhances the value of monitoring data and exceedances of limit values. It will detect outliers, changes and trend deviations in NO2 concentrations at ground level, and consists of four main steps: classical statistical description techniques, statistical process control techniques, functional analysis and a functional control process. To demonstrate the effectiveness of the proposed outlier detection methodology, it was applied to a complete one-year NO2 dataset for a suburban site in Dublin, Ireland in 2013. The findings demonstrate how the functional data approach improves on the classical techniques for detecting outliers and, in addition, how this new methodology can facilitate a more thorough approach to defining effective air pollution control measures.

    A functional data analysis approach for the detection of air pollution episodes and outliers: a case study in Dublin, Ireland

    Ground-level concentrations of nitrogen oxides (NOx) can act as an indicator of air quality in the urban environment. In cities with relatively good air quality, and where NOx concentrations rarely exceed legal limits, adverse health effects on the population may still occur. Therefore, detecting small deviations in air quality and deriving methods of controlling air pollution are challenging. This study presents different data analytical methods which can be used to monitor and effectively evaluate policies or measures to reduce NOx emissions through the detection of pollution episodes and the removal of outliers. This method helps to identify the sources of pollution more effectively, and enhances the value of monitoring data and exceedances of limit values. It will detect outliers, changes and trend deviations in NO2 concentrations at ground level, and consists of four main steps: classical statistical description techniques, statistical process control techniques, functional analysis and a functional control process. To demonstrate the effectiveness of the proposed outlier detection methodology, it was applied to a complete one-year NO2 dataset for a suburban site in Dublin, Ireland in 2013. The findings demonstrate how the functional data approach improves on the classical techniques for detecting outliers and, in addition, how this new methodology can facilitate a more thorough approach to defining effective air pollution control measures. Ministerio de Industria y Competitividad | Ref. RTI2018-096296-B-C2

    Inference and Visualization of Periodic Sequences

    This dissertation is composed of four articles describing inference and visualization of periodic sequences. In the first article, a nonparametric method is proposed for estimating the period and values of a periodic sequence when the data are evenly spaced in time. The period is estimated by a "leave-out-one-cycle" version of cross-validation (CV) and complements the periodogram, a widely used tool for period estimation. The CV method is computationally simple and implicitly penalizes multiples of the smallest period, leading to a "virtually" consistent estimator. The second article is the multivariate extension, where we present a CV method of estimating the periods of multiple periodic sequences when data are observed at evenly spaced time points. The basic idea is to borrow information from other correlated sequences to improve estimation of the period of interest. We show that the asymptotic behavior of the bivariate CV is the same as that of the CV for one sequence; for finite samples, however, the better the periods of the other correlated sequences are estimated, the more substantial the improvements that can be obtained. The third article proposes an informative exploratory tool, the functional boxplot, for visualizing functional data, as well as its generalization, the enhanced functional boxplot. Based on the center-outward ordering induced by band depth for functional data, the descriptive statistics of a functional boxplot are the envelope of the 50 percent central region, the median curve and the maximum non-outlying envelope. In addition, outliers can be detected by the empirical rule of 1.5 times the 50 percent central region. The last article proposes a simulation-based method to adjust functional boxplots for correlations when visualizing functional and spatio-temporal data, as well as detecting outliers. We start by investigating the relationship between the spatio-temporal dependence and the empirical outlier detection rule of 1.5 times the 50 percent central region. Then, we propose to simulate observations without outliers based on a robust estimator of the covariance function of the data. We select the constant factor in the functional boxplot to control the probability of correctly detecting no outliers. Finally, we apply the selected factor to the functional boxplot of the original data.
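The functional boxplot described above can be sketched in a few lines. This is a minimal illustration, not the dissertation's implementation. It assumes curves sampled on a common grid and uses the modified band depth with bands formed by pairs of curves, which is one common variant of band depth. The function names and the returned dictionary layout are hypothetical. The `factor` argument is the constant that the last article proposes to tune; 1.5 is the default empirical rule.

```python
import math
from itertools import combinations


def modified_band_depth(curves):
    """Average, over all pairs of curves, of the fraction of grid points
    at which each curve lies inside the band spanned by the pair."""
    n, p = len(curves), len(curves[0])
    depths = [0.0] * n
    pairs = list(combinations(range(n), 2))
    for a, b in pairs:
        for k, curve in enumerate(curves):
            inside = sum(
                1
                for t in range(p)
                if min(curves[a][t], curves[b][t])
                <= curve[t]
                <= max(curves[a][t], curves[b][t])
            )
            depths[k] += inside / p
    return [d / len(pairs) for d in depths]


def functional_boxplot(curves, factor=1.5):
    """Median curve, 50% central envelope and outliers by the fence rule."""
    n, p = len(curves), len(curves[0])
    depths = modified_band_depth(curves)
    # Center-outward ordering: deepest curve first.
    order = sorted(range(n), key=lambda i: depths[i], reverse=True)
    central = order[: math.ceil(n / 2)]
    lower = [min(curves[i][t] for i in central) for t in range(p)]
    upper = [max(curves[i][t] for i in central) for t in range(p)]
    # Fences: inflate the central envelope by `factor` times its range.
    fence_lo = [lo - factor * (up - lo) for lo, up in zip(lower, upper)]
    fence_hi = [up + factor * (up - lo) for lo, up in zip(lower, upper)]
    outliers = [
        i
        for i in range(n)
        if any(not (fence_lo[t] <= curves[i][t] <= fence_hi[t]) for t in range(p))
    ]
    return {
        "median": order[0],
        "central": sorted(central),
        "lower": lower,
        "upper": upper,
        "outliers": outliers,
    }
```

A curve is flagged as soon as it leaves the fences at any grid point. The pairwise depth costs on the order of n² · n · p operations, so this sketch suits small samples only.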

    Machine Learning Methods for Depression Detection Using SMRI and RS-FMRI Images

    Major Depressive Disorder (MDD) is a common disease throughout the world that negatively influences people’s lives. Early diagnosis of MDD is beneficial, so detecting practical biomarkers would aid clinicians in the diagnosis of MDD. Having an automated method to find biomarkers for MDD is helpful even though it is difficult. The main aim of this research is to develop a method for detecting discriminative features for MDD diagnosis based on Magnetic Resonance Imaging (MRI) data. In this research, representational similarity analysis provides a framework to compare distributed patterns and obtain the similarity/dissimilarity of brain regions. Regions are obtained by either data-driven or model-driven methods, such as cubes and atlases respectively. For structural MRI (sMRI), the similarity of voxels of spatial cubes (data-driven) is explored. For resting-state fMRI (rs-fMRI) images, the similarity of the time series of both cubes (data-driven) and atlases (model-driven) is examined. Moreover, a similarity method based on the inverse of the Minimum Covariance Determinant estimate is applied, which excludes outliers from patterns and finds conditionally independent regions given the rest of the regions. Next, a statistical test that is robust to outliers identifies discriminative similarity features between the two groups, MDD patients and controls. Therefore, the key contribution is the way of obtaining discriminative features, which combines the similarity of voxel cubes/time series computed with the inverse of a robust covariance and the statistical test. The experimental results show that using these features with the Bernoulli Naïve Bayes classifier achieves superior performance compared with other methods. The performance of our method is verified by applying it to three imbalanced datasets. Moreover, the similarity-based methods are compared with deep learning and regional-based approaches for detecting MDD using either sMRI or rs-fMRI. Given that depression is known to be a connectivity disorder, investigating the similarity of the brain’s regions is valuable for understanding the behavior of the brain. The combinations of structural and functional brain similarities are explored to investigate the brain’s structural and functional properties together. Moreover, the combination of data-driven (cube) and model-driven (atlas) similarities of rs-fMRI is examined to evaluate how they affect the performance of the classifier. In addition, discriminative similarities are visualized for both sMRI and rs-fMRI. To measure the informativeness of a cube, the relationship of atlas regions with overlapping cubes and vice versa (cubes with overlapping regions) is explored and visualized. Furthermore, the relationship between brain structure and function has been probed through common similarities between structural and resting-state functional networks.

    Identification of Outlying Observations with Quantile Regression for Censored Data

    Outlying observations, which significantly deviate from other measurements, may distort the conclusions of data analysis. Therefore, identifying outliers is one of the important problems that should be solved to obtain reliable results. While there are many statistical outlier detection algorithms and software programs for uncensored data, few are available for censored data. In this article, we propose three outlier detection algorithms based on censored quantile regression, two of which are modified versions of existing algorithms for uncensored or censored data, while the third is a newly developed algorithm to overcome the demerits of previous approaches. The performance of the three algorithms was investigated in simulation studies. In addition, real data from the SEER database, which contains a variety of data sets related to various cancers, are used to illustrate the usefulness of our methodology. The algorithms are implemented in an R package, OutlierDC, which can be conveniently employed in the R environment and freely obtained from CRAN.

    Shape Outlier Detection and Visualization for Functional Data: the Outliergram

    We propose a new method to visualize and detect shape outliers in samples of curves. In functional data analysis we observe curves defined over a given real interval, and shape outliers are those curves that exhibit a different shape from the rest of the sample. Whereas magnitude outliers, that is, curves that exhibit atypically high or low values at some points or across the whole interval, are in general easy to identify, shape outliers are often masked among the rest of the curves and thus difficult to detect. In this article we exploit the relation between two depths for functional data to help visualize curves in terms of shape and to develop an algorithm for shape outlier detection. We illustrate the use of the visualization tool, the outliergram, through several examples and assess the performance of the algorithm in a simulation study. We apply them to the detection of outliers in a children's growth dataset in which the girls' sample is contaminated with boys' curves and vice versa. Comment: 27 pages, 5 figures

    Robust mixtures in the presence of measurement errors

    We develop a mixture-based approach to robust density modeling and outlier detection for experimental multivariate data that includes measurement error information. Our model is designed to infer atypical measurements that are not due to errors, aiming to retrieve potentially interesting peculiar objects. Since exact inference is not possible in this model, we develop a tree-structured variational EM solution. This compares favorably against a fully factorial approximation scheme, approaching the accuracy of a Markov-Chain-EM, while maintaining computational simplicity. We demonstrate the benefits of including measurement errors in the model, in terms of improved outlier detection rates in varying measurement uncertainty conditions. We then use this approach in detecting peculiar quasars from an astrophysical survey, given photometric measurements with errors. Comment: (Refereed) Proceedings of the 24-th Annual International Conference on Machine Learning 2007 (ICML07), (Ed.) Z. Ghahramani. June 20-24, 2007, Oregon State University, Corvallis, OR, USA, pp. 847-854; Omnipress. ISBN 978-1-59593-793-3; 8 pages, 6 figures