36 research outputs found

    Identification of Outlying Observations with Quantile Regression for Censored Data

    Outlying observations, which deviate significantly from other measurements, may distort the conclusions of data analysis. Identifying outliers is therefore an important step toward obtaining reliable results. While there are many statistical outlier detection algorithms and software programs for uncensored data, few are available for censored data. In this article, we propose three outlier detection algorithms based on censored quantile regression, two of which are modified versions of existing algorithms for uncensored or censored data, while the third is a newly developed algorithm that overcomes the drawbacks of previous approaches. The performance of the three algorithms was investigated in simulation studies. In addition, real data from the SEER database, which contains a variety of data sets related to various cancers, are analyzed to illustrate the usefulness of our methodology. The algorithms are implemented in the R package OutlierDC, which can be conveniently used in the R environment and is freely available from CRAN.

    TESTING FOR MULTIPLE UPPER OUTLIERS IN DISTRIBUTION SAMPLES: A STUDY OF FOREIGN EXCHANGE DATA

    In this study, the existence of k upper outliers is investigated in samples from the gamma, normal and exponential distributions by running ten thousand simulations at different values of n using the algorithm introduced by Tietjen and Moore; test statistics and critical values were likewise estimated from the algorithm. A normal Q-Q plot was produced to distinguish a data set that follows a normal distribution from one that deviates from normality. The algorithm was applied to the Nigeria-US dollar foreign exchange rate, on both the raw and the log-transformed data. The simulation study reveals that upper outliers occur more often in gamma and exponential samples than in normal samples. Empirical analysis shows that there are upper outliers in the raw data set but none in the transformed data. The results of this paper should encourage researchers in business and economics to explore data before use and to transform them appropriately to avoid errors in estimation.
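The Tietjen-Moore statistic for k upper outliers, as it is commonly defined, is the ratio of the sum of squared deviations left after dropping the k largest observations to the sum of squared deviations of the full sample; small values point to outliers. The following is a minimal Python sketch of that statistic and of a Monte Carlo critical value, assuming standard normal null samples. The function names, the seed and the sample data are illustrative assumptions and are not taken from the paper.

```python
import random


def tietjen_moore_upper(data, k):
    """Tietjen-Moore statistic L_k for the k largest observations.

    Values close to 0 suggest that the k largest points are outliers.
    """
    y = sorted(data)
    n = len(y)
    if not 0 < k < n:
        raise ValueError("k must satisfy 0 < k < n")
    mean_all = sum(y) / n
    kept = y[: n - k]                       # sample without the k largest values
    mean_kept = sum(kept) / len(kept)
    numerator = sum((v - mean_kept) ** 2 for v in kept)
    denominator = sum((v - mean_all) ** 2 for v in y)
    if denominator == 0:
        raise ValueError("all observations are identical")
    return numerator / denominator


def simulated_critical_value(n, k, alpha=0.05, reps=10000, seed=0):
    """Lower alpha-quantile of L_k under a standard normal null (assumed here)."""
    rng = random.Random(seed)
    stats = sorted(
        tietjen_moore_upper([rng.gauss(0.0, 1.0) for _ in range(n)], k)
        for _ in range(reps)
    )
    return stats[int(alpha * reps)]


if __name__ == "__main__":
    # Made-up sample with two suspiciously large values.
    sample = [1.0, 1.2, 0.9, 1.1, 1.0, 0.95, 1.05, 9.0, 10.0]
    stat = tietjen_moore_upper(sample, k=2)
    crit = simulated_critical_value(n=len(sample), k=2)
    print("L_k:", stat, "critical value:", crit, "reject:", stat < crit)
```

The null hypothesis of no outliers is rejected when the statistic falls below the simulated critical value. For gamma or exponential data, the null sampler inside `simulated_critical_value` would have to be swapped for the matching distribution.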

    SURGE: Continuous Detection of Bursty Regions Over a Stream of Spatial Objects

    With the proliferation of mobile devices and location-based services, the continuous generation of massive volumes of streaming spatial objects (i.e., geo-tagged data) opens up new opportunities to address real-world problems by analyzing them. In this paper, we present a novel continuous bursty region detection problem that aims to continuously detect a bursty region of a given size in a specified geographical area from a stream of spatial objects. Specifically, a bursty region shows the maximum spike in the number of spatial objects in a given time window. The problem is useful in addressing several real-world challenges such as surge pricing in online transportation and disease outbreak detection. To solve the problem, we propose an exact solution and two approximate solutions, and the approximation ratio is $\frac{1-\alpha}{4}$ in terms of the burst score, where $\alpha$ is a parameter to control the burst score. We further extend these solutions to support detection of top-$k$ bursty regions. Extensive experiments with real-world data are conducted to demonstrate the efficiency and effectiveness of our solutions.

    Improved spatial outlier detection method within a river network

    A spatial outlier is an observation whose non-spatial attribute values are significantly different from those of its neighbors. Such observations can also be found in water quality data at monitoring stations within a river network. However, existing spatial outlier detection procedures based on distance measures such as the Euclidean distance between monitoring stations do not take into account the river network topology. In general, water quality levels in lower streams will be affected by the flow from the upper streams. Similarly, the water quality at some tributaries may have little influence on the other tributaries. Hence, a method for identifying spatial outliers in a river network is proposed that takes into account the effect of river flow connectivity when determining the neighbors of the monitoring stations. While both the existing and the proposed methods use the robust Mahalanobis distance, the proposed method uses river distance instead of the Euclidean distance. The proposed method is shown to perform better in a simulation using a synthetic river dataset. For illustration, we apply the proposed method to the 2016 water quality data from the Sg. Klang Basin provided by the Department of Environment, Malaysia. The findings better identify stations whose water quality differs significantly from that of their neighbouring stations. Such information is useful for the authorities in planning the environmental monitoring of water quality in these areas.

    Exploring passenger rail markets using new station catchment size and shape metrics

    This paper presents a novel spatial market segmentation method to determine key user groups of a train station (defined by attributes such as gender, age and access mode), based on the size and shape of the station catchment area of each group. Two new indices, area ratio and composite ratio, are developed to quantify the importance of user groups for a train station. This method is applied to identify key user groups at seven train stations in Perth, Western Australia. The study offers a new way to explore the travel behaviour of train users and provides insights for rail transport planning and marketing.

    Direct structural analysis of domains defined by point clouds

    This contribution presents a method for the numerical analysis of solids represented by oriented point clouds. The proposed approach is based on the Finite Cell Method, a high-order immersed boundary technique that computes on a regular background grid of finite elements and requires only inside-outside information from the geometric model. It is shown that oriented point clouds provide sufficient information for these point-membership classifications. Further, we address a tessellation-free formulation of contour integrals that allows Neumann boundary conditions to be applied on point clouds without having to recover the underlying surface. Two-dimensional linear elastic benchmark examples demonstrate that the method achieves the same accuracy as computations with conventional, continuous surface descriptions, because the associated error can be controlled by the density of the cloud. Three-dimensional examples computed on point clouds of historical structures show how the method can be employed to establish seamless connections between digital shape measurement techniques and numerical analyses.
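One simple way to obtain inside-outside information from an oriented point cloud is to find the cloud point nearest to a query point and check on which side of that point's normal the query lies. The Python sketch below illustrates this idea under stated assumptions: normals point outward, the nearest point is found by brute-force search, and the test geometry is a sampled unit circle. The function names are hypothetical, and the abstract does not confirm that the paper uses exactly this rule.

```python
import math


def is_inside(query, points, normals):
    """Classify `query` as inside if it lies behind the nearest point's normal.

    Assumes outward-pointing normals; brute-force nearest-neighbour search.
    """
    nearest = min(range(len(points)), key=lambda i: math.dist(query, points[i]))
    p, n = points[nearest], normals[nearest]
    # Signed distance of the query along the normal of the nearest cloud point.
    signed = sum((q - pi) * ni for q, pi, ni in zip(query, p, n))
    return signed <= 0.0


def unit_circle_cloud(m=64):
    """Illustrative 2D oriented cloud: m points on the unit circle."""
    points = [
        (math.cos(2 * math.pi * i / m), math.sin(2 * math.pi * i / m))
        for i in range(m)
    ]
    normals = list(points)  # on the unit circle the outward normal equals the position
    return points, normals


if __name__ == "__main__":
    pts, nrm = unit_circle_cloud()
    for q in [(0.2, 0.1), (1.5, 0.0)]:
        print(q, "inside" if is_inside(q, pts, nrm) else "outside")
```

A denser cloud narrows the band around the boundary in which this test can misclassify, which is consistent with the abstract's remark that the error is controlled by the density of the cloud. A spatial index such as a k-d tree would replace the linear search for large clouds.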

    A taxonomy framework for unsupervised outlier detection techniques for multi-type data sets

    The term "outlier" can generally be defined as an observation that is significantly different from the other values in a data set. Outliers may be instances of error or may indicate events. The task of outlier detection is to identify such outliers in order to improve the analysis of data and to discover interesting and useful knowledge about unusual events in numerous application domains. In this paper, we report on contemporary unsupervised outlier detection techniques for multiple types of data sets and provide a comprehensive taxonomy framework and two decision trees for selecting the most suitable technique based on the data set. Furthermore, we highlight the advantages, disadvantages and performance issues of each class of outlier detection techniques under this taxonomy framework.