User activities outliers detection; integration of statistical and computational intelligence techniques

Author: Langensiepen C
Lotfi A
Mahmoud S
Publication venue: 'Wiley'
Publication date: 01/02/2016

In this paper, a hybrid technique for detecting outliers in user activities is introduced. The hybrid technique consists of a two-stage integration of Principal Component Analysis (PCA) and Fuzzy Rule-Based Systems (FRBS). In the first stage, the Hamming distance is used to measure the differences between activities. PCA is then applied to the distance measures to compute two indices: Hotelling's T² and the Squared Prediction Error. In the second stage, the calculated indices are provided as inputs to the FRBSs, which model them heuristically. The model is used to identify the outliers and classify them. The proposed system is tested in real home environments, equipped with appropriate sensory devices, to identify outliers in the activities of daily living of the user. Three case studies are reported to demonstrate the effectiveness of the proposed system. The proposed system successfully identifies the outliers in activities, distinguishing between normal and abnormal behavioural patterns.
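The first stage described above can be sketched as follows. This is only an illustration of the Hamming-distance step, assuming each day of activity is encoded as an equal-length binary sequence of sensor states; the encoding, the sample data, and the names `hamming` and `distance_matrix` are hypothetical and are not taken from the paper. The PCA and fuzzy-rule stages are not shown.

```python
def hamming(a, b):
    """Count the positions at which two equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    return sum(x != y for x, y in zip(a, b))


def distance_matrix(days):
    """Pairwise Hamming distances between daily activity sequences."""
    return [[hamming(p, q) for q in days] for p in days]


# Hypothetical data: one binary occupancy reading per time slot, per day.
days = [
    [0, 0, 1, 1, 0, 0],
    [0, 0, 1, 1, 0, 0],
    [1, 1, 0, 0, 1, 1],  # a day that differs in every slot
]
D = distance_matrix(days)
```

Rows of `D` with uniformly large entries correspond to days that differ from most others, which is the kind of signal the later stages would work on.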

Nottingham Trent Institutional Repository (IRep)

Recovery of Outliers in Water Environment Monitoring Data

Author: Huang Liming
Jia Dongyan
Song Jinling
Wang Gang
Zhu Meining
Publication venue: 'Mechanical Engineering Faculty in Slavonski Brod'
Publication date: 01/01/2023

Water environment monitoring data are time sequences containing outliers that degrade data quality, so outlier detection and recovery play an important role in applications such as knowledge acquisition and prediction modelling of water environment indicators. To detect the outliers, a short-term chain comparison with a sliding window, based on the time sequence characteristics, is adopted. To recover outliers to values closer to the real data at that time, the sub-sequences are divided dynamically according to the change characteristics of the dataset; the similarity between sub-sequences is then measured by the shape distance, and the outliers are recovered according to the change trend of the corresponding data in the most similar sub-sequences. The monitoring data of a water station are selected for the study. The experimental results show that the recovery method is superior to the commonly used prediction recovery and fitting recovery methods: the recovered data are smoother and the short-term trend is more obvious.
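As a rough illustration of sliding-window detection on a time sequence, the sketch below flags a reading when it lies far from the median of the readings just before it. This is a simplified stand-in (a median/MAD test over a trailing window), not the chain comparison used in the paper; the function name, window size, threshold `k`, and sample readings are all assumptions.

```python
from statistics import median


def detect_outliers(series, window=5, k=3.0):
    """Return indices whose value deviates from the trailing-window median
    by more than k times the window's median absolute deviation (MAD)."""
    flagged = []
    for i in range(window, len(series)):
        recent = series[i - window:i]
        centre = median(recent)
        mad = median(abs(v - centre) for v in recent)
        # Guard against a zero MAD when the window is constant.
        if abs(series[i] - centre) > k * max(mad, 1e-9):
            flagged.append(i)
    return flagged


# Hypothetical readings with one spike at index 6.
readings = [7.0, 7.1, 7.0, 7.2, 7.1, 7.0, 12.0, 7.1, 7.2, 7.0]
spikes = detect_outliers(readings)
```

Using the window median rather than the previous reading keeps the points that follow a spike from being flagged as well.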

Hrčak - Portal of scientific journals of Croatia

A Semi-Supervised Feature Engineering Method for Effective Outlier Detection in Mixed Attribute Data Sets

Author: Rentala Girish Srivatsa
Publication venue: Louisiana Tech Digital Commons
Publication date: 16/08/2018

Outlier detection is one of the crucial tasks in data mining and can lead to the discovery of valuable and meaningful information within the data. An outlier is a data point that is notably dissimilar from other data points in the data set. As such, methods for outlier detection play an important role in identifying and removing outliers, thereby increasing the performance and accuracy of prediction systems. Outlier detection is used in many areas, such as financial fraud detection, disease prediction, and network intrusion detection. Traditional outlier detection methods are founded on the use of different distance measures to estimate the similarity between points and are confined to data sets that are purely continuous or purely categorical. These methods, though effective, fall short in elucidating the relationship between outliers and known clusters/classes in the data set. We refer to this relationship as the context for any reported outlier. Alternate outlier detection methods establish the context of a reported outlier using underlying contextual beliefs of the data. Contextual beliefs are the established relationships between the attributes of the data set. Various recent studies explore contextual beliefs to determine outlier behavior. However, these methods do not scale to situations where the data points and their respective contexts are sparse. Thus, the outliers reported by these methods tend to lose meaning. Another limitation of these methods is that they assume all features are equally important and neither consider nor determine subspaces among the features for identifying the outliers. Furthermore, determining subspaces is computationally expensive, as the number of possible subspaces increases with increasing dimensionality. This makes searching through all the possible subspaces impractical.
In this thesis, we propose a Hybrid Bayesian Network approach to capture the underlying contextual beliefs and detect meaningful outliers in mixed attribute data sets. Hybrid Bayesian Networks use their probability distributions to encode the information in the data, and outliers are those points which violate this information. To deal with sparse contexts, we use an angle-based similarity method, which is then combined with the joint probability distributions of the Hybrid Bayesian Network in a robust manner. With regard to subspace selection, we employ a feature engineering method that consists of two-stage feature selection using the Maximal Information Coefficient and Markov blankets of Hybrid Bayesian Networks to select highly correlated feature subspaces. The proposed method was tested on a real-world medical record data set. The results indicate that the algorithm was able to identify meaningful outliers successfully. Moreover, we compare the performance of our algorithm with existing baseline outlier detection algorithms. We also present a detailed analysis of the outliers reported by our method and demonstrate its efficiency when handling data points with sparse contexts.

Louisiana Tech Digital Commons

Predicting software project effort: A grey relational analysis based method

Author: Albrecht
Boehm
Boehm
Boetticher
Breunig
Briand
Briand
Briand
Burgess
Cheung
Deng
Deng
Finnie
Huang
Huang
Jain
Jeffery
Jiang
Jou
Jørgensen
Kadoda
Kemerer
Khotanzad
Kohavi
Li
Liu
Luo
Martin Shepperd
Mitchell
Moløkken
Mukhopadhyay
Myrtveit
Putnam
Qinbao Song
Samson
Shepperd
Siedelecki
Song
Srinivasan
Su
Walkerden
Wang
Wang
Wang
Wittig
Wittig
Publication venue: 'Elsevier BV'
Publication date: 01/06/2011

This is the post-print version of the final paper published in Expert Systems with Applications. The published article is available from the link below. Changes resulting from the publishing process, such as peer review, editing, corrections, structural formatting, and other quality control mechanisms may not be reflected in this document. Changes may have been made to this work since it was submitted for publication. Copyright @ 2011 Elsevier B.V. The inherent uncertainty of the software development process presents particular challenges for software effort prediction. We need to systematically address missing data values, outlier detection, feature subset selection and the continuous evolution of predictions as the project unfolds, all in the context of data starvation and noisy data. In this paper, however, we focus on outlier detection, feature subset selection, and effort prediction at an early stage of a project. We propose a novel approach using grey relational analysis (GRA) from grey system theory (GST), a recently developed system engineering theory based on the uncertainty of small samples. In this work we address some of the theoretical challenges in applying GRA to outlier detection, feature subset selection, and effort prediction, and then evaluate our approach on five publicly available industrial data sets using both stepwise regression and Analogy as benchmarks. The results are very encouraging in the sense of being comparable to or better than other machine learning techniques, and thus indicate that the method has considerable potential. National Natural Science Foundation of Chin
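For readers unfamiliar with GRA, the sketch below computes the standard grey relational grade between a reference series and several comparison series, using the common distinguishing coefficient of 0.5. It assumes the features are already normalised to a common scale; the function name and the sample values are hypothetical, and the abstract does not say how the paper adapts the grade for outlier detection or feature selection.

```python
def grey_relational_grades(reference, candidates, zeta=0.5):
    """Grey relational grade of each candidate series against the reference.

    The coefficient for each feature is
    (d_min + zeta * d_max) / (d + zeta * d_max),
    where d is the absolute difference for that feature and d_min, d_max are
    taken over all candidates and features. The grade is the mean coefficient.
    """
    deltas = [[abs(r - c) for r, c in zip(reference, cand)]
              for cand in candidates]
    flat = [d for row in deltas for d in row]
    d_min, d_max = min(flat), max(flat)
    if d_max == 0:  # every candidate equals the reference
        return [1.0] * len(candidates)
    grades = []
    for row in deltas:
        coeffs = [(d_min + zeta * d_max) / (d + zeta * d_max) for d in row]
        grades.append(sum(coeffs) / len(coeffs))
    return grades


# Hypothetical normalised project features.
target = [0.2, 0.5, 0.8]
history = [[0.2, 0.5, 0.8], [0.9, 0.1, 0.3]]
grades = grey_relational_grades(target, history)
```

A higher grade means the candidate is more similar to the reference, so a project with uniformly low grades against all others is one plausible candidate for an outlier.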

Crossref

Brunel University Research Archive

Detecting Outliers in Data with Correlated Measures

Author: Kifer Daniel
Kuo Yu-Hsuan
Li Zhenhui
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 26/08/2018

Advances in sensor technology have enabled the collection of large-scale datasets. Such datasets can be extremely noisy and often contain a significant number of outliers that result from sensor malfunction or human operation faults. In order to utilize such data for real-world applications, it is critical to detect outliers so that models built from these datasets will not be skewed by them. In this paper, we propose a new outlier detection method that utilizes the correlations in the data (e.g., taxi trip distance vs. trip time). Unlike existing outlier detection methods, we build a robust regression model that explicitly models the outliers and detects them simultaneously with the model fitting. We validate our approach on real-world datasets against methods specifically designed for each dataset as well as state-of-the-art outlier detectors. Our outlier detection method achieves better performance, demonstrating the robustness and generality of our method. Last, we report interesting case studies on some outliers that result from atypical events. Comment: 10 page
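One generic way to model outliers explicitly while fitting a regression is to give every observation its own shift term and penalise the shifts so that most of them stay at zero. The sketch below alternates a least-squares line fit with soft-thresholding of the residuals. It is an illustration of that general idea under assumed settings (the threshold `lam`, the iteration count, and the data are hypothetical), not the model proposed in the paper.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = a + b * x; returns (a, b)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return my - slope * mx, slope


def soft_threshold(r, lam):
    """Shrink r towards zero by lam; values within [-lam, lam] become 0."""
    if r > lam:
        return r - lam
    if r < -lam:
        return r + lam
    return 0.0


def robust_line(xs, ys, lam=3.0, iters=50):
    """Alternate between fitting the line to shift-corrected data and
    re-estimating one shift per point. Non-zero shifts mark outliers."""
    shifts = [0.0] * len(xs)
    for _ in range(iters):
        a, b = fit_line(xs, [y - s for y, s in zip(ys, shifts)])
        shifts = [soft_threshold(y - (a + b * x), lam)
                  for x, y in zip(xs, ys)]
    return a, b, shifts


# Hypothetical data: y = 2x, with one corrupted observation at index 5.
xs = [1, 2, 3, 4, 5, 6, 7, 8]
ys = [2, 4, 6, 8, 10, 32, 14, 16]
a, b, shifts = robust_line(xs, ys)
outlier_idx = [i for i, s in enumerate(shifts) if s != 0.0]
```

The fit and the outlier flags come out of the same loop, so no separate cleaning pass is needed before modelling.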

arXiv.org e-Print Archive

Crossref

Outlier identification in radiation therapy knowledge-based planning: A study of pelvic cases.

Author: Aggarwal
Appenzoller
Barnett
Boutilier
Chanyavanich
Delaney
Fogliata
Hawkins
Hodge
Lian
Motulsky
Osborne
Pardoe
Sheng
Sheng
Tol
Wang
Whittingham
Wu
Yang
Youden
Yuan
Yuan
Zhu
Publication venue: Jefferson Digital Commons
Publication date: 01/11/2017

PURPOSE: The purpose of this study was to apply statistical metrics to identify outliers and to investigate the impact of outliers on knowledge-based planning in radiation therapy of pelvic cases. We also aimed to develop a systematic workflow for identifying and analyzing geometric and dosimetric outliers. METHODS: Four groups (G1-G4) of pelvic plans were sampled in this study. These include the following three groups of clinical IMRT cases: G1 (37 prostate cases), G2 (37 prostate plus lymph node cases) and G3 (37 prostate bed cases). Cases in G4 were planned in accordance with dynamic-arc radiation therapy procedure and include 10 prostate cases in addition to those from G1. The workflow was separated into two parts: 1. identifying geometric outliers, assessing outlier impact, and outlier cleaning; 2. identifying dosimetric outliers, assessing outlier impact, and outlier cleaning. G2 and G3 were used to analyze the effects of geometric outliers (first experiment outlined below) while G1 and G4 were used to analyze the effects of dosimetric outliers (second experiment outlined below). A baseline model was trained by regarding all G2 cases as inliers. G3 cases were then individually added to the baseline model as geometric outliers. The impact on the model was assessed by comparing leverages of inliers (G2) and outliers (G3). A receiver-operating-characteristic (ROC) analysis was performed to determine the optimal threshold. The experiment was repeated by training the baseline model with all G3 cases as inliers and perturbing the model with G2 cases as outliers. A separate baseline model was trained with 32 G1 cases. Each G4 case (dosimetric outlier) was subsequently added to perturb the model. Predictions of dose-volume histograms (DVHs) were made using these perturbed models for the remaining 5 G1 cases. A Weighted Sum of Absolute Residuals (WSAR) was used to evaluate the impact of the dosimetric outliers. 
RESULTS: The leverage of inliers and outliers was significantly different. The Area-Under-Curve (AUC) for differentiating G2 (outliers) from G3 (inliers) was 0.98 (threshold: 0.27) for the bladder and 0.81 (threshold: 0.11) for the rectum. For differentiating G3 (outlier) from G2 (inlier), the AUC (threshold) was 0.86 (0.11) for the bladder and 0.71 (0.11) for the rectum. Significant increase in WSAR was observed in the model with 3 dosimetric outliers for the bladder (P < 0.005 with Bonferroni correction), and in the model with only 1 dosimetric outlier for the rectum (P < 0.005). CONCLUSIONS: We established a systematic workflow for identifying and analyzing geometric and dosimetric outliers, and investigated statistical metrics for outlier detection. Results validated the necessity for outlier detection and clean-up to enhance model quality in clinical practice.
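Leverage, the statistic compared between inliers and outliers above, measures how far a case's predictor values lie from the rest of the training set. The sketch below computes it for a regression with an intercept and a single predictor, where it has the closed form 1/n + (x_i - mean)² / Sxx. The study's models presumably use more features, so this single-feature version, the function name, and the sample values are assumptions made for illustration.

```python
def leverages(xs):
    """Leverage of each observation in a simple linear regression
    with an intercept and one predictor."""
    n = len(xs)
    mean = sum(xs) / n
    sxx = sum((x - mean) ** 2 for x in xs)
    return [1.0 / n + (x - mean) ** 2 / sxx for x in xs]


# Hypothetical geometric feature values; the last case is atypical.
feature = [1.0, 2.0, 3.0, 4.0, 20.0]
h = leverages(feature)
most_influential = max(range(len(h)), key=h.__getitem__)
```

The leverages always sum to the number of fitted parameters (two here), so a single value close to 1 indicates a case that largely determines its own fitted value.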

Crossref

Jefferson Digital Commons