3,480 research outputs found

    A survey of outlier detection methodologies

    Outlier detection has been used for centuries to detect and, where appropriate, remove anomalous observations from data. Outliers arise due to mechanical faults, changes in system behaviour, fraudulent behaviour, human error, instrument error or simply through natural deviations in populations. Their detection can identify system faults and fraud before they escalate with potentially catastrophic consequences. It can also identify errors and remove their contaminating effect on the data set, thereby purifying the data for processing. The original outlier detection methods were arbitrary, but principled and systematic techniques, drawn from the full gamut of Computer Science and Statistics, are now used. In this paper, we present a survey of contemporary techniques for outlier detection. We identify their respective motivations and distinguish their advantages and disadvantages in a comparative review.
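
As a concrete instance of a principled statistical rule of the kind the abstract alludes to, the sketch below applies Tukey's fences (values beyond 1.5 times the interquartile range from the quartiles). The abstract does not say which techniques the survey covers, so the choice of rule, the function name `tukey_outliers`, and the sample readings are all illustrative assumptions.

```python
import statistics

def tukey_outliers(values, k=1.5):
    """Return the values lying outside [Q1 - k*IQR, Q3 + k*IQR]."""
    # statistics.quantiles with n=4 returns the three quartile cut points.
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < low or v > high]

# Hypothetical sensor readings with one faulty measurement.
readings = [9.8, 10.1, 10.0, 9.9, 10.2, 10.0, 9.7, 10.3, 25.0]
print(tukey_outliers(readings))
```

The multiplier `k` trades sensitivity against false alarms; 1.5 is the conventional default, not a value taken from the survey.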

    Adaptive Online Sequential ELM for Concept Drift Tackling

    A machine learning method needs to adapt to changes in its environment over time. Such changes are known as concept drift. In this paper, we propose a concept drift tackling method as an enhancement of Online Sequential Extreme Learning Machine (OS-ELM) and Constructive Enhancement OS-ELM (CEOS-ELM) by adding adaptive capability for classification and regression problems. The scheme is named adaptive OS-ELM (AOS-ELM). It is a single-classifier scheme that works well to handle real drift, virtual drift, and hybrid drift. The AOS-ELM also works well for sudden drift and recurrent context change types. The scheme is a simple unified method that can be implemented in a few lines of code. We evaluated AOS-ELM on regression and classification problems using public concept drift data sets (SEA and STAGGER) and other public data sets such as MNIST, USPS, and IDS. Experiments show that our method gives a higher kappa value than the multiclassifier ELM ensemble. Even though AOS-ELM in practice does not need an increase in hidden nodes, we address some issues related to increasing the number of hidden nodes, such as error condition and rank values. We propose taking the rank of the pseudoinverse matrix as an indicator parameter to detect an underfitting condition.
    Comment: Hindawi Publishing. Computational Intelligence and Neuroscience, Volume 2016 (2016), Article ID 8091267, 17 pages. Received 29 January 2016, Accepted 17 May 2016. Special Issue on "Advances in Neural Networks and Hybrid-Metaheuristics: Theory, Algorithms, and Novel Engineering Applications". Academic Editor: Stefan Hauf
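
For orientation, the base OS-ELM that the abstract builds on can be sketched as a fixed random hidden layer whose output weights are updated chunk by chunk with recursive least squares. This is a minimal sketch of plain OS-ELM only; the adaptive drift handling of AOS-ELM is not described in the abstract and is not reproduced here. The class name `OSELM`, the sigmoid activation, the small ridge term, and the toy regression data are assumptions made for illustration.

```python
import numpy as np

class OSELM:
    """Online sequential ELM with a fixed random hidden layer (sketch)."""

    def __init__(self, n_inputs, n_hidden, ridge=1e-2, seed=0):
        rng = np.random.default_rng(seed)
        self.W = rng.normal(size=(n_inputs, n_hidden))  # random input weights, never trained
        self.b = rng.normal(size=n_hidden)              # random hidden biases
        self.ridge = ridge   # assumption: small ridge term for numerical stability
        self.P = None        # inverse of the regularised Gram matrix H^T H
        self.beta = None     # output weights

    def hidden(self, X):
        # Sigmoid activations of the random hidden layer.
        return 1.0 / (1.0 + np.exp(-(X @ self.W + self.b)))

    def fit_initial(self, X0, T0):
        # Batch solution on the first chunk of data.
        H = self.hidden(X0)
        self.P = np.linalg.inv(H.T @ H + self.ridge * np.eye(H.shape[1]))
        self.beta = self.P @ H.T @ T0

    def partial_fit(self, X, T):
        # Recursive least-squares update of P, then of the output weights.
        H = self.hidden(X)
        K = np.linalg.inv(np.eye(H.shape[0]) + H @ self.P @ H.T)
        self.P = self.P - self.P @ H.T @ K @ H @ self.P
        self.beta = self.beta + self.P @ H.T @ (T - H @ self.beta)

    def predict(self, X):
        return self.hidden(X) @ self.beta

# Toy regression stream (hypothetical data), fed in chunks of 25 samples.
rng = np.random.default_rng(1)
X = rng.uniform(-1, 1, size=(200, 2))
T = np.sin(X[:, :1]) + 0.5 * X[:, 1:]
model = OSELM(n_inputs=2, n_hidden=10)
model.fit_initial(X[:50], T[:50])
for start in range(50, 200, 25):
    model.partial_fit(X[start:start + 25], T[start:start + 25])
print("mean squared error:", float(np.mean((model.predict(X) - T) ** 2)))
```

With this update rule the sequentially learned weights match the batch ridge solution on all data seen so far, which is why a drifting target needs the extra adaptive mechanism the paper proposes.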

    PointMap: A real-time memory-based learning system with on-line and post-training pruning

    Also published in the International Journal of Hybrid Intelligent Systems, Volume 1, January, 2004.
    A memory-based learning system called PointMap is a simple and computationally efficient extension of Condensed Nearest Neighbor that allows the user to limit the number of exemplars stored during incremental learning. PointMap evaluates the information value of coding nodes during training, and uses this index to prune uninformative nodes either on-line or after training. These pruning methods allow the user to control both a priori code size and sensitivity to detail in the training data, as well as to determine the code size necessary for accurate performance on a given data set. Coding and pruning computations are local in space, with only the nearest coded neighbor available for comparison with the input; and in time, with only the current input available during coding. Pruning helps solve common problems of traditional memory-based learning systems: large memory requirements, their accompanying slow on-line computations, and sensitivity to noise. PointMap copes with the curse of dimensionality by considering multiple nearest neighbors during testing without increasing the complexity of the training process or the stored code. The performance of PointMap is compared to that of a group of sixteen nearest-neighbor systems on benchmark problems.
    This research was supported by grants from the Air Force Office of Scientific Research (AFOSR F49620-98-1-0108, F49620-01-1-0397, and F49620-01-1-0423) and the Office of Naval Research (ONR N00014-01-1-0624).
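
The starting point named in the abstract, Condensed Nearest Neighbor, can be sketched in a single incremental pass: an input is stored only when the nearest stored exemplar would misclassify it. The hard cap `max_exemplars` below is a crude stand-in for limiting code size; PointMap's information-value index and its pruning rules are not specified in the abstract and are not reproduced. All function names and the toy stream are illustrative assumptions.

```python
import math

def nearest(exemplars, x):
    """Return the stored (point, label) pair closest to x in Euclidean distance."""
    return min(exemplars, key=lambda e: math.dist(e[0], x))

def condense(stream, max_exemplars):
    """One-pass condensed nearest neighbour with a hard cap on memory."""
    exemplars = []
    for x, label in stream:
        # Only the current input and its nearest stored neighbour are consulted.
        if not exemplars or nearest(exemplars, x)[1] != label:
            if len(exemplars) < max_exemplars:
                exemplars.append((x, label))  # store only points the memory gets wrong
    return exemplars

def predict(exemplars, x):
    """Label x with the class of its single nearest stored exemplar."""
    return nearest(exemplars, x)[1]

# Hypothetical two-class stream in the plane.
stream = [((0.0, 0.0), "a"), ((0.1, 0.2), "a"), ((1.0, 1.0), "b"),
          ((0.9, 1.1), "b"), ((0.2, 0.1), "a"), ((1.2, 0.9), "b")]
memory = condense(stream, max_exemplars=4)
print(len(memory), "exemplars stored out of", len(stream))
```

On this stream only one exemplar per class is retained, which illustrates the memory saving that motivates condensing in the first place.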

    A Latent Space Support Vector Machine (LSSVM) Model for Cancer Prognosis

    Gene expression microarray analysis is a rapid, low-cost method of analyzing gene expression profiles for cancer prognosis/diagnosis. Microarray data generated from oncological studies typically contain thousands of expression values with few cases. Traditional regression and classification methods require first reducing the number of dimensions via statistical or heuristic methods. Partial Least Squares (PLS) is a dimensionality reduction method that builds a least squares regression model in a reduced dimensional space. It is well known that Support Vector Machines (SVM) outperform least squares regression models. In this study, we replace the PLS least squares model with an SVM model in the PLS reduced dimensional space. To verify our method, we build upon our previous work with a publicly available data set from the Gene Expression Omnibus database containing gene expression levels, clinical data, and survival times for patients with non-small cell lung carcinoma. Using 5-fold cross validation and Receiver Operating Characteristic (ROC) analysis, we show a comparison of classifier performance between the traditional PLS model and the PLS/SVM hybrid. Our results show that, by replacing least squares regression with an SVM, we increase the quality of the model as measured by the area under the ROC curve.
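
The comparison metric in this abstract, the area under the ROC curve, equals the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative one. The sketch below computes it directly from that definition. The function name `roc_auc` and the two sets of scores are made up for illustration and are not results from the study.

```python
def roc_auc(labels, scores):
    """AUC as the chance that a positive outscores a negative (ties count half)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    # Compare every positive score against every negative score.
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Hypothetical scores from two classifiers on the same six cases.
labels = [1, 1, 1, 0, 0, 0]
scores_a = [0.9, 0.8, 0.4, 0.5, 0.3, 0.2]  # one positive ranked below a negative
scores_b = [0.9, 0.8, 0.6, 0.5, 0.3, 0.2]  # perfect ranking
print(roc_auc(labels, scores_a), roc_auc(labels, scores_b))
```

Because the measure depends only on how cases are ranked, it lets a regression-style PLS output and an SVM decision value be compared on equal terms.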

    Development of artificial neural network-based classifiers to identify military impulse noise

    Noise monitoring stations are in place around some military installations to provide records that assist in processing noise complaints and damage claims. However, they are known to produce false positives and miss many impulse events. In this thesis, classifiers based on artificial neural networks were developed to improve the accuracy of military impulse noise identification. Two time-domain metrics, kurtosis and crest factor, and two custom frequency-domain metrics, spectral slope and weighted square error, were selected as inputs to the artificial neural networks. A separate effort attempted to identify military impulse noise by the shape of the recorded waveform. The classification algorithm achieved up to 100% accuracy on the training data and the validation data.
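
The two time-domain metrics named above have standard textbook definitions, sketched below: crest factor is the peak absolute amplitude divided by the RMS level, and kurtosis is the fourth central moment divided by the squared variance. Whether the thesis uses exactly these forms (for example, excess rather than raw kurtosis) is an assumption, and the custom frequency-domain metrics are not defined in the abstract, so they are omitted. The test signals are synthetic.

```python
import math

def crest_factor(samples):
    """Peak absolute amplitude divided by the RMS level."""
    rms = math.sqrt(sum(s * s for s in samples) / len(samples))
    return max(abs(s) for s in samples) / rms

def kurtosis(samples):
    """Fourth central moment over squared variance (about 3 for Gaussian noise)."""
    n = len(samples)
    mean = sum(samples) / n
    m2 = sum((s - mean) ** 2 for s in samples) / n
    m4 = sum((s - mean) ** 4 for s in samples) / n
    return m4 / (m2 * m2)

n = 1000
# Steady tone: five full sine cycles.
tone = [math.sin(2 * math.pi * 5 * k / n) for k in range(n)]
# Impulsive signal: the same tone at low level with one sharp spike.
impulse = [0.01 * t for t in tone]
impulse[500] = 1.0

for name, signal in (("tone", tone), ("impulse", impulse)):
    print(name, round(crest_factor(signal), 3), round(kurtosis(signal), 3))
```

Both metrics stay small for the steady tone and grow sharply for the spike, which is the property that makes them plausible inputs for an impulse-noise classifier.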