3 research outputs found

    Scalable kernel density estimation-based local outlier detection over large data streams

    No full text
    © 2019 Copyright held by the owner/author(s). Local outlier techniques are known to be effective for detecting outliers in skewed data, where subsets of the data exhibit diverse distribution properties. However, existing methods are not well equipped to support modern high-velocity data streams due to the high complexity of the detection algorithms and their volatility to data updates. To tackle these shortcomings, we propose local outlier semantics that operate at an abstraction level by leveraging kernel density estimation (KDE) to effectively detect local outliers from streaming data. A strategy to continuously detect top-N KDE-based local outliers over streams is designed, called KELOS – the first linear time complexity streaming local outlier detection approach. The first innovation of KELOS is the abstract kernel center-based KDE (aKDE) strategy. aKDE accurately yet efficiently estimates the data density at each point – essential for local outlier detection. This is based on the observation that a cluster of points close to each other tend to have a similar influence on a target point’s density estimation when used as kernel centers. These points thus can be represented by one abstract kernel center. Next, the KELOS’s inlier pruning strategy early prunes points that have no chance to become top-N outliers. This empowers KELOS to skip the computation of their data density and of the outlier status for every data point. Together aKDE and the inlier pruning strategy eliminate the performance bottleneck of streaming local outlier detection. The experimental evaluation demonstrates that KELOS is up to 6 orders of magnitude faster than existing solutions, while being highly effective in detecting local outliers from streaming data

    Metabolic profiling on 2D NMR TOCSY spectra using machine learning

    Get PDF
    Due to the dynamicity of biological cells, the role of metabolic profiling in discovering biological fingerprints of diseases, and their evolution, as well as the cellular pathway of different biological or chemical stimuli is most significant. Two-dimensional nuclear magnetic resonance (2D NMR) is one of the fundamental and strong analytical instruments for metabolic profiling. Though, total correlation spectroscopy (2D NMR 1H -1H TOCSY) can be used to improve spectral overlap of 1D NMR, strong peak shift, signal overlap, spectral crowding and matrix effects in complex biological mixtures are extremely challenging in 2D NMR analysis. In this work, we introduce an automated metabolic deconvolution and assignment based on the deconvolution of 2D TOCSY of real breast cancer tissue, in addition to different differentiation pathways of adipose tissue-derived human Mesenchymal Stem cells. A major alternative to the common approaches in NMR based machine learning where images of the spectra are used as an input, our metabolic assignment is based only on the vertical and horizontal frequencies of metabolites in the 1H-1H TOCSY. One- and multi-class Kernel null foley–Sammon transform, support vector machines, polynomial classifier kernel density estimation, and support vector data description classifiers were tested in semi-supervised learning and novelty detection settings. The classifiers’ performance was evaluated by comparing the conventional human-based methodology and automatic assignments under different initial training sizes settings. The results of our novel metabolic profiling methods demonstrate its suitability, robustness, and speed in automated nontargeted NMR metabolic analysis
    corecore