    Outlier Detection from Network Data with Subnetwork Interpretation

    Detecting a small number of outliers from a set of data observations is always challenging. This problem is more difficult in the setting of multiple network samples, where computing the anomalous degree of a network sample is generally not sufficient: explaining why the network is exceptional, expressed in the form of a subnetwork, is equally important. In this paper, we develop a novel algorithm to address these two key problems. We treat each network sample as a potential outlier and identify the subnetworks that most strongly discriminate it from nearby regular samples. The algorithm is developed in the framework of network regression, combined with constraints on both network topology and L1-norm shrinkage, to perform subnetwork discovery. Our method thus goes beyond subspace/subgraph discovery, and we show that it converges to a global optimum. Evaluation on various real-world network datasets demonstrates that our algorithm not only outperforms baselines in both network and high-dimensional settings, but also discovers highly relevant and interpretable local subnetworks, further enhancing our understanding of anomalous networks.
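    Regression with a topology constraint plus L1-norm shrinkage can be illustrated with a small sketch. This is not the authors' outlier-detection algorithm: it assumes a generic network-constrained lasso objective (squared loss, plus a graph-Laplacian smoothness penalty, plus an L1 penalty) solved by proximal gradient descent (ISTA), and every function name is hypothetical. Because both penalties are convex, ISTA with a step size equal to the inverse of the gradient's Lipschitz constant converges to a global optimum of this toy objective; connected features left with nonzero weights play the role of a selected subnetwork.

```python
import numpy as np


def soft_threshold(v, t):
    # Proximal operator of t * ||v||_1: shrink every entry toward zero by t.
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)


def graph_laplacian(p, edges):
    # Unnormalized Laplacian L = D - A of an undirected graph over p features.
    L = np.zeros((p, p))
    for i, j in edges:
        L[i, i] += 1.0
        L[j, j] += 1.0
        L[i, j] -= 1.0
        L[j, i] -= 1.0
    return L


def network_constrained_lasso(X, y, L, lam1, lam2, iters=1000):
    """Proximal gradient (ISTA) for the convex objective
        0.5 * ||y - X w||^2 + 0.5 * lam2 * w^T L w + lam1 * ||w||_1.
    The Laplacian term pulls weights of connected features together;
    the L1 term sets most weights to exactly zero."""
    H = X.T @ X + lam2 * L             # Hessian of the smooth part
    step = 1.0 / np.linalg.norm(H, 2)  # inverse Lipschitz constant of the gradient
    Xty = X.T @ y
    w = np.zeros(X.shape[1])
    for _ in range(iters):
        grad = H @ w - Xty
        w = soft_threshold(w - step * grad, step * lam1)
    return w


# Toy data (hypothetical): only features 0 and 1, linked in the graph, affect y.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 6))
w_true = np.array([2.0, 2.0, 0.0, 0.0, 0.0, 0.0])
y = X @ w_true + 0.01 * rng.normal(size=200)
L = graph_laplacian(6, [(0, 1), (2, 3), (4, 5)])
w = network_constrained_lasso(X, y, L, lam1=5.0, lam2=1.0)
print(np.round(w, 3))
```

    With lam1 = 5.0 the four unrelated features are expected to be shrunk to exactly zero, while features 0 and 1 keep weights slightly below their true value of 2 because of the L1 shrinkage.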

    A Divide-and-Conquer Solver for Kernel Support Vector Machines

    The kernel support vector machine (SVM) is one of the most widely used classification methods; however, the amount of computation required becomes a bottleneck when facing millions of samples. In this paper, we propose and analyze a novel divide-and-conquer solver for kernel SVMs (DC-SVM). In the division step, we partition the kernel SVM problem into smaller subproblems by clustering the data, so that each subproblem can be solved independently and efficiently. We show theoretically that the support vectors identified by the subproblem solutions are likely to be support vectors of the entire kernel SVM problem, provided that the problem is partitioned appropriately by kernel clustering. In the conquer step, the local solutions from the subproblems are used to initialize a global coordinate descent solver, which converges quickly, as our analysis suggests. By extending this idea, we develop a multilevel Divide-and-Conquer SVM algorithm with adaptive clustering and an early prediction strategy, which outperforms state-of-the-art methods in terms of training speed, testing accuracy, and memory usage. As an example, on the covtype dataset with half a million samples, DC-SVM is 7 times faster than LIBSVM in obtaining the exact SVM solution (to within 10^-6 relative error), which achieves 96.15% prediction accuracy. Moreover, with our proposed early prediction strategy, DC-SVM achieves about 96% accuracy in only 12 minutes, which is more than 100 times faster than LIBSVM.
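    The divide and conquer steps can be sketched in Python under several simplifying assumptions: the SVM dual is written without a bias term, the data are partitioned with plain k-means in input space rather than the kernel clustering described above, there is a single level instead of the multilevel scheme, and there is no adaptive clustering or early prediction. Names such as dc_svm and dual_cd are hypothetical. The sketch only shows the structure: each cluster's dual subproblem is solved independently, and the concatenated local solutions warm-start a global coordinate descent over the full kernel matrix.

```python
import numpy as np


def rbf_kernel(A, B, gamma):
    # Gaussian kernel matrix: K[i, j] = exp(-gamma * ||A[i] - B[j]||^2)
    sq = (A ** 2).sum(1)[:, None] + (B ** 2).sum(1)[None, :] - 2.0 * A @ B.T
    return np.exp(-gamma * np.maximum(sq, 0.0))


def dual_objective(alpha, Q):
    # Bias-free SVM dual: 0.5 * alpha^T Q alpha - sum(alpha)
    return 0.5 * alpha @ Q @ alpha - alpha.sum()


def dual_cd(Q, C, alpha, sweeps):
    # Exact coordinate descent on the dual with box constraints 0 <= alpha_i <= C.
    alpha = alpha.copy()
    grad = Q @ alpha - 1.0  # gradient of dual_objective at alpha
    for _ in range(sweeps):
        for i in range(alpha.size):
            new = min(max(alpha[i] - grad[i] / Q[i, i], 0.0), C)
            delta = new - alpha[i]
            if delta != 0.0:
                grad += delta * Q[:, i]  # keep the gradient in sync
                alpha[i] = new
    return alpha


def kmeans_labels(X, k, iters, rng):
    # Plain Lloyd's k-means in input space (a simplification of kernel clustering).
    centers = X[rng.choice(len(X), size=k, replace=False)].copy()
    for _ in range(iters):
        dists = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(-1)
        labels = dists.argmin(1)
        for c in range(k):
            if np.any(labels == c):
                centers[c] = X[labels == c].mean(0)
    return labels


def dc_svm(X, y, C=1.0, gamma=1.0, k=4, local_sweeps=20, global_sweeps=5, seed=0):
    rng = np.random.default_rng(seed)
    labels = kmeans_labels(X, k, 10, rng)
    alpha0 = np.zeros(len(X))
    # Divide: solve each cluster's dual subproblem independently.
    for c in range(k):
        idx = np.flatnonzero(labels == c)
        if idx.size == 0:
            continue
        Qc = np.outer(y[idx], y[idx]) * rbf_kernel(X[idx], X[idx], gamma)
        alpha0[idx] = dual_cd(Qc, C, np.zeros(idx.size), local_sweeps)
    # Conquer: warm-start global coordinate descent from the local solutions.
    Q = np.outer(y, y) * rbf_kernel(X, X, gamma)
    alpha = dual_cd(Q, C, alpha0, global_sweeps)
    return alpha, alpha0, Q


# Usage on two well-separated Gaussian blobs (synthetic, for illustration only).
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(-2.0, 0.7, (60, 2)), rng.normal(2.0, 0.7, (60, 2))])
y = np.concatenate([-np.ones(60), np.ones(60)])
alpha, alpha0, Q = dc_svm(X, y)
# Decision function f(x) = sum_i alpha_i * y_i * K(x_i, x), evaluated on X.
pred = np.sign(rbf_kernel(X, X, 1.0) @ (alpha * y))
print("dual objective, warm start:", dual_objective(alpha0, Q))
print("dual objective, after global CD:", dual_objective(alpha, Q))
print("training accuracy:", np.mean(pred == y))
```

    Because exact coordinate minimization never increases the dual objective, the global pass can only improve on the warm start. How much it saves over a cold start depends on how many of the local support vectors remain support vectors of the full problem, which is what an appropriate partition is meant to ensure.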

    Comparative Study on modeling Efficiency Between Support Vector Machines (SVMs) model and Parallel OBF-NN model

    This project presents a comparative study of the modeling efficiency of a support vector machine (SVM) model and a parallel OBF-NN model. To demonstrate the concept, a basic support vector regression (SVR) model is developed for nonlinear model identification, and the best parameters and options for the SVR model are selected to achieve optimal model performance. The study uses a case study based on Van de Vusse reactor datasets, whose input and output data are used in simulation as training and validation data. Lastly, an OBF-SVR model is developed that uses an OBF model as the linear part and an SVR model as the nonlinear part, arranged in parallel. Each developed model is evaluated on how closely its output approaches the real system values on the validation data. The developed OBF-SVR model is then compared with the OBF-NN model, and the deviation between the models is investigated.
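    The parallel structure of an OBF-SVR model can be sketched as follows, under stated assumptions: the OBF part is built from discrete Laguerre filters, a common family of orthonormal basis filters, with a hypothetical pole a = 0.5 and four filters; the linear part is fitted by least squares; and the nonlinear part is fitted to the linear model's residual, with the two branch outputs summed. To stay self-contained, kernel ridge regression with an RBF kernel stands in for SVR, and a synthetic first-order system with a quadratic input term replaces the Van de Vusse reactor data. None of the function names come from the project.

```python
import numpy as np


def laguerre_states(u, n_filters, a):
    # Outputs of a chain of discrete Laguerre (orthonormal basis) filters driven by u.
    # Filter 1:  x1(k+1) = a*x1(k) + sqrt(1 - a^2)*u(k)
    # Filter i:  xi(k+1) = a*xi(k) + x(i-1)(k) - a*x(i-1)(k+1)   (all-pass section)
    N = len(u)
    X = np.zeros((N, n_filters))
    g = np.sqrt(1.0 - a * a)
    for k in range(N - 1):
        X[k + 1, 0] = a * X[k, 0] + g * u[k]
        for i in range(1, n_filters):
            X[k + 1, i] = a * X[k, i] + X[k, i - 1] - a * X[k + 1, i - 1]
    return X


def rbf_kernel(A, B, gamma):
    # Gaussian kernel matrix: K[i, j] = exp(-gamma * ||A[i] - B[j]||^2)
    sq = (A ** 2).sum(1)[:, None] + (B ** 2).sum(1)[None, :] - 2.0 * A @ B.T
    return np.exp(-gamma * np.maximum(sq, 0.0))


def fit_parallel_obf(u, y, n_filters=4, a=0.5, gamma=0.5, lam=1e-2):
    X = laguerre_states(u, n_filters, a)
    # Linear part: OBF model with weights fitted by least squares.
    theta, *_ = np.linalg.lstsq(X, y, rcond=None)
    y_lin = X @ theta
    # Nonlinear part: kernel ridge regression on the residual (stand-in for SVR),
    # using the same filter states as regressors.
    resid = y - y_lin
    K = rbf_kernel(X, X, gamma)
    beta = np.linalg.solve(K + lam * np.eye(len(y)), resid)
    y_nl = K @ beta
    # Parallel model output: sum of the linear and nonlinear branches.
    return y_lin, y_lin + y_nl


# Synthetic first-order system with a quadratic input term (illustration only).
rng = np.random.default_rng(0)
u = rng.uniform(-1.0, 1.0, 300)
y = np.zeros(300)
for k in range(299):
    y[k + 1] = 0.7 * y[k] + 0.5 * u[k] + 0.3 * u[k] ** 2

y_lin, y_par = fit_parallel_obf(u, y)
mse_lin = np.mean((y - y_lin) ** 2)
mse_par = np.mean((y - y_par) ** 2)
print(f"training MSE, linear OBF only: {mse_lin:.5f}")
print(f"training MSE, parallel OBF + kernel model: {mse_par:.5f}")
```

    On the training data, the parallel model's mean squared error cannot exceed that of the linear OBF branch alone, because the kernel ridge correction leaves a residual of lam * (K + lam*I)^-1 times the original one. Judging whether the nonlinear branch actually helps therefore requires separate validation data, as in the study.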