
    Soft Methodology for Cost-and-error Sensitive Classification

    Many real-world data mining applications assign different costs to different types of classification errors and thus call for cost-sensitive classification algorithms. Existing algorithms for cost-sensitive classification succeed in minimizing the cost, but can incur a high error rate as the trade-off. The high error rate holds back the practical use of those algorithms. In this paper, we propose a novel cost-sensitive classification methodology that takes both the cost and the error rate into account. The methodology, called soft cost-sensitive classification, is derived from a multicriteria optimization problem over the cost and the error rate, and can be viewed as regularizing cost-sensitive classification with the error rate. The simple methodology allows immediate improvements of existing cost-sensitive classification algorithms. Experiments on benchmark and real-world data sets show that our proposed methodology indeed achieves lower test error rates and similar (sometimes lower) test costs than existing cost-sensitive classification algorithms. We also demonstrate that the methodology can be extended to consider the weighted error rate instead of the original error rate. This extension is useful for tackling unbalanced classification problems. Comment: A shorter version appeared in KDD '1
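
The idea of regularizing cost-sensitive classification with the error rate can be illustrated with a small sketch. Everything below is an assumption made for illustration, not the paper's formulation: the function names `soften_costs` and `predict_min_expected_cost`, the blending parameter `alpha`, and the choice to scale the 0/1 error by the largest cost are all hypothetical.

```python
def soften_costs(cost_matrix, alpha):
    """Blend a cost matrix with the 0/1 error matrix (illustrative assumption).

    cost_matrix[y][p] is the cost of predicting class p when the truth is y.
    alpha = 0 keeps the original costs; alpha = 1 reduces to plain 0/1 error
    (scaled by the largest cost so both terms share a comparable range).
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    k = len(cost_matrix)
    max_cost = max(max(row) for row in cost_matrix)
    return [
        [
            (1.0 - alpha) * cost_matrix[y][p]
            + alpha * max_cost * (0 if y == p else 1)
            for p in range(k)
        ]
        for y in range(k)
    ]


def predict_min_expected_cost(class_probs, cost_matrix):
    """Return the class index with the lowest expected cost."""
    k = len(cost_matrix)
    expected = [
        sum(class_probs[y] * cost_matrix[y][p] for y in range(k))
        for p in range(k)
    ]
    return min(range(k), key=expected.__getitem__)


# Hypothetical two-class example: missing class 1 is ten times as costly.
costs = [[0, 1], [10, 0]]
probs = [0.8, 0.2]  # assumed class probabilities from some base classifier

hard_choice = predict_min_expected_cost(probs, soften_costs(costs, 0.0))
soft_choice = predict_min_expected_cost(probs, soften_costs(costs, 0.5))
```

With `alpha = 0` the rule picks the less likely but cheaper class; raising `alpha` pulls the decision back toward the most probable class, which is the cost-versus-error trade-off the abstract describes.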

    Classification hardness for supervised learners on 20 years of intrusion detection data

    This article consolidates the analysis of established (NSL-KDD) and new (ISCXIDS2012, CICIDS2017, CICIDS2018) intrusion detection datasets using supervised machine learning (ML) algorithms. The uniform analysis procedure makes the obtained results comparable. It also provides a stronger foundation for conclusions about the efficacy of supervised learners on the main classification task in network security. This research is motivated in part by the lack of adoption of these modern datasets. The work starts with a broad scope, covering classification by algorithms from different families on both established and new datasets, in order to expand the existing foundation and reveal the most opportune avenues for further inquiry. After baseline results were obtained, the classification task was made more difficult by reducing the data available to learn from, both horizontally and vertically. The data reduction serves as a stress test to verify whether the very high baseline results hold up under increasingly harsh constraints. Ultimately, this work contains the most comprehensive set of results on the topic of intrusion detection through supervised machine learning. Researchers working on algorithmic improvements can compare their results to this collection, knowing that all results reported here were gathered through a uniform framework. This work's main contributions are the outstanding classification results on the current state-of-the-art datasets for intrusion detection and the conclusion that these methods show remarkable resilience in classification performance even when the amount of data to learn from is aggressively reduced.

    Transfer Learning with Deep Convolutional Neural Network (CNN) for Pneumonia Detection using Chest X-ray

    Pneumonia is a life-threatening disease of the lungs caused by either bacterial or viral infection. It can be life-endangering if not acted upon in time, and thus an early diagnosis of pneumonia is vital. The aim of this paper is to automatically detect bacterial and viral pneumonia using digital x-ray images. It provides a detailed report on advances made in the accurate detection of pneumonia and then presents the methodology adopted by the authors. Four different pre-trained deep Convolutional Neural Networks (CNNs), AlexNet, ResNet18, DenseNet201, and SqueezeNet, were used for transfer learning. A total of 5247 bacterial, viral, and normal chest x-ray images underwent preprocessing, and the modified images were used to train the transfer-learning-based classification task. In this work, the authors report three classification schemes: normal vs. pneumonia, bacterial vs. viral pneumonia, and normal vs. bacterial vs. viral pneumonia. The classification accuracies for normal and pneumonia images, bacterial and viral pneumonia images, and normal, bacterial, and viral pneumonia images were 98%, 95%, and 93.3%, respectively. In each scheme, this accuracy is higher than the accuracies reported in the literature. Therefore, the proposed study can help radiologists diagnose pneumonia faster and can help in the fast airport screening of pneumonia patients. Comment: 13 Figures, 5 tables. arXiv admin note: text overlap with arXiv:2003.1314

    Construction of embedded fMRI resting state functional connectivity networks using manifold learning

    We construct embedded functional connectivity networks (FCN) from benchmark resting-state functional magnetic resonance imaging (rsfMRI) data acquired from patients with schizophrenia and healthy controls, based on linear and nonlinear manifold learning algorithms, namely Multidimensional Scaling (MDS), Isometric Feature Mapping (ISOMAP), and Diffusion Maps. Furthermore, based on key global graph-theoretical properties of the embedded FCN, we compare their classification potential using machine learning techniques. We also assess the performance of two metrics that are widely used for the construction of FCN from fMRI, namely the Euclidean distance and the lagged cross-correlation metric. We show that the FCN constructed with Diffusion Maps and the lagged cross-correlation metric outperform the other combinations.
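
To make the contrast between the two metrics concrete, here is a minimal sketch that compares two short time series. The definition used for the lagged measure (one minus the largest absolute Pearson correlation over a window of lags) is an assumption for illustration and may differ from the paper's exact definition; all function names and the `max_lag` parameter are hypothetical.

```python
import math


def euclidean_distance(x, y):
    """Plain Euclidean distance between two equal-length series."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


def pearson(x, y):
    """Pearson correlation; returns 0.0 if either series is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / math.sqrt(vx * vy)


def lagged_cross_correlation_distance(x, y, max_lag):
    """Assumed definition: 1 - max over lags of |correlation|."""
    best = 0.0
    for lag in range(-max_lag, max_lag + 1):
        if lag >= 0:
            a, b = x[lag:], y[: len(y) - lag]
        else:
            a, b = x[: len(x) + lag], y[-lag:]
        if len(a) >= 2:
            best = max(best, abs(pearson(a, b)))
    return max(0.0, 1.0 - best)


# Toy signals (not fMRI data): y is x delayed by one time step.
x = [0, 1, 0, -1, 0, 1, 0, -1]
y = [-1, 0, 1, 0, -1, 0, 1, 0]

d_euclid = euclidean_distance(x, y)
d_no_lag = lagged_cross_correlation_distance(x, y, max_lag=0)
d_lagged = lagged_cross_correlation_distance(x, y, max_lag=2)
```

In this toy case the Euclidean distance and the zero-lag correlation both treat the two signals as unrelated, while allowing a lag reveals that they are the same signal shifted in time.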

    Finding groups in data: Cluster analysis with ants

    We present in this paper a modification of Lumer and Faieta's algorithm for data clustering. This approach mimics the clustering behavior observed in real ant colonies. The algorithm automatically discovers clusters in numerical data without prior knowledge of the possible number of clusters. In this paper we focus on ant-based clustering algorithms, a particular kind of swarm intelligence system, and on how the final clustering is affected by the dissimilarity metric used during classification: the Euclidean, Cosine, and Gower measures. Clustering with swarm-based algorithms is emerging as an alternative to more conventional clustering methods such as k-means. Among the many bio-inspired techniques, ant clustering algorithms have received special attention, especially because they still require much investigation to improve performance, stability, and other key features that would make such algorithms mature tools for data mining. As a case study, this paper focuses on the behavior of clustering procedures in those new approaches. The proposed algorithm and its modifications are evaluated on a number of well-known benchmark datasets. Empirical results clearly show that ant-based clustering algorithms perform well when compared to other techniques.
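
The three dissimilarity measures named above can be sketched for purely numerical feature vectors as follows. This is an illustrative sketch only: the function names are hypothetical, the Gower measure is shown in its numeric-only form with per-feature ranges supplied by the caller, and nothing here reproduces the ant-based algorithm itself.

```python
import math


def euclidean_dissimilarity(a, b):
    """Straight-line distance between two feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def cosine_dissimilarity(a, b):
    """One minus the cosine of the angle between the vectors."""
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine dissimilarity is undefined for zero vectors")
    dot = sum(x * y for x, y in zip(a, b))
    return 1.0 - dot / (norm_a * norm_b)


def gower_dissimilarity(a, b, ranges):
    """Numeric-only Gower: mean of range-normalised absolute differences.

    ranges[i] is the spread (max - min) of feature i over the data set;
    features with zero spread contribute nothing to the sum.
    """
    total = 0.0
    for x, y, r in zip(a, b, ranges):
        if r > 0:
            total += abs(x - y) / r
    return total / len(ranges)


# Two made-up items with two numerical features each.
item_a = [0.0, 10.0]
item_b = [5.0, 10.0]
feature_ranges = [10.0, 20.0]  # assumed spreads of each feature

d_euclidean = euclidean_dissimilarity(item_a, item_b)
d_gower = gower_dissimilarity(item_a, item_b, feature_ranges)
```

Because Gower normalises each feature by its range, it is insensitive to feature scale, whereas the Euclidean measure is dominated by whichever feature has the largest numeric spread; differences like this are one reason the choice of measure can change the final clustering.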