459 research outputs found

    Parallelization of the operating algorithms of the random subspace classifier

    Get PDF
    Neural networks are a popular tool for solving many artificial intelligence tasks. At the same time, any hardware implementation of a neural network architecture becomes obsolete very quickly because of the rapid development of computing hardware. Most researchers therefore prefer software implementations of the algorithms, which makes parallelizing the operation of neural network systems a relevant problem. This article considers a neural random subspace classifier and proposes algorithms for parallelizing its basic operations.
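
    The article's own parallelization algorithms are not reproduced here. Purely as an illustration of the general idea, the following minimal sketch evaluates the members of a random-subspace ensemble in parallel; it assumes scikit-learn and joblib, neither of which the article prescribes, and it is not the article's algorithm.

        # Hypothetical sketch: parallel prediction over the members of a
        # random-subspace ensemble (not the article's actual algorithm).
        import numpy as np
        from joblib import Parallel, delayed
        from sklearn.datasets import load_iris
        from sklearn.tree import DecisionTreeClassifier

        rng = np.random.default_rng(0)
        X, y = load_iris(return_X_y=True)

        n_members, subspace_size = 10, 2
        # Each member sees a random subset of the feature columns.
        subspaces = [rng.choice(X.shape[1], subspace_size, replace=False)
                     for _ in range(n_members)]
        members = [DecisionTreeClassifier(random_state=i).fit(X[:, s], y)
                   for i, s in enumerate(subspaces)]

        def member_votes(clf, cols, X):
            """One member's class predictions on its own subspace."""
            return clf.predict(X[:, cols])

        # The members are independent, so their predictions can run in parallel.
        votes = Parallel(n_jobs=-1)(
            delayed(member_votes)(clf, cols, X)
            for clf, cols in zip(members, subspaces))
        votes = np.stack(votes)                 # shape: (n_members, n_samples)
        # Majority vote across members.
        majority = np.apply_along_axis(lambda v: np.bincount(v).argmax(), 0, votes)
        print("ensemble accuracy:", (majority == y).mean())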

    Impact of the learners diversity and combination method on the generation of heterogeneous classifier ensembles

    Get PDF
    Ensembles of classifiers are a proven approach in machine learning with a wide variety of research works. The main issue in ensembles of classifiers is not only the selection of the base classifiers, but also the combination of their outputs. According to the literature, much is to be gained from combining classifiers when those classifiers are accurate and diverse. However, how to relate accuracy and diversity in order to define the best possible ensemble of classifiers is still an open issue. In this paper, we propose a novel approach to evaluate the impact of the diversity of the learners on the generation of heterogeneous ensembles. We present an exhaustive study of this approach using 27 different multiclass datasets and analyse the results in detail. In addition, the presence of labelling noise is also considered when assessing the performance of the different approaches. This work has been supported under projects PEAVAUTO-CM-UC3M–2020/00036/001, PID2019-104793RB-C31, and RTI2018-096036-B-C22, and by the Region of Madrid's Excellence Program, Spain (EPUC3M17).
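
    The combination methods studied in the paper are not reproduced here. As a minimal illustration of output combination in a heterogeneous ensemble, the sketch below trains three different base learners and combines them by majority (hard) voting and by probability averaging (soft voting); scikit-learn and the chosen learners are assumptions, not the paper's setup.

        # Hypothetical sketch: combining heterogeneous base learners by majority
        # vote and by soft (probability-averaging) voting.
        from sklearn.datasets import make_classification
        from sklearn.ensemble import VotingClassifier
        from sklearn.linear_model import LogisticRegression
        from sklearn.model_selection import train_test_split
        from sklearn.naive_bayes import GaussianNB
        from sklearn.tree import DecisionTreeClassifier

        X, y = make_classification(n_samples=500, n_classes=3, n_informative=6,
                                   random_state=0)
        X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

        base = [("lr", LogisticRegression(max_iter=1000)),
                ("nb", GaussianNB()),
                ("dt", DecisionTreeClassifier(random_state=0))]

        for rule in ("hard", "soft"):   # hard = majority vote, soft = mean probability
            ens = VotingClassifier(estimators=base, voting=rule).fit(X_tr, y_tr)
            print(rule, "voting accuracy:", ens.score(X_te, y_te))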

    Data Mining with Supervised Instance Selection Improves Artificial Neural Network Classification Accuracy

    Get PDF
    Intrusion detection systems (IDSs) monitor intrusion logs, traffic control packets, and attacks, and networks generate large amounts of such data. Features of the IDS logs are used to decide whether a record or connection corresponds to an attack or to regular network activity, and reducing the feature set size aids machine-learning classification. This paper describes a standardised and systematic intrusion detection classification approach. Using dataset signatures, the Naive Bayes, Random Tree, and Neural Network classifiers are assessed, and the feature-reduction efficacy of PCA and the Fisher score is examined. The first round of testing uses the dataset without reducing the set of components, and the second applies principal component analysis; PCA boosts classification accuracy by 1.66 percent. Artificial immune systems, inspired by the human immune system, use learning, long-term memory, and association to recognise and classify patterns. The work also introduces the Artificial Neural Network (ANN) classifier model and its development issues; experiments on the Iris and Wine data from the UCI learning repository show that the ANN approach works, and the role of dimensionality reduction in ANN-based classifiers is determined. Mutual-information-based feature selection methods are described in detail, and simulations on the KDD Cup'99 data demonstrate the method's efficacy. Classifying big data is important for tackling many engineering, health, science, and business challenges: labelled data samples train a classifier model, which then assigns unlabelled samples to one of several categories. Fuzzy logic and artificial neural networks (ANNs) are used to classify data in this dissertation.
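
    As a rough illustration of the feature-reduction-plus-classification pipeline described above, the sketch below compares a Naive Bayes classifier on the raw features against the same classifier on PCA-reduced features; scikit-learn and the synthetic stand-in data are assumptions, not the paper's KDD Cup'99 setup.

        # Hypothetical sketch: Naive Bayes on raw features versus on
        # PCA-reduced features (synthetic stand-in for IDS records).
        from sklearn.datasets import make_classification
        from sklearn.decomposition import PCA
        from sklearn.model_selection import train_test_split
        from sklearn.naive_bayes import GaussianNB
        from sklearn.pipeline import make_pipeline
        from sklearn.preprocessing import StandardScaler

        X, y = make_classification(n_samples=2000, n_features=40,
                                   n_informative=10, random_state=0)
        X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

        raw = make_pipeline(StandardScaler(), GaussianNB()).fit(X_tr, y_tr)
        reduced = make_pipeline(StandardScaler(), PCA(n_components=10),
                                GaussianNB()).fit(X_tr, y_tr)

        print("raw features:", raw.score(X_te, y_te))
        print("PCA-reduced: ", reduced.score(X_te, y_te))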

    Investigating Randomised Sphere Covers in Supervised Learning

    Get PDF
    © This copy of the thesis has been supplied on condition that anyone who consults it is understood to recognise that its copyright rests with the author and that no quotation from the thesis, nor any information derived therefrom, may be published without the author's prior, written consent. In this thesis, we thoroughly investigate a simple Instance Based Learning (IBL) classifier known as Sphere Cover. We propose a simple Randomized Sphere Cover Classifier (αRSC) and use several datasets to evaluate its classification performance. In addition, we analyse the generalization error of the proposed classifier using bias/variance decomposition. A sphere cover classifier can be described in terms of the compression scheme, which attributes high generalization performance to data compression. We investigate the compression capacity of αRSC using a sample compression bound. The compression scheme prompted us to search for new compressibility methods for αRSC; to that end, we used a Gaussian kernel to investigate further data compression.
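
    The sketch below is a loose, hypothetical illustration of the sphere-cover idea (random sphere centres, radii bounded by the nearest instance of another class, nearest-sphere classification); it is not the thesis's αRSC procedure, and numpy and scikit-learn's Iris loader are assumptions.

        # Hypothetical sketch of a randomized sphere cover: pick random training
        # points as sphere centres, give each sphere the largest radius that
        # excludes points of other classes, and label queries by the nearest
        # sphere.  A loose illustration only, not the thesis's aRSC algorithm.
        import numpy as np
        from sklearn.datasets import load_iris

        rng = np.random.default_rng(0)
        X, y = load_iris(return_X_y=True)

        def build_spheres(X, y, n_spheres=30):
            centres = rng.choice(len(X), size=n_spheres, replace=False)
            spheres = []
            for i in centres:
                d = np.linalg.norm(X - X[i], axis=1)
                other = d[y != y[i]]
                radius = other.min() if len(other) else d.max()
                spheres.append((X[i], radius, y[i]))
            return spheres

        def predict(spheres, x):
            # Distance from x to each sphere's surface (negative if inside).
            gaps = [np.linalg.norm(x - c) - r for c, r, _ in spheres]
            return spheres[int(np.argmin(gaps))][2]

        spheres = build_spheres(X, y)
        pred = np.array([predict(spheres, x) for x in X])
        print("training accuracy of the sphere cover:", (pred == y).mean())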

    NEVIS'22: A Stream of 100 Tasks Sampled from 30 Years of Computer Vision Research

    Full text link
    We introduce the Never Ending VIsual-classification Stream (NEVIS'22), a benchmark consisting of a stream of over 100 visual classification tasks, sorted chronologically and extracted from papers sampled uniformly from computer vision proceedings spanning the last three decades. The resulting stream reflects what the research community thought was meaningful at any point in time. Despite being limited to classification, the resulting stream has a rich diversity of tasks, from OCR to texture analysis, crowd counting, scene recognition, and so forth. The diversity is also reflected in the wide range of dataset sizes, spanning over four orders of magnitude. Overall, NEVIS'22 poses an unprecedented challenge for current sequential learning approaches due to the scale and diversity of tasks, yet with a low entry barrier, as it is limited to a single modality and each task is a classical supervised learning problem. Moreover, we provide a reference implementation including strong baselines and a simple evaluation protocol to compare methods in terms of their trade-off between accuracy and compute. We hope that NEVIS'22 can be useful to researchers working on continual learning, meta-learning, AutoML and, more generally, sequential learning, and help these communities join forces towards more robust and efficient models that efficiently adapt to a never ending stream of data. Implementations have been made available at https://github.com/deepmind/dm_nevis.
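
    The benchmark's real loader and evaluation protocol live in the linked repository. Purely to illustrate the accuracy-versus-compute bookkeeping mentioned above, the sketch below iterates over a placeholder chronological task stream; make_task_stream and fit_model are hypothetical stand-ins, not part of the dm_nevis API.

        # Hypothetical sketch: evaluate a learner over a chronologically ordered
        # task stream while tracking accuracy and a crude compute budget.
        # make_task_stream and fit_model are placeholders, not the dm_nevis API.
        import time

        def make_task_stream():
            """Yield (name, X_train, X_test, y_train, y_test) in chronological order."""
            from sklearn.datasets import load_digits, load_wine
            from sklearn.model_selection import train_test_split
            for name, loader in [("wine-task", load_wine), ("digits-task", load_digits)]:
                X, y = loader(return_X_y=True)
                yield (name, *train_test_split(X, y, random_state=0))

        def fit_model(X_tr, y_tr):
            """Placeholder learner: a scaled logistic regression."""
            from sklearn.linear_model import LogisticRegression
            from sklearn.pipeline import make_pipeline
            from sklearn.preprocessing import StandardScaler
            return make_pipeline(StandardScaler(),
                                 LogisticRegression(max_iter=1000)).fit(X_tr, y_tr)

        total_seconds = 0.0
        for name, X_tr, X_te, y_tr, y_te in make_task_stream():
            start = time.perf_counter()
            model = fit_model(X_tr, y_tr)
            total_seconds += time.perf_counter() - start
            print(f"{name}: accuracy={model.score(X_te, y_te):.3f}, "
                  f"cumulative train time={total_seconds:.2f}s")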

    Continual Learning for Multi-Label Drifting Data Streams Using Homogeneous Ensemble of Self-Adjusting Nearest Neighbors

    Get PDF
    Multi-label data streams are sequences of multi-label instances arriving over time to a multi-label classifier. The properties of the data stream may continuously change due to concept drift, so algorithms must adapt constantly to the new data distributions. In this paper we propose a novel ensemble method for multi-label drifting streams named Homogeneous Ensemble of Self-Adjusting Nearest Neighbors (HESAkNN). It leverages a self-adjusting kNN as a base classifier, combined with the advantages of ensembles, to adapt to concept drift in the multi-label environment. To promote diverse knowledge within the ensemble, each base classifier is given a unique subset of features and samples to train on. These samples are distributed to classifiers in a probabilistic manner that follows a Poisson distribution, as in online bagging. Accompanying these mechanisms, a collection of ADWIN detectors monitors each classifier for the occurrence of a concept drift. Upon detection, the algorithm automatically trains additional classifiers in the background to attempt to capture new concepts. After a pre-determined number of instances, both active and background classifiers are compared and only the most accurate classifiers are selected to populate the new active ensemble. The experimental study compares the proposed approach with 30 other classifiers, including problem transformation, algorithm adaptation, kNNs, and ensembles, on 30 diverse multi-label datasets and 11 performance metrics. Results validated using non-parametric statistical analysis support the better performance of the proposed ensemble and highlight the contribution of feature and instance diversity to improving the performance of the ensemble.
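
    The full HESAkNN procedure is not reproduced here. The sketch below illustrates only the Poisson-weighted online-bagging step that distributes incoming instances to ensemble members, with a crude windowed error check standing in for an ADWIN detector; all names, thresholds, and the simulated error stream are illustrative assumptions.

        # Hypothetical sketch: Poisson(1)-weighted online bagging, the mechanism
        # the abstract describes for distributing incoming instances to ensemble
        # members.  The windowed error check below is a crude stand-in for ADWIN.
        import numpy as np

        rng = np.random.default_rng(0)
        n_members = 5
        # Each member keeps its own small instance buffer (a toy kNN store).
        buffers = [[] for _ in range(n_members)]

        def receive_instance(x, y):
            for m in range(n_members):
                k = rng.poisson(1.0)      # how many copies this member trains on
                buffers[m].extend([(x, y)] * k)

        def drift_suspected(recent_errors, window=50, threshold=0.6):
            """Toy check: flag drift if the recent error rate is high (not ADWIN)."""
            recent = recent_errors[-window:]
            return len(recent) == window and np.mean(recent) > threshold

        # Feed a small synthetic stream whose concept changes at t = 100.
        errors = []
        for t in range(200):
            x = rng.normal(size=3)
            y = int(x.sum() > 0) if t < 100 else int(x.sum() < 0)
            receive_instance(x, y)
            # Simulated per-instance mistakes: error rate jumps after the change.
            errors.append((rng.random() < 0.3) if t < 100 else (rng.random() < 0.7))
            if drift_suspected(errors):
                print(f"drift suspected at t={t}; background classifiers would be trained")
                errors.clear()

        print("instances per member (Poisson diversity):", [len(b) for b in buffers])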

    Practical aspects of applying the random subspace classifier

    Get PDF
    Knowledge discovery from data, a topical modern research direction, largely amounts to applying classification and pattern recognition methods. The traditional formulation of the classification task assumes that the data are represented as a set of real-valued vectors. At the same time, this formulation is not adequate for many practical problems. This article considers the application of the random subspace classifier to problems with missing data and categorical attributes, and proposes clustering and distributed data analysis algorithms.
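
    As one possible illustration of applying random feature subspaces to data with missing values and categorical attributes, the sketch below gives each ensemble member its own random subset of columns with local imputation and one-hot encoding; scikit-learn, pandas, and these preprocessing choices are assumptions, not the article's method.

        # Hypothetical sketch: random feature subspaces over a dataset with
        # missing numeric values and categorical attributes.  Each member works
        # on its own random subset of columns, imputing and encoding locally.
        import numpy as np
        import pandas as pd
        from sklearn.compose import make_column_transformer
        from sklearn.impute import SimpleImputer
        from sklearn.pipeline import make_pipeline
        from sklearn.preprocessing import OneHotEncoder
        from sklearn.tree import DecisionTreeClassifier

        rng = np.random.default_rng(0)
        n = 300
        df = pd.DataFrame({
            "age":    rng.normal(40, 10, n),
            "income": rng.normal(50, 15, n),
            "city":   rng.choice(["riga", "oslo", "kyiv"], n),
            "plan":   rng.choice(["basic", "pro"], n),
        })
        df.loc[rng.choice(n, 40, replace=False), "income"] = np.nan  # missing values
        y = ((df["age"] + df["income"].fillna(50)) > 95).astype(int)

        def make_member(columns):
            numeric = [c for c in columns if df[c].dtype != object]
            categorical = [c for c in columns if df[c].dtype == object]
            steps = []
            if numeric:
                steps.append((SimpleImputer(strategy="median"), numeric))
            if categorical:
                steps.append((OneHotEncoder(handle_unknown="ignore"), categorical))
            pre = make_column_transformer(*steps)
            return columns, make_pipeline(pre, DecisionTreeClassifier(random_state=0))

        members = []
        for _ in range(7):
            idx = rng.choice(df.columns.size, 2, replace=False)
            members.append(make_member([df.columns[i] for i in idx]))

        preds = []
        for cols, model in members:
            model.fit(df[cols], y)
            preds.append(model.predict(df[cols]))
        majority = (np.mean(preds, axis=0) > 0.5).astype(int)
        print("ensemble training accuracy:", (majority == y).mean())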

    Automated Detection of COVID-19 using Chest X-Ray Images and CT Scans through Multilayer-Spatial Convolutional Neural Networks

    Get PDF
    The novel coronavirus 2019 (Covid-19), a contagious disease, became a pandemic and has had overwhelming effects on human lives and the world economy. Detecting the disease is vital to avert further spread and to treat infected people promptly. The need for automated, scientifically grounded assistive diagnostic methods to identify Covid-19 in infected people has increased, since only less accurate automated diagnostic methods are available. Recent studies based on radiology imaging suggest that the imaging patterns on X-ray images and Computed Tomography (CT) scans contain leading information about Covid-19 and are considered a potential basis for automated diagnosis. Machine learning and deep learning techniques combined with radiology imaging can help detect the disease accurately. This paper proposes a deep learning approach based on a Multilayer-Spatial Convolutional Neural Network for automatic detection of Covid-19 using chest X-ray images and CT scans. The proposed model, named the Multilayer Spatial Covid Convolutional Neural Network (MSCovCNN), provides automated, accurate diagnostics for Covid-19 detection. It achieved 93.63% detection accuracy and 97.88% AUC (Area Under Curve) on chest X-ray images, and 91.44% detection accuracy and 95.92% AUC on chest CT scans. We use a 5-tiered 2D-CNN framework followed by an Artificial Neural Network (ANN) and a softmax classifier; in the CNN, each convolution layer is followed by an activation function and a max-pooling layer. The proposed model can be used to assist radiologists in detecting Covid-19 and confirming their initial screening.
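
    As a rough illustration of the architecture described above (five convolution blocks, each convolution followed by an activation and a max-pooling layer, then a dense network with a softmax output), here is a minimal Keras sketch; the filter counts, kernel sizes, input shape, and number of classes are assumptions, not the published MSCovCNN configuration.

        # Hypothetical sketch: a 5-block 2D CNN (conv -> ReLU -> max-pool)
        # followed by a dense layer and a softmax output, loosely mirroring the
        # description above.  All layer sizes are illustrative only.
        from tensorflow import keras
        from tensorflow.keras import layers

        inputs = keras.Input(shape=(224, 224, 1))       # grayscale X-ray / CT slice
        x = inputs
        for filters in (16, 32, 64, 128, 128):          # five convolution blocks
            x = layers.Conv2D(filters, 3, padding="same", activation="relu")(x)
            x = layers.MaxPooling2D(pool_size=2)(x)
        x = layers.Flatten()(x)
        x = layers.Dense(128, activation="relu")(x)     # the dense "ANN" stage
        outputs = layers.Dense(2, activation="softmax")(x)  # Covid-19 vs. normal
        model = keras.Model(inputs, outputs)
        model.compile(optimizer="adam", loss="sparse_categorical_crossentropy",
                      metrics=["accuracy"])
        model.summary()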
