31 research outputs found

    Attribute Selection for Classification

    The selection of attributes used to construct a classification model is crucial in machine learning, particularly for instance-similarity methods. We present a new algorithm that selects and ranks attributes by weighting features according to their ability to help class prediction. The algorithm uses the same structure that holds the training records for classification. Attribute values and their classes are projected into a one-dimensional space to account for various degrees of the relationship between them. With the user deciding on the degree of this relationship, any of several potential solutions can be used as the criterion for determining attribute relevance. This low-complexity algorithm increases predictive accuracy and also helps to reduce the feature-dimension problem.

    A characteristic for selecting models in an ensemble of classifiers

    Research on the practical application of ensembles, and approaches to it, is analyzed, and the characteristic factors that influence a combination of models are identified. These factors are decisive and inherent in combinations of models. The need to use model features that are characteristic only of ensembles is substantiated. The characteristic features of ensembles are established, and the need to develop ensemble-specific characteristics of models, in terms of combining solutions, is identified. These features, namely precision and distinction, have a decisive influence on the choice and applicability of solutions in ensembles and make it possible to choose the most effective combination. We propose to characterize a solution in the ensemble by its distinction of a certain rank with respect to the precision parameter. This characteristic makes it possible to select models and describes a model within the ensemble. It is applicable only when models are combined; it indicates how one model differs from another and takes the model's precision into account. It applies to models of different kinds, makes it possible to determine the depth of the distinction between models, and can be combined with other well-known characteristics of classifiers. Its particular feature is that it makes it possible to evaluate the use of solutions in the ensemble and to select among them.

    Guided progressive sampling to stabilize the learning curve

    One of the challenges of machine learning is to cope with ever larger volumes of data. Although it is generally accepted that the larger a training set is, the better the results, there are limits to the amount of information a learning algorithm can handle. To address this problem, we propose to improve the progressive sampling method by guiding the construction of a reduced training set from a large data set. Learning from the reduced set should yield performance similar to learning from the full set. The guidance of the sampling relies on prior knowledge that speeds up the convergence of the algorithm. This approach has three advantages: 1) the reduced training set consists of the most representative cases of the full set; 2) the learning curve is stabilized; 3) convergence detection is accelerated. Applying this method to classical data and to data from intensive care units shows that a training set can be reduced significantly without degrading learning performance.
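
The basic progressive sampling loop that this abstract builds on can be sketched as follows. This is a minimal illustration of plain (unguided) progressive sampling with a geometric schedule, not the guided variant the paper proposes, whose selection rule the abstract does not specify. The callback `evaluate`, the schedule parameters and the tolerance `tol` are all assumptions made for the example.

```python
from typing import Callable, List, Tuple


def progressive_sample(
    n_total: int,
    evaluate: Callable[[int], float],
    n_start: int = 100,
    growth: float = 2.0,
    tol: float = 0.005,
) -> Tuple[int, List[Tuple[int, float]]]:
    """Grow the sample size geometrically until the learning curve flattens.

    `evaluate(n)` is a hypothetical callback that trains on n examples and
    returns a held-out score. Returns the chosen size and the observed curve.
    """
    curve: List[Tuple[int, float]] = []
    n = min(n_start, n_total)
    while True:
        score = evaluate(n)
        curve.append((n, score))
        # Converged: the last step improved the score by less than tol,
        # so the previous (smaller) sample size is considered sufficient.
        if len(curve) >= 2 and score - curve[-2][1] < tol:
            return curve[-2][0], curve
        # The full data set has been used; nothing larger to try.
        if n >= n_total:
            return n, curve
        n = min(int(n * growth), n_total)


# Toy learning curve (an assumption, not data from the paper):
# the score saturates towards 0.9 as n grows.
def toy_curve(n: int) -> float:
    return 0.9 * n / (n + 200)


size, curve = progressive_sample(100_000, toy_curve)
print(size, len(curve))
```

With the toy curve, the loop stops once doubling the sample no longer improves the score by at least `tol`. A guided variant would replace the implicit random draw behind `evaluate` with a selection of representative cases.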

    A Hybrid Machine Learning Framework for Predicting Students’ Performance in Virtual Learning Environment

    Virtual Learning Environments (VLE), such as Moodle and Blackboard, store vast data to help identify students' performance and engagement. As a result, researchers have been focusing their efforts on assisting educational institutions in providing machine learning models to predict at-risk students and improve their performance. However, it requires an efficient approach to construct a model that can ultimately provide accurate predictions. Consequently, this study proposes a hybrid machine learning framework to predict students' performance using eight classification algorithms and three ensemble methods (Bagging, Boosting, Voting) to determine the best-performing predictive model. In addition, this study used filter-based and wrapper-based feature selection techniques to select the best features of the dataset related to students' performance. The obtained results reveal that the ensemble methods recorded higher predictive accuracy when compared to single classifiers. Furthermore, the accuracy of the models improved due to the feature selection techniques utilized in this study.
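
Of the three ensemble methods named above, voting is the simplest to illustrate. The sketch below shows hard (majority) voting over the label predictions of several classifiers. The function name `majority_vote` and the example predictions are made up for illustration and are not taken from the study.

```python
from collections import Counter
from typing import List, Sequence


def majority_vote(predictions: Sequence[Sequence[str]]) -> List[str]:
    """Combine per-classifier label predictions by hard majority voting.

    `predictions[k][i]` is the label that classifier k assigns to sample i.
    With an odd number of classifiers and two labels, ties cannot occur.
    """
    return [Counter(votes).most_common(1)[0][0] for votes in zip(*predictions)]


# Hypothetical outputs of three classifiers on four students.
preds = [
    ["pass", "fail", "pass", "fail"],
    ["pass", "pass", "fail", "fail"],
    ["fail", "pass", "pass", "fail"],
]
print(majority_vote(preds))
```

Each classifier is wrong on a different student here, so the vote recovers a label that at least two of the three agree on for every sample.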

    Development of an Algorithm for Multicriteria Optimization of Deep Learning Neural Networks

    Nowadays, machine learning methods are actively used to process big data. A promising direction is neural networks, in which structure optimization occurs on the principles of self-configuration. Genetic algorithms are applied to solve this nontrivial problem. Most multicriteria evolutionary algorithms use a procedure known as non-dominated sorting to rank solutions. However, the efficiency of procedures for adding points and updating rank values in non-dominated sorting (incremental non-dominated sorting) remains low. In this regard, this research improves the performance of these algorithms, including under the condition of asynchronous calculation of the fitness of individuals. The relevance of the research is determined by the fact that although many scholars and specialists have studied the self-tuning of neural networks, they have not yet proposed a comprehensive solution to this problem. In particular, algorithms for efficient non-dominated sorting under conditions of incremental and asynchronous updates when using evolutionary methods of multicriteria optimization have not been fully developed to date. To achieve this goal, a hybrid co-evolutionary algorithm was developed that significantly outperforms all algorithms included in it, including error backpropagation and genetic algorithms that operate separately. The novelty of the obtained results lies in the fact that the developed algorithms have minimal asymptotic complexity. The practical value of the developed algorithms is associated with the fact that they make it possible to solve applied problems of increased complexity in a practically acceptable time. Doi: 10.28991/HIJ-2023-04-01-011
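
For readers unfamiliar with the ranking procedure mentioned above, here is a minimal sketch of textbook non-dominated sorting. It is the naive batch version, assuming every objective is minimized, and not the efficient incremental algorithm the paper develops. The function names and the sample points are assumptions for illustration.

```python
from typing import List, Sequence


def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """True if a is no worse than b in every objective and strictly better
    in at least one (all objectives are minimized)."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def non_dominated_sort(points: List[Sequence[float]]) -> List[List[int]]:
    """Split point indices into fronts; front 0 is the Pareto-optimal set,
    front 1 is optimal once front 0 is removed, and so on."""
    remaining = set(range(len(points)))
    fronts: List[List[int]] = []
    while remaining:
        # A point belongs to the current front if no remaining point dominates it.
        front = sorted(
            i
            for i in remaining
            if not any(dominates(points[j], points[i]) for j in remaining if j != i)
        )
        fronts.append(front)
        remaining -= set(front)
    return fronts


# Five hypothetical individuals scored on two objectives.
points = [(1.0, 5.0), (2.0, 3.0), (4.0, 1.0), (3.0, 4.0), (5.0, 5.0)]
print(non_dominated_sort(points))
```

This version re-scans all remaining points for every front, which is exactly the kind of cost that incremental variants aim to avoid when individuals arrive one at a time.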

    Maximum Likelihood Topology Preserving Ensembles

    Statistical re-sampling techniques have been used extensively and successfully in machine learning approaches for generating classifier and predictor ensembles. It has frequently been shown that combining so-called unstable predictors has a stabilizing effect on, and improves the performance of, the prediction system generated in this way. In this paper we use re-sampling techniques in the context of a topology preserving map which can be used for scale-invariant classification, taking into account the fact that it models the residual after feedback with a family of distributions and finds filters which make the residuals most likely under this model. This model is applied to artificial data sets and compared with a similar version based on the Self Organising Map (SOM).

    Supporting System for Detecting Pathologies

    CGH arrays make it possible to test patients for mutations in chromosomal regions. Detecting these mutations makes it possible to carry out diagnoses and to complete sequencing studies in relevant regions of the DNA. The analysis of CGH arrays requires mechanisms that facilitate data processing by specialized personnel, since traditionally a segmentation step is needed and, starting from the segmented data, the information is analyzed visually to select the relevant segments. In this study a CBR system is presented as a supporting system for extracting relevant information from CGH arrays, which facilitates the analysis process and its interpretation.

    A recurrent neural network architecture for biomedical event trigger classification

    A “biomedical event” is a broad term used to describe the roles and interactions between entities (such as proteins, genes and cells) in a biological system. The task of biomedical event extraction aims at identifying and extracting these events from unstructured texts. An important component in the early stage of the task is biomedical trigger classification, which involves identifying and classifying words/phrases that indicate an event. In this thesis, we present our work on biomedical trigger classification developed using the multi-level event extraction dataset. We restrict the scope of our classification to 19 biomedical event types grouped under four broad categories: Anatomical, Molecular, General and Planned. While most of the existing approaches are based on traditional machine learning algorithms which require extensive feature engineering, our model relies on neural networks to implicitly learn important features directly from the text. We use natural language processing techniques to transform the text into vectorized inputs that can be used in a neural network architecture. To the best of our knowledge, this is the first time neural attention strategies have been explored in the area of biomedical trigger classification. Our best results were obtained from an ensemble of 50 models which produced a micro F-score of 79.82%, an improvement of 1.3% over the previous best score.
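
The micro F-score reported above pools true positives, false positives and false negatives over all classes before computing precision and recall. A minimal sketch of that calculation follows. The per-category counts are invented for illustration and are not results from the thesis.

```python
from typing import Dict, Tuple


def micro_f1(counts: Dict[str, Tuple[int, int, int]]) -> float:
    """Micro-averaged F1 from per-class (true positive, false positive,
    false negative) counts: pool the counts first, then compute P, R and F1."""
    tp = sum(c[0] for c in counts.values())
    fp = sum(c[1] for c in counts.values())
    fn = sum(c[2] for c in counts.values())
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Hypothetical (tp, fp, fn) counts per broad category.
counts = {
    "Anatomical": (40, 10, 5),
    "Molecular": (30, 5, 10),
    "General": (20, 5, 5),
    "Planned": (10, 0, 0),
}
print(round(micro_f1(counts), 4))
```

Because the counts are pooled, frequent classes weigh more heavily in the micro score than rare ones, unlike a macro average of per-class F1 values.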