An ensemble approach of dual base learners for multi-class classification problems
In this work, we formalise and evaluate an ensemble of classifiers designed for multi-class problems. To achieve a good accuracy rate, the base learners are built with pairwise coupled binary and multi-class classifiers. Moreover, to reduce the computational cost of the ensemble and to improve its performance, these classifiers are trained using a specific attribute subset. This proposal offers the opportunity to capture the advantages provided by binary decomposition methods, by attribute partitioning methods, and by the cooperative characteristics associated with a combination of redundant base learners. To analyse the quality of this architecture, its performance has been tested on different domains, and the results have been compared to other well-known classification methods. This experimental evaluation indicates that our model is, in most cases, as accurate as these methods, but much more efficient. (C) 2014 Elsevier B.V. All rights reserved. This research was supported by the Spanish MICINN under Projects TRA2010-20225-C03-01, TRA 2011-29454-C03-02, and TRA 2011-29454-C03-03.
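The pairwise-coupling idea can be illustrated with a minimal sketch, assuming a one-vs-one decomposition in which each binary learner is trained on its own randomly drawn attribute subset and predictions are combined by voting; the nearest-centroid base learner and all names here are illustrative stand-ins, not the paper's implementation:

```python
# Illustrative one-vs-one ensemble: one binary learner per class pair,
# each trained on its own attribute (feature) subset, combined by voting.
# Nearest-centroid stands in for the base learner.
from itertools import combinations
import random

def centroid(rows):
    n = len(rows)
    return [sum(r[i] for r in rows) / n for i in range(len(rows[0]))]

def train_pairwise_ensemble(X, y, subset_size, seed=0):
    rng = random.Random(seed)
    classes = sorted(set(y))
    ensemble = []
    for a, b in combinations(classes, 2):
        feats = rng.sample(range(len(X[0])), subset_size)  # attribute subset
        proj = lambda row, f=feats: [row[i] for i in f]
        ca = centroid([proj(x) for x, lab in zip(X, y) if lab == a])
        cb = centroid([proj(x) for x, lab in zip(X, y) if lab == b])
        ensemble.append((a, b, feats, ca, cb))
    return ensemble

def predict(ensemble, x):
    votes = {}
    for a, b, feats, ca, cb in ensemble:
        p = [x[i] for i in feats]
        da = sum((u - v) ** 2 for u, v in zip(p, ca))
        db = sum((u - v) ** 2 for u, v in zip(p, cb))
        winner = a if da <= db else b  # pairwise vote
        votes[winner] = votes.get(winner, 0) + 1
    return max(votes, key=votes.get)
```

Because each pairwise learner sees only a feature subset, training cost drops while the redundant votes compensate for any single learner's weakness, which is the cooperative effect the abstract describes.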
Accelerating the Design of Automotive Catalyst Products Using Machine Learning: Leveraging experimental data to guide new formulations
The design of catalyst products to reduce harmful emissions is currently an intensive process of expert-driven discovery, taking several years to develop a product. Machine learning can accelerate this timescale, leveraging historic experimental data from related products to guide which
new formulations and experiments will enable a project to most directly reach its targets. We used machine learning to accurately model 16 key performance targets for catalyst products, enabling detailed understanding of the factors governing catalyst performance and realistic suggestions
of future experiments to rapidly develop more effective products. The proposed formulations are currently undergoing experimental validation.
Efficient Network Domination for Life Science Applications
With the ever-increasing size of data available to researchers, traditional methods of analysis often cannot scale to match the problems being studied. Often only a subset of variables may be utilized or studied further, motivating the need for techniques that can prioritize variable selection. This dissertation describes the development and application of graph theoretic techniques, particularly the notion of domination, for this purpose. In the first part of this dissertation, algorithms for vertex prioritization in the field of network controllability are studied. Here, the number of solutions to which a vertex belongs is used to classify said vertex and determine its suitability in controlling a network. Novel, efficient, scalable algorithms are developed and analyzed. Empirical tests demonstrate the improvement of these algorithms over those already established in the literature. The second part of this dissertation concerns the prioritization of genes for loss-of-function allele studies in mice. The International Mouse Phenotyping Consortium leads the initiative to develop a loss-of-function allele for each protein-coding gene in the mouse genome. Only a small proportion of untested genes can be selected for further study. To address the need to prioritize genes, a generalizable data science strategy is developed. This strategy models genes as a gene-similarity graph and from it selects a subset that will be further characterized. Empirical tests demonstrate the method's utility over that of pseudorandom selection and less computationally demanding methods. Finally, part three addresses the important task of preprocessing in the context of noisy public health data. Many public health databases have been developed to collect, curate, and store a variety of environmental measurements. Idiosyncrasies in these measurements, however, introduce noise to the data found in these databases in several ways, including missing, incorrect, outlying, and incompatible data.
Beyond noisy data, multiple measurements of similar variables can introduce problems of multicollinearity. Domination is again employed in a novel graph method to handle this autocorrelation. Empirical results using the Public Health Exposome dataset are reported. Together, these three parts demonstrate the utility of subset selection via domination when applied to a multitude of data sources from a variety of disciplines in the life sciences.
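As a rough illustration of domination-based selection (not the dissertation's exact algorithms), a standard greedy heuristic for finding a small dominating set on an adjacency-list graph looks like this:

```python
# Greedy dominating-set heuristic: repeatedly pick the vertex that covers
# (dominates) the most still-undominated vertices. A vertex dominates
# itself and all of its neighbours. `adj` maps vertex -> set of neighbours.
def greedy_dominating_set(adj):
    undominated = set(adj)
    dom = set()
    while undominated:
        # pick the vertex covering the most still-undominated vertices
        best = max(adj, key=lambda v: len(({v} | adj[v]) & undominated))
        dom.add(best)
        undominated -= {best} | adj[best]
    return dom
```

In a variable-selection setting, vertices would be variables and edges similarity relations, so the dominating set is a small subset whose members are "close to" every other variable.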
Comparison of Feature Selection Methods for Body Fat Percentage Prediction
Obesity, one of the most common health problems of our age, negatively affects quality of life and causes many disorders. Body fat percentage is the most important indicator for diagnosing obesity. Determining body fat percentage quickly, easily, inexpensively, and with high accuracy is at least as important as diagnosing obesity itself. Body fat percentage, which can be calculated from anthropometric data, can be estimated reliably with machine learning algorithms. However, high-dimensional, irrelevant, and redundant data degrade the accuracy of machine learning algorithms and increase model training time. Feature selection algorithms allow machine learning models to achieve higher accuracy with fewer features. In this study, seven different feature selection algorithms were compared for body fat percentage prediction, achieving higher accuracy with fewer features. Four machine learning methods were used to examine the effect of the feature selection methods on different models, and the training times of these algorithms were compared. The experimental results show that, by using feature selection methods, more accurate predictions can be obtained with fewer features and shorter model training time.
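A hedged sketch of one filter-style criterion of the kind such studies compare, ranking features by absolute Pearson correlation with the target; the seven methods actually evaluated are not reproduced here, and the data below are illustrative:

```python
# Filter-style feature selection: score each feature by the absolute value
# of its Pearson correlation with the target and keep the top-k indices.
def pearson(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(xs, ys))
    sx = sum((a - mx) ** 2 for a in xs) ** 0.5
    sy = sum((b - my) ** 2 for b in ys) ** 0.5
    return cov / (sx * sy) if sx and sy else 0.0

def rank_features(X, y, k):
    scores = [(abs(pearson([row[j] for row in X], y)), j)
              for j in range(len(X[0]))]
    return [j for _, j in sorted(scores, reverse=True)[:k]]
```

With anthropometric columns as features and body fat percentage as `y`, the returned indices would be the reduced feature set fed to the downstream regressor.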
Pathophysiological characterization of traumatic brain injury using novel analytical methods
Severity of traumatic brain injury is usually classified by the Glasgow coma scale (GCS) as "mild",
"moderate", or "severe", which does not capture the heterogeneity of the disease. According to
current guidelines, intracranial pressure (ICP) should not exceed 22 mmHg, with no further
recommendations concerning individualization or tolerable duration of intracranial
hypertension. The aims of this thesis were to identify subgroups of patients beyond
characterization using GCS, and to investigate the impact of duration and magnitude of
intracranial hypertension on outcome, using data from the observational prospective study
Collaborative European NeuroTrauma Effectiveness Research in TBI (CENTER-TBI).
To investigate the temporal aspect of tolerable ICP elevations, we examined the correlation
between dose of ICP and outcome represented by 6-month Glasgow outcome scale extended
(GOSE). ICP dose was represented both by the number of events above thresholds for ICP
magnitude and duration and by area under the ICP curve (i.e., “pressure time dose” (PTD)). A
variation in tolerable ICP thresholds of 18 ± 4 mmHg (2 standard deviations, SD) for
events with duration longer than five minutes was identified using a bootstrapping technique.
PTD was correlated to both mortality and unfavorable outcome.
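The pressure-time dose can be sketched as the area of the ICP curve above a threshold, here via the trapezoidal rule over evenly spaced samples; the 19 mmHg threshold and one-minute sampling interval are illustrative assumptions, not the thesis's exact computation:

```python
# "Pressure time dose" (PTD) sketch: area (mmHg * min) of the ICP signal
# above a threshold, by the trapezoidal rule over evenly spaced samples.
def pressure_time_dose(icp_mmhg, threshold=19.0, dt_min=1.0):
    excess = [max(0.0, p - threshold) for p in icp_mmhg]
    return sum((a + b) / 2 * dt_min for a, b in zip(excess, excess[1:]))
```

Counting the number of threshold-crossing events of a given magnitude and duration, the other dose representation mentioned above, would be a separate pass over the same excess series.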
A cerebrovascular autoregulation (CA) dependent ICP tolerability was identified. If CA was
impaired, no tolerable ICP magnitude and duration thresholds were identified, while if CA was
intact, both 19 mmHg for 5 minutes or longer and 15 mmHg for 50 minutes or longer were
correlated to worse outcome. While no significant difference in PTD was seen between
favorable and unfavorable outcome if CA was intact, there was a significant difference if CA
was impaired. In a multivariable analysis, PTD did not remain a significant predictor of
outcome when adjusting for other known predictors in TBI. In a causal inference analysis, both
cerebrovascular autoregulation status and ICP-lowering therapies represented by the therapy
intensity level (TIL) have a directional relationship with outcome. However, no direct causal
relationship of ICP towards outcome was found.
By applying an unsupervised clustering method, we identified six distinct admission clusters
defined by GCS, lactate, oxygen saturation (SpO2), creatinine, glucose, base excess, pH,
PaCO2, and body temperature. These clusters can be summarized in clinical presentation and
metabolic profile. When clustering longitudinal features during the first week in the intensive
care unit (ICU), no optimal number of clusters could be seen. However, glucose variation, a
panel of brain biomarkers, and creatinine consistently described trajectories. Although no
information on outcome was included in the models, both admission clusters and trajectories
showed clear outcome differences, with mortality ranging from 7% to 40% across the admission
clusters and from 4% to 85% across the trajectories. Adding cluster or trajectory labels to the
established IMPACT outcome prediction model significantly improved outcome predictions.
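The unsupervised clustering step can be illustrated with a plain k-means over admission variables; the algorithm, data, and choice of k below are illustrative, not those actually used in the thesis:

```python
# Plain k-means: alternate between assigning points to the nearest center
# and recomputing each center as the mean of its assigned points.
import random

def kmeans(points, k, iters=50, seed=0):
    rng = random.Random(seed)
    centers = [list(p) for p in rng.sample(points, k)]
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for p in points:
            i = min(range(k), key=lambda c: sum(
                (a - b) ** 2 for a, b in zip(p, centers[c])))
            groups[i].append(p)
        # empty groups keep their previous center
        centers = [[sum(col) / len(g) for col in zip(*g)] if g else centers[i]
                   for i, g in enumerate(groups)]
    return centers, groups
```

In the admission setting, each point would be a patient's standardised vector of GCS, lactate, SpO2, creatinine, glucose, base excess, pH, PaCO2, and body temperature, and the resulting groups are the admission clusters.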
The results in this thesis support the importance of cerebrovascular autoregulation status as it
was found that CA status was more informative towards outcome than ICP magnitude and
duration. There was a variation in tolerable ICP intensity and duration dependent on whether
CA was intact. Distinct clusters defined by GCS and metabolic profiles related to outcome
suggest the importance of an extracranial evaluation in addition to GCS in TBI patients.
Longitudinal trajectories of TBI patients in the ICU are highly characterized by glucose
variation, brain biomarkers, and creatinine.
Is mutual information adequate for feature selection in regression?
Feature selection is an important preprocessing step for many high-dimensional regression problems. One of the most common strategies is to select a relevant feature subset based on the mutual information criterion. However, no connection has yet been established in the machine learning literature between the use of mutual information and a regression error criterion. This is an important gap, since minimising such a criterion is ultimately the objective one is interested in. This paper demonstrates that, under some reasonable assumptions, features selected with the mutual information criterion are the ones minimising the mean squared error and the mean absolute error. Conversely, it is also shown that the mutual information criterion can fail to select optimal features in some situations, which we characterise. The theoretical developments presented in this work are expected to lead in practice to a critical and efficient use of mutual information for feature selection.
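The selection rule the paper analyses can be sketched with a discrete plug-in estimate of I(X_j; Y) from co-occurrence counts, assuming features and target have been pre-binned; this illustrates the criterion only, not the paper's theoretical results:

```python
# Mutual-information filter: estimate I(X_j; Y) for each (discretised)
# feature from empirical counts and keep the k highest-scoring features.
from collections import Counter
from math import log

def mutual_information(xs, ys):
    n = len(xs)
    pxy = Counter(zip(xs, ys))
    px, py = Counter(xs), Counter(ys)
    return sum(c / n * log((c / n) / ((px[a] / n) * (py[b] / n)))
               for (a, b), c in pxy.items())

def select_by_mi(X, y, k):
    scores = [(mutual_information([row[j] for row in X], y), j)
              for j in range(len(X[0]))]
    return [j for _, j in sorted(scores, reverse=True)[:k]]
```

A feature identical to the target scores I = log 2 ≈ 0.69 nats on balanced binary data, while a constant feature scores 0, so the ranking behaves as the criterion intends; the paper's point is that this ranking coincides with minimising MSE/MAE only under certain assumptions.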