11 research outputs found

    An ensemble approach of dual base learners for multi-class classification problems

    In this work, we formalise and evaluate an ensemble of classifiers designed to solve multi-class problems. To achieve a good accuracy rate, the base learners are built with pairwise coupled binary and multi-class classifiers. Moreover, to reduce the computational cost of the ensemble and to improve its performance, these classifiers are trained using a specific attribute subset. This proposal captures the advantages provided by binary decomposition methods, by attribute partitioning methods, and by the cooperative characteristics associated with a combination of redundant base learners. To analyse the quality of this architecture, its performance has been tested on different domains, and the results have been compared to other well-known classification methods. This experimental evaluation indicates that our model is, in most cases, as accurate as these methods, but it is much more efficient. (C) 2014 Elsevier B.V. All rights reserved. This research was supported by the Spanish MICINN under Projects TRA2010-20225-C03-01, TRA 2011-29454-C03-02, and TRA 2011-29454-C03-03.

    Efficient Network Domination for Life Science Applications

    With the ever-increasing size of data available to researchers, traditional methods of analysis often cannot scale to match the problems being studied. Often only a subset of variables may be utilized or studied further, motivating the need for techniques that can prioritize variable selection. This dissertation describes the development and application of graph theoretic techniques, particularly the notion of domination, for this purpose. In the first part of this dissertation, algorithms for vertex prioritization in the field of network controllability are studied. Here, the number of solutions to which a vertex belongs is used to classify said vertex and determine its suitability in controlling a network. Novel efficient scalable algorithms are developed and analyzed. Empirical tests demonstrate the improvement of these algorithms over those already established in the literature. The second part of this dissertation concerns the prioritization of genes for loss-of-function allele studies in mice. The International Mouse Phenotyping Consortium leads the initiative to develop a loss-of-function allele for each protein coding gene in the mouse genome. Only a small proportion of untested genes can be selected for further study. To address the need to prioritize genes, a generalizable data science strategy is developed. This strategy models genes as a gene-similarity graph, and from it selects a subset that will be further characterized. Empirical tests demonstrate the method’s utility over that of pseudorandom selection and less computationally demanding methods. Finally, part three addresses the important task of preprocessing in the context of noisy public health data. Many public health databases have been developed to collect, curate, and store a variety of environmental measurements. Idiosyncrasies in these measurements, however, introduce noise to data found in these databases in several ways, including missing, incorrect, outlying, and incompatible data. Beyond noisy data, multiple measurements of similar variables can introduce problems of multicollinearity. Domination is again employed in a novel graph method to handle autocorrelation. Empirical results using the Public Health Exposome dataset are reported. Together these three parts demonstrate the utility of subset selection via domination when applied to a multitude of data sources from a variety of disciplines in the life sciences.
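
    The notion of domination used in this abstract can be illustrated with a minimal sketch. A dominating set is a subset of vertices such that every vertex is either in the subset or adjacent to a member of it. The code below uses the textbook greedy heuristic, which is an assumption made for illustration and not the dissertation's own algorithm; the function names and the toy graph are hypothetical.

```python
def greedy_dominating_set(adjacency):
    """Pick a dominating set greedily (illustrative heuristic, not optimal).

    adjacency: dict mapping each vertex to the set of its neighbours
    in an undirected graph.
    """
    undominated = set(adjacency)
    chosen = []
    while undominated:
        # Choose the vertex whose closed neighbourhood covers the most
        # still-undominated vertices; ties go to the smallest label.
        best = max(
            sorted(adjacency),
            key=lambda v: len(({v} | adjacency[v]) & undominated),
        )
        chosen.append(best)
        undominated -= {best} | adjacency[best]
    return chosen


def is_dominating(adjacency, subset):
    """Check that every vertex is in subset or adjacent to a member of it."""
    covered = set(subset)
    for v in subset:
        covered |= adjacency[v]
    return covered == set(adjacency)


# Hypothetical toy graph: a triangle a-b-c with a path b-d-e-f attached.
graph = {
    "a": {"b", "c"},
    "b": {"a", "c", "d"},
    "c": {"a", "b"},
    "d": {"b", "e"},
    "e": {"d", "f"},
    "f": {"e"},
}
selected = greedy_dominating_set(graph)
print(selected, is_dominating(graph, selected))
```

    In a variable-selection setting, vertices would stand for variables (for example genes) and edges for a similarity relation, so the selected subset "represents" every variable that was left out. How similarity is defined and thresholded is left open here.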

    Comparison of Feature Selection Methods for Body Fat Percentage Estimation

    Obesity, one of the most common health problems of our time, not only reduces quality of life but also causes many other disorders. Body fat percentage is the most important indicator in diagnosing obesity. Determining body fat percentage quickly, easily, at no cost, and with high accuracy is at least as important as being able to diagnose obesity. Body fat percentage, which can be calculated from anthropometric data, can be estimated reliably with machine learning algorithms. However, high-dimensional, irrelevant, and redundant data distort the accuracy of machine learning algorithms and increase model training time. Feature selection algorithms make it possible to obtain higher accuracy from machine learning algorithms while using fewer features. In this study, seven different feature selection algorithms were compared for body fat percentage estimation, and more accurate results were obtained with fewer features. Four machine learning methods were used to examine the effect of the feature selection methods on different models. The training times of these machine learning algorithms were compared. The experiments showed that, by using feature selection methods, more accurate predictions can be obtained with fewer features and a shorter training time.

    Pathophysiological characterization of traumatic brain injury using novel analytical methods

    Severity of traumatic brain injury is usually classified by the Glasgow coma scale (GCS) as “mild”, “moderate” or “severe”, which does not capture the heterogeneity of the disease. According to current guidelines, intracranial pressure (ICP) should not exceed 22 mmHg, with no further recommendations concerning individualization or tolerable duration of intracranial hypertension. The aims of this thesis were to identify subgroups of patients beyond characterization using GCS, and to investigate the impact of duration and magnitude of intracranial hypertension on outcome, using data from the observational prospective study Collaborative European neurotrauma effectiveness research in TBI (CENTER-TBI). To investigate the temporal aspect of tolerable ICP elevations, we examined the correlation between dose of ICP and outcome represented by the 6-month Glasgow outcome scale extended (GOSE). ICP dose was represented both by the number of events above thresholds for ICP magnitude and duration and by area under the ICP curve (i.e., “pressure time dose” (PTD)). A variation in tolerable ICP thresholds of 18 mmHg +/- 4 mmHg (2 standard deviations (SD)) for events with duration longer than five minutes was identified using a bootstrapping technique. PTD was correlated to both mortality and unfavorable outcome. A cerebrovascular autoregulation (CA) dependent ICP tolerability was identified. If CA was impaired, no tolerable ICP magnitude and duration thresholds were identified, while if CA was intact, both 19 mmHg for 5 minutes or longer and 15 mmHg for 50 minutes or longer were correlated to worse outcome. While no significant difference in PTD was seen between favorable and unfavorable outcome if CA was intact, there was a significant difference if CA was impaired. In a multivariable analysis, PTD did not remain a significant predictor of outcome when adjusting for other known predictors in TBI. In a causal inference analysis, both cerebrovascular autoregulation status and ICP-lowering therapies represented by the therapy intensity level (TIL) have a directional relationship with outcome. However, no direct causal relationship of ICP towards outcome was found. By applying an unsupervised clustering method, we identified six distinct admission clusters defined by GCS, lactate, oxygen saturation (SpO2), creatinine, glucose, base excess, pH, PaCO2, and body temperature. These clusters can be summarized in clinical presentation and metabolic profile. When clustering longitudinal features during the first week in the intensive care unit (ICU), no optimal number of clusters could be seen. However, glucose variation, a panel of brain biomarkers, and creatinine consistently described trajectories. Although no information on outcome was included in the models, both admission clusters and trajectories showed clear outcome differences, with mortality from 7 to 40% in the admission clusters and 4 to 85% in the trajectories. Adding cluster or trajectory labels to the established outcome prediction IMPACT model significantly improved outcome predictions. The results in this thesis support the importance of cerebrovascular autoregulation status, as CA status was found to be more informative towards outcome than ICP magnitude and duration. There was a variation in tolerable ICP intensity and duration dependent on whether CA was intact. Distinct clusters defined by GCS and metabolic profiles related to outcome suggest the importance of an extracranial evaluation in addition to GCS in TBI patients. Longitudinal trajectories of TBI patients in the ICU are highly characterized by glucose variation, brain biomarkers and creatinine.
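
    The two ICP dose measures named in this abstract (a count of events above magnitude and duration thresholds, and an area-based "pressure time dose") can be sketched as follows. This is a rough illustration under stated assumptions, not the thesis's procedure: the area is taken as the excess above a threshold, the 22 mmHg default is borrowed from the guideline value quoted in the abstract, and the function names and sample readings are invented.

```python
def pressure_time_dose(times_min, icp_mmhg, threshold=22.0):
    """Trapezoidal area (mmHg * min) of the ICP excess above `threshold`.

    Assumption: only pressure above the threshold contributes to the dose.
    Clipping before integrating slightly misestimates the area at the
    samples where the curve crosses the threshold.
    """
    excess = [max(p - threshold, 0.0) for p in icp_mmhg]
    dose = 0.0
    for i in range(1, len(times_min)):
        dt = times_min[i] - times_min[i - 1]
        dose += 0.5 * (excess[i] + excess[i - 1]) * dt
    return dose


def count_events(times_min, icp_mmhg, threshold=22.0, min_duration=5.0):
    """Count runs of consecutive samples above `threshold` that span
    at least `min_duration` minutes from first to last sample."""
    events = 0
    start = last = None
    for t, p in zip(times_min, icp_mmhg):
        if p > threshold:
            if start is None:
                start = t
            last = t
        else:
            if start is not None and last - start >= min_duration:
                events += 1
            start = None
    if start is not None and last - start >= min_duration:
        events += 1
    return events


# Invented one-sample-per-minute readings, for illustration only.
times = list(range(12))
icp = [15, 18, 24, 26, 26, 25, 24, 23, 20, 18, 25, 17]

dose = pressure_time_dose(times, icp)
events = count_events(times, icp)
print(dose, events)
```

    Both functions take the threshold as a parameter, so the same sketch could be re-run over a grid of magnitude and duration thresholds, in the spirit of the threshold search the abstract describes.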

    Is mutual information adequate for feature selection in regression?

    Feature selection is an important preprocessing step for many high-dimensional regression problems. One of the most common strategies is to select a relevant feature subset based on the mutual information criterion. However, no connection has yet been established in the machine learning literature between the use of mutual information and a regression error criterion. This is an important gap, since minimising such a criterion is ultimately the objective of interest. This paper demonstrates that, under some reasonable assumptions, features selected with the mutual information criterion are the ones minimising the mean squared error and the mean absolute error. Conversely, it is also shown that the mutual information criterion can fail to select optimal features in some situations that we characterise. The theoretical developments presented in this work are expected to lead in practice to a critical and efficient use of mutual information for feature selection.
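
    As a concrete illustration of ranking features by the mutual information criterion, the sketch below estimates mutual information between each feature and a regression target with a simple equal-width histogram (plug-in) estimator. The estimator, the bin count, and the synthetic data are all assumptions made for illustration; the paper's own analysis is theoretical and does not prescribe this estimator.

```python
import math
import random
from collections import Counter


def discretize(values, bins=8):
    """Map continuous values to equal-width bin indices 0..bins-1."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / bins or 1.0  # guard against constant features
    return [min(int((v - lo) / width), bins - 1) for v in values]


def mutual_information(x, y, bins=8):
    """Plug-in estimate of I(X; Y) in nats from binned samples."""
    xd, yd = discretize(x, bins), discretize(y, bins)
    n = len(xd)
    px, py, pxy = Counter(xd), Counter(yd), Counter(zip(xd, yd))
    # Sum over observed cells of p(a,b) * log(p(a,b) / (p(a) * p(b))).
    return sum(
        (c / n) * math.log(c * n / (px[a] * py[b]))
        for (a, b), c in pxy.items()
    )


# Synthetic data: the target depends non-linearly on `relevant` only.
random.seed(0)
n = 500
relevant = [random.uniform(-1, 1) for _ in range(n)]
irrelevant = [random.uniform(-1, 1) for _ in range(n)]
target = [r * r + random.gauss(0, 0.05) for r in relevant]

scores = {
    "relevant": mutual_information(relevant, target),
    "irrelevant": mutual_information(irrelevant, target),
}
ranking = sorted(scores, key=scores.get, reverse=True)
print(ranking, scores)
```

    The quadratic target is chosen because a linear correlation coefficient would be close to zero for it, while mutual information still detects the dependence. Histogram estimates are biased upward on finite samples, so the irrelevant feature's score is small but not exactly zero.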
