58 research outputs found

    An AUC-based Permutation Variable Importance Measure for Random Forests

    Get PDF
    The random forest (RF) method is a commonly used tool for classification with high dimensional data as well as for ranking candidate predictors based on the so-called random forest variable importance measures (VIMs). However the classification performance of RF is known to be suboptimal in case of strongly unbalanced data, i.e. data where response class sizes differ considerably. Suggestions were made to obtain better classification performance based either on sampling procedures or on cost sensitivity analyses. However to our knowledge the performance of the VIMs has not yet been examined in the case of unbalanced response classes. In this paper we explore the performance of the permutation VIM for unbalanced data settings and introduce an alternative permutation VIM based on the area under the curve (AUC) that is expected to be more robust towards class imbalance. We investigated the performance of the standard permutation VIM and of our novel AUC-based permutation VIM for different class imbalance levels using simulated data and real data. The results suggest that the standard permutation VIM loses its ability to discriminate between associated predictors and predictors not associated with the response for increasing class imbalance. It is outperformed by our new AUC-based permutation VIM for unbalanced data settings, while the performance of both VIMs is very similar in the case of balanced classes. The new AUC-based VIM is implemented in the R package party for the unbiased RF variant based on conditional inference trees. The codes implementing our study are available from the companion website: http://www.ibe.med.uni-muenchen.de/organisation/mitarbeiter/070_drittmittel/janitza/index.html

    6^{6}He + α\alpha clustering in 10^{10}Be

    Full text link
    In a kinematically complete measurement of the 7^{7}Li(7^{7}Li,α\alpha6^{6}He)4^4He reaction at EiE_{i} = 8 MeV it was observed that the 10^{10}Be excited states at 9.6 and 10.2 MeV decay by 6^{6}He emission. The state at 10.2 MeV may be a member of a rotational band based on the 6.18 MeV 0+^+ state.Comment: 9 pages, RevTex, 3 Postscript figures (tarred, gzipped and uuencoded) include

    Exploring synergetic effects of dimensionality reduction and resampling tools on hyperspectral imagery data classification

    Get PDF
    The present paper addresses the problem of the classification of hyperspectral images with multiple imbalanced classes and very high dimensionality. Class imbalance is handled by resampling the data set, whereas PCA and a supervised filter are applied to reduce the number of spectral bands. This is a preliminary study that pursues to investigate the benefits of combining several techniques to tackle the imbalance and the high dimensionality problems, and also to evaluate the order of application that leads to the best classification performance. Experimental results demonstrate the significance of using together these two preprocessing tools to improve the performance of hyperspectral imagery classification. Although it seems that the most effective order corresponds to first a resampling strategy and then a feature (or extraction) selection algorithm, this is a question that still needs a much more thorough investigation in the futureThis work has partially been supported by the Spanish Ministry of Education and Science under grants CSD2007–00018, AYA2008–05965–0596 and TIN2009–14205, the Fundació Caixa Castelló–Bancaixa under grant P1–1B2009–04, and the Generalitat Valenciana under grant PROMETEO/2010/02

    Classification of Caesarean Section and Normal Vaginal Deliveries Using Foetal Heart Rate Signals and Advanced Machine Learning Algorithms

    Get PDF
    ABSTRACT – Background: Visual inspection of Cardiotocography traces by obstetricians and midwives is the gold standard for monitoring the wellbeing of the foetus during antenatal care. However, inter- and intra-observer variability is high with only a 30% positive predictive value for the classification of pathological outcomes. This has a significant negative impact on the perinatal foetus and often results in cardio-pulmonary arrest, brain and vital organ damage, cerebral palsy, hearing, visual and cognitive defects and in severe cases, death. This paper shows that using machine learning and foetal heart rate signals provides direct information about the foetal state and helps to filter the subjective opinions of medical practitioners when used as a decision support tool. The primary aim is to provide a proof-of-concept that demonstrates how machine learning can be used to objectively determine when medical intervention, such as caesarean section, is required and help avoid preventable perinatal deaths. Methodology: This is evidenced using an open dataset that comprises 506 controls (normal virginal deliveries) and 46 cases (caesarean due to pH ≤7.05 and pathological risk). Several machine-learning algorithms are trained, and validated, using binary classifier performance measures. Results: The findings show that deep learning classification achieves Sensitivity = 94%, Specificity = 91%, Area under the Curve = 99%, F-Score = 100%, and Mean Square Error = 1%. Conclusions: The results demonstrate that machine learning significantly improves the efficiency for the detection of caesarean section and normal vaginal deliveries using foetal heart rate signals compared with obstetrician and midwife predictions and systems reported in previous studies

    An insight into imbalanced Big Data classification: outcomes and challenges

    Get PDF
    Big Data applications are emerging during the last years, and researchers from many disciplines are aware of the high advantages related to the knowledge extraction from this type of problem. However, traditional learning approaches cannot be directly applied due to scalability issues. To overcome this issue, the MapReduce framework has arisen as a “de facto” solution. Basically, it carries out a “divide-and-conquer” distributed procedure in a fault-tolerant way to adapt for commodity hardware. Being still a recent discipline, few research has been conducted on imbalanced classification for Big Data. The reasons behind this are mainly the difficulties in adapting standard techniques to the MapReduce programming style. Additionally, inner problems of imbalanced data, namely lack of data and small disjuncts, are accentuated during the data partitioning to fit the MapReduce programming style. This paper is designed under three main pillars. First, to present the first outcomes for imbalanced classification in Big Data problems, introducing the current research state of this area. Second, to analyze the behavior of standard pre-processing techniques in this particular framework. Finally, taking into account the experimental results obtained throughout this work, we will carry out a discussion on the challenges and future directions for the topic.This work has been partially supported by the Spanish Ministry of Science and Technology under Projects TIN2014-57251-P and TIN2015-68454-R, the Andalusian Research Plan P11-TIC-7765, the Foundation BBVA Project 75/2016 BigDaPTOOLS, and the National Science Foundation (NSF) Grant IIS-1447795

    Enhancement strategies for transdermal drug delivery systems: current trends and applications

    Get PDF
    corecore