305 research outputs found

    On the suitability of combining feature selection and resampling to manage data complexity

    The effectiveness of a learning task depends on data complexity (class overlap, class imbalance, irrelevant features, etc.). When more than one complexity factor appears, two or more preprocessing techniques should be applied. Nevertheless, not much effort has been devoted to investigating the importance of the order in which they are applied. This paper focuses on the joint use of feature reduction and balancing techniques, and studies which application order leads to the best classification results. The analysis was carried out on a specific problem: identifying the melodic track in a MIDI file. Several experiments were performed on different imbalanced 38-dimensional training sets with many more accompaniment tracks than melodic tracks, and in which features were aggregated without any correlation study. Results showed that the most effective combination was resampling first, followed by feature reduction.
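
    The two application orders compared in this abstract can be sketched as follows. This is only an illustrative sketch: the abstract does not name the specific techniques, so random oversampling and a Fisher-score filter are stand-ins chosen here, and every function name and the toy data are hypothetical.

```python
import random
from statistics import mean, pvariance


def random_oversample(X, y, rng):
    """Duplicate randomly chosen samples until every class matches the largest one."""
    by_class = {}
    for row, label in zip(X, y):
        by_class.setdefault(label, []).append(row)
    target = max(len(rows) for rows in by_class.values())
    X_out, y_out = [], []
    for label, rows in by_class.items():
        extra = [rng.choice(rows) for _ in range(target - len(rows))]
        for row in rows + extra:
            X_out.append(row)
            y_out.append(label)
    return X_out, y_out


def fisher_scores(X, y):
    """Score each feature: squared class-mean difference over summed class variances."""
    scores = []
    for j in range(len(X[0])):
        pos = [row[j] for row, label in zip(X, y) if label == 1]
        neg = [row[j] for row, label in zip(X, y) if label == 0]
        spread = pvariance(pos) + pvariance(neg) + 1e-12  # avoid division by zero
        scores.append((mean(pos) - mean(neg)) ** 2 / spread)
    return scores


def top_k_features(X, y, k):
    """Return the (sorted) column indices of the k best-scoring features."""
    scores = fisher_scores(X, y)
    ranked = sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)
    return sorted(ranked[:k])


def project(X, columns):
    """Keep only the selected columns."""
    return [[row[j] for j in columns] for row in X]


def resample_then_select(X, y, k, rng):
    """Order A: balance the classes first, then select features on the balanced set."""
    X_bal, y_bal = random_oversample(X, y, rng)
    cols = top_k_features(X_bal, y_bal, k)
    return project(X_bal, cols), y_bal, cols


def select_then_resample(X, y, k, rng):
    """Order B: select features on the imbalanced set, then balance the classes."""
    cols = top_k_features(X, y, k)
    X_bal, y_bal = random_oversample(project(X, cols), y, rng)
    return X_bal, y_bal, cols


# Hypothetical toy data: 20 minority (label 1) vs. 200 majority (label 0) samples,
# 6 features, of which only feature 0 carries class information.
data_rng = random.Random(0)
X, y = [], []
for i in range(220):
    label = 1 if i < 20 else 0
    row = [data_rng.gauss(0, 1) for _ in range(6)]
    row[0] += 2.0 * label
    X.append(row)
    y.append(label)

Xa, ya, cols_a = resample_then_select(X, y, 2, random.Random(1))
Xb, yb, cols_b = select_then_resample(X, y, 2, random.Random(1))
```

    Both pipelines return a balanced, reduced training set; they may differ in which features are kept, because the filter sees different class proportions in each order. Which order classifies better is an empirical question that this sketch does not answer.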

    Assessing the reliability of ensemble forecasting systems under serial dependence

    The problem of testing the reliability of ensemble forecasting systems is revisited. A popular tool to assess the reliability of ensemble forecasting systems (for scalar verifications) is the rank histogram; this histogram is expected to be more or less flat, since for a reliable ensemble, the ranks are uniformly distributed among their possible outcomes. Quantitative tests for flatness (e.g. Pearson's goodness-of-fit test) have been suggested; without exception though, these tests assume the ranks to be a sequence of independent random variables, which is not the case in general, as can be demonstrated with simple toy examples. In this paper, tests are developed that take the temporal correlations between the ranks into account. A refined analysis exploiting the reliability property shows that the ranks still exhibit strong decay of correlations. This property is key to the analysis, and the proposed tests are valid for general ensemble forecasting systems with minimal extraneous assumptions.
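
    For readers unfamiliar with the rank histogram, a minimal sketch of it and of the classical Pearson flatness statistic is given below. It illustrates only the standard test that assumes independent ranks, which is the assumption the abstract criticizes; it is not the serial-dependence-aware test developed in the paper. The function names and the toy ensemble are hypothetical.

```python
import random


def rank_of_verification(ensemble, verification):
    """Rank of the verification among the members: an integer in 0..len(ensemble)."""
    return sum(1 for member in ensemble if member < verification)


def rank_histogram(ensembles, verifications):
    """Count how often the verification falls into each of the K + 1 rank bins."""
    counts = [0] * (len(ensembles[0]) + 1)
    for ensemble, verification in zip(ensembles, verifications):
        counts[rank_of_verification(ensemble, verification)] += 1
    return counts


def pearson_chi_square(counts):
    """Pearson goodness-of-fit statistic against a flat (uniform) histogram."""
    expected = sum(counts) / len(counts)
    return sum((c - expected) ** 2 / expected for c in counts)


# Toy example: a reliable ensemble, i.e. members and verification are drawn
# from the same distribution, with independent forecast cases.
rng = random.Random(0)
n_cases, n_members = 2000, 9
ensembles = [[rng.gauss(0, 1) for _ in range(n_members)] for _ in range(n_cases)]
verifications = [rng.gauss(0, 1) for _ in range(n_cases)]

counts = rank_histogram(ensembles, verifications)
chi2 = pearson_chi_square(counts)
print(counts, round(chi2, 2))
```

    With independent ranks and a reliable ensemble, the statistic is approximately chi-square distributed with as many degrees of freedom as there are ensemble members (number of bins minus one). When the ranks are serially correlated, that reference distribution no longer applies, which is the gap the paper addresses.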

    Electron-hadron shower discrimination in a liquid argon time projection chamber

    By exploiting structural differences between electromagnetic and hadronic showers in a multivariate analysis, we present an efficient electron-hadron discrimination algorithm for liquid argon time projection chambers, validated using Geant4 simulated data.

    Process mining meets abstract interpretation

    The discovery of process models from system traces is an interesting problem that has received significant attention in recent years. In this work, a theory for the derivation of a Petri net from a set of traces is presented. The method is based on the theory of abstract interpretation, which has been applied successfully in other areas. The principal application of the theory presented is Process Mining, an area that tries to incorporate the use of formal models both in the design and use of information systems.

    Exploring synergetic effects of dimensionality reduction and resampling tools on hyperspectral imagery data classification

    The present paper addresses the problem of classifying hyperspectral images with multiple imbalanced classes and very high dimensionality. Class imbalance is handled by resampling the data set, whereas PCA and a supervised filter are applied to reduce the number of spectral bands. This is a preliminary study that aims to investigate the benefits of combining several techniques to tackle the imbalance and high-dimensionality problems, and also to evaluate the order of application that leads to the best classification performance. Experimental results demonstrate the significance of using these two preprocessing tools together to improve the performance of hyperspectral imagery classification. Although it seems that the most effective order is to apply a resampling strategy first and then a feature selection (or extraction) algorithm, this question still needs a much more thorough investigation in the future. This work has been partially supported by the Spanish Ministry of Education and Science under grants CSD2007–00018, AYA2008–05965–0596 and TIN2009–14205, the Fundació Caixa Castelló–Bancaixa under grant P1–1B2009–04, and the Generalitat Valenciana under grant PROMETEO/2010/02.