128 research outputs found

    Theoretical analysis of cross-validation for estimating the risk of the k-Nearest Neighbor classifier

    The present work aims at deriving theoretical guarantees on the behavior of some cross-validation procedures applied to the k-nearest neighbors (kNN) rule in the context of binary classification. Here we focus on the leave-p-out cross-validation (LpO) used to assess the performance of the kNN classifier. Remarkably, this LpO estimator can be efficiently computed in this context using closed-form formulas derived by Celisse and Mary-Huard (2011). We describe a general strategy to derive moment and exponential concentration inequalities for the LpO estimator applied to the kNN classifier. Such results are obtained first by exploiting the connection between the LpO estimator and U-statistics, and second by making intensive use of the generalized Efron-Stein inequality applied to the L1O estimator. Another important contribution is the derivation of new quantifications of the discrepancy between the LpO estimator and the classification error/risk of the kNN classifier. The optimality of these bounds is discussed by means of several lower bounds as well as simulation experiments.
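To make the leave-p-out estimator concrete, here is a minimal brute-force sketch in Python. It enumerates every held-out subset of size p, so it is not the closed-form computation mentioned in the abstract, and the function names and toy data are assumptions introduced only for illustration.

```python
from collections import Counter
from itertools import combinations

def knn_predict(train, query, k):
    """Majority vote among the k training points closest to `query`
    (squared Euclidean distance). Use an odd k to avoid ties in binary problems."""
    nearest = sorted(
        train,
        key=lambda pt: sum((a - b) ** 2 for a, b in zip(pt[0], query)),
    )[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

def leave_p_out_error(data, k, p):
    """Brute-force LpO estimate of the kNN classification error.

    Averages the test error over all subsets of p held-out points.
    The cost grows like C(n, p), so this is only usable on tiny samples."""
    n = len(data)
    errors = 0
    tests = 0
    for held_out in combinations(range(n), p):
        held = set(held_out)
        train = [data[i] for i in range(n) if i not in held]
        for i in held_out:
            x, y = data[i]
            errors += knn_predict(train, x, k) != y
            tests += 1
    return errors / tests

# Hypothetical toy sample: two well-separated clusters on the real line.
data = [((0.0,), 0), ((0.1,), 0), ((0.2,), 0),
        ((1.0,), 1), ((1.1,), 1), ((1.2,), 1)]

l1o_k1 = leave_p_out_error(data, k=1, p=1)  # leave-one-out (L1O), 1NN
l2o_k1 = leave_p_out_error(data, k=1, p=2)  # leave-two-out, 1NN
l1o_k5 = leave_p_out_error(data, k=5, p=1)  # k too large for this sample
```

On this toy sample the 1NN rule is never wrong, whereas k = 5 forces every held-out point to be outvoted by the opposite cluster, which shows how the estimate reacts to the choice of k.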

    Risk Bounds for Embedded Variable Selection in Classification Trees

    The problems of model selection and variable selection for classification trees are considered jointly. A penalized criterion is proposed which explicitly takes into account the number of variables, and a risk bound inequality is provided for the tree classifier minimizing this criterion. This penalized criterion is compared to the one used during the pruning step of the CART algorithm. It is shown that the two criteria are similar under some specific margin assumptions. In practice, the tuning parameter of the CART penalty has to be calibrated by hold-out. Simulation studies are performed which confirm that the hold-out procedure mimics the form of the proposed penalized criterion.

    Spotting effect in microarray experiments

    BACKGROUND: Microarray data must be normalized because they suffer from multiple biases. We have identified a source of spatial experimental variability that significantly affects data obtained with Cy3/Cy5 spotted glass arrays. It yields a periodic pattern altering both signal (Cy3/Cy5 ratio) and intensity across the array. RESULTS: Using the variogram, a geostatistical tool, we characterized the observed variability, called here the spotting effect because it most probably arises during steps in the array printing procedure. CONCLUSIONS: The spotting effect is not appropriately corrected by current normalization methods, even by those addressing spatial variability. Importantly, the spotting effect may alter differential and clustering analysis
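As an illustration of why a variogram can reveal a periodic pattern, the following Python sketch computes an empirical semivariogram on a synthetic one-dimensional signal. The signal, the function name, and the restriction to one dimension are assumptions made for the example; they do not reproduce the microarray analysis itself.

```python
def empirical_semivariogram(values, max_lag):
    """Empirical semivariogram of a regularly spaced 1D series:
    gamma(h) = sum over pairs at distance h of (z_i - z_{i+h})^2 / (2 * N(h))."""
    gamma = {}
    for lag in range(1, max_lag + 1):
        pairs = [(values[i], values[i + lag]) for i in range(len(values) - lag)]
        gamma[lag] = sum((a - b) ** 2 for a, b in pairs) / (2 * len(pairs))
    return gamma

# Hypothetical signal with a period of 4 positions (stand-in for a printing artifact).
signal = [0.0, 1.0, 0.0, -1.0] * 5
gamma = empirical_semivariogram(signal, max_lag=8)
```

For a periodic signal the semivariance drops back to zero at lags equal to a multiple of the period (here 4 and 8) and peaks at half a period, which is the kind of signature a spatially periodic bias would leave.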

    New breeding strategies for mixed cropping in a barley (H. vulgare L.) pea (P. sativum L.) model system

    Crop mixtures consisting of cereals and legumes have proven to be a well-adapted arrangement due to their complementarity in the use of important resources, especially nitrogen. Crop mixtures combine high yield performance and yield stability. They can contribute to a diversified cropping landscape and to adaptation to climate change. The search for alternatives to protein imports from overseas and investments in post-harvest separation technologies are currently fostering their adoption by farmers in Western Europe, especially under organic and low-input farming conditions. However, screening and breeding for mixed cropping has hardly been explored for arable crops. Thus, the objective was to develop novel breeding strategies and tools specifically for mixed cropping systems. We tested mixtures and pure stands of a morphologically diverse panel of 32 spring pea (Pisum sativum L.) and eight spring barley (Hordeum vulgare L.) cultivars in replicated field trials at two locations in Switzerland over two years, with pea as the focal species. In an incomplete factorial design (Fig. 1) we determined general and specific mixing ability (GMA and SMA, respectively) of pea and barley, in analogy to GCA and SCA (general and specific combining ability) in hybrid breeding. Key traits such as early vigour, canopy height and leaf morphology parameters were measured because of their potential use as covariates or indirect selection criteria for mixing ability. Our results show that total yield of mixtures can only partly be explained by pea pure stand yields (R² = 0.35), making the latter a weak predictor for mixture yield. Pea GMA variance was predominant over SMA variance, which underlines the potential of breeding for mixing ability using a tester. Key traits such as pea stipule area were correlated (R² = 0.56) with total mixture yield and merit further investigation as indirect selection criteria. The separated yield fractions of pea and barley in mixtures make it possible to decompose the GMA of pea into the producer effect of the pea cultivar on pea fraction yield and the associate effect of pea on barley fraction yield. This novel concept helps to elucidate key trait effects on fraction yields of pea and barley which might otherwise be masked when solely using a GMA approach.

    Development of genetic models to breed for mixed cropping systems

    Introduction: Mixed cropping, i.e. mixing different crops in the same field, provides agronomic advantages such as increased productivity under low-input conditions (e.g. for organic farming: Bedoussac et al. 2015) and higher yield stability (Raseduzzaman and Jensen 2017). In mixed cropping, choosing the right cultivars is critical for the performance of the mixture, as shown for pea-barley mixtures (Hauggaard-Nielsen and Jensen 2001) and maize-bean mixtures (Hoppe 2016). As performance in pure stand can strongly diverge from performance in mixture, estimating the ability of a cultivar to be mixed with another crop is of utmost importance. For this purpose, the concepts of General and Specific Combining Ability in hybrid breeding (Griffing 1956) have been adapted to cultivar and crop mixtures. These effects are called General Mixing Ability (GMA) and Specific Mixing Ability (SMA) (Federer 1993). In contrast to intraspecific mixtures, interspecific mixed cropping experiments often provide additional information, since harvested lots can be separated into their different grain fractions. Until now, statistical developments that use the additional information provided by separated harvest lots to estimate mixing abilities in intercropping experiments have been neglected. The concept of Producer and Associate effects (abbreviated Pr and As, respectively) describes interactions between varieties sown in alternate row trials (Forst 2018). The producer effect Pr is the average performance of a cultivar grown in mixture with other crop species, whereas the associate effect As is the average effect of a cultivar on the performance of the mixing partner. We used the fraction yields of a spring pea (Pisum sativum L.) and spring barley (Hordeum vulgare L.) mixed cropping experiment to determine the Pr and As effects of different pea genotypes. This approach is biologically more informative than GMA/SMA estimates, since it better reflects the competition and facilitation occurring between different cultivars of the two crop species.
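The producer and associate effects can be illustrated with simple means on a tiny balanced table. This is only a sketch under stated assumptions: the yields are invented, the effects are expressed as deviations from the grand mean, and plain averages stand in for whatever statistical model the authors actually fitted.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical fraction yields (arbitrary units) from a balanced 2 x 2 trial:
# (pea cultivar, barley cultivar, pea fraction yield, barley fraction yield)
plots = [
    ("pea_A", "barley_X", 20.0, 30.0),
    ("pea_A", "barley_Y", 24.0, 26.0),
    ("pea_B", "barley_X", 10.0, 38.0),
    ("pea_B", "barley_Y", 14.0, 34.0),
]

def producer_associate_effects(plots):
    """Return Pr, As and their sum for each pea cultivar.

    Pr: mean pea fraction yield of the cultivar minus the grand mean pea yield.
    As: mean barley fraction yield of its partners minus the grand mean barley yield.
    With simple means on balanced data, Pr + As equals the deviation of the
    cultivar's mean total mixture yield from the grand mean (a GMA-like effect)."""
    pea_yield = defaultdict(list)
    partner_yield = defaultdict(list)
    for pea, _barley, pea_y, barley_y in plots:
        pea_yield[pea].append(pea_y)
        partner_yield[pea].append(barley_y)
    grand_pea = mean(row[2] for row in plots)
    grand_barley = mean(row[3] for row in plots)
    effects = {}
    for pea in pea_yield:
        pr = mean(pea_yield[pea]) - grand_pea
        as_ = mean(partner_yield[pea]) - grand_barley
        effects[pea] = {"Pr": pr, "As": as_, "GMA": pr + as_}
    return effects

effects = producer_associate_effects(plots)
```

In this invented example pea_A is a strong producer but suppresses its barley partner, so its total-yield effect is small even though both components are large. That masking is the situation the decomposition is meant to expose.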

    The pea variety determines the success of the mixture

    Spring protein peas and two-row barley are good mixing partners. However, certain cultivar combinations are better than others. FiBL has shown this in a two-year trial.