128 research outputs found
Theoretical analysis of cross-validation for estimating the risk of the k-Nearest Neighbor classifier
The present work aims at deriving theoretical guarantees on the behavior of
some cross-validation procedures applied to the k-nearest neighbors (kNN)
rule in the context of binary classification. Here we focus on the
leave-p-out cross-validation (LpO) used to assess the performance of the
kNN classifier. Remarkably, this LpO estimator can be efficiently computed
in this context using closed-form formulas derived by
\cite{CelisseMaryHuard11}. We describe a general strategy to derive moment and
exponential concentration inequalities for the LpO estimator applied to the
kNN classifier. Such results are obtained first by exploiting the connection
between the LpO estimator and U-statistics, and second by making intensive
use of the generalized Efron-Stein inequality applied to the LpO estimator.
One other important contribution is made by deriving new quantifications of the
discrepancy between the LpO estimator and the classification error/risk of
the kNN classifier. The optimality of these bounds is discussed by means of
several lower bounds as well as simulation experiments
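To make the estimator in this abstract concrete, here is a minimal sketch of a leave-p-out error estimate for a kNN classifier. It enumerates every held-out subset by brute force, so it is not the closed-form computation the abstract refers to. The function names, the one-dimensional features, the 0/1 labels and the odd value of k are all assumptions made for illustration.

```python
from itertools import combinations


def knn_predict(train, query, k):
    """Majority vote among the k training points closest to `query`.

    `train` is a list of (x, label) pairs with scalar x and labels in {0, 1}.
    k is assumed odd so that the vote cannot tie.
    """
    neighbours = sorted(train, key=lambda xy: abs(xy[0] - query))[:k]
    votes_for_one = sum(label for _, label in neighbours)
    return 1 if 2 * votes_for_one > k else 0


def lpo_error(data, k, p):
    """Brute-force leave-p-out estimate of the kNN misclassification rate.

    Averages, over all subsets of size p, the error rate on the held-out
    points of the classifier trained on the remaining n - p points.
    """
    n = len(data)
    total, n_splits = 0.0, 0
    for held_out in combinations(range(n), p):
        held = set(held_out)
        train = [data[i] for i in range(n) if i not in held]
        errors = sum(
            knn_predict(train, data[i][0], k) != data[i][1] for i in held_out
        )
        total += errors / p
        n_splits += 1
    return total / n_splits
```

The number of splits grows as the binomial coefficient of n and p, which is why a closed-form expression is valuable in practice.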
Risk Bounds for Embedded Variable Selection in Classification Trees
The problems of model and variable selection for classification trees are jointly considered. A penalized criterion is proposed which explicitly takes into account the number of variables, and a risk bound inequality is provided for the tree classifier minimizing this criterion. This penalized criterion is compared to the one used during the pruning step of the CART algorithm. It is shown that the two criteria are similar under some specific margin assumptions. In practice, the tuning parameter of the CART penalty has to be calibrated by hold-out. Simulation studies are performed which confirm that the hold-out procedure mimics the form of the proposed penalized criterion
Spotting effect in microarray experiments
BACKGROUND: Microarray data must be normalized because they suffer from multiple biases. We have identified a source of spatial experimental variability that significantly affects data obtained with Cy3/Cy5 spotted glass arrays. It yields a periodic pattern altering both signal (Cy3/Cy5 ratio) and intensity across the array. RESULTS: Using the variogram, a geostatistical tool, we characterized the observed variability, called here the spotting effect because it most probably arises during steps in the array printing procedure. CONCLUSIONS: The spotting effect is not appropriately corrected by current normalization methods, even by those addressing spatial variability. Importantly, the spotting effect may alter differential and clustering analysis
New breeding strategies for mixed cropping in a barley (H. vulgare L.) pea (P. sativum L.) model system
Crop mixtures consisting of cereals and legumes have proven to be a well-adapted arrangement due to their complementarity in using important resources, especially nitrogen. Crop mixtures combine high yield performance and yield stability. They can contribute to a diversified cropping landscape and adaptation to climate change. The search for alternatives to protein imports from overseas and investments in post-harvest separation technologies are currently fostering their adoption by farmers in Western Europe, especially under organic and low-input farming conditions. However, screening and breeding for mixed cropping have hardly been explored for arable crops. Thus, the objective was to develop novel breeding strategies and tools specifically for mixed cropping systems.
We tested mixtures and pure stands of a morphologically diverse panel of 32 spring pea (Pisum sativum L.) and eight spring barley (Hordeum vulgare L.) cultivars in replicated field trials at two locations in Switzerland over two years, with pea as the focal species. In an incomplete factorial design (Fig. 1), we determined the general and specific mixing ability (GMA and SMA, respectively) of pea and barley, in analogy to GCA and SCA (general and specific combining ability) in hybrid breeding. Key traits such as early vigour, canopy height and leaf morphology parameters were measured because of their potential use as covariates or indirect selection criteria for mixing ability. Our results show that the total yield of mixtures can only partly be explained by pea pure stand yields (R² = 0.35), making the latter a weak predictor of mixture yield. Pea GMA variance was predominant over SMA variance, which underlines the potential of breeding for mixing ability using a tester. Key traits such as pea stipule area were correlated (R² = 0.56) with total mixture yield and merit further investigation as indirect selection criteria. The separated yield fractions of pea and barley in mixtures make it possible to decompose the GMA of pea into the producer effect of the pea cultivar on pea fraction yield and the associate effect of pea on barley fraction yield. This novel concept helps to elucidate key trait effects on the fraction yields of pea and barley, which might otherwise be masked when using a GMA approach alone
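As an illustration of the GMA/SMA idea, the sketch below splits a table of total mixture yields into a grand mean, a general mixing ability per cultivar and a specific mixing ability per combination. It assumes a complete, balanced table and plain arithmetic means, whereas the study used an incomplete factorial design that would call for a proper statistical model. The function name and the yield values are hypothetical.

```python
def mixing_abilities(yields):
    """Additive decomposition of a complete pea x barley yield table.

    yields[i][j] is the total mixture yield of pea cultivar i grown with
    barley cultivar j (hypothetical numbers, one value per combination).
    Returns (grand_mean, gma_pea, gma_barley, sma) such that
    yields[i][j] == grand_mean + gma_pea[i] + gma_barley[j] + sma[i][j].
    """
    n_pea, n_barley = len(yields), len(yields[0])
    grand_mean = sum(sum(row) for row in yields) / (n_pea * n_barley)
    # GMA: deviation of a cultivar's mean over all partners from the grand mean.
    gma_pea = [sum(row) / n_barley - grand_mean for row in yields]
    gma_barley = [
        sum(yields[i][j] for i in range(n_pea)) / n_pea - grand_mean
        for j in range(n_barley)
    ]
    # SMA: what remains for a given pair after removing both general effects.
    sma = [
        [yields[i][j] - grand_mean - gma_pea[i] - gma_barley[j]
         for j in range(n_barley)]
        for i in range(n_pea)
    ]
    return grand_mean, gma_pea, gma_barley, sma


# Hypothetical yields for two pea and two barley cultivars.
grand_mean, gma_pea, gma_barley, sma = mixing_abilities([[4.0, 5.0], [6.0, 9.0]])
```

With these made-up numbers the second pea cultivar has a GMA of 1.5 and the first a GMA of -1.5, while every SMA term is 0.5 in absolute value.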
Development of genetic models to breed for mixed cropping systems
Introduction
Mixed cropping, i.e. mixing different crops in the same field, provides agronomic advantages such as increased productivity under low-input conditions (e.g. for organic farming: Bedoussac et al. 2015) and higher yield stability (Raseduzzaman and Jensen 2017). In mixed cropping, choosing the right cultivars is critical for the performance of the mixture, as shown for pea-barley mixtures (Hauggaard-Nielsen and Jensen 2001) and maize-bean mixtures (Hoppe 2016). As performance in pure stand can strongly diverge from performance in mixture, estimating the ability of a cultivar to be mixed with another crop is of utmost importance. For this purpose, the concepts of General and Specific Combining Ability in hybrid breeding (Griffing 1956) have been adapted to cultivar and crop mixtures, where the corresponding effects are called General Mixing Ability (GMA) and Specific Mixing Ability (SMA) (Federer 1993). In contrast to intraspecific mixtures, interspecific mixed cropping experiments often provide additional information, since harvested lots can be separated into their different grain fractions. Until now, statistical developments that use the additional information provided by separated harvest lots to estimate mixing abilities in intercropping experiments have been neglected. The concept of Producer and Associate effects (abbreviated Pr and As, respectively) describes interactions between varieties sown in alternate row trials (Forst 2018). The producer effect Pr is the average performance of a cultivar grown in mixture with another crop species, whereas the associate effect As is the average effect of a cultivar on the performance of the mixing partner. We used the fraction yields of a spring-pea (Pisum sativum L.) and spring-barley (Hordeum vulgare L.) mixed cropping experiment to determine the Pr and As effects of different pea genotypes.
This approach is biologically more informative than GMA/SMA estimates, since it better reflects the competition and facilitation occurring between different cultivars of the two crop species
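The producer and associate effects described above can be sketched as row effects of the two separated yield fractions. This is a simplified illustration assuming a complete, balanced table and effects centred on the grand mean, not the statistical model of the study. The function names and all numbers are hypothetical.

```python
def row_effects(table):
    """Deviation of each row mean from the grand mean of a complete table."""
    n_rows, n_cols = len(table), len(table[0])
    grand_mean = sum(sum(row) for row in table) / (n_rows * n_cols)
    return [sum(row) / n_cols - grand_mean for row in table]


def producer_associate(pea_fraction, barley_fraction):
    """Pr and As effects of each pea cultivar (rows) across barley partners.

    pea_fraction[i][j]:    pea yield of pea cultivar i mixed with barley j.
    barley_fraction[i][j]: barley yield in that same mixture.
    Pr is the pea cultivar's effect on its own fraction yield, As its
    effect on the yield of the barley partner.
    """
    return row_effects(pea_fraction), row_effects(barley_fraction)


# Hypothetical fraction yields for two pea and two barley cultivars.
pea = [[1.0, 2.0], [3.0, 4.0]]
barley = [[3.0, 3.0], [3.0, 5.0]]
pr, as_ = producer_associate(pea, barley)
```

Because the total yield is the sum of the two fractions, Pr and As for a pea cultivar add up to its row effect on total yield in this simplified setting, so the two terms show how much of that effect comes from the pea itself and how much from its influence on the barley partner.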
The pea cultivar determines the success of the mixture
Spring protein peas and two-row barley are good mixing partners. However, certain cultivar combinations are
better than others. FiBL has shown this in a two-year trial
- …