128 research outputs found

Theoretical analysis of cross-validation for estimating the risk of the k-Nearest Neighbor classifier

Author: Celisse Alain
Mary-Huard Tristan
Publication date: 12/10/2017

The present work aims at deriving theoretical guarantees on the behavior of some cross-validation procedures applied to the k-nearest neighbors (kNN) rule in the context of binary classification. Here we focus on the leave-p-out cross-validation (LpO) used to assess the performance of the kNN classifier. Remarkably, this LpO estimator can be efficiently computed in this context using closed-form formulas derived by Celisse and Mary-Huard (2011). We describe a general strategy to derive moment and exponential concentration inequalities for the LpO estimator applied to the kNN classifier. Such results are obtained first by exploiting the connection between the LpO estimator and U-statistics, and second by making intensive use of the generalized Efron-Stein inequality applied to the L1O estimator. Another important contribution is the derivation of new quantifications of the discrepancy between the LpO estimator and the classification error/risk of the kNN classifier. The optimality of these bounds is discussed by means of several lower bounds as well as simulation experiments.

arXiv.org e-Print Archive

HAL Descartes

ProdInra

Risk Bounds for Embedded Variable Selection in Classification Trees

Author: Gey Servane
Mary-Huard Tristan
Publication venue: Institute of Electrical and Electronics Engineers (IEEE)
Publication date: 01/01/2014

The problems of model selection and variable selection for classification trees are considered jointly. A penalized criterion is proposed that explicitly takes into account the number of variables, and a risk bound inequality is provided for the tree classifier minimizing this criterion. This penalized criterion is compared to the one used during the pruning step of the CART algorithm. It is shown that the two criteria are similar under some specific margin assumptions. In practice, the tuning parameter of the CART penalty has to be calibrated by hold-out. Simulation studies confirm that the hold-out procedure mimics the form of the proposed penalized criterion.

HAL Descartes

Theoretical analysis of cross-validation for estimating the risk of the k-Nearest Neighbor classifier

Author: Celisse Alain
Mary-Huard Tristan
Publication venue: HAL CCSD
Publication date: 15/08/2015

The present work aims at deriving theoretical guarantees on the behavior of some cross-validation procedures applied to the k-nearest neighbors (kNN) rule in the context of binary classification. Here we focus on the leave-p-out cross-validation (LpO) used to assess the performance of the kNN classifier. Remarkably, this LpO estimator can be efficiently computed in this context using closed-form formulas derived by Celisse and Mary-Huard (2011). We describe a general strategy to derive moment and exponential concentration inequalities for the LpO estimator applied to the kNN classifier. Such results are obtained first by exploiting the connection between the LpO estimator and U-statistics, and second by making intensive use of the generalized Efron-Stein inequality applied to the L1O estimator. Another important contribution is the derivation of new quantifications of the discrepancy between the LpO estimator and the classification error/risk of the kNN classifier. The optimality of these bounds is discussed by means of several lower bounds as well as simulation experiments.

INRIA a CCSD electronic archive server

HAL Descartes

Spotting effect in microarray experiments

Author: Bitton Frédérique
Cabannes Eric
Daudin Jean-Jacques
Hilson Pierre
Mary-Huard Tristan
Robin Stéphane
Publication venue: BioMed Central
Publication date: 01/01/2004

BACKGROUND: Microarray data must be normalized because they suffer from multiple biases. We have identified a source of spatial experimental variability that significantly affects data obtained with Cy3/Cy5 spotted glass arrays. It yields a periodic pattern altering both signal (Cy3/Cy5 ratio) and intensity across the array. RESULTS: Using the variogram, a geostatistical tool, we characterized the observed variability, called here the spotting effect because it most probably arises during steps in the array printing procedure. CONCLUSIONS: The spotting effect is not appropriately corrected by current normalization methods, even by those addressing spatial variability. Importantly, the spotting effect may alter differential and clustering analysis

HAL Evry

Springer - Publisher Connector

Directory of Open Access Journals

Ghent University Academic Bibliography

PubMed Central

ProdInra

New breeding strategies for mixed cropping in a barley (H. vulgare L.) pea (P. sativum L.) model system

Author: Enjalbert Jérôme
Forst Emma
Goldringer Isabelle
Haug Benedikt
Hohmann Pierre
Mary-Huard Tristan
Messmer Monika
Publication date: 06/11/2019

Crop mixtures of cereals and legumes have proven to be a well-adapted arrangement because the two crops are complementary in their use of important resources, especially nitrogen. Crop mixtures combine high yield performance and yield stability. They can contribute to a diversified cropping landscape and to adaptation to climate change. The search for alternatives to protein imports from overseas and investments in post-harvest separation technologies are currently fostering their adoption by farmers in Western Europe, especially under organic and low-input farming conditions. However, screening and breeding for mixed cropping has hardly been explored for arable crops. Thus, the objective was to develop novel breeding strategies and tools specifically for mixed cropping systems. We tested mixtures and pure stands of a morphologically diverse panel of 32 spring pea (Pisum sativum L.) and eight spring barley (Hordeum vulgare L.) cultivars in replicated field trials at two locations in Switzerland over two years, with pea as the focal species. In an incomplete factorial design (Fig. 1), we determined the general and specific mixing ability (GMA and SMA, respectively) of pea and barley, in analogy to general and specific combining ability (GCA and SCA) in hybrid breeding. Key traits such as early vigour, canopy height and leaf morphology parameters were measured because of their potential use as covariates or indirect selection criteria for mixing ability. Our results show that the total yield of mixtures can only partly be explained by pea pure stand yields (R² = 0.35), making the latter a weak predictor of mixture yield. Pea GMA variance was predominant over SMA variance, which underlines the potential of breeding for mixing ability using a tester. Key traits such as pea stipule area were correlated (R² = 0.56) with total mixture yield and merit further investigation as indirect selection criteria. The separated yield fractions of pea and barley in mixtures make it possible to decompose the GMA of pea into the producer effect of the pea cultivar on pea fraction yield and the associate effect of pea on barley fraction yield. This novel concept helps elucidate key trait effects on the fraction yields of pea and barley, which might otherwise be masked when using a GMA approach alone.

Organic Eprints

Development of genetic models to breed for mixed cropping systems

Author: Enjalbert Jérôme
Forst Emma
Goldringer Isabelle
Haug Benedikt
Hohmann Pierre
Mary-Huard Tristan
Messmer Monika
Publication date: 19/11/2019

Introduction: Mixed cropping, i.e. growing different crops together in the same field, provides agronomic advantages such as increased productivity under low-input conditions (e.g. for organic farming: Bedoussac et al. 2015) and higher yield stability (Raseduzzaman and Jensen 2017). In mixed cropping, choosing the right cultivars is critical for the performance of the mixture, as shown for pea-barley mixtures (Hauggaard-Nielsen and Jensen 2001) and maize-bean mixtures (Hoppe 2016). As performance in pure stand can strongly diverge from performance in mixture, estimating the ability of a cultivar to be mixed with another crop is of utmost importance. For this purpose, the concepts of General and Specific Combining Ability in hybrid breeding (Griffing 1956) have been adapted to cultivar and crop mixtures. The corresponding effects are called General Mixing Ability (GMA) and Specific Mixing Ability (SMA) (Federer 1993). In contrast to intraspecific mixtures, interspecific mixed cropping experiments often provide additional information, since harvested lots can be separated into their different grain fractions. Until now, statistical methods that exploit the additional information provided by separated harvest lots to estimate mixing abilities in intercropping experiments have been neglected. The concept of Producer and Associate effects (abbreviated Pr and As, respectively) describes interactions between varieties sown in alternate row trials (Forst 2018). The producer effect Pr is the average performance of a cultivar grown in mixture with other crop species, whereas the associate effect As is the average effect of a cultivar on the performance of the mixing partner. We used the fraction yields of a spring pea (Pisum sativum L.) and spring barley (Hordeum vulgare L.) mixed cropping experiment to determine the Pr and As effects of different pea genotypes. This approach is biologically more informative than GMA/SMA estimates, since it better reflects the competition and facilitation occurring between different cultivars of the two crop species.

Organic Eprints

The pea variety determines the success of the mixture (Die Erbsensorte entscheidet über den Erfolg der Mischung)

Author: Enjalbert Jérôme
Forst Emma
Frick Claudia
Goldringer Isabelle
Haug Benedikt
Hohmann Pierre
Mary-Huard Tristan
Messmer Monika
Publication date: 01/01/2020

Spring protein peas and two-row barley are good mixing partners. However, certain variety combinations work better than others. FiBL has shown this in a two-year trial.

Organic Eprints

Biases induced by pooling samples in microarray experiments

Author: Annibale Biggeri
Avner Bar-Hen
Jean-Jacques Daudin
Michela Baccini
Tristan Mary-Huard
Publication date: 01/01/2007

CiteSeerX