16 research outputs found

Apprivoiser l'hétérogénéité en informatique 1ère année

Author: Bennani Nadia
Cazalens Sylvie
Cheutet Vincent
Leschi Claire
Merveille Odyssée
Moriot Camille
Muller Delphine
Pecatte Timothée
Pothier Catherine
Rigotti Christophe
Rivano Hervé
Stouls Nicolas
Publication venue: HAL CCSD
Publication date: 20/05/2021

National audience. Faced with the growing heterogeneity of students' computing skills and knowledge on entering first year, and the risk that the « nouveau bac » (the reformed French baccalauréat) will exacerbate it, we set out to trial a teaching approach that manages this heterogeneity while respecting the constraints of a uniform timetable and constant cost. The actions taken are organised around four areas: forming ability-based groups, with particular attention to the two extreme levels (reinforcement and advanced/self-directed); regular multiple-choice quizzes; occasional use of problem-based learning (Apprentissage Par Problème, APP); and student self-assessment of level. The experiment is still under way, but early findings already allow discussion to begin.

INRIA a CCSD electronic archive server

Mining episode rules in STULONG dataset

Author: Leschi Claire
Lucas Noel
Rigotti Christophe
Publication venue: HAL CCSD
Publication date: 20/09/2004

International audience

HAL

Hal-Diderot

Scaling up semi-supervised learning: An efficient and effective LLGC variant

Author: Leschi Claire
Pfahringer Bernhard
Reutemann Peter
Publication venue: Springer Science and Business Media LLC
Publication date: 01/01/2006

Domains like text classification can easily supply large amounts of unlabeled data, but labeling itself is expensive. Semi-supervised learning tries to exploit this abundance of unlabeled training data to improve classification. Unfortunately, most of the theoretically well-founded algorithms that have been described in recent years are cubic or worse in the total number of both labeled and unlabeled training examples. In this paper we apply modifications to the standard LLGC algorithm to improve efficiency to a point where we can handle datasets with hundreds of thousands of training examples. The modifications are priming of the unlabeled data and, most importantly, sparsification of the similarity matrix. We report promising results on large text classification problems.

CiteSeerX

Research Commons@Waikato

HAL

Hal-Diderot

Feature Construction and δ-Free Sets in 0/1 Samples

Author: Boulicaut Jean-François
Gay Dominique
Leschi Claire
Selmaoui-Folcher Nazha
Publication venue: Springer Fachmedien Wiesbaden GmbH
Publication date: 07/10/2016

International audience. Given the recent breakthrough in constraint-based mining of local patterns, we decided to investigate its impact on feature construction for classification tasks. We discuss preliminary results concerning the use of the so-called δ-free sets. Our guess is that their minimality might help to collect important features. Once these sets are computed, we propose to select the essential ones w.r.t. class separation and generalization as new features. Our experiments have given encouraging results.

HAL

Feature Construction and δ-Free Sets in 0/1 Samples

Author: Boulicaut Jean-François
Gay Dominique
Leschi Claire
Selmaoui-Folcher Nazha
Publication venue: Springer Fachmedien Wiesbaden GmbH
Publication date: 07/10/2016

Hal-Diderot

Data mining bread quality and process data in a plant bakery

Author: Leschi Claire
Morgenstern M. P.
Pfahringer B.
Wilson A. J.
Publication venue: Elsevier
Publication date: 24/05/2004

International audience. In modern automated plant bakeries a large amount of data is collected on the operation of the plant. When this data is combined with product quality data such as loaf colour, appearance, consumer complaints, and sales data, it has the potential to be used to improve processing efficiency, final product quality, and product marketability. However, the huge volume of this data means it is often ignored as being too hard to analyse in any meaningful way. Data mining, which is a combination of techniques that produces information from large data sets, has the potential to be applied to this data to extract useful information. This paper describes our experience of applying data mining techniques to a plant bakery in New Zealand. The process involved setting up the systems required to extract data from the bakery's SCADA system, setting up sensors to automatically measure and record quality parameters, cleaning the data to remove faulty or anomalous results, and then combining all the separate data blocks into one complete database for analysis. Data were analysed at two levels. Firstly, selected data were analysed for simple trends on an individual loaf basis, which served to identify variability caused by divider pockets, tin positioning, etc. Secondly, data mining techniques such as various classifiers and principal components were applied to the whole data set to find relationships between process data and product quality.

HAL

Hal-Diderot

Assessment of discretization techniques for relevant pattern discovery from gene expression data

Author: Leschi Claire
Boulicaut Jean-François
Besson Jérémy
Pensa Ruggero G.
Publication venue: In Press
Publication date: 01/01/2004

In the domain of gene expression data analysis, various researchers have recently emphasized the promising application of pattern discovery techniques like association rule mining or formal concept extraction from boolean matrices that encode gene properties. To get the most from these approaches, a necessary step is gene property encoding (e.g., over-expression), which requires discretizing the raw gene expression data. The impact of this preprocessing step on both the quantity and the relevancy of the extracted patterns is crucial. In this paper, we study the impact of discretization parameters by a sound comparison between the dendrograms, i.e., trees that are generated by a hierarchical clustering algorithm, computed from raw expression data and from the various derived boolean matrices. Thanks to a new similarity measure and practical validation over several gene expression data sets, we propose a method that supports the choice of a discretization technique and its parameters for each specific data set.

CiteSeerX

HAL

Hal-Diderot

Concept of temporal pretopology for the analysis for structural changes. Application to econometrics

Author: Gorohouna Samuel
Leschi Claire
Ris Catherine
Roi Laisa
Selmaoui-Folcher Nazha
Tokotoko Jannaï
Publication venue: IGI Global
Publication date: 01/01/2022

International audience

HAL

Hal-Diderot