16 research outputs found

    Apprivoiser l'hétérogénéité en informatique 1ère année

    National audience. Faced with growing heterogeneity in the computing skills and knowledge of students entering their first year, and the risk that the "new baccalauréat" will exacerbate it, we set out to experiment with a teaching approach that manages this heterogeneity while respecting the constraints of a uniform timetable and constant cost. The actions taken are organized around four areas: forming ability groups, with particular attention to the two extreme levels (reinforcement and advanced/autonomous); regular multiple-choice quizzes (MCQs); occasional use of Problem-Based Learning (PBL); and self-positioning by the students. The experiment is still under way, but initial findings already provide a basis for discussion.

    Mining episode rules in STULONG dataset

    International audience

    Scaling up semi-supervised learning: An efficient and effective LLGC variant

    Domains like text classification can easily supply large amounts of unlabeled data, but labeling itself is expensive. Semi-supervised learning tries to exploit this abundance of unlabeled training data to improve classification. Unfortunately, most of the theoretically well-founded algorithms described in recent years are cubic or worse in the total number of labeled and unlabeled training examples. In this paper we modify the standard LLGC algorithm to improve its efficiency to the point where it can handle datasets with hundreds of thousands of training examples. The modifications are priming of the unlabeled data and, most importantly, sparsification of the similarity matrix. We report promising results on large text classification problems.
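
As a rough illustration of the sparsification idea, the sketch below runs the iterative form of LLGC (Learning with Local and Global Consistency) on a k-nearest-neighbour graph stored as dictionaries, so each propagation step costs time proportional to the number of retained edges rather than to n². It is not the authors' implementation: the function names, the RBF kernel, the brute-force neighbour search, and the parameters `k`, `sigma`, `alpha`, and `n_iter` are all assumptions made for the example, and the priming step mentioned in the abstract is not modelled.

```python
import math


def knn_sparse_affinity(points, k, sigma=1.0):
    """Build a symmetric, kNN-sparsified RBF affinity graph.

    Returns a list of dicts: W[i][j] is the weight of edge (i, j).
    The neighbour search here is brute force (quadratic); a real
    large-scale system would need an index for this step.
    """
    n = len(points)
    W = [dict() for _ in range(n)]
    for i in range(n):
        dists = []
        for j in range(n):
            if i != j:
                d2 = sum((a - b) ** 2 for a, b in zip(points[i], points[j]))
                dists.append((d2, j))
        dists.sort()
        for d2, j in dists[:k]:
            w = math.exp(-d2 / (2 * sigma ** 2))
            W[i][j] = w
            W[j][i] = w  # keep the graph symmetric
    return W


def llgc(W, labels, n_classes, alpha=0.9, n_iter=200):
    """Iterative LLGC: F <- alpha * S * F + (1 - alpha) * Y.

    labels[i] is a class index for labeled points and None otherwise.
    S is the symmetrically normalised affinity D^(-1/2) W D^(-1/2).
    """
    n = len(W)
    deg = [sum(row.values()) for row in W]
    S = [
        {j: w / math.sqrt(deg[i] * deg[j]) for j, w in row.items()}
        for i, row in enumerate(W)
    ]
    Y = [
        [1.0 if labels[i] == c else 0.0 for c in range(n_classes)]
        for i in range(n)
    ]
    F = [row[:] for row in Y]
    for _ in range(n_iter):
        # Each update only touches the stored (sparse) neighbours of i.
        F = [
            [
                alpha * sum(s * F[j][c] for j, s in S[i].items())
                + (1 - alpha) * Y[i][c]
                for c in range(n_classes)
            ]
            for i in range(n)
        ]
    return [max(range(n_classes), key=lambda c: F[i][c]) for i in range(n)]


# Two well-separated clusters, one labeled point in each.
points = [(0.0, 0.0), (0.1, 0.0), (0.2, 0.0), (5.0, 5.0), (5.1, 5.0), (5.2, 5.0)]
labels = [0, None, None, 1, None, None]
predictions = llgc(knn_sparse_affinity(points, k=2), labels, n_classes=2)
```

With `k=2` every point keeps only neighbours from its own cluster, so the two labels spread through their respective clusters without ever forming the dense similarity matrix.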

    Feature Construction and δ-Free Sets in 0/1 Samples

    International audience. Given the recent breakthrough in constraint-based mining of local patterns, we decided to investigate its impact on feature construction for classification tasks. We discuss preliminary results concerning the use of the so-called δ-free sets. Our guess is that their minimality might help to collect important features. Once these sets are computed, we propose to select the essential ones w.r.t. class separation and generalization as new features. Our experiments have given encouraging results.
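
For readers unfamiliar with δ-free sets, the following minimal sketch tests δ-freeness on 0/1 data represented as sets of items, using the standard characterisation that an itemset X is δ-free when no rule X \ {x} ⇒ x holds with at most δ exceptions. The helper names and the toy data are hypothetical, and the selection step with respect to class separation described in the abstract is not shown; `as_feature` only illustrates how one such set could be turned into a new binary attribute.

```python
def support(rows, itemset):
    """Number of rows (sets of items) that contain every item of itemset."""
    return sum(1 for row in rows if itemset <= row)


def is_delta_free(rows, itemset, delta):
    """True if no rule (itemset - {x}) => x has at most delta exceptions.

    The number of exceptions of that rule is
    support(itemset - {x}) - support(itemset).
    """
    itemset = frozenset(itemset)
    full = support(rows, itemset)
    return all(
        support(rows, itemset - {x}) - full > delta
        for x in itemset
    )


def as_feature(rows, itemset):
    """Binary feature: 1 if the row contains the itemset, else 0."""
    itemset = frozenset(itemset)
    return [1 if itemset <= row else 0 for row in rows]


# Hypothetical toy 0/1 sample, one set of "true" attributes per row.
rows = [
    {"a", "b"},
    {"a", "b"},
    {"a"},
    {"b", "c"},
    {"a", "b", "c"},
]

free_at_0 = is_delta_free(rows, {"a", "b"}, delta=0)  # each rule has 1 exception
free_at_1 = is_delta_free(rows, {"a", "b"}, delta=1)  # 1 exception is tolerated
feature_ab = as_feature(rows, {"a", "b"})
```

Raising δ tolerates more exceptions in the rules, so fewer itemsets remain δ-free and the resulting collection of candidate features becomes smaller.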

    Data mining bread quality and process data in a plant bakery

    International audience. In modern automated plant bakeries a large amount of data is collected on the operation of the plant. When this data is combined with product quality data such as loaf colour, appearance, consumer complaints, and sales data, it has the potential to be used to improve processing efficiency, final product quality, and product marketability. However, the huge volume of this data means it is often ignored as being too hard to analyse in any meaningful way. Data mining, which is a combination of techniques that produces information from large data sets, has the potential to be applied to this data to extract useful information. This paper describes our experience of applying data mining techniques to a plant bakery in New Zealand. The process involved setting up the systems required to extract data from the bakery's SCADA system, setting up sensors to automatically measure and record quality parameters, cleaning the data to remove faulty or anomalous results, and then combining all the separate data blocks into one complete database for analysis. Data were analysed at two levels. Firstly, selected data were analysed for simple trends on an individual loaf basis, which served to identify variability caused by divider pockets, tin positioning, etc. Secondly, data mining techniques such as various classifiers and principal components were applied to the whole data set to find relationships between process data and product quality.

    Assessment of discretization techniques for relevant pattern discovery from gene expression data

    In the domain of gene expression data analysis, various researchers have recently emphasized the promising application of pattern discovery techniques such as association rule mining or formal concept extraction from boolean matrices that encode gene properties. To get the most from these approaches, a necessary step is the encoding of gene properties (e.g., over-expression), which requires discretizing the raw gene expression data. The impact of this preprocessing step on both the quantity and the relevancy of the extracted patterns is crucial. In this paper, we study the impact of discretization parameters through a sound comparison between the dendrograms, i.e., trees generated by a hierarchical clustering algorithm, computed from the raw expression data and from the various derived boolean matrices. Thanks to a new similarity measure and practical validation over several gene expression data sets, we propose a method that supports the choice of a discretization technique and its parameters for each specific data set.
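
To make the encoding step concrete, here is a minimal sketch of one possible discretization rule that turns a raw expression matrix into a boolean over-expression matrix. The abstract does not name the techniques it compares, so the rule used here (a gene is marked over-expressed in a sample when its value reaches a fixed fraction of that gene's maximum), the function name, and the `fraction` parameter are assumptions chosen purely for illustration; the sketch also assumes non-negative expression values.

```python
def discretize_over_expression(matrix, fraction=0.9):
    """Encode over-expression as a boolean matrix.

    matrix: list of rows, one row of non-negative expression values per gene.
    A cell is True when the value is at least `fraction` times the
    maximum observed for that gene (an assumed, illustrative rule).
    """
    result = []
    for row in matrix:
        threshold = fraction * max(row)
        result.append([value >= threshold for value in row])
    return result


# Hypothetical expression values: 2 genes measured in 3 samples.
raw = [
    [1.0, 5.0, 10.0],
    [2.0, 2.0, 1.0],
]
encoded = discretize_over_expression(raw, fraction=0.5)
```

Changing `fraction` changes how many cells are set to true, which is the kind of parameter effect the comparison of dendrograms is meant to assess.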