11 research outputs found

    Discretization of Continuous Attributes

    No full text
    7 pages
    In the data mining field, many learning methods, such as association rules, Bayesian networks, and induction rules (Grzymala-Busse & Stefanowski, 2001), can handle only discrete attributes. Therefore, before the machine learning process, each continuous attribute must be re-encoded as a discrete attribute made up of a set of intervals. For example, the age attribute can be transformed into two discrete values representing two intervals: less than 18 (a minor) and 18 or more (of age). This process, known as discretization, is an essential data preprocessing task, not only because some learning methods cannot handle continuous attributes, but also for other important reasons: data transformed into a set of intervals are more cognitively relevant for human interpretation (Liu, Hussain, Tan & Dash, 2002); computation is faster on the reduced data, particularly when attributes for which no relevant cut can be found are removed from the representation space of the learning problem (Mittal & Cheong, 2002); and discretization can capture non-linear relations, e.g., infants and elderly people are more sensitive to illness
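The age example in this abstract can be sketched in a few lines of Python. This is only an illustration of mapping a continuous value to an interval label given fixed cut points; the function name `discretize` and the sample ages are assumptions, and the abstract does not say how cut points are chosen.

```python
from bisect import bisect_right


def discretize(value, cut_points, labels):
    """Return the label of the interval that contains `value`.

    `cut_points` must be sorted in increasing order. With n cut points there
    are n + 1 intervals; a value equal to a cut point goes to the upper one.
    """
    if len(labels) != len(cut_points) + 1:
        raise ValueError("need exactly one more label than cut points")
    return labels[bisect_right(cut_points, value)]


# Hypothetical sample data: one cut at 18, as in the abstract's example.
ages = [3, 17, 18, 42]
age_classes = [discretize(a, [18], ["minor", "of age"]) for a in ages]
```

Here `age_classes` holds one label per age, with 18 itself falling in the "of age" interval.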

    Construction of Decision Trees by Optimization (Construction d'arbres de décision par optimisation)

    No full text
    RIA-ECA series, special issue "Méthodes d'optimisation pour l'ECA" (Optimization Methods for ECA), RSTI02rl
    National audience
    Tree-based learning is poorly suited to very large databases because it requires rescanning the entire database at each node. In this article, we propose restricting the search space to level-wise trees, in which all nodes at a given level are split on the same variable. This simplification turns the learning problem into an optimization problem that admits a greedy strategy requiring only a single pass over the data. For Boolean data, the optimization criterion is the maximization of the coefficient of determination R². For multi-valued categorical data, the problem is reduced to the Boolean case, without changing the complexity of the algorithm, by reasoning in the space of pairs of individuals described by co-labeling indicators. Our optimization strategy penalizes tree depth by using the adjusted R². Experiments showed that the generalization accuracy of level-wise trees is no worse than that of conventional trees

    Can automatically extracted rhythmic units discriminate among languages

    No full text
    This paper deals with rhythmic modeling and its application to language identification. Besides phonetics and phonotactics, rhythm is one of the most promising features to consider for language identification, but significant modeling problems remain unresolved. In this paper, an algorithm dedicated to rhythmic segmentation is described. Experiments are performed on read speech for 5 European languages, and several algorithms are compared. They show that salient features can be automatically extracted from the raw signal and efficiently modeled: a linear discriminant analysis of the extracted features yields 80% correct language identification for the 5 languages, using utterances of 20 s duration. Additional experiments reveal that the automatic rhythmic units also convey speaker-specific features.

    Meta Modeling for Combinatorial Catalyst Optimization

    No full text
    International audience
    Our aim is to find the best catalyst, that is, the best combination of compounds, in order to optimize a chemical reaction. Chemists mainly use heuristic algorithms, especially evolutionary algorithms, to find the best combination. In this paper, we outline a variant of evolutionary optimization called meta modeling. Our idea is to combine a statistical learning algorithm with the optimization process. The goal is to make better use of past experience, the labelled individuals, to guide the search for the optimal solution. The approach is especially useful in combinatorial catalysis optimization because the fitness function is unknown and each labelled individual is obtained through a real chemical reaction, which is costly and time-consuming. We show on a benchmark well known to chemists that our process slightly improves the average performance of standard evolutionary algorithms. However, numerous problems remain open. We try to inventory them in order to define our future work to improve the approach
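The general idea in this abstract, a learned model that pre-screens candidates so that only a few receive a costly real evaluation, might be sketched as follows. This is not the authors' method: the bit-vector encoding of compounds, the k-nearest-neighbour surrogate, the mutation operator, and every name below (`meta_model_search`, `toy_fitness`, `TARGET`) are assumptions chosen for illustration.

```python
import random


def hamming(a, b):
    """Number of positions where two equal-length tuples differ."""
    return sum(x != y for x, y in zip(a, b))


def surrogate_predict(candidate, labelled, k=3):
    """Estimate fitness as the mean score of the k nearest labelled individuals."""
    nearest = sorted(labelled, key=lambda item: hamming(candidate, item[0]))[:k]
    return sum(score for _, score in nearest) / len(nearest)


def mutate(individual, rng, rate=0.2):
    """Flip each bit independently with probability `rate`."""
    return tuple(1 - g if rng.random() < rate else g for g in individual)


def meta_model_search(fitness, n_genes, generations=10, pool_size=30,
                      evals_per_gen=3, seed=0):
    """Evolutionary search where a surrogate filters the candidates that get a
    real (assumed expensive) evaluation. Returns ((best, score), n_evaluations)."""
    rng = random.Random(seed)
    labelled = []  # (individual, real score) pairs: the "past experience"
    for _ in range(evals_per_gen):
        ind = tuple(rng.randint(0, 1) for _ in range(n_genes))
        labelled.append((ind, fitness(ind)))
    for _ in range(generations):
        parents = sorted(labelled, key=lambda item: item[1], reverse=True)[:evals_per_gen]
        pool = [mutate(rng.choice(parents)[0], rng) for _ in range(pool_size)]
        seen = {ind for ind, _ in labelled}
        pool = [c for c in dict.fromkeys(pool) if c not in seen]  # drop duplicates
        # Rank cheaply with the surrogate, then spend real evaluations on the top few.
        pool.sort(key=lambda c: surrogate_predict(c, labelled), reverse=True)
        for cand in pool[:evals_per_gen]:
            labelled.append((cand, fitness(cand)))
    return max(labelled, key=lambda item: item[1]), len(labelled)


# Hypothetical stand-in for a costly chemical experiment: count matches to a hidden target.
TARGET = (1, 0, 1, 1, 0, 0, 1, 0)


def toy_fitness(ind):
    return sum(g == t for g, t in zip(ind, TARGET))


(best, score), n_evals = meta_model_search(toy_fitness, n_genes=len(TARGET))
```

The design choice worth noting is the budget: each generation proposes `pool_size` candidates but performs at most `evals_per_gen` real evaluations, which is the point of a surrogate when each evaluation is a laboratory reaction.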