35 research outputs found

    A Complex Mining Process about Air Quality

    http://www.atlantis-press.com/php/download_paper.php?id=9649
    In this paper we present a mining project for extracting knowledge from public documents on air pollution. Our collection contains annual reports on air quality, acid rain, and climatological conditions in the greater Mexico City area. These reports contain reliable data and are produced by the Department of Environment; they are in a printable format (.pdf files) with page numbers, a table of contents, textual information, numerical information in tables, and images. It is impossible for a human to read and understand the whole collection in a relatively short period (a few days or weeks). An automatic toolbox able to extract knowledge, quickly retrieve important terms, and answer exact questions about precise climate parameters would be an important help for readers. We describe our project, which is based on a text and data mining process; the aims of this complex process are to extract frequent temporal patterns, to extract association rules, and to integrate some simple information retrieval tools. In parallel, data mining techniques will be used to detect the types of data that recur in every report and then to extract a numerical datamart containing climatological data structured by month, year, and geographical area. The datamart will also be analyzed. The main steps of our mining process are: preparing documents (cleaning, removing images, tables of contents, and footnotes), transforming them into structured documents (in an XML format with a precise DTD), indexing, applying various mining algorithms and methods, visualising results, and validating knowledge. We also think that our methodology will apply to other collections of the same category: reliable data and information presented in large periodical reports.

    UJM at CLEF in Author Verification based on optimized classification trees

    http://ceur-ws.org/Vol-1180/CLEF2014wn-Pan-FreryEt2014.pdf
    This article describes our proposal for the Author Identification task in the PAN CLEF Challenge 2014. We adopted a machine learning approach based on several representations of the texts and on optimized decision trees that take various attributes as input and are learned separately for every training corpus for this classification task. Our method ranked 2nd overall, with an AUC of 70.7% and a C@1 of 68.4%, and between 1st and 6th place on the six individual corpora.

    Utilisation de la langue naturelle pour l'interrogation de documents structurés

    http://www.asso-aria.org/coria/2005/19.pdf
    The query language is the indispensable interface between the user and the search tool. Simplified as far as possible when search engines essentially index flat documents, it becomes very complex when it addresses structured documents and constraints must be defined on both structure and content. The approach described here proposes using natural language as the interface for expressing such queries. The article first describes the successive phases that transform (in an information retrieval setting) the natural language query into a context-independent semantic representation. Simplification rules adapted to the structure and domain of the corpus are then applied, yielding a final form suited to conversion into a formal query language. Finally, the article describes the experiments carried out and draws initial conclusions on various aspects of this approach.

    Specification Design for an XML Mining Configurable Application

    http://www.iaeng.org/publication/IMECS2011/IMECS2011_pp378-381.pdf
    This work presents a methodology for XML mining centered on the process of extracting document features related to structure and content. This information is used to obtain similarity measures and to cluster XML documents. A conceptual framework is proposed to design an application with the primary goal of implementing a modular and easily configurable tool for mining large XML document collections.
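
    To make the idea of a structure-based similarity measure concrete, here is a minimal sketch. It is not the method from the paper: it assumes root-to-node tag paths as the structural feature and Jaccard similarity as the measure, and the two sample documents are hypothetical.

```python
import xml.etree.ElementTree as ET


def tag_paths(xml_text):
    """Set of root-to-node tag paths, a simple structural feature of a document."""
    root = ET.fromstring(xml_text)
    paths = set()

    def walk(node, prefix):
        path = f"{prefix}/{node.tag}"
        paths.add(path)
        for child in node:
            walk(child, path)

    walk(root, "")
    return paths


def jaccard(a, b):
    """Jaccard similarity between two feature sets (1.0 when both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


# Hypothetical documents, for illustration only (not from the paper).
DOC_A = "<report><title>t</title><section><table/></section></report>"
DOC_B = "<report><title>u</title><section><p>x</p></section></report>"

# The documents share 3 of 5 distinct tag paths.
similarity = jaccard(tag_paths(DOC_A), tag_paths(DOC_B))
```

    A pairwise matrix of such similarities could then be passed to any standard clustering routine; content features (for example, term sets per element) could be compared the same way.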

    Web - évolution pour le meilleur


    Introduccion a la Programmacion


    Big Data : Panorama General


    Análisis Multivariado y Establecimiento de una Tipología de Especies Arbóreas mediante la categorización de Modalidades

    Considering the broad benefits of trees in urban space, it is important that planting policies and maintenance programs include guidelines for identifying which species are most suitable, so that the right tree is chosen for the right planting site. This work carries out an exploratory multivariate study to establish a typology of tree species, considering a number of environment-related characteristics: tolerance to soil salinity, tolerance to temperature, tolerance to drought, and tolerance to mistreatment, as well as tolerance or resistance to different levels of pollution. Given the categorical nature of the available information, with missing modalities, the use of Multiple Correspondence Analysis (MCA) is proposed to estimate them. Applying MCA with the estimated modalities, together with hierarchical classification (Clasificación Jerárquica Acumulada) and the categorization of its modalities, suggests a typology of nine groups for the 134 species described, although the possibility remains open of integrating other characteristics and validation criteria from experts in urban arboriculture.

    Exploring Urban Tree Site Planting Selection in Mexico City through Association Rules

    In this paper we present an exploration of association rules to determine planting sites from the characteristics of urban trees. In a first step, itemsets and rules are generated using the unsupervised Apriori algorithm. They are rapidly characterized in terms of tree planting sites. In a second step, planting sites are fixed as target values to establish rules (a supervised version of the Apriori algorithm). An original approach is also presented and validated for predicting the planting site of a species.
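
    The two steps described above can be illustrated with a small sketch. It is an assumption-laden toy, not the authors' implementation: the transactions, trait names, and `site=` labels are hypothetical, and the "supervised" step is approximated by keeping only rules whose consequent is a planting-site item.

```python
from itertools import combinations

# Hypothetical toy data: each transaction lists categorical traits of one
# species plus an assumed planting-site label (not taken from the paper).
TRANSACTIONS = [
    {"drought=high", "salinity=high", "site=street"},
    {"drought=high", "salinity=low", "site=street"},
    {"drought=low", "salinity=low", "site=park"},
    {"drought=high", "salinity=high", "site=street"},
    {"drought=low", "salinity=high", "site=park"},
]


def support(itemset, transactions):
    """Fraction of transactions that contain every item of `itemset`."""
    return sum(itemset <= t for t in transactions) / len(transactions)


def apriori(transactions, min_support):
    """Return {frozenset: support} for all frequent itemsets (level-wise search)."""
    items = {item for t in transactions for item in t}
    level = [frozenset([i]) for i in items]
    frequent = {}
    k = 1
    while level:
        # Keep only candidates that reach the support threshold.
        kept = {}
        for c in level:
            s = support(c, transactions)
            if s >= min_support:
                kept[c] = s
        frequent.update(kept)
        k += 1
        # Join step: merge frequent (k-1)-itemsets into k-item candidates,
        # pruning any candidate that has an infrequent (k-1)-subset.
        candidates = {a | b for a in kept for b in kept if len(a | b) == k}
        level = [c for c in candidates
                 if all(frozenset(s) in kept for s in combinations(c, k - 1))]
    return frequent


def site_rules(frequent, min_confidence, target_prefix="site="):
    """Rules whose consequent is a single planting-site item."""
    rules = []
    for itemset, supp in frequent.items():
        for item in itemset:
            antecedent = itemset - {item}
            if item.startswith(target_prefix) and antecedent:
                # Every subset of a frequent itemset is itself frequent,
                # so the antecedent's support is already stored.
                conf = supp / frequent[antecedent]
                if conf >= min_confidence:
                    rules.append((antecedent, item, supp, conf))
    return rules
```

    For example, `site_rules(apriori(TRANSACTIONS, 0.4), 0.9)` returns the high-confidence rules that point to a planting site on this toy data; the thresholds 0.4 and 0.9 are arbitrary choices for illustration.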