35 research outputs found

A Complex Mining Process about Air Quality

Author: Brambila Silvia González
Juganaru-Mathieu Mihaela
Publication venue: 'Atlantis Press'
Publication date: 01/01/2013

http://www.atlantis-press.com/php/download_paper.php?id=9649

In this paper we present a mining project that extracts knowledge from public documents on air pollution. Our collection contains annual reports on air quality, acid rain, and climatological conditions in the greater Mexico City area. These reports contain reliable data and are produced by the Department of Environment; they come in a printable format (.pdf files) with page numbers, a table of contents, textual information, numerical information in tables, and images. A human reader cannot read the whole collection and understand its content within a relatively short period (a few days or weeks). An automatic toolbox able to extract knowledge, quickly retrieve important terms, and answer exact questions about precise climate parameters would be an important help for readers. We describe our project, which is based on a text and data mining process; the aims of this complex process are to extract frequent temporal patterns, to extract association rules, and to integrate some simple information retrieval tools. In parallel, data mining techniques will be used to detect the types of data that recur in every report and then to extract a numerical datamart containing climatological data structured by month, year, and geographical area. The datamart will also be analyzed. The main steps of our mining process are: preparing documents (cleaning; removing images, tables of contents, and footnotes), transforming them into structured documents (in an XML format with a precise DTD), indexing, applying various mining algorithms and methods, visualising results, and validating knowledge. We also think that our methodology will apply to other collections of the same category: reliable data and information presented in large periodical reports.

Crossref

HAL Descartes

HAL-EMSE

UJM at CLEF in Author Verification based on optimized classification trees

Author: Frery Jordan
Juganaru-Mathieu Mihaela
Largeron Christine
Publication venue: HAL CCSD
Publication date: 15/09/2014

http://ceur-ws.org/Vol-1180/CLEF2014wn-Pan-FreryEt2014.pdf

This article describes our proposal for the Author Identification task in the PAN CLEF Challenge 2014. We adopted a machine learning approach based on several representations of the texts and on optimized decision trees, which take various attributes as input and are learned separately for every training corpus for this classification task. Our method ranked 2nd overall, with an AUC of 70.7% and a C@1 of 68.4%, and placed between 1st and 6th on the six corpora.
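
The abstract reports a C@1 score without defining it. As a minimal sketch, assuming the definition commonly used for author verification tasks (a score of exactly 0.5 counts as "unanswered", and unanswered problems earn partial credit in proportion to the accuracy on the answered ones), the measure could be computed as follows. The function name, the threshold parameter, and the sample data are illustrative assumptions and do not come from the paper.

```python
def c_at_1(scores, truths, threshold=0.5):
    """C@1 under the assumed definition: (n_correct + n_unanswered * n_correct / n) / n.

    scores: predicted probabilities that the disputed text is by the same author.
    truths: booleans, True when the text really is by the same author.
    A score exactly equal to `threshold` is treated as "no answer".
    """
    n = len(scores)
    n_correct = 0
    n_unanswered = 0
    for score, truth in zip(scores, truths):
        if score == threshold:
            n_unanswered += 1
        elif (score > threshold) == truth:
            n_correct += 1
    return (n_correct + n_unanswered * n_correct / n) / n


# Hypothetical example: two correct answers, one unanswered, one wrong.
example = c_at_1([0.9, 0.2, 0.5, 0.7], [True, False, True, False])
```

Under this definition, leaving a hard problem unanswered costs less than answering it wrongly, which rewards classifiers that know when to abstain.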

HAL-UJM

HAL-EMSE

Utilisation de la langue naturelle pour l'interrogation de documents structurés (Using Natural Language to Query Structured Documents)

Author: Girardot Jean-Jacques
Juganaru-Mathieu Mihaela
Tannier Xavier
Publication venue: HAL CCSD
Publication date: 09/03/2005

http://www.asso-aria.org/coria/2005/19.pdf

The query language is the indispensable interface between the user and the search tool. Reduced to its simplest form when engines essentially index flat documents, it becomes highly complex when it addresses structured documents and must define constraints on both structure and content. The approach described here proposes using natural language as the interface for expressing such queries. The article first describes the successive phases that transform (in an information retrieval setting) the natural language query into a context-independent semantic representation. Simplification rules adapted to the structure and domain of the corpus are then applied, yielding a final form suited to conversion into a formal query language. Finally, the article describes the experiments carried out and draws first conclusions on various aspects of this approach.

HAL-uB

HAL - Université de Franche-Comté

HAL-EMSE

Specification Design for an XML Mining Configurable Application

Author: Galindo Duran Cristal Karina
Juganaru-Mathieu Mihaela
Vazquez Hector Javier
Publication venue: HAL CCSD
Publication date: 16/03/2011

http://www.iaeng.org/publication/IMECS2011/IMECS2011_pp378-381.pdf

This work presents a methodology for XML mining centered on the process of extracting document features related to structure and content. This information is used to obtain similarity measures and to cluster XML documents. A conceptual framework is proposed to design an application with the primary goal of implementing a modular and easily configurable tool for mining large XML document collections.
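
The abstract does not say which features or similarity measures the methodology uses. Purely as an illustration of the general idea (structure features plus content features feeding a similarity score), the sketch below assumes root-to-node tag paths as structural features, lowercased words as content features, and a weighted Jaccard similarity. All function names and the `weight` parameter are hypothetical choices, not the paper's.

```python
import xml.etree.ElementTree as ET


def tag_paths(xml_text):
    """Structural features: the set of root-to-node tag paths."""
    paths = set()

    def walk(node, prefix):
        path = prefix + "/" + node.tag
        paths.add(path)
        for child in node:
            walk(child, path)

    walk(ET.fromstring(xml_text), "")
    return paths


def words(xml_text):
    """Content features: the set of lowercased words in all text nodes."""
    root = ET.fromstring(xml_text)
    return {w.lower() for text in root.itertext() for w in text.split()}


def jaccard(a, b):
    """Jaccard similarity of two sets; two empty sets count as identical."""
    return len(a & b) / len(a | b) if (a or b) else 1.0


def similarity(doc_a, doc_b, weight=0.5):
    """Weighted mix of structural and content similarity (assumed scheme)."""
    structure = jaccard(tag_paths(doc_a), tag_paths(doc_b))
    content = jaccard(words(doc_a), words(doc_b))
    return weight * structure + (1 - weight) * content
```

A pairwise similarity of this kind could then be handed to any standard clustering routine; making `weight` configurable reflects the modular, configurable design the abstract aims for.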

HAL Descartes

Hal-Diderot

HAL-EMSE

Web - évolution pour le meilleur (The Web: Evolution for the Better)

Author: Juganaru-Mathieu Mihaela
Publication venue: HAL CCSD
Publication date: 05/02/2012

HAL-EMSE

Introduccion a la Programmacion (Introduction to Programming)

Author: Juganaru-Mathieu Mihaela
Publication venue: Editorial Patria
Publication date: 01/02/2012

HAL-EMSE

Big Data : Panorama General (Big Data: A General Overview)

Author: Brambila Silvia González
Juganaru-Mathieu Mihaela
Publication venue: HAL CCSD
Publication date: 21/11/2013

HAL Descartes

HAL-EMSE

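Análisis Multivariado y Establecimiento de una Tipología de Especies Arbóreas mediante la categorización de Modalidades (Multivariate Analysis and Establishment of a Typology of Tree Species through the Categorization of Modalities)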

Author: Javier Vazquez Hector
Juganaru-Mathieu Mihaela
Publication venue: HAL CCSD
Publication date: 21/02/2022

Given the broad benefits of trees in urban space, planting policies and maintenance programs should include guidelines for determining which species are most suitable, so that the most suitable tree is chosen for the most suitable planting site. This work carries out an exploratory multivariate study to establish a typology of tree species, considering a number of environment-related characteristics: tolerance to soil salinity, tolerance to temperature, tolerance to drought, and tolerance to mistreatment, as well as tolerance or resistance to different levels of pollution. Given the categorical nature of the available information, which has missing modalities, Multiple Correspondence Analysis (MCA) is proposed for estimating them. Applying MCA with the estimated modalities, together with hierarchical clustering (Clasificación Jerárquica Acumulada) and the categorization of its modalities, suggests a typology of nine groups for the 134 species described, although the possibility remains open of integrating other characteristics and validation criteria from experts in urban arboriculture.

HAL-EMSE

Exploring Urban Tree Site Planting Selection in Mexico City through Association Rules

Author: Javier Vazquez Hector
Juganaru-Mathieu Mihaela
Publication venue: 'Scitepress'
Publication date: 01/01/2016

In this paper we explore association rules to determine planting sites from the characteristics of urban trees. In a first step, itemsets and rules are generated using the unsupervised Apriori algorithm and are rapidly characterized in terms of tree planting sites. In a second step, planting sites are fixed as target values to establish rules (a supervised version of the Apriori algorithm). An original approach for predicting the planting site of a species is also presented and validated.
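
The two steps described in this abstract can be sketched in a few lines. The code below is a simplified illustration, not the authors' implementation: it runs a plain level-wise Apriori to find frequent itemsets, then keeps only rules whose consequent is a single target item (here, a planting site). The item encoding (`attribute=value` strings), the thresholds, and the four sample trees are all hypothetical.

```python
from itertools import combinations


def frequent_itemsets(transactions, min_support):
    """Level-wise Apriori; returns a dict mapping frozenset -> support."""
    transactions = [frozenset(t) for t in transactions]
    n = len(transactions)
    current = [frozenset([i]) for i in {i for t in transactions for i in t}]
    frequent = {}
    k = 1
    while current:
        # Count each candidate and keep those meeting the support threshold.
        level = {}
        for candidate in current:
            support = sum(1 for t in transactions if candidate <= t) / n
            if support >= min_support:
                level[candidate] = support
        frequent.update(level)
        k += 1
        # Join step: merge frequent (k-1)-itemsets into size-k candidates.
        candidates = {a | b for a in level for b in level if len(a | b) == k}
        # Prune step: every (k-1)-subset of a candidate must be frequent.
        current = [c for c in candidates
                   if all(frozenset(s) in level for s in combinations(c, k - 1))]
    return frequent


def rules_for_target(frequent, is_target, min_confidence):
    """Keep rules 'antecedent -> target' with exactly one target item."""
    rules = []
    for itemset, support in frequent.items():
        targets = [i for i in itemset if is_target(i)]
        if len(targets) != 1 or len(itemset) < 2:
            continue
        antecedent = itemset - {targets[0]}
        confidence = support / frequent[antecedent]
        if confidence >= min_confidence:
            rules.append((antecedent, targets[0], support, confidence))
    return rules


# Hypothetical data: each tree is a set of characteristics plus its site.
trees = [
    {"drought=high", "size=small", "site=sidewalk"},
    {"drought=high", "size=small", "site=sidewalk"},
    {"drought=low", "size=large", "site=park"},
    {"drought=high", "size=large", "site=park"},
]
site_rules = rules_for_target(
    frequent_itemsets(trees, min_support=0.5),
    is_target=lambda item: item.startswith("site="),
    min_confidence=0.8,
)
```

On this toy data, `site_rules` contains rules such as `{size=small} -> site=sidewalk`. Restricting the consequent to the target attribute is one simple way to read "supervised" here; the paper's own procedure may differ.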

HAL-UJM

Crossref

HAL-EMSE