2,317 research outputs found

    Experiments in Clustering Homogeneous XML Documents to Validate an Existing Typology

    This paper presents some experiments in clustering homogeneous XML documents to validate an existing classification or, more generally, an organisational structure. Our approach integrates techniques for extracting knowledge from documents with unsupervised classification (clustering) of documents. We focus on the feature selection used for representing documents and its impact on the emerging classification. We mix the selection of structured features with fine-grained textual selection based on syntactic characteristics. We illustrate and evaluate this approach with a collection of Inria activity reports for the year 2003. The objective is to cluster projects into larger groups (Themes), based on the keywords or different chapters of these activity reports. We then compare the clustering results obtained with different feature selections against the official theme structure used by Inria. Comment: (postprint); this version corrects a couple of errors in authors' names in the bibliography.
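The abstract does not say how the clusters are scored against the official Inria themes. Purity is one common external measure; the sketch below assumes it (it may not be the authors' choice), and the function and argument names are hypothetical. Each cluster is credited with the documents of its most frequent reference theme.

```python
from collections import Counter


def purity(clusters, themes):
    """Fraction of documents that fall in the majority reference theme of their cluster.

    clusters: cluster id for each project/document (hypothetical input layout)
    themes:   reference label for each document, e.g. its official theme, same order
    """
    if len(clusters) != len(themes):
        raise ValueError("clusters and themes must have the same length")
    if not themes:
        raise ValueError("need at least one document")
    by_cluster = {}
    for c, t in zip(clusters, themes):
        by_cluster.setdefault(c, Counter())[t] += 1
    # For each cluster, count only the documents of its most frequent theme.
    matched = sum(counter.most_common(1)[0][1] for counter in by_cluster.values())
    return matched / len(themes)
```

A purity of 1.0 means every cluster is drawn from a single official theme; note that purity alone rewards producing many small clusters, so it is usually read alongside the number of clusters.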

    Relating Water Quality and Age in Drinking Water Distribution Systems Using Self-Organising Maps

    Understanding and managing water quality in drinking water distribution systems is essential for public health and wellbeing, but it is challenging because of the number and complexity of the interacting physical, chemical and biological processes occurring within vast, deteriorating pipe networks. In this paper we explore the application of Self-Organising Map (SOM) techniques to derive such understanding from international data sets, demonstrating how multivariate, non-linear techniques can identify relationships in drinking water quality data that are not discernible with univariate and/or linear analysis methods. The paper reports how various microbial parameters correlated with modelled water ages and were influenced by water temperatures in three drinking water distribution systems.
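The SOM configuration used in the paper is not given in the abstract. As a generic illustration only, the sketch below trains a small online Self-Organising Map in pure Python; the grid size, the linearly decaying learning rate and the Gaussian neighbourhood are assumptions, and in practice the water-quality variables would normally be standardised before training.

```python
import math
import random


def best_matching_unit(weights, x):
    """Grid coordinates of the node whose weight vector is closest to x."""
    return min(weights, key=lambda n: sum((wi - xi) ** 2 for wi, xi in zip(weights[n], x)))


def train_som(data, rows, cols, epochs=100, lr0=0.5, sigma0=None, seed=0):
    """Train a rows x cols Self-Organising Map on a list of equal-length vectors."""
    rng = random.Random(seed)
    dim = len(data[0])
    if sigma0 is None:
        sigma0 = max(rows, cols) / 2
    # Initialise each node's weight vector as a copy of a randomly chosen sample.
    weights = {(r, c): list(rng.choice(data)) for r in range(rows) for c in range(cols)}
    total_steps = epochs * len(data)
    step = 0
    for _ in range(epochs):
        order = data[:]
        rng.shuffle(order)
        for x in order:
            frac = step / total_steps
            lr = lr0 * (1 - frac)               # linearly decaying learning rate
            sigma = sigma0 * (1 - frac) + 1e-3  # shrinking neighbourhood radius
            bmu = best_matching_unit(weights, x)
            for node, w in weights.items():
                # Gaussian neighbourhood measured on the map grid around the BMU.
                d2 = (node[0] - bmu[0]) ** 2 + (node[1] - bmu[1]) ** 2
                h = math.exp(-d2 / (2 * sigma ** 2))
                for i in range(dim):
                    w[i] += lr * h * (x[i] - w[i])
            step += 1
    return weights
```

After training, samples that share a best-matching unit (or neighbouring units) can be compared variable by variable; this is one way a SOM can expose multivariate, non-linear associations such as those between microbial parameters, water age and temperature.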

    Benefits of InterSite Pre-Processing and Clustering Methods in E-Commerce Domain

    This paper presents our preprocessing and clustering analysis of the clickstream dataset proposed for the ECML/PKDD 2005 Discovery Challenge. The main contributions of this article are twofold. First, after presenting the clickstream dataset, we show how we build a rich data warehouse based on advanced preprocessing. We take into account the intersite aspects of the given e-commerce domain, which offer an interesting data structure. A preliminary statistical analysis of clickstreams over time periods is given, emphasizing the importance of intersite user visits in this context. Second, we describe our crossed-clustering method, which is applied to data generated from our data warehouse. Our preliminary results are interesting and promising, illustrating the benefits of our Web Usage Mining (WUM) methods, although further investigation on the same dataset is needed.
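The abstract does not detail the preprocessing. A typical Web Usage Mining step is sessionization; the sketch below assumes a 30-minute inactivity timeout (a common heuristic, not taken from the paper) and a hypothetical record format of (user_id, site, unix_time). Sessions are built per user across all sites, so the share of sessions that touch more than one site gives a simple intersite indicator.

```python
from collections import defaultdict


def sessionize(clicks, timeout=30 * 60):
    """Split (user_id, site, unix_time) clicks into per-user sessions.

    A new session starts when the gap since the user's previous click exceeds
    `timeout` seconds. Clicks on different sites stay in the same session,
    so one session can span several sites.
    """
    by_user = defaultdict(list)
    for user, site, ts in clicks:
        by_user[user].append((ts, site))
    sessions = []
    for user, events in by_user.items():
        events.sort()
        current = [events[0]]
        for prev, event in zip(events, events[1:]):
            if event[0] - prev[0] > timeout:
                sessions.append((user, current))
                current = []
            current.append(event)
        sessions.append((user, current))
    return sessions


def intersite_share(sessions):
    """Fraction of sessions that visit more than one site."""
    if not sessions:
        return 0.0
    multi = sum(1 for _, events in sessions if len({site for _, site in events}) > 1)
    return multi / len(sessions)
```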

    Cropping system sensitivity to climate change in the northern uplands of Lao PDR. An agroclimatic modeling approach

    In addition to the ongoing agrarian transition, subsistence agriculture in the northern uplands of Lao PDR will face climate change. The aim of this project was to describe the cropping systems in the northern uplands of Lao PDR and to assess their sensitivity to climate. First, farmers were interviewed to identify the cultivars grown and the dynamics of their crop cycles. Field measurements and analysis of yield data helped determine the level of intensification. From the information collected on cultivars and cropping systems, a simple agroclimatic model, the Potential Yield Estimator (PYE), was calibrated to simulate the growth of 4 cultivars (1 glutinous rice cultivar, 2 maize cultivars and 1 Job's tears cultivar) under potential and water-limited conditions. A virtual experiment was then set up to simulate the growth of these cultivars in cropping systems designed from the collected information. Several levels were tested for the variable input parameters (runoff level, soil available water capacity (AWC), soil depth and sowing date). This virtual experiment, run over 16 years of historical weather data (1985-2000) and 16 years of virtual weather data representing a possible future evolution of the climate, led to an assessment of cropping system sensitivity with respect to several features. The potential yield of the cultivars was analyzed as a function of sowing date. The analysis of water-limited yield and its sensitivity to runoff and soil properties then revealed an optimum sowing window in which water-limited yield is close to potential yield and its interannual variability is low. In general, water-limited yield has low sensitivity to runoff, but its sensitivity to AWC and soil depth (in terms of average yield decrease and interannual variability) increases as the sowing date moves away from the optimum sowing window. Climate change would decrease potential yield but should not critically affect relative water-limited yield or its variability due to soil and runoff properties. Drainage, another output of the model, is expected to increase with climate change, which raises questions about the use of fertilizer to compensate for fertility losses caused by the shortening of fallow periods.
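The abstract describes the optimum sowing window only qualitatively. Assuming per-year simulated yields are available for each sowing date, a window could be screened as below; the thresholds (mean water-limited yield of at least 90% of potential yield, interannual coefficient of variation of at most 0.15), the function name and the data layout are hypothetical, and the PYE model itself is not reproduced.

```python
from statistics import mean, pstdev


def sowing_window(water_limited, potential, min_ratio=0.9, max_cv=0.15):
    """Sowing dates whose water-limited yield stays close to potential yield
    and varies little between years.

    water_limited: {sowing_date: [water-limited yield per year, ...]}
    potential:     {sowing_date: [potential yield per year, ...]} (same years)
    """
    window = []
    for date in sorted(water_limited):
        yields = water_limited[date]
        # Mean ratio of water-limited to potential yield over the simulated years.
        avg_ratio = mean(w / p for w, p in zip(yields, potential[date]))
        # Interannual variability as a coefficient of variation (assumes mean > 0).
        cv = pstdev(yields) / mean(yields)
        if avg_ratio >= min_ratio and cv <= max_cv:
            window.append(date)
    return window
```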

    Exploratory Analysis of Functional Data via Clustering and Optimal Segmentation

    We propose in this paper an exploratory analysis algorithm for functional data. The method partitions a set of functions into K clusters and represents each cluster by a simple prototype (e.g., piecewise constant). The total number of segments in the prototypes, P, is chosen by the user and optimally distributed among the clusters via two dynamic programming algorithms. The practical relevance of the method is shown on two real-world datasets.
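The abstract does not give the two dynamic programs. The sketch below shows only a standard building block they presumably rest on (an assumption): the optimal piecewise-constant approximation of one sampled curve with k segments, minimising squared error in O(k n^2) time using prefix sums. Distributing P segments among K cluster prototypes would need a further, outer recursion that is not shown.

```python
def segment(values, k):
    """Optimal piecewise-constant approximation of `values` with k segments.

    Returns (total squared error, list of (start, end) index pairs, end exclusive).
    """
    n = len(values)
    if not 1 <= k <= n:
        raise ValueError("need 1 <= k <= len(values)")
    s = [0.0] * (n + 1)   # prefix sums
    s2 = [0.0] * (n + 1)  # prefix sums of squares
    for i, v in enumerate(values):
        s[i + 1] = s[i] + v
        s2[i + 1] = s2[i] + v * v

    def cost(i, j):
        # Squared error of approximating values[i:j] by their mean.
        total = s[j] - s[i]
        return (s2[j] - s2[i]) - total * total / (j - i)

    inf = float("inf")
    # best[p][j]: minimal error for values[:j] using exactly p segments.
    best = [[inf] * (n + 1) for _ in range(k + 1)]
    cut = [[0] * (n + 1) for _ in range(k + 1)]
    best[0][0] = 0.0
    for p in range(1, k + 1):
        for j in range(p, n + 1):
            for i in range(p - 1, j):
                c = best[p - 1][i] + cost(i, j)
                if c < best[p][j]:
                    best[p][j], cut[p][j] = c, i
    # Walk back through the stored cut points to recover the segments.
    bounds, j = [], n
    for p in range(k, 0, -1):
        i = cut[p][j]
        bounds.append((i, j))
        j = i
    return best[k][n], bounds[::-1]
```

Running `segment` on a prototype for k = 1, ..., P yields an error-versus-segments table per cluster, which is the kind of input an outer allocation step could work from.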

    Gastroenteritis outbreak caused by public-supply drinking water

    Introduction: The chlorination of public water supplies has led researchers to largely discard drinking water as a potential source of gastroenteritis outbreaks. The aim of this study was to investigate an outbreak of waterborne disease associated with drinking water from public supplies. Methods: A historical (retrospective) cohort study was carried out following notification of a gastroenteritis outbreak in Baqueira (Valle de Arán, Spain). We used systematic sampling to select 87 individuals staying at hotels and 62 staying in apartments in the target area. Information was gathered on four factors (consumption of water from the public water supply, sandwiches, and water and food on the ski slopes) as well as on symptoms. We assessed residual chlorine in drinking water, analyzed samples of drinking water, and studied stool cultures from 4 patients. The risk associated with each water source and food type was assessed by means of relative risk (RR) and 95% confidence intervals (CI). Results: The overall attack rate was 51.0% (76/149). The main symptoms were diarrhea (87.5%), abdominal pain (80.0%), nausea (50.7%), vomiting (30.3%) and fever (27.0%). The only factor associated with a statistically significant risk of disease was consumption of drinking water (RR = 11.0; 95% CI, 1.6-74.7). The drinking water was rated as potable, but no residual chlorine was detected in it. A problem with the location of the chlorinator in the storage tank was observed and corrected. We also recommended a further increase in chlorine levels, which was followed by a reduction in the number of cases. The stool cultures of the four patients were negative for the enterobacteria tested. Conclusions: This study shows that waterborne outbreaks of gastroenteritis can be transmitted through drinking water rated as potable, and suggests the need to improve the microbiological investigation of such outbreaks (detection of viruses and protozoa).
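The reported RR = 11.0 (95% CI, 1.6-74.7) comes from a 2x2 exposure-by-illness table that the abstract does not provide, and the CI method is not stated. The sketch assumes the common Katz log method (a Wald interval on log RR); the function name is illustrative and the counts in any example are hypothetical, not the study's data.

```python
import math


def relative_risk(a, b, c, d, z=1.96):
    """Relative risk with a Katz log-method confidence interval (95% for z = 1.96).

    a: exposed and ill      b: exposed, not ill
    c: unexposed and ill    d: unexposed, not ill
    """
    if a == 0 or c == 0:
        raise ValueError("zero ill count; apply a continuity correction first")
    risk_exposed = a / (a + b)
    risk_unexposed = c / (c + d)
    rr = risk_exposed / risk_unexposed
    # Standard error of log(RR).
    se = math.sqrt(1 / a - 1 / (a + b) + 1 / c - 1 / (c + d))
    lower = math.exp(math.log(rr) - z * se)
    upper = math.exp(math.log(rr) + z * se)
    return rr, lower, upper
```

Because the interval is symmetric on the log scale, it is asymmetric around RR itself, and it becomes very wide when the unexposed group has few cases, which is consistent with the broad interval reported above.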

    AOC reduction by biologically active filtration

    The objective of this project was to provide a practical guide to applying biological treatment techniques to current water treatment operations. The studies focused on producing biologically stable water, on disinfectant stability, and on reducing the formation of disinfection by-products. In particular, the study showed that biological processes can meet both practical needs and the regulatory requirements of the water industry. Monitoring of assimilable organic carbon (AOC) levels in the effluent of the Swimming River Treatment Plant showed that levels >100 µg/L could be related to the occurrence of coliform bacteria in the distribution system and to potential violations of the recently revised U.S. maximum contaminant level (MCL) regulations for coliforms. A treatment goal of <100 µg/L was established for biologically active treatment processes. Granular activated carbon (GAC) filters were found to support a larger bacterial population and thus to provide better biological removal of AOC and total organic carbon (TOC). All biologically active filters performed well with respect to effluent turbidity and headloss development. Preozonation of raw water increased AOC levels by an average of 2.3-fold and always increased filter effluent AOC levels relative to non-ozonated water. Applying free chlorine to GAC filters did not inhibit biological activity, whereas applying chloramines had a slight inhibitory effect relative to free chlorine. Effluent AOC levels averaged 82 µg/L at an empty bed contact time (EBCT) of 5 min and decreased to an average of 57 µg/L at an EBCT of 20 min. EBCT also affected TOC removal, with efficiencies averaging 29, 33, 42 and 51% at EBCTs of 5, 10, 15 and 20 min, respectively. Trihalomethane formation potential (THMFP) was related to TOC levels: processes that decreased TOC levels also decreased THMFP. A preozonated GAC/sand filter (EBCT 10 min) achieved an annual average removal of 54% of THMFP precursors. Post-disinfection of biologically treated effluents reduced heterotrophic plate count (HPC) bacteria by 2 to 2.5 log10. Post-chlorination and post-chloramination of prechlorinated GAC/sand effluents increased AOC levels by 20% and 44%, respectively. Post-disinfection of preozonated water resulted in small (<8%) AOC increases. Despite these increases, prechlorinated water had lower AOC levels than preozonated water, even after post-disinfection.
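To read the reported 2 to 2.5 log10 reduction in HPC counts as percentages: a log10 reduction L corresponds to removing 100 * (1 - 10^(-L)) percent of the organisms, so 2 log10 is 99% and 2.5 log10 is about 99.7%. A minimal pair of helpers (names are illustrative):

```python
import math


def log_reduction(before, after):
    """log10 reduction between counts before and after treatment (both > 0)."""
    return math.log10(before / after)


def log_reduction_to_percent(log10_reduction):
    """Convert a log10 reduction into the percentage of organisms removed."""
    return (1 - 10 ** (-log10_reduction)) * 100
```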