    The X-Alter Algorithm: A Parameter-Free Method of Unsupervised Clustering

    Using quantization techniques, Laloë (2010) defined a new clustering algorithm called Alter. This L1-based algorithm is shown to be convergent but suffers from two major flaws: the number of clusters, K, must be supplied by the user, and the computational cost is high. This article adapts the X-means algorithm (Pelleg & Moore, 2000) to solve both problems.

    Machine learning models based on molecular descriptors to predict human and environmental toxicological factors in continental freshwater

    Identifying all relevant substances contributing to ecotoxicity is a real challenge for life cycle assessment practitioners. Once this identification has been made, the lack of corresponding ecotoxicity factors can make the results partial and difficult to interpret. Providing ecotoxicity factors for a wide range of compounds is therefore an important challenge. However, obtaining such factors experimentally is tedious, time-consuming, and costly. A modeling method that could predict these factors from easy-to-obtain information on each chemical would be of great value. Here, we present such a method, based on machine learning algorithms, that uses molecular descriptors to predict two specific endpoints in continental freshwater for ecotoxicological and human impacts. The method performs well on a learning database. Predictions were then derived from the validated model for compounds with missing toxicity/ecotoxicity factors.

    Categorizing chlordecone potential degradation products to explore their environmental fate

    Chlordecone (C10Cl10O; CAS number 143-50-0) has been used extensively as an organochlorine insecticide but is now banned and listed in Annex A of the Stockholm Convention on Persistent Organic Pollutants (POPs). Although experimental evidence of biodegradation of this compound is scarce, several dechlorination products have been proposed by Dolfing et al. (2012) using Gibbs free energy calculations to explore different potential transformation routes. We present here the results of an in silico classification (TyPol - Typology of Pollutants) of chlordecone transformation products (TPs) based on statistical analyses combining several environmental endpoints and structural molecular descriptors. Starting from the list of putative chlordecone TPs and considering available data on degradation routes of other organochlorine compounds, we used different clustering strategies to explore the potential environmental behaviour of putative chlordecone TPs from the knowledge of their molecular descriptors. The method makes it possible to focus on TPs present in different classes and to infer their environmental fate. We have thus deduced some hypothetical trends for the environmental behaviour of chlordecone TPs, assuming that TPs clustered away from chlordecone would have a different environmental fate and ecotoxicological impact compared to chlordecone. Our findings suggest that mono- and di-hydrochlordecone, which are TPs of chlordecone often found in contaminated soils, may have similar environmental behaviour in terms of persistence.

    Identification and characterization of tebuconazole transformation products in soil by combining suspect screening and molecular typology

    Once released into the environment, pesticides generate transformation products (TPs) which may be of (eco-)toxicological importance. Past studies have demonstrated the difficulty of predicting pesticide TP occurrence and environmental risk with the monitoring-driven approaches mostly used in current regulatory frameworks, which target only known toxicologically relevant TPs. We present a novel combined approach which identifies and categorizes known and unknown pesticide TPs in soil by combining suspect screening time-of-flight mass spectrometry with in silico molecular typology. This approach applies an empirical and theoretical pesticide TP library for compound identification by both non-target and target time-of-flight (tandem) mass spectrometry and structural elucidation through a molecular structure correlation program. In silico molecular typology was then used to group the detected TPs according to common molecular descriptors and to indirectly elucidate their environmental properties by analogy to known pesticide compounds having similar molecular descriptors. This approach was evaluated via the identification of TPs of the triazole fungicide tebuconazole occurring in a field dissipation study. Overall, 22 empirical and 12 yet unknown TPs were detected and categorized into three groups with defined environmental properties. This approach combining suspect screening time-of-flight mass spectrometry with molecular typology could be extended to other organic pollutants and used to rationalize the choice of TPs to be intensively studied towards a more comprehensive environmental risk assessment scheme.

    Local regularity estimation

    The goal of this thesis is to study the local behavior of a probability measure, in particular by means of a local regularity index. In the first part, we establish the asymptotic normality of the kn-nearest-neighbor density estimate. In the second, we define a mode estimator under weakened hypotheses. We show that the regularity index plays a role in both problems. Finally, in a third part, we construct various estimators of the regularity index from estimators of the distribution function, for which we provide a literature review.
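
    For readers unfamiliar with the nearest-neighbor density estimate mentioned above, the following is a minimal one-dimensional sketch of the standard textbook form, k / (n * 2 * r_k), where r_k is the distance from x to its k-th nearest sample point. It is not taken from the thesis; the function name `knn_density` and its parameters are assumptions made for illustration.

```python
def knn_density(sample, x, k):
    """Textbook k-nearest-neighbor density estimate at x (1-D case).

    Illustrative sketch only: the name and signature are hypothetical,
    not drawn from the thesis.
    """
    n = len(sample)
    if not 1 <= k <= n:
        raise ValueError("k must satisfy 1 <= k <= len(sample)")
    # Distance from x to each observation, in increasing order.
    distances = sorted(abs(xi - x) for xi in sample)
    r_k = distances[k - 1]  # distance to the k-th nearest neighbor
    if r_k == 0:
        raise ValueError("k-th neighbor coincides with x; estimate undefined")
    # In one dimension the ball of radius r_k has length 2 * r_k.
    return k / (n * 2.0 * r_k)


if __name__ == "__main__":
    print(knn_density([0.0, 1.0, 2.0, 3.0], 1.5, 2))
```

    In practice k grows with the sample size n (the sequence written kn in the abstract); the sketch leaves that choice to the caller.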

    Estimating the distribution function of a random variable is an important part of nonparametric estimation. Many methods have been proposed and studied to effectively modify the raw tool that is the empirical distribution function. In this article, we provide a bibliographic review of the various methods considered in the case of real-valued random variables.
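
    As a point of reference, the empirical distribution function that these methods start from can be sketched as below: F_n(x) is the fraction of observations less than or equal to x. This is the standard definition rather than code from the article, and the name `empirical_cdf` is an assumption made for illustration.

```python
from bisect import bisect_right


def empirical_cdf(sample):
    """Return F_n, the empirical distribution function of a real sample.

    Illustrative sketch only: the name is hypothetical, not from the article.
    """
    ordered = sorted(sample)
    n = len(ordered)
    if n == 0:
        raise ValueError("sample must not be empty")

    def F(x):
        # bisect_right counts the observations that are <= x.
        return bisect_right(ordered, x) / n

    return F


if __name__ == "__main__":
    F = empirical_cdf([3.0, 1.0, 2.0, 2.0])
    print(F(0.5), F(2.0), F(3.0))
```

    The resulting function is a step function with jumps at the observations, which is why smoothed alternatives are of interest.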

    The goal of this thesis is to study the local behavior of a probability measure, using a local regularity index. In the first part, we establish the asymptotic normality of the nearest neighbor density estimate and of the histogram. In the second, we define a mode estimator under weakened hypotheses. We show that the regularity index plays a role in both problems. Finally, in the third part, we construct various estimators of the regularity index. The thesis ends with a review of distribution function estimators.

    From non-parametric estimation to biostatistics

