
    On the role of pre and post-processing in environmental data mining

    The quality of discovered knowledge is highly dependent on data quality. Unfortunately, real data tend to contain noise, uncertainty, errors, redundancies or even irrelevant information. The more complex the reality to be analyzed, the higher the risk of obtaining low-quality data. Knowledge Discovery from Databases (KDD) offers a global framework for preparing data in the right form to perform correct analyses. On the other hand, the quality of decisions taken on the basis of KDD results depends not only on the quality of the results themselves, but also on the capacity of the system to communicate those results in an understandable form. Environmental systems are particularly complex, and environmental users particularly require clarity in their results. This paper provides some details about how this can be achieved and discusses the role of pre- and post-processing in the whole process of Knowledge Discovery in environmental systems.

    An interval fuzzy model for magnetic monitoring: estimation of a pollution index

    In this contribution, a methodology is reported for building an interval fuzzy model for the pollution index PLI (a composite index based on the concentrations of relevant heavy metals), with magnetic parameters as input variables. In general, modelling based on fuzzy set theory is designed to mimic the way the human brain classifies imprecise information or data. The "interval fuzzy model" reported here, based on fuzzy logic and the arithmetic of fuzzy numbers, calculates an "estimation interval" and appears to be an adequate mathematical tool for this nonlinear problem. In this model, fuzzy c-means clustering is used to partition the data, from which the membership functions and rules are built. In addition, interval arithmetic is used to obtain the fuzzy intervals. The studied datasets are examples of pollution from different anthropogenic sources in two study areas: (a) soil samples collected in Antarctica and (b) road-deposited sediments collected in Argentina. The datasets comprise magnetic and chemical variables, and in both cases the relevant variables were selected: magnetic concentration-dependent variables, magnetic feature-dependent variables and one chemical variable. The model output is an estimation interval whose width depends on the data density at the measured values. The results not only show satisfactory agreement between the estimation interval and the data, but also provide valuable information, through analysis of the rules, for understanding the magnetic behaviour of the studied variables under different conditions.
    Affiliation: Chaparro, Mauro Alejandro Eduardo. Universidad Nacional del Centro de la Provincia de Buenos Aires. Facultad de Ciencias Exactas. Instituto Multidisciplinario de Ecosistemas y Desarrollo Sustentable; Argentina. Consejo Nacional de Investigaciones Científicas y Técnicas. Centro Científico Tecnológico Conicet - Tandil; Argentina
    Affiliation: Chaparro, Marcos Adrián Eduardo. Universidad Nacional del Centro de la Provincia de Buenos Aires. Facultad de Ciencias Exactas. Instituto de Física Arroyo Seco; Argentina. Consejo Nacional de Investigaciones Científicas y Técnicas. Centro Científico Tecnológico Conicet - Tandil; Argentina
    Affiliation: Sinito, Ana Maria. Universidad Nacional del Centro de la Provincia de Buenos Aires. Facultad de Ciencias Exactas. Instituto de Física Arroyo Seco; Argentina. Consejo Nacional de Investigaciones Científicas y Técnicas. Centro Científico Tecnológico Conicet - Tandil; Argentina
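
    The fuzzy c-means partitioning step described in this abstract can be illustrated with a minimal sketch of the standard (Bezdek-style) algorithm. This is not the authors' implementation: the fuzzifier m = 2, the seeded random initialisation, the stopping tolerance and the function name fuzzy_c_means are assumptions chosen for illustration. The later steps (deriving membership functions and rules from the partition, and the interval arithmetic that yields the estimation interval) are not shown.

```python
import math
import random
from typing import List, Sequence, Tuple


def _dist(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean distance between two points."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def fuzzy_c_means(
    data: List[Sequence[float]],
    c: int,
    m: float = 2.0,        # fuzzifier (assumed value, m > 1)
    tol: float = 1e-6,     # stop when memberships change less than this
    max_iter: int = 300,
    seed: int = 0,
) -> Tuple[List[List[float]], List[List[float]]]:
    """Standard fuzzy c-means (illustrative sketch, not the paper's code).

    Returns (centers, memberships); memberships[i][j] is the degree to
    which data[i] belongs to cluster j, and each row sums to 1.
    """
    rng = random.Random(seed)
    n, dim = len(data), len(data[0])

    # Random initial memberships, normalised so each row sums to 1.
    u: List[List[float]] = []
    for _ in range(n):
        row = [rng.random() for _ in range(c)]
        s = sum(row)
        u.append([r / s for r in row])

    centers: List[List[float]] = []
    exponent = 2.0 / (m - 1.0)
    for _ in range(max_iter):
        # Centre update: mean of the data weighted by membership^m.
        centers = []
        for j in range(c):
            w = [u[i][j] ** m for i in range(n)]
            total = sum(w)
            centers.append(
                [sum(w[i] * data[i][k] for i in range(n)) / total for k in range(dim)]
            )

        # Membership update: u_ij = 1 / sum_k (d_ij / d_ik)^(2 / (m - 1)).
        new_u = []
        for i in range(n):
            d = [_dist(data[i], centers[j]) for j in range(c)]
            on_center = [j for j in range(c) if d[j] == 0.0]
            if on_center:
                # Point coincides with one or more centres: share membership among them.
                row = [1.0 / len(on_center) if j in on_center else 0.0 for j in range(c)]
            else:
                row = [1.0 / sum((d[j] / d[k]) ** exponent for k in range(c)) for j in range(c)]
            new_u.append(row)

        shift = max(abs(new_u[i][j] - u[i][j]) for i in range(n) for j in range(c))
        u = new_u
        if shift < tol:
            break
    return centers, u
```

    For example, fuzzy_c_means([[0.0], [0.1], [0.2], [5.0], [5.1], [5.2]], c=2) should place one centre near 0.1 and the other near 5.1; which index each cluster receives depends on the random initialisation. Unlike hard clustering, every point keeps a graded membership in every cluster, which is what makes the partition usable for building fuzzy membership functions.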

    Modelling Execution Tracing Quality by Means of Type-1 Fuzzy Logic

    Execution tracing quality is a crucial characteristic that contributes to overall software product quality, yet present quality frameworks neglect this property. In the scope of this pilot study, the authors introduce a process to create a model describing execution tracing as a quality property; moreover, the performance of the four models created is compared. The process and the models presented are capable of capturing subjective uncertainty, which is an intrinsic part of the quality measurement process. In addition, the possibility of linking the presented models to software product quality frameworks is illustrated.

    Benchmarking in cluster analysis: A white paper

    To achieve scientific progress in terms of building a cumulative body of knowledge, careful attention to benchmarking is of the utmost importance. This means that proposals of new methods of data pre-processing, new data-analytic techniques and new methods of output post-processing should be extensively and carefully compared with existing alternatives, and that existing methods should be subjected to neutral comparison studies. To date, benchmarking and recommendations for benchmarking have mostly appeared in the context of supervised learning. Unfortunately, there has been a dearth of guidelines for benchmarking in an unsupervised setting, with clustering as an important subdomain. To address this problem, the theoretical and conceptual underpinnings of benchmarking in cluster analysis, using simulated as well as empirical data, are discussed. Subsequently, the practicalities of how to address benchmarking questions in clustering are dealt with, and foundational recommendations are made.
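
    A neutral comparison on simulated data with a known ground truth can be sketched as follows. This is a minimal illustration under stated assumptions, not a procedure taken from the white paper: the adjusted Rand index (Hubert and Arabie) is used as the agreement measure, the data are three 1-D Gaussian groups, and the two "methods" (equal_width and quantile binning) are hypothetical stand-ins for real clustering algorithms. Every method is scored on the same replicated datasets so the comparison is paired.

```python
import random
from collections import Counter
from math import comb
from typing import Callable, Dict, List, Sequence, Tuple


def adjusted_rand_index(labels_true: Sequence[int], labels_pred: Sequence[int]) -> float:
    """Adjusted Rand index between two flat partitions (1.0 = identical up to relabelling)."""
    total = comb(len(labels_true), 2)
    if total == 0:
        return 1.0
    pair_counts = Counter(zip(labels_true, labels_pred))  # contingency table cells
    sum_ij = sum(comb(v, 2) for v in pair_counts.values())
    sum_a = sum(comb(v, 2) for v in Counter(labels_true).values())
    sum_b = sum(comb(v, 2) for v in Counter(labels_pred).values())
    expected = sum_a * sum_b / total
    max_index = (sum_a + sum_b) / 2
    if max_index == expected:  # both partitions trivial (one cluster or all singletons)
        return 1.0
    return (sum_ij - expected) / (max_index - expected)


def equal_width(xs: List[float], k: int) -> List[int]:
    """Hypothetical baseline: split the data range into k equal-width bins."""
    lo, hi = min(xs), max(xs)
    width = (hi - lo) / k or 1.0
    return [min(int((x - lo) / width), k - 1) for x in xs]


def quantile(xs: List[float], k: int) -> List[int]:
    """Hypothetical baseline: split the sorted data into k equal-count groups."""
    order = sorted(range(len(xs)), key=lambda i: xs[i])
    labels = [0] * len(xs)
    for rank, i in enumerate(order):
        labels[i] = rank * k // len(xs)
    return labels


def simulate(means: Sequence[float], n_per: int, sd: float, seed: int) -> Tuple[List[float], List[int]]:
    """Simulated 1-D data with known cluster labels (the ground truth)."""
    rng = random.Random(seed)
    xs, ys = [], []
    for label, mu in enumerate(means):
        for _ in range(n_per):
            xs.append(rng.gauss(mu, sd))
            ys.append(label)
    return xs, ys


Method = Callable[[List[float], int], List[int]]


def benchmark(methods: Dict[str, Method], reps: int = 20) -> Dict[str, float]:
    """Mean ARI of each method over the same replicated simulated datasets."""
    means = [0.0, 4.0, 8.0]
    scores = {name: 0.0 for name in methods}
    for rep in range(reps):
        xs, ys = simulate(means, n_per=30, sd=1.0, seed=rep)
        for name, method in methods.items():
            scores[name] += adjusted_rand_index(ys, method(xs, len(means))) / reps
    return scores
```

    Usage: benchmark({"equal_width": equal_width, "quantile": quantile}) returns the mean score per method. Because the simulated labels are known, an external index such as the ARI can be used; with empirical data lacking ground truth, other criteria would be needed.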