
    On the role of pre and post-processing in environmental data mining

    The quality of discovered knowledge depends strongly on data quality. Unfortunately, real data tend to contain noise, uncertainty, errors, redundancies, and even irrelevant information. The more complex the reality to be analyzed, the higher the risk of obtaining low-quality data. Knowledge Discovery from Databases (KDD) offers a global framework for preparing data in the right form to perform correct analyses. On the other hand, the quality of decisions taken upon KDD results depends not only on the quality of the results themselves, but also on the capacity of the system to communicate those results in an understandable form. Environmental systems are particularly complex, and environmental users particularly require clarity in their results. This paper provides some details about how this can be achieved and discusses the role of pre- and post-processing in the whole process of Knowledge Discovery in environmental systems.

    Application of decision trees and multivariate regression trees in design and optimization

    Induction of decision trees and regression trees is a powerful technique not only for performing ordinary classification and regression analysis but also for discovering the often complex knowledge which describes the input-output behavior of a learning system in qualitative forms. In the area of classification (discrimination analysis), a new technique called IDea is presented for performing incremental learning with decision trees. It is demonstrated that IDea's incremental learning can greatly reduce the spatial complexity of a given set of training examples. Furthermore, it is shown that this reduction in complexity can also be used as an effective tool for improving the learning efficiency of other types of inductive learners such as standard backpropagation neural networks. In the area of regression analysis, a new methodology for performing multiobjective optimization has been developed. Specifically, we demonstrate that multiple-objective optimization through induction of multivariate regression trees is a powerful alternative to the conventional vector optimization techniques. Furthermore, in an attempt to investigate the effect of various types of splitting rules on the overall performance of the optimizing system, we present a tree partitioning algorithm which utilizes a number of techniques derived from diverse fields of statistics and fuzzy logic. These include: two multivariate statistical approaches based on dispersion matrices, an information-theoretic measure of covariance complexity which is typically used for obtaining multivariate linear models, two newly-formulated fuzzy splitting rules based on Pearson's parametric and Kendall's nonparametric measures of association, Bellman and Zadeh's fuzzy decision-maximizing approach within an inductive framework, and finally, the multidimensional extension of a widely-used fuzzy entropy measure. The advantages of this new approach to optimization are highlighted by presenting three examples which respectively deal with design of a three-bar truss, a beam, and an electric discharge machining (EDM) process.
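
    The dissertation's specific splitting rules (dispersion matrices, covariance complexity, fuzzy measures of association) are only paraphrased in the abstract; the following is a minimal sketch of the general idea of a multivariate regression tree split, assuming a generic criterion that minimizes the trace of the within-node target covariance. All names and data are illustrative.

    # Illustrative sketch (not the dissertation's exact algorithm): choose a split
    # for a multivariate regression tree by minimizing within-node dispersion,
    # measured here as the trace of the covariance matrix of the target vectors.
    import numpy as np

    def dispersion(Y):
        """Total dispersion of a set of multivariate targets (trace of covariance)."""
        if len(Y) < 2:
            return 0.0
        return float(np.trace(np.cov(Y, rowvar=False)))

    def best_split(X, Y):
        """Exhaustive search over axis-aligned splits minimizing summed child dispersion."""
        n, d = X.shape
        best = (None, None, np.inf)  # (feature index, threshold, score)
        for j in range(d):
            for t in np.unique(X[:, j])[:-1]:
                left, right = Y[X[:, j] <= t], Y[X[:, j] > t]
                score = dispersion(left) + dispersion(right)
                if score < best[2]:
                    best = (j, t, score)
        return best

    # Example: 100 samples, 3 features, 2 correlated target variables.
    rng = np.random.default_rng(0)
    X = rng.normal(size=(100, 3))
    Y = np.column_stack([X[:, 0] + rng.normal(0, 0.1, 100),
                         2 * X[:, 0] + rng.normal(0, 0.1, 100)])
    print(best_split(X, Y))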

    Typicality, graded membership, and vagueness

    This paper addresses theoretical problems arising from the vagueness of language terms, and intuitions of the vagueness of the concepts to which they refer. It is argued that the central intuitions of prototype theory are sufficient to account for both typicality phenomena and psychological intuitions about degrees of membership in vaguely defined classes. The first section explains the importance of the relation between degrees of membership and typicality (or goodness of example) in conceptual categorization. The second and third sections address arguments advanced by Osherson and Smith (1997), and Kamp and Partee (1995), that the two notions of degree of membership and typicality must relate to fundamentally different aspects of conceptual representations. A version of prototype theory, the Threshold Model, is proposed to counter these arguments, and three possible solutions to the problems of logical self-contradiction and tautology for vague categorizations are outlined. In the final section, graded membership is related to the social construction of conceptual boundaries maintained through language use.
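
    As a purely speculative toy reading of a threshold-style account (not the paper's formal model), graded membership can be pictured as a mapping from a typicality score through a vague threshold region onto degrees between 0 and 1; the ratings and cut-off values below are hypothetical.

    def graded_membership(typicality, lower=0.3, upper=0.7):
        """Below `lower` clearly out (0), above `upper` clearly in (1), graded in between."""
        if typicality <= lower:
            return 0.0
        if typicality >= upper:
            return 1.0
        return (typicality - lower) / (upper - lower)

    # Hypothetical typicality ratings for candidate members of the category "furniture".
    for item, typ in [("chair", 0.95), ("rug", 0.55), ("refrigerator", 0.2)]:
        print(item, graded_membership(typ))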

    Towards general information theoretical representations of database problems


    Combining Representation Learning with Logic for Language Processing

    The current state-of-the-art in many natural language processing and automated knowledge base completion tasks is held by representation learning methods which learn distributed vector representations of symbols via gradient-based optimization. They require little or no hand-crafted features, thus avoiding the need for most preprocessing steps and task-specific assumptions. However, in many cases representation learning requires a large amount of annotated training data to generalize well to unseen data. Such labeled training data is provided by human annotators who often use formal logic as the language for specifying annotations. This thesis investigates different combinations of representation learning methods with logic for reducing the need for annotated training data, and for improving generalization.
    Comment: PhD Thesis, University College London, Submitted and accepted in 201
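
    The abstract does not spell out the models themselves; the sketch below shows one common way to combine distributed representations with a logical rule, assuming a DistMult-style scoring function and a hinge penalty for an implication such as parentOf(x, y) => ancestorOf(x, y). Entity and relation names are hypothetical, not taken from the thesis.

    # Minimal sketch: score knowledge-base triples with learned vectors and add a
    # loss term that softly enforces a first-order implication rule.
    import numpy as np

    rng = np.random.default_rng(0)
    dim = 8
    entities = {"alice": rng.normal(size=dim), "bob": rng.normal(size=dim)}
    relations = {"parentOf": rng.normal(size=dim), "ancestorOf": rng.normal(size=dim)}

    def score(head, rel, tail):
        """DistMult-style score of a triple from its vector representations."""
        return float(np.sum(entities[head] * relations[rel] * entities[tail]))

    def rule_loss(head, tail, body="parentOf", implied="ancestorOf"):
        """Hinge penalty if the implied triple scores lower than the body triple."""
        return max(0.0, score(head, body, tail) - score(head, implied, tail))

    # Logistic loss on an observed fact plus the rule-derived regularization term.
    data_loss = np.log1p(np.exp(-score("alice", "parentOf", "bob")))
    total_loss = data_loss + 0.5 * rule_loss("alice", "bob")
    print(total_loss)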

    Logic Diffusion for Knowledge Graph Reasoning

    Most recent works focus on answering first-order logical queries to explore knowledge graph reasoning via multi-hop logic predictions. However, existing reasoning models are limited by the circumscribed logical paradigms of the training samples, which leads to weak generalization to unseen logic. To address these issues, we propose a plug-in module called Logic Diffusion (LoD) to discover unseen queries from surroundings and achieve a dynamic equilibrium between different kinds of patterns. The basic idea of LoD is relation diffusion and sampling of sub-logic by random walking, together with a special training mechanism called gradient adaption. Besides, LoD is accompanied by a novel loss function to further achieve robust logical diffusion when facing noisy data in training or testing sets. Extensive experiments on four public datasets demonstrate the superiority of mainstream knowledge graph reasoning models equipped with LoD over the state-of-the-art. Moreover, our ablation study proves the general effectiveness of LoD on noise-rich knowledge graphs.
    Comment: 10 pages, 6 figures
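
    As a rough illustration of the random-walk sampling of sub-logic mentioned in the abstract (my own sketch, not the authors' implementation), the code below builds multi-hop query patterns by walking a toy knowledge graph; all triples are made up.

    import random

    # Toy knowledge graph as (head, relation, tail) triples.
    triples = [("alice", "worksAt", "acme"), ("acme", "locatedIn", "paris"),
               ("paris", "capitalOf", "france")]
    adjacency = {}
    for h, r, t in triples:
        adjacency.setdefault(h, []).append((r, t))

    def sample_sub_logic(start, hops, rng=random.Random(0)):
        """Random walk from `start`, returning the relation path and the answer entity."""
        path, node = [], start
        for _ in range(hops):
            if node not in adjacency:
                break
            rel, node = rng.choice(adjacency[node])
            path.append(rel)
        return path, node

    # A 2-hop query pattern discovered by diffusion-style sampling:
    print(sample_sub_logic("alice", 2))  # (['worksAt', 'locatedIn'], 'paris')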

    The time‐cost trade‐off analysis in construction project using computer simulation and interactive procedure

    Several criteria must be considered while preparing the schedule of a construction project. In most cases the completion time and the project cost are analyzed, and the risk related to these criteria has to be taken into account as well. Thus, the project planning problem can be defined as a multicriteria decision problem under risk. In this paper, a project scheduling problem including time‐cost trade‐offs is analyzed. We assume that various resource allocations can be considered. A new technique based on computer simulation and an interactive approach is proposed. In the first step, simulation experiments are performed to evaluate decision alternatives with respect to the criteria. In the second step, the interactive technique INSDECM is employed for generating the final solution of the problem. The procedure uses stochastic dominance rules for comparing decision alternatives with respect to the criteria. A numerical example is presented to illustrate the applicability of the technique. First published online: 21 Oct 2010. Keywords: project planning, multicriteria decision-making automation, time‐cost trade‐off, interactive methodology, simulation
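
    As a generic sketch of how simulated alternatives can be compared with stochastic dominance rules (not the INSDECM procedure itself), the code below simulates completion times for two hypothetical resource allocations and tests first-order stochastic dominance; task duration parameters are assumed.

    # A dominates B under first-order stochastic dominance (FSD) for a "smaller is
    # better" criterion if A's empirical CDF lies everywhere at or above B's.
    import numpy as np

    rng = np.random.default_rng(1)

    def simulate_duration(task_params, n=10000):
        """Monte Carlo completion time of serial tasks with triangular durations."""
        samples = [rng.triangular(lo, mode, hi, n) for lo, mode, hi in task_params]
        return np.sum(samples, axis=0)

    def fsd_dominates(a, b, grid_points=200):
        """True if alternative `a` first-order stochastically dominates `b`."""
        grid = np.linspace(min(a.min(), b.min()), max(a.max(), b.max()), grid_points)
        cdf_a = np.searchsorted(np.sort(a), grid, side="right") / len(a)
        cdf_b = np.searchsorted(np.sort(b), grid, side="right") / len(b)
        return bool(np.all(cdf_a >= cdf_b) and np.any(cdf_a > cdf_b))

    # Alternative A uses more resources (shorter, tighter tasks) than alternative B.
    alt_a = simulate_duration([(4, 5, 7), (3, 4, 6)])
    alt_b = simulate_duration([(5, 7, 10), (4, 6, 9)])
    print("A dominates B:", fsd_dominates(alt_a, alt_b))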

    Multidimensional Poverty in Cameroon: Determinants and Spatial Distribution

    The study examined the usefulness and relevance of the contingent valuation method (CVM) in community-based (CB) project planning and implementation. To elicit willingness to pay (WTP) values for the restocking of Lake Bamendjim with Tilapia nilotica and Heterotis niloticus fish species, the study used pre-tested questionnaires interviewer-administered to 1,000 randomly selected households in the Bambalang Region of Cameroon. The data were elicited with the conventional referendum design and analysed using a referendum model. Empirical findings indicated that about 85% of the sampled households were willing to pay about CFAF 1,054 (US$2.1) for the restocking project. This amount was found to be significantly related to the starting price used in the referendum design, household income, the gender of the respondent, the age of the respondent, household poverty status, and previous participation of a household in a community development project. The findings prompted the following recommendations. Firstly, in order to reduce community burden due to cash constraints, it is advisable for the mean estimate obtained for the scheme to be split into four instalments over a year. Secondly, since the success of the scheme largely depends on the governing roles of the scheme, it is further advisable for the community to allow the management of the scheme to be handled by the elderly community members. Finally, it will be important during the financing of the scheme, to levy wealthier household heads an amount sufficient to subsidize poorer household heads who cannot afford to pay the threshold price.
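
    The study's exact referendum specification is not reproduced here; the sketch below shows a generic single-bounded dichotomous-choice estimate of mean WTP using a logit on the bid price, where mean WTP is taken as -intercept / bid coefficient. The simulated data, bid levels and parameter values are purely hypothetical.

    import numpy as np
    import statsmodels.api as sm

    rng = np.random.default_rng(42)
    n = 1000
    bid = rng.choice([500, 1000, 1500, 2000], size=n)   # starting prices in CFAF (hypothetical)
    true_wtp = rng.normal(1050, 400, size=n)             # latent household WTP (hypothetical)
    answered_yes = (true_wtp >= bid).astype(int)          # household accepts if WTP >= bid

    X = sm.add_constant(bid.astype(float))
    fit = sm.Logit(answered_yes, X).fit(disp=0)
    intercept, bid_coef = fit.params
    print("Estimated mean WTP (CFAF):", -intercept / bid_coef)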

    Soft computing approaches to uncertainty propagation in environmental risk management

    Real-world problems, especially those that involve natural systems, are complex and composed of many nondeterministic components with non-linear coupling. In dealing with such systems, one has to face a high degree of uncertainty and tolerate imprecision. Classical system models based on numerical analysis, crisp logic or binary logic have characteristics of precision and categoricity and are classified as hard computing approaches. In contrast, soft computing approaches such as probabilistic reasoning, fuzzy logic and artificial neural networks have characteristics of approximation and dispositionality. Although in hard computing imprecision and uncertainty are undesirable properties, in soft computing the tolerance for imprecision and uncertainty is exploited to achieve tractability, lower cost of computation, effective communication and a high Machine Intelligence Quotient (MIQ). This thesis explores the use of different soft computing approaches to handle uncertainty in environmental risk management. The work has been divided into three parts consisting of five papers.

    In the first part of this thesis, different uncertainty propagation methods have been investigated. The first methodology is a generalized fuzzy α-cut based on the concept of the transformation method. A case study of uncertainty analysis of pollutant transport in the subsurface has been used to show the utility of this approach, which shows superiority over conventional methods of uncertainty modelling. A second method is proposed to manage uncertainty and variability together in risk models. This new hybrid approach combining probabilistic and fuzzy set theory is called Fuzzy Latin Hypercube Sampling (FLHS). An important property of this method is its ability to separate randomness and imprecision to increase the quality of information. A fuzzified statistical summary of the model results gives indices of sensitivity and uncertainty that relate the effects of variability and uncertainty of input variables to model predictions. The feasibility of the method is validated by analyzing the total variance in the calculation of incremental lifetime risks due to polychlorinated dibenzo-p-dioxins and dibenzofurans (PCDD/F) for the residents living in the surroundings of a municipal solid waste incinerator (MSWI) in the Basque Country, Spain.

    The second part of this thesis deals with the use of artificial intelligence techniques for generating environmental indices. The first paper focused on the development of a Hazard Index (HI) using persistence, bioaccumulation and toxicity properties of a large number of organic and inorganic pollutants. For deriving this index, Self-Organizing Maps (SOM) have been used, which provided a hazard ranking for each compound. Subsequently, an Integral Risk Index was developed taking into account the HI and the concentrations of all pollutants in soil samples collected in the target area. Finally, a risk map was elaborated by representing the spatial distribution of the Integral Risk Index with a Geographic Information System (GIS). The second paper is an improvement of the first work: a new approach called the Neuro-Probabilistic HI was developed by combining SOM and Monte Carlo analysis. It considers the uncertainty associated with the characteristic values of contaminants. This new index seems to be an adequate tool to be taken into account in risk assessment processes. In both studies, the methods have been validated through their implementation in the industrial chemical/petrochemical area of Tarragona.

    The third part of this thesis deals with a decision-making framework for environmental risk management. In this study, an integrated fuzzy relation analysis (IFRA) model is proposed for risk assessment involving multiple criteria. The fuzzy risk-analysis model is proposed to comprehensively evaluate all risks associated with contaminated systems resulting from more than one toxic chemical. The model is an integrated view on uncertainty techniques based on multi-valued mappings, fuzzy relations and the fuzzy analytical hierarchical process. The integration of system simulation and risk analysis using a fuzzy approach made it possible to incorporate system modelling uncertainty and subjective risk criteria. In this study, it has been shown that a broad integration of fuzzy system simulation and fuzzy risk analysis is possible. In conclusion, this study has broadly demonstrated the usefulness of soft computing approaches in environmental risk analysis. The proposed methods could significantly advance the practice of risk analysis by effectively addressing critical issues of the uncertainty propagation problem.
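
    The thesis' generalized fuzzy α-cut method is only summarized above; as a rough sketch of plain α-cut propagation under assumed values (the decay model and the triangular fuzzy rate constant are hypothetical), a fuzzy parameter can be pushed through a model interval by interval.

    import numpy as np

    def alpha_cut(tri, alpha):
        """Interval of a triangular fuzzy number (low, peak, high) at membership level alpha."""
        low, peak, high = tri
        return (low + alpha * (peak - low), high - alpha * (high - peak))

    def propagate(model, tri, alphas=np.linspace(0, 1, 5), n_grid=101):
        """Approximate the fuzzy output: for each α-cut, take min/max of the model over the interval."""
        cuts = {}
        for a in alphas:
            lo, hi = alpha_cut(tri, a)
            grid = np.linspace(lo, hi, n_grid)
            out = model(grid)
            cuts[round(float(a), 2)] = (float(out.min()), float(out.max()))
        return cuts

    # Hypothetical dose model: concentration decays with an uncertain rate constant k.
    decay = lambda k: 100.0 * np.exp(-k * 10.0)
    fuzzy_k = (0.05, 0.10, 0.20)  # assumed triangular fuzzy rate constant
    for a, interval in propagate(decay, fuzzy_k).items():
        print(f"alpha={a}: output interval = {interval}")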