
    On the role of pre and post-processing in environmental data mining

    The quality of discovered knowledge depends heavily on data quality. Unfortunately, real data often contain noise, uncertainty, errors, redundancies, or even irrelevant information. The more complex the reality being analyzed, the higher the risk of obtaining low-quality data. Knowledge Discovery from Databases (KDD) offers a global framework for preparing data in the right form so that correct analyses can be performed. On the other hand, the quality of decisions taken on the basis of KDD results depends not only on the quality of the results themselves, but also on the capacity of the system to communicate those results in an understandable form. Environmental systems are particularly complex, and environmental users particularly require clarity in their results. This paper provides some details on how this can be achieved and discusses the role of pre- and post-processing in the overall process of Knowledge Discovery in environmental systems.

    Assessment of check dams’ role in flood hazard mapping in a semi-arid environment

    This study aimed to examine flood hazard zoning and assess the role of check dams as effective hydraulic structures in reducing flood hazards. To this end, factors associated with topographic, hydrologic and human characteristics were used to develop indices for flood mapping and assessment. These indices and their components were weighted for flood hazard zoning using two methods: (i) a multi-criterion decision-making model in fuzzy logic and (ii) entropy weighting. After preparing the flood hazard map using the above indices and methods, change-point characteristics were used to assess the role of the check dams in reducing flood risk. The method was applied in the Ilanlu catchment, located in the northwest of Hamadan province, Iran, which is prone to frequent flood events. The results showed that the areas of the 'very low', 'low' and 'moderate' flood hazard zones increased from about 2.2% to 7.3%, 8.6% to 19.6% and 22.7% to 31.2%, respectively, after the construction of check dams. Moreover, the areas of the 'high' and 'very high' flood hazard zones decreased from 39.8% to 29.6% and from 26.7% to 12.2%, respectively.
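    The entropy-weight method mentioned in the abstract assigns larger weights to criteria whose values are more dispersed across alternatives. A minimal sketch of that weighting step, assuming a non-negative score matrix of alternatives by criteria (the matrix values below are illustrative, not the study's data):

    ```python
    import numpy as np

    def entropy_weights(X):
        """Entropy-weight method: criteria whose scores vary more across
        alternatives carry more information and receive larger weights.
        X: (m alternatives x n criteria) matrix of non-negative scores."""
        P = X / X.sum(axis=0)                 # column-normalise to proportions
        k = 1.0 / np.log(X.shape[0])          # scales entropy into [0, 1]
        logs = np.where(P > 0, np.log(np.where(P > 0, P, 1.0)), 0.0)
        e = -k * (P * logs).sum(axis=0)       # Shannon entropy per criterion
        d = 1.0 - e                           # degree of diversification
        return d / d.sum()                    # normalised criterion weights
    ```

    The resulting weights can then be combined with the fuzzy multi-criterion scores to rank flood hazard zones.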

    Financial Markets Analysis by Probabilistic Fuzzy Modelling

    For successful trading in financial markets, it is important to develop financial models in which one can identify different states of the market for modifying one's actions. In this paper, we propose to use probabilistic fuzzy systems for this purpose. We concentrate on Takagi-Sugeno (TS) probabilistic fuzzy systems that combine the interpretability of fuzzy systems with the statistical properties of probabilistic systems. We start by recapitulating the general architecture of TS probabilistic fuzzy rule-based systems and summarize the corresponding reasoning schemes. We describe how probabilities can be estimated from a given data set and how a probability distribution can be approximated by a fuzzy histogram. We apply our methodology to financial time series analysis and demonstrate how a probabilistic TS fuzzy system can be identified, assuming that a linguistic term set is given. We illustrate the interpretability of such a system by inspecting the rule bases of our models.
    Keywords: time series analysis; data-driven design; fuzzy reasoning; fuzzy rule base; probabilistic fuzzy systems
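    A fuzzy histogram, as named in the abstract, estimates the probability of each fuzzy set as the normalised sum of the memberships of all observations in that set. A hedged sketch of one common construction, with triangular sets forming a partition of unity over given centers (the centers and data below are assumptions for illustration, not the paper's exact setup):

    ```python
    import numpy as np

    def fuzzy_histogram(data, centers):
        """Approximate a probability distribution by a fuzzy histogram:
        p_j = sum_i mu_j(x_i) / N, with triangular membership functions
        peaked at `centers` whose feet sit on the neighbouring centers,
        so that memberships sum to 1 for every observation."""
        centers = np.asarray(centers, float)
        step = np.diff(centers)
        # clip observations into the covered range so the partition holds
        x = np.clip(np.asarray(data, float), centers[0], centers[-1])
        probs = np.zeros(len(centers))
        for j, c in enumerate(centers):
            a = centers[j - 1] if j > 0 else c - step[0]                  # left foot
            b = centers[j + 1] if j < len(centers) - 1 else c + step[-1]  # right foot
            mu = np.where(x <= c, (x - a) / (c - a), (b - x) / (b - c))
            probs[j] = np.clip(mu, 0.0, 1.0).sum()
        return probs / probs.sum()
    ```

    Because adjacent triangles overlap, an observation contributes fractionally to two neighbouring sets instead of falling into exactly one crisp bin, which smooths the estimated distribution.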

    Characterisation of large changes in wind power for the day-ahead market using a fuzzy logic approach

    Wind power has become one of the renewable resources with major growth in the electricity market. However, due to its inherent variability, forecasting techniques are necessary for the optimum scheduling of the electric grid, especially during ramp events. These large changes in wind power may not be captured by wind power point forecasts, even with very high resolution Numerical Weather Prediction (NWP) models. In this paper, a fuzzy approach for wind power ramp characterisation is presented. The main benefit of this technique is that it avoids a binary definition of a ramp event, allowing the identification of changes in power output that can potentially turn into ramp events even when the total percentage of change required to qualify as a ramp event is not met. To study the application of this technique, wind power forecasts were obtained and their corresponding errors estimated using Genetic Programming (GP) and Quantile Regression Forests. The error distributions were incorporated into the characterisation process, which, according to the results, significantly improves ramp capture. Results are presented using colour maps, which provide a useful way to interpret the characteristics of the ramp events.
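    The abstract's key idea is replacing a binary ramp threshold with a graded membership degree. A minimal sketch of such a fuzzy ramp definition, with a piecewise-linear transition between two illustrative thresholds (the 10% and 20% values are assumptions, not the paper's):

    ```python
    def ramp_membership(delta_pct, low=10.0, high=20.0):
        """Fuzzy degree to which a power change counts as a ramp event.
        |change| <= low  -> 0.0 (clearly not a ramp)
        |change| >= high -> 1.0 (clearly a ramp)
        in between       -> linear transition (a potential ramp).
        Thresholds are illustrative, not taken from the paper."""
        x = abs(delta_pct)
        if x <= low:
            return 0.0
        if x >= high:
            return 1.0
        return (x - low) / (high - low)
    ```

    A change of 15% would thus be flagged with degree 0.5 as a potential ramp, whereas a crisp 20% threshold would discard it entirely.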

    Query-driven learning for predictive analytics of data subspace cardinality

    Fundamental to many predictive analytics tasks is the ability to estimate the cardinality (number of data items) of multi-dimensional data subspaces defined by query selections over datasets. This is crucial for data analysts dealing with, e.g., interactive data subspace explorations, data subspace visualizations, and query processing optimization. However, in many modern data systems, predictive analytics may be (i) too costly money-wise, e.g., in clouds, (ii) unreliable, e.g., in modern Big Data query engines where accurate statistics are difficult to obtain and maintain, or (iii) infeasible, e.g., due to privacy concerns. We contribute a novel, query-driven function estimation model of analyst-defined data subspace cardinality. The proposed estimation model is highly accurate in terms of prediction and accommodates the well-known selection query types: multi-dimensional range queries and distance-based nearest-neighbour (radius) queries. Our function estimation model: (i) quantizes the vectorial query space by learning the analysts' access patterns over a data space, (ii) associates query vectors with the corresponding cardinalities of the analyst-defined data subspaces, (iii) abstracts and employs query vectorial similarity to predict the cardinality of an unseen/unexplored data subspace, and (iv) identifies and adapts to possible changes of the query subspaces based on the theory of optimal stopping. The proposed model is decentralized, facilitating the scaling-out of such predictive analytics queries. The research significance of the model lies in that (i) it is an attractive solution when data-driven statistical techniques are undesirable or infeasible, (ii) it offers a scale-out, decentralized training solution, (iii) it is applicable to different selection query types, and (iv) it offers performance superior to that of data-driven approaches.
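    The core prediction step described in the abstract, quantizing the query space and predicting the cardinality of an unseen query from similar past queries, can be sketched roughly as follows. This is a hedged reading under assumed names: `prototypes` stands for the learned (query vector, mean cardinality) pairs, and the similarity-weighted average is one plausible choice, not necessarily the paper's exact predictor:

    ```python
    import math

    def predict_cardinality(query, prototypes):
        """Predict the cardinality of an unseen query vector as a
        distance-weighted average over quantised query-space prototypes.
        prototypes: list of (query_vector, mean_cardinality) pairs learned
        from past analyst queries (illustrative structure, not the paper's)."""
        def dist(u, v):
            return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))
        # inverse-distance weights; epsilon avoids division by zero on exact hits
        weights = [1.0 / (1e-9 + dist(query, q)) for q, _ in prototypes]
        total = sum(weights)
        return sum(w * card for w, (_, card) in zip(weights, prototypes)) / total
    ```

    Because the predictor consults only the learned query prototypes and never the data itself, it remains usable when statistics are stale or the underlying data is inaccessible, which matches the abstract's motivation.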