3,400 research outputs found

    Novel analysis–forecast system based on multi-objective optimization for air quality index

    © 2018 Elsevier Ltd. The air quality index (AQI) is an important indicator of air quality. Owing to the randomness and non-stationarity inherent in AQI, establishing a reasonable analysis–forecast system for AQI remains a challenging task. Previous studies primarily focused on enhancing either forecasting accuracy or stability and failed to improve both aspects simultaneously, leading to unsatisfactory results. In this study, a novel analysis–forecast system is proposed that consists of complexity analysis, data preprocessing, and optimize–forecast modules and addresses the problems of air quality monitoring. The proposed system performs a complexity analysis of the original series based on sample entropy, preprocesses the data using a novel feature selection model that integrates a decomposition technique and an optimization algorithm to remove noise and select the optimal input structure, and then forecasts hourly AQI series using a modified least squares support vector machine optimized by a multi-objective multi-verse optimization algorithm. Experiments based on datasets from eight major cities in China demonstrated that the proposed system can simultaneously obtain high accuracy and strong stability and is thus efficient and reliable for air quality monitoring.

    A systematic review of data quality issues in knowledge discovery tasks

    The volume of data is growing rapidly because organizations continuously capture data to support better decision-making. The most fundamental challenge is to explore these large volumes of data and extract useful knowledge for future actions through knowledge discovery tasks; however, much of the data is of poor quality. We present a systematic review of data quality issues in knowledge discovery tasks and a case study applied to the agricultural disease known as coffee rust.

    Application of a novel early warning system based on fuzzy time series in urban air quality forecasting in China

    © 2018 Elsevier B.V. With atmospheric environmental pollution becoming increasingly serious, developing an early warning system for air quality forecasting is vital to monitoring and controlling air quality. However, considering the large fluctuations in the concentration of pollutants, most previous studies have focused on enhancing accuracy, while few have addressed stability and uncertainty analysis, which may lead to insufficient results. Therefore, a novel early warning system based on fuzzy time series was developed that includes three modules: a deterministic prediction module, an uncertainty analysis module, and an assessment module. In this system, a hybrid model combining the fuzzy time series forecasting technique and data reprocessing approaches was constructed to forecast the major air pollutants. Moreover, an uncertainty analysis was performed to further explore the uncertainties involved in future air quality forecasting. Finally, an assessment module proved the effectiveness of the developed model. The experimental results reveal that the proposed model outperforms the comparison models and baselines, and that both the accuracy and the stability of the developed system are remarkable. Therefore, fuzzy logic is a better option in air quality forecasting, and the developed system will be a useful tool for analyzing and monitoring air pollution.

    Evolving Ensemble Fuzzy Classifier

    The concept of ensemble learning offers a promising avenue for learning from data streams under complex environments because it addresses the bias and variance dilemma better than its single-model counterpart and features a reconfigurable structure, which is well suited to the given context. While various extensions of ensemble learning for mining non-stationary data streams can be found in the literature, most of them are built on a static base classifier and revisit preceding samples in the sliding window for a retraining step. This feature causes computationally prohibitive complexity and is not flexible enough to cope with rapidly changing environments. Their complexity is often demanding because they involve a large collection of offline classifiers, owing to the absence of structural complexity reduction mechanisms and the lack of an online feature selection mechanism. A novel evolving ensemble classifier, namely Parsimonious Ensemble (pENsemble), is proposed in this paper. pENsemble differs from existing architectures in that it is built upon an evolving classifier from data streams, termed Parsimonious Classifier (pClass). pENsemble is equipped with an ensemble pruning mechanism, which estimates a localized generalization error of a base classifier. A dynamic online feature selection scenario is integrated into pENsemble. This method allows for dynamic selection and deselection of input features on the fly. pENsemble adopts a dynamic ensemble structure to output a final classification decision, and it features a novel drift detection scenario to grow the ensemble structure. The efficacy of pENsemble has been demonstrated through rigorous numerical studies with dynamic and evolving data streams, where it delivers the most encouraging performance in attaining a tradeoff between accuracy and complexity. Comment: this paper has been published in IEEE Transactions on Fuzzy Systems.

    Multi-agent system for flood forecasting in Tropical River Basin

    It is well known that the problems related to the generation of floods, their control, and their management have been treated with traditional hydrologic modeling tools focused on the study and analysis of the precipitation-runoff relationship, a physical process that is driven by the hydrological cycle and the climate regime and that is directly proportional to the generation of floodwaters. Within the hydrological discipline, these traditional modeling tools are classified into three principal groups: empirical models ("black-box models"); conceptual models, which are categorized as "lumped", "semi-lumped", or "semi-distributed" according to their spatial distribution; and models based on physical processes, known as "white-box models" or "distributed models". In engineering applications, in turn, the models used in streamflow forecasting are classified into two types according to the measurements and variables they require: "physically based models" and "data-driven models". Physically based models give an in-depth account of the dynamics of the physical processes that occur among the different systems of a given hydrographic basin. However, aside from being laborious to implement, they rely entirely on mathematical algorithms, and understanding these interactions requires the abstraction of mathematical concepts and the conceptualization of the physical processes that are intertwined among these systems. Data-driven models, in contrast, do not require an a priori understanding of the physical laws controlling the process within the system; they rely solely on empirical formulations, which need large amounts of numeric data and on-site calibration. 
These two model types therefore differ markedly in their data requirements and in how they represent physical phenomena. Although hydrologic modeling for flood forecasting has made considerable progress, several significant setbacks remain unresolved: given the stochastic nature of hydrological phenomena, it is still a challenge to implement user-friendly, reusable, robust, and reliable forecasting systems and to handle the amount of uncertainty they must deal with when solving the flood forecasting problem. Moreover, despite the growth of the artificial intelligence (AI) field in recent decades, researchers have rarely attempted to address the stochastic nature of hydrologic events with these techniques. Given these setbacks, this thesis aims to integrate physics-based hydrologic, hydraulic, and data-driven models under the paradigm of multi-agent systems by designing and developing a multi-agent system (MAS) framework for flood forecasting within the scope of tropical watersheds. With the emergence of agent technologies, "agent-based modeling" and "multi-agent systems" simulation methods have been applied to flood-related areas such as flood protection, planning, control, management, mitigation, and forecasting in order to reduce the impact of floods on society; however, all of these focused on evacuation drills, and the forecasting work was not aimed at the tropical river basin, whose hydrological regime is unique. 
In this catchment modeling approach, multi-agent systems are applied as a surrogate for the conventional hydrologic model to build a system that operates at the catchment level with deployed hydrometric stations. The system uses data from hydrometric sensor networks (e.g., rainfall, river stage, river flow) that are captured, stored, and administered by an organization of interacting agents whose main aim is to perform flow forecasting and awareness, and in so doing to enhance the policy-making process at the watershed level. Section one of this document surveys the current state of research in hydrologic modeling for flood forecasting. It reviews the background related to the hydrological process, flood ontologies, management, and forecasting. The section covers, to a certain extent, the techniques, methods, and theoretical aspects of hydrological modeling and its types, from conventional models to present-day artificial intelligence prototypes, with special emphasis on multi-agent systems as the most recent modeling methodology in the hydrological sciences. The section is not intended as an all-inclusive review; rather, its purpose is to serve as a framework for this kind of work and to highlight the significant aspects of the works surveyed. Section two details the conceptual framework for the proposed multi-agent system in support of flood forecasting. This task involved designing and implementing the system's framework with the Belief-Desire-Intention model architecture for flood forecasting in the context of the tropical river basin. 
The contributions of the proposed architecture are the replacement of conventional hydrologic modeling with multi-agent systems, which speeds up the administration of hydrometric time-series data and the modeling of the precipitation-runoff process that leads to flooding in a river course. Another advantage is the user-friendly environment provided by the graphical interface of the proposed multi-agent system platform: the real-time generation of graphs, charts, and monitors with information on the event currently taking place in the catchment makes it easy for viewers with little or no background in data analysis and interpretation to get a visual idea of the available flood-awareness information. The agents developed in this multi-agent modeling framework for flood forecasting have been trained, tested, and validated in a series of experimental tasks, using the hydrometric series of rainfall, river stage, and streamflow data collected by the hydrometric sensor agents from the field hydrometric sensors.
Doctoral Program in Computer Science and Technology, Universidad Carlos III de Madrid. President: María Araceli Sanchis de Miguel. Secretary: Juan Gómez Romero. Member: Juan Carlos Corrale

    Deterministic and Probabilistic Risk Management Approaches in Construction Projects: A Systematic Literature Review and Comparative Analysis

    Risks and uncertainties are inevitable in construction projects and can drastically change the expected outcome, negatively impacting the project's success. However, risk management (RM) is still conducted in a manual, largely ineffective, and experience-based fashion, hindering automation and knowledge transfer in projects. The construction industry is benefitting from the recent Industry 4.0 revolution and the advancements in data science branches, such as artificial intelligence (AI), for the digitalization and optimization of processes. Data-driven methods, e.g., AI and machine learning algorithms, Bayesian inference, and fuzzy logic, are being widely explored as possible solutions to the shortcomings of the RM domain. These methods use deterministic or probabilistic risk reasoning approaches: the former proposes a fixed predicted value, whereas the latter incorporates uncertainty, causal dependencies, and inferences between the variables affecting project risk into the predicted value. This research used a systematic literature review to investigate and comparatively analyze the main deterministic and probabilistic methods applied to construction RM with respect to scope, primary applications, advantages, disadvantages, limitations, and proven accuracy. The findings established recommendations for optimum AI-based frameworks for different management levels (enterprise, project, and operational) and for large or small data sets.

    Time series data mining: preprocessing, analysis, segmentation and prediction. Applications

    Currently, the amount of data produced by information systems is increasing exponentially. This motivates the development of automatic techniques to process and mine these data correctly. Specifically, in this Thesis, we tackled these problems for time series data, that is, temporal data which is collected chronologically. This kind of data can be found in many fields of science, such as palaeoclimatology, hydrology, financial problems, etc. Time series data mining (TSDM) consists of several tasks which try to achieve different objectives, such as classification, segmentation, clustering, prediction, analysis, etc. However, in this Thesis, we focus on time series preprocessing, segmentation and prediction. Time series preprocessing is a prerequisite for subsequent tasks: for example, the reconstruction of missing values in incomplete parts of time series can be essential for clustering them. In this Thesis, we tackled the problem of massive missing data reconstruction in significant wave height (SWH) time series from the Gulf of Alaska. It is very common that buoys stop working for different periods, which is usually related to malfunctioning or bad weather conditions. The relation between the time series of the buoys is analysed and exploited to reconstruct the whole missing time series. In this context, EANNs with PUs are trained, showing that the resulting models are simple and able to recover these values with high precision. In the case of time series segmentation, the procedure consists in dividing the time series into different subsequences to achieve different purposes. This segmentation can be done trying to find useful patterns in the time series. In this Thesis, we have developed novel bioinspired algorithms in this context. For instance, for paleoclimate data, an initial genetic algorithm (GA) was proposed to discover early warning signals of TPs, whose detection was supported by expert opinions. 
However, given that the expert had to individually evaluate every solution given by the algorithm, the evaluation of the results was very tedious. This led to an improvement in the body of the GA to evaluate the procedure automatically. For significant wave height time series, the objective was the detection of groups which contain extreme waves, i.e. those which are relatively large with respect to other waves close in time. The main motivation is to design alert systems. This was done using an HA, where an LS process was included by using a likelihood-based segmentation, assuming that the points follow a beta distribution. Finally, the analysis of similarities in different periods of European stock markets was also tackled with the aim of evaluating the influence of different markets in Europe. When segmenting time series with the aim of reducing the number of points, different techniques have been proposed. However, it remains an open challenge given the difficulty of operating with large amounts of data in different applications. In this work, we propose a novel statistically-driven CRO algorithm (SCRO), which automatically adapts its parameters during the evolution, taking into account the statistical distribution of the population fitness. This algorithm improves the state-of-the-art with respect to accuracy and robustness. Also, this problem has been tackled using an improvement of the BBPSO algorithm, which includes a dynamical update of the cognitive and social components in the evolution, combined with mathematical tricks to obtain the fitness of the solutions, which significantly reduces the computational cost of previously proposed coral reef methods. Also, the optimisation of both objectives (clustering quality and approximation quality), which are in conflict, is an interesting open challenge, which is also tackled in this Thesis. 
For that, an MOEA for time series segmentation is developed, improving the clustering quality of the solutions and their approximation. The prediction in time series is the estimation of future values by observing and studying the previous ones. In this context, we solve this task by applying prediction over high-order representations of the elements of the time series, i.e. the segments obtained by time series segmentation. This is applied to two challenging problems, i.e. the prediction of extreme wave height and fog prediction. On the one hand, the number of extreme values in SWH time series is small with respect to the number of standard values. Consequently, the prediction of these values cannot be done using standard algorithms without taking into account the imbalanced ratio of the dataset. For that, an algorithm that automatically finds the set of segments and then applies EANNs is developed, showing the high ability of the algorithm to detect and predict these special events. On the other hand, fog prediction is affected by the same problem, that is, the number of fog events is much lower than that of non-fog events, requiring a special treatment too. Preprocessed data coming from sensors situated in different parts of the Valladolid airport are used to build a simple ANN model, which is physically corroborated and discussed. The last challenge which opens new horizons is the estimation of the statistical distribution of time series to guide different methodologies. For this, the estimation of a mixed distribution for SWH time series is then used for fixing the threshold of POT approaches. Also, the determination of the fittest distribution for the time series is used for discretising it and making a prediction which treats the problem as ordinal classification. The work developed in this Thesis is supported by twelve papers in international journals, seven papers in international conferences, and four papers in national conferences.

    Flood Forecasting Using Machine Learning Methods

    This book is a printed edition of the Special Issue Flood Forecasting Using Machine Learning Methods that was published in Wate