11 research outputs found

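    Nuovi metodi matematici e informatici per l'ottimizzazione dello studio e dello sfruttamento di risorse naturali [New mathematical and computer science methods for optimizing the study and exploitation of natural resources]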

    Problems involving data of natural origin require specific data analysis techniques. In this thesis we develop new mathematical and computer science models for working with such data, and we apply them to the study of water, food, and atmospheric resources. Specifically, we focus on designing dynamic pre-processing techniques, framed as multivariate optimization problems, to improve the performance of learning algorithms and the interpretability of their results. We obtain a comprehensive methodology that includes feature selection, outlier detection, lag selection and, in a limited way, feature engineering, and we give it a mathematical formalization, which allows us to study its properties and generalizations. At the other end of the optimization process we use classical regression, classification, and clustering algorithms, and we rely on evolutionary algorithms (specifically, the well-known NSGA-II) to perform the optimizations themselves. By applying our method, we generally obtain more relevant and interpretable results than those in the existing literature, and we are able to solve problems that could not be tackled with classical tools. To show that our methods apply to a variety of real-world situations, we consider problems of environmental pollution, characterization of aquifers, detection of illegal fishing, geochemical characterization of medieval buildings, and identification of the geochemical fingerprint of local-production olive oils; all these problems, and their data, come from existing, real, and current projects of different kinds.

    Feature and language selection in temporal symbolic regression for interpretable air quality modelling

    Air quality modelling that relates meteorological, car traffic, and pollution data is a fundamental problem, approached in several different ways in the recent literature. In particular, a set of such data sampled at a specific location and during a specific period of time can be seen as a multivariate time series, and modelling the values of the pollutant concentrations can be seen as a multivariate temporal regression problem. In this paper, we propose a new method for symbolic multivariate temporal regression, and we apply it to several data sets that contain real air quality data from the city of Wrocław (Poland). Our experiments show that our approach is superior to classical ones, especially symbolic ones, both in statistical performance and in the interpretability of the results.

    An intelligent clustering method for devising the geochemical fingerprint of underground aquifers

    Geochemical fingerprinting is a rapidly expanding discipline in the earth and environmental sciences, anchored in the recognition that geological processes leave behind physical, chemical, and sometimes also isotopic patterns in the samples. Furthermore, the geochemical fingerprinting of natural cycles (water, carbon, soil, and biota fingerprinting) is influenced by anthropogenic impact and by climate change, so monitoring it is a tool for resilience and adaptation. In recent years, computational statistics and artificial intelligence methods have started to be used to support the process of geochemical fingerprinting. In this paper we consider data from 57 wells located in the province of Ferrara (Italy), all belonging to the same geological group and separated into 4 different aquifers. The aquifer from which each well extracts its water is known in only 18 of the 57 cases, while in the other 39 cases it can only be hypothesized based on geological considerations. We devise a novel technique for geochemical fingerprinting of groundwater by means of which we are able to identify the exact aquifer from which a sample is extracted with sufficiently high accuracy. Then, we show experimentally that our method is considerably more accurate than typical statistical approaches, such as principal component analysis, for this particular problem.

    Multi-Objective Evolutionary Optimization for Time Series Lag Regression

    It is well known that in some regression problems the effect of an independent variable on the dependent one(s) may be delayed; this phenomenon is known as lag. Lag regression is one of the standard techniques for time series explanation and prediction. However, using lagged variables to transform a multivariate time series so that a propositional algorithm, such as a linear regression learner, can be used requires deciding, at preprocessing time, which independent variables must be lagged and by how much. In this paper, we propose a novel optimization schema to solve this problem. We test our solution, implemented with a multi-objective evolutionary algorithm, on real data taken from a larger project that aims to construct an explanation model for the study of atmospheric pollution in the city of Wrocław (Poland).
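
    As a rough illustration of the lag-selection problem described in this abstract, the sketch below shifts one explanatory series by a candidate delay and keeps the delay that gives the lowest error for a one-variable least-squares fit. It is not the paper's method: the exhaustive single-variable search stands in for the multi-objective evolutionary optimization, and the function names (`lag_series`, `best_lag`) and the toy `traffic` and `pollution` values are invented for the example.

```python
def lag_series(xs, lag):
    """Shift xs by `lag` steps, so position t holds xs[t - lag] (None where undefined)."""
    return [None] * lag + xs[: len(xs) - lag]


def squared_error_of_fit(x, y):
    """Residual sum of squares of the least-squares line y ~ a * x + b."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    a = sxy / sxx if sxx else 0.0
    b = my - a * mx
    return sum((yi - (a * xi + b)) ** 2 for xi, yi in zip(x, y))


def best_lag(x, y, max_lag):
    """Try every lag in 0..max_lag; return the lag with the lowest mean squared error."""
    scores = {}
    for lag in range(max_lag + 1):
        lagged = lag_series(x, lag)
        # Drop the first `lag` time steps, where the lagged value is undefined.
        pairs = [(xi, yi) for xi, yi in zip(lagged, y) if xi is not None]
        xs, ys = zip(*pairs)
        scores[lag] = squared_error_of_fit(xs, ys) / len(pairs)
    return min(scores, key=scores.get), scores


# Toy data (invented): the response follows the explanatory series with a delay of 2 steps.
traffic = [3.0, 7.0, 2.0, 9.0, 4.0, 8.0, 1.0, 6.0, 5.0, 10.0]
pollution = [0.0, 0.0] + [2.0 * v + 1.0 for v in traffic[:-2]]

lag, scores = best_lag(traffic, pollution, max_lag=4)
print(lag)  # the delay with the best fit on this toy data
```

    With several explanatory variables the candidate lags multiply (one delay per variable), which is why an exhaustive search like this one stops being practical and an optimization schema becomes attractive.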

    Temporal Aspects in Air Quality Modeling-A Case Study in Wrocław

    Anthropogenic environmental pollution is a known and indisputable issue, and the importance of searching for reliable mathematical models that help in understanding the underlying process is witnessed by the extensive literature on the topic. In this article, we focus on the temporal aspects of the processes that govern the concentration of pollutants, using typical explanatory variables such as meteorological values and traffic flows. We develop a novel technique based on multi-objective optimization and linear regression to find optimal delays for each variable, and then we apply these delays to our data to evaluate the improvement that can be obtained with respect to learning an explanatory model with standard techniques. We found that optimizing delays can, in some cases, improve the accuracy of the final model up to 15

    Simple Versus Composed Temporal Lag Regression with Feature Selection, with an Application to Air Quality Modeling

    Anthropogenic environmental pollution is a known and indisputable issue, and the need for ever more precise and reliable land use regression models is undeniable. In this paper we consider two years of hourly data taken in Wrocław (Poland), which contain the concentrations of NO2 and NOx in the atmosphere and, along with these, traffic flow, air pressure, humidity, solar duration, temperature, and wind speed. In the quest for an explanation model for the pollution concentrations, we improve and generalize the simple temporal lag regression model, and introduce a composed temporal regression model that entails a transformation of the data to improve the effectiveness of classical learning algorithms. We show that the latter yields more accurate and more interpretable explanation models than the former, and also than the original, non-transformed data.

    Lag Variables in Air Pollution Modeling Based on Traffic Flow and Meteorological Factors

    To refine research on the impact of environmental factors on the concentration of pollutants in the air, in this paper we present a mathematical model that takes into account the past values of factors (explanatory variables) when modeling the current concentration of pollution. We conducted numerical analyses based on hourly data from meteorological, traffic, and air quality monitoring stations in Wrocław (Poland, Central Europe) from 2015 to 2017. To determine the optimal delay of each explanatory variable, we used a multi-objective optimization model (MO). It turned out that, for the concentration of nitrogen oxides, delayed values of traffic flow, wind speed, and sunshine duration are more important than current ones. Then we built two random forest models: an actual model using current values of the explanatory variables, and a lag model using delayed variables determined by the MO method. Taking into account variables with an optimal delay (lag model) increases model accuracy for NO2 from R² = 0.51 to 0.56 and for NOx from 0.46 to 0.52. We deduced that, in modeling pollutant concentrations, the possibility of a greater influence of delayed variables should always be considered, because it can significantly increase the accuracy of the model and indicate additional relationships or dependencies.

    Rule Extraction via Dynamic Discretization with an Application to Air Quality Modelling

    Association rule extraction is a well-known and important problem in machine learning, and especially in the sub-field of explainable machine learning. Association rules are naturally extracted from data sets with Boolean (or at least categorical) attributes. For rule extraction algorithms to be applicable to data sets with numerical attributes as well, data must be suitably discretized, and a great amount of work has been devoted to finding good discretization algorithms, taking into account that optimal discretization is an NP-hard problem. Motivated by a specific application, in this paper we provide a novel discretization algorithm defined as a (heuristic) optimization problem and solved by an evolutionary algorithm, and we test its performance against well-known available solutions, showing experimentally that we are able to extract more rules in an easier way.

    Towards Interval Temporal Logic Rule-Based Classification

    Supervised classification is one of the main computational tasks of modern Artificial Intelligence, and it is used to automatically extract an underlying theory from a set of already classified instances. The available learning schemata are mostly limited to static instances, in which the temporal component of the information is absent, neglected, or abstracted into atemporal data, and purely native temporal classification is still largely unexplored. In this paper, we propose a temporal rule-based classifier based on interval temporal logic that is able to learn a classification model for multivariate classified (abstracted) time series, and we discuss some implementation issues.

    Gene expression profiling of the venom gland from the Venezuelan mapanare (Bothrops colombiensis) using expressed sequence tags (ESTs)
