1,667 research outputs found
Numeric prediction of dissolved oxygen status through two-stage training for classification-driven regression
Dissolved oxygen in aquaculture is an important measure of the quality of the culture environment and of how aquatic products have been grown. In the machine learning context, this measure can be obtained by defining a regression problem that aims at numerical prediction of the dissolved oxygen status. In general, the vast majority of popular machine learning algorithms were designed for classification tasks. In order to adopt these algorithms effectively for numerical prediction, in this paper we propose a two-stage training approach that transforms a regression problem into a classification problem and then transforms it back into a regression problem. In particular, unsupervised discretization of continuous attributes is adopted at the first stage to transform the target (numeric) attribute into a discrete (nominal) one with several intervals, such that popular machine learning algorithms can be used to predict the interval to which an instance belongs in the setting of a classification task. Furthermore, based on the classification result at the first stage, some of the instances within the predicted interval are selected for training at the second stage towards numerical prediction of the target attribute value of each instance. An experimental study is conducted to investigate the general effectiveness of the popular learning algorithms in the numerical prediction task and to analyze how increasing the number of training instances (selected at the second training stage) affects the final prediction performance. The results show that the adoption of decision tree learning and neural networks leads to better and more stable performance than Naive Bayes, K Nearest Neighbours and Support Vector Machine.
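The two-stage idea in the abstract above can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes equal-width binning for the unsupervised discretization, a k-nearest-neighbour vote as the stage-one classifier, and a plain mean over the nearest instances inside the predicted interval as the stage-two regressor. The function names and the parameters `n_bins`, `k_class` and `k_reg` are hypothetical.

```python
from collections import Counter


def equal_width_bins(values, n_bins):
    """Build a mapper from a numeric target to an interval index (equal-width, unsupervised)."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins or 1.0  # fall back to 1.0 if all targets are identical

    def to_bin(v):
        # Clamp so the maximum value falls into the last interval.
        return min(int((v - lo) / width), n_bins - 1)

    return to_bin


def sq_dist(a, b):
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def two_stage_predict(X_train, y_train, x, n_bins=3, k_class=3, k_reg=2):
    """Return (predicted interval index, numeric prediction) for one instance x."""
    to_bin = equal_width_bins(y_train, n_bins)
    labels = [to_bin(v) for v in y_train]

    # Stage 1: classify x into an interval by majority vote of its nearest neighbours.
    order = sorted(range(len(X_train)), key=lambda i: sq_dist(X_train[i], x))
    votes = Counter(labels[i] for i in order[:k_class])
    predicted_bin = votes.most_common(1)[0][0]

    # Stage 2: select the k_reg nearest training instances inside the predicted
    # interval and average their numeric targets (an assumed, very simple regressor).
    selected = [i for i in order if labels[i] == predicted_bin][:k_reg]
    return predicted_bin, sum(y_train[i] for i in selected) / len(selected)
```

Raising `k_reg` corresponds loosely to selecting more training instances at the second stage, which is the factor the abstract says the experiments vary.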
On the role of pre and post-processing in environmental data mining
The quality of discovered knowledge depends heavily on data quality. Unfortunately, real data tend to contain noise, uncertainty, errors, redundancies or even irrelevant information. The more complex the reality to be analyzed, the higher the risk of obtaining low-quality data. Knowledge Discovery from Databases (KDD) offers a global framework for preparing data in the right form to perform correct analyses. On the other hand, the quality of decisions taken on the basis of KDD results depends not only on the quality of the results themselves, but also on the capacity of the system to communicate those results in an understandable form. Environmental systems are particularly complex, and environmental users particularly require clarity in their results. This paper provides some details about how this can be achieved and discusses the role of pre- and post-processing in the whole process of Knowledge Discovery in environmental systems.
An assessment of the effectiveness of using data analytics to predict death claim seasonality and protection policy review lapses in a life insurance company
Data analytics tools are becoming increasingly common in the life insurance industry. This research considers two use cases for predictive analytics in a life insurance company based in Ireland. The first case study relates to the use of time series models to forecast the seasonality of death claim notifications. The baseline model predicted no seasonal variation in death claim notifications over a calendar year. This reflects the life insurance company’s current approach, whereby it is assumed that claims are notified linearly over a calendar year. More accurate forecasting of death claims seasonality would enhance the life insurance company’s cashflow planning and analysis of financial results. The performance of five time series models was compared against the baseline model. The time series models included a simple historical average model, a classical SARIMA model, the Random Forest Regressor and Prophet machine learning models and the LSTM deep learning model. The models were trained on both the life insurance company’s historical death claims data and on Irish population deaths data for the 25-74 age cohort over the same observation periods. The results demonstrated that machine learning time series models were generally more effective than the baseline model in forecasting death claim seasonality. It was also demonstrated that models trained on both Irish population deaths and the life insurance company’s historical death claims could outperform the baseline model. The best forecaster was Facebook’s Prophet model, trained on the life insurance company’s claims data. Each of the models trained on Irish population deaths data outperformed the baseline model. The SARIMA and LSTM consistently underperformed the baseline model when both were trained on death claims data. All models performed better when claims directly related to Covid-19 were removed from the testing data. 
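To make the comparison in the first case study concrete, the sketch below contrasts a flat baseline (one twelfth of the annual total per month) with a simple historical-average seasonal model. It is only an illustration under stated assumptions: the monthly counts are synthetic numbers invented for this example, not data from the study, and the function names are hypothetical.

```python
def flat_baseline(annual_total):
    """Baseline assumption: claims arrive evenly, one twelfth of the total each month."""
    return [annual_total / 12] * 12


def historical_average_shares(history):
    """Average each calendar month's share of its year's total across past years."""
    n_years = len(history)
    shares = [0.0] * 12
    for year in history:
        total = sum(year)
        for month, count in enumerate(year):
            shares[month] += count / total / n_years
    return shares


def mean_absolute_error(actual, forecast):
    """Mean absolute difference between observed and forecast monthly counts."""
    return sum(abs(a - f) for a, f in zip(actual, forecast)) / len(actual)


# Synthetic monthly counts (January to December), for illustration only.
history = [
    [14, 12, 10, 9, 8, 7, 7, 8, 9, 10, 12, 14],
    [13, 12, 11, 9, 8, 7, 7, 8, 9, 11, 12, 13],
]
test_year = [14, 13, 10, 9, 8, 7, 7, 8, 9, 10, 12, 13]

annual_total = sum(test_year)
shares = historical_average_shares(history)
seasonal_forecast = [share * annual_total for share in shares]

mae_flat = mean_absolute_error(test_year, flat_baseline(annual_total))
mae_seasonal = mean_absolute_error(test_year, seasonal_forecast)
print(f"flat baseline MAE: {mae_flat:.2f}, historical average MAE: {mae_seasonal:.2f}")
```

The sketch takes the annual total as known so that only the within-year allocation is compared; a real forecast would also need to estimate that total.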
The second case study relates to the use of classification models to predict protection policy lapse behaviour following a policy review. The life insurance company currently has no method of predicting individual policy lapses, hence the baseline model assumed that all policies had an equal probability of lapsing. More accurate prediction of policy review lapse outcomes would enhance the life insurance company’s profit forecasting ability. It would also provide the company with the opportunity to potentially reduce lapse rates at policy review by tailoring alternative options for certain groups of policyholders. The performance of 12 classification models was assessed against the baseline model - KNN, Naïve Bayes, Support Vector Machine, Decision Tree, Random Forest, Extra Trees, XGBoost, LightGBM, AdaBoost and Multi-Layer Perceptron (MLP). To address class imbalance in the data, 11 rebalancing techniques were assessed. These included cost-sensitive algorithms (Class Weight Balancing), oversampling (Random Oversampling, ADASYN, SMOTE, Borderline SMOTE), undersampling (Random Undersampling, and Near Miss versions 1 to 3) as well as a combination of oversampling and undersampling (SMOTETomek and SMOTEENN). When combined with rebalancing methods, the predictive capacity of the classification models outperformed the baseline model in almost every case. However, results varied by train/test split and by evaluation metric. Oversampling models performed best on F1 Score and ROC-AUC while SMOTEENN and the undersampling models generated the highest levels of Recall. The top F1 Score was generated by the Naïve Bayes model when combined with SMOTE. The MLP model generated the highest ROC-AUC when combined with BorderlineSMOTE. The results of both case studies demonstrate that data analytics techniques can enhance a life insurance company’s predictive toolkit. 
It is recommended that further opportunities to enhance the predictive ability of the time series and classification models be explored
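Random oversampling, one of the rebalancing techniques listed in the second case study, can be sketched in a few lines. This is an illustrative standard-library version under the assumption that minority-class rows are duplicated at random until every class matches the largest one; it is not the study's code, and `random_oversample` is a hypothetical name.

```python
import random
from collections import Counter


def random_oversample(X, y, seed=0):
    """Duplicate randomly chosen rows of smaller classes until all classes are equal in size."""
    rng = random.Random(seed)  # fixed seed so the resampling is reproducible
    counts = Counter(y)
    target = max(counts.values())
    X_out, y_out = list(X), list(y)
    for label, n in counts.items():
        pool = [row for row, lab in zip(X, y) if lab == label]
        for _ in range(target - n):
            X_out.append(rng.choice(pool))  # sample with replacement from this class
            y_out.append(label)
    return X_out, y_out
```

Resampling of this kind is normally applied to the training split only, so that the test split keeps the original class proportions.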
Applying Machine Learning to Biological Status (QValues) from Physio-chemical Conditions of Irish Rivers
This thesis evaluates and optimises a variety of predictive models for assessing biological classification status, with an emphasis on water quality monitoring. Grounded in previous pertinent studies, it builds on the findings of Arrighi and Castelli (2023) concerning Tuscany’s river catchments, which highlighted a solid correlation between river ecological status and parameters such as summer climate and land use. They achieved 80% prediction precision using the Random Forest algorithm, which was particularly adept at identifying good ecological conditions, leveraging a dataset devoid of chemical data.
TB208: Biological Water Quality Standards to Achieve Biological Condition Goals in Maine Rivers and Streams: Science and Policy
This publication describes the philosophy, history, methodology, and management applications of numeric biological criteria in water quality standards in Maine. The presentation describes the decision-making process used by the Maine Department of Environmental Protection (MDEP) for assessing attainment of aquatic life uses in water quality standards using benthic macroinvertebrates in Maine streams and rivers including eight case studies of management applications and the improved environmental outcomes that have resulted. The MDEP, University of Maine, and business and nonprofit stakeholders participated in the development and testing of Maine’s numeric biological criteria. This publication further discusses the broader relevance of numeric biological criteria in water quality management at both the state and federal levels and considers parallels and differences between Maine’s biological criteria and other biological assessment methods in the United States and the European Union.
Identifying significant factors and optimal sites for commercial salmon farming in northern Norway. An integrated GIS and machine learning approach using random forest.
This study presents a data-driven modelling approach to identify important factors influencing
the growth and mortality rates of farmed salmon in northern Norway. Furthermore, a model
is trained to determine the best fish farming sites and identify optimal areas with the best
geographical conditions.
Aquaculture site production and location data from 323 salmon farming sites (all licensed
aquaculture sites) in northern Norway were obtained from the Directorate of Fisheries. Two
dependent variables, growth rate and mortality rate, were calculated based on the monthly
increase in biomass and mortality. These variables were combined with state-of-the-art
environmental and exploratory socio-economic data obtained from the Institute of Marine
Research (IMR), the Norwegian Meteorological Institute, Delft University of Technology,
Norwegian Coastal Administration, and Statistics Norway.
Using random forest regression and recursive feature elimination, a data-driven ensemble
approach identified significant variables. Prediction of optimal sites for salmon farming in
northern Norway was done with a species distribution modelling approach using random
forest classification.
The important factors affecting salmon growth were specific feeding rate, temperature, and
total biomass. The important factors influencing salmon mortality were temperature and total
biomass. The predicted optimal areas were inside Vefsnfjorden, Ranfjorden, Sørfjorden and
Glomfjorden, in small areas near the coast and around the small islands stretching from Gladstad
to Narvik, and near the coast of Lofoten, Værøy, Røst, Vesterålen, Sortland and Senja.
Further north, some dispersed regions outside Tromsø and Sørøya were predicted as optimal,
as were large areas around Varangerhalvøya, Olderdalen/Kåfjorden, Lille Altafjorden and near
the shore on both sides of Stjernøysundet.
The results clearly show that space is a scarce resource and that there is an urgent need to
evaluate the regulations and legislation concerning aquaculture in Norway, especially the minimum
distances between the fairways and aquaculture locations. The incorporation of machine
learning approaches in GIS-based MCE analysis is suggested to help planners and decision-makers make informed and sustainable decisions about sea-area use.
Gulf Estuarine Research Society 2014 Meeting
Table of Contents: Thank You to Our Sponsors! -- About the Gulf Estuarine Research Society -- Student Travel Award winners -- Abbreviated Schedule -- 2014 Plenary Speaker – Dr. Michael Osland -- 2014 Plenary Speaker – Dr. Maggie Walser -- Full Schedule -- Poster Session Directory -- Oral Presentation Abstracts -- Poster Presentation Abstracts -- Things to Do in Port Aransas -- Greening the Meeting -- Map of University of Texas Marine Science Institute
Coastal and Estuarine Research Foundation, Port Aransas, Gulf of Mexico Foundation, Coastal Bend Bays & Estuaries Program, Lotek Wireless Fish & Wildlife Monitoring, Sea Grant Mississippi-Alabama, Sea Grant Louisiana, Sea Grant Texas, The University of Austin Marine Science Institute, Mission-Aransas National Estuarine Research Reserve
Marine Scienc
Development of Streams Classification System for Nutrient Criteria in Illinois
USEPA; published or submitted for publication; is peer reviewed