176 research outputs found

    A review of recent machine learning advances for forecasting harmful Algal Blooms and shellfish contamination

    Harmful algal blooms (HABs) are among the most severe ecological marine problems worldwide. Under favorable climate and oceanographic conditions, toxin-producing microalgae species may proliferate, reach increasingly high cell concentrations in seawater, accumulate in shellfish, and threaten the health of seafood consumers. There is an urgent need for effective tools that help shellfish farmers anticipate and cope with HAB events and shellfish contamination, which frequently lead to significant negative economic impacts. Statistical and machine learning forecasting tools have been developed to better inform the shellfish industry so that it can limit damage, improve mitigation measures, and reduce production losses. This study presents a synoptic review of trends in machine learning methods for predicting HABs and shellfish biotoxin contamination, with a particular focus on autoregressive models, support vector machines, random forests, probabilistic graphical models, and artificial neural networks (ANN). Most efforts to forecast HABs have relied on models of increasing complexity over the years, coupled with increasing multi-source data availability, with ANN architectures at the forefront. The purpose of this review is to help define machine learning-based strategies that support the shellfish industry in managing harvesting and production, and that support decision making by governmental agencies with environmental responsibilities. Funding: CEECINST/00102/2018, UIDB/04516/2020, UIDB/00297/2020, UIDB/50021/2020, UID/Multi/04326/2020.

    Modeling Traffic Accidents in the City of Bandar Lampung Using the Autoregressive Integrated Moving Average (ARIMA) Method

    ABSTRACT An accident is an unexpected event on the road that involves vehicles and results in casualties or material losses. Accidents are also described as multi-factor events, meaning that more than one cause contributes to their occurrence. This study aims to find the best time series model using the ARIMA method and to predict the future number of traffic accidents in the city of Bandar Lampung. The study uses secondary data obtained from Satlantas Polresta Bandar Lampung (the traffic unit of the Bandar Lampung city police), namely monthly accident counts from 2015 to 2021, and is classified as quantitative research. The number of accidents over the next 7 months is predicted with an ARIMA model, that is, a forecast based on past accident data, with the R-Gui application used to obtain the best model. The final result is that ARIMA(2,1,0) is the best model for forecasting the number of accidents, and the forecasts show that the number of accidents tends to stay flat from month to month. Keywords: traffic accidents; forecasting; ARIMA
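
To make the ARIMA(2,1,0) structure named in the abstract above concrete, the following is a minimal sketch, not the study's R procedure: it differences the series once (d = 1), fits the two autoregressive coefficients by ordinary least squares (an assumption; R's fitting routine uses a likelihood-based estimator and may give different values), and integrates the forecasts back to the original scale. The function names and the monthly counts are hypothetical and are not taken from the paper.

```python
def difference(series):
    """First-order differencing: the d=1 part of ARIMA(2,1,0)."""
    return [b - a for a, b in zip(series, series[1:])]


def fit_ar2(diffs):
    """Fit d[t] = phi1*d[t-1] + phi2*d[t-2] by ordinary least squares.

    Solves the 2x2 normal equations directly (no intercept term).
    """
    rows = range(2, len(diffs))
    s11 = sum(diffs[t - 1] ** 2 for t in rows)
    s22 = sum(diffs[t - 2] ** 2 for t in rows)
    s12 = sum(diffs[t - 1] * diffs[t - 2] for t in rows)
    b1 = sum(diffs[t] * diffs[t - 1] for t in rows)
    b2 = sum(diffs[t] * diffs[t - 2] for t in rows)
    det = s11 * s22 - s12 * s12
    if det == 0:
        raise ValueError("lagged differences are collinear; cannot fit AR(2)")
    phi1 = (b1 * s22 - b2 * s12) / det
    phi2 = (b2 * s11 - b1 * s12) / det
    return phi1, phi2


def forecast_arima_210(series, steps):
    """Forecast `steps` values ahead with a least-squares ARIMA(2,1,0)."""
    if len(series) < 5:
        raise ValueError("need at least 5 observations")
    diffs = difference(series)
    phi1, phi2 = fit_ar2(diffs)
    history = list(diffs)
    level = series[-1]
    forecasts = []
    for _ in range(steps):
        next_diff = phi1 * history[-1] + phi2 * history[-2]
        history.append(next_diff)
        level += next_diff  # undo the differencing
        forecasts.append(level)
    return forecasts


if __name__ == "__main__":
    # Hypothetical monthly accident counts, for illustration only.
    counts = [42, 38, 45, 50, 47, 41, 39, 44, 48, 52, 46, 43]
    print(forecast_arima_210(counts, steps=7))
```

Because the model has no moving-average term and no constant, multi-step forecasts of a stationary AR(2) on the differences settle toward a constant level, which is consistent with the flat forecasts the abstract reports.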

    Sustainable marine ecosystems: deep learning for water quality assessment and forecasting

    Appropriate management of the resources available within oceans and coastal regions is vital to guarantee their sustainable development and preservation, and water quality is a key element of it. Leveraging a combination of cross-disciplinary technologies including Remote Sensing (RS), Internet of Things (IoT), Big Data, cloud computing, and Artificial Intelligence (AI) is essential to attain this aim. In this paper, we review methodologies and technologies for water quality assessment that contribute to the sustainable management of marine environments. Specifically, we focus on Deep Learning (DL) strategies for water quality estimation and forecasting. The analyzed literature is classified by type of task, scenario, and architecture. Moreover, several applications including coastal management and aquaculture are surveyed. Finally, we discuss open issues still to be addressed and potential research lines in which transfer learning, knowledge fusion, reinforcement learning, edge computing, and decision-making policies are expected to play the main roles.

    Spatial-Temporal Data Mining for Ocean Science: Data, Methodologies, and Opportunities

    With the increasing amount of spatial-temporal (ST) ocean data, numerous spatial-temporal data mining (STDM) studies have been conducted to address various oceanic issues, e.g., climate forecasting and disaster warning. Compared with typical ST data (e.g., traffic data), ST ocean data is more complicated and has some unique characteristics, e.g., diverse regionality and high sparsity. These characteristics make it difficult to design and train STDM models. Unfortunately, an overview of these studies is still missing, which hinders computer scientists from identifying the research issues in ocean science and discourages ocean scientists from applying advanced STDM techniques. To remedy this situation, we provide a comprehensive survey that summarizes existing STDM studies for the ocean. Concretely, we first summarize the widely used ST ocean datasets and identify their unique characteristics. Then, typical ST ocean data quality enhancement techniques are discussed. Next, we classify existing STDM studies for the ocean into four types of tasks, i.e., prediction, event detection, pattern mining, and anomaly detection, and elaborate on the techniques for these tasks. Finally, promising research opportunities are highlighted. This survey will help scientists from both computer science and ocean science gain a better understanding of the fundamental concepts, key techniques, and open challenges of STDM for the ocean.

    Novel analysis–forecast system based on multi-objective optimization for air quality index

    © 2018 Elsevier Ltd. The air quality index (AQI) is an important indicator of air quality. Owing to the randomness and non-stationarity inherent in AQI, establishing a reasonable analysis–forecast system for AQI remains a challenging task. Previous studies primarily focused on enhancing either forecasting accuracy or stability and failed to improve both simultaneously, leading to unsatisfactory results. In this study, a novel analysis–forecast system is proposed that consists of complexity analysis, data preprocessing, and optimize–forecast modules and addresses the problems of air quality monitoring. The proposed system first performs a complexity analysis of the original series based on sample entropy. It then preprocesses the data using a novel feature selection model that integrates a decomposition technique and an optimization algorithm to remove noise and select the optimal input structure. Finally, it forecasts the hourly AQI series with a modified least squares support vector machine optimized by a multi-objective multi-verse optimization algorithm. Experiments based on datasets from eight major cities in China demonstrated that the proposed system can simultaneously obtain high accuracy and strong stability and is thus efficient and reliable for air quality monitoring.
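
The abstract above bases its complexity analysis on sample entropy. As a rough illustration of that measure only, the sketch below follows the usual textbook definition: count pairs of length-m templates that match within a tolerance r, count how many of those still match at length m + 1, and take the negative log of the ratio. The parameter defaults and the choice of an absolute tolerance are assumptions (a common convention instead scales r by the standard deviation of the series), and the paper's own settings are not given in the abstract.

```python
import math


def sample_entropy(series, m=2, r=0.2):
    """Sample entropy of a sequence, using Chebyshev distance.

    m is the template length and r an absolute matching tolerance.
    Returns float('inf') when no template pair matches.
    """
    n = len(series)
    if n <= m + 1:
        raise ValueError("series is too short for the chosen m")

    def count_matches(length):
        # Use n - m templates for both lengths so the counts are comparable.
        templates = [series[i:i + length] for i in range(n - m)]
        matches = 0
        for i in range(len(templates)):
            for j in range(i + 1, len(templates)):  # self-matches excluded
                dist = max(abs(a - b)
                           for a, b in zip(templates[i], templates[j]))
                if dist <= r:
                    matches += 1
        return matches

    b = count_matches(m)      # matching pairs at length m
    a = count_matches(m + 1)  # pairs that still match at length m + 1
    if a == 0 or b == 0:
        return float("inf")
    return -math.log(a / b)
```

Lower values indicate a more regular, predictable series. A perfectly periodic input gives a value of zero, while a series with no repeated patterns within the tolerance gives infinity under this convention.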

    Application of a novel early warning system based on fuzzy time series in urban air quality forecasting in China

    © 2018 Elsevier B.V. With atmospheric environmental pollution becoming increasingly serious, developing an early warning system for air quality forecasting is vital to monitoring and controlling air quality. However, given the large fluctuations in pollutant concentrations, most previous studies have focused on enhancing accuracy, while few have addressed stability and uncertainty analysis, which may lead to insufficient results. Therefore, a novel early warning system based on fuzzy time series was developed that includes three modules: a deterministic prediction module, an uncertainty analysis module, and an assessment module. In this system, a hybrid model combining the fuzzy time series forecasting technique and data reprocessing approaches was constructed to forecast the major air pollutants. Moreover, an uncertainty analysis was carried out to further explore the uncertainties involved in future air quality forecasting. Finally, an assessment module demonstrated the effectiveness of the developed model. The experimental results reveal that the proposed model outperforms the comparison models and baselines, and that both the accuracy and the stability of the developed system are remarkable. Therefore, fuzzy logic is a better option in air quality forecasting, and the developed system will be a useful tool for analyzing and monitoring air pollution.

    Data augmentation to improve the performance of ensemble learning for system failure prediction with limited observations

    Ensemble learning has been widely used to improve the performance and robustness of machine learning algorithms on time series data. However, in real operational processes the observed data is often limited, which hinders the capability of ensemble learning algorithms. To address the challenge of limited observed data, this paper proposes a novel three-layer ensemble learning framework that uses data augmentation. First, multiple classical time series augmentation methods are applied to increase the size of the data set. After pre-processing, the augmented data is used to train multiple base learners with K-fold cross-validation, forming the first layer of the framework. The outputs of the first layer are integrated via LASSO to further improve prediction performance, which serves as the second layer. Finally, the third-layer output is generated by averaging the prediction of the second layer and the output of an improved Long Short-Term Memory model that predicts from the augmented data. A case study on a real wastewater treatment plant illustrates the effectiveness of the proposed method.

    Combined forecast model involving wavelet-group methods of data handling for drought forecasting

    Vigorous efforts to improve the effectiveness of drought forecasting models have yet to yield accurate results. This leaves room for robust forecasting methods that could effectively improve on existing ones. Because of the complex nature of time series data, no single method is suitable in all situations, so a combined model that provides better results is proposed. This study introduces wavelet group method of data handling (GMDH) models by integrating the discrete wavelet transform (DWT) and GMDH with transfer functions such as the sigmoid and the radial basis function (RBF), forming three wavelet-GMDH models known as modified W-GMDH (MW-GMDH), sigmoid W-GMDH (SW-GMDH), and RBF W-GMDH (RBFW-GMDH). To assess the effectiveness of this approach, these models were applied to rainfall data at four study stations, namely Arau and Kuala Krai in Malaysia and Badeggi and Duku-Lade in Nigeria. These data were transformed into four Standardized Precipitation Index (SPI) series known as SPI3, SPI6, SPI9, and SPI12. The results show that integrating the DWT improved the performance of the conventional GMDH model, and that combining these models further improved on the performance of each individual model. The proposed model is efficient, simple, and reliably accurate when compared with other models. Incorporating wavelets improved performance at all four stations with the Combined W-GMDH (CW-GMDH) and Combined Regression W-GMDH (CRW-GMDH) models. Duku-Lade station produced the lowest RMSE and MAE, 0.0239 and 0.0211 respectively, and the highest R, 0.9858. In addition, the CRW-GMDH model produced the lowest RMSE and MAE, 0.0168 and 0.0117 respectively, and the highest R, 0.9870. In terms of percentage improvement, Duku-Lade station shows an improvement over other models, with reductions in RMSE and MAE of 42.3% and 80.3% respectively. This indicates that the model is most suitable for drought forecasting at this station. The comparison among the four stations indicates that the CW-GMDH and CRW-GMDH models are more accurate and perform better than the MW-GMDH, SW-GMDH, and RBFW-GMDH models. However, the overall performance of the CRW-GMDH model exceeds that of the CW-GMDH model. In conclusion, the CRW-GMDH model performs better than the other models for drought forecasting and offers a promising alternative drought forecasting technique.
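
The abstract above compares models by RMSE, MAE, and R. As a reminder of how these three scores are usually computed, here is a minimal sketch. It assumes that R denotes the Pearson correlation coefficient between observed and predicted values, which the abstract does not state explicitly, and the function names are illustrative only.

```python
import math


def rmse(observed, predicted):
    """Root mean squared error."""
    n = len(observed)
    return math.sqrt(sum((o - p) ** 2
                         for o, p in zip(observed, predicted)) / n)


def mae(observed, predicted):
    """Mean absolute error."""
    n = len(observed)
    return sum(abs(o - p) for o, p in zip(observed, predicted)) / n


def pearson_r(observed, predicted):
    """Pearson correlation coefficient (assumed meaning of R)."""
    n = len(observed)
    mean_o = sum(observed) / n
    mean_p = sum(predicted) / n
    cov = sum((o - mean_o) * (p - mean_p)
              for o, p in zip(observed, predicted))
    var_o = sum((o - mean_o) ** 2 for o in observed)
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    if var_o == 0 or var_p == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / math.sqrt(var_o * var_p)
```

Lower RMSE and MAE and a higher R all point to a better fit. Note that R alone ignores constant bias: a forecast shifted by a fixed offset still has R equal to 1, which is one reason the three scores are reported together.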

    End-to-end anomaly detection in stream data

    Nowadays, huge volumes of data are generated with increasing velocity by various systems, applications, and activities. This increases the demand for stream and time series analysis that can react to changing conditions in real time, improving the efficiency and quality of service delivery as well as safety and security in the private and public sectors. Despite its very rich history, time series anomaly detection is still one of the vital topics in machine learning research and is receiving increasing attention. Identifying hidden patterns and selecting an appropriate model that fits the observed data well and also carries over to unobserved data is not a trivial task. Due to the increasing diversity of data sources and associated stochastic processes, this pivotal data analysis topic is loaded with challenges such as complex latent patterns, concept drift, and overfitting, which may mislead the model and cause a high false alarm rate. Handling these challenges leads advanced anomaly detection methods to develop sophisticated decision logic, which turns them into mysterious and inexplicable black boxes. Contrary to this trend, end users expect transparency and verifiability before they trust a model and the outcomes it produces. In addition, pointing users to the most anomalous or malicious areas of a time series and to the causal features could save them time, energy, and money. For these reasons, this thesis addresses the crucial challenges in an end-to-end pipeline of stream-based anomaly detection through the three essential phases of behavior prediction, inference, and interpretation. The first step focuses on devising a time series model that achieves high average accuracy as well as small error deviation. On this basis, we propose higher-quality anomaly detection and scoring techniques that use the related contexts to reclassify the observations and post-prune the unjustified events. Last but not least, we make the predictive process transparent and verifiable by providing meaningful reasoning behind its results, based on concepts that humans can understand. The provided insight can pinpoint the anomalous regions of a time series and explain why the current status of a system has been flagged as anomalous. Stream-based anomaly detection research is a principal area of innovation that supports our economy, security, and even the safety and health of societies worldwide. We believe our proposed analysis techniques can contribute to building a situational awareness platform and open new perspectives in a variety of domains such as cybersecurity and health.