10 research outputs found

    Evolutionary multivariate time series prediction

    Multivariate time series (MTS) prediction plays a significant role in many practical data mining applications, such as finance, energy supply, and medical care. Over the years, various prediction models have been developed to obtain robust and accurate predictions. However, this is not an easy task given several key challenges. First, not all channels (each channel represents one time series) are informative (channel selection), and given the complexity of each selected time series, it is difficult to predefine the time window used for inputs. Second, since the selected time series may come from different domains and be collected with different devices, they may require different feature extraction techniques with suitable parameters to extract meaningful features (feature extraction), which in turn influences the selection and configuration of the predictor, i.e., prediction (configuration). The challenge arising from channel selection, feature extraction, and prediction (configuration) is to perform them jointly to improve prediction performance. Third, we resort to ensemble learning to solve the MTS prediction problem composed of the previously mentioned operations, where the challenge is to obtain a set of models that are both accurate and diverse. Each of these challenges leads to an NP-hard combinatorial optimization problem, which cannot be solved using traditional methods since it is non-differentiable. Evolutionary algorithms (EAs), an efficient class of metaheuristic stochastic search techniques that are highly competent at solving complex combinatorial optimization problems with mixed types of decision variables, may provide an effective way to address the challenges arising from MTS prediction. The main contributions are supported by the following investigations.
First, we propose a discrete evolutionary model that focuses on finding the influential subset of MTS channels and the optimal time window for each selected channel for the MTS prediction task. A comprehensive experimental study on real-world electricity consumption data with auxiliary environmental factors demonstrates the efficiency and effectiveness of the proposed method in searching for the informative time series, their respective time windows, and the predictor parameters, in comparison to the result obtained through enumeration. Subsequently, we define the basic MTS prediction pipeline containing channel selection, feature extraction, and prediction (configuration). To perform these key operations, we propose an evolutionary model construction (EMC) framework that simultaneously seeks the optimal subset of MTS channels, suitable feature extraction methods and respective time windows for the selected channels, and the parameter settings of the predictor for the best prediction performance. To implement EMC, a two-step EA is proposed: the first-step EA focuses mainly on channel selection, while in the second step a specially designed EA works on feature extraction and prediction (configuration). A real-world electricity dataset with exogenous environmental information is used, and the whole dataset is further split into two datasets according to holiday and non-holiday events. The performance of EMC is demonstrated on all three datasets in comparison to hybrid models and some existing methods. Then, based on the prediction pipeline defined previously, we propose an evolutionary multi-objective ensemble learning model (EMOEL) that employs a multi-objective evolutionary algorithm (MOEA) subject to two conflicting objectives, i.e., accuracy and model diversity.
The MOEA yields a Pareto front (PF) composed of non-dominated optimal solutions, each of which represents an optimal subset of selected channels, the selected feature extraction methods and time windows, and the selected predictor parameters. To boost the ultimate prediction accuracy, the models corresponding to these optimal solutions are linearly combined, with the combination coefficients optimized via a single-objective task-oriented EA. The superiority of EMOEL is demonstrated on electricity consumption data with climate information in comparison to several state-of-the-art models. We also propose a multi-resolution selective ensemble learning model, where multiple resolutions are constructed from the minimal granularity using statistics. At the current time stamp, the preceding time series data are sampled at different time intervals (i.e., resolutions) to constitute the time windows. For each resolution, multiple base learners with different parameters are first trained. A feature selection technique is applied to search for the optimal set of trained base learners, and least squares regression is used to combine them. The performance of the proposed ensemble model is verified on the electricity consumption data for next-step and next-day prediction. Finally, based on EMOEL and multi-resolution, instead of only combining the models generated from each PF, we propose an evolutionary ensemble learning (EEL) framework, where multiple PFs are aggregated to produce a composite PF (CPF) after duplicate solutions across the PFs are removed and the remainder is sorted into different levels of non-dominated fronts (NDFs). Feature selection techniques are applied to find the optimal subset of models in the level-accumulated NDF, and least squares is used to combine the selected models. The performance of EEL with three different predictors as base learners is evaluated through a comprehensive analysis of parameter sensitivity.
The superiority of EEL is demonstrated in comparison to the best result from the single-objective EA, the best individual from the PF, and several state-of-the-art models across electricity consumption and air quality datasets, both of which use environmental factors from other domains as auxiliary factors. In summary, this thesis provides studies on how to build efficient and effective models for MTS prediction. The proposed frameworks investigate the influential factors, consider the pipeline composed of channel selection, feature extraction, and prediction (configuration) simultaneously, and maintain good generalization and accuracy across different applications. The algorithms proposed to implement the frameworks use techniques from evolutionary computation (single-objective EA and MOEA), machine learning, and data mining. We believe that this research provides a significant step towards constructing robust and accurate models for solving MTS prediction problems. In addition, through the case study on electricity consumption prediction, it will help decision-makers determine the trend of future energy consumption for scheduling and planning the operations of the energy supply system.
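
The abstract above relies on a Pareto front of non-dominated solutions under two objectives (accuracy and model diversity). As a rough illustration only, the sketch below filters a list of candidates down to its non-dominated subset and drops exact duplicates, as the composite-front step describes. The tuple layout (error, negative diversity), the function names, and the sample values are assumptions made for this example and are not taken from the thesis.

```python
from typing import List, Tuple

# Assumed encoding: (prediction error, negative diversity); both are minimized.
Point = Tuple[float, float]


def dominates(a: Point, b: Point) -> bool:
    """True if a is no worse than b in every objective and strictly better in one."""
    no_worse = all(x <= y for x, y in zip(a, b))
    strictly_better = any(x < y for x, y in zip(a, b))
    return no_worse and strictly_better


def pareto_front(points: List[Point]) -> List[Point]:
    """Return the non-dominated points, dropping exact duplicates first."""
    unique = list(dict.fromkeys(points))  # preserves first-seen order
    return [p for p in unique if not any(dominates(q, p) for q in unique)]


# Hypothetical candidate models, for illustration only.
candidates = [(0.10, -0.2), (0.12, -0.5), (0.15, -0.4), (0.10, -0.2), (0.20, -0.9)]
front = pareto_front(candidates)
print(front)
```

This brute-force filter compares every pair of candidates, which is fine for small populations; an actual MOEA would typically use a faster non-dominated sorting procedure to rank all levels of fronts.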

    Forecasting methods in energy planning models

    Energy planning models (EPMs) play an indispensable role in policy formulation and energy sector development. The forecasting of energy demand and supply is at the heart of an EPM. Different forecasting methods, from statistical to machine learning, have been applied in the past. The selection of a forecasting method is mostly based on data availability and the objectives of the tool and planning exercise. We present a systematic and critical review of forecasting methods used in 483 EPMs. The methods were analyzed for forecasting accuracy; applicability for temporal and spatial predictions; and relevance to planning and policy objectives. Fifty different forecasting methods have been identified. Artificial neural network (ANN) is the most widely used method, applied in 40% of the reviewed EPMs. The other popular methods, in descending order, are: support vector machine (SVM), autoregressive integrated moving average (ARIMA), fuzzy logic (FL), linear regression (LR), genetic algorithm (GA), particle swarm optimization (PSO), grey prediction (GM) and autoregressive moving average (ARMA). In terms of accuracy, computational intelligence (CI) methods perform better than statistical ones, in particular for parameters with greater variability in the source data. However, hybrid methods yield better accuracy than stand-alone ones. Statistical methods are useful only for short and medium ranges, while CI methods are preferable for all temporal forecasting ranges (short, medium and long). In terms of objective, most EPMs focused on energy demand and load forecasting. In terms of geographical coverage, the highest number of EPMs were developed for China. However, collectively, more models were established for developed countries than for developing ones. The findings would benefit researchers and professionals in gaining an appreciation of the forecasting methods, and enable them to select appropriate method(s) to meet their needs.

    Multivariate study of vehicle exhaust particles using machine learning and statistical techniques

    This research has examined the application of machine learning and statistical methods for developing roadside particle (number/mass concentration) prediction models that can be used for air quality management. Data collected from continuous monitoring stations, including pollutant, traffic and meteorological variables, were used for training the models. A hybrid feature selection method involving Genetic Algorithms and Random Forests was successfully used to select the most relevant predictor variables for the models from the variables chosen based on their correlation with the PM10, PM2.5 and PNC concentrations. The study found that the hybrid feature selection can be used with both statistical and machine learning methods to produce less expensive and more efficient air quality prediction models. Among the machine learning models studied, the Boosted Regression Trees (BRT), Random Forests (RF), Extreme Learning Machines (ELM) and Deep Learning Algorithms were found to be the most suitable for the prediction of roadside PM10, PM2.5, and PNC concentrations. The machine learning models performed better than the ADMS-road model in spatiotemporal predictions involving monitoring site locations. Moreover, they performed much better in predicting the concentrations in street canyons. The ANN and BRT were found to be suitable for air quality management applications involving traffic management scenarios.

    Big Data Analysis application in the renewable energy market: wind power

    Among renewable energies, wind energy is one of the world's fastest-growing technologies. However, this uncertainty should be minimized in order to better schedule and manage traditional generation assets to compensate for the lack of electricity in power grids. The emergence of data-driven or machine learning techniques has made it possible to provide high-resolution spatial and temporal predictions of wind speed and power. In this work, three different ANN models are developed, addressing three major problems in predicting data series with this technique: data quality assurance and imputation of invalid data, hyperparameter assignment, and feature selection. The developed models are based on clustering, optimization, and signal processing techniques to provide short- and medium-term (minutes to hours) predictions of wind speed and power.

    Automatic analysis and classification of cardiac acoustic signals for long term monitoring

    Objective: Cardiovascular diseases are the leading cause of death worldwide resulting in over 17.9 million deaths each year. Most of these diseases are preventable and treatable, but their progression and outcomes are significantly more positive with early-stage diagnosis and proper disease management. Among the approaches available to assist with the task of early-stage diagnosis and management of cardiac conditions, automatic analysis of auscultatory recordings is one of the most promising ones, since it could be particularly suitable for ambulatory/wearable monitoring. Thus, proper investigation of abnormalities present in cardiac acoustic signals can provide vital clinical information to assist long term monitoring. Cardiac acoustic signals, however, are very susceptible to noise and artifacts, and their characteristics vary largely with the recording conditions which makes the analysis challenging. Additionally, there are challenges in the steps used for automatic analysis and classification of cardiac acoustic signals. Broadly, these steps are the segmentation, feature extraction and subsequent classification of recorded signals using selected features. This thesis presents approaches using novel features with the aim to assist the automatic early-stage detection of cardiovascular diseases with improved performance, using cardiac acoustic signals collected in real-world conditions. Methods: Cardiac auscultatory recordings were studied to identify potential features to help in the classification of recordings from subjects with and without cardiac diseases. The diseases considered in this study for the identification of the symptoms and characteristics are the valvular heart diseases due to stenosis and regurgitation, atrial fibrillation, and splitting of fundamental heart sounds leading to additional lub/dub sounds in the systole or diastole interval of a cardiac cycle. 
The localisation of cardiac sounds of interest was performed using an adaptive wavelet-based filtering in combination with the Shannon energy envelope and prior information of fundamental heart sounds. This is a prerequisite step for the feature extraction and subsequent classification of recordings, leading to a more precise diagnosis. Localised segments of S1 and S2 sounds, and artifacts, were used to extract a set of perceptual and statistical features using wavelet transform, homomorphic filtering, Hilbert transform and mel-scale filtering, which were then fed to train an ensemble classifier to interpret S1 and S2 sounds. Once sound peaks of interest were identified, features extracted from these peaks, together with the features used for the identification of S1 and S2 sounds, were used to develop an algorithm to classify recorded signals. Overall, 99 features were extracted and statistically analysed using neighborhood component analysis (NCA) to identify the features which showed the greatest ability in classifying recordings. Selected features were then fed to train an ensemble classifier to classify abnormal recordings, and hyperparameters were optimized to evaluate the performance of the trained classifier. Thus, a machine learning-based approach for the automatic identification and classification of S1 and S2, and normal and abnormal recordings, in real-world noisy recordings using a novel feature set is presented. The validity of the proposed algorithm was tested using acoustic signals recorded in real-world, non-controlled environments at four auscultation sites (aortic valve, tricuspid valve, mitral valve, and pulmonary valve), from the subjects with and without cardiac diseases; together with recordings from the three large public databases. The performance metrics of the methodology in relation to classification accuracy (CA), sensitivity (SE), precision (P+), and F1 score, were evaluated. 
Results: This thesis proposes four different algorithms to automatically classify fundamental heart sounds – S1 and S2; normal fundamental sounds and abnormal additional lub/dub sounds recordings; normal and abnormal recordings; and recordings with heart valve disorders, namely the mitral stenosis (MS), mitral regurgitation (MR), mitral valve prolapse (MVP), aortic stenosis (AS) and murmurs, using cardiac acoustic signals. The results obtained from these algorithms were as follows: • The algorithm to classify S1 and S2 sounds achieved an average SE of 91.59% and 89.78%, and F1 score of 90.65% and 89.42%, in classifying S1 and S2, respectively. 87 features were extracted and statistically studied to identify the top 14 features which showed the best capabilities in classifying S1 and S2, and artifacts. The analysis showed that the most relevant features were those extracted using Maximum Overlap Discrete Wavelet Transform (MODWT) and Hilbert transform. • The algorithm to classify normal fundamental heart sounds and abnormal additional lub/dub sounds in the systole or diastole intervals of a cardiac cycle, achieved an average SE of 89.15%, P+ of 89.71%, F1 of 89.41%, and CA of 95.11% using the test dataset from the PASCAL database. The top 10 features that achieved the highest weights in classifying these recordings were also identified. • Normal and abnormal classification of recordings using the proposed algorithm achieved a mean CA of 94.172%, and SE of 92.38%, in classifying recordings from the different databases. Among the top 10 acoustic features identified, the deterministic energy of the sound peaks of interest and the instantaneous frequency extracted using the Hilbert Huang-transform, achieved the highest weights. • The machine learning-based approach proposed to classify recordings of heart valve disorders (AS, MS, MR, and MVP) achieved an average CA of 98.26% and SE of 95.83%. 
99 acoustic features were extracted and their abilities to differentiate these abnormalities were examined using weights obtained from the neighborhood component analysis (NCA). The top 10 features which showed the greatest abilities in classifying these abnormalities using recordings from the different databases were also identified. The achieved results demonstrate the ability of the algorithms to automatically identify and classify cardiac sounds. This work provides the basis for measurements of many useful clinical attributes of cardiac acoustic signals and can potentially help in monitoring overall cardiac health over longer durations. The work presented in this thesis is the first of its kind to validate the results using both normal and pathological cardiac acoustic signals, recorded for a long continuous duration of 5 minutes at four different auscultation sites in non-controlled real-world conditions.
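
The abstract above reports classification accuracy (CA), sensitivity (SE), precision (P+), and F1 score. For readers who want the definitions in executable form, the following is a minimal sketch that computes the four metrics from confusion-matrix counts using their standard binary-classification definitions. The function name and the example counts are hypothetical and do not come from the thesis.

```python
from typing import Dict


def classification_metrics(tp: int, fp: int, tn: int, fn: int) -> Dict[str, float]:
    """Compute SE, P+, F1 and CA from binary confusion-matrix counts."""

    def ratio(num: float, den: float) -> float:
        # Return 0.0 rather than raising when a denominator is zero.
        return num / den if den else 0.0

    se = ratio(tp, tp + fn)                    # sensitivity (recall)
    p_plus = ratio(tp, tp + fp)                # precision
    f1 = ratio(2 * p_plus * se, p_plus + se)   # harmonic mean of P+ and SE
    ca = ratio(tp + tn, tp + fp + tn + fn)     # classification accuracy
    return {"SE": se, "P+": p_plus, "F1": f1, "CA": ca}


# Hypothetical counts, for illustration only.
m = classification_metrics(tp=90, fp=10, tn=85, fn=15)
print({name: round(value, 4) for name, value in m.items()})
```

For multi-class problems such as the valve-disorder task, these quantities are usually computed per class and then averaged; the averaging scheme used in the thesis is not stated in the abstract.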

    Decarbonization cost of Bangladesh's energy sector: Influence of corruption

    As a rapidly developing lower-middle-income country, Bangladesh has maintained a steady growth of +5% in gross domestic product (GDP) annually since 2004, eventually reaching 7.1% in 2016. The country aims to become upper-middle-income and developed by 2021 and 2041, respectively, which translates to an annual GDP growth rate of 7.58% during this period. The bulk of this growth is expected to come from the manufacturing sector, the significant shift towards which started at the turn of this century. The energy intensity of manufacturing-based growth is higher, evidence of which can be seen in the 3.17-fold increase in national energy consumption between 2001 and 2014. Also, Bangladesh aims to achieve a 100% electrification rate by 2021 against an annual population growth rate of 1.08%. With increasing per capita income, there is now a growing middle class fuelling the growth in demand for convenient forms of energy. Considering the above drivers, the Bangladesh 2050 Pathways Model suggested 35 times higher energy demand by 2050 than that of 2010. The government and private sector have started making substantial investments in the energy sector to meet the significant future demand.
Approximately US$104 billion would be invested in the power sector of Bangladesh to establish 33 GW of installed capacity by 2030, the majority of which would be financed by national and international loans. However, Bangladesh is one of the most corrupt countries in the world, which may influence energy planning development. The current policies of the Bangladesh power sector have set the future direction towards a predominantly coal-based energy mix, which would increase greenhouse gas (GHG) emissions in 2030 to five times the 2010 level (117.5 MtCO2e). By increasing GHG emissions, the country would undermine the worldwide effort of keeping global temperature rise in the 21st century below 2°C, as per the Paris Agreement and COP21. The objective of this research was to develop a framework to explore the cost of decarbonizing Bangladesh's energy sector by 2050. For the study, six emissions scenarios (business as usual (BAU), current policy (CPS), high-carbon (HCS), medium-carbon (MCS), low-carbon (LCS) and zero-carbon (ZCS)) and three economic conditions (high, average and low cost) were considered. The combination of emissions and economic scenarios rendered 18 different emissions-economic scenarios for the research. The results showed that Bangladesh would emit 343 MtCO2e by 2050 without any emissions reduction strategies under HCS. However, Bangladesh can reduce GHG emissions by 23% by 2050 under LCS relative to HCS by adopting decarbonization strategies such as an energy mix change towards renewables and nuclear. On the optimistic side, emissions can be reduced by 73% by 2050 under ZCS relative to HCS. The study demonstrated that a zero-carbon future is not yet feasible for Bangladesh by 2050 because the fossil-fuel-based plants now in operation would still be operational. Therefore, GHG emissions are going to rise even if Bangladesh adopts an energy mix dominated by renewables and nuclear.
However, it will be possible to keep GHG emissions at approximately the 2 tCO2e/capita threshold if the country adopts LCS. On the other hand, only MCS and LCS can meet the projected energy demand by 2050. The energy sector can meet the projected demand under ZCS only if electricity consumption is reduced by 26% by 2050. In terms of total cost, MCS was found to be 3.9% more expensive than LCS by 2050. LCS would have a higher cost than MCS up to 2030, due to the high capital cost of renewable technologies. The total cost under LCS would start to be lower than that of MCS after 2035 because of the fossil fuel cost. Accumulated fuel cost would reach $250 billion in 2050 under HCS, which can be reduced by 23% under ZCS. The cost of decarbonization would be 3.6, 3.4 and 3.2 times that of HCS under the average cost of MCS, LCS, and ZCS, respectively. As the energy sector of Bangladesh is under rapid development, the accumulated capital would be comparatively high by 2050. However, fuel cost can be significantly reduced under LCS and ZCS, which would also ensure lower emissions. The study suggested that energy mix change, technological maturity, corruption and demand reduction can influence the cost of decarbonization. However, the most significant influence on the decarbonization of the Bangladeshi energy sector would be corruption. Results showed that if Bangladesh can minimize the effect of corruption on the energy sector, it can reduce the cost of decarbonization by 45-77% by 2050 under MCS, LCS, and ZCS.

    A review of artificial neural network models for ambient air pollution prediction

    Research activity in the field of air pollution forecasting using artificial neural networks (ANNs) has increased dramatically in recent years. However, the development of ANN models entails levels of uncertainty given the black-box nature of ANNs. In this paper, a protocol by Maier et al. (2010) for ANN model development is presented and applied to assess journal papers dealing with air pollution forecasting using ANN models. The majority of the reviewed works are aimed at the long-term forecasting of outdoor PM10, PM2.5, oxides of nitrogen, and ozone. The vast majority of the identified works utilised meteorological and source emissions predictors almost exclusively. Furthermore, ad-hoc approaches are found to be predominantly used for determining optimal model predictors, appropriate data subsets and the optimal model structure. Multilayer perceptron and ensemble-type models are predominantly implemented. Overall, the findings highlight the need for systematic protocols for developing powerful ANN models.

    Deep Learning-Based Machinery Fault Diagnostics

    This book offers a compilation for experts, scholars, and researchers to present the most recent advancements, from theoretical methods to the applications of sophisticated fault diagnosis techniques. The deep learning methods for analyzing and testing complex mechanical systems are of particular interest. Special attention is given to the representation and analysis of system information, operating condition monitoring, the establishment of technical standards, and scientific support of machinery fault diagnosis

    Spatiotemporal and temporal forecasting of ambient air pollution levels through data-intensive hybrid artificial neural network models

    Outdoor air pollution (AP) is a serious public threat which has been linked to severe respiratory and cardiovascular illnesses and premature deaths, especially among those residing in highly urbanised cities. As such, there is a need to develop early-warning and risk management tools to alleviate its effects. The main objective of this research is to develop AP forecasting models based on Artificial Neural Networks (ANNs) according to a model-building protocol identified from existing related works. Plain, hybrid and ensemble ANN model architectures were developed to estimate the temporal and spatiotemporal variability of hourly NO2 levels in several locations in the Greater London area. Wavelet decomposition was integrated with Multilayer Perceptron (MLP) and Long Short-term Memory (LSTM) models to address the issue of high variability of AP data and improve the estimation of peak AP levels. Block-splitting and cross-validation procedures have been adapted to validate the models based on Root Mean Squared Error (RMSE), Mean Absolute Error (MAE), and Willmott’s index of agreement (IA). The proposed models perform better than the benchmark models. For instance, the proposed wavelet-based hybrid approach reduced the RMSE and MAE of the benchmark MLP model by 39.15% and 28.58%, respectively, for the temporal forecasting of NO2 levels. The same approach reduced the RMSE and MAE of the benchmark LSTM model by 12.45% and 20.08%, respectively, for the spatiotemporal estimation of NO2 levels at one site in Central London. The proposed hybrid deep learning approach offers great potential to be operational in providing air pollution forecasts in areas without a reliable database.
The model-building protocol adapted in this thesis can also be applied to studies using measurements from other sites.
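
The validation metrics named in the abstract above (RMSE, MAE, and Willmott's index of agreement) can be written down compactly. The sketch below uses the standard definitions, with the original squared-difference form of Willmott's index, IA = 1 - sum((P - O)^2) / sum((|P - mean(O)| + |O - mean(O)|)^2); whether the thesis uses this form or a later refinement is an assumption. The function names and sample values are hypothetical.

```python
import math
from typing import Sequence


def rmse(obs: Sequence[float], pred: Sequence[float]) -> float:
    """Root mean squared error between observations and predictions."""
    return math.sqrt(sum((p - o) ** 2 for o, p in zip(obs, pred)) / len(obs))


def mae(obs: Sequence[float], pred: Sequence[float]) -> float:
    """Mean absolute error between observations and predictions."""
    return sum(abs(p - o) for o, p in zip(obs, pred)) / len(obs)


def index_of_agreement(obs: Sequence[float], pred: Sequence[float]) -> float:
    """Willmott's index of agreement (squared form); 1.0 means perfect agreement."""
    mean_obs = sum(obs) / len(obs)
    squared_error = sum((p - o) ** 2 for o, p in zip(obs, pred))
    potential_error = sum(
        (abs(p - mean_obs) + abs(o - mean_obs)) ** 2 for o, p in zip(obs, pred)
    )
    return 1.0 - squared_error / potential_error


# Hypothetical hourly concentrations, for illustration only.
observed = [10.0, 20.0, 30.0, 40.0]
predicted = [12.0, 18.0, 33.0, 37.0]
print(rmse(observed, predicted), mae(observed, predicted),
      index_of_agreement(observed, predicted))
```

Note that the index of agreement is undefined when every observation and prediction equals the observed mean, since the denominator is then zero; the sketch does not guard against that case.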