9 research outputs found

    The Forecastability of Underlying Building Electricity Demand from Time Series Data

    Forecasting building energy consumption has become a promising solution in Building Energy Management Systems for energy saving and optimization. Furthermore, it can play an important role in the efficient management of the operation of a smart grid. Different data-driven approaches to forecasting the future energy demand of buildings at different scales, and over various time horizons, can be found in the scientific literature, including extensive Machine Learning and Deep Learning approaches. However, identifying the most accurate forecasting model for predicting the energy demand of a given building is still challenging. In this paper, the design and implementation of a data-driven approach to predict how forecastable the future energy demand of a building is, without first utilizing a data-driven forecasting model, is presented. The investigation utilizes a historical electricity consumption time series data set with a half-hour interval that has been collected from a group of residential buildings located in the City of London, United Kingdom

    LightGBM: An Effective and Scalable Algorithm for Prediction of Chemical Toxicity – Application to the Tox21 and Mutagenicity Datasets

    Machine learning algorithms have attained widespread use in assessing the potential toxicities of pharmaceuticals and industrial chemicals because of their higher speed and lower cost compared to experimental bioassays. Gradient boosting is an effective algorithm that often achieves high predictivity, but historically its relatively long computational time limited its application to predicting large compound libraries or developing in silico predictive models that require frequent retraining. LightGBM, a recent improvement of the gradient boosting algorithm, inherited its high predictivity but resolved its scalability and long computational time by adopting a leaf-wise tree growth strategy and introducing novel techniques. In this study, we compared the predictive performance and the computational time of LightGBM to deep neural networks, random forests, support vector machines, and XGBoost. All algorithms were rigorously evaluated on publicly available Tox21 and mutagenicity datasets using a Bayesian optimization integrated nested 10-fold cross-validation scheme that performs hyperparameter optimization while examining model generalizability and transferability to new data. The evaluation results demonstrated that LightGBM is an effective and highly scalable algorithm offering the best predictive performance while consuming significantly shorter computational time than the other investigated algorithms across all Tox21 and mutagenicity datasets. We recommend LightGBM for applications in in silico safety assessment and also in other areas of cheminformatics to fulfill the ever-growing demand for accurate and rapid prediction of various toxicity or activity related endpoints of large compound libraries present in the pharmaceutical and chemical industry.

    Light Gradient Boosting with Hyper Parameter Tuning Optimization for COVID-19 Prediction

    The 2019 coronavirus disease (COVID-19) caused a pandemic and a huge number of deaths worldwide. COVID-19 screening is needed to identify whether suspected cases are positive, and it can reduce the spread of COVID-19. The polymerase chain reaction (PCR) test for COVID-19 analyzes a respiratory specimen. A blood test can also be used to identify people who have been infected with SARS-CoV-2. In addition, age contributes to the susceptibility to COVID-19 transmission. This paper presents extra trees classification with random over-sampling, considering blood and age parameters for COVID-19 screening. This research proposes enhanced data preprocessing that uses KNN Imputer to handle large numbers of missing values. The experiments evaluated existing classification methods such as Random Forest, Extra Trees, Ada Boost, and Gradient Boosting, together with the proposed Light Gradient Boosting with hyperparameter tuning, to measure the predictions of patients infected with SARS-CoV-2. The experiments used Albert Einstein Hospital test data from Brazil that consisted of 5,644 samples, including 559 patients infected with SARS-CoV-2. The experimental results show that the proposed scheme achieves an accuracy of about 98.58%, a recall of 98.58%, a precision of 98.61%, an F1-Score of 98.61%, and an AUC of 0.9682.

    Deep Convolutional Neural Network Ensembles using ECOC

    Deep neural networks have enhanced the performance of decision making systems in many applications including image understanding, and further gains can be achieved by constructing ensembles. However, designing an ensemble of deep networks is often not very beneficial since the time needed to train the networks is very high or the performance gain obtained is not very significant. In this paper, we analyse the error-correcting output coding (ECOC) framework as an ensemble technique for deep networks and propose different design strategies to address the accuracy-complexity trade-off. We carry out an extensive comparative study between the introduced ECOC designs and state-of-the-art ensemble techniques such as ensemble averaging and gradient boosting decision trees. Furthermore, we propose a combinatory technique which is shown to achieve the highest classification performance amongst all. Comment: 13 pages, double-column IEEE Transactions style

    Hybrid forecasting method for wind power integrating spatial correlation and corrected numerical weather prediction

    Wind power generation is growing rapidly worldwide with declining costs and the pursuit of decarbonised energy systems. However, the utilization of wind energy remains challenging due to its strong stochastic nature. Accurate wind power forecasting is one of the effective ways to address this problem. Meteorological data are generally regarded as critical inputs for wind power forecasting. However, the direct use of numerical weather prediction in forecasting may not provide a high degree of accuracy due to unavoidable uncertainties, particularly for areas with complex topography. This study proposes a hybrid short-term wind power forecasting method, which integrates the corrected numerical weather prediction and spatial correlation into a Gaussian process. First, the Gaussian process model is built using the optimal combination of different kernel functions. Then, a correction model for the wind speed is designed by using an automatic relevance determination algorithm to correct the errors in the primary numerical weather prediction. Moreover, the spatial correlation of wind speed series between neighbouring wind farms is extracted to complement the input data. Finally, the modified numerical weather prediction and spatial correlation are incorporated into the hybrid model to enable reliable forecasting. Actual data from East China are used to demonstrate its performance. In comparison with the basic Gaussian process, in different seasons, the forecasting accuracy is improved by 7.02%–29.7% by using additional corrected numerical weather prediction, by 0.65%–10.23% after integrating the spatial correlation, and by 10.88%–37.49% through using the proposed hybrid method.

    Previsão do valor Brix: aplicação de algoritmos de Machine Learning (Brix value prediction: application of Machine Learning algorithms)

    Bologna Master's in Quantitative Methods for Economic and Business Decision-Making. Sustainable consumption is an increasingly debated topic these days. Due to the current increase in population and the depletion of natural resources, it is urgent that we implement production control techniques so that, in turn, we can effectively combat waste. Thus, agricultural product quality prediction is a crucial topic in decision-making. Machine Learning and Remote Sensing have played a significant role in response to these challenges, as the processing time from data collection to data prediction is relatively short. Bearing this in mind, with this thesis we aim to study, in partnership with the company Forging Lab, the potential of Sentinel-2 images in agricultural product quality analysis and prediction, according to the Brix value, so that the risk of loss can later be mitigated and profit can, consequently, increase. Several supervised learning approaches from the Machine Learning field are used in this research, namely Linear Regression (OLS), Support Vector Regression, Neural Networks, Random Forest, and LightGBM. When we compare the predicted results obtained by the approaches used in this study, we find that the models in which the Random Forest algorithm was used yield higher accuracy and smaller forecast errors. The best Random Forest model presented a coefficient of determination of 87.87%, with a mean absolute error of 0.2985 and a mean squared error of 0.2741.

    Advanced Wide-Area Monitoring System Design, Implementation, and Application

    Wide-area monitoring systems (WAMSs) provide an unprecedented way to collect, store and analyze ultra-high-resolution synchrophasor measurements to improve the dynamic observability in power grids. This dissertation focuses on designing and implementing a wide-area monitoring system and a series of applications to assist grid operators with various functionalities. The contributions of this dissertation are below: First, a synchrophasor data collection system is developed to collect, store, and forward GPS-synchronized, high-resolution, rich-type, and massive-volume synchrophasor data. A distributed data storage system is developed to store the synchrophasor data. A memory-based cache system is discussed to improve the efficiency of real-time situation awareness. In addition, a synchronization system is developed to synchronize the configurations among the cloud nodes. The reliability and fault tolerance of the developed system are discussed. Second, a novel lossy synchrophasor data compression approach is proposed. This section first introduces the synchrophasor data compression problem, then proposes a methodology for lossy data compression, and finally presents the evaluation results. The feasibility of the proposed approach is discussed. Third, a novel intelligent system, SynchroService, is developed to provide critical functionalities for a synchrophasor system. Functionalities including data query, event query, device management, and system authentication are discussed. Finally, the resiliency and the security of the developed system are evaluated. Fourth, a series of synchrophasor-based applications are developed to utilize the high-resolution synchrophasor data to assist power system engineers in monitoring the performance of the grid as well as investigating the root cause of large power system disturbances. Fifth, a deep learning-based event detection and verification system is developed to provide accurate event detection functionality. This section introduces the data preprocessing, model design, and performance evaluation. Lastly, the implementation of the developed system is discussed.

    Predicting the impact of academic articles on marketing research: Using machine learning to predict highly cited marketing articles

    The citation count of an academic article is of great importance to researchers and readers. Due to the large increase in the publication of academic articles every year, it may be difficult to recognize the articles which are important to the field. This thesis collected data from Scopus to analyze how paper, journal, and author related variables performed as drivers of article impact in the marketing field, and how well they could predict highly cited articles five years ahead in time. Social network analysis was used to find centrality metrics, and citation count one year after publication was included as the only time dependent variable. Our results show that the citation count after one year is a strong driver and predictor of citations after five years. The analysis of the co-authorship network showed that closeness centrality and betweenness centrality are drivers of future citations in the marketing field, indicating that being close to the core of the network and having brokerage power are important in the field. With the use of machine learning methods, we found that a combination of paper, journal, and author related drivers performs better at predicting highly cited articles after five years than using only one type of driver.

    Advanced forecasting algorithms for renewable power systems

    1 online resource (x, 112 pages): illustrations (some colour), charts (some colour), graphs (some colour). Includes abstract. Includes bibliographical references (pages 100-112). Wind and solar power prediction is a challenging but important area of research. This thesis explores various statistical models and deep learning methods to improve the accuracy of wind speed and solar radiation predictions. Autoregressive integrated moving average (ARIMA) models, long short-term memory (LSTM) based recurrent neural network (RNN) models, and multilayer perceptron (MLP) neural networks were studied to predict future wind speed values and the performance of a photovoltaic (PV) system. The results showed that the proposed models can effectively improve the accuracy of wind speed and solar radiation prediction and that the LSTM network outperformed the MLP network in predicting solar radiation and energy for different time periods. The performance of the models may vary depending on the specific dataset used, the hyperparameters, and the model architecture. Therefore, it is essential to carefully tune these parameters to achieve the best possible performance. Accurately predicting the performance of a PV system at short time intervals is particularly important in the context of renewable energy sources, as it can help optimize the usage of these resources and improve overall efficiency. This research can contribute to the development of more accurate and reliable prediction models, which can lead to more efficient use of wind and solar power, reduce costs, and promote the adoption of renewable energy sources.