
    Transparent Forecasting Strategies in Database Management Systems

    Whereas traditional data warehouse systems assume that data is complete or has been carefully preprocessed, an increasing share of data is imprecise, incomplete, and inconsistent. This is especially true in the context of big data, where massive amounts of data arrive continuously in real time from a vast number of data sources. At the same time, modern data analysis involves sophisticated statistical algorithms that go well beyond traditional BI and is increasingly performed by non-expert users. Both trends require transparent data mining techniques that efficiently handle missing data and present a complete view of the database to the user. Time series forecasting estimates future, not yet available, values of a time series and represents one way of dealing with missing data. Moreover, it enables queries that retrieve a view of the database at any point in time: past, present, and future. This article presents an overview of forecasting techniques in database management systems. After discussing possible application areas for time series forecasting, we give a short mathematical background of the main forecasting concepts. We then outline various general strategies for integrating time series forecasting inside a database and discuss some individual techniques from the database community. We conclude by introducing a novel forecasting-enabled database management architecture that natively and transparently integrates forecast models.
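
    To make the idea of querying "any point in time" concrete, here is a minimal sketch, not the architecture the article proposes. It assumes an in-memory series with consecutive integer time steps and uses Holt's linear exponential smoothing as the forecast model; the class name `ForecastingSeries` and the default smoothing parameters are hypothetical.

```python
from typing import List


class ForecastingSeries:
    """Hypothetical time-series 'table' that answers point queries at any step.

    Stored steps return the observed value; future steps return a Holt
    (linear exponential smoothing) forecast that is updated on every insert.
    """

    def __init__(self, alpha: float = 0.5, beta: float = 0.5) -> None:
        self.alpha = alpha  # smoothing factor for the level
        self.beta = beta    # smoothing factor for the trend
        self.values: List[float] = []
        self.level = 0.0
        self.trend = 0.0

    def insert(self, value: float) -> None:
        """Append the observation for the next time step and update the model."""
        self.values.append(value)
        n = len(self.values)
        if n == 1:
            self.level = value
        elif n == 2:
            # Initialise the trend from the first difference.
            self.trend = value - self.level
            self.level = value
        else:
            prev_level = self.level
            self.level = self.alpha * value + (1 - self.alpha) * (prev_level + self.trend)
            self.trend = self.beta * (self.level - prev_level) + (1 - self.beta) * self.trend

    def query(self, t: int) -> float:
        """Return the stored value at step t, or a forecast if t lies in the future."""
        if t < 0:
            raise ValueError("t must be non-negative")
        if t < len(self.values):
            return self.values[t]
        if len(self.values) < 2:
            raise ValueError("need at least two observations to forecast")
        horizon = t - (len(self.values) - 1)
        return self.level + horizon * self.trend
```

    For example, after inserting 0, 2, 4, 6, 8, `query(2)` returns the stored 4, while `query(7)` returns the forecast 14. The caller uses the same call for past and future steps, which is the transparency the abstract describes.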

    A Distributed Stream Processing Middleware Framework for Real-Time Analysis of Heterogeneous Data on Big Data Platform: Case of Environmental Monitoring

    In recent years, the wide adoption of Internet of Things (IoT)-based technologies has driven the proliferation of monitoring systems, which in turn has exponentially increased the amount of heterogeneous data generated. Processing and analysing this massive amount of data is cumbersome, and practice is gradually moving from classical ‘batch’ processing, i.e. the extract, transform, load (ETL) technique, to real-time processing. For instance, in the environmental monitoring and management domain, time-series data and historical datasets are crucial for prediction models. However, the domain still relies on legacy systems and batch processing, which complicates real-time analysis of the essential data and integration with big data platforms. As a solution, a distributed stream processing middleware framework for real-time analysis of heterogeneous environmental monitoring and management data is presented and tested on a cluster using open source technologies in a big data environment. The system ingests datasets from legacy systems and sensor data from heterogeneous automated weather systems, irrespective of data type, into Apache Kafka topics using the Kafka Connect APIs, for processing by the Kafka stream processing engine. The stream processing engine executes the predictive numerical models and algorithms, expressed in event processing (EP) languages, for real-time analysis of the data streams. To prove the feasibility of the proposed framework, we implemented the system for a case study of drought prediction and forecasting based on the Effective Drought Index (EDI) model. First, we transform the predictive model into a form that the streaming engine can execute for real-time computing. Second, the model is applied to the ingested data streams and datasets to predict drought through persistent querying of the infinite streams to detect anomalies.
Finally, the performance of the distributed stream processing middleware infrastructure is evaluated to determine the real-time effectiveness of the framework.
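
    The drought case study depends on evaluating the EDI continuously over an unbounded stream. The sketch below is a plain-Python stand-in for that streaming step, not Kafka Streams code. It assumes the commonly cited definition of effective precipitation, EP = sum over n = 1..i of (P_1 + ... + P_n) / n with P_1 the most recent day, and EDI = (EP - mean EP) / standard deviation of EP. The climatological mean and standard deviation are passed in as fixed, hypothetical values, whereas in practice they vary by calendar day.

```python
from collections import deque
from typing import Deque, Iterable, Iterator, Tuple


def effective_precipitation(recent_first: Iterable[float]) -> float:
    """EP = sum_{n=1..i} (P_1 + ... + P_n) / n, where P_1 is the most recent day."""
    ep = 0.0
    running = 0.0  # P_1 + ... + P_n
    for n, p in enumerate(recent_first, start=1):
        running += p
        ep += running / n
    return ep


def edi_stream(daily_precip: Iterable[float], window: int,
               mean_ep: float, std_ep: float) -> Iterator[Tuple[float, float]]:
    """Consume a (possibly infinite) stream of daily precipitation and emit (EP, EDI).

    mean_ep and std_ep are hypothetical fixed climatological statistics of EP;
    EDI is the standardized deviation (EP - mean_ep) / std_ep.
    """
    buf: Deque[float] = deque(maxlen=window)  # sliding window of the last `window` days
    for p in daily_precip:
        buf.append(p)
        if len(buf) == window:
            ep = effective_precipitation(reversed(buf))  # newest day first
            yield ep, (ep - mean_ep) / std_ep
```

    Because `edi_stream` is a generator, it processes records one at a time and never needs the whole stream in memory, which mirrors the "persistent query" style of the framework. In this formula, the rainfall of day m enters with weight 1/m + ... + 1/i, so recent rain counts more than older rain.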

    Evolving water point mapping to strategic decision making in rural Malawi

    There is a need to evolve from the simple mapping of water points, now often numerous, to effective decision making using these data. This paper outlines new developments of mWater as the preferred online Management Information System (MIS) tool for analysing significant volumes of water and sanitation data in Malawi. mWater exemplifies an evolving strategic decision-making tool used to formulate rural water supply investment strategies. Since 2011, 25,000 water points have been mapped as a time series to build a complete asset register of water infrastructure that supports government efforts to reach Sustainable Development Goal 6. This comprehensive live database allows real-time analysis of over sixty variables, including linkage to concurrent mWater sanitation and waste data. This paper briefly illustrates several emergent uses of the facility to show its potential for strategic decision making using Big Data. It is currently being rolled out across the entire country.

    Knowledge management overview of feature selection problem in high-dimensional financial data: Cooperative co-evolution and Map Reduce perspectives

    The term big data characterizes the massive amounts of data generated by advanced technologies in different domains, described by the 4Vs (volume, velocity, variety, and veracity), which indicate the amount of data that can only be processed via computationally intensive analysis, the speed of its creation, the different types of data, and their accuracy. High-dimensional financial data, such as time-series and space-time data, contain a large number of features (variables) but a small number of samples, and are used to measure various real-time business situations for financial organizations. Such datasets are normally noisy, complex correlations may exist between their features, and many domains, including finance, lack the analytic tools to mine the data for knowledge discovery because of the high dimensionality. Feature selection is an optimization problem: finding a minimal subset of relevant features that maximizes classification accuracy and reduces computation. Traditional statistics-based feature selection approaches are not adequate to deal with the curse of dimensionality associated with big data. Cooperative co-evolution, a meta-heuristic algorithm and divide-and-conquer approach, decomposes high-dimensional problems into smaller sub-problems. Further, MapReduce, a programming model, offers a ready-to-use, distributed, scalable, and fault-tolerant infrastructure for parallelizing the developed algorithm. This article presents a knowledge management overview of evolutionary feature selection approaches, state-of-the-art cooperative co-evolution and MapReduce-based feature selection techniques, and future research directions.
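
    MapReduce parallelizes feature evaluation by having each mapper summarize its data partition and a reducer merge those summaries. The sketch below shows that pattern with the standard library only. It swaps in a simple filter criterion, the absolute Pearson correlation between each feature and a numeric label, in place of the evolutionary and cooperative co-evolution wrappers the article surveys; all function names are hypothetical.

```python
from functools import reduce
from math import sqrt
from typing import Dict, Iterable, List, Sequence, Tuple

Row = Tuple[Sequence[float], float]  # (feature vector, label)
Stats = Dict[str, List[float]]       # scalars are stored as 1-element lists so merging is uniform


def map_partition(rows: Sequence[Row]) -> Stats:
    """Mapper: compute mergeable partial sums for one data partition."""
    d = len(rows[0][0])
    s: Stats = {"n": [0.0], "sy": [0.0], "syy": [0.0],
                "sx": [0.0] * d, "sxx": [0.0] * d, "sxy": [0.0] * d}
    for x, y in rows:
        s["n"][0] += 1
        s["sy"][0] += y
        s["syy"][0] += y * y
        for j, xj in enumerate(x):
            s["sx"][j] += xj
            s["sxx"][j] += xj * xj
            s["sxy"][j] += xj * y
    return s


def reduce_stats(a: Stats, b: Stats) -> Stats:
    """Reducer: partial sums combine by element-wise addition."""
    return {k: [u + v for u, v in zip(a[k], b[k])] for k in a}


def top_k_features(partitions: Iterable[Sequence[Row]], k: int) -> List[int]:
    """Rank features by |Pearson r| with the label and return the top-k indices."""
    total = reduce(reduce_stats, map(map_partition, partitions))
    n, sy, syy = total["n"][0], total["sy"][0], total["syy"][0]
    var_y = syy - sy * sy / n  # naive one-pass formula, adequate for a sketch
    scores = []
    for j in range(len(total["sx"])):
        sx, sxx, sxy = total["sx"][j], total["sxx"][j], total["sxy"][j]
        var_x = sxx - sx * sx / n
        cov = sxy - sx * sy / n
        r = cov / sqrt(var_x * var_y) if var_x > 0 and var_y > 0 else 0.0
        scores.append((abs(r), j))
    scores.sort(key=lambda t: (-t[0], t[1]))  # best score first, ties by index
    return [j for _, j in scores[:k]]
```

    Because the partial sums are associative, partitions can be mapped on different nodes and reduced in any order. That is the property that makes this kind of scoring step fit the MapReduce model.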

    Increasing resilience to natural hazards through crowd-sourcing in St. Vincent and the Grenadines

    In this project we aim to demonstrate how volcanic environments exposed to multiple hazards tend to be characterised by a lack of relevant data, both in real time and over the longer term (e.g. months to years). This can be at least partially addressed by actively involving citizens, communities, scientists and other key stakeholders in the collection, analysis and sharing of observations, samples and measurements of changes in the environment. Such community monitoring and co-production of knowledge over time can also build trusting relationships and resilience (Stone et al. 2014). There are more than 100 institutions worldwide that monitor volcanoes and other natural hazards, contribute to early warning systems and are embedded in communities. They have a key role in building resilience alongside civil protection/emergency management agencies. In this report, we propose that such institutions be involved in big data initiatives and related research projects. Specifically, we suggest that crowd-sourcing tools may be particularly valuable. Citizen science, community monitoring and analysis of social media can build resilience by supporting: a) coordination and collaboration between scientists, authorities and citizens, b) decision-making by institutions and individuals, c) anticipation of natural hazards by monitoring institutions, authorities and citizens, d) capacity building of institutions and communities, and e) knowledge co-production. We propose a mobile phone app with a supporting website as an appropriate crowd-sourcing tool for St Vincent and the Grenadines. The monitoring institution is the key contact for users and leads on the required specifications based on local knowledge and experience. Remote support is provided from the UK on technical issues, research integration, data management, validation and evaluation.
The app is intended to facilitate the building of long-term relationships between scientists, communities and authorities. Real-time contributions and analysis of social media support early warning, real-time awareness and real-time feedback, enhancing the response of scientists and authorities. The app has the potential to facilitate, for example, discussions on new or revised hazard maps and multiple hazard analysis, and could contribute to real-time risk monitoring. Such an approach can be scaled up for regional use and is transferable to other countries. Challenges of such an approach include data validation and quality assurance, redundancy in the system, motivating volunteers, managing expectations and ensuring safety. A combination of recruiting a core group of known and reliable users, training workshops, a code of conduct for users, identifying information influx thresholds beyond which external support might be needed, and continuing evaluation of both the data and the process will help to address these issues. The app is duplicated on the website in case mobile phone networks are down. Development of such approaches would fit well within research programmes on building resilience. Ideally such research should be interdisciplinary, acknowledging the diversity and complexity of the topics it embraces. There may be funding inequality between national monitoring institutions and international research institutions, but these and other in-country institutions can help drive innovation and research if they are fully involved in problem definition and research design. New innovations arising from the increasing resolution (temporal and spatial) of EO products should lead to useful near-real-time products from research and operational services. The app and website can ensure that such diverse products from multiple sources are accessible to communities, scientists and authorities (as appropriate).
Other innovations, such as machine learning and data mining of time-series data collected by monitoring institutions, may lead to new insights into physical processes, which can support timely decision-making by scientists in particular (e.g. increasing alert levels).

    A Quantitative Analysis of Big Data Analytics Capabilities and Supply Chain Management

    With the emergence of Big Data Technologies (BDT) and the growing application of Big Data Analytics (BDA), Supply Chain Management (SCM) researchers increasingly use BDA because of the opportunities that BDT and BDA present. Supply Chain (SC) data is inherently complex and creates an environment of high uncertainty, which presents a real challenge for SC decision-makers. This research study aimed to investigate and illustrate the application of BDA within the existing decision-making process. BDT allowed for the extraction and processing of SC data. BDA aided further understanding of SC inefficiencies and delivered valuable, actionable insights by validating the existence of the SC bullwhip phenomenon and its contributing factors. Furthermore, BDA enabled the pragmatic evaluation of linear and nonlinear SC relationships by applying machine learning techniques such as Principal Component Analysis (PCA) and multivariable regression analysis. Moreover, applying more sophisticated BDA time series and forecasting techniques such as SARIMAX, TBATS, and neural networks improved forecasting accuracy. Ultimately, the improved demand planning and forecast accuracy will reduce SC uncertainty and the effects of the observed SC bullwhip phenomenon, thus creating a competitive advantage for all members of the SC value chain.
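
    A common way to quantify the bullwhip phenomenon is the ratio of the variance of the orders an echelon places upstream to the variance of the demand it receives. A ratio above 1 indicates amplification. The sketch below assumes that measure, although the study's own validation may differ, and its function names are hypothetical.

```python
from statistics import pvariance
from typing import List, Sequence


def bullwhip_ratio(orders: Sequence[float], demand: Sequence[float]) -> float:
    """Variance of orders placed upstream divided by variance of demand received."""
    return pvariance(orders) / pvariance(demand)


def echelon_ratios(consumer_demand: Sequence[float],
                   orders_by_echelon: Sequence[Sequence[float]]) -> List[float]:
    """Bullwhip ratio of each echelon relative to the demand it sees.

    orders_by_echelon is ordered from the retailer upstream; each echelon's
    incoming demand is the order stream of the echelon below it.
    """
    ratios = []
    incoming = consumer_demand
    for orders in orders_by_echelon:
        ratios.append(bullwhip_ratio(orders, incoming))
        incoming = orders
    return ratios
```

    Ratios above 1 at successive echelons, for example a retailer and then a wholesaler, would indicate that demand variability is amplified as it travels up the chain.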

    Development of a National-Scale Big Data Analytics Pipeline to Study the Potential Impacts of Flooding on Critical Infrastructures and Communities

    With the rapid development of the Internet of Things (IoT) and big data infrastructure, crowdsourcing techniques have emerged to facilitate data processing and problem solving, particularly for flood emergency purposes. A Flood Analytics Information System (FAIS) has been developed as a Python web application to gather big data from multiple servers and analyze flooding impacts during historical and real-time events. The application integrates crowd intelligence, machine learning (ML), and natural language processing of tweets to provide flood warnings, with the aim of improving situational awareness for flood risk management and decision making. FAIS allows the user to submit search requests for data from the United States Geological Survey (USGS) and Twitter through a series of queries, which are used to modify the request URLs sent to the data sources. This national-scale prototype combines flood peak rates and river level information with geotagged tweets to identify a dynamic set of locations at risk of flooding. The list of prioritized areas can be updated every 15 minutes as the crowdsourced data and environmental conditions change. In addition, FAIS uses the Google Vision API (application programming interface) and image processing algorithms to detect objects (flood, road, vehicle, river, etc.) in time-lapse digital images and build valuable metadata into an image catalog. The application performs Flood Frequency Analysis (FFA) and computes design flow values corresponding to specific return periods, which can help engineers design safe structures and protect against economic losses due to maintenance of civil infrastructure. FAIS was successfully tested in real time during the Hurricane Dorian flooding event across the Carolinas, where the storm caused extensive damage and disruption to critical infrastructure and the environment.
The prototype was also verified against historical events such as the Hurricane Matthew and Hurricane Florence floods in the Lower PeeDee Basin in the Carolinas.
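
    Flood frequency analysis fits a probability distribution to annual peak flows and reads off the design flow whose annual exceedance probability is 1/T for a return period of T years. The sketch below uses the Gumbel (EV1) distribution fitted by the method of moments, chosen here only for brevity. U.S. federal guidance (Bulletin 17C) recommends the log-Pearson Type III distribution, and the abstract does not state which distribution FAIS fits. The function name is hypothetical.

```python
from math import log, pi, sqrt
from statistics import mean, stdev
from typing import Dict, Iterable, Sequence

EULER_GAMMA = 0.5772156649015329  # Euler-Mascheroni constant


def gumbel_design_flows(annual_peaks: Sequence[float],
                        return_periods: Iterable[float]) -> Dict[float, float]:
    """Design flow Q_T for each return period T (years) from a Gumbel fit.

    Method of moments: scale alpha = sqrt(6) * s / pi, location u = mean - gamma * alpha.
    Q_T is the quantile with non-exceedance probability 1 - 1/T:
        Q_T = u - alpha * ln(-ln(1 - 1/T))
    """
    if len(annual_peaks) < 2:
        raise ValueError("need at least two annual peaks")
    alpha = sqrt(6) * stdev(annual_peaks) / pi
    u = mean(annual_peaks) - EULER_GAMMA * alpha
    flows: Dict[float, float] = {}
    for t in return_periods:
        if t <= 1:
            raise ValueError("return period must exceed 1 year")
        flows[t] = u - alpha * log(-log(1 - 1 / t))
    return flows
```

    Longer return periods give larger design flows. Under this model the 2-year flow falls slightly below the mean annual peak, because the Gumbel median lies below its mean.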

    A study of multivariate behavior and anomaly patterns: tensor decomposition for multiway big data

    University of Technology Sydney. Faculty of Engineering and Information Technology. A vast majority of today's information is carried through Cyber-Physical Systems (CPS). They represent the confluence of extensive data sets, tight time constraints, latency issues and heterogeneous components. CPS architectures demand newer Big Data processing approaches. Typical applications span from the Internet of Things, across the World Wide Web, to Smart Cities and intelligent machines. A standard heterogeneous CPS installation, the Smart Energy Grid, is observed and its logistics are analyzed. The Smart Grid domain is weighed down by the lack of a unifying framework and of systemic intelligence for autonomic management. Preliminary studies of the field under investigation show that processing of real-time data, communication and control signaling is vital. Purely autonomic system governance is proven to differ from the contemporary definition: it takes the form of interoperability (achieved through automation) instead of elemental integration. That means autonomic (smart) management requires all elements to have fully controllable behavior. This dissertation tests the hypothesis of applying tensor decompositions and factorizations, a momentum-gaining arithmetic tool, to this problem. The aim is to validate the prospects of higher-order anomaly pattern processing to capture intelligence along multiple modes of data flow. Tensorial data representation captures information flows in Big Data, while multivariate anomaly detection tracks time-series behavioral changes. Together, they implement autonomic management in CPS super-systems. The uniqueness of this approach lies in the novel multi-modal data flow imaging and models. The requirement for traditional definition and cataloging of anomalous events in data streams is removed.
Tensor algebra is then studied for its scope of implementation with respect to features, significance, and interpretation in terms of multi-modal data. Standard decomposition rules and their derivatives, the literature on contemporary applications of tensor algebra, and its scope for prominent real-world data processing problems are studied. Finally, the decomposition tool for multi-way analysis is inferred, and the proposed methodology is derived. The Smart Grid Smart City Project commissioned by the Australian government is chosen as the data source under investigation. The need for exhaustive examination of such repositories in the CPS anomaly detection context is also highlighted. Experimentation is done by applying tensor decomposition to the data set after normalization and pre-processing. Details of those phases, as well as the choice of coding platforms, the design of experimental frameworks, estimated timelines, and testing operations, are included in this work. The outcomes are the extracted patterns and their analysis and interpretation, supported by evidence from actual events of the Project Trial phase.
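
    As a rough illustration of tensor-based anomaly detection, the sketch below fits a rank-1 CP (CANDECOMP/PARAFAC) model to a small three-way tensor using the higher-order power method. It then scores each time slice by its reconstruction residual, so that slices the low-rank model explains poorly become anomaly candidates. The rank, the iteration count, and the choice of the third mode as time are assumptions made for illustration. This is not the dissertation's methodology, and the function names are hypothetical.

```python
from math import sqrt
from typing import List, Tuple

Tensor = List[List[List[float]]]  # t[i][j][k]; mode k is treated as time


def _unit(v: List[float]) -> List[float]:
    """Scale a vector to unit Euclidean norm (left unchanged if it is all zeros)."""
    n = sqrt(sum(x * x for x in v))
    return [x / n for x in v] if n > 0 else v


def rank1_cp(t: Tensor, iters: int = 100) -> Tuple[float, List[float], List[float], List[float]]:
    """Rank-1 CP fit t ~ lam * (a outer b outer c) via the higher-order power method."""
    I, J, K = len(t), len(t[0]), len(t[0][0])
    b = _unit([1.0] * J)
    c = _unit([1.0] * K)
    a = [0.0] * I
    for _ in range(iters):
        # Each factor is updated by contracting the tensor with the other two factors.
        a = _unit([sum(t[i][j][k] * b[j] * c[k] for j in range(J) for k in range(K)) for i in range(I)])
        b = _unit([sum(t[i][j][k] * a[i] * c[k] for i in range(I) for k in range(K)) for j in range(J)])
        c = _unit([sum(t[i][j][k] * a[i] * b[j] for i in range(I) for j in range(J)) for k in range(K)])
    lam = sum(t[i][j][k] * a[i] * b[j] * c[k] for i in range(I) for j in range(J) for k in range(K))
    return lam, a, b, c


def slice_residuals(t: Tensor, lam: float, a: List[float], b: List[float], c: List[float]) -> List[float]:
    """Frobenius norm of the reconstruction error for each time slice k."""
    I, J, K = len(t), len(t[0]), len(t[0][0])
    return [
        sqrt(sum((t[i][j][k] - lam * a[i] * b[j] * c[k]) ** 2 for i in range(I) for j in range(J)))
        for k in range(K)
    ]
```

    In practice one would use a numerical library and a rank chosen by validation. The point of the sketch is that the anomaly score comes from structure shared across all modes, not from a predefined catalogue of anomalous events.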

    A Time-Series Treatment Method to Obtain Electrical Consumption Patterns for Anomalies Detection Improvement in Electrical Consumption Profiles

    Electricity consumption patterns reveal energy demand behaviors and enable the implementation of strategies to increase efficiency using monitoring systems. However, incorrect patterns can be obtained when the time-series components of electricity demand are not considered. Hence, this research proposes a new method for handling time-series components that significantly improves the ability to obtain patterns and detect anomalies in electrical consumption profiles. Patterns are found using the proposed method and two widespread methods for handling time-series components, in order to compare the results. Through this study, the conditions that electricity demand data must meet for time-series analysis to be useful are established. Finally, one year of real electricity consumption data is analyzed for two different cases to evaluate the effect of time-series treatment on the detection of anomalies. The proposed method differentiates between periods of high and low energy demand, identifying contextual anomalies. The results indicate that it is possible to reduce the time and effort involved in data analysis and to improve the reliability of monitoring systems without adding complex procedures.
    Serrano-Guerrero, X.; Escrivá-Escrivá, G.; Luna-Romero, S.; Clairand, J. (2020). A Time-Series Treatment Method to Obtain Electrical Consumption Patterns for Anomalies Detection Improvement in Electrical Consumption Profiles. Energies. 13(5):1-23. https://doi.org/10.3390/en13051046
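
    The abstract's point that ignoring time-series components yields misleading patterns can be illustrated with a generic stand-in, which is not the authors' method. The approach subtracts a per-phase seasonal profile (here the median of each phase, such as hour of day) and flags points whose residual is large relative to the residual spread. A value that is normal globally but unusual for its phase is then caught as a contextual anomaly. The period, the median profile, the z-score threshold, and the function name are all assumptions.

```python
from statistics import median, pstdev
from typing import List, Sequence


def contextual_anomalies(series: Sequence[float], period: int,
                         threshold: float = 3.0) -> List[int]:
    """Return indices whose deseasonalized residual exceeds `threshold` spreads.

    The seasonal profile is the median value at each phase (index mod period),
    which stays robust when a few of the values are themselves anomalous.
    """
    profile = [median(series[p::period]) for p in range(period)]
    residuals = [x - profile[i % period] for i, x in enumerate(series)]
    spread = pstdev(residuals)
    if spread == 0:
        return []  # the series matches its seasonal profile exactly
    return [i for i, r in enumerate(residuals) if abs(r) / spread > threshold]
```

    For a daily-cycle series that repeats 1, 2, 3, 4, a reading of 4 at a phase that normally reads 1 is flagged, even though 4 is an ordinary value elsewhere in the cycle. A global threshold on raw values would miss it.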

    Attributes of Big Data Analytics for Data-Driven Decision Making in Cyber-Physical Power Systems

    Big data analytics is a relatively new term in power system terminology. The concept concerns how a massive volume of data is acquired, processed, and analyzed to extract insight from the available data. In particular, big data analytics refers to applications of artificial intelligence, machine learning techniques, data mining techniques, and time-series forecasting methods. Decision-makers in power systems have long been plagued by the weakness of classical methods in dealing with large-scale, real practical cases, owing to thousands or millions of variables, long computation times, high computational burden, divergence of results, unjustifiable errors, and poor model accuracy. Big data analytics is an ongoing topic that addresses how to extract insights from these large data sets. This article enumerates the applications of big data analytics in future power systems across several layers, from grid scale to local scale. Big data analytics has many applications in the areas of smart grid implementation, electricity markets, execution of collaborative operation schemes, enhancement of microgrid operation autonomy, management of electric vehicle operations in smart grids, active distribution network control, district hub system management, multi-agent energy systems, electricity theft detection, stability and security assessment by PMUs, and better exploitation of renewable energy sources. The employment of big data analytics entails some prerequisites, such as the proliferation of IoT-enabled devices, easily accessible cloud space, blockchain, etc. This paper presents a comprehensive review of the applications of big data analytics, along with the prevailing challenges and solutions.