
    A machine learning framework for OPC UA data (Industry 4.0)

    Machine learning has rapidly gained popularity across industries with the increase in computational power and data-gathering capabilities. The process industry is a good candidate for machine learning based modeling due to the large amounts of data gathered and the need for accurate process state predictions. In this work, the viability of combining the OPC UA protocol with existing open-source machine learning libraries to create data-driven models and generate real-time predictions was studied. Scikit-learn was used to generate soft-sensor-style models for the butane content of a debutanizer column's output. The data for offline model training was dynamically fetched from an OPC UA server, and with a trained model, predictions could be generated in real time. The accuracy of the generated models needs to be researched further with better methodology and larger datasets.
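
    As a concrete illustration of the pipeline described above, a minimal sketch could combine the python-opcua client with a scikit-learn regressor; the endpoint URL, node ids, file names, and polling interval below are hypothetical placeholders rather than the thesis's actual setup.

        import time
        import numpy as np
        from opcua import Client                       # pip install opcua (python-opcua)
        from sklearn.ensemble import GradientBoostingRegressor

        # --- offline training on historical process data (arrays assumed already exported) ---
        X_hist = np.load("debutanizer_inputs.npy")     # hypothetical: temperatures, pressures, flows
        y_hist = np.load("debutanizer_butane.npy")     # hypothetical: lab-analysed butane content
        model = GradientBoostingRegressor().fit(X_hist, y_hist)

        # --- online prediction from live OPC UA values ---
        client = Client("opc.tcp://localhost:4840/freeopcua/server/")   # hypothetical endpoint
        client.connect()
        try:
            # hypothetical node ids for the measured process variables
            input_nodes = [client.get_node("ns=2;s=Debutanizer.PV%d" % i) for i in range(1, 8)]
            while True:
                x = np.array([[node.get_value() for node in input_nodes]])
                print("predicted butane content:", model.predict(x)[0])
                time.sleep(5)                          # prediction interval in seconds
        finally:
            client.disconnect()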

    Scalable aggregation predictive analytics: a query-driven machine learning approach

    We introduce a predictive modeling solution that provides high-quality predictive analytics over aggregation queries in Big Data environments. Our predictive methodology is generally applicable in environments in which large-scale data owners may or may not restrict access to their data and allow only aggregation operators like COUNT to be executed over their data. In this context, our methodology is based on historical queries and their answers to accurately predict the answers to ad-hoc queries. We focus on the widely used set-cardinality (COUNT) aggregation query, as COUNT is a fundamental operator both for internal data system optimizations and for aggregation-oriented data exploration and predictive analytics. We contribute a novel, query-driven Machine Learning (ML) model whose goals are to: (i) learn the query-answer space from past issued queries, (ii) associate the query space with local linear regression & associative function estimators, (iii) define query similarity, and (iv) predict the cardinality of the answer set of unseen incoming queries, referred to as the Set Cardinality Prediction (SCP) problem. Our ML model incorporates incremental ML algorithms to ensure high-quality prediction results. The significance of this contribution lies in the fact that it (i) is the only query-driven solution applicable over general Big Data environments, which include restricted-access data, (ii) offers incremental learning adjusted for arriving ad-hoc queries, which is well suited for query-driven data exploration, and (iii) offers performance (in terms of scalability, SCP accuracy, processing time, and memory requirements) that is superior to data-centric approaches. We provide a comprehensive performance evaluation of our model, evaluating its sensitivity, scalability, and efficiency for quality predictive analytics. In addition, we report on the development and incorporation of our ML model in Spark, showing its superior performance compared to Spark's COUNT method.
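
    A minimal sketch of the query-driven idea (not the paper's exact model) is to cluster past queries, attach a local linear regressor to each cluster, and answer an unseen COUNT query from its most similar cluster; the feature encoding and the synthetic query log below are assumptions made purely for illustration.

        import numpy as np
        from sklearn.cluster import KMeans
        from sklearn.linear_model import LinearRegression

        rng = np.random.default_rng(0)
        # hypothetical training log: each query is a (lo, hi) range over one attribute, answer = COUNT
        queries = rng.uniform(0, 100, size=(5000, 2))
        queries.sort(axis=1)
        answers = (queries[:, 1] - queries[:, 0]) * 40 + rng.normal(0, 25, 5000)

        # (i)-(iii): learn the query space and attach a local linear estimator to each region
        km = KMeans(n_clusters=16, random_state=0).fit(queries)
        local = {c: LinearRegression().fit(queries[km.labels_ == c], answers[km.labels_ == c])
                 for c in range(16)}

        # (iv): predict the cardinality of an unseen query via its most similar cluster
        def predict_count(query):
            q = np.asarray(query).reshape(1, -1)
            c = km.predict(q)[0]
            return float(local[c].predict(q)[0])

        print(predict_count([10.0, 60.0]))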

    ETL and analysis of IoT data using OpenTSDB, Kafka, and Spark

    Master's thesis in Computer Science. The Internet of Things (IoT) is becoming increasingly prevalent in today's society. Innovations in storage and processing methodologies enable the processing of large amounts of data in a scalable manner, and the generation of insights in near real time. Data from IoT are typically time-series data, but they may also have a strong spatial correlation. In addition, many industries still place their time-series data in relational databases that are poorly suited to it. Many open-source time-series databases exist today with inspiring features in terms of storage, analytic representation, and visualization. Finding an efficient method to migrate data into a time-series database is the first objective of the thesis. In recent decades, machine learning has become one of the backbones of data innovation. With the constantly expanding amounts of information available, there is good reason to expect that smart data analysis will become more pervasive as an essential element for innovative progress. Methods for modeling time-series data in machine learning, and for migrating time-series data from a database to a big data machine learning framework such as Apache Spark, are explored in this thesis.
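
    A minimal sketch of such an ETL path, assuming a hypothetical Kafka topic and JSON record schema, reads IoT readings with Spark Structured Streaming and pushes them to OpenTSDB's /api/put HTTP endpoint (the spark-sql-kafka package must be on the classpath):

        import json
        import requests
        from pyspark.sql import SparkSession
        from pyspark.sql import functions as F
        from pyspark.sql.types import StructType, StructField, StringType, DoubleType, LongType

        spark = SparkSession.builder.appName("iot-etl").getOrCreate()

        schema = StructType([StructField("sensor", StringType()),
                             StructField("ts", LongType()),        # epoch seconds
                             StructField("value", DoubleType())])

        readings = (spark.readStream.format("kafka")
                    .option("kafka.bootstrap.servers", "localhost:9092")
                    .option("subscribe", "iot-readings")            # hypothetical topic
                    .load()
                    .select(F.from_json(F.col("value").cast("string"), schema).alias("r"))
                    .select("r.*"))

        def write_to_opentsdb(batch_df, batch_id):
            # OpenTSDB accepts batched datapoints as a JSON array on /api/put
            points = [{"metric": "iot.sensor.value",
                       "timestamp": row.ts,
                       "value": row.value,
                       "tags": {"sensor": row.sensor}} for row in batch_df.collect()]
            if points:
                requests.post("http://localhost:4242/api/put",
                              data=json.dumps(points),
                              headers={"Content-Type": "application/json"})

        query = readings.writeStream.foreachBatch(write_to_opentsdb).start()
        query.awaitTermination()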

    Real-time big data processing for anomaly detection : a survey

    The advent of connected devices and the omnipresence of the Internet have paved the way for intruders to attack networks, leading to cyber-attacks, financial loss, information theft in healthcare, and cyber warfare. Hence, network security analytics has become an important area of concern and has lately gained intensive attention among researchers, specifically in the domain of network anomaly detection, which is considered crucial for network security. However, preliminary investigations have revealed that existing approaches to detecting anomalies in networks are not effective enough, particularly for detecting them in real time. The inefficacy of current approaches is mainly due to the massive volumes of data amassed through connected devices. Therefore, it is crucial to propose a framework that effectively handles real-time big data processing and detects anomalies in networks. In this regard, this paper attempts to address the issue of detecting anomalies in real time. Accordingly, this paper surveys the state-of-the-art real-time big data processing technologies related to anomaly detection and the vital characteristics of the associated machine learning algorithms. The paper begins with an explanation of the essential contexts and a taxonomy of real-time big data processing, anomaly detection, and machine learning algorithms, followed by a review of big data processing technologies. Finally, the identified research challenges of real-time big data processing in anomaly detection are discussed. © 2018 Elsevier Ltd.
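
    As a minimal illustration of the kind of detector such real-time frameworks host (not a method taken from the survey itself), a rolling z-score over a sliding window can flag outliers in a metric stream; the window size, threshold, warm-up length, and traffic values below are hypothetical.

        from collections import deque
        import math

        class RollingZScoreDetector:
            def __init__(self, window=300, threshold=4.0, warmup=5):
                self.window = deque(maxlen=window)
                self.threshold = threshold
                self.warmup = warmup

            def update(self, value):
                """Return True if `value` is anomalous relative to the recent window."""
                anomalous = False
                if len(self.window) >= self.warmup:
                    mean = sum(self.window) / len(self.window)
                    var = sum((v - mean) ** 2 for v in self.window) / len(self.window)
                    std = math.sqrt(var) or 1e-9
                    anomalous = abs(value - mean) / std > self.threshold
                self.window.append(value)
                return anomalous

        detector = RollingZScoreDetector()
        for pkts_per_sec in [120, 118, 125, 119, 123, 5000]:   # hypothetical metric stream
            if detector.update(pkts_per_sec):
                print("anomaly:", pkts_per_sec)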

    Leveraging intelligence from network CDR data for interference aware energy consumption minimization

    Cell densification is being perceived as the panacea for the imminent capacity crunch. However, high aggregated energy consumption and increased inter-cell interference (ICI) caused by densification remain two long-standing problems. We propose a novel network orchestration solution for simultaneously minimizing energy consumption and ICI in ultra-dense 5G networks. The proposed solution builds on a big data analysis of over 10 million CDRs from a real network, which shows that there exists strong spatio-temporal predictability in real network traffic patterns. Leveraging this, we develop a novel scheme to pro-actively schedule radio resources and small cell sleep cycles, yielding substantial energy savings and reduced ICI without compromising users' QoS. This scheme is derived by formulating a joint energy consumption and ICI minimization problem and solving it through a combination of linear binary integer programming and a progressive-analysis-based heuristic algorithm. Evaluations using 1) a HetNet deployment designed for the city of Milan, where big data analytics are applied to real CDR data from the Telecom Italia network to model traffic patterns, and 2) NS-3 based Monte Carlo simulations with synthetic Poisson traffic show that, compared to a full frequency reuse and always-on approach, in the best case the proposed scheme can reduce energy consumption in HetNets to 1/8th while providing the same or better QoS.
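
    A minimal sketch of the scheduling idea, assuming hypothetical CDR-derived load predictions and a simple two-level energy model, casts small cell sleep decisions as a binary integer program with PuLP; the paper's ICI term and progressive-analysis heuristic are not reproduced here.

        import pulp  # pip install pulp

        cells, hours = range(4), range(24)
        # hypothetical CDR-derived load prediction (fraction of a small cell's capacity)
        load = {(c, t): 0.1 + 0.8 * (abs(12 - t) < 6) * (c % 2 == 0) for c in cells for t in hours}
        macro_spare = {t: 1.2 for t in hours}          # spare macro-cell capacity per hour
        P_ACTIVE, P_SLEEP = 10.0, 1.0                  # watts, hypothetical energy model

        prob = pulp.LpProblem("small_cell_sleep", pulp.LpMinimize)
        awake = pulp.LpVariable.dicts("awake", (cells, hours), cat="Binary")

        # energy objective: active cells burn P_ACTIVE, sleeping ones P_SLEEP
        prob += pulp.lpSum(P_SLEEP + (P_ACTIVE - P_SLEEP) * awake[c][t]
                           for c in cells for t in hours)

        for t in hours:
            # traffic of sleeping cells must fit into the macro layer's spare capacity
            prob += pulp.lpSum(load[c, t] * (1 - awake[c][t]) for c in cells) <= macro_spare[t]

        prob.solve(pulp.PULP_CBC_CMD(msg=False))
        for t in hours:
            print(t, [int(awake[c][t].value()) for c in cells])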

    Brain-inspired self-organization with cellular neuromorphic computing for multimodal unsupervised learning

    Cortical plasticity is one of the main features that enable our ability to learn and adapt in our environment. Indeed, the cerebral cortex self-organizes through structural and synaptic plasticity mechanisms that are very likely at the basis of an extremely interesting characteristic of human brain development: multimodal association. In spite of the diversity of the sensory modalities, such as sight, sound, and touch, the brain arrives at the same concepts (convergence). Moreover, biological observations show that one modality can activate the internal representation of another modality when the two are correlated (divergence). In this work, we propose the Reentrant Self-Organizing Map (ReSOM), a brain-inspired neural system based on the reentry theory, using Self-Organizing Maps and Hebbian-like learning. We propose and compare different computational methods for unsupervised learning and inference, then quantify the gain of the ReSOM in a multimodal classification task. The divergence mechanism is used to label one modality based on the other, while the convergence mechanism is used to improve the overall accuracy of the system. We perform our experiments on a constructed written/spoken digits database and a DVS/EMG hand gestures database. The proposed model is implemented on a cellular neuromorphic architecture that enables distributed computing with local connectivity. We show the gain of the so-called hardware plasticity induced by the ReSOM, where the system's topology is not fixed by the user but learned over the course of the system's experience through self-organization.
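
    A minimal sketch of the convergence/divergence mechanism, assuming synthetic paired modalities and plain NumPy (not the cellular neuromorphic implementation or the paper's exact learning rules), trains one small SOM per modality and links their best-matching units with Hebbian counts:

        import numpy as np

        rng = np.random.default_rng(0)

        def train_som(data, n_units=16, epochs=20, lr=0.5, sigma=2.0):
            """Plain 1-D SOM: returns the unit weight vectors."""
            w = data[rng.choice(len(data), n_units)].astype(float)
            idx = np.arange(n_units)
            for e in range(epochs):
                decay = 1.0 - e / epochs
                for x in data:
                    best = np.argmin(np.linalg.norm(w - x, axis=1))
                    h = np.exp(-((idx - best) ** 2) / (2 * (sigma * decay + 1e-3) ** 2))
                    w += (lr * decay) * h[:, None] * (x - w)
            return w

        def bmu(w, x):
            return int(np.argmin(np.linalg.norm(w - x, axis=1)))

        # hypothetical paired "modalities": e.g. a visual and an audio feature per sample
        mod_a = rng.normal(size=(500, 8))
        mod_b = mod_a @ rng.normal(size=(8, 6)) + 0.1 * rng.normal(size=(500, 6))

        som_a, som_b = train_som(mod_a), train_som(mod_b)

        # Hebbian association between co-activated units (the "reentrant" link)
        hebb = np.zeros((16, 16))
        for xa, xb in zip(mod_a, mod_b):
            hebb[bmu(som_a, xa), bmu(som_b, xb)] += 1.0

        # divergence: given only modality A, recover the most likely unit of modality B
        xa = mod_a[0]
        predicted_b_unit = int(np.argmax(hebb[bmu(som_a, xa)]))
        print("modality-B unit activated from modality A:", predicted_b_unit)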