
57,192 research outputs found

Data analytics 2016: proceedings of the fifth international conference on data analytics

Author: Bhulai Sandjai
Semanjski Ivana
Publication venue: The International Academy, Research and Industry Association
Publication date: 01/01/2016

VU Research Portal

Ghent University Academic Bibliography

Adaptive estimation and change detection of correlation and quantiles for evolving data streams

Author: Noble Jordan
Publication venue: Mathematics, Imperial College London
Publication date: 01/10/2023

Streaming data processing is increasingly playing a central role in enterprise data architectures due to an abundance of available measurement data from a wide variety of sources and advances in data capture and infrastructure technology. Data streams arrive, with high frequency, as never-ending sequences of events, where the underlying data generating process always has the potential to evolve. Business operations often demand real-time processing of data streams to keep models up to date and support timely decision-making. For example, in cybersecurity contexts, analysing streams of network data can aid the detection of potentially malicious behaviour. Many tools for statistical inference cannot meet the challenging demands of streaming data, where the computational cost of updates to models must be constant to ensure continuous processing as data scales. Moreover, these tools are often not capable of adapting to changes, or drift, in the data. Thus, new tools for modelling data streams with efficient data processing and model updating capabilities, referred to as streaming analytics, are required. Regular intervention for control parameter configuration is incompatible with the truly continuous processing constraints of streaming data. There is a notable absence of such tools designed with both temporal adaptivity to accommodate drift and the autonomy to not rely on control parameter tuning. Streaming analytics with these properties can be developed using an Adaptive Forgetting (AF) framework, with roots in adaptive filtering. The fundamental contributions of this thesis are to extend the streaming toolkit by using the AF framework to develop autonomous and temporally-adaptive streaming analytics. The first contribution uses the AF framework to demonstrate the development of a model, and validation procedure, for estimating time-varying parameters of bivariate data streams from cyber-physical systems.
This is accompanied by a novel continuous monitoring change detection system that compares adaptive and non-adaptive estimates. The second contribution is the development of a streaming analytic for the correlation coefficient and an associated change detector to monitor changes to correlation structures across streams. This is demonstrated on cybersecurity network data. The third contribution is a procedure for estimating time-varying binomial data with thorough exploration of the nuanced behaviour of this estimator. The final contribution is a framework to enhance extant streaming quantile estimators with autonomous, temporally-adaptive properties. In addition, a novel streaming quantile procedure is developed and demonstrated, in an extensive simulation study, to show appealing performance. Open Access

Spiral - Imperial College Digital Repository

Extracting adverse drug reactions and their context using sequence labelling ensembles in TAC2017

Author: Belousov Maksim
Dixon William
Milosevic Nikola
Nenadic Goran
Publication date: 01/01/2018

Adverse drug reactions (ADRs) are unwanted or harmful effects experienced after the administration of a certain drug or a combination of drugs, presenting a challenge for drug development and drug administration. In this paper, we present a set of taggers for extracting adverse drug reactions and related entities, including factors, severity, negations, drug class and animal. The systems used a mix of rule-based, machine learning (CRF) and deep learning (BLSTM with word2vec embeddings) methodologies to annotate the data. The systems were submitted to the adverse drug reaction shared task, organised during the Text Analysis Conference in 2017 by the National Institute of Standards and Technology, achieving F1-scores of 76.00 and 75.61 respectively. Comment: Paper describing submission for TAC ADR shared task

arXiv.org e-Print Archive

The University of Manchester - Institutional Repository

An improved tool of water data analytics for flowmeters data

Author: Espin Santiago
Pesantez José Luis
Quevedo Casín Joseba Jokin
Roquet Jaume
Valero Fernando
Publication venue: Institute of Electrical and Electronics Engineers (IEEE)
Publication date: 01/01/2018

This paper presents an improved tool for data validation and reconstruction of flowmeters. These sensors are installed in the Catalonia regional water network from Barcelona (Spain). Here, a new time series model with an exogenous variable is proposed, with excellent results for data validation. It is postulated that integrating the electronic alarms, along with other tests on the accumulated daily data and a later analysis of the data reconstruction, improves on the results of the existing tools. This is accomplished by reducing the false alarms and missed alarms among the more than 6000 hourly data points retrieved from more than 200 flowmeters each day. This new tool provides reliable daily information on the state of the water network. This information could potentially contribute to the optimal control and management of this large and complex water network. Postprint (published version)

UPCommons. Portal del coneixement obert de la UPC

A Process to Implement an Artificial Neural Network and Association Rules Techniques to Improve Asset Performance and Energy Efficiency

Author: Antomarioni Sara
Crespo Márquez Adolfo
Fuente Antonio de la
Publication venue: MDPI AG
Publication date: 01/09/2019

In this paper, we address the problem of asset performance monitoring, with the intention of both detecting any potential reliability problem and predicting any loss of energy consumption efficiency. This is an important concern for many industries and utilities with very intensive capitalization in very long-lasting assets. To overcome this problem, we propose an approach that combines an Artificial Neural Network (ANN) with Data Mining (DM) tools, specifically with Association Rule (AR) Mining. The combination of these two techniques can now be done using software which can handle large volumes of data (big data), but the process still needs to ensure that the required amount of data will be available during the assets’ life cycle and that its quality is acceptable. The combination of these two techniques in the proposed sequence differs from previous works found in the literature, giving researchers new options to face the problem. Practical implementation of the proposed approach may lead to novel predictive maintenance models (emerging predictive analytics) that may detect with unprecedented precision any asset’s lack of performance and help manage assets’ O&M accordingly. The approach is illustrated using specific examples where asset performance monitoring is rather complex under normal operational conditions. Ministerio de Economía y Competitividad DPI2015-70842-

Multidisciplinary Digital Publishing Institute

idUS. Depósito de Investigación Universidad de Sevilla

Towards Analytics Aware Ontology Based Access to Static and Streaming Data (Extended Version)

Author: Brandt Sebastian
Horrocks Ian
Ioannidis Yannis
Kharlamov Evgeny
Kotidis Yannis
Lamparter Steffen
Mailis Theofilos
Möller Ralf
Neuenstadt Christian
Nikolaou Charalampos
Svingos Christoforos
Zheleznyakov Dmitriy
Özcep Özgür
Publication date: 01/01/2016

Real-time analytics that requires integration and aggregation of heterogeneous and distributed streaming and static data is a typical task in many industrial scenarios such as diagnostics of turbines in Siemens. The OBDA approach has great potential to facilitate such tasks; however, it has a number of limitations in dealing with analytics that restrict its use in important industrial applications. Based on our experience with Siemens, we argue that in order to overcome those limitations OBDA should be extended and become analytics, source, and cost aware. In this work we propose such an extension. In particular, we propose an ontology, mapping, and query language for OBDA, where aggregate and other analytical functions are first class citizens. Moreover, we develop query optimisation techniques that allow analytical tasks over static and streaming data to be processed efficiently. We implement our approach in a system and evaluate our system with Siemens turbine data.

arXiv.org e-Print Archive

Oxford University Research Archive

Big Data and the Internet of Things

Author: A Baaziz
A Kleiner
ED Feigelson
MA Waller
S Boyd
S Vandermerwe
Z Zhou
Publication date: 24/03/2015

Advances in sensing and computing capabilities are making it possible to embed increasing computing power in small devices. This has enabled sensing devices not just to passively capture data at very high resolution but also to take sophisticated actions in response. Combined with advances in communication, this is resulting in an ecosystem of highly interconnected devices referred to as the Internet of Things (IoT). In conjunction, advances in machine learning have allowed models to be built on these ever-increasing amounts of data. Consequently, devices all the way from heavy assets such as aircraft engines to wearables such as health monitors can now not only generate massive amounts of data but can also draw on aggregate analytics to "improve" their performance over time. Big data analytics has been identified as a key enabler for the IoT. In this chapter, we discuss various avenues of the IoT where big data analytics either is already making a significant impact or is on the cusp of doing so. We also discuss social implications and areas of concern. Comment: 33 pages. Draft of upcoming book chapter in Japkowicz and Stefanowski (eds.) Big Data Analysis: New algorithms for a new society, Springer Series on Studies in Big Data, to appear

arXiv.org e-Print Archive

Crossref