57,192 research outputs found
Adaptive estimation and change detection of correlation and quantiles for evolving data streams
Streaming data processing is increasingly playing a central role in enterprise data architectures due to an abundance of available measurement data from a wide variety of sources and advances in data capture and infrastructure technology. Data streams arrive, with high frequency, as never-ending sequences of events, where the underlying data generating process always has the potential to evolve. Business operations often demand real-time processing of data streams for keeping models up-to-date and timely decision-making. For example in cybersecurity contexts, analysing streams of network data can aid the detection of potentially malicious behaviour. Many tools for statistical inference cannot meet the challenging demands of streaming data, where the computational cost of updates to models must be constant to ensure continuous processing as data scales. Moreover, these tools are often not capable of adapting to changes, or drift, in the data. Thus, new tools for modelling data streams with efficient data processing and model updating capabilities, referred to as streaming analytics, are required. Regular intervention for control parameter configuration is prohibitive to the truly continuous processing constraints of streaming data. There is a notable absence of such tools designed with both temporal-adaptivity to accommodate drift and the autonomy to not rely on control parameter tuning. Streaming analytics with these properties can be developed using an Adaptive Forgetting (AF) framework, with roots in adaptive filtering. The fundamental contributions of this thesis are to extend the streaming toolkit by using the AF framework to develop autonomous and temporally-adaptive streaming analytics.
The first contribution uses the AF framework to demonstrate the development of a model, and validation procedure, for estimating time-varying parameters of bivariate data streams from cyber-physical systems. This is accompanied by a novel continuous monitoring change detection system that compares adaptive and non-adaptive estimates. The second contribution is the development of a streaming analytic for the correlation coefficient and an associated change detector to monitor changes to correlation structures across streams. This is demonstrated on cybersecurity network data. The third contribution is a procedure for estimating time-varying binomial data with thorough exploration of the nuanced behaviour of this estimator. The final contribution is a framework to enhance extant streaming quantile estimators with autonomous, temporally-adaptive properties. In addition, a novel streaming quantile procedure is developed and demonstrated, in an extensive simulation study, to show appealing performance.Open Acces
Extracting adverse drug reactions and their context using sequence labelling ensembles in TAC2017
Adverse drug reactions (ADRs) are unwanted or harmful effects experienced
after the administration of a certain drug or a combination of drugs,
presenting a challenge for drug development and drug administration. In this
paper, we present a set of taggers for extracting adverse drug reactions and
related entities, including factors, severity, negations, drug class and
animal. The systems used a mix of rule-based, machine learning (CRF) and deep
learning (BLSTM with word2vec embeddings) methodologies in order to annotate
the data. The systems were submitted to adverse drug reaction shared task,
organised during Text Analytics Conference in 2017 by National Institute for
Standards and Technology, archiving F1-scores of 76.00 and 75.61 respectively.Comment: Paper describing submission for TAC ADR shared tas
An improved tool of water data analytics for flowmeters data
This paper presents an improved tool for data validation and reconstruction of flowmeters. These sensors are installed in the Catalonia regional water network from Barcelona (Spain). Here a new time series model with exogenous variable is proposed with excellent results for data validation. It is postulated that the integration of the electronics alarms, along with other tests about the daily data accumulated and a later analysis of the data reconstruction allow to improve the results of the existing tools. This is accomplished by decreasing the false alarms and missing alarms of more than 6000 hourly data retrieved from more than 200 flowmeters each day. This new tool provides reliable information daily reliable information of the state of the water network. This information could potentially contribute to optimally control and manage this large and complex water network.Postprint (published version
A Process to Implement an Artificial Neural Network and Association Rules Techniques to Improve Asset Performance and Energy Efficiency
In this paper, we address the problem of asset performance monitoring, with the intention
of both detecting any potential reliability problem and predicting any loss of energy consumption
e ciency. This is an important concern for many industries and utilities with very intensive
capitalization in very long-lasting assets. To overcome this problem, in this paper we propose an
approach to combine an Artificial Neural Network (ANN) with Data Mining (DM) tools, specifically
with Association Rule (AR) Mining. The combination of these two techniques can now be done
using software which can handle large volumes of data (big data), but the process still needs to
ensure that the required amount of data will be available during the assets’ life cycle and that its
quality is acceptable. The combination of these two techniques in the proposed sequence di ers
from previous works found in the literature, giving researchers new options to face the problem.
Practical implementation of the proposed approach may lead to novel predictive maintenance models
(emerging predictive analytics) that may detect with unprecedented precision any asset’s lack of
performance and help manage assets’ O&M accordingly. The approach is illustrated using specific
examples where asset performance monitoring is rather complex under normal operational conditions.Ministerio de EconomÃa y Competitividad DPI2015-70842-
Towards Analytics Aware Ontology Based Access to Static and Streaming Data (Extended Version)
Real-time analytics that requires integration and aggregation of
heterogeneous and distributed streaming and static data is a typical task in
many industrial scenarios such as diagnostics of turbines in Siemens. OBDA
approach has a great potential to facilitate such tasks; however, it has a
number of limitations in dealing with analytics that restrict its use in
important industrial applications. Based on our experience with Siemens, we
argue that in order to overcome those limitations OBDA should be extended and
become analytics, source, and cost aware. In this work we propose such an
extension. In particular, we propose an ontology, mapping, and query language
for OBDA, where aggregate and other analytical functions are first class
citizens. Moreover, we develop query optimisation techniques that allow to
efficiently process analytical tasks over static and streaming data. We
implement our approach in a system and evaluate our system with Siemens turbine
data
Big Data and the Internet of Things
Advances in sensing and computing capabilities are making it possible to
embed increasing computing power in small devices. This has enabled the sensing
devices not just to passively capture data at very high resolution but also to
take sophisticated actions in response. Combined with advances in
communication, this is resulting in an ecosystem of highly interconnected
devices referred to as the Internet of Things - IoT. In conjunction, the
advances in machine learning have allowed building models on this ever
increasing amounts of data. Consequently, devices all the way from heavy assets
such as aircraft engines to wearables such as health monitors can all now not
only generate massive amounts of data but can draw back on aggregate analytics
to "improve" their performance over time. Big data analytics has been
identified as a key enabler for the IoT. In this chapter, we discuss various
avenues of the IoT where big data analytics either is already making a
significant impact or is on the cusp of doing so. We also discuss social
implications and areas of concern.Comment: 33 pages. draft of upcoming book chapter in Japkowicz and Stefanowski
(eds.) Big Data Analysis: New algorithms for a new society, Springer Series
on Studies in Big Data, to appea
- …