Extracting information from the signature of a financial data stream
Market events such as order placement and order cancellation are examples of
the complex and substantial flow of data that surrounds a modern financial
engineer. New mathematical techniques, developed to describe the interactions
of complex oscillatory systems and known as the theory of rough paths, provide
new tools for analysing and describing these data streams and extracting the
vital information. In this paper we illustrate how a very small number of
coefficients obtained from the signature of financial data can be sufficient to
classify these data according to subtle underlying features and to make useful
predictions.
This paper presents financial examples in which we learn from data and then
proceed to classify fresh streams. The classification is based on features of
streams that are specified through the coordinates of the signature of the
path. At a mathematical level the signature is a faithful transform of a
multidimensional time series (Ben Hambly and Terry Lyons \cite{uniqueSig});
Hao Ni and Terry Lyons \cite{NiLyons} introduced the possibility of its use to
understand financial data and pointed to the potential this approach has for
machine learning and prediction.
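To make the signature concrete, here is a minimal sketch, assuming a toy two-dimensional (time, price) stream and truncation at level 2; the code is illustrative Python written for this summary, not the authors' implementation, and it uses Chen's identity for piecewise-linear paths.

```python
import numpy as np

def signature_level2(path):
    """Level-1 and level-2 signature terms of a piecewise-linear path.

    path : (n_points, d) array of sampled coordinates.
    Returns (S1, S2) with S1[i] the total increment in coordinate i and
    S2[i, j] the iterated integral over 0 < s < t of dX^i_s dX^j_t.
    """
    path = np.asarray(path, dtype=float)
    S1 = np.zeros(path.shape[1])
    S2 = np.zeros((path.shape[1], path.shape[1]))
    for delta in np.diff(path, axis=0):
        # Chen's identity: append one more linear segment to the path.
        S2 += np.outer(S1, delta) + 0.5 * np.outer(delta, delta)
        S1 += delta
    return S1, S2

# Toy (time, price) stream; the antisymmetric part of S2 (the Levy area)
# is one example of a non-obvious feature the signature exposes.
stream = np.array([[0.0, 100.0], [1.0, 100.5], [2.0, 99.8], [3.0, 100.2]])
S1, S2 = signature_level2(stream)
print("level-1 terms:", S1)
print("Levy area:", 0.5 * (S2[0, 1] - S2[1, 0]))
```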
We evaluate and refine these theoretical suggestions against practical
examples of interest and present a few motivating experiments which demonstrate
the information the signature can easily capture in a non-parametric way,
avoiding traditional statistical modelling of the data. In the first experiment
we identify atypical market behaviour across standard 30-minute time buckets
sampled from the WTI crude oil futures market (NYMEX). The second and third
experiments aim to characterise the market "impact" of, and distinguish
between, parent orders generated by two different trade execution algorithms on
the FTSE 100 Index futures market listed on NYSE Liffe.
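A hedged sketch of the learn-then-classify step, assuming labelled example streams, the signature_level2 helper above, and an off-the-shelf logistic regression chosen purely for illustration (the paper does not prescribe a particular classifier):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def signature_features(stream):
    # Flatten the truncated signature into a fixed-length feature vector.
    S1, S2 = signature_level2(stream)
    return np.concatenate([S1, S2.ravel()])

def fit_stream_classifier(streams, labels):
    # Learn from labelled streams, then classify fresh ones from their signatures.
    X = np.array([signature_features(s) for s in streams])
    return LogisticRegression(max_iter=1000).fit(X, labels)

# Hypothetical usage:
# clf = fit_stream_classifier(train_streams, train_labels)
# clf.predict([signature_features(fresh_stream)])
```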
Big Data and Reliability Applications: The Complexity Dimension
Big data features not only large volumes of data but also data with
complicated structures. Complexity imposes unique challenges in big data
analytics. Meeker and Hong (2014, Quality Engineering, pp. 102-116) provided an
extensive discussion of the opportunities and challenges in big data and
reliability, and described engineering systems that can generate big data that
can be used in reliability analysis. Meeker and Hong (2014) focused on
large-scale system operating and environment data (i.e., high-frequency multivariate
time series data), and provided examples on how to link such data as covariates
to traditional reliability responses such as time to failure, time to
recurrence of events, and degradation measurements. This paper intends to
extend that discussion by focusing on how to use data with complicated
structures to do reliability analysis. Such data types include high-dimensional
sensor data, functional curve data, and image streams. We first provide a
review of recent developments in those directions, and then we provide a
discussion on how analytical methods can be developed to tackle the challenging
aspects that arise from the complexity feature of big data in reliability
applications. The use of modern statistical methods such as variable selection,
functional data analysis, scalar-on-image regression, spatio-temporal data
models, and machine learning techniques will also be discussed.
Comment: 28 pages, 7 figures
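As one hedged illustration of linking operating/environment data to a reliability response, the sketch below is written for this summary (synthetic data, hypothetical names): each unit's high-frequency sensor record is summarised to a single covariate and linked to time to failure through a right-censored exponential regression fitted by maximum likelihood. It is a stand-in for the much richer models discussed in the paper.

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(0)
n = 200

# Hypothetical high-frequency sensor records (one row per unit), summarised
# to a single covariate: the unit's mean operating load.
sensor = rng.normal(loc=rng.normal(size=(n, 1)), scale=0.1, size=(n, 1000))
x = sensor.mean(axis=1)

# Synthetic failure times whose hazard grows with the covariate, plus
# independent right censoring.
rate = np.exp(-1.0 + 0.8 * x)
t_fail = rng.exponential(1.0 / rate)
t_cens = rng.exponential(5.0, size=n)
t = np.minimum(t_fail, t_cens)
event = (t_fail <= t_cens).astype(float)   # 1 = failure observed, 0 = censored

def neg_log_lik(beta):
    # Exponential regression with rate_i = exp(b0 + b1 * x_i) and right censoring.
    log_rate = beta[0] + beta[1] * x
    return np.sum(np.exp(log_rate) * t - event * log_rate)

fit = minimize(neg_log_lik, x0=np.zeros(2))
print("estimated (intercept, covariate effect):", fit.x)
```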
Deterministic Sampling and Range Counting in Geometric Data Streams
We present memory-efficient deterministic algorithms for constructing
epsilon-nets and epsilon-approximations of streams of geometric data. Unlike
probabilistic approaches, these deterministic samples provide guaranteed bounds
on their approximation factors. We show how our deterministic samples can be
used to answer approximate online iceberg geometric queries on data streams. We
use these techniques to approximate several robust statistics of geometric data
streams, including Tukey depth, simplicial depth, regression depth, the
Theil-Sen estimator, and the least median of squares. Our algorithms use only a
polylogarithmic amount of memory, provided the desired approximation factors
are inverse-polylogarithmic. We also include a lower bound for non-iceberg
geometric queries.
Comment: 12 pages, 1 figure
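For intuition, here is a minimal sketch, assuming a one-dimensional stream and interval range queries; it shows the classic deterministic merge-and-reduce halving step rather than the paper's constructions, which handle richer geometric range spaces and the robust statistics above.

```python
def merge_reduce_sample(stream, k):
    """Deterministic weighted sample of a 1-D stream via merge-and-reduce.

    At most one buffer of size k is kept per level; two buffers at the same
    level are merged (sorted) and halved by keeping every other element.
    Returns (value, weight) pairs whose weighted counts approximate the stream.
    """
    levels = {}    # level -> sorted buffer of exactly k items
    current = []   # level-0 buffer still being filled
    for item in stream:
        current.append(item)
        if len(current) < k:
            continue
        buf, lvl = sorted(current), 0
        current = []
        while lvl in levels:
            merged = sorted(levels.pop(lvl) + buf)
            buf = merged[::2]          # deterministic halving
            lvl += 1
        levels[lvl] = buf
    sample = [(v, 2 ** lvl) for lvl, buf in levels.items() for v in buf]
    sample += [(v, 1) for v in current]
    return sample

def approx_range_count(sample, lo, hi):
    # Approximate number of stream items falling in the interval [lo, hi].
    return sum(w for v, w in sample if lo <= v <= hi)

# Hypothetical usage:
# sample = merge_reduce_sample(data_stream, k=64)
# approx_range_count(sample, 0.2, 0.8)
```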