1,351 research outputs found
An experimental and analytical study of visual detection in a spacecraft environment, 1 July 1968 - 1 July 1969
Predicting the star magnitude that can be seen with the naked eye or a sextant through a spacecraft window
Data Mining to Uncover Heterogeneous Water Use Behaviors From Smart Meter Data
Knowledge of the determinants and patterns of water demand for different consumers supports the design of customized demand management strategies. Smart meters coupled with big data analytics tools create a unique opportunity to support such strategies. Yet, at present, the information content of smart meter data is not fully mined and usually needs to be complemented with water fixture inventory and survey data to achieve detailed customer segmentation based on end use water usage. In this paper, we developed a data-driven approach that extracts information on heterogeneous water end use routines, main end use components, and temporal characteristics, only via data mining of existing smart meter readings at the scale of individual households. We tested our approach on data from 327 households in Australia, each monitored with smart meters logging water use readings every 5 s. As part of the approach, we first disaggregated the household-level water use time series into different end uses via Autoflow. We then adapted a customer segmentation technique based on eigenbehavior analysis to discriminate among heterogeneous water end use routines and identify clusters of consumers presenting similar routines. Results revealed three main water end use profile clusters, each characterized by a primary end use: shower, clothes washing, and irrigation. Time-of-use and intensity-of-use differences exist within each cluster, as well as different characteristics of regularity and periodicity over time. Our customer segmentation analysis approach provides utilities with a concise snapshot of recurrent water use routines from smart meter data and can be used to support customized demand management strategies.
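The eigenbehavior-style segmentation described above can be roughly illustrated as follows (a synthetic sketch, not the authors' pipeline; the Autoflow disaggregation step is skipped and the household profiles, cluster count, and dimensions are invented):

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic end-use fractions for 90 households over three end uses:
# shower, clothes washing, irrigation (30 households dominated by each).
profiles = np.vstack([
    rng.dirichlet([8, 1, 1], 30),
    rng.dirichlet([1, 8, 1], 30),
    rng.dirichlet([1, 1, 8], 30),
])

# "Eigenbehaviors": principal components of the centered profile matrix.
centered = profiles - profiles.mean(axis=0)
_, _, vt = np.linalg.svd(centered, full_matrices=False)
scores = centered @ vt[:2].T          # project onto the top two components

# Plain Lloyd k-means on the projected scores, k = 3 clusters.
def kmeans(x, k, iters=50, seed=1):
    r = np.random.default_rng(seed)
    centers = x[r.choice(len(x), k, replace=False)]
    for _ in range(iters):
        labels = np.argmin(((x[:, None] - centers) ** 2).sum(-1), axis=1)
        for j in range(k):
            pts = x[labels == j]
            if len(pts):
                centers[j] = pts.mean(axis=0)
    return labels

labels = kmeans(scores, 3)
```

Each resulting cluster would correspond to one recurrent routine (here, one dominant end use), mirroring the shower/clothes-washing/irrigation profiles reported in the abstract.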
How Algorithmic Confounding in Recommendation Systems Increases Homogeneity and Decreases Utility
Recommendation systems are ubiquitous and impact many domains; they have the
potential to influence product consumption, individuals' perceptions of the
world, and life-altering decisions. These systems are often evaluated or
trained with data from users already exposed to algorithmic recommendations;
this creates a pernicious feedback loop. Using simulations, we demonstrate how
using data confounded in this way homogenizes user behavior without increasing
utility.
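The feedback loop can be caricatured in a few lines (a toy simulation of our own, not the authors' model: an invented popularity-based recommender retrained on its own logged clicks concentrates consumption on a few items):

```python
import numpy as np

rng = np.random.default_rng(0)
n_users, n_items, steps = 200, 50, 30

counts = np.ones(n_items)                  # logged interaction counts

for _ in range(steps):
    rec = int(np.argmax(counts))           # recommend the most-clicked item
    # each user follows the recommendation or explores a random item
    follows = rng.random(n_users) < 0.8
    choice = np.where(follows, rec, rng.integers(0, n_items, n_users))
    np.add.at(counts, choice, 1)           # "retrain" on the confounded log

# share of all interactions captured by the single top item; a uniform
# chooser would give about 1 / n_items = 0.02
top_share = counts.max() / counts.sum()
```

Because the log only ever contains what the recommender already pushed, the top item's share snowballs regardless of users' true preferences, which is the confounding the abstract describes.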
Data mining: a tool for detecting cyclical disturbances in supply networks.
Disturbances in supply chains may be either exogenous or endogenous. The ability to automatically detect, diagnose, and distinguish between the causes of disturbances is of prime importance to decision makers seeking to avoid uncertainty. The spectral principal component analysis (SPCA) technique has previously been utilized to distinguish between real and rogue disturbances in a steel supply network. The data set used was collected from four different business units in the network and consists of 43 variables, each described by 72 data points. The present paper utilizes the same data set to test an alternative approach to SPCA for detecting the disturbances. The new approach employs statistical data pre-processing, clustering, and classification learning techniques to analyse the supply network data. In particular, the incremental k-means
clustering and the RULES-6 classification rule-learning algorithms, developed by the present authors' team, have been applied to identify important patterns in the data set. Results show that the proposed approach has the capability to automatically detect and characterize network-wide cyclical disturbances and generate hypotheses about their root causes.
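An incremental k-means step might look like the following MacQueen-style sketch (our own minimal version on synthetic 2-D data; neither RULES-6 nor the authors' exact variant is reproduced here):

```python
import numpy as np

def incremental_kmeans(stream, k):
    # seed centres from the first k points, then update one point at a time
    it = iter(stream)
    centers = np.array([next(it) for _ in range(k)], dtype=float)
    counts = np.ones(k)
    for x in it:
        j = int(np.argmin(((centers - x) ** 2).sum(axis=1)))
        counts[j] += 1
        centers[j] += (x - centers[j]) / counts[j]   # running-mean update
    return centers

rng = np.random.default_rng(0)
a = rng.normal(0.0, 0.3, (100, 2))     # one well-separated cluster
b = rng.normal(5.0, 0.3, (100, 2))     # another
data = np.empty((200, 2))
data[0::2], data[1::2] = a, b          # interleave to mimic a stream

centers = incremental_kmeans(data, 2)
```

The appeal for supply network monitoring is that each new reading updates only its nearest centre, so patterns can be tracked without re-clustering the whole data set.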
Data-adaptive harmonic spectra and multilayer Stuart-Landau models
Harmonic decompositions of multivariate time series are considered for which
we adopt an integral operator approach with periodic semigroup kernels.
Spectral decomposition theorems are derived that cover the important cases of
two-time statistics drawn from a mixing invariant measure.
The corresponding eigenvalues can be grouped per Fourier frequency and are
given, at each frequency, as the singular values of a cross-spectral
matrix depending on the data. These eigenvalues furthermore obey a variational
principle that allows us to define a multidimensional power spectrum in a
natural way. The eigenmodes, for their part, exhibit a data-adaptive character
manifested in their phase, which in turn allows us to define a multidimensional
phase spectrum.
The resulting data-adaptive harmonic (DAH) modes allow for reducing the
data-driven modeling effort to elemental models stacked per frequency, only
coupled at different frequencies by the same noise realization. In particular,
the DAH decomposition extracts time-dependent coefficients stacked by Fourier
frequency which can be efficiently modeled---provided the decay of temporal
correlations is sufficiently well-resolved---within a class of multilayer
stochastic models (MSMs) tailored here on stochastic Stuart-Landau oscillators.
Applications to the Lorenz 96 model and to a stochastic heat equation driven
by space-time white noise are considered. In both cases, the DAH
decomposition allows for an extraction of spatio-temporal modes revealing key
features of the dynamics in the embedded phase space. The multilayer
Stuart-Landau models (MSLMs) are shown to successfully model the typical
patterns of the corresponding time-evolving fields, as well as their statistics
of occurrence.
Comment: 26 pages, double columns; 15 figures
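The per-frequency grouping of eigenvalues can be illustrated numerically (a loose sketch of the idea, not the paper's DAH construction: the Welch-style segment averaging, the synthetic three-channel signal, and all dimensions are our own choices):

```python
import numpy as np

rng = np.random.default_rng(0)
n, d, seg = 1024, 3, 256
t = np.arange(n)

# three channels sharing one oscillation at 0.05 cycles/step, plus noise
amps = np.array([1.0, 0.7, 0.5])
x = np.sin(2 * np.pi * 0.05 * t)[:, None] * amps + 0.3 * rng.normal(size=(n, d))

# segment-averaged cross-spectral matrices (Welch-style, no taper)
nseg = n // seg
ffts = np.array([np.fft.rfft(x[i * seg:(i + 1) * seg], axis=0)
                 for i in range(nseg)])
freqs = np.fft.rfftfreq(seg)
S = np.einsum('sfi,sfj->fij', ffts, ffts.conj()) / nseg   # d x d per frequency

# singular values of the cross-spectral matrix, grouped per frequency
spectrum = np.array([np.linalg.svd(Sf, compute_uv=False) for Sf in S])
peak = float(freqs[int(np.argmax(spectrum[:, 0]))])
```

The leading singular value per frequency plays the role of a multidimensional power spectrum; here it peaks at the shared oscillation frequency, and the associated singular vectors carry the phase information across channels.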
Efficient template attacks
This is the accepted manuscript version. The final published version is available from http://link.springer.com/chapter/10.1007/978-3-319-08302-5_17.
Template attacks remain a powerful side-channel technique to eavesdrop on tamper-resistant hardware. They model the probability distribution of leaking signals and noise to guide a search for secret data values. In practice, several numerical obstacles can arise when implementing such attacks with multivariate normal distributions. We propose efficient methods to avoid these. We also demonstrate how to achieve significant performance improvements, both in terms of information extracted and computational cost, by pooling covariance estimates across all data values. We provide a detailed and systematic overview of many different options for implementing such attacks. Our experimental evaluation of all these methods, based on measuring the supply current of a byte-load instruction executed in an unprotected 8-bit microcontroller, leads to practical guidance for choosing an attack algorithm.
Omar Choudary is a recipient of the Google Europe Fellowship in Mobile Security, and this research is supported in part by this Google Fellowship.
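A pooled-covariance template attack can be sketched on synthetic data (the Hamming-weight leakage model, trace dimensions, and noise level are invented for illustration, not the paper's measurement setup; with pure Hamming-weight leakage the attack can only identify the weight class of the secret byte):

```python
import numpy as np

rng = np.random.default_rng(0)
hw = np.array([bin(v).count("1") for v in range(256)])   # Hamming weights
proj = np.array([1.0, 0.8, -0.6, 1.2])                   # leakage per sample
dim, n_prof = 4, 200

def trace(v, r):
    # invented leakage model: Hamming weight of v plus Gaussian noise
    return hw[v] * proj + r.normal(0.0, 1.0, dim)

# profiling phase: one mean per data value, a single pooled covariance
means = np.zeros((256, dim))
resid = []
for v in range(256):
    ts = np.array([trace(v, rng) for _ in range(n_prof)])
    means[v] = ts.mean(axis=0)
    resid.append(ts - means[v])
pooled_inv = np.linalg.inv(np.cov(np.vstack(resid).T))

def attack(t):
    # maximum-likelihood guess = smallest Mahalanobis distance to a template
    d = means - t
    return int(np.argmin(np.einsum('vi,ij,vj->v', d, pooled_inv, d)))

true = 0x5A
avg = np.array([trace(true, rng) for _ in range(20)]).mean(axis=0)
guess = attack(avg)
```

Pooling the covariance across all 256 values is the key trick: it needs far fewer profiling traces than estimating 256 separate covariance matrices, and avoids the singular-matrix problems the abstract alludes to.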
Globally sparse PLS regression
Volume 56 ; Print ISBN : 978-1-4614-8282-6
Partial least squares (PLS) regression combines dimensionality reduction and prediction using a latent variable model. It provides better predictive ability than principal component analysis by taking into account both the independent and response variables in the dimension reduction procedure. However, PLS suffers from over-fitting problems when there are few samples but many variables. We formulate a new criterion for sparse PLS by adding a structured sparsity constraint to the global SIMPLS optimization. The constraint is a sparsity-inducing norm, which is useful for selecting the important variables shared among all the components. The optimization is solved by an augmented Lagrangian method to obtain the PLS components and to perform variable selection simultaneously. We propose a novel greedy algorithm to overcome the computational difficulties. Experiments demonstrate that our approach to PLS regression attains better performance with fewer selected predictors.
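A drastically simplified, one-component picture of sparsity in PLS (soft-thresholding the ordinary first PLS direction on synthetic data; this is not the paper's augmented-Lagrangian globally sparse SIMPLS, and the threshold is hand-picked):

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 500, 50
X = rng.normal(size=(n, p))
beta = np.zeros(p)
beta[:5] = 3.0                          # only five informative predictors
y = X @ beta + rng.normal(size=n)

def sparse_pls_direction(X, y, lam):
    w = X.T @ (y - y.mean())            # ordinary first PLS direction
    w = np.sign(w) * np.maximum(np.abs(w) - lam, 0.0)   # soft threshold
    norm = np.linalg.norm(w)
    return w / norm if norm > 0 else w

w = sparse_pls_direction(X, y, lam=700.0)
selected = np.nonzero(w)[0]
```

The soft threshold zeroes out the weight of uninformative predictors exactly, so the retained component is built from a small, interpretable subset of variables, which is the behaviour the global sparsity constraint enforces across all components.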
Nonlinear Mode Decomposition: a new noise-robust, adaptive decomposition method
We introduce a new adaptive decomposition tool, which we refer to as
Nonlinear Mode Decomposition (NMD). It decomposes a given signal into a set of
physically meaningful oscillations for any waveform, simultaneously removing
the noise. NMD is based on the powerful combination of time-frequency analysis
techniques - which together with the adaptive choice of their parameters make
it extremely noise-robust - and surrogate data tests, used to identify
interdependent oscillations and to distinguish deterministic from random
activity. We illustrate the application of NMD to both simulated and real
signals, and demonstrate its qualitative and quantitative superiority over
existing approaches such as (ensemble) empirical mode decomposition,
Karhunen-Loève expansion and independent component analysis. We point out that
NMD is likely to be applicable and useful in many different areas of research,
such as geophysics, finance, and the life sciences. The necessary MATLAB codes
for running NMD are freely available at
http://www.physics.lancs.ac.uk/research/nbmphysics/diats/nmd/.
Comment: 38 pages, 13 figures
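One ingredient of NMD, the surrogate data test, can be sketched with Fourier-transform surrogates (our own simplified version: a time-asymmetry statistic on a sawtooth wave; NMD's actual surrogate tests and discriminating statistics differ):

```python
import numpy as np

def ft_surrogate(x, rng):
    # keep the amplitude spectrum, randomize the phases
    f = np.fft.rfft(x)
    phases = rng.uniform(0.0, 2.0 * np.pi, len(f))
    phases[0] = 0.0                      # leave the mean alone
    return np.fft.irfft(np.abs(f) * np.exp(1j * phases), n=len(x))

def time_asymmetry(x):                   # near zero for time-reversible signals
    return float(np.mean(np.diff(x) ** 3))

rng = np.random.default_rng(0)
t = np.arange(2048)
x = (t % 100) / 100.0                    # sawtooth: deterministic, asymmetric

s0 = abs(time_asymmetry(x))
surr = [abs(time_asymmetry(ft_surrogate(x, rng))) for _ in range(99)]
extreme = sum(s >= s0 for s in surr)     # rank of the original among surrogates
```

Phase randomization destroys deterministic structure while preserving the power spectrum, so a statistic on which the original is an outlier among its surrogates signals deterministic rather than random activity.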
Multidimensional Data Visual Exploration by Interactive Information Segments
Visualization techniques play an outstanding role in the KDD process for data analysis and mining. However, a single image does not always successfully convey the inherent information of high-dimensional, very large databases. In this paper we introduce VSIS (Visual Set of Information Segments), an interactive tool to visually explore multidimensional, very large, numerical data. Within supervised learning, our proposal approaches the problem of classification by searching for meaningful intervals belonging to the most relevant attributes. These intervals are displayed as multi-colored bars in which the degree of impurity with respect to the class membership can be easily perceived. Such bars can be re-explored interactively with new values of user-defined parameters. A case study of applying VSIS to some UCI repository data sets shows the usefulness of our tool in supporting the exploration of multidimensional and very large data.
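The per-interval impurity behind such colored bars can be sketched as follows (our own illustration with Gini impurity on toy data, not the VSIS implementation; bin count and data are invented):

```python
import numpy as np

def interval_impurity(values, labels, bins=5):
    # split one numeric attribute into equal-width intervals and score each
    edges = np.linspace(values.min(), values.max(), bins + 1)
    idx = np.clip(np.digitize(values, edges) - 1, 0, bins - 1)
    impurity = []
    for b in range(bins):
        lab = labels[idx == b]
        if len(lab) == 0:
            impurity.append(0.0)
            continue
        _, cnt = np.unique(lab, return_counts=True)
        p = cnt / cnt.sum()
        impurity.append(float(1.0 - np.sum(p ** 2)))   # Gini impurity
    return impurity

# toy data: the attribute separates two classes except in an overlap zone
rng = np.random.default_rng(0)
v = np.concatenate([rng.uniform(0.0, 0.6, 100), rng.uniform(0.4, 1.0, 100)])
y = np.array([0] * 100 + [1] * 100)
imp = interval_impurity(v, y)
```

Pure intervals (impurity near 0) would be drawn in one class colour, while the mixed middle interval would visibly blend both colours, letting the analyst spot discriminative attribute ranges at a glance.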
When the optimal is not the best: parameter estimation in complex biological models
Background: The vast computational resources that became available during the
past decade enabled the development and simulation of increasingly complex
mathematical models of cancer growth. These models typically involve many free
parameters whose determination is a substantial obstacle to model development.
Direct measurement of biochemical parameters in vivo is often difficult and
sometimes impracticable, while fitting them under data-poor conditions may
result in biologically implausible values.
Results: We discuss different methodological approaches to estimate
parameters in complex biological models. We make use of the high computational
power of the Blue Gene technology to perform an extensive study of the
parameter space in a model of avascular tumor growth. We explicitly show that
the landscape of the cost function used to fit the model to the data has a
very rugged surface in parameter space. This cost function has many local
minima with unrealistic solutions, including the global minimum corresponding
to the best fit.
Conclusions: The case studied in this paper shows one example in which model
parameters that optimally fit the data are not necessarily the best ones from a
biological point of view. To avoid force-fitting a model to a dataset, we
propose that the best model parameters should be found by choosing, among
suboptimal parameters, those that match criteria other than the ones used to
fit the model. We also conclude that the model, data, and optimization approach
form a new complex system, and point to the need for a theory that addresses
this problem more generally.
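The proposed selection rule can be made concrete with a toy one-dimensional landscape (both the rugged cost function and the "plausibility" criterion are invented for illustration; the avascular tumor model is not reproduced):

```python
import numpy as np

def cost(p):
    # rugged toy landscape: many near-equivalent local minima, slight drift,
    # global minimum at p = 0
    return np.sin(5.0 * p) ** 2 + 0.01 * p

def plausibility(p):
    # hypothetical secondary criterion: closeness to a "literature value" 2.0
    return -np.abs(p - 2.0)

grid = np.linspace(0.0, 10.0, 10001)
c = cost(grid)
near_opt = grid[c <= c.min() + 0.05]     # acceptable, possibly suboptimal fits
best = float(near_opt[np.argmax(plausibility(near_opt))])
```

Here the globally optimal fit sits at p = 0, but the rule keeps every parameter value whose fit is within tolerance and then ranks those by the biological criterion, ending up near the plausible value instead of force-fitting to the global minimum.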
- …