1,351 research outputs found
An experimental and analytical study of visual detection in a spacecraft environment, 1 July 1968 - 1 July 1969
Predicting the star magnitude that can be seen with the naked eye or a sextant through a spacecraft window
Data Mining to Uncover Heterogeneous Water Use Behaviors From Smart Meter Data
Knowledge of the determinants and patterns of water demand for different consumers supports the design of customized demand management strategies. Smart meters coupled with big data analytics tools create a unique opportunity to support such strategies. Yet, at present, the information content of smart meter data is not fully mined and usually needs to be complemented with water fixture inventory and survey data to achieve detailed customer segmentation based on end use water usage. In this paper, we developed a data-driven approach that extracts information on heterogeneous water end use routines, main end use components, and temporal characteristics, only via data mining of existing smart meter readings at the scale of individual households. We tested our approach on data from 327 households in Australia, each monitored with smart meters logging water use readings every 5 s. As part of the approach, we first disaggregated the household-level water use time series into different end uses via Autoflow. We then adapted a customer segmentation technique based on eigenbehavior analysis to discriminate among heterogeneous water end use routines and identify clusters of consumers presenting similar routines. Results revealed three main water end use profile clusters, each characterized by a primary end use: shower, clothes washing, and irrigation. Time-of-use and intensity-of-use differences exist within each cluster, as well as different characteristics of regularity and periodicity over time. Our customer segmentation analysis approach provides utilities with a concise snapshot of recurrent water use routines from smart meter data and can be used to support customized demand management strategies.
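The eigenbehavior-style segmentation described above can be roughly illustrated as follows (a synthetic sketch, not the authors' pipeline; the Autoflow disaggregation step is skipped and the household profiles, cluster count, and dimensions are invented):

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic end-use fractions for 90 households over three end uses:
# shower, clothes washing, irrigation (30 households dominated by each).
profiles = np.vstack([
    rng.dirichlet([8, 1, 1], 30),
    rng.dirichlet([1, 8, 1], 30),
    rng.dirichlet([1, 1, 8], 30),
])

# "Eigenbehaviors": principal components of the centered profile matrix.
centered = profiles - profiles.mean(axis=0)
_, _, vt = np.linalg.svd(centered, full_matrices=False)
scores = centered @ vt[:2].T          # project onto the top two components

# Plain Lloyd k-means on the projected scores, k = 3 clusters.
def kmeans(x, k, iters=50, seed=1):
    r = np.random.default_rng(seed)
    centers = x[r.choice(len(x), k, replace=False)]
    for _ in range(iters):
        labels = np.argmin(((x[:, None] - centers) ** 2).sum(-1), axis=1)
        for j in range(k):
            pts = x[labels == j]
            if len(pts):
                centers[j] = pts.mean(axis=0)
    return labels

labels = kmeans(scores, 3)
```

Each resulting cluster would correspond to one recurrent routine (here, one dominant end use), mirroring the shower/clothes-washing/irrigation profiles reported in the abstract.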
How Algorithmic Confounding in Recommendation Systems Increases Homogeneity and Decreases Utility
Recommendation systems are ubiquitous and impact many domains; they have the
potential to influence product consumption, individuals' perceptions of the
world, and life-altering decisions. These systems are often evaluated or
trained with data from users already exposed to algorithmic recommendations;
this creates a pernicious feedback loop. Using simulations, we demonstrate how
using data confounded in this way homogenizes user behavior without increasing
utility.
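The feedback loop can be caricatured in a few lines (a toy simulation of our own, not the authors' model: an invented popularity-based recommender retrained on its own logged clicks concentrates consumption on a few items):

```python
import numpy as np

rng = np.random.default_rng(0)
n_users, n_items, steps = 200, 50, 30

counts = np.ones(n_items)                  # logged interaction counts

for _ in range(steps):
    rec = int(np.argmax(counts))           # recommend the most-clicked item
    # each user follows the recommendation or explores a random item
    follows = rng.random(n_users) < 0.8
    choice = np.where(follows, rec, rng.integers(0, n_items, n_users))
    np.add.at(counts, choice, 1)           # "retrain" on the confounded log

# share of all interactions captured by the single top item; a uniform
# chooser would give about 1 / n_items = 0.02
top_share = counts.max() / counts.sum()
```

Because the log only ever contains what the recommender already pushed, the top item's share snowballs regardless of users' true preferences, which is the confounding the abstract describes.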
Data mining: a tool for detecting cyclical disturbances in supply networks.
Disturbances in supply chains may be either exogenous or endogenous. The ability to automatically detect, diagnose, and distinguish between the causes of disturbances is of prime importance to decision makers seeking to avoid uncertainty. The spectral principal component analysis (SPCA) technique has previously been utilized to distinguish between real and rogue disturbances in a steel supply network. The data set used was collected from four different business units in the network and consists of 43 variables, each described by 72 data points. The present paper utilizes the same data set to test an alternative approach to SPCA for detecting the disturbances. The new approach employs statistical data pre-processing, clustering, and classification learning techniques to analyse the supply network data. In particular, the incremental k-means
clustering and the RULES-6 classification rule-learning algorithms, developed by the present authors' team, have been applied to identify important patterns in the data set. Results show that the proposed approach has the capability to automatically detect and characterize network-wide cyclical disturbances and generate hypotheses about their root causes.
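An incremental k-means step might look like the following MacQueen-style sketch (our own minimal version on synthetic 2-D data; neither RULES-6 nor the authors' exact variant is reproduced here):

```python
import numpy as np

def incremental_kmeans(stream, k):
    # seed centres from the first k points, then update one point at a time
    it = iter(stream)
    centers = np.array([next(it) for _ in range(k)], dtype=float)
    counts = np.ones(k)
    for x in it:
        j = int(np.argmin(((centers - x) ** 2).sum(axis=1)))
        counts[j] += 1
        centers[j] += (x - centers[j]) / counts[j]   # running-mean update
    return centers

rng = np.random.default_rng(0)
a = rng.normal(0.0, 0.3, (100, 2))     # one well-separated cluster
b = rng.normal(5.0, 0.3, (100, 2))     # another
data = np.empty((200, 2))
data[0::2], data[1::2] = a, b          # interleave to mimic a stream

centers = incremental_kmeans(data, 2)
```

The appeal for supply network monitoring is that each new reading updates only its nearest centre, so patterns can be tracked without re-clustering the whole data set.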
Data-adaptive harmonic spectra and multilayer Stuart-Landau models
Harmonic decompositions of multivariate time series are considered for which
we adopt an integral operator approach with periodic semigroup kernels.
Spectral decomposition theorems are derived that cover the important cases of
two-time statistics drawn from a mixing invariant measure.
The corresponding eigenvalues can be grouped per Fourier frequency and are
given, at each frequency, as the singular values of a cross-spectral
matrix depending on the data. These eigenvalues furthermore obey a variational
principle that allows us to define a multidimensional power spectrum in a
natural way. The eigenmodes, for their part, exhibit a data-adaptive character
manifested in their phase, which in turn allows us to define a multidimensional
phase spectrum.
The resulting data-adaptive harmonic (DAH) modes allow for reducing the
data-driven modeling effort to elemental models stacked per frequency, only
coupled at different frequencies by the same noise realization. In particular,
the DAH decomposition extracts time-dependent coefficients stacked by Fourier
frequency which can be efficiently modeled---provided the decay of temporal
correlations is sufficiently well-resolved---within a class of multilayer
stochastic models (MSMs) tailored here on stochastic Stuart-Landau oscillators.
Applications to the Lorenz 96 model and to a stochastic heat equation driven
by space-time white noise are considered. In both cases, the DAH
decomposition allows for an extraction of spatio-temporal modes revealing key
features of the dynamics in the embedded phase space. The multilayer
Stuart-Landau models (MSLMs) are shown to successfully model the typical
patterns of the corresponding time-evolving fields, as well as their statistics
of occurrence.
Comment: 26 pages, double columns; 15 figures
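The per-frequency grouping of eigenvalues can be illustrated numerically (a loose sketch of the idea, not the paper's DAH construction: the Welch-style segment averaging, the synthetic three-channel signal, and all dimensions are our own choices):

```python
import numpy as np

rng = np.random.default_rng(0)
n, d, seg = 1024, 3, 256
t = np.arange(n)

# three channels sharing one oscillation at 0.05 cycles/step, plus noise
amps = np.array([1.0, 0.7, 0.5])
x = np.sin(2 * np.pi * 0.05 * t)[:, None] * amps + 0.3 * rng.normal(size=(n, d))

# segment-averaged cross-spectral matrices (Welch-style, no taper)
nseg = n // seg
ffts = np.array([np.fft.rfft(x[i * seg:(i + 1) * seg], axis=0)
                 for i in range(nseg)])
freqs = np.fft.rfftfreq(seg)
S = np.einsum('sfi,sfj->fij', ffts, ffts.conj()) / nseg   # d x d per frequency

# singular values of the cross-spectral matrix, grouped per frequency
spectrum = np.array([np.linalg.svd(Sf, compute_uv=False) for Sf in S])
peak = float(freqs[int(np.argmax(spectrum[:, 0]))])
```

The leading singular value per frequency plays the role of a multidimensional power spectrum; here it peaks at the shared oscillation frequency, and the associated singular vectors carry the phase information across channels.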
Efficient template attacks
This is the accepted manuscript version. The final published version is available from http://link.springer.com/chapter/10.1007/978-3-319-08302-5_17.
Template attacks remain a powerful side-channel technique to eavesdrop on tamper-resistant hardware. They model the probability distribution of leaking signals and noise to guide a search for secret data values. In practice, several numerical obstacles can arise when implementing such attacks with multivariate normal distributions. We propose efficient methods to avoid these. We also demonstrate how to achieve significant performance improvements, both in terms of information extracted and computational cost, by pooling covariance estimates across all data values. We provide a detailed and systematic overview of many different options for implementing such attacks. Our experimental evaluation of all these methods, based on measuring the supply current of a byte-load instruction executed in an unprotected 8-bit microcontroller, leads to practical guidance for choosing an attack algorithm.
Omar Choudary is a recipient of the Google Europe Fellowship in Mobile Security, and this research is supported in part by this Google Fellowship.
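A pooled-covariance template attack can be sketched on synthetic data (the Hamming-weight leakage model, trace dimensions, and noise level are invented for illustration, not the paper's measurement setup; with pure Hamming-weight leakage the attack can only identify the weight class of the secret byte):

```python
import numpy as np

rng = np.random.default_rng(0)
hw = np.array([bin(v).count("1") for v in range(256)])   # Hamming weights
proj = np.array([1.0, 0.8, -0.6, 1.2])                   # leakage per sample
dim, n_prof = 4, 200

def trace(v, r):
    # invented leakage model: Hamming weight of v plus Gaussian noise
    return hw[v] * proj + r.normal(0.0, 1.0, dim)

# profiling phase: one mean per data value, a single pooled covariance
means = np.zeros((256, dim))
resid = []
for v in range(256):
    ts = np.array([trace(v, rng) for _ in range(n_prof)])
    means[v] = ts.mean(axis=0)
    resid.append(ts - means[v])
pooled_inv = np.linalg.inv(np.cov(np.vstack(resid).T))

def attack(t):
    # maximum-likelihood guess = smallest Mahalanobis distance to a template
    d = means - t
    return int(np.argmin(np.einsum('vi,ij,vj->v', d, pooled_inv, d)))

true = 0x5A
avg = np.array([trace(true, rng) for _ in range(20)]).mean(axis=0)
guess = attack(avg)
```

Pooling the covariance across all 256 values is the key trick: it needs far fewer profiling traces than estimating 256 separate covariance matrices, and avoids the singular-matrix problems the abstract alludes to.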
Globally sparse PLS regression
Volume 56 ; Print ISBN : 978-1-4614-8282-6
Partial least squares (PLS) regression combines dimensionality reduction and prediction using a latent variable model. It provides better predictive ability than principal component analysis by taking into account both the independent and response variables in the dimension reduction procedure. However, PLS suffers from over-fitting problems when there are few samples but many variables. We formulate a new criterion for sparse PLS by adding a structured sparsity constraint to the global SIMPLS optimization. The constraint is a sparsity-inducing norm, which is useful for selecting the important variables shared among all the components. The optimization is solved by an augmented Lagrangian method to obtain the PLS components and to perform variable selection simultaneously. We propose a novel greedy algorithm to overcome the computational difficulties. Experiments demonstrate that our approach to PLS regression attains better performance with fewer selected predictors.
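A drastically simplified, one-component picture of sparsity in PLS (soft-thresholding the ordinary first PLS direction on synthetic data; this is not the paper's augmented-Lagrangian globally sparse SIMPLS, and the threshold is hand-picked):

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 500, 50
X = rng.normal(size=(n, p))
beta = np.zeros(p)
beta[:5] = 3.0                          # only five informative predictors
y = X @ beta + rng.normal(size=n)

def sparse_pls_direction(X, y, lam):
    w = X.T @ (y - y.mean())            # ordinary first PLS direction
    w = np.sign(w) * np.maximum(np.abs(w) - lam, 0.0)   # soft threshold
    norm = np.linalg.norm(w)
    return w / norm if norm > 0 else w

w = sparse_pls_direction(X, y, lam=700.0)
selected = np.nonzero(w)[0]
```

The soft threshold zeroes out the weight of uninformative predictors exactly, so the retained component is built from a small, interpretable subset of variables, which is the behaviour the global sparsity constraint enforces across all components.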
Nonlinear Mode Decomposition: a new noise-robust, adaptive decomposition method
We introduce a new adaptive decomposition tool, which we refer to as
Nonlinear Mode Decomposition (NMD). It decomposes a given signal into a set of
physically meaningful oscillations for any waveform, simultaneously removing
the noise. NMD is based on the powerful combination of time-frequency analysis
techniques - which together with the adaptive choice of their parameters make
it extremely noise-robust - and surrogate data tests, used to identify
interdependent oscillations and to distinguish deterministic from random
activity. We illustrate the application of NMD to both simulated and real
signals, and demonstrate its qualitative and quantitative superiority over
existing approaches such as (ensemble) empirical mode decomposition,
Karhunen-Loève expansion and independent component analysis. We point out that
NMD is likely to be applicable and useful in many different areas of research,
such as geophysics, finance, and the life sciences. The necessary MATLAB codes
for running NMD are freely available at
http://www.physics.lancs.ac.uk/research/nbmphysics/diats/nmd/.
Comment: 38 pages, 13 figures
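One ingredient of NMD, the surrogate data test, can be sketched with Fourier-transform surrogates (our own simplified version: a time-asymmetry statistic on a sawtooth wave; NMD's actual surrogate tests and discriminating statistics differ):

```python
import numpy as np

def ft_surrogate(x, rng):
    # keep the amplitude spectrum, randomize the phases
    f = np.fft.rfft(x)
    phases = rng.uniform(0.0, 2.0 * np.pi, len(f))
    phases[0] = 0.0                      # leave the mean alone
    return np.fft.irfft(np.abs(f) * np.exp(1j * phases), n=len(x))

def time_asymmetry(x):                   # near zero for time-reversible signals
    return float(np.mean(np.diff(x) ** 3))

rng = np.random.default_rng(0)
t = np.arange(2048)
x = (t % 100) / 100.0                    # sawtooth: deterministic, asymmetric

s0 = abs(time_asymmetry(x))
surr = [abs(time_asymmetry(ft_surrogate(x, rng))) for _ in range(99)]
extreme = sum(s >= s0 for s in surr)     # rank of the original among surrogates
```

Phase randomization destroys deterministic structure while preserving the power spectrum, so a statistic on which the original is an outlier among its surrogates signals deterministic rather than random activity.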
Multidimensional Data Visual Exploration by Interactive Information Segments
Visualization techniques play an outstanding role in the KDD process for data analysis and mining. However, a single image does not always successfully convey the inherent information of high-dimensional, very large databases. In this paper we introduce VSIS (Visual Set of Information Segments), an interactive tool to visually explore multidimensional, very large, numerical data. Within supervised learning, our proposal approaches the problem of classification by searching for meaningful intervals belonging to the most relevant attributes. These intervals are displayed as multi-colored bars in which the degree of impurity with respect to the class membership can be easily perceived. Such bars can be re-explored interactively with new values of user-defined parameters. A case study of applying VSIS to some UCI repository data sets shows the usefulness of our tool in supporting the exploration of multidimensional and very large data.
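The per-interval impurity behind such colored bars can be sketched as follows (our own illustration with Gini impurity on toy data, not the VSIS implementation; bin count and data are invented):

```python
import numpy as np

def interval_impurity(values, labels, bins=5):
    # split one numeric attribute into equal-width intervals and score each
    edges = np.linspace(values.min(), values.max(), bins + 1)
    idx = np.clip(np.digitize(values, edges) - 1, 0, bins - 1)
    impurity = []
    for b in range(bins):
        lab = labels[idx == b]
        if len(lab) == 0:
            impurity.append(0.0)
            continue
        _, cnt = np.unique(lab, return_counts=True)
        p = cnt / cnt.sum()
        impurity.append(float(1.0 - np.sum(p ** 2)))   # Gini impurity
    return impurity

# toy data: the attribute separates two classes except in an overlap zone
rng = np.random.default_rng(0)
v = np.concatenate([rng.uniform(0.0, 0.6, 100), rng.uniform(0.4, 1.0, 100)])
y = np.array([0] * 100 + [1] * 100)
imp = interval_impurity(v, y)
```

Pure intervals (impurity near 0) would be drawn in one class colour, while the mixed middle interval would visibly blend both colours, letting the analyst spot discriminative attribute ranges at a glance.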
When the optimal is not the best: parameter estimation in complex biological models
Background: The vast computational resources that became available during the
past decade enabled the development and simulation of increasingly complex
mathematical models of cancer growth. These models typically involve many free
parameters whose determination is a substantial obstacle to model development.
Direct measurement of biochemical parameters in vivo is often difficult and
sometimes impracticable, while fitting them under data-poor conditions may
result in biologically implausible values.
Results: We discuss different methodological approaches to estimate
parameters in complex biological models. We make use of the high computational
power of the Blue Gene technology to perform an extensive study of the
parameter space in a model of avascular tumor growth. We explicitly show that
the landscape of the cost function used to fit the model to the data has a
very rugged surface in parameter space. This cost function has many local
minima with unrealistic solutions, including the global minimum corresponding
to the best fit.
Conclusions: The case studied in this paper shows one example in which model
parameters that optimally fit the data are not necessarily the best ones from a
biological point of view. To avoid force-fitting a model to a dataset, we
propose that the best model parameters should be found by choosing, among
suboptimal parameters, those that match criteria other than the ones used to
fit the model. We also conclude that the model, data, and optimization approach
form a new complex system, and point to the need for a theory that addresses
this problem more generally.
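The proposed selection rule can be made concrete with a toy one-dimensional landscape (both the rugged cost function and the "plausibility" criterion are invented for illustration; the avascular tumor model is not reproduced):

```python
import numpy as np

def cost(p):
    # rugged toy landscape: many near-equivalent local minima, slight drift,
    # global minimum at p = 0
    return np.sin(5.0 * p) ** 2 + 0.01 * p

def plausibility(p):
    # hypothetical secondary criterion: closeness to a "literature value" 2.0
    return -np.abs(p - 2.0)

grid = np.linspace(0.0, 10.0, 10001)
c = cost(grid)
near_opt = grid[c <= c.min() + 0.05]     # acceptable, possibly suboptimal fits
best = float(near_opt[np.argmax(plausibility(near_opt))])
```

Here the globally optimal fit sits at p = 0, but the rule keeps every parameter value whose fit is within tolerance and then ranks those by the biological criterion, ending up near the plausible value instead of force-fitting to the global minimum.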
- …