
    High-throughput data analysis in behavior genetics

    In recent years, a growing need has arisen in different fields for computational systems that automatically analyze large amounts of data (high-throughput analysis). Nonstandard noise structure and outliers, which could have been detected and corrected in manual analysis, must now be handled by the system itself with the aid of robust methods. We discuss such problems and present insights and solutions in the context of behavior genetics, where the data consist of a time series of locations of a mouse in a circular arena. To estimate the location, velocity and acceleration of the mouse, and to identify stops, we use a nonstandard mix of robust and resistant methods: LOWESS and repeated running median. In addition, we argue that protection against small deviations from experimental protocols can be handled automatically using statistical methods. In our case, it is of biological interest to measure a rodent's distance from the arena's wall, but this measure is corrupted if the arena is not a perfect circle, as the protocol requires. The problem is addressed by robustly estimating the actual boundary of the arena and its center using a nonparametric regression quantile of the behavioral data, with the aid of a fast algorithm developed for that purpose. Comment: Published at http://dx.doi.org/10.1214/09-AOAS304 in the Annals of Applied Statistics (http://www.imstat.org/aoas/) by the Institute of Mathematical Statistics (http://www.imstat.org).
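
The repeated running median mentioned above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' pipeline: the window of 3, the iterate-until-unchanged rule (Tukey's "3R" smoother) and the speed-threshold stop rule in the hypothetical `flag_stops` helper are our own choices, and the LOWESS stage the abstract pairs with it is omitted.

```python
import math
from statistics import median


def running_median(xs, window=3):
    """One pass of a centered running median; the endpoints are copied unchanged."""
    half = window // 2
    out = list(xs)
    for i in range(half, len(xs) - half):
        out[i] = median(xs[i - half:i + half + 1])
    return out


def repeated_running_median(xs, window=3, max_iter=100):
    """Repeat the running median until the sequence stops changing ("3R" when window=3)."""
    current = list(xs)
    for _ in range(max_iter):
        smoothed = running_median(current, window)
        if smoothed == current:
            break
        current = smoothed
    return current


def flag_stops(xs, ys, dt, speed_threshold):
    """Hypothetical stop rule: flag each step whose smoothed speed is below a threshold."""
    sx = repeated_running_median(xs)
    sy = repeated_running_median(ys)
    return [
        math.hypot(sx[i] - sx[i - 1], sy[i] - sy[i - 1]) / dt < speed_threshold
        for i in range(1, len(sx))
    ]
```

For example, `repeated_running_median([1, 1, 10, 1, 1])` discards the isolated spike entirely and returns `[1, 1, 1, 1, 1]`. This resistance to single-point glitches is what makes running medians attractive for raw tracking data before any smoother is applied.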

    Combining inflation-free and iterative ensemble Kalman filters for strongly nonlinear systems

    The finite-size ensemble Kalman filter (EnKF-N) is an ensemble Kalman filter (EnKF) which, under perfect model conditions, does not require inflation because it partially accounts for ensemble sampling errors. For the Lorenz '63 and '95 toy models, it has so far been shown to perform as well as or better than the EnKF with optimally tuned inflation. The iterative ensemble Kalman filter (IEnKF) is an EnKF that has been shown to perform much better than the EnKF in strongly nonlinear conditions, such as with the Lorenz '63 and '95 models, at the cost of iteratively updating the trajectories of the ensemble members. This article aims to explore the two filters further and to combine them into an EnKF that does not require inflation under perfect model conditions and is as efficient as the IEnKF in very nonlinear conditions. In this study, EnKF-N is first introduced and a new implementation is developed. It decomposes EnKF-N into a cheap two-step algorithm that amounts to computing an optimal inflation factor. This offers a justification for the use of inflation in the traditional EnKF and explains why it is often effective. Secondly, the IEnKF is introduced following a new implementation based on the Levenberg-Marquardt optimisation algorithm. Then, the two approaches are combined to obtain the finite-size iterative ensemble Kalman filter (IEnKF-N). Several numerical experiments are performed on IEnKF-N with the Lorenz '95 model. These experiments demonstrate its numerical efficiency and a performance that matches, at least, the best of both filters. We have also selected a demanding case based on the Lorenz '63 model that points to ways to improve the finite-size ensemble Kalman filters. Ultimately, IEnKF-N could be seen as the first brick of an efficient ensemble Kalman smoother for strongly nonlinear systems.
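
To show where a multiplicative inflation factor enters an EnKF analysis (the factor that, per the abstract, EnKF-N effectively computes rather than requiring it to be tuned), here is a deliberately minimal sketch. It rests on our own simplifying assumptions: a scalar state observed directly (H = 1), a deterministic square-root update, and a fixed, user-chosen `inflation`. It is neither EnKF-N nor IEnKF-N.

```python
import math
from statistics import fmean


def inflated_sqrt_analysis(ensemble, y, r, inflation=1.0):
    """Scalar square-root EnKF analysis with multiplicative inflation.

    Assumes the state is observed directly (H = 1) with error variance r.
    """
    n = len(ensemble)
    mean = fmean(ensemble)
    # Multiplicative inflation: widen the forecast anomalies about the mean.
    anomalies = [inflation * (x - mean) for x in ensemble]
    p = sum(a * a for a in anomalies) / (n - 1)  # inflated forecast variance
    k = p / (p + r)  # Kalman gain
    mean_a = mean + k * (y - mean)  # analysis mean
    # Deterministic update: shrink anomalies so the analysis variance is (1 - k) * p.
    shrink = math.sqrt(1.0 - k)
    return [mean_a + shrink * a for a in anomalies]
```

With `inflation > 1` the forecast variance grows before the update, so the gain grows and the analysis mean is pulled harder toward the observation. Choosing that factor by hand for each experiment is the tuning burden the finite-size filter is designed to remove.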

    Joint state and parameter estimation with an iterative ensemble Kalman smoother

    Both ensemble filtering and variational data assimilation methods have proven useful in the joint estimation of state variables and parameters of geophysical models. Yet their respective benefits and drawbacks in this task are distinct. An ensemble variational method, known as the iterative ensemble Kalman smoother (IEnKS), has recently been introduced. It is based on an adjoint-free, yet flow-dependent, variational scheme. As such, the IEnKS is a candidate tool for joint state and parameter estimation that may inherit the benefits of both the ensemble filtering and variational approaches. In this study, an augmented-state IEnKS is tested on its estimation of the forcing parameter of the Lorenz-95 model. Since joint state and parameter estimation is especially useful in applications where the forcings are uncertain but nevertheless decisive, typically in atmospheric chemistry, the augmented-state IEnKS is also tested on a new low-order model that takes its meteorological part from the Lorenz-95 model and its chemical part from the advection-diffusion of a tracer. In these experiments, the IEnKS is compared to the ensemble Kalman filter, the ensemble Kalman smoother, and 4D-Var, which are considered the methods of choice for these joint estimation problems. In this low-order model context, the IEnKS is shown to significantly outperform the other methods regardless of the length of the data assimilation window, for present-time analysis as well as retrospective analysis. Moreover, the performance of the IEnKS is even more striking for parameter estimation; getting close to the same performance with 4D-Var is likely to require both a long data assimilation window and complex modeling of the background statistics.
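
The augmented-state idea can be illustrated with a much simpler filter than the IEnKS. The sketch below is our own assumption, not the article's method: it appends a scalar forcing parameter `F` to a scalar state `x` and applies a single serial ensemble square-root (EnSRF-style) update from one observation of `x` alone, so the parameter is corrected only through its sample covariance with `x`. There is no model propagation, iteration or smoothing window.

```python
import math
from statistics import fmean


def augmented_update(members, y, r):
    """Square-root update of an augmented state [x, F] from one observation of x.

    members: list of (x, F) pairs; y: observed value of x; r: observation-error variance.
    """
    n = len(members)
    xs = [m[0] for m in members]
    fs = [m[1] for m in members]
    mx, mf = fmean(xs), fmean(fs)
    ax = [x - mx for x in xs]  # state anomalies (these are also H times the anomalies)
    af = [f - mf for f in fs]  # parameter anomalies
    pxx = sum(a * a for a in ax) / (n - 1)
    pfx = sum(a * b for a, b in zip(ax, af)) / (n - 1)  # parameter/state cross-covariance
    s = pxx + r  # innovation variance
    kx, kf = pxx / s, pfx / s  # gains for the state and for the parameter
    innov = y - mx
    alpha = 1.0 / (1.0 + math.sqrt(r / s))  # square-root anomaly factor
    return [
        (mx + kx * innov + a - alpha * kx * a,
         mf + kf * innov + b - alpha * kf * a)
        for a, b in zip(ax, af)
    ]
```

In the two-member case used for checking, the `x` and `F` anomalies are perfectly correlated, so the observation shifts the mean of `F` by the same amount as the mean of `x`. With weaker correlation the parameter correction shrinks accordingly, which is why the informativeness of the forcing matters for this approach.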

    Statistical Analysis of a Telephone Call Center: A Queueing-Science Perspective

    A call center is a service network in which agents provide telephone-based services. Customers who seek these services are delayed in tele-queues. This paper summarizes an analysis of a unique record of call center operations. The data comprise a complete operational history of a small banking call center, call by call, over a full year. Taking the perspective of queueing theory, we decompose the service process into three fundamental components: arrivals, customer abandonment behavior and service durations. Each component involves different basic mathematical structures and requires a different style of statistical analysis. Some of the key empirical results are sketched, along with descriptions of the varied techniques required. Several statistical techniques are developed for analysis of the basic components. One of these is a test of whether a point process is a Poisson process. Another involves estimation of the mean function in a nonparametric regression with lognormal errors. A new graphical technique is introduced for nonparametric hazard rate estimation with censored data. Models are developed and implemented for forecasting of Poisson arrival rates. We then survey how the characteristics deduced from the statistical analyses form the building blocks for theoretically interesting and practically useful mathematical models for call center operations. Key Words: call centers, queueing theory, lognormal distribution, inhomogeneous Poisson process, censored data, human patience, prediction of Poisson rates, Khintchine-Pollaczek formula, service times, arrival rate, abandonment rate, multiserver queues.
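
The Khintchine-Pollaczek formula listed among the key words gives the mean time in queue of a single-server M/G/1 queue from the arrival rate λ and the first two moments of the service time S: E[Wq] = λ E[S²] / (2(1 − ρ)), with utilization ρ = λ E[S] < 1. The sketch below pairs it with lognormal service times, whose moments are E[S] = exp(μ + σ²/2) and E[S²] = exp(2μ + 2σ²). It is an illustration only: the call center studied here is a multiserver queue with abandonment, both of which this single-server formula ignores.

```python
import math


def lognormal_moments(mu, sigma):
    """First two raw moments of a lognormal service time S = exp(N(mu, sigma^2))."""
    m1 = math.exp(mu + sigma ** 2 / 2)
    m2 = math.exp(2 * mu + 2 * sigma ** 2)
    return m1, m2


def pk_mean_wait(lam, m1, m2):
    """Khintchine-Pollaczek mean waiting time in queue for an M/G/1 queue.

    lam: Poisson arrival rate; m1, m2: E[S] and E[S^2] of the service time.
    """
    rho = lam * m1  # server utilization
    if rho >= 1:
        raise ValueError("unstable queue: utilization must be below 1")
    return lam * m2 / (2 * (1 - rho))
```

With exponential service (E[S] = 1, E[S²] = 2) and λ = 0.5, `pk_mean_wait(0.5, 1.0, 2.0)` returns 1.0, matching the M/M/1 value ρ/(μ − λ). Because E[S²] grows like exp(2σ²), more variable lognormal service times lengthen the wait even when the mean service time is held fixed.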

    TOPAZ4: an ocean-sea ice data assimilation system for the North Atlantic and Arctic

    We present a detailed description of TOPAZ4, the latest version of TOPAZ – a coupled ocean-sea ice data assimilation system for the North Atlantic Ocean and Arctic. It is the only operational, large-scale ocean data assimilation system that uses the ensemble Kalman filter. This means that TOPAZ features a time-evolving, state-dependent estimate of the state error covariance. Based on results from the pilot MyOcean reanalysis for 2003–2008, we demonstrate that TOPAZ4 produces a realistic estimate of the ocean circulation in the North Atlantic and the sea-ice variability in the Arctic. We find that the ensemble spread for temperature and sea level remains fairly constant throughout the reanalysis, demonstrating that the data assimilation system is robust to ensemble collapse. Moreover, the ensemble spread for ice concentration is well correlated with the actual errors. This indicates that the ensemble statistics provide reliable state-dependent error estimates, a feature that is unique to ensemble-based data assimilation systems. We demonstrate that the quality of the reanalysis changes when different sea surface temperature products are assimilated, or when in situ profiles below the ice in the Arctic Ocean are assimilated. We find that data assimilation improves the match to independent observations compared to a free model run. Improvements are particularly noticeable for ice thickness, salinity in the Arctic, and temperature in the Fram Strait, but not for transport estimates or underwater temperature. At the same time, the pilot reanalysis has revealed several flaws in the system that have degraded its performance. Finally, we show that a simple bias estimation scheme can effectively detect the seasonal or constant bias in temperature and sea level.
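
The statement that ensemble spread is well correlated with actual errors can be checked with a simple spread-error diagnostic. The sketch below is a generic check we assume for illustration, not TOPAZ4 code: for each time step it takes the sample standard deviation of the ensemble as the spread and the absolute error of the ensemble mean against a reference value (in practice an independent observation) as the error, then computes the Pearson correlation of the two series.

```python
import math
from statistics import fmean, stdev


def spread_and_error(ensembles, references):
    """Per time step: ensemble spread (sample std) and absolute error of the ensemble mean."""
    spreads, errors = [], []
    for members, ref in zip(ensembles, references):
        spreads.append(stdev(members))
        errors.append(abs(fmean(members) - ref))
    return spreads, errors


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    mx, my = fmean(xs), fmean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(vx * vy)
```

A correlation near 1 means larger spread reliably signals larger error, which is the sense in which ensemble statistics serve as state-dependent error estimates. A single-member error is noisy, so in practice such a diagnostic is usually computed over many grid points or time steps.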

    Statistical Analysis of a Telephone Call Center

    A call center is a service network in which agents provide telephone-based services. Customers who seek these services are delayed in tele-queues. This article summarizes an analysis of a unique record of call center operations. The data comprise a complete operational history of a small banking call center, call by call, over a full year. Taking the perspective of queueing theory, we decompose the service process into three fundamental components: arrivals, customer patience, and service durations. Each component involves different basic mathematical structures and requires a different style of statistical analysis. Some of the key empirical results are sketched, along with descriptions of the varied techniques required. Several statistical techniques are developed for analysis of the basic components. One of these techniques is a test of whether a point process is a Poisson process. Another involves estimation of the mean function in a nonparametric regression with lognormal errors. A new graphical technique is introduced for nonparametric hazard rate estimation with censored data. Models are developed and implemented for forecasting of Poisson arrival rates. Finally, the article surveys how the characteristics deduced from the statistical analyses form the building blocks for theoretically interesting and practically useful mathematical models for call center operations.
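
A generic way to check the Poisson hypothesis, offered here as an assumption rather than as the test developed in the article, uses the fact that, conditional on the number of arrivals, the arrival times of a homogeneous Poisson process on [0, T] behave like independent uniform draws on that interval. The sketch rescales the times to [0, 1], computes the Kolmogorov-Smirnov distance D to the uniform distribution, and rejects at roughly the 5% level when √n·D exceeds about 1.36, a large-sample critical value. Since call center arrival rates vary over the day, such a check would be applied within short intervals where the rate is roughly constant.

```python
import math


def ks_uniform_statistic(times, horizon):
    """Kolmogorov-Smirnov distance between rescaled arrival times and Uniform(0, 1)."""
    if not times:
        raise ValueError("need at least one arrival")
    u = sorted(t / horizon for t in times)
    n = len(u)
    d_plus = max((i + 1) / n - x for i, x in enumerate(u))  # empirical CDF above uniform
    d_minus = max(x - i / n for i, x in enumerate(u))  # empirical CDF below uniform
    return max(d_plus, d_minus)


def looks_poisson(times, horizon, crit=1.36):
    """Approximate 5%-level check; 1.36 is the asymptotic critical value of sqrt(n) * D."""
    n = len(times)
    return math.sqrt(n) * ks_uniform_statistic(times, horizon) <= crit
```

Evenly spread arrivals pass easily, while twenty arrivals crowded into the first tenth of the interval give √n·D of about 4 and are rejected. For small samples, exact Kolmogorov-Smirnov quantiles are preferable to the asymptotic constant.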

    Extending the square root method to account for additive forecast noise in ensemble methods

    A square root approach is considered for the problem of accounting for model noise in the forecast step of the ensemble Kalman filter (EnKF) and related algorithms. The primary aim is to replace the method of simulated, pseudo-random additive noise so as to eliminate the associated sampling errors. The core method is based on the analysis step of ensemble square root filters and consists of the deterministic computation of a transform matrix. The theoretical advantages regarding dynamical consistency are surveyed; they apply equally well to the square root method in the analysis step. A fundamental problem due to the limited size of the ensemble subspace is discussed, and novel solutions that complement the core method are suggested and studied. Benchmarks from twin experiments with simple, low-order dynamics indicate improved performance over standard approaches such as simulated additive noise and multiplicative inflation.
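
The contrast between simulated and deterministic noise treatment is easiest to see in one dimension. The sketch below is a simplified illustration of our own, not the article's method: it accounts for a model-noise variance `q` by rescaling the forecast anomalies so that the sample variance becomes exactly P + q, whereas adding pseudo-random noise to each member would match that variance only on average. In one dimension the transform reduces to a scalar factor and therefore coincides with multiplicative inflation; in higher dimensions it becomes a transform matrix acting on the anomalies, and noise components lying outside the span of the anomalies cannot be represented by such a transform, which is one way to read the ensemble-subspace limitation noted above.

```python
import math
from statistics import fmean, variance


def sqrt_additive_noise(ensemble, q):
    """Scalar illustration: rescale anomalies so the sample variance grows by exactly q.

    Replaces adding pseudo-random noise of variance q to each member.
    """
    mean = fmean(ensemble)
    p = variance(ensemble, mean)  # forecast sample variance
    if p == 0:
        raise ValueError("zero-spread ensemble: the noise has no subspace to act in")
    scale = math.sqrt((p + q) / p)
    # The mean is untouched; only the anomalies are stretched.
    return [mean + scale * (x - mean) for x in ensemble]
```

Because no random numbers are drawn, repeated runs give identical ensembles and the mean is never perturbed, which is the sampling-error advantage the square root approach targets.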

    Dynamics of large-amplitude geostrophic flows over bottom topography

    We examine the interaction of near-surface and near-bottom flows over bottom topography. A set of asymptotic equations for geostrophic currents in a three-layer fluid is derived. The depths of the active (top and bottom) layers are assumed to be small, the bottom slope weak, and the interfacial displacement comparable to the depths of the thinner layers. Using the derived equations, we examine the stability of parallel flows and circular eddies. It is demonstrated that eddies with a non-zero near-surface component are always unstable, whereas eddies localized in the near-bottom layer may be stable, subject to additional restrictions on their horizontal profiles and the bottom topography.