41 research outputs found

    Parsimonious Time Series Clustering

    Full text link
    We introduce a parsimonious model-based framework for clustering time course data. In these applications the computational burden often becomes an issue due to the number of available observations. The measured time series can also be very noisy and sparse, and a suitable model describing them can be hard to define. We propose to model the observed measurements using P-spline smoothers and to cluster the functional objects as summarized by the optimal spline coefficients. In principle, this idea can be adopted within all the most common clustering frameworks. In this work we discuss applications based on a k-means algorithm. We evaluate the accuracy and the efficiency of our proposal by simulations and by analyzing Drosophila melanogaster gene expression data.
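
    The smooth-then-cluster idea can be sketched in a few lines: fit each series with a P-spline (a B-spline basis on equally spaced knots plus a difference penalty on the coefficients) and feed the resulting coefficient vectors to k-means. This is a minimal illustration under our own assumptions (fixed penalty, equally spaced knots, function names ours), not the authors' implementation.

```python
import numpy as np
from scipy.interpolate import BSpline
from scipy.cluster.vq import kmeans2

def bspline_basis(x, nseg=10, deg=3):
    """B-spline design matrix on equally spaced knots (P-spline convention)."""
    xl, xr = x.min(), x.max()
    dx = (xr - xl) / nseg
    knots = np.linspace(xl - deg * dx, xr + deg * dx, nseg + 2 * deg + 1)
    n = nseg + deg                     # number of basis functions
    B = np.zeros((len(x), n))
    for j in range(n):
        c = np.zeros(n); c[j] = 1.0
        B[:, j] = np.nan_to_num(BSpline(knots, c, deg, extrapolate=False)(x))
    return B

def pspline_coefs(x, y, nseg=10, deg=3, lam=1.0):
    """Penalized least-squares coefficients with a 2nd-order difference penalty."""
    B = bspline_basis(x, nseg, deg)
    D = np.diff(np.eye(B.shape[1]), n=2, axis=0)
    return np.linalg.solve(B.T @ B + lam * D.T @ D, B.T @ y)

def cluster_curves(x, Y, k, nseg=10, deg=3, lam=1.0, seed=0):
    """Summarize each row of Y by its spline coefficients, then run k-means."""
    A = np.vstack([pspline_coefs(x, y, nseg, deg, lam) for y in Y])
    _, labels = kmeans2(A, k, minit='++', seed=seed)
    return labels
```

    Working in coefficient space is what makes the approach parsimonious: each (possibly long, noisy) series is reduced to a short coefficient vector before any distance computation.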

    Global media as an early warning tool for food fraud; an assessment of MedISys-FF

    Get PDF
    Food fraud is a serious problem that may compromise the safety of the food products being sold on the market. Previous studies have shown that food fraud is associated with a large variety of food products, and the fraud type may vary from deliberate alteration of the food product (e.g. substitution, tampering, dilution) to the manipulation of documents. It is therefore important that all actors within the food supply chain (food producers, authorities) have methodologies and tools available to detect fraudulent products at an early stage so that preventative measures can be taken. Several such systems exist (e.g. iRASFF, EMA, HorizonScan, AAC-FF, MedISys-FF), but currently only MedISys-FF is publicly available online. In this study, we analyzed food fraud cases collected by MedISys-FF over a 6-year period (2015–2020) and show global trends and developments in food fraud activities. In the period investigated, the system collected 4375 articles on food fraud incidents from 164 countries in 41 different languages. Fraud with meat and meat products was most frequently reported (27.7%), followed by milk and milk products (10.5%), cereal and bakery products (8.3%), and fish and fish products (7.7%). Most of the fraud was related to expiration date (58.3%), followed by tampering (22.2%) and mislabeling of country of origin (11.4%). Network analysis showed that the focus of the articles was on the food products subject to fraud. The validity of MedISys-FF as an early warning system was demonstrated with COVID-19: the system collected articles discussing potential food fraud risks due to the COVID-19 crisis. We therefore conclude that MedISys-FF is a very useful tool to detect early trends in food fraud and may be used by all actors in the food system to ensure safe, healthy, and authentic food.

    Composite smooth estimation of the state price density implied in option prices

    Full text link
    We propose a new semi-parametric approach for the estimation of the State Price Density (SPD) implied in option prices. Our procedure is inspired by a Penalized Composite Link Model (PCLM) approach and ensures smooth and arbitrage-free estimates.

    Splines, differential equations and optimal smoothing

    Get PDF
    In many scientific areas it is of primary interest to describe the dynamics of a system, that is, how it evolves over time and/or space. In the simple one-dimensional case the state of a system at any time can be represented by a function u(t) whose values track the evolution of a given phenomenon over time. It is also possible to consider phenomena evolving in time and space by using a function u(t,x) depending on two independent variables. Thus, knowing t and/or x, it is possible to evaluate the state of the system at a given point in space and/or time. One way to obtain u(.) is to take measurements at different values of the independent variable(s) and fit the data in order to estimate a formula for u. This is the point of view exploited in statistical data analysis, where overparametric regression (smoothing) techniques are usually applied. On the other hand, it is clear that such a model tells us how the system evolves but cannot clarify why the system behaves as observed. Therefore we try to formulate mathematical models summarizing the understanding we seek. Often these models are dynamic equations that relate the state function to one or more of its derivatives with respect to the independent variable(s). Such equations are called differential equations (DEs) and are common analytic tools in physics and engineering. The first approach has the advantage of making the description of an observed phenomenon very flexible, being able to exploit all the information provided by the observed measurements; this becomes clear if we consider the applicability of overparametric smoothing techniques. However, these approaches do not allow for a physical interpretation of the observed dynamics. On the other hand, the main advantage of the differential modeling point of view is to highlight the physical determinants of a given phenomenon, while completely ignoring what has been observed.
The aim of this work is to present a flexible way to combine the statistical and the dynamic modeling points of view. To reach this goal we combine, in a convenient way, the flexible data description provided by semi-parametric regression analysis and the physical interpretability of dynamics summarized by differential equations.
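
    The general idea of combining smoothing with a differential model can be illustrated by a toy discrete example (our own sketch, not the thesis's actual formulation): fit a curve to data while penalizing its deviation from a simple ODE, here u'(t) = theta * u(t). The penalty weight lam trades data fidelity against fidelity to the dynamic model.

```python
import numpy as np

def ode_penalized_smooth(y, h, theta, lam):
    """Fit u to data y on a grid with spacing h, penalizing deviation
    from the differential model u'(t) = theta * u(t)."""
    n = len(y)
    # forward-difference approximation of u' at the first n-1 grid points
    D = (np.eye(n - 1, n, k=1) - np.eye(n - 1, n)) / h
    A = np.eye(n - 1, n)        # evaluates u at the same grid points
    P = D - theta * A           # P @ u ≈ u' - theta*u, zero when the ODE holds
    # minimize ||y - u||^2 + lam * ||P u||^2  (closed-form ridge solution)
    return np.linalg.solve(np.eye(n) + lam * (P.T @ P), y)
```

    For large lam the fit is pushed toward the solution manifold of the ODE (here, exponentials), while small lam recovers a nearly interpolating smoother; this is the statistical/dynamic trade-off described above.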

    Bayesian inference in an extended SEIR model with nonparametric disease transmission rate: an application to the Ebola epidemic in Sierra Leone

    Full text link
    The 2014 Ebola outbreak in Sierra Leone is analyzed using an extension of the SEIR compartmental model. The unknown parameters of the system of differential equations are estimated by combining data on the number of new (laboratory confirmed) Ebola cases reported by the Ministry of Health and prior distributions for the transition rates elicited using information collected by the WHO Response Team (2014) during the follow-up of specific Ebola cases. The evolution over time of the disease transmission rate is modeled nonparametrically using penalized B-splines. Our framework represents a valuable and robust stochastic tool for the study of epidemic dynamics from irregular and possibly aggregated case data. Simulations and the analysis of the 2014 Sierra Leone Ebola data highlight the merits of the proposed methodology. IAP research network P7/06 (StUDyS).

    Bayesian inference in an extended SEIR model with nonparametric disease transmission rate: an application to the Ebola epidemic in Sierra Leone

    No full text
    The 2014 Ebola outbreak in Sierra Leone is analyzed using a susceptible-exposed-infectious-removed (SEIR) epidemic compartmental model. The discrete-time stochastic model for the epidemic evolution is coupled to a set of ordinary differential equations describing the dynamics of the expected proportions of subjects in each epidemic state. The unknown parameters are estimated in a Bayesian framework by combining data on the number of new (laboratory confirmed) Ebola cases reported by the Ministry of Health and prior distributions for the transition rates elicited using information collected by the WHO during the follow-up of specific Ebola cases. The time-varying disease transmission rate is modeled in a flexible way using penalized B-splines. Our framework represents a valuable stochastic tool for the study of epidemic dynamics even when only irregularly observed and possibly aggregated data are available. Simulations and the analysis of the 2014 Sierra Leone Ebola data highlight the merits of the proposed methodology. In particular, the flexible modeling of the disease transmission rate makes the estimation of the effective reproduction number robust to the misspecification of the initial epidemic states and to underreporting of the infectious cases.
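
    The deterministic forward model underlying this kind of analysis is easy to sketch. The snippet below is only the Euler-discretized SEIR dynamics for the expected proportions with a time-varying transmission rate beta(t) supplied as an array (in the paper, beta(t) is a penalized B-spline and inference is Bayesian; none of that is shown here, and all names are ours).

```python
import numpy as np

def seir_path(beta_t, sigma, gamma, s0, e0, i0, dt=1.0):
    """Euler-discretized SEIR proportions with time-varying transmission rate.

    beta_t : array of transmission rates, one per time step
    sigma  : rate of leaving the exposed (latent) state
    gamma  : removal (recovery/death) rate
    """
    S, E, I, R = [s0], [e0], [i0], [1.0 - s0 - e0 - i0]
    for b in beta_t:
        s, e, i, r = S[-1], E[-1], I[-1], R[-1]
        new_e = b * s * i * dt      # new infections (S -> E)
        new_i = sigma * e * dt      # onset of infectiousness (E -> I)
        new_r = gamma * i * dt      # removals (I -> R)
        S.append(s - new_e)
        E.append(e + new_e - new_i)
        I.append(i + new_i - new_r)
        R.append(r + new_r)
    return np.array(S), np.array(E), np.array(I), np.array(R)
```

    Replacing the constant-beta assumption with a flexible beta_t array is exactly the hook that a spline-based transmission rate plugs into.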

    L-surface and V-valley for optimal anisotropic 2D smoothing

    Full text link
    We present the L-surface as an attractive generalization of the L-curve framework for the selection of optimal smoothing parameters in two-dimensional applications. It preserves the desirable features of its one-dimensional analogue. The optimal amount of smoothing is indicated by the pair of parameters located at the point of maximum (Gaussian) curvature. Locating this point on a discrete parametric surface can be far from straightforward. We introduce the V-valley as a simplified selection criterion based on distance minimization.

    An innovative procedure for smoothing parameter selection

    Full text link
    Smoothing with penalized splines calls for an automatic method to select the size of the penalty parameter λ. We propose a less well-known smoothing parameter selection procedure: the L-curve method. AIC and (generalized) cross-validation are the most common choices for this kind of problem, even though they indicate light smoothing when the data represent a smooth trend plus correlated noise. In those cases the L-curve is a computationally efficient and robust alternative.
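
    The mechanics of the L-curve are simple to sketch: for a grid of λ values, plot log residual norm against log roughness norm and pick the λ at the corner (maximum curvature). The sketch below uses a discrete Whittaker-type smoother rather than penalized splines, and finite-difference curvature; it illustrates the criterion, not the authors' exact implementation.

```python
import numpy as np

def lcurve_lambda(y, lambdas):
    """Select the penalty of a discrete (Whittaker-type) smoother at the
    point of maximum curvature of the L-curve."""
    n = len(y)
    D = np.diff(np.eye(n), n=2, axis=0)      # 2nd-order difference penalty
    psi, phi = [], []
    for lam in lambdas:
        z = np.linalg.solve(np.eye(n) + lam * D.T @ D, y)
        psi.append(np.log(np.sum((y - z) ** 2)))   # fidelity axis
        phi.append(np.log(np.sum((D @ z) ** 2)))   # roughness axis
    x, w = np.array(psi), np.array(phi)
    t = np.log(lambdas)
    # curvature of the parametric curve (x(t), w(t)) via finite differences
    x1, w1 = np.gradient(x, t), np.gradient(w, t)
    x2, w2 = np.gradient(x1, t), np.gradient(w1, t)
    kappa = (x1 * w2 - w1 * x2) / (x1 ** 2 + w1 ** 2) ** 1.5
    i = 1 + np.argmax(kappa[1:-1])           # ignore one-sided endpoint estimates
    return lambdas[i]
```

    Note the low cost: each grid point needs only one linear solve, with no refitting under data deletion as in cross-validation, which is what makes the L-curve attractive for long, serially correlated series.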

    Smooth deconvolution of low-field NMR signals

    No full text
    Background: Low-resolution nuclear magnetic resonance (LR-NMR) is a common technique to identify the constituents of complex materials (such as food and biological samples). The output of LR-NMR experiments is a relaxation signal which can be modelled as a type of convolution of an unknown density of relaxation times with decaying exponential functions, plus random Gaussian noise. The challenge is to estimate that density, a severely ill-posed problem. A complication is that non-negativity constraints need to be imposed in order to obtain valid results. Significance and novelty: We present a smooth deconvolution model for solution of the inverse estimation problem in LR-NMR relaxometry experiments. We model the logarithm of the relaxation time density as a smooth function using (adaptive) P-splines while matching the expected residual magnetisations with the observed ones. The roughness penalty removes the singularity of the deconvolution problem, and the estimated density is positive by design (since we model its logarithm). The model is non-linear, but it can be linearized easily. The penalty has to be tuned for each given sample. We describe an efficient EM-type algorithm to optimize the smoothing parameter(s). Results: We analyze a set of food samples (potato tubers). The relaxation spectra extracted using our method are similar to the ones described in previous experiments but present sharper peaks. Using penalized signal regression we are able to accurately predict the dry matter content of the samples using the estimated spectra as covariates.
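
    A stripped-down version of this model class can be written directly: the signal is s(t) ≈ Σ_j exp(-t/T_j) p_j, the density is kept positive by estimating a = log p on a grid of relaxation times, and a second-order difference penalty regularizes a. The sketch below uses a plain grid with a fixed penalty and damped Gauss-Newton steps instead of adaptive P-splines and the EM-type tuning of the paper; all names and settings are our own assumptions.

```python
import numpy as np

def deconvolve_lr_nmr(t, s, T, lam=1.0, iters=30):
    """Estimate a relaxation-time density p = exp(a) on grid T from a signal
    s(t) ≈ sum_j exp(-t / T_j) * p_j, penalizing roughness of a = log p."""
    C = np.exp(-np.outer(t, 1.0 / T))       # exponential "convolution" design
    m = len(T)
    D = np.diff(np.eye(m), n=2, axis=0)
    a = np.full(m, np.log(1.0 / m))         # flat, positive starting density

    def obj(a):
        r = s - C @ np.exp(a)
        return r @ r + lam * np.sum((D @ a) ** 2)

    for _ in range(iters):
        p = np.exp(a)
        r = s - C @ p
        J = C * p                            # Jacobian of the model w.r.t. a
        g = J.T @ r - lam * (D.T @ (D @ a))  # Gauss-Newton right-hand side
        H = J.T @ J + lam * (D.T @ D)
        step = np.linalg.solve(H, g)
        # step halving keeps the objective monotonically non-increasing
        f0, alpha = obj(a), 1.0
        while obj(a + alpha * step) > f0 and alpha > 1e-8:
            alpha *= 0.5
        if obj(a + alpha * step) > f0:
            break
        a = a + alpha * step
    return np.exp(a)
```

    Because the density is the exponential of the estimated function, positivity never needs to be enforced as a constraint, which is the key design point of the log parametrization described above.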

    Smoothing parameter selection using the L-curve

    Full text link
    The L-curve method has been used to select the penalty parameter in ridge regression. We show that it is also very attractive for smoothing because of its low computational load. Surprisingly, it is also almost insensitive to serial correlation.