    Spline regression for zero-inflated models

    We propose a regression model for count data when the classical generalized linear model approach is too rigid due to a large number of zero counts and a nonlinear influence of continuous covariates. Zero-inflation is used to account for the excess zeros, with separate link functions for the zero and the nonzero component. Nonlinearity in covariates is captured by spline functions based on B-splines. Our algorithm relies on maximum-likelihood estimation and allows for adaptive box-constrained knots, which improve the goodness of the spline fit and allow for the detection of sensitivity changepoints. A simulation study substantiates the numerical stability of the algorithm used to infer such models. The AIC criterion is shown to serve well for model selection, in particular when nonlinearities are weak and BIC tends to select overly simplistic models. We fit the proposed models to real data on children's dental health, linking caries counts with the Body Mass Index (BMI) and other socioeconomic factors. This reveals a puzzling nonmonotonic influence of BMI on caries counts which has yet to be explained by clinical experts.
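
    A minimal sketch of the kind of model described above, assuming hypothetical column names (caries_count, bmi, ses) and data file: a zero-inflated Poisson regression fitted by maximum likelihood, with a B-spline basis for the BMI effect. The fixed bs() knots stand in for the paper's adaptive box-constrained knots, which are not reproduced here.

        # Zero-inflated Poisson with a B-spline term for BMI (illustrative sketch)
        import pandas as pd
        import patsy
        import statsmodels.api as sm
        from statsmodels.discrete.count_model import ZeroInflatedPoisson

        df = pd.read_csv("caries.csv")  # hypothetical dataset

        # B-spline design matrix for the nonlinear BMI effect plus a linear socioeconomic term
        X = patsy.dmatrix("bs(bmi, df=5, degree=3) + ses", df, return_type="dataframe")
        X_infl = sm.add_constant(df[["ses"]])  # separate (logit) link for the zero component

        fit = ZeroInflatedPoisson(df["caries_count"], X, exog_infl=X_infl).fit(maxiter=200)
        print(fit.summary())
        print("AIC:", fit.aic)  # AIC used for model selection, as in the abstract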

    Broadband near-infrared astronomical spectrometer calibration and on-sky validation with an electro-optic laser frequency comb

    The quest for extrasolar planets and their characterisation, as well as studies of fundamental physics on cosmological scales, rely on the capabilities of high-resolution astronomical spectroscopy. A central requirement is a precise wavelength calibration of astronomical spectrographs that allows subtle wavelength shifts to be extracted from the spectra of stars and quasars. Here, we present an all-fibre, 400 nm wide near-infrared frequency comb based on electro-optic modulation with 14.5 GHz comb line spacing. Tests on the high-resolution, near-infrared spectrometer GIANO-B show a photon-noise limited calibration precision of <10 cm/s, as required for Earth-like planet detection. Moreover, the presented comb provides detailed insight into particularities of the spectrograph, such as detector inhomogeneities and differential spectrograph drifts. The system is validated in on-sky observations of a radial-velocity standard star (HD221354) and of telluric atmospheric absorption features. The advantages of the system include simplicity, robustness and turn-key operation, features that are valuable at the observation sites.
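
    A back-of-the-envelope sketch of the two figures quoted above, under an assumed reference wavelength of 1.55 um: the 14.5 GHz line spacing expressed as a wavelength interval, and the wavelength shift corresponding to a 10 cm/s radial-velocity signal via dlambda/lambda = v/c.

        # Illustrative conversion of comb spacing and RV precision to wavelength units
        c = 299_792_458.0        # speed of light, m/s
        lam = 1.55e-6            # assumed near-infrared reference wavelength, m

        f_rep = 14.5e9           # comb line spacing, Hz
        nu = c / lam             # optical frequency at the reference wavelength
        dlam_comb = lam * f_rep / nu
        print(f"comb line spacing at {lam * 1e9:.0f} nm: {dlam_comb * 1e12:.1f} pm")

        v = 0.10                 # 10 cm/s radial-velocity precision, m/s
        dlam_rv = lam * v / c    # corresponding wavelength shift
        print(f"10 cm/s corresponds to {dlam_rv * 1e15:.2f} fm at {lam * 1e9:.0f} nm")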

    Spatial interpolation of high-frequency monitoring data

    Climate modelers generally require meteorological information on regular grids, but monitoring stations are, in practice, sited irregularly. Thus, there is a need to produce public data records that interpolate available data to a high-density grid, which can then be used to generate meteorological maps at a broad range of spatial and temporal scales. In addition to point predictions, quantifications of uncertainty are also needed. One way to accomplish this is to provide multiple simulations of the relevant meteorological quantities conditional on the observed data, taking into account the various uncertainties in predicting a space-time process at locations with no monitoring data. Using a high-quality dataset of minute-by-minute measurements of atmospheric pressure in north-central Oklahoma, this work describes a statistical approach to carrying out these conditional simulations. Based on observations at 11 stations, conditional simulations were produced at two other sites with monitoring stations. The resulting point predictions are very accurate, and the multiple simulations produce well-calibrated prediction uncertainties for temporal changes in atmospheric pressure but are substantially overconservative for the uncertainties in the predictions of (undifferenced) pressure. Published at http://dx.doi.org/10.1214/08-AOAS208 in the Annals of Applied Statistics (http://www.imstat.org/aoas/) by the Institute of Mathematical Statistics (http://www.imstat.org).
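
    A minimal sketch of conditional simulation from a Gaussian process, the generic machinery behind producing both point predictions and multiple simulations at unmonitored sites. The exponential covariance, the station layout, and the stand-in observations are illustrative assumptions, not the paper's fitted space-time model.

        # Conditional (kriging-based) simulation at unmonitored sites, illustrative only
        import numpy as np

        rng = np.random.default_rng(0)

        def exp_cov(a, b, sill=1.0, range_km=50.0):
            d = np.linalg.norm(a[:, None, :] - b[None, :, :], axis=-1)
            return sill * np.exp(-d / range_km)

        obs_xy = rng.uniform(0, 100, size=(11, 2))   # 11 monitoring stations
        new_xy = rng.uniform(0, 100, size=(2, 2))    # 2 held-out prediction sites
        z_obs = rng.standard_normal(11)              # stand-in for pressure anomalies

        K_oo = exp_cov(obs_xy, obs_xy) + 1e-6 * np.eye(11)  # small nugget for stability
        K_no = exp_cov(new_xy, obs_xy)
        K_nn = exp_cov(new_xy, new_xy)

        w = np.linalg.solve(K_oo, K_no.T)            # kriging weights
        mean = w.T @ z_obs                           # conditional mean (point prediction)
        cov = K_nn - K_no @ w                        # conditional covariance
        sims = rng.multivariate_normal(mean, cov, size=500)  # multiple conditional simulations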

    Parameter estimation of stochastic differential equation

    Non-parametric modeling is a data-driven method, motivated by smoothness properties, for estimating a function; it comprises spline and non-spline approaches, and the spline approach in turn consists of regression splines and smoothing splines. A regression spline with a Bayesian approach is considered in the first step of a two-step method for estimating the structural parameters of a stochastic differential equation (SDE). The knots and the order of the spline can be selected heuristically from a scatter plot. To overcome this subjective and tedious process of selecting the optimal knot and spline order, an algorithm is proposed: a single optimal knot is selected from among all data points, excluding the first and the last, as the one giving the smallest value of the Generalized Cross Validation (GCV) criterion for each spline order. The method is illustrated using observed opening share prices of Petronas Gas Bhd. The results show that the Mean Square Errors (MSE) for the stochastic model with parameters estimated using the optimal knot, over 1,000, 5,000 and 10,000 runs of Brownian motion, are smaller than those for the SDE models with parameters estimated using a heuristically selected knot. This verifies the viability of the two-step method for estimating the drift and diffusion parameters of an SDE, with the improvement of a single-knot selection.
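
    A minimal sketch of the single-knot search described above: for every candidate knot (each data point except the first and the last), fit a truncated-power regression spline by least squares and keep the knot with the smallest GCV. The data are simulated; the Bayesian step and the Petronas Gas Bhd prices are not reproduced here.

        # GCV-based selection of a single regression-spline knot (illustrative sketch)
        import numpy as np

        def gcv_for_knot(t, y, knot, order=3):
            # truncated power basis: 1, t, ..., t^order, (t - knot)_+^order
            X = np.column_stack([t ** p for p in range(order + 1)]
                                + [np.clip(t - knot, 0, None) ** order])
            H = X @ np.linalg.pinv(X)                # hat matrix
            resid = y - H @ y
            n, tr = len(y), np.trace(H)
            return (np.sum(resid ** 2) / n) / (1 - tr / n) ** 2

        rng = np.random.default_rng(1)
        t = np.linspace(0.0, 1.0, 200)
        y = np.sin(6 * t) + 0.1 * rng.standard_normal(200)

        candidates = t[1:-1]                         # every point except the first and last
        best_knot = min(candidates, key=lambda k: gcv_for_knot(t, y, k))
        print("optimal knot:", best_knot)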

    DYNAMIC RELATIONSHIPS: E-COMMERCE SALES AND KEY EXOGENOUS VARIABLES IN THE PHILIPPINES

    This study delves into the complex and evolving landscape of e-commerce in the Philippines, focusing on the relationship between E-Commerce Sales as the endogenous variable and a set of influential exogenous variables: Digital Marketing Spending, GDP Growth, Internet Penetration, and Mobile Phone Ownership. The research employs a flexible spline modeling approach, uncovers non-linear associations, and offers implications for both academic understanding and practical application. The findings underscore the growing impact of Digital Marketing Spending on E-Commerce Sales, revealing the central role of online advertising and promotional strategies in the digital marketplace. Moreover, the study traces the intricate interplay between GDP Growth, Internet Penetration, Mobile Phone Ownership, and E-Commerce Sales, highlighting the non-linear nature of these relationships. As the Philippines continues its economic expansion and technological integration, these associations carry useful implications for policymakers, businesses, and e-commerce stakeholders.
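
    A minimal sketch of a flexible spline regression in the spirit of the study, assuming hypothetical column names and data file: E-Commerce Sales regressed on B-spline terms of each exogenous variable so that non-linear associations can emerge.

        # Spline regression of e-commerce sales on exogenous drivers (illustrative sketch)
        import pandas as pd
        import statsmodels.formula.api as smf

        df = pd.read_csv("ecommerce_ph.csv")  # hypothetical time series

        # bs() B-spline terms (via patsy) capture the non-linear effects
        formula = ("ecommerce_sales ~ bs(digital_marketing_spend, df=4)"
                   " + bs(gdp_growth, df=4) + bs(internet_penetration, df=4)"
                   " + bs(mobile_phone_ownership, df=4)")
        fit = smf.ols(formula, data=df).fit()
        print(fit.summary())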

    Effective strategies for segmenting data into coherent subsets

    Automatic segmentation of data into coherent subsets is important in applications as varied as signal processing, bioinformatics and pharmacology. Under this general framework, we investigate the problem of data-driven reconstruction of an unknown, piecewise-constant density function and propose two methods to solve it; the first is directly inspired by the segmentation approach, whereas the second uses a maximum likelihood approach. Motivated by a problem in pharmacometrics, we then introduce a segmentation algorithm which fits into the same general framework and is used to automatically bin data for model assessment purposes.
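
    A minimal sketch of one generic instance of this framework: optimal partitioning of a sequence into piecewise-constant segments by penalized least squares. The penalty value and the simulated data are illustrative; the paper's two density estimators and the pharmacometric binning algorithm are not reproduced here.

        # Optimal partitioning into piecewise-constant segments (penalized SSE), illustrative
        import numpy as np

        def segment(y, penalty=5.0):
            n = len(y)
            cs = np.concatenate(([0.0], np.cumsum(y)))
            cs2 = np.concatenate(([0.0], np.cumsum(y ** 2)))

            def seg_cost(i, j):  # squared error of fitting a constant to y[i:j]
                s, s2, m = cs[j] - cs[i], cs2[j] - cs2[i], j - i
                return s2 - s * s / m

            F = np.full(n + 1, np.inf)
            F[0] = -penalty
            last = np.zeros(n + 1, dtype=int)
            for j in range(1, n + 1):
                vals = [F[i] + seg_cost(i, j) + penalty for i in range(j)]
                i_best = int(np.argmin(vals))
                F[j], last[j] = vals[i_best], i_best

            cps, j = [], n                # backtrack the changepoint positions
            while j > 0:
                j = last[j]
                if j > 0:
                    cps.append(j)
            return sorted(cps)

        rng = np.random.default_rng(2)
        y = np.concatenate([np.full(50, 0.0), np.full(50, 3.0)]) + 0.3 * rng.standard_normal(100)
        print(segment(y))                 # expected to recover a changepoint near index 50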

    Bayesian inference of virus evolutionary models from next-generation sequencing data

    There is a rich tradition in mathematical biology of modeling virus population dynamics within hosts. Such models can reproduce trends in the progression of viral infections such as HIV-1, and have also generated insights into the emergence of drug resistance and into treatment strategies. Existing mathematical work has focused on the problem of predicting dynamics given model parameters; the inverse problem of estimating model parameters from observed data has received little attention. One likely reason is the historical difficulty of obtaining high-resolution samples of virus diversity within hosts. Next-generation sequencing (NGS) approaches developed in the past decade can now supply such data. This thesis presents two Bayesian methods that harness classical models to generate testable hypotheses from NGS datasets. The quasispecies equilibrium explains genetic variation in virus populations as a balance between mutation and selection. We use this model to infer the fitness effects of individual mutations and of pairs of interacting mutations. Although our method provides a high-resolution and accurate picture of the fitness landscape when equilibrium holds, we demonstrate that the common observation of populations with coexisting, divergent viruses is unlikely to be consistent with equilibrium. Our second statistical method estimates virus growth rates and the binding affinity between viruses and antibodies using the generalized Lotka-Volterra model. Immune responses can explain the coexistence of abundant virus variants and their trajectories through time. Additionally, we can draw inferences about immune escape and about the antibody genetic variants responsible for improved virus recognition.
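
    A minimal sketch of the mutation-selection (quasispecies) equilibrium underlying the first method: the equilibrium haplotype frequencies are proportional to the leading eigenvector of the fitness-weighted mutation matrix. The number of sites, the mutation rate, and the fitness values are illustrative placeholders, not quantities inferred from NGS data.

        # Quasispecies equilibrium as the leading eigenvector of Q * f (illustrative sketch)
        import numpy as np

        L, mu = 3, 1e-3                                        # 3 biallelic sites, per-site mutation rate
        haps = np.array([[(i >> k) & 1 for k in range(L)] for i in range(2 ** L)])
        dist = (haps[:, None, :] != haps[None, :, :]).sum(-1)  # pairwise Hamming distances
        Q = (mu ** dist) * ((1 - mu) ** (L - dist))            # mutation kernel: Q[i, j] = P(j -> i)
        f = 1.0 - 0.05 * haps.sum(1)                           # illustrative fitness per haplotype

        W = Q * f[None, :]                                     # W[i, j] = Q[i, j] * f[j]
        vals, vecs = np.linalg.eig(W)
        lead = np.real(vecs[:, np.argmax(np.real(vals))])
        freqs = np.abs(lead) / np.abs(lead).sum()              # equilibrium haplotype frequencies
        print(freqs.round(4))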