11 research outputs found
Spline regression for zero-inflated models
We propose a regression model for count data when the classical generalized
linear model approach is too rigid due to a high proportion of zero counts and a
nonlinear influence of continuous covariates. Zero inflation is used to account
for the presence of excess zeros, with separate link functions for the
zero and the nonzero component. Nonlinearity in covariates is captured by
spline functions based on B-splines. Our algorithm relies on maximum-likelihood
estimation and allows for adaptive box-constrained knots, thus improving the
goodness of the spline fit and allowing for detection of sensitivity
changepoints. A simulation study substantiates the numerical stability of the
algorithm to infer such models. The AIC is shown to serve well for model
selection, in particular when nonlinearities are weak and BIC tends to favor
overly simplistic models. We fit the introduced models to real data on
children's dental health, linking caries counts with the body mass index (BMI)
and socioeconomic factors. This reveals a puzzling nonmonotonic influence of
BMI on caries counts which is yet to be explained by clinical experts.
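As a rough illustration of the zero-inflated likelihood that the abstract above refers to, the following is a minimal sketch, assuming a Poisson count component and constant (intercept-only) parameters. The abstract does not state the count distribution, and the actual model ties both components to covariates through link functions and B-splines, which this sketch leaves out. The names `zip_log_likelihood`, `pi_zero` and `lam` are hypothetical.

```python
import math


def zip_log_likelihood(counts, pi_zero, lam):
    """Log-likelihood of a zero-inflated Poisson sample (illustrative assumption).

    pi_zero: probability of a structural (excess) zero, 0 <= pi_zero < 1
    lam:     mean of the Poisson count component, lam > 0
    """
    total = 0.0
    for y in counts:
        if y == 0:
            # A zero is either structural or a Poisson draw that happened to be 0.
            total += math.log(pi_zero + (1.0 - pi_zero) * math.exp(-lam))
        else:
            # Positive counts can only come from the Poisson component.
            total += (math.log(1.0 - pi_zero) - lam + y * math.log(lam)
                      - math.lgamma(y + 1))
    return total


# Toy data with many zeros (made up for illustration, not from the study).
counts = [0] * 8 + [3, 4]
ll_poisson = zip_log_likelihood(counts, 0.0, 0.7)  # plain Poisson at the sample mean
ll_zip = zip_log_likelihood(counts, 0.7, 3.5)      # hand-picked zero-inflated parameters
```

With `pi_zero = 0` the function reduces to the ordinary Poisson log-likelihood, so comparing the two values shows how much the extra zero component improves the fit on zero-heavy data.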
Broadband near-infrared astronomical spectrometer calibration and on-sky validation with an electro-optic laser frequency comb
The quest for extrasolar planets and their characterisation as well as
studies of fundamental physics on cosmological scales rely on capabilities of
high-resolution astronomical spectroscopy. A central requirement is a precise
wavelength calibration of astronomical spectrographs allowing for extraction of
subtle wavelength shifts from the spectra of stars and quasars. Here, we
present an all-fibre, 400 nm wide near-infrared frequency comb based on
electro-optic modulation with 14.5 GHz comb line spacing. Tests on the
high-resolution, near-infrared spectrometer GIANO-B show a photon-noise limited
calibration precision of <10 cm/s as required for Earth-like planet detection.
Moreover, the presented comb provides detailed insight into particularities of
the spectrograph such as detector inhomogeneities and differential spectrograph
drifts. The system is validated in on-sky observations of a radial velocity
standard star (HD221354) and telluric atmospheric absorption features. The
advantages of the system include simplicity, robustness and turn-key operation,
features that are valuable at the observation sites.
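To give a sense of scale for the 10 cm/s precision quoted in the abstract above, the following sketch converts a radial velocity into a Doppler frequency shift and compares it with the 14.5 GHz comb line spacing. It uses the non-relativistic relation Δf = f·v/c. The reference wavelength of 1550 nm is a hypothetical value chosen for illustration; the abstract does not give the comb's centre wavelength.

```python
C = 299_792_458.0  # speed of light in m/s


def doppler_shift_hz(velocity_m_s, frequency_hz):
    """Non-relativistic Doppler frequency shift for a given radial velocity."""
    return frequency_hz * velocity_m_s / C


# Hypothetical reference wavelength in the near-infrared (an assumption).
wavelength_m = 1550e-9
frequency_hz = C / wavelength_m

shift_hz = doppler_shift_hz(0.10, frequency_hz)  # 10 cm/s, from the abstract
line_spacing_hz = 14.5e9                         # comb line spacing, from the abstract
fraction_of_spacing = shift_hz / line_spacing_hz
```

Under this assumption the shift is roughly 65 kHz, a few millionths of the spacing between adjacent comb lines.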
Spatial interpolation of high-frequency monitoring data
Climate modelers generally require meteorological information on regular
grids, but monitoring stations are, in practice, sited irregularly. Thus, there
is a need to produce public data records that interpolate available data to a
high-density grid, which can then be used to generate meteorological maps at a
broad range of spatial and temporal scales. In addition to point predictions,
quantifications of uncertainty are also needed. One way to accomplish this is
to provide multiple simulations of the relevant meteorological quantities
conditional on the observed data taking into account the various uncertainties
in predicting a space-time process at locations with no monitoring data. Using
a high-quality dataset of minute-by-minute measurements of atmospheric pressure
in north-central Oklahoma, this work describes a statistical approach to
carrying out these conditional simulations. Based on observations at 11
stations, conditional simulations were produced at two other sites with
monitoring stations. The resulting point predictions are very accurate and the
multiple simulations produce well-calibrated prediction uncertainties for
temporal changes in atmospheric pressure but are substantially overconservative
for the uncertainties in the predictions of (undifferenced) pressure.
Comment: Published at http://dx.doi.org/10.1214/08-AOAS208 in the Annals of
Applied Statistics (http://www.imstat.org/aoas/) by the Institute of
Mathematical Statistics (http://www.imstat.org)
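The idea of conditional simulation mentioned in the abstract above can be sketched for a single unmonitored site at a single time, assuming a Gaussian field with a known constant mean and an exponential covariance. Both are assumptions made here for illustration; the paper's space-time model is not described in the abstract. The function names, the covariance parameters and the example coordinates are all hypothetical.

```python
import math
import random


def exp_cov(d, sill=1.0, length=50.0):
    """Exponential covariance as a function of distance (assumed form)."""
    return sill * math.exp(-d / length)


def distance(p, q):
    return math.hypot(p[0] - q[0], p[1] - q[1])


def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def conditional_simulations(sites, values, target, mean, n_sims, rng):
    """Conditional mean, variance and random draws at `target` given observations."""
    k = [[exp_cov(distance(p, q)) for q in sites] for p in sites]
    k_star = [exp_cov(distance(p, target)) for p in sites]
    w = solve(k, k_star)  # simple-kriging weights
    cond_mean = mean + sum(wi * (v - mean) for wi, v in zip(w, values))
    cond_var = exp_cov(0.0) - sum(wi * ki for wi, ki in zip(w, k_star))
    sd = math.sqrt(max(cond_var, 0.0))
    draws = [rng.gauss(cond_mean, sd) for _ in range(n_sims)]
    return cond_mean, cond_var, draws


# Made-up station coordinates and pressure readings, for illustration only.
sites = [(0.0, 0.0), (30.0, 0.0), (0.0, 40.0)]
values = [1012.0, 1010.5, 1011.2]
cond_mean, cond_var, draws = conditional_simulations(
    sites, values, (10.0, 10.0), 1011.0, 100, random.Random(0))
```

The spread of `draws` around `cond_mean` is what carries the prediction uncertainty; checking whether such spreads are well calibrated against held-out stations is the kind of validation the abstract reports.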
Parameter estimation of stochastic differential equation
Non-parametric modeling relies heavily on data and is motivated by smoothness properties in estimating a function; it includes spline and non-spline approaches. The spline approach consists of regression splines and smoothing splines. A regression spline with a Bayesian approach is considered in the first step of a two-step method for estimating the structural parameters of a stochastic differential equation (SDE). The knot and order of the spline can be selected heuristically based on the scatter plot. To overcome this subjective and tedious process, an algorithm was proposed for selecting the optimal knot and order. For each order of spline, a single optimal knot is selected from all data points except the first and the last, namely the one that gives the smallest Generalized Cross Validation (GCV) value. The method is illustrated using observed opening share prices of Petronas Gas Bhd. The results showed that the Mean Square Errors (MSE) of the stochastic model with parameters estimated using the optimal knot, for 1,000, 5,000 and 10,000 runs of Brownian motion, are smaller than those of the SDE models with parameters estimated using a heuristically selected knot. This verifies the viability of the two-step method for estimating the drift and diffusion parameters of an SDE, with the improvement of single-knot selection.
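The single-knot search described in the abstract above can be sketched as follows, assuming an ordinary least-squares fit of a linear spline in the truncated power basis and the common GCV form n·RSS/(n − p)², where p is the number of basis functions. The paper uses a Bayesian regression spline and several spline orders, neither of which is reproduced here. The function names are hypothetical, and `x` is assumed to be sorted with at least four distinct values.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def gcv_for_knot(x, y, knot):
    """GCV score of a linear spline with one knot (basis: 1, x, (x - knot)+)."""
    rows = [[1.0, xi, max(xi - knot, 0.0)] for xi in x]
    p = 3  # number of basis functions
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(p)]
    beta = solve(xtx, xty)  # least-squares coefficients via the normal equations
    rss = sum((yi - sum(b * v for b, v in zip(beta, r))) ** 2
              for r, yi in zip(rows, y))
    n = len(x)
    return n * rss / (n - p) ** 2


def best_single_knot(x, y):
    """Try every data point except the first and last; keep the lowest GCV."""
    return min(x[1:-1], key=lambda k: gcv_for_knot(x, y, k))


# Synthetic data with a kink at x = 4 (made up for illustration).
x = [float(i) for i in range(11)]
y = [abs(xi - 4.0) for xi in x]
knot = best_single_knot(x, y)
```

Because the number of basis functions is fixed within one spline order, minimizing GCV here is equivalent to minimizing the residual sum of squares; the GCV penalty only matters when comparing fits across different orders.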
DYNAMIC RELATIONSHIPS: E-COMMERCE SALES AND KEY EXOGENOUS VARIABLES IN THE PHILIPPINES
This study delves into the complex and evolving landscape of e-commerce in the Philippines, focusing on the relationship between E-Commerce Sales as the endogenous variable and a set of influential exogenous variables, including Digital Marketing Spending, GDP Growth, Internet Penetration, and Mobile Phone Ownership. This research employs a flexible spline modeling approach, uncovers non-linear associations, and offers significant implications for academic understanding and practical applications. The findings underscore the growing impact of Digital Marketing Spending on E-Commerce Sales, revealing the paramount role of online advertising and promotional strategies in the digital marketplace. Moreover, the study explains the intricate interplay between GDP Growth, Internet Penetration, Mobile Phone Ownership, and E-Commerce Sales, highlighting the non-linear nature of these relationships. As the Philippines continues its economic expansion and technological integration, these associations exhibit insightful implications for policymakers, businesses, and e-commerce stakeholders.
Effective strategies for segmenting data into coherent subsets
Automatic segmentation of data into coherent subsets is important in applications as varied as signal processing, bioinformatics and pharmacology. Under this general framework, we investigate the problem of data-driven reconstruction of an unknown, piecewise-constant density function and propose two methods to solve it; the first is directly inspired by the segmentation approach, whereas the second uses a maximum likelihood approach. Motivated by a problem in pharmacometrics, we then introduce a segmentation algorithm which fits into the same general framework and is used for automatically binning data for model assessment purposes.
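For a piecewise-constant density, the maximum likelihood step is simple once the breakpoints are fixed: the height on each segment is the fraction of samples in it divided by its width. The sketch below shows only that step, assuming the breakpoints are already given. How the paper chooses the breakpoints from the data is not described in the abstract and is not reproduced here. The function name is hypothetical.

```python
def piecewise_constant_density(samples, breakpoints):
    """Maximum likelihood heights of a piecewise-constant density on fixed segments.

    breakpoints: increasing segment boundaries covering all samples.
    Segments are half-open [left, right), except the last, which includes
    its right endpoint.
    """
    n = len(samples)
    last = len(breakpoints) - 2  # index of the final segment
    heights = []
    for i, (left, right) in enumerate(zip(breakpoints[:-1], breakpoints[1:])):
        if i == last:
            count = sum(1 for s in samples if left <= s <= right)
        else:
            count = sum(1 for s in samples if left <= s < right)
        heights.append(count / (n * (right - left)))
    return heights


# Toy data, made up for illustration.
samples = [0.1, 0.2, 0.3, 0.9]
breakpoints = [0.0, 0.5, 1.0]
heights = piecewise_constant_density(samples, breakpoints)
# The heights integrate to one over the covered range.
total_mass = sum(h * (r - l)
                 for h, l, r in zip(heights, breakpoints[:-1], breakpoints[1:]))
```

A data-driven method would wrap a search over `breakpoints` around this step, typically with some penalty on the number of segments so that the fit does not degenerate into one segment per sample.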
Bayesian inference of virus evolutionary models from next-generation sequencing data
There is a rich tradition in mathematical biology of modeling virus population dynamics within hosts. Such models can reproduce trends in the progression of viral infections such as HIV-1, and have also generated insights on the emergence of drug resistance and treatment strategies. Existing mathematical work has focused on the problem of predicting dynamics given model parameters. The problem of estimating model parameters from observed data has received little attention. One reason is likely the historical difficulty of obtaining high-resolution samples of virus diversity within hosts. Now, next-generation sequencing (NGS) approaches developed in the past decade can supply such data.
This thesis presents two Bayesian methods that harness classical models to generate testable hypotheses from NGS datasets. The quasispecies equilibrium explains genetic variation in virus populations as a balance between mutation and selection. We use this model to infer fitness effects of individual mutations and pairs of interacting mutations. Although our method provides a high-resolution and accurate picture of the fitness landscape when equilibrium holds, we demonstrate that commonly observed populations with coexisting, divergent viruses are unlikely to be consistent with equilibrium. Our second statistical method estimates virus growth rates and binding affinity between viruses and antibodies using the generalized Lotka-Volterra model. Immune responses can explain the coexistence of abundant virus variants and their trajectories through time. Additionally, we can draw inferences about immune escape and antibody genetic variants responsible for improved virus recognition.
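The generalized Lotka-Volterra model named in the abstract above has the standard form dx_i/dt = x_i (r_i + Σ_j A_ij x_j), with growth rates r_i and an interaction matrix A. The following is a minimal forward-simulation sketch using explicit Euler steps, assuming made-up parameter values. It only illustrates the dynamics; the thesis's Bayesian estimation of growth rates and binding affinities from sequencing data is not reproduced, and the function names are hypothetical.

```python
def glv_step(x, r, a, dt):
    """One explicit Euler step of dx_i/dt = x_i * (r_i + sum_j a[i][j] * x_j)."""
    return [xi + dt * xi * (ri + sum(aij * xj for aij, xj in zip(row, x)))
            for xi, ri, row in zip(x, r, a)]


def simulate_glv(x0, r, a, dt, steps):
    """Return the trajectory as a list of state vectors, starting with x0."""
    trajectory = [list(x0)]
    for _ in range(steps):
        trajectory.append(glv_step(trajectory[-1], r, a, dt))
    return trajectory


# Single self-limiting population (logistic growth); values are illustrative.
logistic = simulate_glv([0.1], [1.0], [[-1.0]], dt=0.01, steps=2000)

# Two competing variants with weak cross-inhibition; values are illustrative.
two_variants = simulate_glv(
    [0.1, 0.2],
    [1.0, 0.8],
    [[-1.0, -0.3],
     [-0.2, -1.0]],
    dt=0.01, steps=2000)
```

When the off-diagonal interaction terms are weak relative to the self-limiting diagonal terms, as in the second example, both variants can persist, which is the kind of coexistence the abstract attributes to immune responses.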