Quantifying alternative splicing from paired-end RNA-sequencing data
RNA-sequencing has revolutionized biomedical research and, in particular, our
ability to study gene alternative splicing. The problem has important
implications for human health, as alternative splicing may be involved in
malfunctions at the cellular level and multiple diseases. However, the
high-dimensional nature of the data and the existence of experimental biases
pose serious data analysis challenges. We find that the standard data summaries
used to study alternative splicing are severely limited, as they ignore a
substantial amount of valuable information. Current data analysis methods are
based on such summaries and are hence suboptimal. Further, they have limited
flexibility in accounting for technical biases. We propose novel data summaries
and a Bayesian modeling framework that overcome these limitations and determine
biases in a nonparametric, highly flexible manner. These summaries adapt
naturally to the rapid improvements in sequencing technology. We provide
efficient point estimates and uncertainty assessments. The approach allows one to
study alternative splicing patterns for individual samples and can also be the
basis for downstream analyses. We found a severalfold improvement in estimation
mean square error compared to popular approaches in simulations, and substantially
higher consistency between replicates in experimental data. Our findings
indicate the need for adjusting the routine summarization and analysis of
alternative splicing RNA-seq studies. We provide a software implementation in
the R package casper.
Comment: Published in the Annals of Applied Statistics
(http://www.imstat.org/aoas/) by the Institute of Mathematical Statistics
(http://www.imstat.org) at http://dx.doi.org/10.1214/13-AOAS687. With correction.
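The abstract does not spell out the estimator. As a rough illustration of what quantifying isoform usage involves, the sketch below estimates isoform proportions with the generic EM algorithm for a finite mixture. It assumes the probability of each observed read (or read summary) under each isoform is already known, and with a symmetric Dirichlet prior (alpha ≥ 1) it returns the posterior mode. The function name and input layout are hypothetical; this is not the model implemented in casper, whose data summaries and bias corrections are richer.

```python
def isoform_proportions_em(lik, alpha=1.0, n_iter=1000, tol=1e-12):
    """Posterior-mode (MAP) isoform proportions via EM (hypothetical helper).

    lik[r][k]: assumed-known probability of read r under isoform k.
    alpha: symmetric Dirichlet prior parameter (alpha = 1 gives the MLE).
    Every row of lik must have at least one positive entry.
    """
    if alpha < 1.0:
        raise ValueError("this sketch requires alpha >= 1")
    n_iso = len(lik[0])
    pi = [1.0 / n_iso] * n_iso
    for _ in range(n_iter):
        # E-step: expected number of reads generated by each isoform
        counts = [0.0] * n_iso
        for row in lik:
            weights = [p * l for p, l in zip(pi, row)]
            total = sum(weights)
            for k in range(n_iso):
                counts[k] += weights[k] / total
        # M-step: mode of the Dirichlet posterior given the expected counts
        denom = sum(counts) + n_iso * (alpha - 1.0)
        new_pi = [(c + alpha - 1.0) / denom for c in counts]
        converged = max(abs(a - b) for a, b in zip(new_pi, pi)) < tol
        pi = new_pi
        if converged:
            break
    return pi


# Usage: 30 reads compatible only with isoform 0, 10 only with isoform 1,
# and 20 reads equally likely under both isoforms.
reads = [[1.0, 0.0]] * 30 + [[0.0, 1.0]] * 10 + [[0.5, 0.5]] * 20
print(isoform_proportions_em(reads))
```

Reads that are equally likely under every isoform carry no information about the proportions, so in this toy example the maximum-likelihood estimate is determined by the 40 unambiguous reads alone (0.75 and 0.25).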
A Bayesian time-varying autoregressive model for improved short-term and long-term prediction
Motivated by the application to German interest rates, we propose a time-varying autoregressive model for short-term and long-term prediction of time series that exhibit temporary nonstationary behavior but are assumed to mean revert in the long run. We use a Bayesian formulation to incorporate prior assumptions on the mean reverting process in the model and thereby regularize predictions in the far future. We use MCMC-based inference by deriving relevant full conditional distributions and employ a Metropolis-Hastings within Gibbs sampler approach to sample from the posterior (predictive) distribution. In combining data-driven short-term predictions with long-term distribution assumptions, our model is competitive with the existing methods in the short horizon while yielding reasonable predictions in the long run. We apply our model to interest rate data and contrast the forecasting performance to that of a 2-Additive-Factor Gaussian model as well as to the predictions of a dynamic Nelson-Siegel model.
Peer Reviewed
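The long-run behavior described above can be illustrated with the simplest mean-reverting model, a stationary AR(1) with fixed coefficients, whose h-step point forecast is μ + φ^h (x_t − μ) and therefore decays toward the long-run mean μ as h grows. This is only a sketch of the mean-reversion idea: the model in the paper has time-varying coefficients and is fitted by MCMC, whereas μ and φ here are fixed, hypothetical inputs and the function name is illustrative.

```python
def ar1_forecast(x_last, mu, phi, horizon):
    """Point forecasts x_{t+1}, ..., x_{t+horizon} of the stationary AR(1)
    x_t - mu = phi * (x_{t-1} - mu) + e_t, with fixed (hypothetical) mu and phi."""
    if not -1.0 < phi < 1.0:
        raise ValueError("mean reversion requires |phi| < 1")
    # The expected deviation from mu shrinks by a factor phi per step ahead.
    return [mu + phi ** h * (x_last - mu) for h in range(1, horizon + 1)]


# Usage: starting 3 units above a long-run mean of 2, the forecasts decay toward 2.
path = ar1_forecast(x_last=5.0, mu=2.0, phi=0.9, horizon=50)
print(path[0], path[-1])
```

A fixed φ forces the same reversion speed at every horizon; letting the coefficients vary over time, as the paper does, is what allows data-driven behavior in the short run while a prior still anchors the far future.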
Elastic analysis of irregularly or sparsely sampled curves
We provide statistical analysis methods for samples of curves in two or more dimensions, where the image, but not the parameterization of the curves, is of interest and suitable alignment/registration is thus necessary. Examples are handwritten letters, movement paths, or object outlines. We focus in particular on the computation of (smooth) means and distances, allowing, for example, classification or clustering. Existing parameterization-invariant analysis methods based on the elastic distance of the curves modulo parameterization, using the square-root-velocity framework, have limitations in common realistic settings where curves are irregularly and potentially sparsely observed. We propose using spline curves to model smooth or polygonal (Fréchet) means of open or closed curves with respect to the elastic distance and show identifiability of the spline model modulo parameterization. We further provide methods and algorithms to approximate the elastic distance for irregularly or sparsely observed curves by interpreting them as polygons. We illustrate the usefulness of our methods on two datasets. The first application classifies irregularly sampled spirals drawn by Parkinson's patients and healthy controls, based on the elastic distance to a mean spiral curve computed using our approach. The second application clusters sparsely sampled GPS tracks based on the elastic distance and computes smooth cluster means to find new paths on the Tempelhof field in Berlin. All methods are implemented in the R package "elasdics" and evaluated in simulations.
Peer Reviewed
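The square-root-velocity (SRV) transform mentioned above maps a curve c to q(t) = c'(t)/sqrt(|c'(t)|). For a polygon, c' and hence q are piecewise constant, which makes the transform easy to compute exactly. The sketch below (hypothetical function names, not the elasdics API) computes q for a planar polygon under a constant-speed parameterization on [0, 1] and the L2 distance between two such transforms. It deliberately omits the minimization over reparameterizations that turns this into the elastic distance, so it is only the starting point of that computation.

```python
import math


def polygon_srv(points):
    """SRV transform q = c'/sqrt(|c'|) of a planar polygon, constant speed on [0, 1].

    Returns (breaks, q): q[i] is the constant value of q on [breaks[i], breaks[i+1]].
    """
    segs = [(x1 - x0, y1 - y0) for (x0, y0), (x1, y1) in zip(points, points[1:])]
    lengths = [math.hypot(dx, dy) for dx, dy in segs]
    if min(lengths) == 0.0:
        raise ValueError("consecutive points must differ")
    total = sum(lengths)
    breaks = [0.0]
    for seg_len in lengths:
        breaks.append(breaks[-1] + seg_len / total)
    breaks[-1] = 1.0  # guard against rounding in the cumulative sum
    q = []
    for (dx, dy), t0, t1 in zip(segs, breaks, breaks[1:]):
        dt = t1 - t0
        speed = math.hypot(dx, dy) / dt        # |c'(t)| on this segment
        scale = 1.0 / (dt * math.sqrt(speed))  # c' = (dx, dy) / dt, divided by sqrt(|c'|)
        q.append((dx * scale, dy * scale))
    return breaks, q


def srv_l2_distance(srv_a, srv_b):
    """L2 distance between two piecewise-constant SRV functions on [0, 1], without alignment."""
    (ta, qa), (tb, qb) = srv_a, srv_b
    grid = sorted(set(ta) | set(tb))
    total = 0.0
    i = j = 0
    for t0, t1 in zip(grid, grid[1:]):
        # Advance to the segment of each polygon that contains [t0, t1].
        while ta[i + 1] <= t0:
            i += 1
        while tb[j + 1] <= t0:
            j += 1
        dx, dy = qa[i][0] - qb[j][0], qa[i][1] - qb[j][1]
        total += (dx * dx + dy * dy) * (t1 - t0)
    return math.sqrt(total)
```

Two useful sanity checks: the squared L2 norm of q equals the length of the curve, and translating a polygon leaves its SRV unchanged.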
Functional Additive Models on Manifolds of Planar Shapes and Forms
The "shape" of a planar curve and/or landmark configuration is considered its equivalence class under translation, rotation, and scaling, and its "form" its equivalence class under translation and rotation while scale is preserved. We extend generalized additive regression to models for such shapes/forms as responses respecting the resulting quotient geometry by employing the squared geodesic distance as loss function and a geodesic response function to map the additive predictor to the shape/form space. For fitting the model, we propose a Riemannian L2-Boosting algorithm well suited for a potentially large number of possibly parameter-intensive model terms, which also yields automated model selection. We provide novel, intuitively interpretable visualizations for (even nonlinear) covariate effects in the shape/form space via suitable tensor-product factorization. The usefulness of the proposed framework is illustrated in an analysis of (a) astragalus shapes of wild and domesticated sheep and (b) cell forms generated in a biophysical model, as well as (c) in a realistic simulation study with response shapes and forms motivated from a dataset on bottle outlines. Supplementary materials for this article are available online.
Peer Reviewed
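For planar landmark configurations, the quotient geometry described above has well-known closed forms when the k landmarks are written as complex numbers z ∈ C^k. After centering, the geodesic distance between shapes is arccos(|⟨z1, z2⟩| / (‖z1‖ ‖z2‖)), and the distance between forms (scale kept, rotation optimized) is sqrt(‖z1‖² + ‖z2‖² − 2|⟨z1, z2⟩|). The sketch below implements these textbook formulas for landmarks only; the function names are hypothetical, it is not the authors' software, and the paper also covers curves.

```python
import math


def _center(z):
    """Remove translation by subtracting the centroid."""
    m = sum(z) / len(z)
    return [p - m for p in z]


def _inner(a, b):
    """Complex inner product <a, b> = sum_j a_j * conj(b_j)."""
    return sum(x * y.conjugate() for x, y in zip(a, b))


def shape_distance(z1, z2):
    """Geodesic distance between the shapes (mod translation, rotation, scale) of two configurations."""
    w1, w2 = _center(z1), _center(z2)
    n1 = math.sqrt(_inner(w1, w1).real)
    n2 = math.sqrt(_inner(w2, w2).real)
    c = abs(_inner(w1, w2)) / (n1 * n2)
    return math.acos(min(1.0, c))  # clip rounding error above 1


def form_distance(z1, z2):
    """Euclidean distance after centering and optimal rotation; scale is kept."""
    w1, w2 = _center(z1), _center(z2)
    d2 = _inner(w1, w1).real + _inner(w2, w2).real - 2.0 * abs(_inner(w1, w2))
    return math.sqrt(max(0.0, d2))


# Usage: a triangle and a rotated, scaled and shifted copy have the same shape.
tri = [0j, 1 + 0j, 1j]
rot = complex(math.cos(0.7), math.sin(0.7))
moved = [3.0 * rot * z + (2 - 5j) for z in tri]
print(shape_distance(tri, moved), form_distance(tri, moved))
```

The optimal rotation never has to be computed explicitly: its effect is captured by taking the modulus of the complex inner product.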
Boosting Functional Response Models for Location, Scale and Shape with an Application to Bacterial Competition
We extend Generalized Additive Models for Location, Scale, and Shape (GAMLSS)
to regression with functional response. This allows us to simultaneously model
point-wise mean curves, variances and other distributional parameters of the
response in dependence of various scalar and functional covariate effects. In
addition, the scope of distributions is extended beyond exponential families.
The model is fitted via gradient boosting, which offers inherent model
selection and is shown to be suitable for both complex model structures and
highly auto-correlated response curves. This enables us to analyze bacterial
growth in Escherichia coli in a complex interaction scenario,
fruitfully extending usual growth models.
Comment: bootstrap confidence interval type uncertainty bounds added; minor
changes in formulation
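To make the fitting idea concrete, the sketch below applies cyclic gradient boosting to a deliberately simplified case: a scalar Gaussian response with μ = β0 + β1·x and log σ = γ0 + γ1·x, one linear least-squares base-learner per parameter, and a fixed number of iterations. In each iteration, the base-learner for μ is fitted to the negative gradient of the negative log-likelihood, (y − μ)/σ², and the one for log σ to (y − μ)²/σ² − 1, with each update shrunk by the step length ν. Functional responses, many competing base-learners, component-wise selection and early stopping, all central to the paper, are left out, and the function names are hypothetical.

```python
import math
import random


def _ls_line(x, r):
    """Least-squares intercept and slope of r regressed on x (the base-learner)."""
    n = len(x)
    mx, mr = sum(x) / n, sum(r) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxr = sum((xi - mx) * (ri - mr) for xi, ri in zip(x, r))
    slope = sxr / sxx
    return mr - slope * mx, slope


def boost_gaussian_lss(x, y, n_iter=500, nu=0.1):
    """Cyclic gradient boosting for y ~ N(mu, sigma^2) with
    mu = b0 + b1 * x and log(sigma) = g0 + g1 * x."""
    n = len(y)
    m = sum(y) / n
    b = [m, 0.0]  # offset: intercept-only fit for mu
    g = [0.5 * math.log(sum((yi - m) ** 2 for yi in y) / n), 0.0]  # log of sd(y)
    for _ in range(n_iter):
        sig2 = [math.exp(2.0 * (g[0] + g[1] * xi)) for xi in x]
        # Location step: negative gradient of the negative log-likelihood w.r.t. mu
        u = [(yi - b[0] - b[1] * xi) / s2 for xi, yi, s2 in zip(x, y, sig2)]
        c0, c1 = _ls_line(x, u)
        b[0] += nu * c0
        b[1] += nu * c1
        # Scale step: negative gradient w.r.t. log(sigma), using the updated mu
        v = [(yi - b[0] - b[1] * xi) ** 2 / s2 - 1.0 for xi, yi, s2 in zip(x, y, sig2)]
        c0, c1 = _ls_line(x, v)
        g[0] += nu * c0
        g[1] += nu * c1
    return b, g


def simulate(n=500, seed=1):
    """Toy data with mu = 1 + 2x and log(sigma) = -0.5 + x on a grid in [0, 1]."""
    rng = random.Random(seed)
    x = [i / (n - 1) for i in range(n)]
    y = [rng.gauss(1.0 + 2.0 * xi, math.exp(-0.5 + xi)) for xi in x]
    return x, y


x, y = simulate()
b, g = boost_gaussian_lss(x, y)
print("mu coefficients:", b, "log-sigma coefficients:", g)
```

Run long enough, the updates stop changing where both fitted gradients vanish, i.e., at the maximum-likelihood estimate of this small model. Stopping earlier shrinks the estimates toward the intercept-only start, and combined with component-wise selection among many base-learners this is what yields the inherent model selection mentioned above.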
Multivariate functional additive mixed models
Multivariate functional data can be intrinsically multivariate, like movement trajectories in 2D, or complementary, such as precipitation, temperature and wind speed over time at a given weather station. We propose a multivariate functional additive mixed model (multiFAMM) and show its application to both data situations using examples from sports science (movement trajectories of snooker players) and phonetic science (acoustic signals and articulation of consonants). The approach includes linear and nonlinear covariate effects and models the dependency structure between the dimensions of the responses using multivariate functional principal component analysis. Multivariate functional random intercepts capture both the auto-correlation within a given function and cross-correlations between the multivariate functional dimensions. They also allow us to model between-function correlations as induced by, for example, repeated measurements or crossed study designs. Modelling the dependency structure between the dimensions can generate additional insight into the properties of the multivariate functional process, improves the estimation of random effects, and yields corrected confidence bands for covariate effects. Extensive simulation studies indicate that a multivariate modelling approach is more parsimonious than fitting independent univariate models to the data while maintaining or improving model fit.
Peer Reviewed
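The multiFAMM models the dependence between dimensions with multivariate functional principal component analysis. In the simplest setting, where all dimensions are observed on a common dense grid and on comparable scales, a basic version concatenates the dimension-wise curves of each observation into one vector and takes principal components of these vectors. The sketch below (hypothetical names) returns the leading eigenvalue and eigenvector of the sample covariance by power iteration; it ignores the dimension weighting, irregular grids and mixed-model structure that the actual approach has to handle.

```python
import math


def leading_mfpc(curves, n_iter=200):
    """Leading eigenvalue/eigenvector of the covariance of concatenated curves.

    curves[i][d] is observation i's dimension d, evaluated on a grid shared by
    all observations (equal lengths assumed). Needs at least two observations.
    """
    # Concatenate the dimensions of each observation into one long vector.
    data = [[v for dim in obs for v in dim] for obs in curves]
    n, p = len(data), len(data[0])
    mean = [sum(row[j] for row in data) / n for j in range(p)]
    centered = [[row[j] - mean[j] for j in range(p)] for row in data]
    # Start power iteration from the centered observation with the largest norm.
    vec = max(centered, key=lambda r: sum(v * v for v in r))
    norm = math.sqrt(sum(v * v for v in vec))
    if norm == 0.0:
        raise ValueError("all observations are identical")
    vec = [v / norm for v in vec]
    lam = 0.0
    for _ in range(n_iter):
        # Covariance times vector without forming the p x p matrix: X^T (X v) / (n - 1)
        scores = [sum(a * b for a, b in zip(row, vec)) for row in centered]
        cv = [sum(s * row[j] for s, row in zip(scores, centered)) / (n - 1) for j in range(p)]
        lam = math.sqrt(sum(v * v for v in cv))
        vec = [v / lam for v in cv]
    return lam, vec
```

The eigenvector has one block per dimension, so comparing its blocks shows how the dimensions co-vary in the leading mode of variation.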
Quantifying alternative splicing from paired-end RNA-sequencing data
RNA-sequencing has revolutionized biomedical research and, in particular, our ability to study gene alternative splicing. The problem has important implications for human health, as alternative splicing is involved in malfunctions at the cellular level and multiple diseases. However, the high-dimensional nature of the data and the existence of experimental biases pose serious data analysis challenges. We find that the standard data summaries used to study alternative splicing are severely limited, as they ignore a substantial amount of valuable information. Current data analysis methods are based on such summaries and are hence suboptimal. Further, they have limited flexibility in accounting for technical biases. We propose novel data summaries and a Bayesian modeling framework that overcome these limitations and determine biases in a non-parametric, data-dependent manner. These summaries adapt naturally to the rapid improvements in sequencing technology. We provide efficient point estimates and uncertainty assessments. The approach allows one to study alternative splicing patterns for individual samples and can also be the basis for downstream differential expression analysis. We found an over fivefold improvement in estimation mean square error compared to a popular approach in simulations, and substantially higher correlations between replicates in experimental data. Our findings indicate the need for modifying the routine summarization and analysis of alternative splicing RNA-seq studies. We provide a software implementation in the R package casper.
The effect of rapid relative humidity changes on fast filter-based aerosol-particle light-absorption measurements: Uncertainties and correction schemes
Measuring vertical profiles of the particle light-absorption coefficient by using absorption photometers may face the challenge of fast changes in relative humidity (RH). These absorption photometers determine the particle light-absorption coefficient from the change in light attenuation through a particle-loaded filter. The filter material, however, takes up or releases water as RH (in %) changes, thus influencing the light attenuation. A sophisticated set of laboratory experiments was therefore conducted to investigate the effect of fast RH changes (dRH/dt) on the particle light-absorption coefficient (σabs in Mm⁻¹) derived with two absorption photometers. The RH dependence was examined for different filter types and filter loadings with respect to loading material and areal loading density. The Single Channel Tricolor Absorption Photometer (STAP) relies on a quartz-fiber filter, and the microAeth® MA200 is based on a polytetrafluoroethylene (PTFE) filter band. Furthermore, three cases were investigated: clean filters, filters loaded with black carbon (BC), and filters loaded with ammonium sulfate. The filter areal loading densities (ρ*) ranged from 3.1 to 99.6 mg m⁻² in the case of the STAP and ammonium sulfate and from 1.2 to 37.6 mg m⁻² in the case of the MA200. For the BC-loaded cases, ρ*BC was in the range of 2.9 to 43.0 and 1.1 to 16.3 mg m⁻² for the STAP and MA200, respectively. The two instruments showed opposing responses of different magnitude to relative humidity changes (ΔRH). The STAP shows a linear dependence on relative humidity changes. The MA200 is characterized by a distinct exponential recovery after its filter was exposed to relative humidity changes.
At a wavelength of 624 nm and for the default 60 s running average output, the STAP shows an absolute change in σabs per absolute change of RH (Δσabs/ΔRH) of 0.14 Mm⁻¹ %⁻¹ in the clean case, 0.29 Mm⁻¹ %⁻¹ for BC-loaded filters, and 0.21 Mm⁻¹ %⁻¹ for filters loaded with ammonium sulfate. The 60 s running average of the particle light-absorption coefficient at 625 nm measured with the MA200 showed a response of around -0.4 Mm⁻¹ %⁻¹ for all three cases. Whereas the response of the STAP varies with the loading material, that of the MA200 was quite stable. For the STAP, the response ranged from 0.17 to 0.24 Mm⁻¹ %⁻¹ in the case of ammonium sulfate loading and from 0.17 to 0.62 Mm⁻¹ %⁻¹ in the BC-loaded case. For the MA200, the response ranged from -0.42 to -0.36 Mm⁻¹ %⁻¹ in the ammonium sulfate case and from -0.42 to -0.37 Mm⁻¹ %⁻¹ in the BC case. A linear correction function for the STAP was developed here. It is obtained by correlating 1 Hz resolved recalculated particle light-absorption coefficients with RH change rates. The linear response is estimated at 10.08 Mm⁻¹ s⁻¹ %⁻¹. A correction approach for the MA200 is also provided; however, the behavior of the MA200 is more complex. Further research and multi-instrument measurements are needed to fully understand the underlying processes, since the correction approach resulted in different correction parameters across experiments. Nevertheless, the exponential recovery of the MA200 after its filter experienced an RH change could be reproduced. However, the given correction approach has to be evaluated with other RH sensors as well, since each sensor has a different response time. Moreover, the uncertainties of the given correction approaches could not be estimated, mainly because of the response time of the RH sensor. Therefore, we do not recommend using the given approaches.
Nevertheless, they point in the right direction and, despite their imperfections, are useful at least for estimating the measurement uncertainties due to relative humidity changes. Based on our findings, we recommend using an aerosol dryer upstream of absorption photometers to reduce the RH effect significantly. Furthermore, when absorption photometers are used in vertical measurements, the ascent or descent speed through layers with large relative humidity gradients has to be low to minimize the observed RH effect. However, this is not possible in some scenarios, especially in unmixed layers or clouds. Additionally, recording the RH of the sample stream allows the bias to be corrected during post-processing of the data. This data correction leads to reasonable results, according to the example given in this study. © Author(s) 2019
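The linear STAP correction described above can be sketched as follows, under stated assumptions: RH and σabs are 1 Hz series, dRH/dt is approximated by central differences, the slope k is obtained by ordinary least squares of σabs on dRH/dt over a period in which the true absorption is constant (as in the laboratory experiments), and the correction subtracts the fitted artifact, σcorr = σabs − k·dRH/dt, before the 60 s running average. The function names are hypothetical, the abstract does not state the exact form of the correction, and, as noted above, the authors do not recommend applying their approaches without further validation. The MA200 exponential recovery is not covered.

```python
import math


def rh_rate(rh, dt=1.0):
    """dRH/dt by central differences (one-sided at both ends); rh sampled every dt seconds."""
    n = len(rh)
    rates = []
    for i in range(n):
        lo, hi = max(i - 1, 0), min(i + 1, n - 1)
        rates.append((rh[hi] - rh[lo]) / ((hi - lo) * dt))
    return rates


def fit_rh_slope(sigma_abs, rates):
    """Ordinary least-squares slope (with intercept) of sigma_abs on dRH/dt."""
    n = len(rates)
    mr, ms = sum(rates) / n, sum(sigma_abs) / n
    srr = sum((r - mr) ** 2 for r in rates)
    srs = sum((r - mr) * (s - ms) for r, s in zip(rates, sigma_abs))
    return srs / srr


def correct_rh_artifact(sigma_abs, rates, k):
    """Subtract the assumed linear RH artifact k * dRH/dt from each 1 Hz value."""
    return [s - k * r for s, r in zip(sigma_abs, rates)]


def running_mean(values, window=60):
    """Trailing running mean over `window` samples (60 samples = 60 s at 1 Hz)."""
    out, acc = [], 0.0
    for i, v in enumerate(values):
        acc += v
        if i >= window:
            acc -= values[i - window]
        out.append(acc / min(i + 1, window))
    return out


# Synthetic check: constant true absorption of 2 plus a linear RH artifact.
# The slope 10.08 is borrowed from the abstract purely as a test value.
rh = [50.0 + 20.0 * math.sin(2.0 * math.pi * t / 600.0) for t in range(1200)]
rates = rh_rate(rh)
measured = [2.0 + 10.08 * r for r in rates]
k = fit_rh_slope(measured, rates)
smoothed = running_mean(correct_rh_artifact(measured, rates, k))
```

On real data the RH sensor's response time delays the measured dRH/dt relative to the filter's response, which is one reason the abstract stresses that the correction has to be checked with other RH sensors.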
Pedestrian exposure to black carbon and PM2.5 emissions in urban hot spots: new findings using mobile measurement techniques and flexible Bayesian regression models
Background
Data from extensive mobile measurements (MM) of air pollutants provide spatially resolved information on pedestrians' exposure to particulate matter (black carbon (BC) and PM2.5 mass concentrations).
Objective
We present a distributional regression model in a Bayesian framework that estimates the effects of spatiotemporal factors on the pollutant concentrations influencing pedestrian exposure.
Methods
We modeled the mean and variance of the pollutant concentrations obtained from MM in two cities and extended commonly used lognormal models with a lognormal-normal convolution (logNNC) extension for BC to account for instrument measurement error.
Results
The logNNC extension significantly improved the BC model. From these model results, we found that local sources, and hence local mitigation efforts to improve air quality, have more impact on the ambient levels of BC mass concentrations than on the regulated PM2.5.
Significance
First, this model (logNNC in the bamlss package, available in R) could be used for the statistical analysis of MM data from various study areas and pollutants, with the potential for predicting pollutant concentrations in urban areas. Second, with respect to pedestrian exposure, it is crucial for BC mass concentrations to be monitored and regulated in areas dominated by traffic-related air pollution.
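The lognormal-normal convolution mentioned in the Methods can be written as y = x + ε with log x ~ N(μ, σ²) and ε ~ N(0, τ²) independent, so that the density of y is ∫ N(u; μ, σ²) N(y; e^u, τ²) du with u = log x. The sketch below evaluates this integral with the trapezoidal rule. The parameterization is an assumption for illustration (the exact specification in bamlss may differ), and the function names are hypothetical.

```python
import math


def _normal_pdf(x, mean, sd):
    """Density of N(mean, sd^2) at x."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))


def lognormal_normal_pdf(y, mu, sigma, tau, n_nodes=801, width=8.0):
    """Density of y = x + e, log(x) ~ N(mu, sigma^2), e ~ N(0, tau^2), e independent of x.

    Integrates N(u; mu, sigma^2) * N(y; exp(u), tau^2) over u = log(x) with the
    trapezoidal rule on mu +/- width * sigma.
    """
    lo = mu - width * sigma
    h = 2.0 * width * sigma / (n_nodes - 1)
    total = 0.0
    for i in range(n_nodes):
        u = lo + i * h
        weight = 0.5 if i in (0, n_nodes - 1) else 1.0
        total += weight * _normal_pdf(u, mu, sigma) * _normal_pdf(y, math.exp(u), tau)
    return total * h


# Usage: unlike a plain lognormal, the density is positive for negative readings.
print(lognormal_normal_pdf(-0.2, mu=0.0, sigma=0.5, tau=0.3))
```

Because the additive error is Gaussian, near-zero concentrations pushed below zero by instrument noise remain possible under this model, which a plain lognormal cannot accommodate.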