Binscatter Regressions
We introduce the \texttt{Stata} (and \texttt{R}) package \textsf{Binsreg},
which implements the binscatter methods developed in
\citet*{Cattaneo-Crump-Farrell-Feng_2019_Binscatter}. The package includes the
commands \texttt{binsreg}, \texttt{binsregtest}, and \texttt{binsregselect}.
The first command (\texttt{binsreg}) implements binscatter for the regression
function and its derivatives, offering several point estimation, confidence
interval, and confidence band procedures, with particular focus on
constructing binned scatter plots. The second command (\texttt{binsregtest})
implements hypothesis testing procedures for parametric specification and for
nonparametric shape restrictions of the unknown regression function. Finally,
the third command (\texttt{binsregselect}) implements data-driven selectors
of the number of bins for binscatter implementation, using either quantile-spaced or
evenly-spaced binning/partitioning. All the commands allow for covariate
adjustment, smoothness restrictions, weighting and clustering, among other
features. A companion \texttt{R} package with the same capabilities is also
available.
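The binned scatter plot at the core of these commands rests on a simple construction: partition the support of the covariate into quantile-spaced bins and plot within-bin means. A minimal sketch of that construction (a generic illustration in Python, not the package's own code):

```python
import numpy as np

def binscatter(x, y, n_bins=20):
    """Quantile-spaced binscatter: bin x by empirical quantiles,
    then average x and y within each bin (generic sketch, not binsreg)."""
    edges = np.quantile(x, np.linspace(0, 1, n_bins + 1))
    edges[-1] += 1e-12  # make the last bin include the maximum
    idx = np.clip(np.digitize(x, edges) - 1, 0, n_bins - 1)
    bin_x = np.array([x[idx == b].mean() for b in range(n_bins)])
    bin_y = np.array([y[idx == b].mean() for b in range(n_bins)])
    return bin_x, bin_y

rng = np.random.default_rng(0)
x = rng.uniform(0, 1, 2000)
y = np.sin(2 * np.pi * x) + rng.normal(0, 0.3, 2000)
bx, by = binscatter(x, y)  # coordinates of the binned scatter points
```

Plotting `bx` against `by` gives the binned scatter plot; the package's additional machinery (covariate adjustment, confidence bands) builds on this same partition.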
Reference values: a review
Reference values are used to describe the dispersion of variables in healthy individuals. They are usually reported as population-based reference intervals (RIs) comprising 95% of the healthy population. International recommendations state the preferred method as a priori nonparametric determination from at least 120 reference individuals, but acceptable alternative methods include transference or validation from previously established RIs. The most critical steps in the determination of reference values are the selection of reference individuals based on extensively documented inclusion and exclusion criteria and the use of quality-controlled analytical procedures. When only small numbers of values are available, RIs can be estimated by new methods, but reference limits thus obtained may be highly imprecise. These recommendations are a challenge in veterinary clinical pathology, especially when only small numbers of reference individuals are available.
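The recommended a priori nonparametric determination reduces to taking the central 95% of the ordered reference values, i.e. the 2.5th and 97.5th percentiles. A minimal sketch, assuming a plain array of measurements from healthy reference individuals:

```python
import numpy as np

def nonparametric_ri(values):
    """Nonparametric 95% reference interval: the 2.5th and 97.5th
    percentiles of measurements from healthy reference individuals."""
    values = np.asarray(values, dtype=float)
    if len(values) < 120:
        # Below the recommended minimum, the limits are imprecise
        raise ValueError("at least 120 reference individuals recommended")
    lower, upper = np.percentile(values, [2.5, 97.5])
    return lower, upper

# Illustrative data: 200 healthy individuals, analyte mean 100, SD 10
rng = np.random.default_rng(1)
sample = rng.normal(100.0, 10.0, 200)
lo, hi = nonparametric_ri(sample)
```

With fewer than the recommended 120 individuals, the percentile estimates become the "highly imprecise" limits the abstract warns about.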
Revisiting the contribution of transpiration to global terrestrial evapotranspiration
Even though knowing the contributions of transpiration (T), soil and open water evaporation (E), and interception (I) to terrestrial evapotranspiration (ET = T + E + I) is crucial for understanding the hydrological cycle and its connection to ecological processes, the fraction of T is unattainable by traditional measurement techniques over large scales. Previously reported global mean T/(E+T+I) from multiple independent sources, including satellite-based estimations, reanalysis, land surface models, and isotopic measurements, varies substantially from 24% to 90%. Here we develop a new ET partitioning algorithm, which combines global evapotranspiration estimates and relationships between leaf area index (LAI) and T/(E+T) for different vegetation types, to upscale a wide range of published site-scale measurements. We show that transpiration accounts for about 57.2% (with a standard deviation of 6.8%) of global terrestrial ET. Our approach bridges the scale gap between site measurements and global model simulations, and can be simply implemented into current global climate models to improve biological CO2 flux simulations.
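The partitioning step can be sketched schematically: given a total ET estimate, subtract interception and split the remainder using a vegetation-type-specific LAI curve for T/(E+T). The saturating functional form, the coefficients, and the interception fraction below are all placeholders for illustration, not the relationships fitted in the paper:

```python
import numpy as np

# Placeholder per-vegetation-type coefficients (NOT the paper's fits)
K_BY_VEGTYPE = {"forest": 0.6, "grassland": 0.45, "cropland": 0.5}

def transpiration_fraction(lai, vegtype):
    """Toy saturating curve for T/(E+T) as a function of LAI."""
    return 1.0 - np.exp(-K_BY_VEGTYPE[vegtype] * lai)

def partition_et(et_total, lai, vegtype, interception_frac=0.1):
    """Split total ET into transpiration T, evaporation E, and
    interception I, using the LAI -> T/(E+T) relationship."""
    i = interception_frac * et_total
    t = transpiration_fraction(lai, vegtype) * (et_total - i)
    e = (et_total - i) - t
    return t, e, i

# Illustrative grid cell: 500 mm/yr of ET over forest with LAI = 4
t, e, i = partition_et(et_total=500.0, lai=4.0, vegtype="forest")
```

Upscaling then amounts to applying such curves, calibrated per vegetation type from site measurements, to gridded ET and LAI fields.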
Improved model identification for non-linear systems using a random subsampling and multifold modelling (RSMM) approach
In non-linear system identification, the available observed data are conventionally partitioned into two parts: the training data that are used for model identification and the test data that are used for model performance testing. This sort of 'hold-out' or 'split-sample' data partitioning method is convenient and the associated model identification procedure is in general easy to implement. The resultant model obtained from such a once-partitioned single training dataset, however, may occasionally lack robustness and generalisation to represent future unseen data, because the performance of the identified model may be highly dependent on how the data partition is made. To overcome the drawback of the hold-out data partitioning method, this study presents a new random subsampling and multifold modelling (RSMM) approach to produce less biased or preferably unbiased models. The basic idea and the associated procedure are as follows. First, generate K training datasets (and also K validation datasets), using a K-fold random subsampling method. Secondly, detect significant model terms and identify a common model structure that fits all the K datasets using a newly proposed common model selection approach, called the multiple orthogonal search algorithm. Finally, estimate and refine the model parameters for the identified common-structured model using a multifold parameter estimation method. The proposed method can produce robust models with better generalisation performance.
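The first step of the procedure, generating K training and validation datasets by random subsampling, can be sketched as follows; the 70/30 split fraction is an illustrative choice, not one prescribed by the paper:

```python
import numpy as np

def random_subsampling_folds(n, k, train_frac=0.7, seed=0):
    """Generate K independent random train/validation index splits
    (repeated random subsampling rather than a single hold-out split).
    Returns a list of (train_indices, validation_indices) pairs."""
    rng = np.random.default_rng(seed)
    n_train = int(round(train_frac * n))
    folds = []
    for _ in range(k):
        perm = rng.permutation(n)          # fresh random partition
        folds.append((perm[:n_train], perm[n_train:]))
    return folds

# K = 5 random splits of a 100-sample dataset
folds = random_subsampling_folds(n=100, k=5)
```

Model terms that are selected consistently across all K training sets then form the common model structure on which the multifold parameter estimation operates.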
Maximum likelihood and pseudo score approaches for parametric time-to-event analysis with informative entry times
We develop a maximum likelihood estimating approach for time-to-event Weibull
regression models with outcome-dependent sampling, where sampling of subjects
is dependent on the residual fraction of the time left to developing the event
of interest. Additionally, we propose a two-stage approach which proceeds by
iteratively estimating, through a pseudo score, the Weibull parameters of
interest (i.e., the regression parameters) conditional on the inverse
probability of sampling weights; and then re-estimating these weights (given
the updated Weibull parameter estimates) through the profiled full likelihood.
With these two new methods, both the estimated sampling mechanism parameters
and the Weibull parameters are consistently estimated under correct
specification of the conditional referral distribution. Standard errors for the
regression parameters are obtained directly from inverting the observed
information matrix in the full likelihood specification and by either
calculating bootstrap or robust standard errors for the hybrid pseudo
score/profiled likelihood approach. Loss of efficiency with the latter approach
is considered. Robustness of the proposed methods to misspecification of the
referral mechanism and the time-to-event distribution is also briefly examined.
Further, we show how to extend our methods to the family of parametric
time-to-event distributions characterized by the generalized gamma
distribution. The motivation for these two approaches came from data on time to
cirrhosis from hepatitis C viral infection in patients referred to the
Edinburgh liver clinic. We analyze these data here.

Comment: Published at http://dx.doi.org/10.1214/14-AOAS725 in the Annals of Applied Statistics (http://www.imstat.org/aoas/) by the Institute of Mathematical Statistics (http://www.imstat.org).
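For orientation, a plain Weibull regression likelihood, without the outcome-dependent sampling weights or the pseudo-score stage that are the paper's contribution, can be maximized numerically. The parameterization (scale exp(X beta), shape k) and the simulated data below are illustrative:

```python
import numpy as np
from scipy.optimize import minimize

def weibull_negloglik(params, t, delta, X):
    """Negative log-likelihood for Weibull regression: event times t,
    event indicator delta (1 = observed, 0 = right-censored), design
    matrix X, scale exp(X @ beta), shape exp(log_k). Standard likelihood,
    not the weighted/pseudo-score version developed in the paper."""
    log_k, beta = params[0], params[1:]
    k = np.exp(log_k)                        # keep the shape positive
    log_lam = X @ beta
    z = (t / np.exp(log_lam)) ** k           # (t / scale)^shape
    ll = np.sum(delta * (log_k + (k - 1) * np.log(t) - k * log_lam) - z)
    return -ll

# Simulate: scale exp(1.0 + 0.5 * x), shape 1.5, no censoring
rng = np.random.default_rng(2)
n = 500
X = np.column_stack([np.ones(n), rng.normal(size=n)])
beta_true, k_true = np.array([1.0, 0.5]), 1.5
t = rng.weibull(k_true, n) * np.exp(X @ beta_true)
delta = np.ones(n)
fit = minimize(weibull_negloglik, x0=np.zeros(3), args=(t, delta, X),
               method="Nelder-Mead")
```

In the full-likelihood approach described above, standard errors follow from inverting the observed information at the maximum; the hybrid pseudo-score approach instead requires bootstrap or robust standard errors.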
Multi-View Face Recognition From Single RGBD Models of the Faces
This work takes important steps towards solving the following problem of current interest: Assuming that each individual in a population can be modeled by a single frontal RGBD face image, is it possible to carry out face recognition for such a population using multiple 2D images captured from arbitrary viewpoints? Although the general problem as stated above is extremely challenging, it encompasses subproblems that can be addressed today. The subproblems addressed in this work relate to: (1) Generating a large set of viewpoint dependent face images from a single RGBD frontal image for each individual; (2) using hierarchical approaches based on view-partitioned subspaces to represent the training data; and (3) based on these hierarchical approaches, using a weighted voting algorithm to integrate the evidence collected from multiple images of the same face as recorded from different viewpoints. We evaluate our methods on three datasets: a dataset of 10 people that we created and two publicly available datasets which include a total of 48 people. In addition to providing important insights into the nature of this problem, our results show that we are able to successfully recognize faces with accuracies of 95% or higher, outperforming existing state-of-the-art face recognition approaches based on deep convolutional neural networks.
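The evidence-integration step in (3) amounts to accumulating per-view recognition scores into a weighted vote over identities. A generic sketch, where the confidence-as-weight scheme and the names are illustrative rather than the paper's exact weighting:

```python
def weighted_vote(view_predictions):
    """Integrate per-view (identity, confidence) predictions into one
    decision by weighted voting: each view contributes its confidence
    as the weight for its predicted identity. Generic scheme; the
    paper's actual weights may differ."""
    tally = {}
    for identity, confidence in view_predictions:
        tally[identity] = tally.get(identity, 0.0) + confidence
    return max(tally, key=tally.get)

# Three views of the same face, matched against a small gallery
preds = [("alice", 0.9), ("bob", 0.4), ("alice", 0.7)]
winner = weighted_vote(preds)  # "alice" wins with weight 1.6 vs 0.4
```

Because each viewpoint votes independently, a single mismatched view (here the "bob" prediction) is outweighed by consistent evidence from the others.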