Nonparametric Estimation via Mixed Gradients
Traditional nonparametric estimation methods often lead to a slow convergence
rate in large dimensions and require unrealistically enormous sizes of datasets
for reliable conclusions. We develop an approach based on mixed gradients,
either observed or estimated, to effectively estimate the function at
near-parametric convergence rates. The novel approach and computational
algorithm could lead to methods useful to practitioners in many areas of
science and engineering. Our theoretical results reveal a behavior universal to
this class of nonparametric estimation problems. We explore a general setting
involving tensor product spaces and build upon the smoothing spline analysis of
variance (SS-ANOVA) framework. For $d$-dimensional models under full
interaction, the optimal rates with gradient information on $p$ covariates are
identical to those for the $(d-p)$-interaction models without gradients and,
therefore, the models are immune to the "curse of interaction". For additive
models, the optimal rates using gradient information are root-$n$, thus
achieving the "parametric rate". We demonstrate aspects of the theoretical
results through synthetic and real data applications.
Alzheimer's Disease Prediction Using Longitudinal and Heterogeneous Magnetic Resonance Imaging
Recent evidence has shown that structural magnetic resonance imaging (MRI) is
an effective tool for Alzheimer's disease (AD) prediction and diagnosis. While
traditional MRI-based diagnosis uses images acquired at a single time point, a
longitudinal study is more sensitive and accurate in detecting early
pathological changes of AD. Two main difficulties arise in longitudinal
MRI-based diagnosis: (1) the inconsistent longitudinal scans among subjects
(i.e., different scanning times and different total numbers of scans); (2) the
heterogeneous progressions of high-dimensional regions of interest (ROIs) in
MRI. In this work, we propose a novel feature selection and estimation method
which can be applied to extract features from the heterogeneous longitudinal
MRI. A key ingredient of our method is the combination of smoothing splines and
the $\ell_1$-penalty. We perform experiments on the Alzheimer's Disease
Neuroimaging Initiative (ADNI) database. The results corroborate the advantages
of the proposed method for AD prediction in longitudinal studies.
Learning Strategies in Decentralized Matching Markets under Uncertain Preferences
We study the problem of decision-making in the setting of a scarcity of
shared resources when the preferences of agents are unknown a priori and must
be learned from data. Taking the two-sided matching market as a running
example, we focus on the decentralized setting, where agents do not share their
learned preferences with a central authority. Our approach is based on the
representation of preferences in a reproducing kernel Hilbert space, and a
learning algorithm for preferences that accounts for uncertainty due to the
competition among the agents in the market. Under regularity conditions, we
show that our estimator of preferences converges at a minimax optimal rate.
Given this result, we derive optimal strategies that maximize agents' expected
payoffs and we calibrate the uncertain state by taking opportunity costs into
account. We also derive an incentive-compatibility property and show that the
outcome from the learned strategies has a stability property. Finally, we prove
a fairness property that asserts that there exists no justified envy according
to the learned strategies.
Li, Li, and Dai's Contribution to the Discussion of "Estimating Means of Bounded Random Variables by Betting" by Waudby-Smith and Aaditya Ramdas
We congratulate Waudby-Smith and Ramdas for their interesting paper
\cite{waudbysmith2022estimating} in generating confidence intervals and
time-uniform confidence sequences for mean estimation with bounded
observations. Their methodology utilizes composite nonnegative martingales and
establishes a connection to game-theoretic probability. Our comments will focus
on numerical comparisons with alternative methods. Comment: 3 pages; 2 figures
Incentive-Aware Recommender Systems in Two-Sided Markets
Online platforms in the Internet Economy commonly incorporate recommender
systems that recommend arms (e.g., products) to agents (e.g., users). In such
platforms, a myopic agent has a natural incentive to exploit, by choosing the
best product given the current information rather than to explore various
alternatives to collect information that will be used for other agents. We
propose a novel recommender system that respects agents' incentives and enjoys
asymptotically optimal performance, as measured by regret in repeated games.
We model such an incentive-aware recommender system as a multi-agent bandit
problem in a two-sided market which is equipped with an incentive constraint
induced by agents' opportunity costs. If the opportunity costs are known to the
principal, we show that there exists an incentive-compatible recommendation
policy, which pools recommendations across a genuinely good arm and an unknown
arm via a randomized and adaptive approach. On the other hand, if the
opportunity costs are unknown to the principal, we propose a policy that
randomly pools recommendations across all arms and uses each arm's cumulative
loss as feedback for exploration. We show that both policies also satisfy an
ex-post fairness criterion, which protects agents from over-exploitation.
A Resampling Approach For Causal Inference On Novel Two-Point Time-Series With Application To Identify Risk Factors For Type-2 Diabetes And Cardiovascular Disease
Two-point time-series data, characterized by baseline and follow-up
observations, are frequently encountered in health research. We study a novel
two-point time series structure without a control group, which is driven by an
observational routine clinical dataset collected to monitor key risk markers of
type-2 diabetes (T2D) and cardiovascular disease (CVD). We propose a
resampling approach called 'I-Rand' for independently sampling one of the two
time points for each individual and making inference on the estimated causal
effects based on matching methods. The proposed method is illustrated with data
from a service-based dietary intervention to promote a low-carbohydrate diet
(LCD), designed to impact risk of T2D and CVD. Baseline data contain a
pre-intervention health record of study participants, and health data after LCD
intervention are recorded at the follow-up visit, providing a two-point
time-series pattern without a parallel control group. Using this approach we
find that obesity is a significant risk factor for T2D and CVD, and an LCD
approach can significantly mitigate the risks of T2D and CVD. We provide code
that implements our method.