A concave pairwise fusion approach to subgroup analysis
An important step in developing individualized treatment strategies is to
correctly identify subgroups of a heterogeneous population, so that a specific
treatment can be given to each subgroup. In this paper, we consider the
situation with samples drawn from a population consisting of subgroups with
different means, along with certain covariates. We propose a penalized approach
for subgroup analysis based on a regression model, in which heterogeneity is
driven by unobserved latent factors and thus can be represented by using
subject-specific intercepts. We apply concave penalty functions to pairwise
differences of the intercepts. This procedure automatically divides the
observations into subgroups. We develop an alternating direction method of
multipliers algorithm with concave penalties to implement the proposed approach
and demonstrate its convergence. We also establish the theoretical properties
of our proposed estimator and determine the order requirement of the minimal
difference of signals between groups in order to recover them. These results
provide a sound basis for making statistical inference in subgroup analysis.
Our proposed method is further illustrated by simulation studies and analysis
of the Cleveland heart disease dataset.
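The penalized criterion described in this abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation: the abstract does not say which concave penalty is used, so the minimax concave penalty (MCP) is assumed here as one common choice, and the names `mcp`, `fusion_objective`, `lam`, and `gamma` are hypothetical.

```python
import numpy as np

def mcp(t, lam, gamma=3.0):
    """Minimax concave penalty (assumed example of a concave penalty)."""
    a = np.abs(t)
    # Concave quadratic for |t| <= gamma * lam, constant beyond that point.
    return np.where(a <= gamma * lam,
                    lam * a - a**2 / (2 * gamma),
                    gamma * lam**2 / 2)

def fusion_objective(y, X, mu, beta, lam, gamma=3.0):
    """Least squares with subject-specific intercepts mu[i], plus a concave
    penalty on every pairwise difference mu[i] - mu[j] with i < j."""
    resid = y - mu - X @ beta
    i, j = np.triu_indices(len(mu), k=1)  # all index pairs with i < j
    return 0.5 * np.sum(resid**2) + np.sum(mcp(mu[i] - mu[j], lam, gamma))
```

Because the penalty is flat for large differences, intercepts belonging to well-separated subgroups are not shrunk toward each other, while small differences are pushed to zero, which is what groups the observations.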
Estimation in Semiparametric Quantile Factor Models
We propose an estimation methodology for a semiparametric quantile factor
panel model. We provide tools for inference that are robust to the existence of
moments and to the form of weak cross-sectional dependence in the idiosyncratic
error term. We apply our method to daily stock return data.
Unsupervised Neural Machine Translation with SMT as Posterior Regularization
Without a real bilingual corpus available, unsupervised Neural Machine
Translation (NMT) typically requires pseudo parallel data generated with the
back-translation method for the model training. However, due to weak
supervision, the pseudo data inevitably contain noise and errors that will be
accumulated and reinforced in the subsequent training process, leading to poor
translation performance. To address this issue, we introduce phrase-based
Statistical Machine Translation (SMT) models, which are robust to noisy data, as
posterior regularizations to guide the training of unsupervised NMT models in
the iterative back-translation process. Our method starts from SMT models built
with pre-trained language models and word-level translation tables inferred
from cross-lingual embeddings. Then SMT and NMT models are optimized jointly
and boost each other incrementally in a unified EM framework. In this way, (1)
the negative effect caused by errors in the iterative back-translation process
can be alleviated promptly by SMT filtering noise from its phrase tables;
meanwhile, (2) NMT can compensate for the deficiency of fluency inherent in
SMT. Experiments conducted on en-fr and en-de translation tasks show that our
method outperforms the strong baseline and achieves new state-of-the-art
unsupervised machine translation performance.
Comment: To be presented at AAAI 2019; 9 pages, 4 figures
Detecting latent communities in network formation models
This paper proposes a logistic undirected network formation model which
allows for assortative matching on observed individual characteristics and the
presence of edge-wise fixed effects. We model the coefficients of observed
characteristics to have a latent community structure and the edge-wise fixed
effects to be of low rank. We propose a multi-step estimation procedure
involving nuclear norm regularization, sample splitting, iterative logistic
regression and spectral clustering to detect the latent communities. We show
that the latent communities can be exactly recovered when the expected degree
of the network is of order log n or higher, where n is the number of nodes in
the network. The finite sample performance of the new estimation and inference
methods is illustrated through both simulated and real datasets.
Comment: 63 pages
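The spectral clustering step named in this abstract can be illustrated with a generic two-group sketch. This is not the paper's multi-step procedure (no nuclear norm regularization, sample splitting, or logistic regression is shown); the function `two_community_split` and the noiseless block matrix with values `p_in` and `p_out` are hypothetical, chosen only to show how an eigenvector's sign pattern reveals group membership.

```python
import numpy as np

def two_community_split(S):
    """Split nodes into two groups using the eigenvector that belongs to
    the second-largest eigenvalue of a symmetric matrix S."""
    S = np.asarray(S, dtype=float)
    eigvals, eigvecs = np.linalg.eigh(S)  # eigenvalues in ascending order
    v = eigvecs[:, -2]                    # eigenvector of second-largest eigenvalue
    return (v > 0).astype(int)            # sign pattern gives the labels

# Hypothetical noiseless example: two blocks of 5 nodes each.
n, p_in, p_out = 10, 0.8, 0.1
truth = np.repeat([0, 1], n // 2)
S = np.where(truth[:, None] == truth[None, :], p_in, p_out)
labels = two_community_split(S)
```

The labels are identified only up to swapping the two groups, since the sign of an eigenvector is arbitrary. With more than two communities, one would typically run k-means on several leading eigenvectors instead of thresholding one.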