Weighting-Based Treatment Effect Estimation via Distribution Learning
Existing weighting methods for treatment effect estimation are often built
upon the idea of propensity scores or covariate balance. They usually impose
strong assumptions on the treatment assignment or outcome model to obtain
unbiased estimates, such as linearity or specific functional forms, which
makes them prone to model mis-specification. In this paper, we aim to
alleviate these issues by developing a distribution learning-based weighting
method. We first learn the true underlying distribution of covariates
conditioned on treatment assignment, then leverage the ratio of covariates'
density in the treatment group to that of the control group as the weight for
estimating treatment effects. Specifically, we propose to approximate the
distribution of covariates in both treatment and control groups through
invertible transformations via change of variables. To demonstrate the
superiority, robustness, and generalizability of our method, we conduct
extensive experiments using synthetic and real data. From the experiment
results, we find that our method for estimating average treatment effect on
treated (ATT) with observational data outperforms several cutting-edge
weighting-only benchmarking methods, and it maintains its advantage under a
doubly-robust estimation framework that combines weighting with some advanced
outcome modeling methods. Comment: 33 pages, 16 tables, 7 figures, Github:
https://github.com/DLweighting/Distribution-Learning-based-weightin
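The density-ratio weighting idea above can be sketched on toy data. The snippet below is a minimal illustration, not the paper's implementation: a Gaussian KDE stands in for the paper's invertible-flow density model, and the simulated covariates, outcome model, and true effect are all assumptions made for the example.

```python
import numpy as np
from scipy.stats import gaussian_kde

rng = np.random.default_rng(0)

# Toy observational data: treatment assignment is confounded by covariate x.
n = 2000
x = rng.normal(0.0, 1.0, n)
t = rng.binomial(1, 1.0 / (1.0 + np.exp(-x)))    # P(treated) rises with x
y = 2.0 * t + x + rng.normal(0.0, 0.1, n)        # simulated ATT = 2.0

# Density-ratio weights: a Gaussian KDE stands in for the paper's
# invertible-flow estimates of p(x | treated) and p(x | control).
kde_t = gaussian_kde(x[t == 1])
kde_c = gaussian_kde(x[t == 0])
x_c = x[t == 0]
w = kde_t(x_c) / kde_c(x_c)   # reweight controls toward the treated distribution

# Weighted ATT estimate: treated mean minus reweighted control mean.
att = y[t == 1].mean() - np.average(y[t == 0], weights=w)
# att should land close to the simulated ATT of 2.0
```

Because the controls are reweighted to match the treated covariate distribution, the confounding through x is removed without fitting a propensity or outcome model of a fixed functional form.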
Your Preference or Mine? A Randomized Field Experiment on Recommender Systems in Two-sided Matching Markets
The literature on recommender systems focuses mainly on product recommendation, where buyers’ preferences are considered. However, for user recommendation in two-sided matching markets, potential matches’ preferences may also play a role in the focal user’s decision-making. Hence, we seek to understand the impact of providing potential candidates’ preferences in such settings. In collaboration with an online dating platform, we design and conduct a randomized field experiment and present users with recommendations based on i) their own preferences, ii) potential matches’ preferences, or iii) mutual preferences. Interestingly, we find that users are sensitive to the provision of potential candidates’ preferences, and they proactively reach out to those “who might prefer them” despite those candidates’ relatively lower desirability. This leads to a greater improvement in matching. The findings provide valuable insights into how to design user recommendation systems beyond the current practice of recommending based on the focal user’s preferences.
Text Mining Patient-Doctor Online Forum Data from the Largest Online Health Community in China
The present study uses data from the largest online health community in China, www.haodf.com, to examine the salient topics that Chinese health consumers discuss with their doctors online. Preliminary research found 146,915 posts by patients and 123,059 posts by doctors on this open online forum from Aug. 2006 to Apr. 2014. In total, 10,685 doctors participated in the forum discussions during this period. The topic-modeling results of the text mining are still pending, but we have already found this dataset to be of promising and unique quality. We also look forward to more inspiring research questions to motivate this research.
Support Neighbor Loss for Person Re-Identification
Person re-identification (re-ID) has recently been tremendously boosted due
to the advancement of deep convolutional neural networks (CNN). The majority of
deep re-ID methods focus on designing new CNN architectures, while less
attention is paid to the loss functions. Verification loss and
identification loss are two types of losses widely used to train various deep
re-ID models, both of which, however, have limitations. Verification loss
guides the networks to generate feature embeddings for which the intra-class
variance is decreased while the inter-class variance is enlarged. However, training networks
with verification loss tends to be of slow convergence and unstable performance
when the number of training samples is large. On the other hand, identification
loss has good separability and scalability properties. However, its failure
to explicitly reduce the intra-class variance limits its performance on re-ID, because the
same person may have significant appearance disparity across different camera
views. To avoid the limitations of the two types of losses, we propose a new
loss, called support neighbor (SN) loss. Rather than being derived from data
sample pairs or triplets, SN loss is calculated based on the positive and
negative support neighbor sets of each anchor sample, which contain more
valuable contextual information and neighborhood structure that are beneficial
for more stable performance. To ensure scalability and separability, a
softmax-like function is formulated to push apart the positive and negative
support sets. To reduce intra-class variance, the distance between the anchor's
nearest positive neighbor and furthest positive sample is penalized.
By integrating SN loss on top of ResNet50, we obtain re-ID results superior to the state of the art on several widely used datasets. Comment: Accepted by ACM Multimedia (ACM MM) 201
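The two ingredients described above — a softmax-like term that pushes apart the positive and negative support sets, plus a penalty on the distance between the anchor's nearest and furthest positives — can be sketched as follows. This is a rough reconstruction from the abstract, not the paper's exact loss; the function name `sn_loss` and the precise form of each term are assumptions.

```python
import numpy as np

def sn_loss(anchor, positives, negatives):
    """Sketch of a support-neighbor-style loss (reconstructed, not exact).

    Separation: a softmax-like term over negated distances pushes the
    positive support set away from the negative one. Compaction: the
    distance between the anchor's nearest and furthest positives is
    penalized, shrinking intra-class variance.
    """
    d_pos = np.linalg.norm(positives - anchor, axis=1)
    d_neg = np.linalg.norm(negatives - anchor, axis=1)

    # Probability mass the softmax assigns to the positive support set.
    pos_mass = np.exp(-d_pos).sum()
    total_mass = pos_mass + np.exp(-d_neg).sum()
    separation = -np.log(pos_mass / total_mass)

    # Compaction term on the positive support set around the anchor.
    nearest = positives[d_pos.argmin()]
    furthest = positives[d_pos.argmax()]
    compaction = np.linalg.norm(nearest - furthest)

    return separation + compaction
```

Tightening the positive set around the anchor lowers both terms, which matches the behavior the abstract describes: stable separation from negatives plus reduced intra-class variance.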
Leveraging Deep-learning and Field Experiment Response Heterogeneity to Enhance Customer Targeting Effectiveness
Firms seek to better understand heterogeneity in customer response to marketing campaigns, which can boost customer targeting effectiveness. Motivated by the success of modern machine learning techniques, this paper presents a framework that leverages deep-learning algorithms and field experiment response heterogeneity to enhance customer targeting effectiveness. We recommend firms run a pilot randomized experiment and use the data to train various deep-learning models. By incorporating recurrent neural nets and deep perceptron nets, our optimal deep-learning model can capture both temporal and network effects in the purchase history, after addressing issues common to most predictive models, such as imbalanced training data, data sparsity, temporality, and scalability. We then apply the learned optimal model to identify, from the large pool of remaining customers, the targets with the highest predicted purchase probabilities. Our application with a large department store, covering a total of 2.8 million customers, shows that optimal deep-learning models can identify higher-value customer targets and lead to better sales performance of marketing campaigns, compared to the industry's common practice of targeting by past purchase frequency or spending amount. We demonstrate that companies may achieve sub-optimal customer targeting not because they offer inferior campaign incentives, but because they rely on worse targeting rules and select low-value customer targets. The results inform managers that beyond gauging the causal impact of marketing interventions, data from field experiments can also be leveraged to identify high-value customer targets. Overall, deep-learning algorithms can be integrated with field experiment response heterogeneity to improve the effectiveness of targeted campaigns.
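The pilot-then-target workflow above can be illustrated with a toy sketch. Plain logistic regression stands in for the paper's deep model (which uses recurrent and perceptron nets), and the features, sample sizes, and coefficients below are simulated assumptions made purely for the example.

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy pilot experiment: features (recency, frequency, spend), label = purchase.
n_pilot = 1000
X = rng.normal(size=(n_pilot, 3))
true_w = np.array([1.5, -0.5, 1.0])
y = rng.binomial(1, 1.0 / (1.0 + np.exp(-(X @ true_w))))

# Stand-in for the deep model: logistic regression fit by gradient descent.
w = np.zeros(3)
for _ in range(500):
    p = 1.0 / (1.0 + np.exp(-(X @ w)))
    w -= 0.1 * (X.T @ (p - y)) / n_pilot

# Score the remaining customer base and target the top decile by
# predicted purchase probability.
X_rest = rng.normal(size=(5000, 3))
scores = 1.0 / (1.0 + np.exp(-(X_rest @ w)))
targets = np.argsort(scores)[::-1][:500]
```

The key design point is the split: the randomized pilot supplies unconfounded training labels, while the learned scoring rule, rather than a frequency or spending heuristic, selects the campaign targets.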
- …