A mixture Cox-Logistic model for feature selection from survival and classification data
This paper presents an original approach for jointly fitting survival times
and classifying samples into subgroups. The Coxlogit model is a generalized
linear model with a common set of selected features for both tasks. Survival
times and class labels are assumed here to depend on a common risk score that
is a function of those features. Learning is then naturally expressed as
maximizing the joint probability of the subgroup labels and the ordering of
survival events, conditioned on a common weight vector. The model is estimated
by minimizing a regularized negative log-likelihood through a coordinate
descent algorithm.
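The abstract does not spell out the exact likelihood, penalty, or update rule, so the following is only a minimal sketch under stated assumptions. It assumes a Breslow-style Cox partial likelihood with no tied times, a logistic term driven by the same score x·w plus an unpenalized intercept b, equal weighting of the two terms, and an L1 penalty (a natural choice for feature selection, but an assumption here). It also adopts the sign convention that a larger score means both a higher hazard and a higher probability of class 1. Each coordinate update is a majorize-minimize soft-thresholding step that uses hand-derived curvature bounds; this keeps the objective non-increasing but is not necessarily the authors' update. Function names such as `fit_coxlogit` are hypothetical.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def scores(X, w):
    """Shared linear risk score eta_i = x_i . w, used by both tasks."""
    return [sum(xj * wj for xj, wj in zip(row, w)) for row in X]


def smooth_loss_and_grads(eta, b, times, events, labels):
    """Joint negative log-likelihood and its gradients w.r.t. eta and b.

    Assumptions (not from the paper): Breslow risk sets with no tied times,
    logistic term on eta_i + b, equal weights for the two terms.
    """
    n = len(eta)
    loss, g, gb = 0.0, [0.0] * n, 0.0
    # Cox partial likelihood: probability of the observed ordering of events.
    for i in range(n):
        if not events[i]:
            continue
        risk = [j for j in range(n) if times[j] >= times[i]]
        m = max(eta[j] for j in risk)
        ws = [math.exp(eta[j] - m) for j in risk]
        s = sum(ws)
        loss -= eta[i] - (m + math.log(s))
        g[i] -= 1.0
        for j, wj in zip(risk, ws):
            g[j] += wj / s
    # Logistic likelihood of the subgroup labels, driven by the same score.
    for i in range(n):
        z = eta[i] + b
        loss += max(z, 0.0) + math.log1p(math.exp(-abs(z))) - labels[i] * z
        r = sigmoid(z) - labels[i]
        g[i] += r
        gb += r
    return loss, g, gb


def objective(X, w, b, times, events, labels, lam):
    """Smooth joint loss plus an (assumed) L1 penalty on w."""
    loss, _, _ = smooth_loss_and_grads(scores(X, w), b, times, events, labels)
    return loss + lam * sum(abs(v) for v in w)


def soft_threshold(z, t):
    return math.copysign(max(abs(z) - t, 0.0), z)


def fit_coxlogit(X, times, events, labels, lam, n_iter=100):
    """Hypothetical coordinate descent with majorize-minimize steps."""
    n, p = len(X), len(X[0])
    w, b = [0.0] * p, 0.0
    n_events = sum(1 for e in events if e)
    # Upper bounds on the curvature along each coordinate:
    # Cox term <= n_events * range(x_k)^2 / 4, logistic term <= sum(x_k^2) / 4.
    bounds = []
    for k in range(p):
        col = [row[k] for row in X]
        spread = max(col) - min(col)
        bounds.append(n_events * spread ** 2 / 4.0 + sum(c * c for c in col) / 4.0)
    for _ in range(n_iter):
        for k in range(p):
            if bounds[k] == 0.0:
                continue  # constant-zero feature: nothing to update
            _, g, _ = smooth_loss_and_grads(scores(X, w), b, times, events, labels)
            gk = sum(g[i] * X[i][k] for i in range(n))
            w[k] = soft_threshold(w[k] - gk / bounds[k], lam / bounds[k])
        # Unpenalized intercept of the logistic term; its curvature is <= n / 4.
        _, _, gb = smooth_loss_and_grads(scores(X, w), b, times, events, labels)
        b -= gb / (n / 4.0)
    return w, b
```

Because a single weight vector enters both the Cox and the logistic terms, the L1 penalty selects one feature set that serves both tasks. A large `lam` drives every weight to exactly zero, while smaller values let the features that help both likelihoods enter the model.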
Validation on synthetic and breast cancer data shows that the proposed
approach outperforms a standard Cox model or logistic regression both in
predicting survival times and in classifying new samples into subgroups. It is
also better at selecting informative features for both tasks.
Modeling Time to Open of Emails with a Latent State for User Engagement Level
Email messages have been an important mode of communication, not only for
work but also for social interactions and marketing. When messages contain
time-sensitive information, it becomes relevant for the sender to know the
expected time within which the email will be read by the recipient. In this
paper we use a survival analysis framework to predict the time to open an email
once it has been received. We use the Cox Proportional Hazards (CoxPH) model,
which offers a way to combine various features that might affect the event of
opening an email. As an extension, we also apply a mixture model (MM) approach
to CoxPH that distinguishes between recipients based on a latent state
describing how prone each individual is to opening messages. We compare our
approach with
standard classification and regression models. While the classification model
provides predictions on the likelihood of an email being opened, the regression
model provides a prediction of the real-valued time to open. Survival analysis
methods allow us to model jointly both whether an email is opened and its time
to open. We experimented on a large real-world dataset of marketing emails
sent over a 3-month period. The mixture model achieves the best accuracy on our
data, where a high proportion of email messages go unopened.
Comment: 9 pages, 5 figures, WSDM'18, February 5-9, 2018, Marina Del Rey, CA,
USA, https://dl.acm.org/citation.cfm?id=315968
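The abstract does not give the exact form of the mixture, so the following is only a hedged illustration of the idea. It assumes a fixed number of latent recipient states, each a proportional hazards model with its own coefficients. It also assumes an exponential (constant) baseline hazard per state, a simplification, since CoxPH proper leaves the baseline unspecified. Unopened emails are treated as right-censored at the end of the observation window. The sketch shows how one likelihood covers both whether and when an email is opened, and how a recipient's posterior latent state feeds into predictions. Parameter estimation (for example by EM) is omitted, and every function name and parameter below is hypothetical.

```python
import math


def email_loglik(t, opened, x, beta, base_rate):
    """Log-likelihood of one email under one latent state.

    Assumption: exponential baseline hazard, so the cumulative hazard is
    H(t) = base_rate * exp(x . beta) * t.
    """
    rate = base_rate * math.exp(sum(xi * bi for xi, bi in zip(x, beta)))
    log_surv = -rate * t
    # Opened at time t: log density = log hazard + log survival.
    # Unopened (right-censored at t): only the log survival contributes.
    return math.log(rate) + log_surv if opened else log_surv


def recipient_loglik(emails, weights, betas, base_rates):
    """Mixture log-likelihood of one recipient and posterior over latent states.

    emails: list of (t, opened, x) tuples; all emails of a recipient share
    one latent state, which is what separates engaged from unengaged users.
    """
    comps = []
    for pi_k, beta_k, h0_k in zip(weights, betas, base_rates):
        ll = math.log(pi_k) + sum(
            email_loglik(t, o, x, beta_k, h0_k) for t, o, x in emails)
        comps.append(ll)
    m = max(comps)  # log-sum-exp for numerical stability
    total = m + math.log(sum(math.exp(c - m) for c in comps))
    posterior = [math.exp(c - total) for c in comps]
    return total, posterior


def prob_open_within(horizon, x, posterior, betas, base_rates):
    """P(new email is opened within `horizon`), averaged over latent states."""
    p = 0.0
    for q, beta, h0 in zip(posterior, betas, base_rates):
        rate = h0 * math.exp(sum(xi * bi for xi, bi in zip(x, beta)))
        p += q * (1.0 - math.exp(-rate * horizon))
    return p
```

Because a recipient's emails share one latent state, a history of unopened messages shifts the posterior toward the less engaged state, which in turn lowers the predicted probability of opening the next email. This is one way a mixture can cope with data in which many messages are never opened.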