"How May I Help You?": Modeling Twitter Customer Service Conversations Using Fine-Grained Dialogue Acts
Given the increasing popularity of customer service dialogue on Twitter,
analysis of conversation data is essential to understand trends in customer and
agent behavior for the purpose of automating customer service interactions. In
this work, we develop a novel taxonomy of fine-grained "dialogue acts"
frequently observed in customer service, showcasing acts that are more suited
to the domain than the more generic existing taxonomies. Using a sequential
SVM-HMM model, we model conversation flow, predicting the dialogue act of a
given turn in real-time. We characterize differences between customer and agent
behavior in Twitter customer service conversations, and investigate the effect
of testing our system on different customer service industries. Finally, we use
a data-driven approach to predict important conversation outcomes: customer
satisfaction, customer frustration, and overall problem resolution. We show
that the type and location of certain dialogue acts in a conversation have a
significant effect on the probability of desirable and undesirable outcomes,
and present actionable rules based on our findings. The patterns and rules we
derive can be used as guidelines for outcome-driven automated customer service
platforms.
Comment: 13 pages, 6 figures, IUI 201
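The abstract above describes sequential labeling of conversation turns with an SVM-HMM. As a rough illustration of the sequence-decoding idea only, the sketch below runs plain Viterbi decoding over additive per-turn and transition scores. It is not the authors' model: the act labels, the function name `viterbi`, and all scores are hypothetical toy values, and full-sequence Viterbi is an offline stand-in for the real-time prediction the abstract mentions.

```python
# Hypothetical dialogue-act labels; the paper's actual taxonomy is not reproduced here.
ACTS = ["greeting", "complaint", "apology"]


def viterbi(emission_scores, transition_scores, acts=ACTS):
    """Return the highest-scoring act sequence for a conversation.

    emission_scores: one dict per turn, mapping act -> score for that turn.
    transition_scores: dict mapping (previous_act, act) -> score.
    Scores are additive (for example log-probabilities or classifier margins).
    """
    # best[t][a] holds the best total score of any path ending in act a at turn t.
    best = [{a: emission_scores[0][a] for a in acts}]
    back = []  # back[t-1][a] is the best predecessor of act a at turn t
    for t in range(1, len(emission_scores)):
        scores, pointers = {}, {}
        for a in acts:
            prev = max(acts, key=lambda p: best[-1][p] + transition_scores[(p, a)])
            scores[a] = best[-1][prev] + transition_scores[(prev, a)] + emission_scores[t][a]
            pointers[a] = prev
        best.append(scores)
        back.append(pointers)

    # Trace the best path backwards from the best final act.
    path = [max(acts, key=lambda a: best[-1][a])]
    for pointers in reversed(back):
        path.append(pointers[path[-1]])
    return list(reversed(path))


# Toy scores, invented for illustration (not taken from the paper).
toy_transitions = {(p, a): 0.0 for p in ACTS for a in ACTS}
toy_transitions[("complaint", "apology")] = 1.0   # an apology often follows a complaint
toy_transitions[("greeting", "greeting")] = -1.0  # repeated greetings are unlikely

toy_emissions = [
    {"greeting": 2.0, "complaint": 0.5, "apology": 0.0},
    {"greeting": 0.0, "complaint": 1.5, "apology": 0.2},
    {"greeting": 0.3, "complaint": 0.4, "apology": 0.5},
]

decoded = viterbi(toy_emissions, toy_transitions)
print(decoded)
```

In the third turn the per-turn scores are close together, so the transition bonus from "complaint" to "apology" helps separate them; this is the kind of context a sequential model can use and a turn-by-turn classifier cannot.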
Catalog of quasars from the Kilo-Degree Survey Data Release 3
We present a catalog of quasars selected from broad-band photometric ugri
data of the Kilo-Degree Survey Data Release 3 (KiDS DR3). The QSOs are
identified by the random forest (RF) supervised machine learning model, trained
on SDSS DR14 spectroscopic data. We first cleaned the input KiDS data from
entries with excessively noisy, missing or otherwise problematic measurements.
Applying a feature importance analysis, we then tuned the algorithm and identified
in the KiDS multiband catalog the 17 most useful features for the
classification, namely magnitudes, colors, magnitude ratios, and the stellarity
index. We used the t-SNE algorithm to map the multi-dimensional photometric
data onto 2D planes and compared the coverage of the training and inference
sets. We limited the inference set to r<22 to avoid extrapolation beyond the
feature space covered by training, as the SDSS spectroscopic sample is
considerably shallower than KiDS. This gives 3.4 million objects in the final
inference sample, from which the random forest identified 190,000 quasar
candidates. Accuracy of 97%, purity of 91%, and completeness of 87%, as derived
from a test set extracted from SDSS and not used in the training, are confirmed
by comparison with external spectroscopic and photometric QSO catalogs
overlapping with the KiDS footprint. The robustness of our results is
strengthened by number counts of the quasar candidates in the r band, as well
as by their mid-infrared colors available from WISE. An analysis of parallaxes
and proper motions of our QSO candidates found also in Gaia DR2 suggests that a
probability cut of p(QSO)>0.8 is optimal for purity, whereas p(QSO)>0.7 is
preferable for better completeness. Our study presents the first comprehensive
quasar selection from deep high-quality KiDS data and will serve as the basis
for versatile studies of the QSO population detected by this survey.
Comment: Data available from the KiDS website at
http://kids.strw.leidenuniv.nl/DR3/quasarcatalog.php and the source code from
https://github.com/snakoneczny/kids-quasar
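The trade-off between the p(QSO)>0.8 and p(QSO)>0.7 cuts mentioned in the abstract can be illustrated with a small sketch. Purity is the fraction of selected objects that are true quasars, and completeness is the fraction of true quasars that are selected. The function name `purity_completeness` and the probabilities and labels below are made-up toy values, not data from the catalog.

```python
def purity_completeness(probs, labels, threshold):
    """Compute purity and completeness for a probability cut.

    probs: classifier probabilities p(QSO), one per object.
    labels: 1 for a true quasar, 0 otherwise.
    threshold: objects with p(QSO) > threshold count as quasar candidates.
    """
    selected = [label for p, label in zip(probs, labels) if p > threshold]
    true_positives = sum(selected)
    total_quasars = sum(labels)
    purity = true_positives / len(selected) if selected else float("nan")
    completeness = true_positives / total_quasars if total_quasars else float("nan")
    return purity, completeness


# Toy values, invented for illustration only.
toy_probs = [0.95, 0.85, 0.75, 0.72, 0.60, 0.30]
toy_labels = [1, 1, 1, 0, 1, 0]

strict = purity_completeness(toy_probs, toy_labels, 0.8)
loose = purity_completeness(toy_probs, toy_labels, 0.7)
print("p > 0.8:", strict)
print("p > 0.7:", loose)
```

On these toy numbers the stricter cut selects fewer objects but a cleaner sample, while the looser cut recovers more of the true quasars at the cost of a contaminant, which mirrors the qualitative trade-off the abstract describes.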
Analyzing Learned Molecular Representations for Property Prediction
Advancements in neural machinery have led to a wide range of algorithmic
solutions for molecular property prediction. Two classes of models in
particular have yielded promising results: neural networks applied to computed
molecular fingerprints or expert-crafted descriptors, and graph convolutional
neural networks that construct a learned molecular representation by operating
on the graph structure of the molecule. However, recent literature has yet to
clearly determine which of these two methods is superior when generalizing to
new chemical space. Furthermore, prior research has rarely examined these new
models in industry research settings in comparison to existing employed models.
In this paper, we benchmark models extensively on 19 public and 16 proprietary
industrial datasets spanning a wide variety of chemical endpoints. In addition,
we introduce a graph convolutional model that consistently matches or
outperforms models using fixed molecular descriptors as well as previous graph
neural architectures on both public and proprietary datasets. Our empirical
findings indicate that while approaches based on these representations have yet
to reach the level of experimental reproducibility, our proposed model
nevertheless offers significant improvements over models currently used in
industrial workflows.
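To make the idea of a representation built "by operating on the graph structure of the molecule" concrete, the sketch below performs one generic neighbour-aggregation step followed by a sum readout. It is a simplified assumption for illustration, not the model introduced in the paper: there are no learned weights or nonlinearities, and the function names, the two-element atom features, and the toy molecule are all hypothetical.

```python
def message_passing_step(features, bonds):
    """One aggregation round: each atom's new vector is its own vector
    plus the sum of its bonded neighbours' vectors."""
    neighbours = {i: [] for i in range(len(features))}
    for i, j in bonds:
        neighbours[i].append(j)
        neighbours[j].append(i)

    updated = []
    for i, own in enumerate(features):
        aggregated = list(own)
        for j in neighbours[i]:
            aggregated = [x + y for x, y in zip(aggregated, features[j])]
        updated.append(aggregated)
    return updated


def readout(features):
    """Sum the atom vectors into a single molecule-level vector."""
    return [sum(column) for column in zip(*features)]


# Toy molecule: the heavy atoms of ethanol, C-C-O.
# Atom features are a hypothetical one-hot encoding [is_carbon, is_oxygen].
atom_features = [[1, 0], [1, 0], [0, 1]]
bonds = [(0, 1), (1, 2)]

hidden = message_passing_step(atom_features, bonds)
molecule_vector = readout(hidden)
print(hidden)
print(molecule_vector)
```

After one step each atom's vector already encodes its immediate chemical environment, which is the contrast with a fixed fingerprint computed before training; a learned model would insert trainable weight matrices and nonlinearities at both the update and readout stages.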