75,600 research outputs found
Better Safe Than Sorry: An Adversarial Approach to Improve Social Bot Detection
The arm race between spambots and spambot-detectors is made of several cycles
(or generations): a new wave of spambots is created (and new spam is spread),
new spambot filters are derived and old spambots mutate (or evolve) to new
species. Recently, with the diffusion of the adversarial learning approach, a
new practice is emerging: to manipulate on purpose target samples in order to
make stronger detection models. Here, we manipulate generations of Twitter
social bots, to obtain - and study - their possible future evolutions, with the
aim of eventually deriving more effective detection techniques. In detail, we
propose and experiment with a novel genetic algorithm for the synthesis of
online accounts. The algorithm allows to create synthetic evolved versions of
current state-of-the-art social bots. Results demonstrate that synthetic bots
really escape current detection techniques. However, they give all the needed
elements to improve such techniques, making possible a proactive approach for
the design of social bot detection systems.Comment: This is the pre-final version of a paper accepted @ 11th ACM
Conference on Web Science, June 30-July 3, 2019, Boston, U
Arguing Machines: Human Supervision of Black Box AI Systems That Make Life-Critical Decisions
We consider the paradigm of a black box AI system that makes life-critical
decisions. We propose an "arguing machines" framework that pairs the primary AI
system with a secondary one that is independently trained to perform the same
task. We show that disagreement between the two systems, without any knowledge
of underlying system design or operation, is sufficient to arbitrarily improve
the accuracy of the overall decision pipeline given human supervision over
disagreements. We demonstrate this system in two applications: (1) an
illustrative example of image classification and (2) on large-scale real-world
semi-autonomous driving data. For the first application, we apply this
framework to image classification achieving a reduction from 8.0% to 2.8% top-5
error on ImageNet. For the second application, we apply this framework to Tesla
Autopilot and demonstrate the ability to predict 90.4% of system disengagements
that were labeled by human annotators as challenging and needing human
supervision
Conversion Prediction Using Multi-task Conditional Attention Networks to Support the Creation of Effective Ad Creative
Accurately predicting conversions in advertisements is generally a
challenging task, because such conversions do not occur frequently. In this
paper, we propose a new framework to support creating high-performing ad
creatives, including the accurate prediction of ad creative text conversions
before delivering to the consumer. The proposed framework includes three key
ideas: multi-task learning, conditional attention, and attention highlighting.
Multi-task learning is an idea for improving the prediction accuracy of
conversion, which predicts clicks and conversions simultaneously, to solve the
difficulty of data imbalance. Furthermore, conditional attention focuses
attention of each ad creative with the consideration of its genre and target
gender, thus improving conversion prediction accuracy. Attention highlighting
visualizes important words and/or phrases based on conditional attention. We
evaluated the proposed framework with actual delivery history data (14,000
creatives displayed more than a certain number of times from Gunosy Inc.), and
confirmed that these ideas improve the prediction performance of conversions,
and visualize noteworthy words according to the creatives' attributes.Comment: 9 pages, 6 figures. Accepted at The 25th ACM SIGKDD Conference on
Knowledge Discovery and Data Mining (KDD 2019) as an applied data science
pape
Using Machine Learning and Natural Language Processing to Review and Classify the Medical Literature on Cancer Susceptibility Genes
PURPOSE: The medical literature relevant to germline genetics is growing
exponentially. Clinicians need tools monitoring and prioritizing the literature
to understand the clinical implications of the pathogenic genetic variants. We
developed and evaluated two machine learning models to classify abstracts as
relevant to the penetrance (risk of cancer for germline mutation carriers) or
prevalence of germline genetic mutations. METHODS: We conducted literature
searches in PubMed and retrieved paper titles and abstracts to create an
annotated dataset for training and evaluating the two machine learning
classification models. Our first model is a support vector machine (SVM) which
learns a linear decision rule based on the bag-of-ngrams representation of each
title and abstract. Our second model is a convolutional neural network (CNN)
which learns a complex nonlinear decision rule based on the raw title and
abstract. We evaluated the performance of the two models on the classification
of papers as relevant to penetrance or prevalence. RESULTS: For penetrance
classification, we annotated 3740 paper titles and abstracts and used 60% for
training the model, 20% for tuning the model, and 20% for evaluating the model.
The SVM model achieves 89.53% accuracy (percentage of papers that were
correctly classified) while the CNN model achieves 88.95 % accuracy. For
prevalence classification, we annotated 3753 paper titles and abstracts. The
SVM model achieves 89.14% accuracy while the CNN model achieves 89.13 %
accuracy. CONCLUSION: Our models achieve high accuracy in classifying abstracts
as relevant to penetrance or prevalence. By facilitating literature review,
this tool could help clinicians and researchers keep abreast of the burgeoning
knowledge of gene-cancer associations and keep the knowledge bases for clinical
decision support tools up to date
- …