Persistence pays off: Paying Attention to What the LSTM Gating Mechanism Persists
Language Models (LMs) are important components in several Natural Language
Processing systems. Recurrent Neural Network LMs composed of LSTM units,
especially those augmented with an external memory, have achieved
state-of-the-art results. However, these models still struggle to process long
sequences, which are more likely to contain long-distance dependencies, because
of information fading and a bias towards more recent information. In this paper,
we demonstrate an effective mechanism for retrieving information in a
memory-augmented LSTM LM based on attending to information in memory in
proportion to the number of timesteps the LSTM gating mechanism persisted the
information.
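As a rough illustration of "attending in proportion to persistence", the minimal sketch below stores one memory vector per timestep and weights each one by how many consecutive later steps kept the forget gate open. It assumes a single scalar forget-gate summary per step and a hypothetical threshold `tau`; the function names are illustrative, and this is not the paper's exact formulation.

```python
from typing import List, Tuple


def persistence_counts(forget_gates: List[float], tau: float = 0.5) -> List[int]:
    """For each timestep t, count the consecutive later steps whose
    (hypothetical, scalar) forget-gate value stays >= tau, i.e. how long
    the content written at t is kept before the gate first closes."""
    n = len(forget_gates)
    counts = []
    for t in range(n):
        c = 0
        for s in range(t + 1, n):
            if forget_gates[s] < tau:
                break  # gate closed: content from step t is treated as lost
            c += 1
        counts.append(c)
    return counts


def persistence_attention(memory: List[List[float]], forget_gates: List[float],
                          tau: float = 0.5) -> Tuple[List[float], List[float]]:
    """Attend over memory slots with weights proportional to persistence counts."""
    counts = persistence_counts(forget_gates, tau)
    total = sum(counts)
    if total == 0:
        # No slot persisted at all: fall back to uniform attention.
        weights = [1.0 / len(memory)] * len(memory)
    else:
        weights = [c / total for c in counts]
    dim = len(memory[0])
    # Context vector = weighted sum of the stored memory vectors.
    context = [sum(w * m[d] for w, m in zip(weights, memory)) for d in range(dim)]
    return weights, context


if __name__ == "__main__":
    gates = [0.9, 0.8, 0.2, 0.7, 0.9]
    mem = [[1.0, 0.0], [0.0, 1.0], [2.0, 2.0], [4.0, 0.0], [0.0, 0.0]]
    print(persistence_counts(gates))          # [1, 0, 2, 1, 0]
    print(persistence_attention(mem, gates))  # ([0.25, 0.0, 0.5, 0.25, 0.0], [2.25, 1.0])
```

In this toy run, the step written just before the gate closes at index 2 receives zero weight, while content that survives the longest (index 2) dominates the context vector.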
Automatic Detection of Vague Words and Sentences in Privacy Policies
Website privacy policies represent the single most important source of
information for users to gauge how their personal data are collected, used and
shared by companies. However, privacy policies are often vague and people
struggle to understand the content. Their opaqueness poses a significant
challenge to both users and policy regulators. In this paper, we seek to
identify vague content in privacy policies. We construct the first corpus of
human-annotated vague words and sentences and present empirical studies on
automatic vagueness detection. In particular, we investigate context-aware and
context-agnostic models for predicting vague words, and explore
auxiliary-classifier generative adversarial networks for characterizing
sentence vagueness. Our experimental results demonstrate the effectiveness of the
proposed approaches. Finally, we provide suggestions for resolving vagueness
and improving the usability of privacy policies. Comment: 10 pages
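To make the "context-agnostic" setting concrete: such a predictor decides whether a word is vague from the word alone, ignoring its neighbors. The minimal sketch below is a lexicon-lookup baseline under that assumption. The lexicon, the sentence score (fraction of flagged tokens), and the function names are hypothetical; the paper's learned word models and its auxiliary-classifier GAN for sentences are not reproduced here.

```python
import re
from typing import List

# Hypothetical seed lexicon for illustration only; it is NOT the annotated
# corpus or vocabulary described in the paper.
VAGUE_TERMS = {
    "may", "might", "some", "certain", "generally", "typically",
    "usually", "reasonable", "appropriate", "necessary", "various",
}


def tokenize(sentence: str) -> List[str]:
    """Lowercase and split into alphabetic tokens (apostrophes kept)."""
    return re.findall(r"[a-z']+", sentence.lower())


def flag_vague_words(sentence: str) -> List[str]:
    """Context-agnostic: a token is vague iff it appears in the lexicon,
    regardless of the words around it."""
    return [tok for tok in tokenize(sentence) if tok in VAGUE_TERMS]


def vagueness_score(sentence: str) -> float:
    """Fraction of tokens flagged as vague (0.0 for an empty sentence)."""
    tokens = tokenize(sentence)
    if not tokens:
        return 0.0
    return len(flag_vague_words(sentence)) / len(tokens)


if __name__ == "__main__":
    s = "We may share certain information with third parties."
    print(flag_vague_words(s))  # ['may', 'certain']
    print(vagueness_score(s))   # 0.25
```

Because the lookup ignores context, it also flags the month in "In May we met."; distinguishing such cases is what a context-aware model is for.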
Sequence Classification Based on Short Motifs
Sequence classification problems often arise in areas such as bioinformatics and natural language processing. In the last few years, the best results in this field have been achieved by deep learning methods, especially architectures based on recurrent neural networks (RNNs). However, a common problem with such models is their lack of interpretability, i.e., the difficulty of extracting the key features of the data that most affect the model's decision. Meanwhile, using less complex neural networks decreases predictive performance, which limits the use of state-of-the-art machine learning methods in many subject areas. In this work, we propose a novel interpretable deep learning architecture based on extracting principal sets of short substrings, called sequence motifs. The presence of an extracted motif in the input sequence is a marker for a certain class. The key component of the proposed solution is a differentiable alignment algorithm that we developed, which provides a smooth analog of classical string comparison methods such as the Levenshtein edit distance and Smith-Waterman local alignment. Unlike previous work on motif-based classification, which used CNNs for shift-invariant search, our model provides shift- and gap-invariant extraction of motifs.
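The abstract does not spell out the alignment algorithm. One standard way to obtain a smooth analog of edit distance, shown here as a hedged sketch rather than the authors' method, is to replace the hard `min` in the Levenshtein recurrence with a soft-min (log-sum-exp) at temperature `gamma`, the same relaxation used in soft-DTW. The names `softmin`, `soft_levenshtein`, and `unit_cost` are illustrative assumptions.

```python
import math
from typing import Callable, Sequence


def softmin(values: Sequence[float], gamma: float) -> float:
    """Smooth minimum: -gamma * log(sum(exp(-v / gamma))).
    Computed stably by factoring out the hard minimum."""
    m = min(values)
    return m - gamma * math.log(sum(math.exp(-(v - m) / gamma) for v in values))


def unit_cost(x: object, y: object) -> float:
    """Classical 0/1 substitution cost."""
    return 0.0 if x == y else 1.0


def soft_levenshtein(a: Sequence, b: Sequence, gamma: float = 1.0,
                     sub_cost: Callable[[object, object], float] = unit_cost) -> float:
    """Levenshtein dynamic program with min replaced by softmin.
    As gamma -> 0 it approaches the ordinary edit distance; for gamma > 0
    every cell is a smooth function of the costs, so gradients can flow
    when sub_cost is itself differentiable (e.g. 1 - similarity of embeddings)."""
    n, m = len(a), len(b)
    D = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        D[i][0] = float(i)  # i deletions
    for j in range(1, m + 1):
        D[0][j] = float(j)  # j insertions
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            D[i][j] = softmin([
                D[i - 1][j] + 1.0,                               # deletion
                D[i][j - 1] + 1.0,                               # insertion
                D[i - 1][j - 1] + sub_cost(a[i - 1], b[j - 1]),  # match / substitution
            ], gamma)
    return D[n][m]
```

For example, `soft_levenshtein("kitten", "sitting", gamma=0.01)` lands just below the hard distance of 3, while a larger `gamma` gives a smaller, smoother score. A Smith-Waterman-style local variant would instead use a (soft) maximum over similarity scores with an extra zero option and take the (soft) maximum over all cells; that variant is not shown.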
ATMSeer: Increasing Transparency and Controllability in Automated Machine Learning
To relieve the pain of manually selecting machine learning algorithms and
tuning hyperparameters, automated machine learning (AutoML) methods have been
developed to automatically search for good models. Due to the huge model search
space, it is impossible to try all models. Users tend to distrust automatic
results and increase the search budget as much as they can, thereby undermining
the efficiency of AutoML. To address these issues, we design and implement
ATMSeer, an interactive visualization tool that supports users in refining the
search space of AutoML and analyzing the results. To guide the design of
ATMSeer, we derive a workflow of using AutoML based on interviews with machine
learning experts. A multi-granularity visualization is proposed to enable users
to monitor the AutoML process, analyze the searched models, and refine the
search space in real time. We demonstrate the utility and usability of ATMSeer
through two case studies, expert interviews, and a user study with 13 end
users. Comment: Published in the ACM Conference on Human Factors in Computing Systems
(CHI), 2019, Glasgow, Scotland UK