93 research outputs found

    Persistence pays off: Paying Attention to What the LSTM Gating Mechanism Persists

    Language Models (LMs) are important components in several Natural Language Processing systems. Recurrent Neural Network LMs composed of LSTM units, especially those augmented with an external memory, have achieved state-of-the-art results. However, these models still struggle to process long sequences, which are more likely to contain long-distance dependencies, because of information fading and a bias towards more recent information. In this paper we demonstrate an effective mechanism for retrieving information in a memory-augmented LSTM LM based on attending to information in memory in proportion to the number of timesteps the LSTM gating mechanism persisted the information.

    Automatic Detection of Vague Words and Sentences in Privacy Policies

    Website privacy policies represent the single most important source of information for users to gauge how their personal data are collected, used and shared by companies. However, privacy policies are often vague and people struggle to understand the content. Their opaqueness poses a significant challenge to both users and policy regulators. In this paper, we seek to identify vague content in privacy policies. We construct the first corpus of human-annotated vague words and sentences and present empirical studies on automatic vagueness detection. In particular, we investigate context-aware and context-agnostic models for predicting vague words, and explore auxiliary-classifier generative adversarial networks for characterizing sentence vagueness. Our experimental results demonstrate the effectiveness of the proposed approaches. Finally, we provide suggestions for resolving vagueness and improving the usability of privacy policies. Comment: 10 page

    Sequence Classification Based on Short Motifs

    Sequence classification problems often arise in areas such as bioinformatics and natural language processing. In the last few years, the best results in this field have been achieved by deep learning methods, especially architectures based on recurrent neural networks (RNNs). However, a common problem of such models is a lack of interpretability: it is very hard to establish which features of the input sequence most affect the model's decision. Meanwhile, using less complex neural networks decreases predictive performance, which limits the use of state-of-the-art machine learning methods in many subject areas. In this work we propose a novel interpretable deep learning architecture based on extracting principal sets of short substrings, called sequence motifs. The presence of an extracted motif in the input sequence is a marker for a certain class. The key component of the proposed solution is a differentiable alignment algorithm developed by us, which provides a smooth analog of classical string comparison methods such as Levenshtein edit distance and Smith–Waterman local alignment. Unlike previous work on motif-based classification, which used CNNs for shift-invariant searching, our model provides shift- and gap-invariant extraction of motifs.

    ATMSeer: Increasing Transparency and Controllability in Automated Machine Learning

    To relieve the pain of manually selecting machine learning algorithms and tuning hyperparameters, automated machine learning (AutoML) methods have been developed to automatically search for good models. Due to the huge model search space, it is impossible to try all models. Users tend to distrust automatic results and increase the search budget as much as they can, thereby undermining the efficiency of AutoML. To address these issues, we design and implement ATMSeer, an interactive visualization tool that supports users in refining the search space of AutoML and analyzing the results. To guide the design of ATMSeer, we derive a workflow of using AutoML based on interviews with machine learning experts. A multi-granularity visualization is proposed to enable users to monitor the AutoML process, analyze the searched models, and refine the search space in real time. We demonstrate the utility and usability of ATMSeer through two case studies, expert interviews, and a user study with 13 end users. Comment: Published in the ACM Conference on Human Factors in Computing Systems (CHI), 2019, Glasgow, Scotland U