
    Hyperparameter tuning for deep learning in natural language processing

    Deep neural networks have advanced rapidly over the past several years, yet using them effectively still seems like a black art to many practitioners. The difficulty is that obtaining consistent, strong results from a deep architecture requires optimizing many parameters, known as hyperparameters. Hyperparameter tuning is an essential task in deep learning and can change network performance significantly. This paper distills over 3000 GPU hours spent optimizing a network for a text classification task across a wide array of hyperparameters. We provide a list of hyperparameters to tune, together with the impact each has on network performance. We hope that such a listing gives interested researchers a means to prioritize their efforts and to modify their deep architectures for the best performance with the least effort.
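    The paper itself reports tuning results rather than code, but the shape of such a study is easy to sketch. Below is a minimal random-search loop in Python; the hyperparameter names and value ranges are illustrative assumptions, not the paper's actual search space, and `train_and_evaluate` stands in for any user-supplied training routine.

```python
import random

# Illustrative search space; the hyperparameters and ranges actually
# studied in the paper may differ.
SEARCH_SPACE = {
    "learning_rate": [1e-4, 3e-4, 1e-3, 3e-3],
    "dropout": [0.1, 0.3, 0.5],
    "embedding_dim": [100, 200, 300],
    "num_filters": [64, 128, 256],
    "batch_size": [32, 64, 128],
}

def sample_config(space):
    """Draw one random configuration from the search space."""
    return {name: random.choice(values) for name, values in space.items()}

def random_search(train_and_evaluate, space, n_trials=50, seed=0):
    """Try n_trials random configurations and keep the best one.

    train_and_evaluate is a user-supplied function mapping a config
    dict to a validation score (higher is better).
    """
    random.seed(seed)
    best_score, best_config = float("-inf"), None
    for _ in range(n_trials):
        config = sample_config(space)
        score = train_and_evaluate(config)
        if score > best_score:
            best_score, best_config = score, config
    return best_config, best_score
```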

    Towards Integration of Statistical Hypothesis Tests into Deep Neural Networks

    We report ongoing work on a new deep architecture that works in tandem with a statistical test procedure, jointly training on texts and their label descriptions for multi-label and multi-class classification tasks. A statistical hypothesis testing method is used to extract the most informative words for each given class; these words serve as a class description for more label-aware text classification. The intuition is to help the model concentrate on the more informative words rather than the more frequent ones. The model leverages label descriptions in addition to the input text to enhance classification performance. Our method is entirely data-driven, depends on no sources of information other than the training data, and adapts to different classification problems given appropriate training data, without major hyperparameter tuning. We trained and tested our system on several publicly available datasets, improving the state of the art on one by a wide margin and obtaining competitive results on all others. Comment: Accepted to ACL 201
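    The abstract does not name the specific hypothesis test. A chi-squared test is one common choice for this kind of class-conditional word selection, so here is a minimal sketch with scikit-learn under that assumption, using one-vs-rest bag-of-words statistics; the function name and top-k scheme are hypothetical.

```python
import numpy as np
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.feature_selection import chi2

def informative_words_per_class(texts, labels, top_k=10):
    """For each class, rank words by a chi-squared test of independence
    between word occurrence and class membership (one-vs-rest)."""
    vectorizer = CountVectorizer()
    X = vectorizer.fit_transform(texts)
    vocab = np.array(vectorizer.get_feature_names_out())
    descriptions = {}
    for cls in sorted(set(labels)):
        # Binary target: "belongs to this class" vs. "does not".
        y = np.array([1 if label == cls else 0 for label in labels])
        scores, _ = chi2(X, y)
        top = np.argsort(scores)[::-1][:top_k]
        descriptions[cls] = vocab[top].tolist()
    return descriptions
```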

    Transfer learning and sentence level features for named entity recognition on tweets

    We present our system for the WNUT 2017 Named Entity Recognition challenge on Twitter data. We describe two modifications of a basic neural network architecture for sequence tagging. First, we show how we exploit additional labeled data in which the Named Entity tags differ from those of the target task. Then, we propose a way to incorporate sentence-level features. Our system uses both methods and ranked second for entity-level annotations, with an F1-score of 40.78, and second for surface-form annotations, with an F1-score of 39.33.
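    The abstract does not spell out how the sentence-level features enter the network. One straightforward way, shown here as an assumption rather than the authors' exact design, is to concatenate a per-sentence feature vector onto every token embedding before the tagging layer.

```python
import torch
import torch.nn as nn

class TaggerWithSentenceFeatures(nn.Module):
    """Sequence tagger that appends a sentence-level feature vector
    to each token embedding (illustrative architecture only)."""

    def __init__(self, vocab_size, embed_dim, sent_feat_dim,
                 hidden_dim, num_tags):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.lstm = nn.LSTM(embed_dim + sent_feat_dim, hidden_dim,
                            batch_first=True, bidirectional=True)
        self.out = nn.Linear(2 * hidden_dim, num_tags)

    def forward(self, token_ids, sent_feats):
        # token_ids: (batch, seq_len); sent_feats: (batch, sent_feat_dim)
        tokens = self.embed(token_ids)
        seq_len = tokens.size(1)
        # Broadcast the sentence vector across all time steps.
        sent = sent_feats.unsqueeze(1).expand(-1, seq_len, -1)
        hidden, _ = self.lstm(torch.cat([tokens, sent], dim=-1))
        return self.out(hidden)  # per-token tag scores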

    Sentiment analysis using convolutional neural networks with multi-task training and distant supervision on italian tweets

    In this paper, we propose a classifier for predicting the sentiment of Italian Twitter messages. This work builds upon a deep learning approach in which we leverage large amounts of weakly labelled data to train a 2-layer convolutional neural network. To train our network we apply a form of multi-task training. Our system participated in the EvalItalia-2016 competition and outperformed all other approaches on the sentiment analysis task.
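    For orientation, here is a minimal 2-layer convolutional text classifier in PyTorch. The layer sizes, kernel width, and pooling choice are illustrative assumptions, not the submitted system's exact configuration; in a multi-task setup the convolutional layers would typically be shared across tasks with task-specific output heads.

```python
import torch
import torch.nn as nn

class TwoLayerTextCNN(nn.Module):
    """Two stacked 1-D convolutions over word embeddings, followed by
    global max-pooling and a linear sentiment classifier."""

    def __init__(self, vocab_size, embed_dim=300, num_filters=200,
                 kernel_size=3, num_classes=3):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.conv1 = nn.Conv1d(embed_dim, num_filters, kernel_size, padding=1)
        self.conv2 = nn.Conv1d(num_filters, num_filters, kernel_size, padding=1)
        # In multi-task training, one such head per task would share
        # the embedding and convolution layers above.
        self.classifier = nn.Linear(num_filters, num_classes)

    def forward(self, token_ids):
        # token_ids: (batch, seq_len)
        x = self.embed(token_ids).transpose(1, 2)  # (batch, embed_dim, seq_len)
        x = torch.relu(self.conv1(x))
        x = torch.relu(self.conv2(x))
        x = x.max(dim=2).values                    # global max-pooling
        return self.classifier(x)
```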

    Syntactic manipulation for generating more diverse and interesting texts

    Natural Language Generation plays an important role in dialogue systems, as it determines how users perceive the system. Recently, deep-learning-based systems have been proposed to tackle this task, as they generalize better and require less manual effort to implement for new domains. However, deep learning systems usually adopt a very homogeneous-sounding writing style that expresses little variation. In this work, we present our system for Natural Language Generation in which we control various aspects of the surface realization in order to increase the lexical variability of the utterances, so that they sound more diverse and interesting. For this, we use a Semantically Controlled Long Short-term Memory network (SC-LSTM) and apply its specialized cell to control various syntactic features of the generated texts. We present an in-depth human evaluation showing the effects of these surface manipulations on the perception of potential users.
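    The specialized cell referred to here is the SC-LSTM cell of Wen et al. (2015), which augments a standard LSTM with a semantic control vector consumed through a reading gate. Below is a simplified sketch of that cell in PyTorch; the gate wiring is reduced to its essentials, so treat it as an illustration of the published cell rather than the authors' exact implementation.

```python
import torch
import torch.nn as nn

class SCLSTMCell(nn.Module):
    """Simplified SC-LSTM cell (after Wen et al., 2015): a standard
    LSTM cell plus a reading gate r_t that gradually consumes a
    semantic control vector d_t during generation."""

    def __init__(self, input_dim, hidden_dim, da_dim):
        super().__init__()
        # i, f, o, g gates computed jointly from input and hidden state.
        self.gates = nn.Linear(input_dim + hidden_dim, 4 * hidden_dim)
        self.read_gate = nn.Linear(input_dim + hidden_dim, da_dim)
        self.da_to_cell = nn.Linear(da_dim, hidden_dim, bias=False)

    def forward(self, x, state):
        h, c, d = state
        z = torch.cat([x, h], dim=-1)
        i, f, o, g = self.gates(z).chunk(4, dim=-1)
        i, f, o, g = i.sigmoid(), f.sigmoid(), o.sigmoid(), g.tanh()
        r = self.read_gate(z).sigmoid()   # reading gate
        d = r * d                         # consume the control vector
        c = f * c + i * g + self.da_to_cell(d).tanh()
        h = o * c.tanh()
        return h, (h, c, d)
```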

    End-to-end trainable system for enhancing diversity in natural language generation

    Natural Language Generation plays an important role in dialogue systems, as it determines how users perceive the system. Recently, deep-learning-based systems have been proposed to tackle this task, as they generalize better and do not require large amounts of manual effort to implement for new domains. However, deep learning systems usually produce monotonous-sounding texts. In this work, we present our system for Natural Language Generation in which we control the first word of the surface realization. We show that with this simple control mechanism it is possible to increase the lexical variability and the complexity of the generated texts. For this, we apply a character-based version of the Semantically Controlled Long Short-term Memory network (SC-LSTM) and use its specialized cell to control the first word generated by the system. To ensure that this surface manipulation does not produce semantically incoherent texts, we apply a semantic control component, which we also use for reranking. We show that our model is capable of generating more sophisticated texts while making fewer semantic errors during generation.
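    The first-word control itself is simple to illustrate: fix the first generated token, then decode the rest autoregressively. A minimal sketch follows, assuming a generic `step(token_id, state) -> (logits, state)` decoder function; the function and token-id names are hypothetical, and greedy decoding stands in for whatever sampling or reranking the full system uses.

```python
def generate_with_forced_first_word(step, state, bos_id, first_word_id,
                                    eos_id, max_len=100):
    """Decode a sequence whose first generated token is fixed.

    `step(token_id, state) -> (logits, state)` can be any autoregressive
    decoder step, e.g. a character- or word-level SC-LSTM.
    """
    tokens = [first_word_id]
    _, state = step(bos_id, state)              # feed <bos>
    logits, state = step(first_word_id, state)  # force the first word
    while len(tokens) < max_len:
        next_id = int(logits.argmax(dim=-1))    # greedy; could sample instead
        tokens.append(next_id)
        if next_id == eos_id:
            break
        logits, state = step(next_id, state)
    return tokens
```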

    Toward automatic data curation for open data

    In recent years, large amounts of data have been made publicly available: literally thousands of open data sources exist, with genome data, temperature measurements, stock market prices, population and income statistics, and more. However, accessing and combining data from different data sources is both non-trivial and very time-consuming; these tasks typically take up to 80% of a data scientist's time. Automatic integration and curation of open data can facilitate this process.

    Swiss-chocolate: sentiment detection using sparse SVMs and part-of-speech n-grams

    We describe a classifier to predict the message-level sentiment of English microblog messages from Twitter. This paper describes the classifier submitted to the SemEval-2014 competition (Task 9B). Our approach was to build on last year's winning system by NRC Canada (Mohammad et al., 2013), with some modifications, additional features, and additional sentiment lexicons. Furthermore, we used a sparse (l1-regularized) SVM instead of the more commonly used l2-regularization, resulting in a very sparse linear classifier.
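    The sparse-SVM part maps directly onto scikit-learn: an l1 penalty on a linear SVM zeroes out most feature weights. Here is a minimal sketch with word n-gram features only; the submitted system's part-of-speech n-grams and sentiment-lexicon features, which are not reproduced here, would enter analogously as extra feature columns.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import Pipeline
from sklearn.svm import LinearSVC

model = Pipeline([
    # Word n-gram features; POS n-grams would come from a second
    # vectorizer over the tag sequence in the real system.
    ("ngrams", TfidfVectorizer(ngram_range=(1, 3))),
    # penalty="l1" yields a sparse weight vector; scikit-learn
    # requires dual=False when using the l1 penalty.
    ("svm", LinearSVC(penalty="l1", dual=False, C=1.0)),
])

# Usage (hypothetical data):
# model.fit(train_texts, train_labels)
# predicted = model.predict(test_texts)
```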

    JOINT_FORCES: unite competing sentiment classifiers with random forest

    In this paper, we describe how we created a meta-classifier to detect the message-level sentiment of tweets. We participated in SemEval-2014 Task 9B by combining the results of several existing classifiers using a random forest. The outputs of 5 other teams from the competition, as well as of 7 general-purpose commercial classifiers, were used to train the algorithm. This way, we were able to gain up to 3.24 F1-score points.
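    The meta-classification step is straightforward to sketch: the per-tweet outputs of the base classifiers become the feature matrix for a random forest. A minimal illustration follows, assuming the base-classifier outputs are sentiment classes encoded as integers; the function name and encoding are hypothetical.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

def train_meta_classifier(base_predictions, gold_labels, n_estimators=100):
    """Train a random forest on the stacked outputs of base classifiers.

    base_predictions: list of per-classifier prediction arrays, each of
    shape (n_tweets,), with sentiment classes encoded as integers.
    """
    X = np.column_stack(base_predictions)  # (n_tweets, n_classifiers)
    forest = RandomForestClassifier(n_estimators=n_estimators, random_state=0)
    forest.fit(X, gold_labels)
    return forest

# At test time, stack the base classifiers' outputs the same way:
# X_test = np.column_stack(test_predictions)
# final_labels = forest.predict(X_test)
```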

    How to throw chocolate at students: a survey of extrinsic means for increased audience attention

    This paper presents an overview of established and innovative means and teaching approaches that contribute to higher student attention during lectures. The results are based on an international survey among lecturers from eleven universities, initiated by three lecturers from different countries who met at EDUCON 2016. The objective was to collect teaching experiences with playful means that motivate students to be attentive during a lecture. The proposed teaching approaches fall into three categories: established teaching methods, unconventional extrinsic methods, and tools. We focus on the extrinsic methods and discuss 14 illustrative examples of these approaches.