Hyperparameter tuning for deep learning in natural language processing
Deep neural networks have advanced rapidly over the past several years, yet using them effectively still seems like a black art to many. The reason for this difficulty is that obtaining consistent, outstanding results from a deep architecture requires optimizing many parameters known as hyperparameters. Hyperparameter tuning is an essential task in deep learning and can change network performance significantly. This paper distills over 3,000 GPU hours spent optimizing a network for a text classification task across a wide array of hyperparameters. We provide a list of hyperparameters to tune together with the impact each has on network performance. The hope is that such a listing gives interested researchers a means to prioritize their efforts and to modify their deep architectures for the best performance with the least effort.
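As a concrete illustration of the tuning loop such a study automates, here is a minimal random-search sketch. The tiny inline corpus, the linear classifier, and the searched ranges are illustrative assumptions, not the paper's actual 3,000-GPU-hour deep-learning setup.

```python
# A minimal sketch of random hyperparameter search for a text classifier.
# The corpus and searched ranges are toy assumptions for illustration only.
import random

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import SGDClassifier
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline

texts = ["good movie", "bad movie", "great film", "terrible film",
         "loved it", "hated it", "wonderful acting", "awful plot"]
labels = [1, 0, 1, 0, 1, 0, 1, 0]

random.seed(0)
best_score, best_config = -1.0, None
for _ in range(20):
    # Sample one hyperparameter configuration per trial.
    config = {
        "alpha": 10 ** random.uniform(-6, -2),   # regularization strength
        "ngram_max": random.choice([1, 2, 3]),   # word n-gram order
    }
    model = make_pipeline(
        TfidfVectorizer(ngram_range=(1, config["ngram_max"])),
        SGDClassifier(alpha=config["alpha"], random_state=0),
    )
    score = cross_val_score(model, texts, labels, cv=2).mean()
    if score > best_score:
        best_score, best_config = score, config

print(best_config, best_score)
```

The same loop structure carries over to deep networks; only the sampled parameters (learning rate, dropout, layer sizes, and so on) and the training call change.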
Towards Integration of Statistical Hypothesis Tests into Deep Neural Networks
We report ongoing work on a new deep architecture that works in tandem with a statistical test procedure, jointly training on texts and their label descriptions for multi-label and multi-class classification tasks. A statistical hypothesis testing method is used to extract the most informative words for each given class. These words serve as a class description for more label-aware text classification. The intuition is to help the model concentrate on the more informative words rather than the more frequent ones. The model leverages label descriptions in addition to the input text to enhance classification performance. Our method is entirely data-driven, has no dependency on sources of information other than the training data, and is adaptable to different classification problems by providing appropriate training data, without major hyperparameter tuning. We trained and tested our system on several publicly available datasets, improving the state of the art on one by a wide margin and obtaining competitive results on all others.
Comment: Accepted to ACL 201
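To make the word-extraction step concrete, here is a hedged sketch using the chi-squared test, one common hypothesis test for ranking words by class association. The abstract does not name the paper's exact test, so the chi-squared statistic and the toy corpus below are assumptions.

```python
# A hedged sketch of extracting class-descriptive words via a hypothesis test.
# Chi-squared is an illustrative stand-in for the paper's unnamed test.
import numpy as np
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.feature_selection import chi2

texts = ["stocks fell sharply", "market rally continues",
         "team wins the final", "coach praises players"]
labels = np.array([0, 0, 1, 1])  # 0 = finance, 1 = sports (toy labels)

vectorizer = CountVectorizer()
X = vectorizer.fit_transform(texts)
vocab = np.array(vectorizer.get_feature_names_out())

for cls in (0, 1):
    # One-vs-rest chi-squared score of every word for this class.
    scores, _ = chi2(X, labels == cls)
    top = vocab[np.argsort(scores)[::-1][:3]]
    print(f"class {cls} description: {list(top)}")
```

The top-scoring words per class would then be fed back to the model as that class's description.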
Transfer learning and sentence level features for named entity recognition on tweets
We present our system for the WNUT 2017 Named Entity Recognition challenge on Twitter data. We describe two modifications of a basic neural network architecture for sequence tagging. First, we show how we exploit additional labeled data in which the named entity tags differ from those of the target task. Second, we propose a way to incorporate sentence-level features. Our system uses both methods and ranked second for entity-level annotations, achieving an F1 score of 40.78, and second for surface-form annotations, achieving an F1 score of 39.33.
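The sentence-level feature idea can be sketched as concatenating one per-sentence vector to every token state before tagging. The layer sizes and the mean-pooled "sentence feature" below are illustrative assumptions, not the system's exact architecture.

```python
# A minimal sketch of injecting sentence-level features into a neural tagger.
# All dimensions and the pooled sentence feature are illustrative assumptions.
import torch
import torch.nn as nn

class TaggerWithSentenceFeatures(nn.Module):
    def __init__(self, vocab_size=1000, emb_dim=50, hidden=64, n_tags=9):
        super().__init__()
        self.emb = nn.Embedding(vocab_size, emb_dim)
        self.lstm = nn.LSTM(emb_dim, hidden, batch_first=True, bidirectional=True)
        # Token states (2*hidden) plus the sentence feature vector (emb_dim).
        self.out = nn.Linear(2 * hidden + emb_dim, n_tags)

    def forward(self, token_ids):                    # (batch, seq_len)
        x = self.emb(token_ids)                      # (batch, seq, emb)
        states, _ = self.lstm(x)                     # (batch, seq, 2*hidden)
        sent = x.mean(dim=1, keepdim=True)           # crude sentence feature
        sent = sent.expand(-1, x.size(1), -1)        # copy to every position
        return self.out(torch.cat([states, sent], dim=-1))  # per-token scores

tagger = TaggerWithSentenceFeatures()
print(tagger(torch.randint(0, 1000, (2, 7))).shape)  # torch.Size([2, 7, 9])
```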
Sentiment analysis using convolutional neural networks with multi-task training and distant supervision on italian tweets
In this paper, we propose a classifier for predicting the sentiment of Italian Twitter messages. This work builds on a deep learning approach in which we leverage large amounts of weakly labelled data to train a two-layer convolutional neural network. To train the network, we apply a form of multi-task training. Our system participated in the EvalItalia-2016 competition and outperformed all other approaches on the sentiment analysis task.
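The shared-encoder idea behind such multi-task training can be sketched as one convolutional encoder feeding two output heads, one for the weak (distantly supervised) labels and one for the target sentiment task. All dimensions and the head layout below are illustrative assumptions, not the paper's exact network.

```python
# A hedged sketch of a two-layer CNN sentence classifier with two task heads
# sharing one encoder. Dimensions and head layout are illustrative assumptions.
import torch
import torch.nn as nn

class TwoLayerCNN(nn.Module):
    def __init__(self, vocab_size=5000, emb_dim=52, n_classes=3):
        super().__init__()
        self.emb = nn.Embedding(vocab_size, emb_dim)
        self.encoder = nn.Sequential(
            nn.Conv1d(emb_dim, 100, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool1d(2),
            nn.Conv1d(100, 100, kernel_size=3, padding=1), nn.ReLU(),
            nn.AdaptiveMaxPool1d(1),
        )
        self.head_distant = nn.Linear(100, 2)            # weak labels
        self.head_sentiment = nn.Linear(100, n_classes)  # target task

    def forward(self, token_ids, task="sentiment"):
        x = self.emb(token_ids).transpose(1, 2)   # (batch, emb, seq)
        h = self.encoder(x).squeeze(-1)           # (batch, 100)
        return self.head_distant(h) if task == "distant" else self.head_sentiment(h)

model = TwoLayerCNN()
print(model(torch.randint(0, 5000, (4, 20))).shape)  # torch.Size([4, 3])
```

Training would first fit the weak-label head on the distantly supervised data, then the sentiment head on the smaller hand-labelled set, with the encoder shared throughout.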
Syntactic manipulation for generating more diverse and interesting texts
Natural Language Generation plays an important role in the domain of dialogue systems, as it determines how users perceive the system. Recently, deep learning based systems have been proposed to tackle this task, since they generalize better and require less manual effort to implement for new domains. However, deep learning systems usually adopt a very homogeneous-sounding writing style that expresses little variation. In this work, we present our system for Natural Language Generation, in which we control various aspects of the surface realization in order to increase the lexical variability of the utterances, so that they sound more diverse and interesting. For this, we use a Semantically Controlled Long Short-term Memory Network (SC-LSTM) and apply its specialized cell to control various syntactic features of the generated texts. We present an in-depth human evaluation showing the effects of these surface manipulations on the perception of potential users.
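For readers unfamiliar with the SC-LSTM, the core idea (following Wen et al., 2015) is a standard LSTM cell extended with a control vector that a reading gate consumes word by word, so the control signal fades as its content is realized. The sketch below is a hedged, minimal rendering of that cell; all dimensions are illustrative assumptions, and this is not the paper's exact implementation.

```python
# A hedged sketch of the SC-LSTM cell idea: an LSTM cell plus a control
# vector d that a reading gate consumes step by step. Sizes are assumptions.
import torch
import torch.nn as nn

class SCLSTMCell(nn.Module):
    def __init__(self, input_dim, hidden_dim, ctrl_dim):
        super().__init__()
        self.gates = nn.Linear(input_dim + hidden_dim, 4 * hidden_dim)
        self.read = nn.Linear(input_dim + hidden_dim, ctrl_dim)   # reading gate
        self.ctrl_to_cell = nn.Linear(ctrl_dim, hidden_dim, bias=False)

    def forward(self, x, h, c, d):
        z = torch.cat([x, h], dim=-1)
        i, f, o, g = self.gates(z).chunk(4, dim=-1)
        r = torch.sigmoid(self.read(z))
        d = r * d                                 # consume part of the control vector
        c = torch.sigmoid(f) * c + torch.sigmoid(i) * torch.tanh(g) \
            + torch.tanh(self.ctrl_to_cell(d))    # inject the remaining control signal
        h = torch.sigmoid(o) * torch.tanh(c)
        return h, c, d

cell = SCLSTMCell(32, 64, 10)
h = torch.zeros(1, 64); c = torch.zeros(1, 64); d = torch.ones(1, 10)
h, c, d = cell(torch.randn(1, 32), h, c, d)
print(h.shape, d.mean())  # hidden state and the partially consumed control vector
```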
End-to-end trainable system for enhancing diversity in natural language generation
Natural Language Generation plays an important role in the domain of dialogue systems, as it determines how users perceive the system. Recently, deep learning based systems have been proposed to tackle this task, as they generalize better and do not require large amounts of manual effort to implement for new domains. However, deep learning systems usually produce monotonous-sounding texts. In this work, we present our system for Natural Language Generation, in which we control the first word of the surface realization. We show that with this simple control mechanism it is possible to increase the lexical variability and the complexity of the generated texts. For this, we apply a character-based version of the Semantically Controlled Long Short-term Memory Network (SC-LSTM) and use its specialized cell to control the first word generated by the system. To ensure that the surface manipulation does not produce semantically incoherent texts, we apply a semantic control component, which we also use for reranking. We show that our model is capable of generating more sophisticated texts while making fewer semantic errors during generation.
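The reranking part of such a pipeline can be illustrated with a simple scoring rule: among sampled surface realizations, prefer the one with the fewest semantic errors (missing or repeated slot values). The candidates, slot values, and string-matching check below are toy assumptions, not the paper's actual semantic control component.

```python
# A minimal sketch of semantic reranking: penalize missing or repeated slots.
# The slot values and candidates are toy assumptions for illustration.
def semantic_errors(text: str, slot_values: list[str]) -> int:
    """Count missing slots plus extra repetitions; lower is better."""
    errors = 0
    for value in slot_values:
        count = text.lower().count(value.lower())
        errors += abs(count - 1)   # 0 mentions or >1 mentions both penalized
    return errors

slots = ["Aromi", "coffee shop", "city centre"]
candidates = [
    "Aromi is a coffee shop located in the city centre.",
    "Aromi is a coffee shop. Aromi serves coffee.",   # repeats, misses a slot
    "There is a nice place in the city centre.",      # misses two slots
]
best = min(candidates, key=lambda c: semantic_errors(c, slots))
print(best)
```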
Toward automatic data curation for open data
In recent years, large amounts of data have been made publicly available: thousands of open data sources exist, offering genome data, temperature measurements, stock market prices, population and income statistics, and more. However, accessing and combining data from different sources is both non-trivial and very time consuming; these tasks typically take up to 80% of a data scientist's time. Automatic integration and curation of open data can facilitate this process.
Swiss-Chocolate: sentiment detection using sparse SVMs and part-of-speech n-grams
We describe a classifier that predicts the message-level sentiment of English microblog messages from Twitter, submitted to the SemEval-2014 competition (Task 9B). Our approach builds on last year's winning system by NRC Canada (Mohammad et al., 2013), with some modifications, additional features, and additional sentiment lexicons. Furthermore, we use a sparse (l1-regularized) SVM instead of the more commonly used l2 regularization, resulting in a very sparse linear classifier.
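The l1-versus-l2 distinction is easy to see in code: l1 regularization drives most feature weights to exactly zero. In the hedged sketch below, the toy corpus stands in for Twitter data and plain word n-grams stand in for the part-of-speech n-grams the system also uses.

```python
# A hedged sketch of the sparse-SVM idea: an l1-regularized linear SVM over
# n-gram features. The toy corpus is an assumption, not the competition data.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.svm import LinearSVC

texts = ["love this phone", "worst service ever", "really love it",
         "ever so bad", "great great phone", "bad bad service"]
labels = [1, 0, 1, 0, 1, 0]

vec = CountVectorizer(ngram_range=(1, 2))
X = vec.fit_transform(texts)

# penalty="l1" with dual=False yields a sparse weight vector; the default
# penalty="l2" would keep almost all weights nonzero.
clf = LinearSVC(penalty="l1", dual=False, C=1.0).fit(X, labels)
nonzero = (clf.coef_ != 0).sum()
print(f"{nonzero} of {clf.coef_.size} feature weights are nonzero")
```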
JOINT_FORCES: unite competing sentiment classifiers with random forest
In this paper, we describe how we created a meta-classifier to detect the message-level sentiment of tweets. We participated in SemEval-2014 Task 9B by combining the results of several existing classifiers using a random forest. The results of 5 other teams from the competition, as well as of 7 general-purpose commercial classifiers, were used to train the algorithm. This way, we obtained a boost of up to 3.24 F1 points.
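The stacking scheme can be sketched as follows: the class predictions of the base classifiers become the feature matrix for a random forest. The base predictions below are fabricated toy values, not the competition outputs.

```python
# A minimal sketch of the meta-classifier: base classifiers' predicted labels
# become features for a random forest. All values below are toy assumptions.
import numpy as np
from sklearn.ensemble import RandomForestClassifier

# Rows: tweets; columns: predicted label (0=neg, 1=neu, 2=pos) from 4 systems.
base_predictions = np.array([
    [2, 2, 1, 2],
    [0, 0, 0, 1],
    [1, 2, 1, 1],
    [0, 1, 0, 0],
    [2, 2, 2, 2],
    [1, 0, 1, 1],
])
gold = np.array([2, 0, 1, 0, 2, 1])  # hand-labelled sentiment

meta = RandomForestClassifier(n_estimators=100, random_state=0)
meta.fit(base_predictions, gold)
print(meta.predict([[2, 1, 2, 2]]))  # fuse the base votes for a new tweet
```

Unlike simple majority voting, the forest can learn which base systems to trust in which disagreement patterns.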
How to throw chocolate at students: a survey of extrinsic means for increased audience attention
This paper presents an overview of established and innovative means and teaching approaches that help raise students' attention during lectures. The results are based on an international survey among lecturers from eleven universities, initiated by three lecturers from different countries who met at EDUCON 2016. The objective was to collect teaching experiences with playful means that motivate students to stay attentive during a lecture. The proposed teaching approaches fall into three categories: established teaching methods, unconventional extrinsic methods, and tools. We focus on the extrinsic methods and discuss 14 illustrative examples.