20 research outputs found

    Generative and Discriminative Text Classification with Recurrent Neural Networks

    We empirically characterize the performance of discriminative and generative LSTM models for text classification. We find that although RNN-based generative models are more powerful than their bag-of-words ancestors (e.g., they account for conditional dependencies across words in a document), they have higher asymptotic error rates than discriminatively trained RNN models. However, we also find that generative models approach their asymptotic error rate more rapidly than their discriminative counterparts, the same pattern that Ng & Jordan (2001) proved holds for linear classification models that make more naive conditional independence assumptions. Building on this finding, we hypothesize that RNN-based generative classification models will be more robust to shifts in the data distribution. This hypothesis is confirmed in a series of experiments in zero-shot and continual learning settings that show that generative models substantially outperform discriminative models.
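
    To make the contrast in this abstract concrete, the two decision rules can be written side by side. This is a sketch under an assumption the abstract does not spell out: that the generative model is a class-conditional language model over the words $x_1, \dots, x_T$ of a document $x$, factorized with the chain rule. The symbols $x$, $y$, and $T$ are introduced here for illustration only.

```latex
% Discriminative classifier: model the label posterior directly.
\hat{y}_{\text{disc}} = \arg\max_{y} \; p(y \mid x)

% Generative classifier: model the joint distribution and apply Bayes' rule.
% The chain-rule factorization keeps dependencies across words, unlike a
% bag-of-words model, which would use p(x_t \mid y) for every position t.
\hat{y}_{\text{gen}} = \arg\max_{y} \; p(y)\, p(x \mid y)
                     = \arg\max_{y} \; p(y) \prod_{t=1}^{T} p(x_t \mid x_{<t},\, y)
```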

    Hash Embeddings for Efficient Word Representations

    We present hash embeddings, an efficient method for representing words in a continuous vector form. A hash embedding may be seen as an interpolation between a standard word embedding and a word embedding created using a random hash function (the hashing trick). In hash embeddings each token is represented by k d-dimensional embedding vectors and one k-dimensional weight vector. The final d-dimensional representation of the token is the product of the two. Rather than fitting the embedding vectors for each token, these are selected by the hashing trick from a shared pool of B embedding vectors. Our experiments show that hash embeddings can easily deal with huge vocabularies consisting of millions of tokens. When using a hash embedding there is no need to create a dictionary before training nor to perform any kind of vocabulary pruning after training. We show that models trained using hash embeddings exhibit at least the same level of performance as models trained using regular embeddings across a wide range of tasks. Furthermore, the number of parameters needed by such an embedding is only a fraction of what is required by a regular embedding. Since standard embeddings and embeddings constructed using the hashing trick are actually just special cases of a hash embedding, hash embeddings can be considered an extension and improvement over the existing regular embedding types.
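
    The lookup described in this abstract can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: the class name `HashEmbedding`, the helper `bucket`, the MD5-based hash functions, the weight-table size `K`, and the random initial values are all assumptions made for the example. In a trained model the pool vectors and weights would be learned.

```python
import hashlib
import random


def bucket(token, seed, size):
    """Hash a token into range(size); `seed` selects one hash function (assumed scheme)."""
    digest = hashlib.md5(f"{seed}:{token}".encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % size


class HashEmbedding:
    """Illustrative hash embedding: k pooled vectors combined by k weights."""

    def __init__(self, k=2, d=4, B=50, K=1000, seed=0):
        rng = random.Random(seed)
        self.k, self.d, self.B, self.K = k, d, B, K
        # Shared pool of B embedding vectors, each of dimension d.
        self.pool = [[rng.gauss(0.0, 0.1) for _ in range(d)] for _ in range(B)]
        # Table of k-dimensional weight vectors (hypothetical size K, also hashed).
        self.weights = [[rng.gauss(0.0, 0.5) for _ in range(k)] for _ in range(K)]

    def component_ids(self, token):
        """Pick k rows of the shared pool with k different hash functions."""
        return [bucket(token, i, self.B) for i in range(self.k)]

    def lookup(self, token):
        """Return the d-dimensional weighted sum of the token's k component vectors."""
        w = self.weights[bucket(token, "w", self.K)]
        vecs = [self.pool[j] for j in self.component_ids(token)]
        return [sum(w[i] * vecs[i][j] for i in range(self.k)) for j in range(self.d)]


emb = HashEmbedding()
vector = emb.lookup("classification")  # no dictionary is built beforehand
```

    Under these assumptions the parameter count is B*d + K*k, independent of vocabulary size, which is where the savings over a regular embedding table would come from.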

    Explicit Interaction Model towards Text Classification

    Text classification is one of the fundamental tasks in natural language processing. Recently, deep neural networks have achieved promising performance in the text classification task compared to shallow models. Despite the significance of deep models, they ignore fine-grained classification clues (matching signals between words and classes), since their classifications mainly rely on text-level representations. To address this problem, we introduce an interaction mechanism to incorporate word-level matching signals into the text classification task. In particular, we design a novel framework, the EXplicit interAction Model (dubbed EXAM), equipped with the interaction mechanism. We validate the proposed approach on several benchmark datasets, including both multi-label and multi-class text classification tasks. Extensive experimental results demonstrate the superiority of the proposed method. As a byproduct, we have released the code and parameter settings to facilitate further research. Comment: 8 pages

    The importance of data classification using machine learning methods in microarray data

    The detection of genetic mutations has attracted global attention. Several methods have been proposed to detect diseases such as cancers and tumours. One of them is microarrays, a type of representation for gene expression that is helpful in diagnosis. To unleash the full potential of microarrays, machine-learning algorithms and gene selection methods can be implemented to facilitate processing on microarrays and to overcome other potential challenges. One of these challenges involves high-dimensional data that are redundant, irrelevant, and noisy. To alleviate this problem, this representation should be simplified. For example, the feature selection process can be implemented by reducing the number of features adopted in clustering and classification. A subset of genes can be selected from a pool of gene expression data recorded on DNA microarrays. This paper reviews existing classification techniques and gene selection methods. The effectiveness of emerging techniques, such as the swarm intelligence technique in feature selection and classification in microarrays, is reported as well. These emerging techniques can be used in detecting cancer. The swarm intelligence technique can be combined with other statistical methods for attaining better results.

    Board of Directors' Profile: A Case for Deep Learning as a Valid Methodology to Finance Research

    This paper presents a Deep Learning (DL) model for natural language processing of unstructured CVs to generate a six-dimensional profile of the professional experience of the boards of directors of Spanish companies. We show the complete process, starting with open data extraction and cleaning, the generation of a labeled dataset for supervised learning, the development, training and validation of a DL model capable of accurately analyzing the dataset, and, finally, a data analysis based on the automated generation of the professional profiles of more than 6,000 directors of Spanish listed companies between 2003 and 2020. An RNN-LSTM neural network has been trained in three phases starting from a random initial state: (1) learning of basic structures of the Spanish language, (2) fine-tuning for scientific texts in the field of economics and finance, and (3) regression modeling to generate a six-dimensional profile based on a generalization of sentiment classification systems. The complete training has been carried out with very low computational requirements, taking a total of 120 hours of processing on a low-end GPU. The results obtained in the validation of the DL model show great accuracy, obtaining a value for the standard deviation of the mean error between 0.015 and 0.033. As a result, we have been able to outline with a high degree of reliability the profile of the boards of directors of listed Spanish companies. We found that the predominant profile is that of directors with experience in executive or consultancy positions, followed by the financial profile. The results achieved show the potential of DL in social science research, particularly in Finance.