11 research outputs found

    Deep Active Learning for Named Entity Recognition

    Deep learning has yielded state-of-the-art performance on many natural language processing tasks, including named entity recognition (NER). However, this typically requires large amounts of labeled data. In this work, we demonstrate that the amount of labeled training data can be drastically reduced when deep learning is combined with active learning. While active learning is sample-efficient, it can be computationally expensive since it requires iterative retraining. To speed this up, we introduce a lightweight architecture for NER, viz., the CNN-CNN-LSTM model, consisting of convolutional character and word encoders and a long short-term memory (LSTM) tag decoder. The model achieves nearly state-of-the-art performance on standard datasets for the task while being computationally much more efficient than the best-performing models. We carry out incremental active learning during the training process and are able to nearly match state-of-the-art performance with just 25% of the original training data.
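
The incremental active learning idea in the abstract above can be illustrated with a minimal pool-based sketch. This is not the authors' implementation: the abstract does not name a selection criterion, so least-confidence sampling is assumed here, and `train_step`, `predict_proba`, and `oracle` are hypothetical callbacks standing in for the model update, the model's label probabilities, and the human annotator.

```python
from typing import Any, Callable, List, Sequence, Tuple


def least_confidence(probs: Sequence[float]) -> float:
    """Uncertainty score: 1 minus the probability of the most likely label."""
    return 1.0 - max(probs)


def select_batch(pool_probs: List[Sequence[float]], k: int) -> List[int]:
    """Return indices of the k pool items the model is least confident about."""
    ranked = sorted(
        range(len(pool_probs)),
        key=lambda i: least_confidence(pool_probs[i]),
        reverse=True,
    )
    return ranked[:k]


def active_learning_loop(
    labeled: List[Tuple[Any, Any]],
    pool: List[Any],
    train_step: Callable[[Any, List[Tuple[Any, Any]]], Any],  # hypothetical
    predict_proba: Callable[[Any, Any], Sequence[float]],     # hypothetical
    oracle: Callable[[Any], Any],                             # hypothetical
    rounds: int,
    k: int,
) -> Tuple[Any, List[Tuple[Any, Any]], List[Any]]:
    """Query the most uncertain items, label them, and keep training."""
    # Initial fit on whatever labeled data is available.
    state = train_step(None, labeled)
    for _ in range(rounds):
        if not pool:
            break
        probs = [predict_proba(state, x) for x in pool]
        chosen = set(select_batch(probs, k))
        # Ask the annotator for labels on the selected items only.
        labeled = labeled + [(pool[i], oracle(pool[i])) for i in sorted(chosen)]
        pool = [x for i, x in enumerate(pool) if i not in chosen]
        # Incremental update: continue from the current state, no restart.
        state = train_step(state, labeled)
    return state, labeled, pool
```

Passing the previous `state` back into `train_step` is the design choice that mirrors the "incremental" wording: the model is updated on the enlarged labeled set rather than retrained from scratch each round.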

    The digital divide in Worcester

    This IQP focuses on a review of the technology gap that is growing between various socioeconomic groups throughout the world, the United States, and locally in Worcester County. Certain people, the technology "haves," possess the best information technology that society has to offer. This opens to them a wealth of information. The technology "have-nots" lack these resources, and as such, lack the resources to succeed in the new information-based economy. The result has been dubbed the "digital divide."

    Improving Translation via Targeted Paraphrasing

    Targeted paraphrasing is a new approach to the problem of obtaining cost-effective, reasonable-quality translation that makes use of simple and inexpensive human computations by monolingual speakers in combination with machine translation. The key insight behind the process is that it is possible to spot likely translation errors with only monolingual knowledge of the target language, and it is possible to generate alternative ways to say the same thing (i.e., paraphrases) with only monolingual knowledge of the source language. Evaluations demonstrate that this approach can yield substantial improvements in translation quality.