
    Hybrid Model For Word Prediction Using Naive Bayes and Latent Information

    Historically, the Natural Language Processing area has received considerable attention from many researchers. One of the main motivations behind this interest is the word prediction problem: given a set of words in a sentence, recommend the next word. In the literature, this problem is solved by methods based on syntactic or semantic analysis. On its own, neither kind of analysis achieves practical results for end-user applications. For instance, Latent Semantic Analysis can handle semantic features of text, but cannot suggest words that respect syntactic rules. On the other hand, there are models that treat both aspects together and achieve state-of-the-art results, e.g. Deep Learning. These models can demand high computational effort, which can make them infeasible for certain types of applications. With advances in technology and mathematical models, it is possible to develop faster and more accurate systems. This work proposes a hybrid word suggestion model, based on Naive Bayes and Latent Semantic Analysis, that considers the neighbouring words around unfilled gaps. Results show that this model achieves 44.2% accuracy on the MSR Sentence Completion Challenge.
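    To make the "neighbouring words around unfilled gaps" idea concrete, here is a minimal sketch of only the Naive Bayes half of such a model, under stated assumptions. Each candidate word is treated as a class, and the features are the words at fixed offsets around the gap, such as (-1, "the") or (+1, "sat"), with Laplace smoothing. The Latent Semantic Analysis component and the way the paper combines the two scores are not described in the abstract, so neither is implemented here. The class name `NaiveBayesGapFiller` and the `window` and `alpha` parameters are hypothetical.

```python
import math
from collections import Counter, defaultdict


class NaiveBayesGapFiller:
    """Hypothetical sketch: scores candidate words for a gap using
    position-tagged neighbouring words as Naive Bayes features.
    (The LSA component of the hybrid model is NOT included.)"""

    def __init__(self, window=2, alpha=1.0):
        self.window = window          # neighbours considered on each side of the gap
        self.alpha = alpha            # Laplace (add-alpha) smoothing constant
        self.word_counts = Counter()  # class counts: how often each word fills a slot
        self.feature_counts = defaultdict(Counter)  # word -> Counter of (offset, neighbour)
        self.features = set()         # every distinct (offset, neighbour) feature seen

    def _features(self, tokens, i):
        # Position-tagged neighbours of slot i; offset 0 (the gap itself) is skipped.
        feats = []
        for off in range(-self.window, self.window + 1):
            j = i + off
            if off != 0 and 0 <= j < len(tokens):
                feats.append((off, tokens[j]))
        return feats

    def fit(self, sentences):
        # Every token position in the training corpus is a labelled example.
        for tokens in sentences:
            for i, word in enumerate(tokens):
                self.word_counts[word] += 1
                for f in self._features(tokens, i):
                    self.feature_counts[word][f] += 1
                    self.features.add(f)
        return self

    def log_score(self, tokens, gap, candidate):
        # log P(candidate) + sum of log P(feature | candidate), both smoothed.
        total = sum(self.word_counts.values())
        vocab = len(self.word_counts)
        score = math.log((self.word_counts[candidate] + self.alpha)
                         / (total + self.alpha * vocab))
        counts = self.feature_counts.get(candidate, Counter())
        # "+ 1" reserves smoothing mass for features never seen in training.
        denom = sum(counts.values()) + self.alpha * (len(self.features) + 1)
        for f in self._features(tokens, gap):
            score += math.log((counts[f] + self.alpha) / denom)
        return score

    def predict(self, tokens, gap, candidates):
        # Pick the candidate with the highest posterior log-score.
        return max(candidates, key=lambda c: self.log_score(tokens, gap, c))


# Toy corpus, made up for illustration.
corpus = [
    "the cat sat on the mat".split(),
    "the dog sat on the rug".split(),
    "a cat ate the fish".split(),
]
model = NaiveBayesGapFiller(window=2).fit(corpus)
query = "the ___ sat on the mat".split()
print(model.predict(query, 1, ["cat", "fish", "on"]))  # → cat
```

    Working in log space avoids numerical underflow when many feature probabilities are multiplied. Tagging each neighbour with its offset lets the model capture some word-order (syntactic) information that a bag-of-words LSA representation discards, which is the kind of complementarity the abstract appeals to.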

    Improving the translation environment for professional translators

    When computer-aided translation systems are used in a typical professional translation workflow, there are several stages at which there is room for improvement. The SCATE (Smart Computer-Aided Translation Environment) project investigated several of these aspects, from both a human-computer interaction and a purely technological perspective. This paper describes the SCATE research on improved fuzzy matching, parallel treebanks, the integration of translation memories with machine translation, quality estimation, terminology extraction from comparable texts, the use of speech recognition in the translation process, and human-computer interaction and interface design for the professional translation environment. For each of these topics, we describe the experiments we performed and the conclusions drawn, providing an overview of the highlights of the entire SCATE project.

    Evaluating Distributed Word Representations for Predicting Missing Words in Sentences

    In recent years, distributed representations of words in vector space, or word embeddings, have become very popular because they have shown significant improvements over traditional language models such as n-grams in many statistical natural language processing (NLP) tasks. In this thesis, we explored several state-of-the-art methods for learning distributed word representations: Latent Semantic Analysis, word2vec, and GloVe. We compared their performance by the accuracy each achieved when tasked with selecting the correct missing word in a sentence from five possible options. For this NLP task, we trained each method on a corpus of about five hundred 19th-century novels from Project Gutenberg. The test set contained 1,040 sentences, each with one word missing. The training and test sets were part of the Microsoft Research Sentence Completion Challenge data set. Word vectors obtained by training word2vec's skip-gram model achieved the highest accuracy in finding the missing words among all the methods tested. We also found that tuning the models' hyperparameters helped them capture greater syntactic and semantic regularities among words.
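    One simple way to use word vectors for this multiple-choice task is to score each candidate by its average cosine similarity to the context words in the sentence and pick the highest-scoring candidate. The abstract does not state the exact scoring rule used in the thesis, so the sketch below is an assumed baseline, not the thesis's method. The vectors are made-up three-dimensional toys standing in for vectors trained with LSA, word2vec or GloVe. The function names `cosine` and `pick_missing_word` are hypothetical, and the example uses three candidates instead of the challenge's five for brevity.

```python
import math


def cosine(u, v):
    # Cosine similarity; returns 0.0 if either vector has zero length.
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def pick_missing_word(context, candidates, vectors):
    """Hypothetical scoring rule: return the candidate whose vector is, on
    average, most similar to the in-vocabulary context words.
    Out-of-vocabulary context words are skipped; an out-of-vocabulary
    candidate gets a score of -inf."""
    ctx = [vectors[w] for w in context if w in vectors]

    def score(c):
        if c not in vectors or not ctx:
            return float("-inf")
        return sum(cosine(vectors[c], v) for v in ctx) / len(ctx)

    return max(candidates, key=score)


# Toy 3-d vectors, made up for illustration (not trained on any corpus).
vectors = {
    "ship": [0.9, 0.1, 0.0],
    "sailed": [0.8, 0.2, 0.1],
    "harbour": [0.85, 0.05, 0.1],
    "violin": [0.0, 0.9, 0.3],
    "soup": [0.1, 0.2, 0.9],
}
sentence = "the ship sailed into the ___".split()
context = [w for w in sentence if w != "___"]
print(pick_missing_word(context, ["violin", "harbour", "soup"], vectors))  # → harbour
```

    Averaging over all context words ignores word order and distance from the gap. This is one reason purely embedding-based scoring can miss syntactic constraints. Weighting nearby words more heavily is a common variation.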