Hybrid Model For Word Prediction Using Naive Bayes and Latent Information
Historically, Natural Language Processing has received a great deal of
attention from researchers. One of the main motivations behind this interest
is the word prediction problem: given a set of words in a sentence, recommend
the next word. In the literature, this problem is solved by methods based on
syntactic or semantic analysis. On its own, neither kind of analysis achieves
practical results for end-user applications. For instance, Latent Semantic
Analysis can handle semantic features of text, but cannot suggest words that
respect syntactic rules. On the other hand, there are models that treat both
kinds of analysis together and achieve state-of-the-art results, e.g. Deep
Learning. These models can demand high computational effort, which can make
them infeasible for certain types of applications. With advances in
technology and mathematical models, it is possible to develop faster and more
accurate systems. This work proposes a hybrid word suggestion model, based on
Naive Bayes and Latent Semantic Analysis, that considers the neighbouring
words around unfilled gaps. Results show that this model can achieve 44.2%
accuracy in the MSR Sentence Completion Challenge.
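
To make the Naive Bayes half of such a model concrete, the following is a minimal sketch that scores candidate words for a gap from the words around it. It is an illustration under stated assumptions, not the authors' implementation: the abstract does not describe the feature set, the smoothing, or how the Latent Semantic Analysis score is combined with this one. The function names, the window size, the add-one smoothing, and the toy corpus are all hypothetical.

```python
import math
from collections import Counter, defaultdict


def train(sentences, window=2):
    """Count words and the context words seen within `window` positions of each word."""
    word_counts = Counter()
    context_counts = defaultdict(Counter)
    for tokens in sentences:
        for i, target in enumerate(tokens):
            word_counts[target] += 1
            start, end = max(0, i - window), min(len(tokens), i + window + 1)
            for j in range(start, end):
                if j != i:
                    context_counts[target][tokens[j]] += 1
    return word_counts, context_counts


def naive_bayes_score(candidate, neighbours, word_counts, context_counts, alpha=1.0):
    """Log P(candidate) + sum of log P(neighbour | candidate), with add-alpha smoothing."""
    vocab_size = len(word_counts)
    total_words = sum(word_counts.values())
    score = math.log((word_counts[candidate] + alpha) / (total_words + alpha * vocab_size))
    contexts = context_counts.get(candidate, Counter())
    total_contexts = sum(contexts.values())
    for word in neighbours:
        score += math.log((contexts[word] + alpha) / (total_contexts + alpha * vocab_size))
    return score


def best_candidate(candidates, neighbours, word_counts, context_counts):
    """Return the candidate with the highest Naive Bayes score for the given neighbours."""
    return max(
        candidates,
        key=lambda c: naive_bayes_score(c, neighbours, word_counts, context_counts),
    )


# Toy corpus, invented for illustration only.
toy_corpus = [
    ["the", "cat", "sat", "on", "the", "mat"],
    ["the", "dog", "sat", "on", "the", "rug"],
    ["the", "cat", "chased", "the", "dog"],
]
word_counts, context_counts = train(toy_corpus)
# Gap in "the ___ chased": the neighbouring words are "the" and "chased".
choice = best_candidate(["cat", "rug"], ["the", "chased"], word_counts, context_counts)
```

A hybrid model in the spirit of the abstract would presumably combine this score with a semantic similarity score, but the combination rule is not stated in the source and is left out here.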
Improving the translation environment for professional translators
When using computer-aided translation systems in a typical professional translation workflow, there are several stages at which there is room for improvement. The SCATE (Smart Computer-Aided Translation Environment) project investigated several of these aspects, both from a human-computer interaction point of view and from a purely technological one.
This paper describes the SCATE research with respect to improved fuzzy matching, parallel treebanks, the integration of translation memories with machine translation, quality estimation, terminology extraction from comparable texts, the use of speech recognition in the translation process, and human-computer interaction and interface design for the professional translation environment. For each of these topics, we describe the experiments we performed and the conclusions drawn, providing an overview of the highlights of the entire SCATE project.
Evaluating Distributed Word Representations for Predicting Missing Words in Sentences
In recent years, distributed representations of words in vector space, or word embeddings, have become very popular, as they have shown significant improvements in many statistical natural language processing (NLP) tasks compared to traditional language models such as n-grams. In this thesis, we explored various state-of-the-art methods, namely Latent Semantic Analysis, word2vec, and GloVe, to learn distributed representations of words. Their performance was compared based on the accuracy achieved when tasked with selecting the right missing word in a sentence, given five possible options. For this NLP task, we trained each of these methods on a training corpus that contained the texts of around five hundred 19th-century novels from Project Gutenberg. The test set contained 1040 sentences, each with one word missing. The training and test sets were part of the Microsoft Research Sentence Completion Challenge data set. In this work, word vectors obtained by training the skip-gram model of word2vec showed the highest accuracy in finding the missing word among all the methods tested. We also found that tuning the hyperparameters of the models helped in capturing greater syntactic and semantic regularities among words.
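
The task described above, choosing one of five candidate words for a gap, can be sketched with word vectors as follows. This is a minimal illustration, not the thesis's method: the abstract does not say how candidates were scored, so average cosine similarity between a candidate and the other words in the sentence is an assumption here. The function names and the 3-dimensional toy vectors are hypothetical; real vectors would come from a trained LSA, word2vec, or GloVe model.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def pick_missing_word(context, candidates, vectors):
    """Return the candidate whose vector is, on average, most similar to the context words."""
    def score(candidate):
        sims = [
            cosine(vectors[candidate], vectors[word])
            for word in context
            if word in vectors  # skip context words without a vector
        ]
        return sum(sims) / len(sims) if sims else 0.0

    return max(candidates, key=score)


# Toy vectors, invented for illustration only.
toy_vectors = {
    "captain": [0.7, 0.3, 0.0],
    "ship": [0.9, 0.1, 0.0],
    "sea": [0.8, 0.2, 0.1],
    "storm": [0.8, 0.1, 0.2],
    "piano": [0.0, 0.2, 0.9],
    "garden": [0.1, 0.9, 0.1],
    "letter": [0.1, 0.5, 0.5],
    "candle": [0.2, 0.4, 0.6],
}

context_words = ["captain", "ship", "sea"]
options = ["storm", "piano", "garden", "letter", "candle"]
answer = pick_missing_word(context_words, options, toy_vectors)
```

Accuracy on a test set would then be the fraction of sentences for which the returned candidate matches the held-out word.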