535 research outputs found
SMDDH: Singleton Mention detection using Deep Learning in Hindi Text
Mention detection is an important component of a coreference resolution system,
where mentions such as names, nominals, and pronominals are identified. These
mentions can be purely coreferential mentions or singleton mentions
(non-coreferential mentions). Coreferential mentions are those mentions in a
text that refer to the same entity in the real world, whereas singleton
mentions occur only once in the text and do not participate in coreference
because they are not mentioned again in the following text. Filtering out
these singleton mentions can substantially improve the performance of a
coreference resolution process. This paper proposes a singleton mention
detection module for Hindi text based on a fully connected network and a
convolutional neural network. The model uses a few hand-crafted features,
context information, and word embeddings. A coreference-annotated Hindi
dataset comprising 3.6K sentences and 78K tokens is used for the task. In
terms of precision, recall, and F-measure, the experimental findings
obtained are excellent.
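The distinction between coreferential and singleton mentions described above can be illustrated with a minimal sketch. It assumes gold entity ids are already available for each mention, which is not the paper's setting: the proposed module predicts singletons from features and embeddings. The `Mention` format, the function name `split_singletons`, and the sample data are hypothetical.

```python
from collections import Counter
from typing import List, Tuple

# Hypothetical mention format: (surface text, entity id).
Mention = Tuple[str, int]


def split_singletons(mentions: List[Mention]) -> Tuple[List[Mention], List[Mention]]:
    """Separate mentions whose entity occurs more than once from singletons."""
    # Count how many mentions point at each entity id.
    counts = Counter(entity for _, entity in mentions)
    coreferential = [m for m in mentions if counts[m[1]] > 1]
    singletons = [m for m in mentions if counts[m[1]] == 1]
    return coreferential, singletons


# Illustrative data only (English used for readability).
mentions = [("Ravi", 0), ("he", 0), ("a book", 1), ("the market", 2), ("Ravi", 0)]
coreferential, singletons = split_singletons(mentions)
```

Here the two mentions whose entity is never referred to again end up in `singletons`; dropping them before coreference resolution is the filtering step the abstract refers to.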
Abstract syntax as interlingua: Scaling up the grammatical framework from controlled languages to robust pipelines
Abstract syntax is an interlingual representation used in compilers. Grammatical Framework (GF) applies the abstract syntax idea to natural languages. The development of GF started in 1998, first as a tool for controlled language implementations, where it has gained an established position in both academic and commercial projects. GF provides grammar resources for over 40 languages, enabling accurate generation and translation, as well as grammar engineering tools and components for mobile and Web applications. On the research side, the focus in the last ten years has been on scaling up GF to wide-coverage language processing. The concept of abstract syntax offers a unified view on many other approaches: Universal Dependencies, WordNets, FrameNets, Construction Grammars, and Abstract Meaning Representations. This makes it possible for GF to utilize data from the other approaches and to build robust pipelines. In return, GF can contribute to data-driven approaches by methods to transfer resources from one language to others, to augment data by rule-based generation, to check the consistency of hand-annotated corpora, and to pipe analyses into high-precision semantic back ends. This article gives an overview of the use of abstract syntax as interlingua through both established and emerging NLP applications involving GF.
A Comparative Analysis of Opinion Mining and Sentiment Classification in Non-english Languages
In the past decade, many opinion mining and sentiment classification studies have been carried out for opinions in English. However, the amount of work done for non-English text opinions is very limited. In this review, we investigate opinion mining and sentiment classification studies in three non-English languages to find the classification methods and the efficiency of each algorithm used in these methods. It is found that most of the research conducted for non-English languages has followed the methods used for English, with only limited usage of language-specific properties such as morphological variations. The application domains seem to be restricted to particular fields, and significantly less research has been conducted across domains. Keywords: Natural Language Processing, Text Mining, Machine Learning
Knowledge Expansion of a Statistical Machine Translation System using Morphological Resources
The translation capability of a Phrase-Based Statistical Machine Translation (PBSMT) system depends mostly on parallel data, and phrases that are not present in the training data are not translated correctly. This paper describes a method that efficiently expands the existing knowledge of a PBSMT system without adding more parallel data, using external morphological resources instead. A set of new phrase associations is added to the translation and reordering models; each of them corresponds to a morphological variation of the source phrase, the target phrase, or both phrases of an existing association. New associations are generated using a string similarity score based on morphosyntactic information. We tested our approach on En-Fr and Fr-En translations, and the results showed improved performance in terms of automatic scores (BLEU and Meteor) and a reduction in out-of-vocabulary (OOV) words. We believe that our knowledge expansion framework is generic and could be used to add different types of information to the model. JRC.G.2-Global security and crisis managemen
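The idea of adding phrase associations derived from morphological variants can be sketched roughly as follows. This is a toy illustration under strong assumptions, not the paper's method: it covers only the case where both sides vary, it replaces the paper's string similarity score with a simple exact match on a morphosyntactic tag, and the names `expand_phrase_table`, `src_variants`, `tgt_variants`, and `penalty` are all hypothetical.

```python
from typing import Dict, List, Tuple

# Hypothetical phrase table: (source phrase, target phrase) -> score.
PhraseTable = Dict[Tuple[str, str], float]
# Hypothetical morphological resource: phrase -> [(variant, morphosyntactic tag)].
Variants = Dict[str, List[Tuple[str, str]]]


def expand_phrase_table(
    table: PhraseTable,
    src_variants: Variants,
    tgt_variants: Variants,
    penalty: float = 0.5,
) -> PhraseTable:
    """Add variant pairs whose source and target variants share the same tag."""
    expanded = dict(table)
    for (src, tgt), score in table.items():
        for s_var, s_tag in src_variants.get(src, []):
            for t_var, t_tag in tgt_variants.get(tgt, []):
                # Assumption: matching tags stand in for the similarity score.
                if s_tag == t_tag:
                    # Existing associations are never overwritten.
                    expanded.setdefault((s_var, t_var), score * penalty)
    return expanded


# Illustrative En-Fr data only.
table = {("cat", "chat"): 0.8}
src_variants = {"cat": [("cats", "plural")]}
tgt_variants = {"chat": [("chats", "plural"), ("chatte", "feminine")]}
expanded = expand_phrase_table(table, src_variants, tgt_variants)
```

With this data, the pair `("cats", "chats")` is added at a discounted score because both variants carry the same tag, while `("cats", "chatte")` is not, since the tags differ. No parallel data is added at any point, which is the property the abstract emphasizes.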
- …