8,906 research outputs found

Sentence similarity-based source context modelling in PBSMT

Author: Banchs Rafael E.
Costa-Jussá Marta
Haque Rejwanul
Kumar Naskar Sudip
Way Andy
Publication venue: Institute of Electrical and Electronics Engineers (IEEE)
Publication date: 01/12/2010

Target phrase selection, a crucial component of the state-of-the-art phrase-based statistical machine translation (PBSMT) model, plays a key role in generating accurate translation hypotheses. Inspired by context-rich word-sense disambiguation techniques, machine translation (MT) researchers have successfully integrated various types of source language context into the PBSMT model to improve target phrase selection. Among the various types of lexical and syntactic features, lexical syntactic descriptions in the form of supertags that preserve long-range word-to-word dependencies in a sentence have proven to be effective. These rich contextual features are able to disambiguate a source phrase on the basis of the local syntactic behaviour of that phrase. In addition to local contextual information, global contextual information such as the grammatical structure of a sentence, sentence length and n-gram word sequences could provide additional important information to enhance this phrase-sense disambiguation. In this work, we explore various sentence similarity features by measuring the similarity between a source sentence to be translated and the source side of the bilingual training sentences, and integrate them directly into the PBSMT model. We performed experiments on an English-to-Chinese translation task by applying sentence-similarity features both individually and collaboratively with supertag-based features. We evaluate the performance of our approach and report a statistically significant relative improvement of 5.25% in BLEU score when adding a sentence-similarity feature together with a supertag-based feature.
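
As a rough illustration of the idea in this abstract, the sketch below scores an input sentence against the source side of a few training sentences. The abstract does not say which similarity measures the authors use, so cosine similarity over word n-gram counts is an assumption made here, and the function names (`ngram_counts`, `cosine_similarity`, `rank_training_sentences`) are hypothetical. It does not show how such a score would be integrated into a PBSMT model.

```python
from collections import Counter
from math import sqrt


def ngram_counts(sentence, n=2):
    """Count word n-grams of orders 1..n in a whitespace-tokenised sentence."""
    tokens = sentence.lower().split()
    counts = Counter()
    for order in range(1, n + 1):
        for i in range(len(tokens) - order + 1):
            counts[tuple(tokens[i:i + order])] += 1
    return counts


def cosine_similarity(a, b):
    """Cosine similarity between two sparse n-gram count vectors."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm_a = sqrt(sum(v * v for v in a.values()))
    norm_b = sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_training_sentences(source, training_sources, n=2):
    """Return (score, sentence) pairs, most similar training sentence first."""
    src = ngram_counts(source, n)
    scored = [(cosine_similarity(src, ngram_counts(t, n)), t)
              for t in training_sources]
    return sorted(scored, key=lambda pair: pair[0], reverse=True)


if __name__ == "__main__":
    # Toy data, invented purely for illustration.
    training = [
        "the river bank was muddy",
        "the bank raised interest rates again",
        "cats sleep all day",
    ]
    for score, sent in rank_training_sentences(
            "the bank raised interest rates", training):
        print(f"{score:.3f}  {sent}")
```

A score of this kind could in principle serve as an extra feature value attached to phrase pairs extracted from each training sentence, but that step is specific to the paper and is not sketched here.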

Crossref

Irish Universities

DCU Online Research Access Service

Sentence Semantic Similarity based Complex Network approach for Word Sense Disambiguation

Author: Gopal Mohadikar et al.
Publication venue: Auricle Global Society of Education and Research
Publication date: 02/11/2023

Word Sense Disambiguation is a branch of Natural Language Processing (NLP) that deals with multi-sense words. Multi-sense words are referred to as polysemous words, and they introduce lexical ambiguity. The existing sense disambiguation module works effectively for single sentences with available context information. Word embedding plays a vital role in the process of disambiguation, and a context-dependent word embedding model is used for it. The main goal of this research paper is to disambiguate polysemous words by considering the available context information. The main challenge identified is the ambiguous word without context information. The complex network approach discussed here disambiguates ambiguous sentences by considering semantic similarities: a network based on sentence semantic similarity is constructed for this purpose. The proposed methodology is trained with the SemCor, Adaptive-Lex, and OMSTI standard lexical resources. The findings indicate that the methodology works well for disambiguating large documents in which the sense of an ambiguous sentence can be found in the adjacent sentences.

International Journal on Recent and Innovation Trends in Computing and Communication

NATURAL LANGUAGE PROCESSING BASED GENERATOR OF TESTING INSTRUMENTS

Author: Wang Qianqian
Publication venue: CSUSB ScholarWorks
Publication date: 01/09/2017

Natural Language Processing (NLP) is the field of study that focuses on the interactions between human language and computers. By “natural language” we mean a language that is used for everyday communication by humans. Unlike programming languages, natural languages are hard to define with precise rules. NLP is developing rapidly and has been widely used in different industries. Technologies based on NLP are becoming increasingly widespread; for example, Siri and Alexa are intelligent personal assistants that use NLP built into an algorithm to communicate with people. “Natural Language Processing Based Generator of Testing Instruments” is a stand-alone program that generates “plausible” multiple-choice selections by analyzing word sense disambiguation and calculating semantic similarity between two natural language entities. The core is Word Sense Disambiguation (WSD): identifying which sense of a word is used in a sentence when the word has multiple meanings. WSD is considered an AI-hard problem. The project presents several algorithms to resolve the WSD problem and compute semantic similarity, along with experimental results demonstrating their effectiveness.

CSUSB ScholarWorks

Syntactic and Semantic Analysis and Visualization of Unstructured English Texts

Author: Karmakar Saurav
Publication venue: ScholarWorks @ Georgia State University
Publication date: 14/12/2011

People have complex thoughts, and they often express their thoughts with complex sentences using natural languages. This complexity may facilitate efficient communication among an audience with the same knowledge base. On the other hand, for a different or new audience such a composition becomes cumbersome to understand and analyze. Analysis of such compositions using syntactic or semantic measures is a challenging job and defines the base step for natural language processing. In this dissertation I explore and propose a number of new techniques to analyze and visualize the syntactic and semantic patterns of unstructured English texts. The syntactic analysis is done through a proposed visualization technique which categorizes and compares different English compositions based on their different reading complexity metrics. For the semantic analysis I use Latent Semantic Analysis (LSA) to analyze the hidden patterns in complex compositions. I have used this technique to analyze comments from a social visualization web site for detecting the irrelevant ones (e.g., spam). The patterns of collaboration are also studied through statistical analysis. Word sense disambiguation is used to figure out the correct sense of a word in a sentence or composition. Applying a textual similarity measure, based on different word similarity measures and word sense disambiguation, to collaborative text snippets from a social collaborative environment reveals a direction to untie the knots of complex hidden patterns of collaboration.

ScholarWorks @ Georgia State University

Syntax-Aware Multi-Sense Word Embeddings for Deep Compositional Models of Meaning

Author: Cheng Jianpeng
Kartsaklis Dimitri
Publication date: 01/01/2015

Deep compositional models of meaning acting on distributional representations of words in order to produce vectors of larger text constituents are evolving into a popular area of NLP research. We detail a compositional distributional framework based on a rich form of word embeddings that aims at facilitating the interactions between words in the context of a sentence. Embeddings and composition layers are jointly learned against a generic objective that enhances the vectors with syntactic information from the surrounding context. Furthermore, each word is associated with a number of senses, the most plausible of which is selected dynamically during the composition process. We evaluate the produced vectors qualitatively and quantitatively with positive results. At the sentence level, the effectiveness of the framework is demonstrated on the MSRPar task, for which we report results within the state-of-the-art range. Comment: Accepted for presentation at EMNLP 201

arXiv.org e-Print Archive

CiteSeerX

Crossref

Resolving Lexical Ambiguity in Tensor Regression Models of Meaning

Author: Kalchbrenner Nal
Kartsaklis Dimitri
Sadrzadeh Mehrnoosh
Publication date: 01/01/2014

This paper provides a method for improving tensor-based compositional distributional models of meaning by the addition of an explicit disambiguation step prior to composition. In contrast with previous research where this hypothesis has been successfully tested against relatively simple compositional models, in our work we use a robust model trained with linear regression. The results we get in two experiments show the superiority of the prior disambiguation method and suggest that the effectiveness of this approach is model-independent.

arXiv.org e-Print Archive

CiteSeerX

Oxford University Research Archive

Selective Sampling for Example-based Word Sense Disambiguation

Author: Fujii Atsushi
Inui Kentaro
Tanaka Hozumi
Tokunaga Takenobu
Publication date: 01/01/1998

This paper proposes an efficient example sampling method for example-based word sense disambiguation systems. To construct a database of practical size, a considerable overhead for manual sense disambiguation (overhead for supervision) is required. In addition, the time complexity of searching a large-sized database poses a considerable problem (overhead for search). To counter these problems, our method selectively samples a smaller-sized effective subset from a given example set for use in word sense disambiguation. Our method is characterized by its reliance on the notion of training utility: the degree to which each example is informative for future example sampling when used for the training of the system. The system progressively collects examples by selecting those with the greatest utility. The paper reports the effectiveness of our method through experiments on about one thousand sentences. Compared to experiments with other example sampling methods, our method reduced both the overhead for supervision and the overhead for search, without degrading the performance of the system. Comment: 25 pages, 14 Postscript figure

arXiv.org e-Print Archive

CiteSeerX