
Joint morphological-lexical language modeling for processing morphologically rich languages with application to dialectal Arabic

Author: Afify Mohamed
Deng Yonggang
Erdoğan Hakan
Gao Yuqing
Sarıkaya Ruhi
Publication venue: Institute of Electrical and Electronics Engineers (IEEE)
Publication date: 01/01/2007

Language modeling for an inflected language such as Arabic poses new challenges for speech recognition and machine translation due to its rich morphology. Rich morphology results in large increases in the out-of-vocabulary (OOV) rate and in poor language model parameter estimation in the absence of large quantities of data. In this study, we present a joint morphological-lexical language model (JMLLM) that takes advantage of Arabic morphology. JMLLM combines morphological segments with the underlying lexical items, together with additional available information sources about those segments and items, in a single joint model. Joint representation and modeling of morphological and lexical items reduces the OOV rate and provides smooth probability estimates while keeping the predictive power of whole words. Speech recognition and machine translation experiments in dialectal Arabic show improvements over word- and morpheme-based trigram language models. We also show that as the integration between different information sources becomes tighter, both speech recognition and machine translation performance improve.

CiteSeerX

Crossref

Sabanci University Research Database
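The abstract's claim that morphological segments lower the OOV rate can be illustrated with a toy Python sketch. The affix lists, the greedy `segment` splitter, and the transliterated word forms are all hypothetical stand-ins for a real Arabic morphological analyzer, and the sketch models only the vocabulary-coverage effect, not JMLLM's joint model or its probability estimates.

```python
# Hypothetical affix inventories, loosely modeled on transliterated Arabic clitics.
PREFIXES = ("wa", "al")
SUFFIXES = ("ha", "na")


def segment(word):
    """Greedily strip at most one known prefix and one known suffix.

    A crude stand-in for a real morphological analyzer: each affix is
    marked with '+' so it stays distinct from stems.
    """
    parts = []
    for p in PREFIXES:
        if word.startswith(p) and len(word) > len(p) + 1:
            parts.append(p + "+")
            word = word[len(p):]
            break
    suffix = None
    for s in SUFFIXES:
        if word.endswith(s) and len(word) > len(s) + 1:
            suffix = "+" + s
            word = word[: -len(s)]
            break
    parts.append(word)
    if suffix is not None:
        parts.append(suffix)
    return parts


def oov_rate(train_tokens, test_tokens):
    """Fraction of test tokens never seen in training."""
    vocab = set(train_tokens)
    unseen = sum(1 for t in test_tokens if t not in vocab)
    return unseen / len(test_tokens)


def to_morphemes(words):
    """Flatten a word list into its morpheme-like segments."""
    return [m for w in words for m in segment(w)]


train_words = ["wakitab", "kitabha", "albayt", "baytna"]
test_words = ["kitabna", "wabayt", "baytha"]  # none seen as whole words

word_oov = oov_rate(train_words, test_words)
morph_oov = oov_rate(to_morphemes(train_words), to_morphemes(test_words))
print(f"word-level OOV: {word_oov:.2f}, morpheme-level OOV: {morph_oov:.2f}")
# prints: word-level OOV: 1.00, morpheme-level OOV: 0.00
```

Here every test word is unseen as a whole word, yet each decomposes into a known stem plus a known affix, which is the coverage effect the abstract attributes to morphological segmentation.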

Using Description Logics for Recognising Textual Entailment

Author: Bedaride Paul
Publication date: 17/08/2007

The aim of this paper is to show how the Recognising Textual Entailment (RTE) task can be handled using Description Logics (DLs). To do this, we propose a representation of natural language semantics in DLs inspired by existing representations in first-order logic. Our most significant contribution, however, is the definition of two novel inference tasks, A-Box saturation and subgraph detection, which are crucial to our approach to RTE.

arXiv.org e-Print Archive

INRIA a CCSD electronic archive server
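As a rough illustration of what saturating an A-Box can mean, the following Python sketch forward-chains a few simple TBox axioms (atomic concept inclusions plus role domain and range restrictions) over a set of assertions until nothing new can be derived. The axiom forms, the `saturate` function, and the example individuals are assumptions chosen for illustration; the paper's own definitions of A-Box saturation and subgraph detection are not reproduced here.

```python
def saturate(concepts, roles, subsumptions, domains, ranges):
    """Forward-chain simple TBox axioms over an ABox until a fixpoint.

    concepts:        set of (ConceptName, individual) assertions
    roles:           set of (role, subject, object) assertions
    subsumptions:    set of (SubConcept, SuperConcept) atomic inclusions
    domains, ranges: dict mapping a role name to a ConceptName
    """
    derived = set(concepts)
    # Domain/range axioms depend only on role assertions, which never change here.
    for role, subj, obj in roles:
        if role in domains:
            derived.add((domains[role], subj))
        if role in ranges:
            derived.add((ranges[role], obj))
    # Propagate concept inclusions until no new assertion appears.
    changed = True
    while changed:
        changed = False
        for concept, ind in list(derived):
            for sub, sup in subsumptions:
                if concept == sub and (sup, ind) not in derived:
                    derived.add((sup, ind))
                    changed = True
    return derived


# Hypothetical TBox and ABox for the text "A man walks a dog."
subsumptions = {("Man", "Person"), ("Dog", "Animal"), ("Animal", "LivingThing")}
domains = {"walks": "Agent"}
ranges = {}
text_concepts = {("Man", "m"), ("Dog", "d")}
text_roles = {("walks", "m", "d")}

saturated = saturate(text_concepts, text_roles, subsumptions, domains, ranges)

# Ground hypothesis "A person walks an animal." over the same individuals.
hyp_concepts = {("Person", "m"), ("Animal", "d")}
hyp_roles = {("walks", "m", "d")}
entailed = hyp_concepts <= saturated and hyp_roles <= text_roles
print(entailed)  # True
```

The final check only tests containment of ground assertions over the same individuals. Matching a hypothesis whose individuals are not yet aligned with those of the text is closer to what the abstract calls subgraph detection, which this sketch does not attempt.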

Robust Subgraph Generation Improves Abstract Meaning Representation Parsing

Author: Angeli Gabor
Manning Christopher
Werling Keenon
Publication date: 09/06/2015

The Abstract Meaning Representation (AMR) is a representation for open-domain rich semantics, with potential use in fields like event extraction and machine translation. Node generation, typically done using a simple dictionary lookup, is currently an important limiting factor in AMR parsing. We propose a small set of actions that derive AMR subgraphs by transformations on spans of text, which allows for more robust learning of this stage. Our set of construction actions generalizes better than the previous approach and can be learned with a simple classifier. We improve on the previous state-of-the-art result for AMR parsing, boosting end-to-end performance by 3 F1 on both the LDC2013E117 and LDC2014T12 datasets.
Comment: To appear in ACL 201

arXiv.org e-Print Archive

CiteSeerX
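The "3 F1" gain refers to an F1 score over AMR graphs. AMR parsers are commonly scored with Smatch, which compares graphs as sets of triples. The hypothetical Python sketch below computes triple-level precision, recall, and F1 while assuming the variable names are already aligned; Smatch itself additionally searches for the variable mapping that maximizes the match, which is omitted here.

```python
def triple_f1(predicted, gold):
    """F1 over (source, relation, target) triples.

    Assumes variables in `predicted` and `gold` already share names;
    real Smatch also searches for the best variable alignment.
    """
    pred, ref = set(predicted), set(gold)
    matched = len(pred & ref)
    if matched == 0:
        return 0.0
    precision = matched / len(pred)
    recall = matched / len(ref)
    return 2 * precision * recall / (precision + recall)


# Hypothetical graphs in which the parser mislabels one node concept.
gold = {("w", "instance", "want-01"), ("b", "instance", "boy"), ("w", "ARG0", "b")}
predicted = {("w", "instance", "want-01"), ("b", "instance", "girl"), ("w", "ARG0", "b")}

# Scores are usually reported on a 0-100 scale.
print(f"F1 = {100 * triple_f1(predicted, gold):.1f}")  # prints F1 = 66.7
```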

Thematic Annotation: extracting concepts out of documents

Author: Andrews Pierre
Rajman Martin
Publication date: 29/12/2004

Contrary to standard approaches to topic annotation, the technique used in this work does not rely centrally on some sort of (possibly statistical) keyword extraction. Rather, the proposed annotation algorithm uses a large-scale semantic database, the EDR Electronic Dictionary, which provides a concept hierarchy based on hyponym and hypernym relations. This concept hierarchy is used to generate a synthetic representation of the document by aggregating the words present in topically homogeneous document segments into a set of concepts that best preserves the document's content. This new extraction technique takes an unexplored approach to topic selection. Instead of using semantic similarity measures based on a semantic resource, the latter is processed to extract the part of the conceptual hierarchy relevant to the document content. This conceptual hierarchy is then searched to extract the set of concepts that best represents the topics discussed in the document. Note that this algorithm is able to extract generic concepts that are not directly present in the document.
Comment: Technical report EPFL/LIA. 81 pages, 16 figures

arXiv.org e-Print Archive

Infoscience - École polytechnique fédérale de Lausanne
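The idea of aggregating document words into a covering concept that may not appear in the text can be sketched over a toy hypernym hierarchy. The `PARENT` table, the coverage threshold, and the "deepest sufficiently covering concept" rule below are hypothetical simplifications in Python; they are not the EDR Electronic Dictionary or the report's actual selection algorithm, which works on topically homogeneous segments and returns a set of concepts rather than one.

```python
from collections import Counter

# Hypothetical toy hypernym hierarchy (child -> parent); not EDR data.
PARENT = {
    "dog": "mammal", "cat": "mammal", "sparrow": "bird",
    "mammal": "animal", "bird": "animal", "animal": "entity",
    "car": "vehicle", "vehicle": "artifact", "artifact": "entity",
}


def ancestors(node):
    """Return the node followed by all of its hypernyms up to the root."""
    path = [node]
    while node in PARENT:
        node = PARENT[node]
        path.append(node)
    return path


def depth(node):
    """Distance from the node to the root (the root has depth 0)."""
    return len(ancestors(node)) - 1


def best_concept(words, min_coverage=0.75):
    """Most specific concept subsuming at least `min_coverage` of the words."""
    coverage = Counter(c for w in words for c in ancestors(w))
    candidates = [c for c, n in coverage.items() if n / len(words) >= min_coverage]
    return max(candidates, key=depth)


segment_words = ["dog", "cat", "sparrow", "car"]
print(best_concept(segment_words))  # animal
```

With these four words, `animal` covers three of them and is more specific than `entity`, so it is selected even though it never occurs in the input; raising `min_coverage` to 1.0 falls back to the root.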

On Yao's method of translation

Author: Hoede C.
Liu X.
Publication venue: Department of Applied Mathematics, University of Twente
Publication date: 01/01/2002

Machine Translation, i.e., translating one natural language into another using a computer system, is a very important research branch in Artificial Intelligence. Yao developed a method of translation that he called "Lexical-Semantic Driven". In his system he introduced 49 "relation types", including case relations, event relations, semantic relations, and complex relations. The knowledge graph method is a new kind of method for representing an interlingua between natural languages. In this paper, we compare these two methods. We translate one Chinese sentence cited in Yao's book using both methods. Finally, we use the relations in knowledge graph theory to represent the "relations" in Lexical-Semantic Driven, and partition the relations in Lexical-Semantic Driven into groups according to the relations in knowledge graph theory.

University of Twente Research Information

The Organization of Words in Mental Lexicon: Evidence From Word Association Test

Author: Afrilita L. K. (Lidia)
Pranoto B. E. (Budi)
Publication venue: Universitas Teknokrat Indonesia
Publication date: 01/01/2018

In both psychology and linguistics, memory is one of the core interests among researchers. In linguistics, memory is where the language processes of perception, storage, and access of words take place. In memory, words are stored in complex, clear, well-organized, and ordered networks of nodes, which can be compared to the World Wide Web. In psycholinguistics, this organization of words is referred to as the mental lexicon. This study aims to investigate what kinds of node representations are stored in the mental lexicon of foreign language learners. The Word Association Test (WAT), a well-known method in both psychology and linguistics, is employed with the English Swadesh word list as the stimulus to elicit lexical relations among words. The basic principle of the test is to give respondents a stimulus and ask them for the very first word that comes to mind. Respondents are undergraduate students of English Literature at a university in Indonesia. The findings of this research support previous findings that non-native speakers tend to make syntagmatic associations, mostly dominated by collocations. Interestingly, the findings also show that the word network in the mental lexicon develops dynamically based on the respondents' experience and perception.

Neliti