Joint morphological-lexical language modeling for processing morphologically rich languages with application to dialectal Arabic
Language modeling for an inflected language
such as Arabic poses new challenges for speech recognition and
machine translation due to its rich morphology. Rich morphology
results in large increases in out-of-vocabulary (OOV) rate and
poor language model parameter estimation in the absence of large
quantities of data. In this study, we present a joint
morphological-lexical language model (JMLLM) that takes
advantage of Arabic morphology. JMLLM combines
morphological segments with the underlying lexical items and
additional available information sources with regard to
morphological segments and lexical items in a single joint model.
Joint representation and modeling of morphological and lexical
items reduces the OOV rate and provides smooth probability
estimates while keeping the predictive power of whole words.
Speech recognition and machine translation experiments in
dialectal Arabic show improvements over word- and morpheme-based
trigram language models. We also show that as the
tightness of integration between different information sources
increases, both speech recognition and machine translation
performance improve.
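The OOV claim in this abstract can be illustrated with a small sketch. It is not the JMLLM itself: the affix lists, the `segment` helper, and the transliterated toy words are all hypothetical, chosen only to show how splitting words into morphological segments can lower the out-of-vocabulary rate on unseen word forms.

```python
# Toy affix inventories (hypothetical, not taken from the paper).
PREFIXES = ("wa", "al", "bi")
SUFFIXES = ("ha", "at", "na")


def segment(word):
    """Strip at most one prefix and one suffix, keeping a stem of 3+ characters."""
    parts = []
    stem = word
    for p in PREFIXES:
        if stem.startswith(p) and len(stem) - len(p) >= 3:
            parts.append(p + "+")
            stem = stem[len(p):]
            break
    suffix = None
    for s in SUFFIXES:
        if stem.endswith(s) and len(stem) - len(s) >= 3:
            suffix = "+" + s
            stem = stem[:-len(s)]
            break
    parts.append(stem)
    if suffix:
        parts.append(suffix)
    return parts


def oov_rate(train_tokens, test_tokens):
    """Fraction of test tokens that never occur in the training tokens."""
    vocab = set(train_tokens)
    unseen = sum(1 for tok in test_tokens if tok not in vocab)
    return unseen / len(test_tokens)


train_words = ["kitab", "alkitab", "wakitab", "bayt", "baytha"]
test_words = ["albayt", "kitabha", "wabayt"]

# Word-level units: every test form is unseen.
word_oov = oov_rate(train_words, test_words)

# Morpheme-level units: the same forms decompose into seen segments.
train_morphs = [m for w in train_words for m in segment(w)]
test_morphs = [m for w in test_words for m in segment(w)]
morph_oov = oov_rate(train_morphs, test_morphs)

print(word_oov, morph_oov)
```

On this toy data every test word is unseen as a whole word, while all of its segments occur in training. The abstract's point is that a joint model aims to keep this coverage without giving up the predictive power of whole words.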
Using Description Logics for Recognising Textual Entailment
The aim of this paper is to show how we can handle the Recognising Textual
Entailment (RTE) task by using Description Logics (DLs). To do this, we propose
a representation of natural language semantics in DLs inspired by existing
representations in first-order logic. But our most significant contribution is
the definition of two novel inference tasks: A-Box saturation and subgraph
detection, which are crucial for our approach to RTE.
Robust Subgraph Generation Improves Abstract Meaning Representation Parsing
The Abstract Meaning Representation (AMR) is a representation for open-domain
rich semantics, with potential use in fields like event extraction and machine
translation. Node generation, typically done using a simple dictionary lookup,
is currently an important limiting factor in AMR parsing. We propose a small
set of actions that derive AMR subgraphs by transformations on spans of text,
which allows for more robust learning of this stage. Our set of construction
actions generalizes better than the previous approach, and can be learned with a
simple classifier. We improve on the previous state-of-the-art result for AMR
parsing, boosting end-to-end performance by 3 F1 on both the LDC2013E117 and
LDC2014T12 datasets. Comment: To appear in ACL 201
Thematic Annotation: extracting concepts out of documents
Contrary to standard approaches to topic annotation, the technique used in
this work does not centrally rely on some sort of -- possibly statistical --
keyword extraction. In fact, the proposed annotation algorithm uses a large
scale semantic database -- the EDR Electronic Dictionary -- that provides a
concept hierarchy based on hyponym and hypernym relations. This concept
hierarchy is used to generate a synthetic representation of the document by
aggregating the words present in topically homogeneous document segments into a
set of concepts best preserving the document's content.
This new extraction technique uses an unexplored approach to topic selection.
Instead of using semantic similarity measures based on a semantic resource, the
latter is processed to extract the part of the conceptual hierarchy relevant to
the document content. Then this conceptual hierarchy is searched to extract the
most relevant set of concepts to represent the topics discussed in the
document. Notice that this algorithm is able to extract generic concepts that
are not directly present in the document. Comment: Technical report EPFL/LIA. 81 pages, 16 figures
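The idea of aggregating document words into more generic concepts through a hypernym hierarchy can be sketched as follows. This is only an illustration under stated assumptions: the `HYPERNYM` table is a hypothetical toy hierarchy (not the EDR Electronic Dictionary), and the `min_cover` selection rule is a simple stand-in, not the report's actual algorithm.

```python
from collections import Counter

# Hypothetical toy hierarchy: each entry maps a term to its direct hypernym.
HYPERNYM = {
    "dog": "mammal", "cat": "mammal", "mammal": "animal",
    "sparrow": "bird", "bird": "animal", "animal": "entity",
    "car": "vehicle", "truck": "vehicle", "vehicle": "entity",
}


def ancestors(word):
    """Return the hypernym chain of a word, most specific concept first."""
    chain = []
    node = word
    while node in HYPERNYM:
        node = HYPERNYM[node]
        chain.append(node)
    return chain


def covering_concepts(words, min_cover=2):
    """Map each word to its most specific ancestor covering >= min_cover words."""
    unique_words = set(words)
    coverage = Counter()
    for w in unique_words:
        for concept in ancestors(w):
            coverage[concept] += 1
    selected = set()
    for w in unique_words:
        for concept in ancestors(w):  # walk upward from the word
            if coverage[concept] >= min_cover:
                selected.add(concept)
                break
    return sorted(selected)


segment_words = ["dog", "cat", "sparrow", "car", "truck"]
print(covering_concepts(segment_words))
```

None of the selected concepts occurs in the input words, which mirrors the abstract's remark that generic concepts absent from the document can still be extracted.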
On Yao's method of translation
Machine translation, i.e., translating one natural language into another by using a computer system, is a very important research branch of Artificial Intelligence. Yao developed a method of translation that he called "Lexical-Semantic Driven". In his system he introduced 49 "relation types", including case relations, event relations, semantic relations, and complex relations. The knowledge graph method is a new method for representing an interlingua between natural languages. In this paper, we compare these two methods. We translate one Chinese sentence cited in Yao's book by using both methods. Finally, we use the relations of knowledge graph theory to represent the "relations" in Lexical-Semantic Driven, and partition the relations in Lexical-Semantic Driven into groups according to the relations of knowledge graph theory.
The Organization of Words in Mental Lexicon: Evidence From Word Association Test
In both psychology and linguistics, memory is one of the core interests of researchers. In linguistics, memory is the place where language processes, consisting of the perception, storage, and access of words, take place. Words in memory are stored in complex, clear, well-organized, and ordered networks of nodes, which can be compared to the World Wide Web. In psycholinguistics, this organization of words is referred to as the mental lexicon. This study aims to investigate what kind of node representation is stored in the mental lexicon of foreign language learners. The Word Association Test (WAT), a well-known method in both psychology and linguistics, is employed, using the English Swadesh word list as the stimulus to elicit the lexical relations among words. The basic principle of the test is to give respondents a stimulus and ask them for the very first word that comes to mind. The respondents are undergraduate students of English Literature at a university in Indonesia. The findings support previous findings that non-native speakers tend to make syntagmatic relations, which are mostly dominated by collocational associations. Interestingly, the findings also show that the word network in the mental lexicon develops dynamically, based on the experience and perception of the respondents.