Search CORE

1,737 research outputs found

A Survey of Paraphrasing and Textual Entailment Methods

Author: Androutsopoulos Ion
Malakasiotis Prodromos
Publication venue: 'AI Access Foundation'
Publication date: 30/05/2010
Field of study

Paraphrasing methods recognize, generate, or extract phrases, sentences, or longer natural language expressions that convey almost the same information. Textual entailment methods, on the other hand, recognize, generate, or extract pairs of natural language expressions, such that a human who reads (and trusts) the first element of a pair would most likely infer that the other element is also true. Paraphrasing can be seen as bidirectional textual entailment and methods from the two areas are often similar. Both kinds of methods are useful, at least in principle, in a wide range of natural language processing applications, including question answering, summarization, text generation, and machine translation. We summarize key ideas from the two areas by considering in turn recognition, generation, and extraction methods, also pointing to prominent articles and resources.Comment: Technical Report, Natural Language Processing Group, Department of Informatics, Athens University of Economics and Business, Greece, 201

arXiv.org e-Print Archive

Crossref

Treebank-based acquisition of LFG resources for Chinese

Author: Guo Yuqing
van Genabith Josef
Wang Haifeng
Publication venue: CSLI Publications
Publication date: 01/01/2007
Field of study

This paper presents a method to automatically acquire wide-coverage, robust, probabilistic Lexical-Functional Grammar resources for Chinese from the Penn Chinese Treebank (CTB). Our starting point is the earlier, proofof- concept work of (Burke et al., 2004) on automatic f-structure annotation, LFG grammar acquisition and parsing for Chinese using the CTB version 2 (CTB2). We substantially extend and improve on this earlier research as regards coverage, robustness, quality and fine-grainedness of the resulting LFG resources. We achieve this through (i) improved LFG analyses for a number of core Chinese phenomena; (ii) a new automatic f-structure annotation architecture which involves an intermediate dependency representation; (iii) scaling the approach from 4.1K trees in CTB2 to 18.8K trees in CTB version 5.1 (CTB5.1) and (iv) developing a novel treebank-based approach to recovering non-local dependencies (NLDs) for Chinese parser output. Against a new 200-sentence good standard of manually constructed f-structures, the method achieves 96.00% f-score for f-structures automatically generated for the original CTB trees and 80.01%for NLD-recovered f-structures generated for the trees output by Bikel’s parser

Irish Universities

DCU Online Research Access Service

Scientific Information Extraction with Semi-supervised Neural Tagging

Author: Hajishirzi Hannaneh
Luan Yi
Ostendorf Mari
Publication venue
Publication date: 01/01/2017
Field of study

This paper addresses the problem of extracting keyphrases from scientific articles and categorizing them as corresponding to a task, process, or material. We cast the problem as sequence tagging and introduce semi-supervised methods to a neural tagging model, which builds on recent advances in named entity recognition. Since annotated training data is scarce in this domain, we introduce a graph-based semi-supervised algorithm together with a data selection scheme to leverage unannotated articles. Both inductive and transductive semi-supervised learning strategies outperform state-of-the-art information extraction performance on the 2017 SemEval Task 10 ScienceIE task.Comment: accepted by EMNLP 201

arXiv.org e-Print Archive

Crossref

Contrastive Analysis with Predictive Power: Typology Driven Estimation of Grammatical Error Distributions in ESL

Author: Berzak Yevgeni
Katz Boris
Reichart Roi
Publication venue: Center for Brains, Minds and Machines (CBMM), arXiv
Publication date: 20/01/2016
Field of study

This work examines the impact of crosslinguistic transfer on grammatical errors in English as Second Language (ESL) texts. Using a computational framework that formalizes the theory of Contrastive Analysis (CA), we demonstrate that language specific error distributions in ESL writing can be predicted from the typological properties of the native language and their relation to the typology of English. Our typology driven model enables to obtain accurate estimates of such distributions without access to any ESL data for the target languages. Furthermore, we present a strategy for adjusting our method to low-resource languages that lack typological documentation using a bootstrapping approach which approximates native language typology from ESL texts. Finally, we show that our framework is instrumental for linguistic inquiry seeking to identify first language factors that contribute to a wide range of difficulties in second language acquisition.This work was supported by the Center for Brains, Minds and Machines (CBMM), funded by NSF STC award CCF – 1231216

arXiv.org e-Print Archive

CiteSeerX

DSpace@MIT

Do infants have abstract grammatical knowledge of word order at 17 months? Evidence from Mandarin Chinese

Author: Franck Julie
Gavarró Algueró Anna
Rizzi Luigi
Università degli Studi di Siena. Centro Interdipartimentale di Studi Cognitivi sul Linguaggio
Zhu Jingtao
Publication venue: 'Cambridge University Press (CUP)'
Publication date: 01/01/2022
Field of study

We test the comprehension of transitive sentences in very young learners of Mandarin Chinese using a combination of the weird word order paradigm with the use of pseudo-verbs and the preferential looking paradigm, replicating the experiment of Franck et al. (2013) on French. Seventeen typically-developing Mandarin infants (mean age: 17.4 months) participated and the same experiment was conducted with eighteen adults. The results show that hearing well-formed NP-V-NP sentences triggered infants to fixate more on a transitive scene than on a reflexive scene. In contrast, when they heard deviant NP-NP-V sequences, no such preference pattern was found, a performance pattern that is adult-like. This is at variance with some of the results from Candan et al. (2012), who only found evidence for canonical word order comprehension at almost age 3 when considering fixation time. Furthermore, within the age range tested, performance showed no effect of age or vocabulary size

Diposit Digital de Documents de la UAB

Recommended from our members

Automatic Segmentation and Part-Of-Speech Tagging For Tibetan: A First Step Towards Machine Translation

Author: Hackett Paul G.
Publication venue: 'Columbia University Libraries/Information Services'
Publication date: 01/01/2000
Field of study

This paper presents what we believe to be the first reported work on Tibetan machine translation (MT). Of the three conceptually distinct components of a MT system — analysis, transfer, and generation — the first phase, consisting of POS tagging has been successfully completed. The combination POS tagger / word-segmenter was manually constructed as a rule-based multi-tagger relying on the Wilson formulation of Tibetan grammar. Partial parsing was also performed in combination with POS-tag sequence disambiguation. The component was evaluated at the task of document indexing for Information Retrieval (IR). Preliminary analysis indicated slightly better (though statistically comparable) performance to n-gram based approaches at a known-item IR task. Although segmentation is application specific, error analysis placed segmentation accuracy at 99%; the accuracy of the POS tagger is also estimated at 99% based on IR error analysis and random sampling

Columbia University Academic Commons

A Bootstrapping Method for Finer-Grained Opinion Mining Using Graph Model

Author: Meng Yao
Xia Yingju
Yu Hao
Zhang Shu
Publication venue: City University of Hong Kong
Publication date: 01/01/2009
Field of study

PACLIC 23 / City University of Hong Kong / 3-5 December 200

Waseda University Repository

Ontology verbalization in agglutinating Bantu languages: a study of Runyankore and its generalizability

Author: Byamugisha Joan
Publication venue: Department of Computer Science
Publication date: 04/03/2020
Field of study

Natural Language Generation (NLG) systems have been developed to generate text in multiple domains, including personalized patient information. However, their application is limited in Africa because they generate text in English, yet indigenous languages are still predominantly spoken throughout the continent, especially in rural areas. The existing healthcare NLG systems cannot be reused for Bantu languages due to the complex grammatical structure, nor can the generated text be used in machine translation systems for Bantu languages because they are computationally under-resourced. This research aimed to verbalize ontologies in agglutinating Bantu languages. We had four research objectives: (1) noun pluralization and verb conjugation in Runyankore; (2) Runyankore verbalization patterns for the selected description logic constructors; (3) combining the pluralization, conjugation, and verbalization components to form a Runyankore grammar engine; and (4) generalizing the Runyankore and isiZulu approaches to ontology verbalization to other agglutinating Bantu languages. We used an approach that combines morphology with syntax and semantics to develop a noun pluralizer for Runyankore, and used Context-Free Grammars (CFGs) for verb conjugation. We developed verbalization algorithms for eight constructors in a description logic. We then combined these components into a grammar engine developed as a Protégé5X plugin. The investigation into generalizability used the bootstrap approach, and investigated bootstrapping for languages in the same language zone (intra-zone bootstrappability) and languages across language zones (inter-zone bootstrappability). We obtained verbalization patterns for Luganda and isiXhosa, in the same zones as Runyankore and isiZulu respectively, and chiShona, Kikuyu, and Kinyarwanda from different zones, and used the bootstrap metric that we developed to identify the most efficient source—target bootstrap pair. By regrouping Meinhof’s noun class system we were able to eliminate non-determinism during computation, and this led to the development of a generic noun pluralizer. We also showed that CFGs can conjugate verbs in the five additional languages. Finally, we proposed the architecture for an API that could be used to generate text in agglutinating Bantu languages. Our research provides a method for surface realization for an under-resourced and grammatically complex family of languages, Bantu languages. We leave the development of a complete NLG system based on the Runyankore grammar engine and of the API as areas for future work

Cape Town University OpenUCT