Evaluation of Computational Grammar Formalisms for Indian Languages

Author: Joshi Nisheeth
Mathur Iti
Publication date: 01/11/2010

Natural Language Parsing has been the most prominent research area since the genesis of Natural Language Processing. Probabilistic parsers are being developed to make parser development much easier, more accurate and faster. In the Indian context, which Computational Grammar Formalism should be used is still an open question. In this paper we focus on this problem and analyze different formalisms for Indian languages.

CogPrints Cognitive Sciences Eprint Archive

Cross-lingual RST Discourse Parsing

Author: Braud Chloé
Coavoux Maximin
Søgaard Anders
Publication date: 01/01/2017

Discourse parsing is an integral part of understanding information flow and argumentative structure in documents. Most previous research has focused on inducing and evaluating models from the English RST Discourse Treebank. However, discourse treebanks for other languages exist, including Spanish, German, Basque, Dutch and Brazilian Portuguese. The treebanks share the same underlying linguistic theory, but differ slightly in the way documents are annotated. In this paper, we present (a) a new discourse parser which is simpler than, yet competitive with, the state of the art for English (significantly better on 2/3 metrics), (b) a harmonization of discourse treebanks across languages, enabling us to present (c) what to the best of our knowledge are the first experiments on cross-lingual discourse parsing. Comment: To be published in EACL 2017, 13 pages

arXiv.org e-Print Archive

Crossref

Copenhagen University Research Information System

A Pāniniān Framework for Analyzing Case Marker Errors in English-Urdu Machine Translation

Author: Behera Pitambar
Jha Girish Nath
Muzaffar Sharmin
Publication venue: The Author(s). Published by Elsevier B.V.
Publication date: 31/12/2016

Panini's Kāraka Theory is solely based on the syntactico-semantic approach to understanding a natural language, which takes into consideration the arguments of the verbs. It provides a framework for exhibiting the syntactic relations among constituents in terms of modifier-modified and semantic relations with respect to Kāraka-Vibhakt̪i (semantic role and postposition). In this paper, it is argued that the Pāniniān Dependency Framework can be used to deal with MT errors, with special reference to case. Firstly, a corpus of approximately 500 English sentences was provided as input to the Google and Bing online MT platforms. Secondly, all the output sentences in Urdu were collated in bulk. Thirdly, all the sentences were evaluated and errors pertaining to case were categorized based on the Gold Standard. Finally, the Pāniniān dependency framework is proposed for addressing case-related errors for Indian languages.

Elsevier - Publisher Connector

Crossings as a side effect of dependency lengths

Author: Bick
Christensen
Conover
Ferrer-i-Cancho
Futrell
Gibson
Gildea
Gómez-Rodríguez
Hays
Hochberg
Hudson
Iwatate
Jiang
Kawata
Kelih
Liu
Lu
Newman
Poirier
Popper
Prokhorov
Ramasamy
Tanaka
Temperley
Publication venue: Wiley
Publication date: 01/01/2016

The syntactic structure of sentences exhibits a striking regularity: dependencies tend to not cross when drawn above the sentence. We investigate two competing explanations. The traditional hypothesis is that this trend arises from an independent principle of syntax that reduces crossings practically to zero. An alternative to this view is the hypothesis that crossings are a side effect of dependency lengths, i.e. sentences with shorter dependency lengths should tend to have fewer crossings. We are able to reject the traditional view in the majority of languages considered. The alternative hypothesis can lead to a more parsimonious theory of language. Comment: the discussion section has been expanded significantly; in press in Complexity (Wiley)

arXiv.org e-Print Archive

CiteSeerX

LAReferencia - Red Federada de Repositorios Institucionales de Publicaciones Científicas Latinoamericanas

Crossref

UPCommons. Portal del coneixement obert de la UPC

Linguistically-Informed Neural Architectures for Lexical, Syntactic and Semantic Tasks in Sanskrit

Author: Sandhan Jivnesh
Publication date: 17/08/2023

The primary focus of this thesis is to make Sanskrit manuscripts more accessible to end-users through natural language technologies. The morphological richness, compounding, free word order, and low-resource nature of Sanskrit pose significant challenges for developing deep learning solutions. We identify four fundamental tasks that are crucial for developing robust NLP technology for Sanskrit: word segmentation, dependency parsing, compound type identification, and poetry analysis. The first task, Sanskrit Word Segmentation (SWS), is a fundamental text processing task for any downstream application. However, it is challenging due to the sandhi phenomenon, which modifies characters at word boundaries. Similarly, existing dependency parsing approaches struggle with morphologically rich and low-resource languages like Sanskrit. Compound type identification is also challenging for Sanskrit due to the context-sensitive semantic relation between components. All these challenges result in sub-optimal performance in NLP applications like question answering and machine translation. Finally, Sanskrit poetry has not been extensively studied in computational linguistics. While addressing these challenges, this thesis makes various contributions: (1) The thesis proposes linguistically-informed neural architectures for these tasks. (2) We showcase the interpretability and multilingual extension of the proposed systems. (3) Our proposed systems report state-of-the-art performance. (4) Finally, we present a neural toolkit named SanskritShala, a web-based application that provides real-time analysis of input for various NLP tasks. Overall, this thesis contributes to making Sanskrit manuscripts more accessible by developing robust NLP technology and releasing various resources, datasets, and a web-based toolkit. Comment: Ph.D. dissertation

arXiv.org e-Print Archive