Statistical Function Tagging and Grammatical Relations of Myanmar Sentences

Author: Htwe Tin Myat
Thant Win Win
Thein Ni Lar
Publication date: 25/09/2011

This paper describes context-free grammar (CFG) based grammatical relations for Myanmar sentences, combined with a corpus-based function tagging system. Part of the challenge of statistical function tagging for Myanmar sentences comes from the fact that Myanmar has free phrase order and a complex morphological system. Function tagging is a pre-processing step for showing the grammatical relations of Myanmar sentences. In the function tagging task, which tags the functions of Myanmar sentences given correct segmentation, part-of-speech (POS) tagging, and chunking information, we use Naive Bayesian theory to disambiguate the possible function tags of a word. We apply a context-free grammar (CFG) to find the grammatical relations of the function tags. We also create a functionally annotated tagged corpus for Myanmar and propose grammar rules for Myanmar sentences. Experiments show that our analysis achieves good results on both simple and complex sentences. Comment: 16 pages, 7 figures, 8 tables, AIAA-2011 (India). arXiv admin note: text overlap with arXiv:0912.1820 by other author
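The abstract says that Naive Bayesian theory is used to choose among the possible function tags of a word. A minimal sketch of that idea follows. It is not the authors' implementation: the feature names (`pos=...`, `marker=...`), the tag labels (`Subj`, `Obj`), the toy training data, and the add-one smoothing are all assumptions made for illustration.

```python
import math
from collections import Counter, defaultdict


def train(samples):
    """Count tags and per-tag features from (features, tag) pairs."""
    tag_counts = Counter()
    feat_counts = defaultdict(Counter)
    vocab = set()
    for feats, tag in samples:
        tag_counts[tag] += 1
        for f in feats:
            feat_counts[tag][f] += 1
            vocab.add(f)
    return tag_counts, feat_counts, vocab


def predict(model, feats):
    """Return the tag maximising log P(tag) + sum of log P(feature | tag)."""
    tag_counts, feat_counts, vocab = model
    total = sum(tag_counts.values())
    best_tag, best_score = None, float("-inf")
    for tag, count in tag_counts.items():
        score = math.log(count / total)
        # Add-one smoothing (an assumption) keeps unseen features from zeroing the product.
        denom = sum(feat_counts[tag].values()) + len(vocab)
        for f in feats:
            score += math.log((feat_counts[tag][f] + 1) / denom)
        if score > best_score:
            best_tag, best_score = tag, score
    return best_tag


# Hypothetical training data: each chunk is described by its POS and its marker.
samples = [
    (["pos=N", "marker=subj"], "Subj"),
    (["pos=N", "marker=obj"], "Obj"),
    (["pos=PRON", "marker=subj"], "Subj"),
    (["pos=N", "marker=obj"], "Obj"),
]
model = train(samples)
subject_guess = predict(model, ["pos=N", "marker=subj"])
object_guess = predict(model, ["pos=N", "marker=obj"])
```

In this toy setting the marker feature decides the outcome, because the POS feature appears under both tags.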

arXiv.org e-Print Archive

CiteSeerX

MERAL Portal

Using machine-learning to assign function labels to parser output for Spanish

Author: Chrupała Grzegorz
van Genabith Josef
Publication venue: 'Association for Computational Linguistics (ACL)'
Publication date: 01/01/2006

Data-driven grammatical function tag assignment has been studied for English using the Penn-II Treebank data. In this paper we address the question of whether such methods can be applied successfully to other languages and treebank resources. In addition to tag assignment accuracy and f-scores, we also present results of a task-based evaluation. We use three machine-learning methods to assign Cast3LB function tags to sentences parsed with Bikel’s parser trained on the Cast3LB treebank. The best performing method, SVM, achieves an f-score of 86.87% on gold-standard trees and 66.67% on parser output, a statistically significant improvement of 6.74% over the baseline. In a task-based evaluation we generate LFG functional-structures from the function tag-enriched trees. On this task we achieve an f-score of 75.67%, a statistically significant 3.4% improvement over the baseline.
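The f-scores quoted above combine precision and recall over function-tagged nodes. The sketch below shows that arithmetic on a toy example. The scoring protocol is an assumption (each item is a hypothetical `(node_id, tag)` pair, and a prediction counts only if both parts match); the paper's exact matching criteria are not given in the abstract.

```python
def f_score(gold, predicted):
    """Harmonic mean of precision and recall over (node_id, tag) pairs."""
    gold, predicted = set(gold), set(predicted)
    correct = len(gold & predicted)
    precision = correct / len(predicted) if predicted else 0.0
    recall = correct / len(gold) if gold else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Hypothetical example: three gold function tags, two predicted, one correct.
gold_tags = [(0, "SUJ"), (1, "CD"), (2, "CC")]
predicted_tags = [(0, "SUJ"), (1, "CI")]
score = f_score(gold_tags, predicted_tags)  # precision 1/2, recall 1/3
```

Under this kind of scoring, a node that is missing from the parser output can never be matched, which is one plausible reason scores on parser output fall below scores on gold-standard trees.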

CiteSeerX

Irish Universities

DCU Online Research Access Service

A Machine learning approach to POS tagging

Author: Màrquez Villodre Lluís
Padró Lluís
Rodríguez Hontoria Horacio
Publication date: 01/01/1997

We have applied inductive learning of statistical decision trees and relaxation labelling to the Natural Language Processing (NLP) task of morphosyntactic disambiguation (Part Of Speech Tagging). The learning process is supervised and obtains a language model oriented to resolve POS ambiguities. This model consists of a set of statistical decision trees expressing the distribution of tags and words in some relevant contexts. The acquired language models are complete enough to be directly used as sets of POS disambiguation rules, and include more complex contextual information than the simple collections of n-grams usually used in statistical taggers. We have implemented a simple, fast tagger that has been tested and evaluated on the Wall Street Journal (WSJ) corpus with remarkable accuracy. However, better results can be obtained by translating the trees into rules to feed a flexible relaxation-labelling-based tagger. In this direction we describe a tagger which is able to use information of any kind (n-grams, automatically acquired constraints, linguistically motivated manually written constraints, etc.), and in particular to incorporate the machine-learned decision trees. Simultaneously, we address the problem of tagging when only a small amount of training material is available, which is crucial in any process of constructing an annotated corpus from scratch. We show that quite high accuracy can be achieved with our system in this situation. Postprint (published version)

LAReferencia - Red Federada de Repositorios Institucionales de Publicaciones Científicas Latinoamericanas

UPCommons. Portal del coneixement obert de la UPC

Unsupervised continuous-valued word features for phrase-break prediction without a part-of-speech tagger.

Author: King Simon
Watts Oliver
Yamagishi Junichi
Publication date: 01/08/2011

Edinburgh Research Explorer

Towards Universal Semantic Tagging

Author: Abzianidze Lasha
Bos Johan
Publication date: 29/09/2017

The paper proposes the task of universal semantic tagging: tagging word tokens with language-neutral, semantically informative tags. We argue that the task, with its independent nature, contributes to better semantic analysis for wide-coverage multilingual text. We present the initial version of the semantic tagset and show that (a) the tags provide semantically fine-grained information, and (b) they are suitable for cross-lingual semantic parsing. An application of semantic tagging in the Parallel Meaning Bank supports both of these points, as the tags contribute to formal lexical semantics and their cross-lingual projection. As part of the application, we annotate a small corpus with the semantic tags and present a new baseline result for universal semantic tagging. Comment: 9 pages, International Conference on Computational Semantics (IWCS

arXiv.org e-Print Archive

Proceedings - University of Groningen

University of Groningen

ARTS repository - University of Groningen

Dissertations of the University of Groningen

Using Incomplete Information for Complete Weight Annotation of Road Networks -- Extended Version

Author: Jensen Christian S.
Kaul Manohar
Yang Bin
Publication date: 01/01/2013

We are witnessing increasing interest in the effective use of road networks. For example, to enable effective vehicle routing, weighted-graph models of transportation networks are used, where the weight of an edge captures some cost associated with traversing the edge, e.g., greenhouse gas (GHG) emissions or travel time. It is a precondition to using a graph model for routing that all edges have weights. Weights that capture travel times and GHG emissions can be extracted from GPS trajectory data collected from the network. However, GPS trajectory data typically lack the coverage needed to assign weights to all edges. This paper formulates and addresses the problem of annotating all edges in a road network with travel-cost-based weights from a set of trips in the network that cover only a small fraction of the edges, each with an associated ground-truth travel cost. A general framework is proposed to solve the problem. Specifically, the problem is modeled as a regression problem and solved by minimizing a judiciously designed objective function that takes into account the topology of the road network. In particular, the use of weighted PageRank values of edges is explored for assigning appropriate weights to all edges, and the property of directional adjacency of edges is also taken into account to assign weights. Empirical studies with weights capturing travel time and GHG emissions on two road networks (Skagen, Denmark, and North Jutland, Denmark) offer insight into the design properties of the proposed techniques and offer evidence that the techniques are effective. Comment: This is an extended version of "Using Incomplete Information for Complete Weight Annotation of Road Networks," which is accepted for publication in IEEE TKD
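The regression formulation described above can be illustrated with a deliberately simplified stand-in. The sketch below is not the paper's objective function: it fits edge weights so that the summed weights along each trip match the trip's observed cost, and adds a plain smoothness penalty between adjacent edges in place of the weighted PageRank and directional adjacency terms. The function name, parameters, and toy network are all hypothetical.

```python
def annotate_weights(num_edges, trips, adjacent, lam=1.0, lr=0.1, steps=500):
    """Gradient descent on
        sum over trips (sum of w[e] on the trip - cost)^2
        + lam * sum over adjacent pairs (w[i] - w[j])^2.
    The adjacency penalty is an assumed, simplified topology term.
    """
    w = [0.0] * num_edges
    for _ in range(steps):
        grad = [0.0] * num_edges
        # Data term: every edge on a trip shares that trip's residual.
        for edges, cost in trips:
            residual = sum(w[e] for e in edges) - cost
            for e in edges:
                grad[e] += 2.0 * residual
        # Topology term: pull adjacent edges toward similar weights.
        for i, j in adjacent:
            diff = w[i] - w[j]
            grad[i] += 2.0 * lam * diff
            grad[j] -= 2.0 * lam * diff
        w = [wi - lr * g for wi, g in zip(w, grad)]
    return w


# Toy network: three edges in a row; no trip ever covers edge 2.
trips = [([0], 10.0), ([0, 1], 20.0)]
adjacent = [(0, 1), (1, 2)]
weights = annotate_weights(3, trips, adjacent)
```

Edge 2 receives a weight only through the adjacency penalty, which is the basic mechanism by which topology lets sparse trip data annotate every edge. The step size `lr` is tuned for this toy problem and would need adjusting for larger networks.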

arXiv.org e-Print Archive

CiteSeerX

VBN

Hypergraph model of social tagging networks

Author: Blattner M
Cattuto C
Chuang Liu
Dellschaft K Staab S
Halpin H Robu V Shepherd H
Karypis G Aggarwal R Kumar V Shekhar S
Palla G
Sen S Lam S K Rashid A M Cosley D Frankowski D Osterhouse J Harper F M Riedl J
Shang M-S
Zi-Ke Zhang
Publication venue: 'IOP Publishing'
Publication date: 09/03/2010

The past few years have witnessed the great success of a new family of paradigms, so-called folksonomy, which allows users to freely associate tags with resources and efficiently manage them. In order to uncover the underlying structures and user behaviors in folksonomy, in this paper we propose an evolutionary hypergraph model to explain the emerging statistical properties. The present model introduces a novel mechanism in which one can not only assign tags to resources, but also retrieve resources via collaborative tags. We then compare the model with a real-world dataset: Del.icio.us. Indeed, the present model shows considerable agreement with the empirical data in the following aspects: power-law hyperdegree distributions, negative correlation between clustering coefficients and hyperdegrees, and small average distances. Furthermore, the model indicates that most tagging behaviors are motivated by labeling resources with tags, and tags play a significant role in effectively retrieving interesting resources and making acquaintance with congenial friends. The proposed model may shed some light on the in-depth understanding of the structure and function of folksonomy. Comment: 7 pages, 7 figures, 32 reference
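To make the hypergraph view concrete, the sketch below treats each tagging action as one hyperedge joining a user, a resource, and a tag, and counts how many hyperedges each node belongs to. Reading "hyperdegree" as this count is an assumption based on the usual definition; the data and names are invented for illustration and are not taken from the Del.icio.us dataset.

```python
from collections import Counter


def hyperdegrees(hyperedges):
    """Count, for every node, the number of hyperedges it takes part in.

    Each hyperedge is a (user, resource, tag) triple; nodes are keyed by
    (kind, name) so a user and a tag with the same name stay distinct.
    """
    degree = Counter()
    for user, resource, tag in hyperedges:
        degree[("user", user)] += 1
        degree[("resource", resource)] += 1
        degree[("tag", tag)] += 1
    return degree


# Hypothetical tagging actions.
actions = [
    ("alice", "page1", "python"),
    ("alice", "page2", "python"),
    ("bob", "page1", "tutorial"),
]
degree = hyperdegrees(actions)
# Histogram of hyperdegree values, the raw material for a degree distribution.
distribution = Counter(degree.values())
```

On real data, a power-law hyperdegree distribution would show up as an approximately straight line when `distribution` is plotted on log-log axes.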

arXiv.org e-Print Archive

Crossref

Three New Probabilistic Models for Dependency Parsing: An Exploration

Author: Eisner Jason
Publication date: 01/01/1997

After presenting a novel O(n^3) parsing algorithm for dependency grammar, we develop three contrasting ways to stochasticize it. We propose (a) a lexical affinity model where words struggle to modify each other, (b) a sense tagging model where words fluctuate randomly in their selectional preferences, and (c) a generative model where the speaker fleshes out each word's syntactic and conceptual structure without regard to the implications for the hearer. We also give preliminary empirical results from evaluating the three models' parsing performance on annotated Wall Street Journal training text (derived from the Penn Treebank). In these results, the generative (i.e., top-down) model performs significantly better than the others, and does about equally well at assigning part-of-speech tags. Comment: 6 pages, LaTeX 2.09 packaged with 4 .eps files, also uses colap.sty and acl.bs

arXiv.org e-Print Archive

CiteSeerX