31,624 research outputs found
Statistical Function Tagging and Grammatical Relations of Myanmar Sentences
This paper describes a context free grammar (CFG) based grammatical relations
for Myanmar sentences which combine corpus-based function tagging system. Part
of the challenge of statistical function tagging for Myanmar sentences comes
from the fact that Myanmar has free-phrase-order and a complex morphological
system. Function tagging is a pre-processing step to show grammatical relations
of Myanmar sentences. In the task of function tagging, which tags the function
of Myanmar sentences with correct segmentation, POS (part-of-speech) tagging
and chunking information, we use Naive Bayesian theory to disambiguate the
possible function tags of a word. We apply context free grammar (CFG) to find
out the grammatical relations of the function tags. We also create a functional
annotated tagged corpus for Myanmar and propose the grammar rules for Myanmar
sentences. Experiments show that our analysis achieves a good result with
simple sentences and complex sentences.Comment: 16 pages, 7 figures, 8 tables, AIAA-2011 (India). arXiv admin note:
text overlap with arXiv:0912.1820 by other author
Using machine-learning to assign function labels to parser output for Spanish
Data-driven grammatical function tag assignment has been studied for English using the Penn-II Treebank data. In this paper we address the question of whether such methods can be applied successfully to other languages and treebank resources. In addition to tag assignment accuracy
and f-scores we also present results of a task-based evaluation. We use three machine-learning methods to assign
Cast3LB function tags to sentences parsed with Bikel’s parser trained on the Cast3LB treebank. The best performing method, SVM, achieves an f-score of 86.87% on gold-standard trees and 66.67% on parser output - a statistically significant improvement of 6.74% over the baseline. In a
task-based evaluation we generate LFG functional-structures from the function tag-enriched trees. On this task we achive
an f-score of 75.67%, a statistically significant 3.4% improvement over the baseline
A Machine learning approach to POS tagging
We have applied inductive learning of statistical decision trees
and relaxation labelling to the Natural Language Processing (NLP)
task of morphosyntactic disambiguation (Part Of Speech Tagging).
The learning process is supervised and obtains a language
model oriented to resolve POS ambiguities. This model consists
of a set of statistical decision trees expressing distribution of
tags and words in some relevant contexts.
The acquired language models are complete enough to be directly
used as sets of POS disambiguation rules, and include more complex
contextual information than simple collections of n-grams usually
used in statistical taggers.
We have implemented a quite simple and fast tagger that has been
tested and evaluated on the Wall Street Journal (WSJ) corpus with
a remarkable accuracy.
However, better results can be obtained by translating the trees
into rules to feed a flexible relaxation labelling based tagger.
In this direction we describe a tagger which is able to use
information of any kind (n-grams, automatically acquired constraints,
linguistically motivated manually written constraints, etc.), and in
particular to incorporate the machine learned decision trees.
Simultaneously, we address the problem of tagging when only
small training material is available, which is crucial in any process
of constructing, from scratch, an annotated corpus. We show that quite
high accuracy can be achieved with our system in this situation.Postprint (published version
Towards Universal Semantic Tagging
The paper proposes the task of universal semantic tagging---tagging word
tokens with language-neutral, semantically informative tags. We argue that the
task, with its independent nature, contributes to better semantic analysis for
wide-coverage multilingual text. We present the initial version of the semantic
tagset and show that (a) the tags provide semantically fine-grained
information, and (b) they are suitable for cross-lingual semantic parsing. An
application of the semantic tagging in the Parallel Meaning Bank supports both
of these points as the tags contribute to formal lexical semantics and their
cross-lingual projection. As a part of the application, we annotate a small
corpus with the semantic tags and present new baseline result for universal
semantic tagging.Comment: 9 pages, International Conference on Computational Semantics (IWCS
Using Incomplete Information for Complete Weight Annotation of Road Networks -- Extended Version
We are witnessing increasing interests in the effective use of road networks.
For example, to enable effective vehicle routing, weighted-graph models of
transportation networks are used, where the weight of an edge captures some
cost associated with traversing the edge, e.g., greenhouse gas (GHG) emissions
or travel time. It is a precondition to using a graph model for routing that
all edges have weights. Weights that capture travel times and GHG emissions can
be extracted from GPS trajectory data collected from the network. However, GPS
trajectory data typically lack the coverage needed to assign weights to all
edges. This paper formulates and addresses the problem of annotating all edges
in a road network with travel cost based weights from a set of trips in the
network that cover only a small fraction of the edges, each with an associated
ground-truth travel cost. A general framework is proposed to solve the problem.
Specifically, the problem is modeled as a regression problem and solved by
minimizing a judiciously designed objective function that takes into account
the topology of the road network. In particular, the use of weighted PageRank
values of edges is explored for assigning appropriate weights to all edges, and
the property of directional adjacency of edges is also taken into account to
assign weights. Empirical studies with weights capturing travel time and GHG
emissions on two road networks (Skagen, Denmark, and North Jutland, Denmark)
offer insight into the design properties of the proposed techniques and offer
evidence that the techniques are effective.Comment: This is an extended version of "Using Incomplete Information for
Complete Weight Annotation of Road Networks," which is accepted for
publication in IEEE TKD
Hypergraph model of social tagging networks
The past few years have witnessed the great success of a new family of
paradigms, so-called folksonomy, which allows users to freely associate tags to
resources and efficiently manage them. In order to uncover the underlying
structures and user behaviors in folksonomy, in this paper, we propose an
evolutionary hypergrah model to explain the emerging statistical properties.
The present model introduces a novel mechanism that one can not only assign
tags to resources, but also retrieve resources via collaborative tags. We then
compare the model with a real-world dataset: \emph{Del.icio.us}. Indeed, the
present model shows considerable agreement with the empirical data in following
aspects: power-law hyperdegree distributions, negtive correlation between
clustering coefficients and hyperdegrees, and small average distances.
Furthermore, the model indicates that most tagging behaviors are motivated by
labeling tags to resources, and tags play a significant role in effectively
retrieving interesting resources and making acquaintance with congenial
friends. The proposed model may shed some light on the in-depth understanding
of the structure and function of folksonomy.Comment: 7 pages,7 figures, 32 reference
Three New Probabilistic Models for Dependency Parsing: An Exploration
After presenting a novel O(n^3) parsing algorithm for dependency grammar, we
develop three contrasting ways to stochasticize it. We propose (a) a lexical
affinity model where words struggle to modify each other, (b) a sense tagging
model where words fluctuate randomly in their selectional preferences, and (c)
a generative model where the speaker fleshes out each word's syntactic and
conceptual structure without regard to the implications for the hearer. We also
give preliminary empirical results from evaluating the three models' parsing
performance on annotated Wall Street Journal training text (derived from the
Penn Treebank). In these results, the generative (i.e., top-down) model
performs significantly better than the others, and does about equally well at
assigning part-of-speech tags.Comment: 6 pages, LaTeX 2.09 packaged with 4 .eps files, also uses colap.sty
and acl.bs
- …