Content Differences in Syntactic and Semantic Representations
Syntactic analysis plays an important role in semantic parsing, but the
nature of this role remains a topic of ongoing debate. The debate has been
constrained by the scarcity of empirical comparative studies between syntactic
and semantic schemes, which hinders the development of parsing methods informed
by the details of target schemes and constructions. We target this gap, and
take Universal Dependencies (UD) and UCCA as a test case. After abstracting
away from differences of convention or formalism, we find that most content
divergences can be ascribed to: (1) UCCA's distinction between a Scene and a
non-Scene; (2) UCCA's distinction between primary relations, secondary ones and
participants; (3) different treatment of multi-word expressions, and (4)
different treatment of inter-clause linkage. We further discuss the long tail
of cases where the two schemes take markedly different approaches. Finally, we
show that the proposed comparison methodology can be used for fine-grained
evaluation of UCCA parsing, highlighting both challenges and potential sources
for improvement. The substantial differences between the schemes suggest that
semantic parsers are likely to benefit downstream text understanding
applications beyond their syntactic counterparts. Comment: NAACL-HLT 2019 camera ready
One model, two languages: training bilingual parsers with harmonized treebanks
We introduce an approach to train lexicalized parsers using bilingual corpora
obtained by merging harmonized treebanks of different languages, producing
parsers that can analyze sentences in either of the learned languages, or even
sentences that mix both. We test the approach on the Universal Dependency
Treebanks, training with MaltParser and MaltOptimizer. The results show that
these bilingual parsers are more than competitive: most combinations
preserve accuracy, and some even achieve significant improvements over the
corresponding monolingual parsers. Preliminary experiments also show the
approach to be promising on texts with code-switching and when more languages
are added. Comment: 7 pages, 4 tables, 1 figure
Creation of a Style Independent Intelligent Autonomous Citation Indexer to Support Academic Research
This paper describes the current state of RUgle, a system for
classifying and indexing papers made available on the
World Wide Web, in a domain-independent and universal
manner. By building RUgle with the most relaxed
restrictions possible on the formatting of the documents it
can process, we hope to create a system that can combine
the best features of currently available closed library
searches that are designed to facilitate academic research
with the inclusive nature of general purpose search engines
that continually crawl the web and add documents to their
indexed database.
Organizing the Internet
This paper examines XML and its relationships with SGML (Standard Generalized Markup Language) and HTML (HyperText Markup Language). It examines the importance of metatags and the XML Document Type Definition (DTD) and proposed alternatives. It looks at the differences between the two types of XML data: “valid” and “well-formed” documents.
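As a rough illustration of the "well-formed" half of that distinction, the sketch below uses Python's standard `xml.etree.ElementTree` parser, which rejects input that is not well-formed but does not validate against a DTD. The helper name `is_well_formed` and the sample strings are assumptions made for this example and do not come from the paper. Checking validity would need a validating parser and the document's DTD, which this sketch does not attempt.

```python
import xml.etree.ElementTree as ET


def is_well_formed(xml_text: str) -> bool:
    """Return True if xml_text parses as well-formed XML (hypothetical helper)."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        # Raised for syntax problems such as mismatched or unclosed tags.
        return False


# Well-formed: one root element, every tag properly nested and closed.
good = "<paper><title>Organizing the Internet</title></paper>"

# Not well-formed: <title> is never closed before </paper>.
bad = "<paper><title>Organizing the Internet</paper>"

print(is_well_formed(good))
print(is_well_formed(bad))
```

A document can pass this check and still be invalid, because validity additionally requires that its structure conform to the rules declared in a DTD.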