A Survey of Paraphrasing and Textual Entailment Methods
Paraphrasing methods recognize, generate, or extract phrases, sentences, or
longer natural language expressions that convey almost the same information.
Textual entailment methods, on the other hand, recognize, generate, or extract
pairs of natural language expressions, such that a human who reads (and trusts)
the first element of a pair would most likely infer that the other element is
also true. Paraphrasing can be seen as bidirectional textual entailment and
methods from the two areas are often similar. Both kinds of methods are useful,
at least in principle, in a wide range of natural language processing
applications, including question answering, summarization, text generation, and
machine translation. We summarize key ideas from the two areas by considering
in turn recognition, generation, and extraction methods, also pointing to
prominent articles and resources.
Comment: Technical Report, Natural Language Processing Group, Department of
Informatics, Athens University of Economics and Business, Greece, 201
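The view of paraphrasing as bidirectional textual entailment can be illustrated with a minimal sketch. The `toy_entails` recognizer below is a hypothetical stand-in (a plain word-containment check), not a method taken from the survey; any real entailment recognizer with the same signature could be substituted.

```python
from typing import Callable


def toy_entails(text: str, hypothesis: str) -> bool:
    """Hypothetical toy recognizer: the text 'entails' the hypothesis
    if every word of the hypothesis also occurs in the text."""
    text_words = set(text.lower().split())
    return all(word in text_words for word in hypothesis.lower().split())


def is_paraphrase(
    a: str,
    b: str,
    entails: Callable[[str, str], bool] = toy_entails,
) -> bool:
    """Treat a and b as paraphrases when entailment holds in both directions."""
    return entails(a, b) and entails(b, a)
```

Under this toy recognizer, a sentence and a shortened version of it are entailed in one direction only, so they are not counted as paraphrases.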
Selectional Restrictions in HPSG
Selectional restrictions are semantic sortal constraints imposed on the
participants of linguistic constructions to capture contextually dependent
constraints on interpretation. Despite their limitations, selectional
restrictions have proven very useful in natural language applications, where
they have been used frequently in word sense disambiguation, syntactic
disambiguation, and anaphora resolution. Given their practical value, we
explore two methods to incorporate selectional restrictions in the HPSG theory,
assuming that the reader is familiar with HPSG. The first method employs HPSG's
Background feature and a constraint-satisfaction component pipelined after the
parser. The second method uses subsorts of referential indices, and blocks
readings that violate selectional restrictions during parsing. While
theoretically less satisfactory, we have found the second method particularly
useful in the development of practical systems.
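The idea of blocking readings that violate sortal constraints can be sketched outside HPSG as a lookup in a sort hierarchy. Everything below is a hypothetical illustration: the sort names, the `PARENT` table, and the `RESTRICTIONS` table are assumptions made for the example, not the feature structures or sorts used in the paper.

```python
from typing import Optional

# Hypothetical toy sort hierarchy: each sort maps to its parent sort.
PARENT = {
    "person": "animate",
    "animal": "animate",
    "animate": "entity",
    "artifact": "entity",
    "entity": None,
}

# Hypothetical selectional restrictions: verb -> role -> required sort.
RESTRICTIONS = {"eat": {"subject": "animate"}}


def is_subsort(sort: Optional[str], ancestor: str) -> bool:
    """Walk up the hierarchy from `sort` and check whether `ancestor` is reached."""
    while sort is not None:
        if sort == ancestor:
            return True
        sort = PARENT.get(sort)
    return False


def satisfies(verb: str, role: str, arg_sort: str) -> bool:
    """A reading is allowed if the verb has no restriction for the role,
    or if the argument's sort is a subsort of the required sort."""
    required = RESTRICTIONS.get(verb, {}).get(role)
    return required is None or is_subsort(arg_sort, required)
```

A parser that applies such a check while building an analysis would discard a reading in which an `artifact` fills the subject role of `eat`, while a verb with no listed restriction accepts any sort.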
A Principled Framework for Constructing Natural Language Interfaces To Temporal Databases
Most existing natural language interfaces to databases (NLIDBs) were designed
to be used with "snapshot" database systems, which provide very limited
facilities for manipulating time-dependent data. Consequently, most NLIDBs also
provide very limited support for the notion of time. The database community is
becoming increasingly interested in _temporal_ database systems. These are
intended to store and manipulate in a principled manner information not only
about the present, but also about the past and future.
This thesis develops a principled framework for constructing English NLIDBs
for _temporal_ databases (NLITDBs), drawing on research in tense and aspect
theories, temporal logics, and temporal databases. I first explore temporal
linguistic phenomena that are likely to appear in English questions to NLITDBs.
Drawing on existing linguistic theories of time, I formulate an account for a
large number of these phenomena that is simple enough to be embodied in
practical NLITDBs. Exploiting ideas from temporal logics, I then define a
temporal meaning representation language, TOP, and I show how the HPSG grammar
theory can be modified to incorporate the tense and aspect account of this
thesis, and to map a wide range of English questions involving time to
appropriate TOP expressions. Finally, I present and prove the correctness of a
method to translate from TOP to TSQL2, TSQL2 being a temporal extension of the
SQL-92 database language. This way, I establish a sound route from English
questions involving time to a general-purpose temporal database language, which
can act as a principled framework for building NLITDBs. To demonstrate that
this framework is workable, I employ it to develop a prototype NLITDB,
implemented using ALE and Prolog.
Comment: PhD thesis; 405 pages; LaTeX2e, uses the packages/macros: amstex,
xspace, avm, examples, dvips, varioref, makeidx, epic, eepic, ecltree;
postscript figures include
Deep Learning for User Comment Moderation
Experimenting with a new dataset of 1.6M user comments from a Greek news
portal and existing datasets of English Wikipedia comments, we show that an RNN
outperforms the previous state of the art in moderation. A deep,
classification-specific attention mechanism further improves the overall
performance of the RNN. We also compare against a CNN and a word-list baseline,
considering both fully automatic and semi-automatic moderation.