TriviaQA: A Large Scale Distantly Supervised Challenge Dataset for Reading Comprehension
We present TriviaQA, a challenging reading comprehension dataset containing
over 650K question-answer-evidence triples. TriviaQA includes 95K
question-answer pairs authored by trivia enthusiasts and independently gathered
evidence documents, six per question on average, that provide high-quality
distant supervision for answering the questions. We show that, in comparison to
other recently introduced large-scale datasets, TriviaQA (1) has relatively
complex, compositional questions, (2) has considerable syntactic and lexical
variability between questions and corresponding answer-evidence sentences, and
(3) requires more cross-sentence reasoning to find answers. We also present two
baseline algorithms: a feature-based classifier and a state-of-the-art neural
network that performs well on SQuAD reading comprehension. Neither approach
comes close to human performance (23% and 40% vs. 80%), suggesting that
TriviaQA is a challenging testbed that is worth significant future study. Data
and code are available at http://nlp.cs.washington.edu/triviaqa/
Comment: Added references, fixed typos, minor baseline update
On Constructing a Knowledge Base of Chinese Criminal Cases
We are developing a knowledge base over Chinese judicial decision documents
to facilitate landscape analyses of Chinese criminal cases. We view judicial
decision documents as a mixed-granularity semi-structured text where different
levels of the text carry different semantic constructs and entailments. We use
a combination of context-sensitive grammar, dependency parsing and discourse
analysis to extract a formal and interpretable representation of these
documents. Our knowledge base is developed by constructing associations between
different elements of these documents. Interpretability stems in part from our
formal representation of the Chinese criminal laws, which we likewise treat as
semi-structured documents. The landscape analyses use these two
representations and enable a law researcher to pose legal pattern analysis
queries.
Comment: submitted to JURIX 201
Morphologically complex words in L1 and L2 processing: Evidence from masked priming experiments in English
This paper reports results from masked priming experiments investigating regular past-tense forms and deadjectival nominalizations with -ness and -ity in adult native (L1) speakers of English and in different groups of advanced adult second language (L2) learners of English. While the L1 group showed efficient priming for both inflected and derived word forms, the L2 learners demonstrated repetition-priming effects (like the L1 group), but no priming for inflected and reduced priming for derived word forms. We argue that this striking contrast between L1 and L2 processing supports the view that adult L2 learners rely more on lexical storage and less on combinatorial processing of morphologically complex words than native speakers.
Crossings as a side effect of dependency lengths
The syntactic structure of sentences exhibits a striking regularity:
dependencies tend not to cross when drawn above the sentence. We investigate
two competing explanations. The traditional hypothesis is that this trend
arises from an independent principle of syntax that reduces crossings
practically to zero. An alternative to this view is the hypothesis that
crossings are a side effect of dependency lengths, i.e., sentences with shorter
dependency lengths should tend to have fewer crossings. We are able to reject
the traditional view in the majority of languages considered. The alternative
hypothesis can lead to a more parsimonious theory of language.
Comment: the discussion section has been expanded significantly; in press in
Complexity (Wiley)
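The two quantities this hypothesis relates, the number of crossings and the dependency lengths, can be made concrete with a short sketch. It assumes (hypothetically; this is not the authors' code or data format) that a sentence's dependency tree is given as a list of head positions, where entry i is the 1-based head of word i+1 and 0 marks the root, and that the root attachment is not drawn as an arc. Two arcs cross when exactly one endpoint of one lies strictly between the endpoints of the other; arcs that share an endpoint do not cross. The function names are illustrative only.

```python
from itertools import combinations


def arcs_from_heads(heads):
    """Return arcs as (left, right) position pairs.

    Hypothetical encoding: heads[i] is the 1-based head of word i+1,
    and 0 marks the root, whose attachment is not drawn as an arc.
    """
    return [(min(dep, head), max(dep, head))
            for dep, head in enumerate(heads, start=1) if head != 0]


def total_dependency_length(heads):
    """Sum of linear distances between each word and its head."""
    return sum(right - left for left, right in arcs_from_heads(heads))


def count_crossings(heads):
    """Count pairs of arcs that cross when drawn above the sentence."""
    crossings = 0
    for (a, b), (c, d) in combinations(arcs_from_heads(heads), 2):
        # Strict inequalities: arcs sharing an endpoint never cross.
        if a < c < b < d or c < a < d < b:
            crossings += 1
    return crossings
```

For example, heads [3, 4, 0, 3] yield arcs (1, 3), (2, 4) and (3, 4), with one crossing and a total dependency length of 5, whereas heads [2, 0, 4, 2] yield no crossings and a total length of 4. Computing both quantities over a treebank (and over random trees with matched lengths) is one way to probe whether shorter dependencies go hand in hand with fewer crossings.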
Natural language processing
Beginning with the basic issues of NLP, this chapter aims to chart the major research activities in this area since the last ARIST chapter in 1996 (Haas, 1996), including: (i) natural language text processing systems (text summarization, information extraction, information retrieval, etc.), including domain-specific applications; (ii) natural language interfaces; (iii) NLP in the context of the WWW and digital libraries; and (iv) evaluation of NLP systems.