Sentence Completion Tests in a Virtual Laboratory
This paper describes a type of online test, the Sentence Completion Test (SCT), that tries to fill the gap between rigid multiple-choice (MC) tests and unreliable automatic essay-grading approaches. We give a short overview of the main concepts and the implementation, and show exemplary uses and applications. SCTs are used as one component of a fully operational virtual laboratory for Computational Linguistics in use at the University of Zurich.
Hybrid Model For Word Prediction Using Naive Bayes and Latent Information
Historically, the Natural Language Processing area has received a great deal of
attention from researchers. One of the main motivations behind this interest is
the word prediction problem: given a set of words in a sentence, recommend the
next word. In the literature, this problem is solved by methods based on
syntactic or semantic analysis. On its own, neither kind of analysis achieves
practical results for end-user applications. For instance, Latent Semantic
Analysis can handle semantic features of text, but cannot suggest words that
respect syntactic rules. In contrast, some models, e.g. Deep Learning, handle
both aspects together and achieve state-of-the-art results. These models can
demand high computational effort, which can make them infeasible for certain
types of applications. With advances in technology and mathematical models, it
is possible to develop faster and more accurate systems. This work proposes a
hybrid word suggestion model, based on Naive Bayes and Latent Semantic
Analysis, that considers the neighbouring words around unfilled gaps. Results
show that this model achieves 44.2% accuracy on the MSR Sentence Completion
Challenge.
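The abstract does not say how the Naive Bayes and LSA components are combined, so what follows is only a minimal sketch under stated assumptions. Naive Bayes scores each candidate from the words inside a fixed window around the gap; LSA scores it by mean cosine similarity to those same words in a rank-k space taken from a truncated SVD of a word-by-sentence count matrix; the two scores are then mixed linearly with a hypothetical weight `alpha`. The function names (`train_counts`, `nb_log_score`, `lsa_vectors`, `fill_gap`), the `___` gap marker, the window size, `k`, `alpha`, and the toy corpus are all illustrative, not the authors' choices.

```python
import math
from collections import Counter, defaultdict

import numpy as np

GAP = "___"  # hypothetical marker for the unfilled gap


def train_counts(sentences, window=2):
    """Unigram counts, plus per-word counts of words seen within `window` positions."""
    word_counts = Counter()
    context_counts = defaultdict(Counter)
    for sent in sentences:
        for i, w in enumerate(sent):
            word_counts[w] += 1
            for j in range(max(0, i - window), min(len(sent), i + window + 1)):
                if j != i:
                    context_counts[w][sent[j]] += 1
    return word_counts, context_counts


def nb_log_score(cand, context, word_counts, context_counts, vocab_size):
    """Naive Bayes: log P(cand) + sum_c log P(c | cand), with add-one smoothing."""
    total = sum(word_counts.values())
    score = math.log((word_counts[cand] + 1) / (total + vocab_size))
    ctx_total = sum(context_counts[cand].values())
    for c in context:
        score += math.log((context_counts[cand][c] + 1) / (ctx_total + vocab_size))
    return score


def lsa_vectors(sentences, vocab, k=2):
    """Rank-k word vectors from the SVD of a word-by-sentence count matrix."""
    index = {w: i for i, w in enumerate(vocab)}
    m = np.zeros((len(vocab), len(sentences)))
    for j, sent in enumerate(sentences):
        for w in sent:
            m[index[w], j] += 1
    u, s, _ = np.linalg.svd(m, full_matrices=False)
    return {w: u[index[w], :k] * s[:k] for w in vocab}


def cosine(a, b):
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return float(a @ b / denom) if denom > 0 else 0.0


def fill_gap(tokens, candidates, sentences, window=2, k=2, alpha=0.5):
    """Rank candidates by alpha * NB posterior + (1 - alpha) * LSA similarity."""
    vocab = sorted({w for s in sentences for w in s})
    word_counts, context_counts = train_counts(sentences, window)
    vectors = lsa_vectors(sentences, vocab, k)
    g = tokens.index(GAP)
    context = [t for t in tokens[max(0, g - window):g + window + 1] if t != GAP]

    # Naive Bayes: turn log scores into a posterior over the candidate list.
    logs = [nb_log_score(c, context, word_counts, context_counts, len(vocab))
            for c in candidates]
    top = max(logs)
    exps = [math.exp(x - top) for x in logs]
    nb_post = [e / sum(exps) for e in exps]

    scores = {}
    for cand, p in zip(candidates, nb_post):
        sims = [cosine(vectors[cand], vectors[c])
                for c in context if cand in vectors and c in vectors]
        lsa = (sum(sims) / len(sims) + 1) / 2 if sims else 0.5  # map [-1, 1] to [0, 1]
        scores[cand] = alpha * p + (1 - alpha) * lsa
    return max(scores, key=scores.get), scores


CORPUS = [s.split() for s in [
    "the cat drinks milk",
    "the cat chases the mouse",
    "the dog chases the cat",
    "the dog eats meat",
]]

best, scores = fill_gap("the cat drinks ___".split(), ["milk", "meat"], CORPUS)
print(best)  # milk
```

Normalising the Naive Bayes scores over the candidate list and mapping the cosine to [0, 1] puts both components on comparable scales before mixing; a real system would presumably tune `alpha`, the window, and `k` on held-out data.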
A Novel ILP Framework for Summarizing Content with High Lexical Variety
Summarizing content contributed by individuals can be challenging, because
people make different lexical choices even when describing the same events.
However, there remains a significant need to summarize such content. Examples
include student responses to post-class reflective questions, product
reviews, and news articles published by different news agencies related to the
same events. High lexical diversity of these documents hinders the system's
ability to effectively identify salient content and reduce summary redundancy.
In this paper, we overcome this issue by introducing an integer linear
programming-based summarization framework. It incorporates a low-rank
approximation to the sentence-word co-occurrence matrix to intrinsically group
semantically similar lexical items. We conduct extensive experiments on
datasets of student responses, product reviews, and news documents. Our
approach compares favorably to a number of extractive baselines as well as a
neural abstractive summarization system. The paper finally sheds light on when
and why the proposed framework is effective at summarizing content with high
lexical variety.
Comment: Accepted for publication in the journal Natural Language Engineering, 201
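The abstract names the ingredients (an integer linear program and a low-rank approximation of the sentence-word co-occurrence matrix) but not the exact formulation. As an illustrative sketch only, the code below uses a standard concept-coverage ILP: binary sentence variables `y_i`, binary word (concept) variables `c_j` weighted by document frequency, and a budget on the number of selected sentences. A truncated SVD stands in for the paper's low-rank approximation, and the approximated matrix is binarised with a hypothetical `threshold`, so a sentence can "cover" words it does not literally contain but that co-occur with its words. Exhaustive search replaces a real ILP solver, which is only feasible for a handful of sentences; the function names and the toy student-response sentences are invented for illustration.

```python
from itertools import combinations

import numpy as np


def build_matrix(sentences):
    """Binary sentence-by-word matrix (1 if the word occurs in the sentence)."""
    vocab = sorted({w for s in sentences for w in s.split()})
    index = {w: j for j, w in enumerate(vocab)}
    a = np.zeros((len(sentences), len(vocab)))
    for i, s in enumerate(sentences):
        for w in s.split():
            a[i, index[w]] = 1.0
    return a, vocab


def low_rank(a, k):
    """Best rank-k approximation of `a` in the Frobenius norm, via truncated SVD."""
    u, s, vt = np.linalg.svd(a, full_matrices=False)
    return (u[:, :k] * s[:k]) @ vt[:k, :]


def select(sentences, budget=2, k=2, threshold=0.5):
    """Solve a small concept-coverage ILP exactly by exhaustive search.

    maximise   sum_j w_j * c_j
    subject to sum_i y_i <= budget,  c_j <= sum_i B_ij * y_i,  y, c binary
    where B = (low-rank approximation >= threshold) and w_j = document frequency.
    """
    a, _ = build_matrix(sentences)
    weights = a.sum(axis=0)               # number of sentences containing each word
    b = low_rank(a, k) >= threshold       # smoothed "coverage" matrix
    best, best_score = (), -1.0
    for r in range(1, budget + 1):
        for subset in combinations(range(len(sentences)), r):
            covered = b[list(subset)].any(axis=0)   # c_j = 1 if any chosen sentence covers j
            score = float(weights[covered].sum())
            if score > best_score:
                best, best_score = subset, score
    return best, best_score


SENTS = [
    "students liked the demo",
    "the demo was helpful",
    "homework was too long",
    "long homework hurt",
]

print(select(SENTS, budget=2, k=4))  # ((0, 2), 13.0)
print(select(SENTS, budget=2, k=2))
```

With `k` equal to the full rank, the approximation reproduces the original matrix and the selection reduces to plain word coverage; a smaller `k` can let correlated words share credit, which is the kind of grouping of semantically similar items the abstract describes.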