10,179 research outputs found
An application of distributional semantics for the analysis of the Holy Quran
In this contribution we illustrate the methodology and the results of an experiment we conducted by applying Distributional Semantics Models to the analysis of the Holy Quran. Our aim was to gather information on the potential differences in meanings that the same words might take on when used in Modern Standard Arabic w.r.t. their usage in the Quran. To do so we used the Penn Arabic Treebank as a contrastive corpu
Enriching Existing Test Collections with OXPath
Extending TREC-style test collections by incorporating external resources is
a time consuming and challenging task. Making use of freely available web data
requires technical skills to work with APIs or to create a web scraping program
specifically tailored to the task at hand. We present a light-weight
alternative that employs the web data extraction language OXPath to harvest
data to be added to an existing test collection from web resources. We
demonstrate this by creating an extended version of GIRT4 called GIRT4-XT with
additional metadata fields harvested via OXPath from the social sciences portal
Sowiport. This allows the re-use of this collection for other evaluation
purposes like bibliometrics-enhanced retrieval. The demonstrated method can be
applied to a variety of similar scenarios and is not limited to extending
existing collections but can also be used to create completely new ones with
little effort.Comment: Experimental IR Meets Multilinguality, Multimodality, and Interaction
- 8th International Conference of the CLEF Association, CLEF 2017, Dublin,
Ireland, September 11-14, 201
A Corpus of Sentence-level Revisions in Academic Writing: A Step towards Understanding Statement Strength in Communication
The strength with which a statement is made can have a significant impact on
the audience. For example, international relations can be strained by how the
media in one country describes an event in another; and papers can be rejected
because they overstate or understate their findings. It is thus important to
understand the effects of statement strength. A first step is to be able to
distinguish between strong and weak statements. However, even this problem is
understudied, partly due to a lack of data. Since strength is inherently
relative, revisions of texts that make claims are a natural source of data on
strength differences. In this paper, we introduce a corpus of sentence-level
revisions from academic writing. We also describe insights gained from our
annotation efforts for this task.Comment: 6 pages, to appear in Proceedings of ACL 2014 (short paper
- …