108,926 research outputs found
A Data-Oriented Model of Literary Language
We consider the task of predicting how literary a text is, with a gold
standard from human ratings. Aside from a standard bigram baseline, we apply
rich syntactic tree fragments, mined from the training set, and a series of
hand-picked features. Our model is the first to distinguish degrees of highly
and less literary novels using a variety of lexical and syntactic features, and
explains 76.0 % of the variation in literary ratings.Comment: To be published in EACL 2017, 11 page
Towards a flexible open-source software library for multi-layered scholarly textual studies: An Arabic case study dealing with semi-automatic language processing
This paper presents both the general model and a case study of the Computational and Collaborative Philology Library (CoPhiLib), an ongoing initiative underway at the Institute for Computational Linguistics (ILC) of the National Research Council (CNR), Pisa, Italy. The library, designed and organized as a reusable, abstract and open-source software component, aims at solving the needs of multi-lingual and cross-lingual analysis by exposing common Application Programming Interfaces (APIs). The core modules, coded by the Java programming language, constitute the groundwork of a Web platform designed to deal with textual scholarly needs. The Web application, implemented according to the Java Enterprise specifications, focuses on multi-layered analysis for the study of literary documents and related multimedia sources. This ambitious challenge seeks to obtain the management of textual resources, on the one hand by abstracting from current language, on the other hand by decoupling from the specific requirements of single projects. This goal is achieved thanks to methodologies declared by the 'agile process', and by putting into effect suitable use case modeling, design patterns, and component-based architectures. The reusability and flexibility of the system have been tested on an Arabic case study: the system allows users to choose the morphological engine (such as AraMorph or Al-Khalil), along with linguistic granularity (i.e. with or without declension). Finally, the application enables the construction of annotated resources for further statistical engines (training set). © 2014 IEEE
LAF-Fabric: a data analysis tool for Linguistic Annotation Framework with an application to the Hebrew Bible
The Linguistic Annotation Framework (LAF) provides a general, extensible
stand-off markup system for corpora. This paper discusses LAF-Fabric, a new
tool to analyse LAF resources in general with an extension to process the
Hebrew Bible in particular. We first walk through the history of the Hebrew
Bible as text database in decennium-wide steps. Then we describe how LAF-Fabric
may serve as an analysis tool for this corpus. Finally, we describe three
analytic projects/workflows that benefit from the new LAF representation:
1) the study of linguistic variation: extract cooccurrence data of common
nouns between the books of the Bible (Martijn Naaijer); 2) the study of the
grammar of Hebrew poetry in the Psalms: extract clause typology (Gino Kalkman);
3) construction of a parser of classical Hebrew by Data Oriented Parsing:
generate tree structures from the database (Andreas van Cranenburgh)
- …