What a Nerd! Beating Students and Vector Cosine in the ESL and TOEFL Datasets
In this paper, we claim that Vector Cosine, which is generally considered one
of the most efficient unsupervised measures for identifying word similarity in
Vector Space Models, can be outperformed by a completely unsupervised measure
that evaluates the extent of the intersection among the most associated
contexts of two target words, weighting such intersection according to the rank
of the shared contexts in the dependency ranked lists. This claim comes from
the hypothesis that similar words do not simply occur in similar contexts, but
they share a larger portion of their most relevant contexts compared to other
related words. To prove it, we describe and evaluate APSyn, a variant of
Average Precision that, independently of the adopted parameters, outperforms
the Vector Cosine and the co-occurrence on the ESL and TOEFL test sets. In the
best setting, APSyn reaches 0.73 accuracy on the ESL dataset and 0.70 accuracy
in the TOEFL dataset, beating therefore the non-English US college applicants
(whose average, as reported in the literature, is 64.50%) and several
state-of-the-art approaches.Comment: in LREC 201
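The abstract describes APSyn only in words: keep each word's most associated contexts, intersect the two lists, and weight each shared context by its rank. The sketch below is a minimal illustration under stated assumptions, not the authors' implementation. It assumes each word is a sparse dict of context association scores (for example PPMI or LMI, computed elsewhere), that each shared context contributes the inverse of its average rank in the two top-N lists, and that an ESL/TOEFL item is answered by picking the candidate with the highest score. All function names (`ranked_top_n`, `apsyn`, `cosine`, `answer`, `accuracy`) and the toy vectors are hypothetical.

```python
import math

def ranked_top_n(assoc, n):
    """Rank contexts by descending association score and keep the top n.

    Returns a dict context -> rank (1 = most associated). Ties are broken
    alphabetically so the ranking is deterministic.
    """
    ordered = sorted(assoc, key=lambda ctx: (-assoc[ctx], ctx))
    return {ctx: rank for rank, ctx in enumerate(ordered[:n], start=1)}

def apsyn(assoc1, assoc2, n):
    """APSyn-style score (assumed weighting): sum over shared top-n contexts of 1 / average rank."""
    top1 = ranked_top_n(assoc1, n)
    top2 = ranked_top_n(assoc2, n)
    shared = top1.keys() & top2.keys()
    return sum(1.0 / ((top1[ctx] + top2[ctx]) / 2) for ctx in shared)

def cosine(assoc1, assoc2):
    """Vector Cosine over the same sparse context vectors, for comparison."""
    dot = sum(v * assoc2.get(ctx, 0.0) for ctx, v in assoc1.items())
    norm1 = math.sqrt(sum(v * v for v in assoc1.values()))
    norm2 = math.sqrt(sum(v * v for v in assoc2.values()))
    return dot / (norm1 * norm2) if norm1 and norm2 else 0.0

def answer(target, candidates, vectors, n):
    """Pick the candidate most similar to the target (ESL/TOEFL-style item)."""
    return max(candidates, key=lambda cand: apsyn(vectors[target], vectors[cand], n))

def accuracy(items, vectors, n):
    """Fraction of (target, candidates, gold) items answered correctly."""
    correct = sum(answer(t, cands, vectors, n) == gold for t, cands, gold in items)
    return correct / len(items)

# Toy, made-up association vectors for illustration only.
vectors = {
    "big": {"size": 5.0, "house": 4.0, "tall": 3.0},
    "large": {"size": 6.0, "house": 2.0, "car": 1.0},
    "red": {"color": 5.0, "car": 4.0, "apple": 3.0},
}
print(answer("big", ["large", "red"], vectors, n=3))  # → large
```

Unlike `cosine`, `apsyn` ignores the magnitude of the association scores and uses only ranks within the top N, so the parameter `n` (how many contexts to keep per word) is the main knob to tune.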
Lessons Learned from EVALITA 2020 and Thirteen Years of Evaluation of Italian Language Technology
This paper provides a summary of the 7th Evaluation Campaign of Natural Language Processing and Speech Tools for Italian (EVALITA 2020), which, due to the COVID-19 pandemic, was held online on December 17th, 2020. The 2020 edition of EVALITA included 14 different tasks belonging to five research areas, namely: (i) Affect, Hate, and Stance, (ii) Creativity and Style, (iii) New Challenges in Long-standing Tasks, (iv) Semantics and Multimodality, and (v) Time and Diachrony. This paper describes the tasks and the key findings from the analysis of participant outcomes. Moreover, it provides a detailed analysis of the participants and task organizers, which demonstrates the growing interest in this campaign. Finally, it provides a detailed analysis of the evaluation of tasks across the past seven editions, which makes it possible to assess how the research carried out by the Italian Computational Linguistics community has evolved in terms of popular tasks and paradigms over the last 13 years.
Learning Greek and Latin Through Digital Annotation: The EuporiaEDU System
Gloria Mugelli, Giulia Re, Andrea Taddei & Federico Boschetti describe the EuporiaEDU system, a resource for the digital annotation of ancient texts developed by the Laboratory of Anthropology of Ancient Greece (LAMA), the CoPhiLab at the ILC-CNR in Pisa, and the Venice Digital and Public Humanities Department. The system lets users structure textual information by connecting keywords and creating networks of concepts, such as ritual actions in Greek tragedy. It is applicable to all kinds of linguistic or cultural observations and supports a wide range of collaboration between teachers and students, from high school to university.
Keywords of written reflection - a comparison between reflective and descriptive datasets
This study investigates reflection keywords by contrasting two datasets, one of reflective sentences and another of descriptive sentences. The log-likelihood statistic reveals several reflection keywords, which are discussed in the context of a model for reflective writing. These keywords are seen as a useful building block for tools that can automatically analyse reflection in texts.
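The abstract names the log-likelihood statistic but not the exact procedure. As a minimal sketch, assuming the common two-corpus keyness formulation (expected frequencies derived from pooled counts, G2 = 2 Σ O ln(O/E)) and already-tokenized text, keyword extraction could look like the following. The function names `log_likelihood` and `reflection_keywords` are hypothetical, and keeping only words relatively more frequent in the reflective set is an assumed filtering choice.

```python
import math
from collections import Counter

def log_likelihood(a, b, c, d):
    """Two-corpus log-likelihood (G2) for one word.

    a, b: frequency of the word in corpus 1 and corpus 2
    c, d: total token counts of corpus 1 and corpus 2
    """
    e1 = c * (a + b) / (c + d)  # expected frequency in corpus 1
    e2 = d * (a + b) / (c + d)  # expected frequency in corpus 2
    ll = 0.0
    if a > 0:  # 0 * log(0) is taken as 0
        ll += a * math.log(a / e1)
    if b > 0:
        ll += b * math.log(b / e2)
    return 2.0 * ll

def reflection_keywords(reflective_tokens, descriptive_tokens, top=10):
    """Rank words over-represented in the reflective set by G2 score."""
    f1, f2 = Counter(reflective_tokens), Counter(descriptive_tokens)
    c, d = sum(f1.values()), sum(f2.values())
    scored = []
    for word in f1.keys() | f2.keys():
        a, b = f1[word], f2[word]
        # Keep only words relatively more frequent in the reflective set.
        if a / c > b / d:
            scored.append((word, log_likelihood(a, b, c, d)))
    # Highest G2 first; ties broken alphabetically for a stable order.
    scored.sort(key=lambda item: (-item[1], item[0]))
    return scored[:top]
```

A word with identical relative frequency in both sets scores 0, and the score grows as the two relative frequencies diverge; in practice a significance threshold on G2 (compared against a chi-squared distribution with one degree of freedom) is often applied before interpreting the keywords.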
VerbAtlas: a novel large-scale verbal semantic resource and its application to semantic role labeling
We present VerbAtlas, a new, hand-crafted lexical-semantic resource whose goal is to bring together all verbal synsets from WordNet into semantically coherent frames. The frames define a common, prototypical argument structure while at the same time providing new concept-specific information. In contrast to PropBank, which defines enumerative semantic roles, VerbAtlas comes with an explicit, cross-frame set of semantic roles linked to selectional preferences expressed in terms of WordNet synsets, and is the first resource enriched with semantic information about implicit, shadow, and default arguments.
We demonstrate the effectiveness of VerbAtlas in the task of dependency-based Semantic Role Labeling and show how its integration into a high-performance system leads to improvements on both the in-domain and out-of-domain test sets of CoNLL-2009. VerbAtlas is available at http://verbatlas.org
Models to represent linguistic linked data
As the Semantic Web and computational linguistics communities take a growing interest in linguistic linked data (LLD) and the number of contributions on LLD grows rapidly, scholars (and linguists in particular) who develop LLD resources sometimes find it difficult to determine which mechanism suits their needs and which challenges have already been addressed. This review presents the state of the art on the models, ontologies and extensions used to represent language resources as LLD, focusing on the nature of the linguistic content they aim to encode. Four basic groups of models are distinguished in this work: models to represent the main elements of lexical resources (group 1), vocabularies developed as extensions to models in group 1 and ontologies that provide more granularity on specific levels of linguistic analysis (group 2), catalogues of linguistic data categories (group 3), and other models such as corpora models or service-oriented ones (group 4). The contributions in these four groups are described, highlighting their reuse by the community and the modelling challenges that remain to be faced.
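To make "representing the main elements of a lexical resource" (group 1) concrete, the sketch below emits N-Triples for one dictionary entry using the core terms of the W3C OntoLex-Lemon vocabulary (`LexicalEntry`, `canonicalForm`, `Form`, `writtenRep`, `sense`, `LexicalSense`, `reference`). Choosing OntoLex-Lemon as the representative group-1 model is our assumption for illustration. The helper names, the `_form`/`_sense` IRI naming scheme, and the `example.org` IRIs are hypothetical, and the literal helper assumes the lemma contains no characters that need escaping.

```python
ONTOLEX = "http://www.w3.org/ns/lemon/ontolex#"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"

def iri(value):
    """Wrap an IRI in angle brackets, as N-Triples requires."""
    return f"<{value}>"

def lang_literal(text, lang):
    """Language-tagged literal; assumes text needs no escaping."""
    return f'"{text}"@{lang}'

def lexical_entry_triples(base, lemma, lang, reference):
    """Minimal OntoLex-Lemon description of one word: entry, form, sense."""
    entry = base + lemma
    form = entry + "_form"    # hypothetical IRI naming scheme
    sense = entry + "_sense"  # hypothetical IRI naming scheme
    return [
        (iri(entry), iri(RDF_TYPE), iri(ONTOLEX + "LexicalEntry")),
        (iri(entry), iri(ONTOLEX + "canonicalForm"), iri(form)),
        (iri(form), iri(RDF_TYPE), iri(ONTOLEX + "Form")),
        (iri(form), iri(ONTOLEX + "writtenRep"), lang_literal(lemma, lang)),
        (iri(entry), iri(ONTOLEX + "sense"), iri(sense)),
        (iri(sense), iri(RDF_TYPE), iri(ONTOLEX + "LexicalSense")),
        # The sense points to an ontology concept that gives the word its meaning.
        (iri(sense), iri(ONTOLEX + "reference"), iri(reference)),
    ]

def to_ntriples(triples):
    """Serialize (subject, predicate, object) tuples, one statement per line."""
    return "\n".join(f"{s} {p} {o} ." for s, p, o in triples)

print(to_ntriples(lexical_entry_triples(
    "http://example.org/lexicon/", "cat", "en", "http://example.org/ontology/Cat")))
```

Group-2 vocabularies in the review's sense would add further properties to these same nodes (for instance morphological or syntactic information) rather than replace them, which is why the core entry/form/sense split is a useful starting point.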
Not just paper: enhancement of archive cultural heritage
Oral archives and digital technologies have gone hand in hand for a very long time. Both sides benefit from this interdisciplinary junction: technology enhances the preservation and dissemination of oral materials, while exploiting them to develop cutting-edge tools for their treatment. This chapter deals with an Italian instantiation of this mutual relationship: the Archivio Vi.Vo. project. Offering innovative solutions concerning metadata, audio restoration, description, and access, Archivio Vi.Vo. aims to build an online platform to host the oral archives from Tuscany. The project is powered by CLARIN-IT, which guarantees its compliance with standards and offers resources for data access and discoverability. Archivio Vi.Vo. has not been built from scratch: it is instead a cross-fertilization of previous initiatives and research projects (e.g., the Gra.fo project). Moreover, the chapter presents the related, contemporary work of a multidisciplinary group striving to synthesize a Vademecum for future generations of oral archive researchers. Lastly, a brief list of tentative ideas for future developments of the Archivio Vi.Vo. platform is presented.