Institute for Computational Linguistics “A. Zampolli”
ILC4CLARIN: Linguistic Data and NLP Tool
915 research outputs found
CompL-it
CompL-it is a computational lexicon for Italian derived from LexicO (https://dspace-clarin-it.ilc.cnr.it/repository/xmlui/handle/20.500.11752/ILC-977), integrating the following resources:
- M-GLF (https://dspace-clarin-it.ilc.cnr.it/repository/xmlui/handle/20.500.11752/ILC-1002), a list of lemmatized forms generated by the morphological analyzer MAGIC (Battista and Pirrelli, 1999; Pirrelli and Battista, 2000);
- a set of treebanks for Italian (contained in https://lindat.cz/repository/xmlui/handle/11234/1-4611):
  - ISDT;
  - VIT;
  - ParTUT;
  - ParlaMint-it.
The resource contains a morphological layer (including lemmas, inflected forms, and morphological features) and a semantic layer (including senses and the relations between them). Entries are encoded according to the OntoLex-Lemon model and made available as a semantic repository.
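To illustrate what an OntoLex-Lemon encoding of the two layers can look like, here is a minimal Python sketch that serializes a single entry to Turtle text. The `Entry`/`Form` classes, the `ex:` namespace (`http://example.org/lexicon/`), the sample noun *casa*, and the use of LexInfo for part of speech and morphological features are all illustrative assumptions; CompL-it's actual IRIs and feature vocabulary may differ.

```python
from dataclasses import dataclass, field

# ontolex: is the W3C OntoLex-Lemon vocabulary. Using LexInfo 3.0 for part of
# speech and features is an assumption; ex: is a placeholder namespace.
PREFIXES = (
    "@prefix ontolex: <http://www.w3.org/ns/lemon/ontolex#> .\n"
    "@prefix lexinfo: <http://www.lexinfo.net/ontology/3.0/lexinfo#> .\n"
    "@prefix ex: <http://example.org/lexicon/> ."
)


@dataclass
class Form:
    written_rep: str
    # LexInfo property name -> LexInfo value, e.g. {"number": "plural"}
    features: dict[str, str] = field(default_factory=dict)


@dataclass
class Entry:
    ident: str   # local name of the (hypothetical) entry IRI
    pos: str     # LexInfo part of speech, e.g. "noun"
    lemma: Form  # morphological layer: canonical form
    other_forms: list[Form] = field(default_factory=list)  # inflected forms
    senses: list[str] = field(default_factory=list)        # semantic layer: sense IDs


def form_turtle(subject: str, form: Form) -> str:
    """One ontolex:Form node; written_rep is assumed to contain no double quotes."""
    lines = [f"ex:{subject} a ontolex:Form ;"]
    for prop, value in form.features.items():
        lines.append(f"    lexinfo:{prop} lexinfo:{value} ;")
    lines.append(f'    ontolex:writtenRep "{form.written_rep}"@it .')
    return "\n".join(lines)


def entry_turtle(entry: Entry) -> str:
    """Serialize an entry, its forms and its senses as Turtle text."""
    props = [
        f"lexinfo:partOfSpeech lexinfo:{entry.pos}",
        f"ontolex:canonicalForm ex:{entry.ident}_lemma",
    ]
    props += [f"ontolex:otherForm ex:{entry.ident}_form{i}"
              for i in range(1, len(entry.other_forms) + 1)]
    props += [f"ontolex:sense ex:{s}" for s in entry.senses]
    blocks = [PREFIXES,
              f"ex:{entry.ident} a ontolex:LexicalEntry ;\n    "
              + " ;\n    ".join(props) + " ."]
    blocks.append(form_turtle(f"{entry.ident}_lemma", entry.lemma))
    for i, form in enumerate(entry.other_forms, start=1):
        blocks.append(form_turtle(f"{entry.ident}_form{i}", form))
    blocks += [f"ex:{s} a ontolex:LexicalSense ." for s in entry.senses]
    return "\n\n".join(blocks)


# Illustrative entry: the noun "casa" with its plural form and one sense.
casa = Entry(
    ident="casa_n",
    pos="noun",
    lemma=Form("casa", {"gender": "feminine", "number": "singular"}),
    other_forms=[Form("case", {"gender": "feminine", "number": "plural"})],
    senses=["casa_n_sense1"],
)
print(entry_turtle(casa))
```

Relations between senses would be added as further triples on the `ontolex:LexicalSense` nodes; they are omitted here to keep the sketch short.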
Athloi: annotation of themes and motifs related to Iliad 23 and Odyssey 8
Annotation of themes and motifs related to Iliad 23 and Odyssey 8 using a Domain-Specific Language. The original annotations, the annotations converted to XML (with a proprietary scheme), and the context-free grammar (CFG) are provided.
300 annotations have been encoded.
Further information can be requested from the DiPText-KC Help Desk: https://diptext-kc.clarin-it.it/helpdesk
Pan-Latin Geothermal Energy Lexicon
The Pan-Latin Geothermal Energy Lexicon (Lessico panlatino dell’energia geotermica), developed within the Realiter network, contains the basic terms related to geothermal energy in seven Romance languages (Italian, Catalan, Spanish, French, Galician, Portuguese, Romanian) and in English.
EpiLexO
EpiLexO is a user-friendly web application for creating and editing an integrated system of language resources for ancient fragmentary languages, centered on the lexicon and compliant with current digital humanities and Linked Open Data principles. EpiLexO supports editing lexica with all relevant cross-references, linking entries to their testimonies as well as to bibliographic information, other external resources, and common vocabularies.
This front-end application rests on a Service-Oriented Architecture with two main back-end components, the LexO-server and the CASH-server, which manage lexica and textual documents, respectively, via RESTful web services, plus additional services for other aspects such as access and authentication, XML rendering, etc.
All code is available at https://github.com/DigItAnt/
The application was developed in the context of a project on the fragmentarily attested languages of ancient Italy, but it can be applied to other, similar contexts.
ItAntDSL
The bundle contains:
1. an ANTLR lexer and parser for ItAntDSL, a Domain-Specific Language compliant with the EpiDoc conceptual model, for describing inscriptions in the languages of ancient Italy (in particular Venetic and Faliscan);
2. a Visitor that converts ItAntDSL into XML-ItAnt.
The development of XSL(T) stylesheets to convert XML-ItAnt to XML-TEI/EpiDoc is in progress.
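The Visitor in the bundle follows the classic parse-tree visitor pattern. As a rough, runtime-free sketch of that idea (not the ANTLR-generated code), the Python below walks a hypothetical, simplified inscription tree and emits XML with `xml.etree.ElementTree`. The node types (`Inscription`, `Line`, `Word`, `Gap`) and the element and attribute names are invented placeholders, not the actual ItAntDSL grammar or the XML-ItAnt scheme.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass


# Hypothetical, simplified tree nodes standing in for parser output.
@dataclass
class Word:
    text: str

@dataclass
class Gap:
    extent: int  # number of lost characters

@dataclass
class Line:
    number: int
    items: list  # Word | Gap

@dataclass
class Inscription:
    language: str
    lines: list  # Line


class XmlVisitor:
    """Dispatches on node type, one visit_* method per node kind."""

    def visit(self, node, parent=None):
        method = getattr(self, "visit_" + type(node).__name__.lower())
        return method(node, parent)

    def visit_inscription(self, node, parent):
        root = ET.Element("inscription", {"lang": node.language})
        for line in node.lines:
            self.visit(line, root)
        return root

    def visit_line(self, node, parent):
        el = ET.SubElement(parent, "line", {"n": str(node.number)})
        for item in node.items:
            self.visit(item, el)
        return el

    def visit_word(self, node, parent):
        el = ET.SubElement(parent, "w")
        el.text = node.text
        return el

    def visit_gap(self, node, parent):
        attrs = {"quantity": str(node.extent), "unit": "character"}
        return ET.SubElement(parent, "gap", attrs)


# "xve" is the ISO 639-3 code for Venetic; the token is only illustrative.
tree = Inscription("xve", [Line(1, [Word("mego"), Gap(3)])])
print(ET.tostring(XmlVisitor().visit(tree), encoding="unicode"))
```

In an actual ANTLR setup, the dispatch done here by `visit` is provided by the generated visitor base class, with one `visitX` method per grammar rule.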
Pan-Latin Photovoltaic Systems Lexicon
The Pan-Latin Photovoltaic Systems Lexicon (Lessico panlatino dei sistemi fotovoltaici), developed within the Realiter network, contains the basic terms related to photovoltaic systems in seven Romance languages (Italian, Catalan, Spanish, French, Galician, Portuguese, Romanian) and in English.
Pan-Latin Smart City Lexicon
The Pan-Latin Smart City Lexicon (Lessico panlatino della Smart City), developed within the Realiter network, contains the basic terms related to the Smart City concept in seven Romance languages (Italian, Catalan, Spanish, French, Galician, Portuguese, Romanian) and in English.
RAC - Recovery from Ana/Anorexia Corpus
RAC - Recovery from Ana/Anorexia Corpus is a collection of Italian content from eating-disorder (ED) recovery communities, downloaded from TikTok. It consists of 1000 videos from 27 TikTok channels (26 run by women and 1 by a man).
Given the wide variety of features and formatting styles that characterize TikTok videos, we organized the data into 4 categories:
1) "Speech-only" videos, in which the user talks without background music or written text.
2) "Playback" videos, in which the user sings over a song that is played in the background.
3) "Text-only" videos, in which there is neither background music nor speech by the user, only written text.
4) "Mixed" videos, in which the above-mentioned features are present in various combinations.
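The four categories can be read as a decision procedure over a few observable features. The Python sketch below is only an illustration: the boolean feature flags are hypothetical, and treating every profile other than the three "pure" ones as "Mixed" is an interpretation of the category descriptions, not a documented annotation procedure.

```python
from dataclasses import dataclass
from enum import Enum


class Category(Enum):
    SPEECH_ONLY = "Speech-only"
    PLAYBACK = "Playback"
    TEXT_ONLY = "Text-only"
    MIXED = "Mixed"


@dataclass
class VideoFeatures:
    # Hypothetical per-video flags; the corpus does not define these fields.
    talking: bool          # the user speaks
    singing: bool          # the user sings along to a song
    background_song: bool  # a song or music plays in the background
    written_text: bool     # on-screen written text is present


def categorize(v: VideoFeatures) -> Category:
    """Assign one of the four categories; any other combination counts as Mixed."""
    if v.talking and not (v.singing or v.background_song or v.written_text):
        return Category.SPEECH_ONLY
    if v.singing and v.background_song and not (v.talking or v.written_text):
        return Category.PLAYBACK
    if v.written_text and not (v.talking or v.singing or v.background_song):
        return Category.TEXT_ONLY
    return Category.MIXED


print(categorize(VideoFeatures(talking=True, singing=False,
                               background_song=False, written_text=True)))
```

Under this reading, a video in which the user talks while on-screen text is shown falls into the Mixed category.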
"Speech-only" and "playback" videos were transcribed automatically using the Google Web Speech API, and the transcriptions were then checked manually. "Text-only" and "mixed" videos were transcribed manually.
Pan-Latin Lexicon of Collars and Sleeves in Fashion and Costume
The Pan-Latin Lexicon of Collars and Sleeves in Fashion and Costume, developed within the Pan-Latin Terminology Network (REALITER), aims to collect the main terms designating collars and sleeves in fashion and costume. It provides a semiotic reference for each common referent in order to establish terminological equivalences in this highly technical and specialised field, which is shaped by several cultural traditions. The Lexicon offers a multilingual (Italian, Catalan, Spanish, French, Portuguese, English) terminological description of the field, intended as a reference for anyone who studies, translates, writes about, or works in fashion and costume. For Spanish, equivalents are provided for the varieties of Spain, Argentina, and Mexico; for Portuguese, Brazilian Portuguese equivalents are also given.
ItaASD: Italian speech corpus Autism Spectrum Disorder
This is a corpus of semi-spontaneous speech produced by 34 children aged 6 to 13 years, residing in the Campania region of Italy. Half of the children had been diagnosed with high-functioning Autism Spectrum Disorder; the other half were neurotypical children matched for age, gender, and geographical origin. All participants completed three tasks: a complex image description task, a story-telling task, and a story-retelling task. This yielded 4 hours and 19 minutes of recorded speech, which was then transcribed and annotated using ELAN. The research project was approved by the Bioethics Committee of the Alma Mater Studiorum - University of Bologna (no. 0173455/2022). Due to Italian privacy regulations, the raw data of the corpus (i.e., speech recordings, transcriptions, and clinical information on the participants) are not available. Processed data (i.e., tables of acoustic, rhythmic, lexical, and syntactic values, with speaker names masked by an alphanumeric code to ensure anonymity) are available from the contact person upon reasonable request.
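The masking of speaker names in the processed tables could be done along the lines of this hypothetical Python sketch. The `speaker`/`group` column names and the group-prefixed code scheme (e.g. `ASD01`, `NT01`) are assumptions, since the description only states that names are replaced by an alphanumeric code.

```python
from itertools import count


def mask_speakers(rows, name_key="speaker", group_key="group"):
    """Return (masked_rows, key) with names replaced by codes like 'ASD01'.

    Each distinct name gets one code, built from its group label plus a
    per-group two-digit counter in order of first appearance.
    The input rows are not modified.
    """
    counters = {}  # group label -> running counter
    key = {}       # name -> code; keep this mapping apart from released data
    masked = []
    for row in rows:
        name = row[name_key]
        if name not in key:
            group = row[group_key]
            counter = counters.setdefault(group, count(1))
            key[name] = f"{group}{next(counter):02d}"
        new_row = dict(row)  # shallow copy so the original row stays intact
        new_row[name_key] = key[name]
        masked.append(new_row)
    return masked, key


# Toy rows with invented names and no real measurements.
rows = [
    {"speaker": "Anna", "group": "ASD", "task": "story-telling"},
    {"speaker": "Luca", "group": "NT", "task": "story-telling"},
    {"speaker": "Anna", "group": "ASD", "task": "story-retelling"},
]
masked, key = mask_speakers(rows)
for r in masked:
    print(r)
```

Codes are stable across rows, so several measurements from the same child share one code, while the name-to-code key can be stored separately from any released tables.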