Institute for Computational Linguistics “A. Zampolli”

ILC4CLARIN: Linguistic Data and NLP Tool
    915 research outputs found

    CompL-it

    No full text
    CompL-it is a computational lexicon for Italian derived from LexicO (https://dspace-clarin-it.ilc.cnr.it/repository/xmlui/handle/20.500.11752/ILC-977) and integrated with the following resources: (1) M-GLF (https://dspace-clarin-it.ilc.cnr.it/repository/xmlui/handle/20.500.11752/ILC-1002), a list of lemmatized forms generated by the morphological analyzer MAGIC (Battista and Pirrelli, 1999; Pirrelli and Battista, 2000); (2) a set of treebanks for Italian (contained in https://lindat.cz/repository/xmlui/handle/11234/1-4611): ISDT, VIT, ParTUT, and ParlaMint-it. The resource contains a morphological layer (including lemmas, inflected forms, and morphological features) and a semantic layer (including senses and the relations between them). Entries are encoded according to the OntoLex-Lemon model and made available as a semantic repository.

    Athloi: annotation of themes and motifs related to Iliad 23 and Odyssey 8

    No full text
    Annotation of themes and motifs related to Iliad 23 and Odyssey 8 through a Domain-Specific Language. The bundle provides the original annotations, the annotations converted to XML (with a proprietary scheme), and the context-free grammar (CFG). 300 annotations have been encoded. Further information can be requested from the Help Desk of the DiPText-KC: https://diptext-kc.clarin-it.it/helpdesk

    Pan-Latin Geothermal Energy Lexicon

    No full text
    The Pan-Latin Geothermal Energy Lexicon (Lessico panlatino dell’energia geotermica), developed within the Realiter network, contains the basic terms related to geothermal energy in seven Romance languages (Italian, Catalan, Spanish, French, Galician, Portuguese, Romanian) and in English.

    EpiLexO

    No full text
    EpiLexO is a user-friendly web application for creating and editing an integrated system of language resources for ancient fragmentary languages, centered on the lexicon and compliant with current digital humanities and Linked Open Data principles. EpiLexO allows for the editing of lexica with all relevant cross-references: links to their testimonies, as well as to bibliographic information and other (external) resources and common vocabularies. This front-end application rests on a Service-Oriented Architecture with two main back-end components, the LexO-server and the CASH-server, which manage lexica and textual documents respectively via RESTful web-service APIs, plus additional services for other aspects such as access and authentication, XML rendering, etc. All code is available at https://github.com/DigItAnt/. The application was developed in the context of a project on the fragmentarily attested languages of ancient Italy, but it can be applied to other similar contexts.

    ItAntDSL

    No full text
    The bundle contains: 1. an ANTLR Lexer and Parser for a Domain-Specific Language named ItAntDSL, compliant with the EpiDoc conceptual model, to describe inscriptions in the languages of ancient Italy (in particular Venetic and Faliscan); 2. a Visitor to convert ItAntDSL to XML-ItAnt. The development of XSL(T) stylesheets to convert XML-ItAnt to XML-TEI/EpiDoc is in progress.

    Pan-Latin Photovoltaic Systems Lexicon

    No full text
    The Pan-Latin Photovoltaic Systems Lexicon (Lessico panlatino dei sistemi fotovoltaici), developed within the Realiter network, contains the basic terms related to photovoltaic systems in seven Romance languages (Italian, Catalan, Spanish, French, Galician, Portuguese, Romanian) and in English.

    Pan-Latin Smart City Lexicon

    No full text
    The Pan-Latin Smart City Lexicon (Lessico panlatino della Smart City), developed within the Realiter network, contains the basic terms related to the Smart City concept in seven Romance languages (Italian, Catalan, Spanish, French, Galician, Portuguese, Romanian) and in English.

    RAC - Recovery from Ana/Anorexia Corpus

    No full text
    RAC - Recovery from Ana/Anorexia Corpus is a collection of Italian ED-recovery community content downloaded from TikTok. It consists of 1000 videos from 27 TikTok channels (26 females and 1 male). Given the wide variety of features and formatting styles that characterize TikTok videos, we organized the data into 4 categories: 1) "Speech-only" videos, in which the user talks with no background music and/or written text. 2) "Playback" videos, in which the user sings over a song played in the background. 3) "Text-only" videos, which contain only written text, with neither background music nor the user speaking. 4) "Mixed" videos, in which the above-mentioned features are present in various combinations. "Speech-only" and "playback" videos were transcribed automatically using the Google Web Speech API, and the transcriptions were then checked manually. "Text-only" and "mixed" videos were transcribed manually.

    Pan-Latin Lexicon of Collars and Sleeves in Fashion and Costume

    No full text
    The Pan-Latin Lexicon of Collars and Sleeves in Fashion and Costume, developed within the Pan-Latin Terminology Network (REALITER), aims at collecting the main terms designating collars and sleeves in fashion and costume. It proposes a semiotic reference for a common referent, in order to establish terminological equivalences in this very technical and specialised field, which is characterised by several cultural traditions. The Lexicon gives a multilingual (Italian, Catalan, Spanish, French, Portuguese, English) terminological description of this field, as a reference for those who study, translate, write about, or work in fashion and costume. For Spanish, the equivalents in the Spanish of Spain, Argentina and Mexico are provided. For Portuguese, the Brazilian Portuguese equivalents are also given.

    ItaASD: Italian speech corpus Autism Spectrum Disorder

    No full text
    This is a corpus of semi-spontaneous speech produced by 34 children between 6 and 13 years of age, resident in the Campania region of Italy. Half of the participating children were diagnosed with high-functioning Autism Spectrum Disorder, and the other half were neurotypical children matched for age, gender, and geographical origin. All participants were administered three tasks: a complex image description task, a story-telling task, and a story-retelling task. This resulted in 4 hours and 19 minutes of recorded speech, which was then transcribed and annotated using ELAN. This research project was approved by the Bioethics Committee of the Alma Mater Studiorum - University of Bologna (no. 0173455/2022). Due to the Italian privacy policy, the raw data of the corpus (i.e., speech recordings, transcriptions, and clinical information of the participants) are not available. Processed data (i.e., tables of acoustic/rhythmic/lexical/syntactic values, with the names of the speakers masked through an alphanumeric acronym to ensure anonymity) are available from the contact person upon reasonable request.

    1 full text, 915 metadata records
    Updated in last 30 days.
    ILC4CLARIN: Linguistic Data and NLP Tool is based in Italy