Proceedings of the Fifth Italian Conference on Computational Linguistics CLiC-it 2018 : 10-12 December 2018, Torino
On behalf of the Program Committee, a very warm welcome to the Fifth Italian Conference on Computational Linguistics (CLiC-it 2018). This edition of the conference is held in Torino. The conference is locally organised by the University of Torino and hosted in its prestigious main lecture hall “Cavallerizza Reale”. The CLiC-it conference series is an initiative of the Italian Association for Computational Linguistics (AILC) which, after five years of activity, has clearly established itself as the premier national forum for research and development in the fields of Computational Linguistics and Natural Language Processing, where leading researchers and practitioners from academia and industry meet to share their research results, experiences, and challenges.
The automatic processing of multiword expressions in Irish
It is well documented that Multiword Expressions (MWEs) pose a unique challenge to a variety of NLP tasks such as machine translation, parsing, and information retrieval, among others. For low-resource languages such as Irish, these challenges can be exacerbated by the scarcity of data and a lack of research on this topic. In order to improve the handling of MWEs in various NLP tasks for Irish, this thesis addresses the lack of resources specifically targeting MWEs in Irish and examines how these resources can be applied to said NLP tasks.
We report on the creation and analysis of a number of lexical resources as part of this PhD research. Ilfhocail, a lexicon of Irish MWEs, is created by extracting MWEs from other lexical resources such as dictionaries. A corpus annotated with verbal MWEs in Irish is created for the inclusion of Irish in the PARSEME Shared Task 1.2. Additionally, MWEs are tagged in a bilingual EN-GA corpus for inclusion in machine translation experiments. For the purposes of annotation, a categorisation scheme covering nine categories of MWEs in Irish is created, combining linguistic analysis of these types of constructions with cross-lingual frameworks for defining MWEs.
A case study in applying MWEs to NLP tasks is undertaken, exploring the incorporation of MWE information when training Neural Machine Translation systems. Finally, the topic of automatic identification of Irish MWEs is explored, documenting the training of a system capable of automatically identifying Irish MWEs from a variety of categories, and the challenges associated with developing such a system.
This research contributes towards a greater understanding of Irish MWEs and their applications in NLP, and provides a foundation for future work in exploring other methods for the automatic discovery and identification of Irish MWEs, and further developing the MWE resources described above.
Tune your brown clustering, please
Brown clustering, an unsupervised hierarchical clustering technique based on n-gram mutual information, has proven useful in many NLP applications. However, most uses of Brown clustering employ the same default configuration; the appropriateness of this configuration has gone predominantly unexplored. Accordingly, we present information for practitioners on the behaviour of Brown clustering in order to assist hyperparameter tuning, in the form of a theoretical model of Brown clustering utility. This model is then evaluated empirically in two sequence labelling tasks over two text types. We explore the dynamic between the input corpus size, the chosen number of classes, and the quality of the resulting clusters, which affects any approach using Brown clustering. In every scenario that we examine, our results reveal that the values most commonly used for the clustering are sub-optimal.
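As a rough illustration of the quantity Brown clustering is built on, the sketch below computes the average mutual information between adjacent word classes for a fixed word-to-class assignment. It is a minimal sketch that assumes bigrams only; the function name `class_bigram_ami` and the toy inputs are hypothetical and are not taken from the paper. Brown clustering itself greedily merges classes so as to lose as little of this quantity as possible, which this sketch does not implement.

```python
import math
from collections import Counter


def class_bigram_ami(tokens, word_to_class):
    """Average mutual information (in bits) between adjacent classes.

    tokens: list of words in corpus order.
    word_to_class: dict mapping every word in tokens to a class id.
    Both names are illustrative assumptions, not part of any library.
    """
    classes = [word_to_class[t] for t in tokens]
    pairs = list(zip(classes, classes[1:]))  # adjacent class bigrams
    n = len(pairs)
    if n == 0:
        return 0.0

    pair_counts = Counter(pairs)
    left_counts = Counter(c1 for c1, _ in pairs)    # class as first element
    right_counts = Counter(c2 for _, c2 in pairs)   # class as second element

    ami = 0.0
    for (c1, c2), k in pair_counts.items():
        p_pair = k / n
        p_left = left_counts[c1] / n
        p_right = right_counts[c2] / n
        ami += p_pair * math.log2(p_pair / (p_left * p_right))
    return ami


if __name__ == "__main__":
    toy = "a b a b a b".split()
    print(class_bigram_ami(toy, {"a": 0, "b": 1}))  # two classes
    print(class_bigram_ami(toy, {"a": 0, "b": 0}))  # everything in one class
```

Placing every word in a single class yields zero mutual information, so the number of classes (the hyperparameter the abstract discusses) bounds how much of this signal a clustering can retain.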
Irish dependency treebanking and parsing
Despite enjoying the status of an official EU language, Irish is considered a minority language. As with most minority languages, it is a 'low-density' language, which means it lacks important linguistic and Natural Language Processing (NLP) resources. Relative to better-resourced languages such as English or French, for example, little research has been carried out on computational analysis or processing of Irish. Parsing is the method of analysing the linguistic structure of text, and it is an invaluable processing step that is required for many different types of language technology applications. As a verb-initial language, Irish has several features that are uncharacteristic of many languages previously studied in parsing research. Our work broadens the application of NLP methods to less studied language structures and provides a basis on which future work in Irish NLP is possible. We report on the development of a dependency treebank that serves as training data for the first full Irish dependency parser. We discuss the linguistic structures of Irish, and the motivation behind the design of our annotation scheme. Our work also examines various methods of employing semi-automated approaches to treebank development. We overcome the relatively small pool of linguistic and technological resources available for the Irish language with these approaches, and show that even in early stages of development, parsing results for Irish are promising. What counts as a sufficient number of trees for training a parser varies from language to language. Through empirical methods, we explore the impact that our treebank's size and content have on parsing accuracy for Irish. We also discuss our work in cross-lingual studies through converting our treebank to a universal annotation scheme. Finally, we extend our Irish NLP work to the unstructured user-generated text of Irish tweets.
We report on the creation of a POS-tagged corpus of Irish tweets and the training of statistical POS-tagging models. We show how existing resources can be leveraged for this domain-adapted resource development.
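Contributions to language understanding: attention and alignments between n-grams for interpretable similarity and inference (original Basque title: Hizkuntza-ulermenari ekarpenak: N-gramen arteko atentzio eta lerrokatzeak antzekotasun eta inferentzia interpretagarrirako).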
148 pp. Natural Language Processing makes it possible to improve intelligent systems in the field of education, considerably lightening the workload of students and teachers. In this thesis we study sentence-level language understanding and, through new proposals, improve the language understanding of intelligent systems, giving them the ability to interpret the user's sentences more precisely, since the ability to interpret sentences in a fine-grained way makes it possible to generate feedback automatically. To develop this thesis we have delved into language understanding by analysing the features and systems relating to semantic similarity and logical inference. In particular, we have shown that sentences can be modelled better by structuring the words within a sentence into groups and aligning them. To that end, we have implemented a state-of-the-art neural network system that aligns single words and adapted it to align arbitrary n-grams. Although word alignment has long been known, this thesis puts forward, for the first time, proposals for aligning arbitrary n-grams by means of an attention mechanism. In addition, to identify similarities and differences between sentences precisely, to increase the interpretability of sentences, and to give students precise feedback, we have created a new layer: iSTS. With this layer, which combines semantic similarity and logical inference, we have aligned chunks, and we have demonstrated in two evaluation scenarios in an educational context that we are able to give students precise feedback. Several systems and datasets have been published together with this thesis so that the scientific community can continue this research in the future.
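The idea of aligning arbitrary n-grams with an attention mechanism can be sketched roughly as follows. This is not the thesis's system: the mean-pooled n-gram vectors, the dot-product scoring, and the names `ngram_vectors`, `softmax`, and `align` are all assumptions made purely for illustration, with word vectors supplied by the caller as plain lists of floats.

```python
import math


def ngram_vectors(word_vecs, max_n):
    """Map each contiguous span (start, end) of up to max_n words to the
    mean of its word vectors (mean pooling is an illustrative assumption)."""
    spans = {}
    for start in range(len(word_vecs)):
        last = min(start + max_n, len(word_vecs))
        for end in range(start + 1, last + 1):
            window = word_vecs[start:end]
            spans[(start, end)] = [sum(col) / len(window) for col in zip(*window)]
    return spans


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def align(src_vecs, tgt_vecs, max_n=2):
    """For every source n-gram, return attention weights over all target
    n-grams, using dot-product similarity as the (assumed) scoring function."""
    src = ngram_vectors(src_vecs, max_n)
    tgt = ngram_vectors(tgt_vecs, max_n)
    tgt_spans = list(tgt)
    alignment = {}
    for s_span, s_vec in src.items():
        scores = [sum(a * b for a, b in zip(s_vec, tgt[t])) for t in tgt_spans]
        alignment[s_span] = dict(zip(tgt_spans, softmax(scores)))
    return alignment


if __name__ == "__main__":
    # Toy 2-dimensional "embeddings" for two two-word sentences.
    source = [[1.0, 0.0], [0.0, 1.0]]
    target = [[0.0, 1.0], [1.0, 0.0]]
    for span, weights in align(source, target, max_n=2).items():
        best = max(weights, key=weights.get)
        print(span, "->", best)
```

Each source span receives a probability distribution over target spans, so the highest-weighted target span can be read off as a soft alignment, which is what makes this kind of model inspectable.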
EVALITA. Evaluation of NLP and Speech Tools for Italian: Proceedings of the Final Workshop
Editor of the proceedings of EVALITA 2016