Syllable Neural Language Models for English Poem Generation
Automatic poem generation is an ambitious Natural Language Generation (NLG) problem: models must replicate the precise structure of poems, including rhyme and meter, while producing creative and emotional verses. Furthermore, the lack of abundant poetic corpora, especially for ancient poetry, seriously limits the development of strong poem generators. In this paper, we propose a syllable neural language model for English, focusing on the generation of verses in the style of a target author: William Wordsworth. To alleviate the problem of limited available data, we exploit transfer learning. Furthermore, we bias the generation of verses with a combination of scoring functions based on meter, style, and grammar, in order to select lines more compliant with the author's characteristics. The results of both quantitative and human evaluations show the effectiveness of our approach. In particular, human judges struggle to distinguish real verses from generated ones.
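The combination of scoring functions described in the abstract can be sketched as follows. This is a toy illustration, not the paper's implementation: the scorers and weights here are hypothetical stand-ins for the meter, style, and grammar models.

```python
# Hedged sketch of scored verse selection (all scorers/weights hypothetical):
# candidate verses are ranked by a weighted combination of scoring functions.

def combined_score(verse, scorers, weights):
    """Weighted sum of per-criterion scores (e.g. meter, style, grammar)."""
    return sum(w * f(verse) for f, w in zip(scorers, weights))

def select_best(candidates, scorers, weights):
    """Pick the candidate line most compliant with the target criteria."""
    return max(candidates, key=lambda v: combined_score(v, scorers, weights))

# Toy scorers standing in for the paper's meter/style/grammar models.
meter = lambda v: 1.0 if len(v.split()) == 8 else 0.5
style = lambda v: 1.0 if "o'er" in v else 0.2   # crude archaic-style cue
grammar = lambda v: 1.0                          # placeholder fluency score

candidates = [
    "the waves beside them danced but they outdid the sparkling waves",
    "I wandered lonely as a cloud that floats on high o'er vales",
]
best = select_best(candidates, [meter, style, grammar], [0.4, 0.4, 0.2])
```

In the actual system each scorer would be a trained or rule-based model rather than a lambda, but the ranking logic is the same.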
Contrastive Language-Image Pretrained Models are Zero-Shot Human Scanpath Predictors
Understanding the mechanisms underlying human attention is a fundamental challenge for both vision science and artificial intelligence. While numerous computational models of free-viewing have been proposed, less is known about the mechanisms underlying task-driven image exploration. To address this gap, we present CapMIT1003, a database of captions and click-contingent image explorations collected during captioning tasks. CapMIT1003 is based on the same stimuli as the well-known MIT1003 benchmark, for which eye-tracking data under free-viewing conditions are available, offering a promising opportunity to study human attention under both tasks concurrently. We make this dataset publicly available to facilitate future research in this field. In addition, we introduce NevaClip, a novel zero-shot method for predicting visual scanpaths that combines contrastive language-image pretrained (CLIP) models with biologically inspired neural visual attention (NeVA) algorithms. NevaClip simulates human scanpaths by aligning the representation of the foveated visual stimulus with the representation of the associated caption, employing gradient-driven visual exploration to generate scanpaths. Our experimental results demonstrate that NevaClip outperforms existing unsupervised computational models of human visual attention in terms of scanpath plausibility for both captioning and free-viewing tasks. Furthermore, we show that conditioning NevaClip with incorrect or misleading captions leads to random behavior, highlighting the significant impact of caption guidance on the decision-making process. These findings contribute to a better understanding of the mechanisms that guide human attention and pave the way for more sophisticated computational approaches to scanpath prediction that can integrate direct top-down guidance from downstream tasks.
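The gradient-driven exploration idea can be illustrated with a toy sketch. This is not NevaClip itself: the real method maximizes the CLIP similarity between the embedding of the foveated image and the caption embedding, whereas here a synthetic alignment function stands in for that similarity.

```python
import numpy as np

# Toy sketch of gradient-driven scanpath generation (NevaClip-style in spirit).
# `alignment` is a synthetic stand-in for the CLIP cosine similarity between
# the foveated-image embedding and the caption embedding.

def alignment(fix, target):
    # Higher when the fixation is near the caption-relevant region `target`.
    return -np.sum((fix - target) ** 2)

def next_fixation(fix, target, lr=0.1, eps=1e-4):
    # Numerical gradient ascent on the alignment score.
    grad = np.zeros_like(fix)
    for i in range(len(fix)):
        d = np.zeros_like(fix)
        d[i] = eps
        grad[i] = (alignment(fix + d, target) - alignment(fix - d, target)) / (2 * eps)
    return fix + lr * grad

def scanpath(start, target, steps=50):
    fix = np.array(start, dtype=float)
    path = [fix.copy()]
    for _ in range(steps):
        fix = next_fixation(fix, target)
        path.append(fix.copy())
    return path

path = scanpath([0.0, 0.0], np.array([0.6, 0.4]))
```

In the toy setting the fixations converge toward the caption-relevant region; with a misleading caption the target would be uninformative, which matches the random behavior the abstract reports.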
Learning in text streams: discovery and disambiguation of entity and relation instances
We consider a scenario where an artificial agent is reading a stream of text composed of a set of narrations and is informed about the identity of some of the individuals mentioned in the portion of text currently being read. The agent is expected to learn to follow the narrations, disambiguating mentions and discovering new individuals. We focus on the case in which individuals are entities and relations, and propose an end-to-end trainable memory network that learns to discover and disambiguate them in an online manner, performing one-shot learning and dealing with a small number of sparse supervisions. Our system builds a knowledge base that is not given in advance, and it improves its skills while reading unsupervised text. The model deals with abrupt changes in the narration, taking their effects into account when resolving coreferences. We showcase the strong disambiguation and discovery skills of our model on a corpus of Wikipedia documents and on a newly introduced dataset that we make publicly available.
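The discover-or-disambiguate behavior can be sketched as an online key-value memory. This is a hypothetical simplification: the paper's memory network is end-to-end trainable, while here matching is a fixed cosine-similarity threshold.

```python
import numpy as np

class MentionMemory:
    """Toy online memory: match a mention embedding to a known individual,
    or create a new slot one-shot when no slot is similar enough.
    (Hedged sketch; the paper's model learns this behavior end-to-end.)"""

    def __init__(self, threshold=0.8):
        self.keys = []    # normalized embeddings of known individuals
        self.names = []   # their identifiers
        self.threshold = threshold

    def resolve(self, embedding, name_hint=None):
        embedding = embedding / np.linalg.norm(embedding)
        if self.keys:
            sims = [k @ embedding for k in self.keys]
            best = int(np.argmax(sims))
            if sims[best] >= self.threshold:
                return self.names[best]        # disambiguation: known individual
        # discovery: one-shot creation of a new memory slot
        self.keys.append(embedding)
        self.names.append(name_hint or f"entity_{len(self.names)}")
        return self.names[-1]

mem = MentionMemory()
a = mem.resolve(np.array([1.0, 0.0, 0.0]), "Wordsworth")  # new individual
b = mem.resolve(np.array([0.9, 0.1, 0.0]))                # similar mention
c = mem.resolve(np.array([0.0, 1.0, 0.0]))                # dissimilar mention
```

The sparse supervision of the paper corresponds to the occasional `name_hint`: most mentions arrive unlabeled and must be resolved against what the memory has already discovered.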
Linguistic Feature Injection for Efficient Natural Language Processing
Transformers have been established as one of the most effective neural approaches to various Natural Language Processing tasks. However, following the common trend in modern deep architectures, their scale has quickly grown to an extent that makes it impractical for many enterprises to train such models from scratch. Indeed, despite their high performance, Transformers have the general drawback of requiring huge amounts of training data, computational resources, and energy to be successfully optimized. For this reason, more recent architectures like Bidirectional Encoder Representations from Transformers (BERT) rely on unlabeled data to pre-train the model, which is later fine-tuned for a specific downstream task using a relatively small amount of training data. In a similar fashion, this paper considers a plug-and-play framework that can inject multiple syntactic features, such as Part-of-Speech tags or Dependency Parsing relations, into any kind of pre-trained Transformer. This novel approach makes it possible to perform sequence-to-sequence labeling tasks by exploiting (i) the (more abundant) available training data, which is also used to learn the syntactic features, and (ii) the language data used to pre-train the Transformer model. The experimental results show that our approach improves over the baseline performance of the underlying model on different datasets, proving the effectiveness of employing syntactic language information for semantic regularization. In addition, we show that our architecture has a large efficiency advantage over pure large language models: by using a model of limited size whose input data are enriched with syntactic information, we obtain a significant reduction in CO2 emissions without decreasing prediction performance.
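The injection idea can be sketched as concatenating syntactic-feature embeddings with token embeddings before the task layers. Names, dimensions, and the tiny POS vocabulary below are hypothetical; in the actual framework the feature embeddings would be learned and the token embeddings would come from the pre-trained Transformer.

```python
import numpy as np

# Hedged sketch of linguistic feature injection: each token embedding from a
# (frozen) pre-trained model is enriched with the embedding of a syntactic
# feature such as its POS tag. All names and dimensions are illustrative.

rng = np.random.default_rng(0)
token_dim, pos_dim = 8, 4
pos_vocab = {"DET": 0, "NOUN": 1, "VERB": 2}          # toy POS tag set
pos_table = rng.normal(size=(len(pos_vocab), pos_dim))  # learnable in practice

def inject(token_embeddings, pos_tags):
    """Concatenate each token embedding with its POS-tag embedding."""
    pos_vecs = pos_table[[pos_vocab[t] for t in pos_tags]]
    return np.concatenate([token_embeddings, pos_vecs], axis=-1)

# e.g. embeddings for "the cat sleeps" from a pre-trained encoder
tokens = rng.normal(size=(3, token_dim))
enriched = inject(tokens, ["DET", "NOUN", "VERB"])
```

A smaller downstream model can then operate on the enriched `(token_dim + pos_dim)`-sized inputs, which is the source of the efficiency advantage the abstract claims.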