
    A parallel corpus of Python functions and documentation strings for automated code documentation and code generation

    Automated documentation of programming source code and automated code generation from natural language are challenging tasks of both practical and scientific interest. Progress in these areas has been limited by the scarcity of parallel corpora of code and natural language descriptions, which tend to be small and constrained to specific domains. In this work we introduce a large and diverse parallel corpus of a hundred thousand Python functions with their documentation strings ("docstrings"), generated by scraping open source repositories on GitHub. We describe baseline results for the code documentation and code generation tasks obtained by neural machine translation. We also experiment with data augmentation techniques to further increase the amount of training data. We release our datasets and processing scripts in order to stimulate research in these areas.
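    As an illustration of how such (function, docstring) pairs can be mined from Python source files, here is a minimal sketch built on the standard ast module. The GitHub scraping and the paper's exact filtering rules are omitted; this is an assumption-laden toy, not the authors' released pipeline.

```python
# Minimal sketch: mine (function, docstring) pairs from Python source.
# Assumes Python 3.9+ for ast.unparse; the filtering rule is invented.
import ast

def extract_pairs(source: str):
    """Yield (code, docstring) pairs for every documented function."""
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            doc = ast.get_docstring(node)
            if doc:  # keep only functions that carry a docstring
                yield ast.unparse(node), doc

sample = '''
def add(a, b):
    """Return the sum of a and b."""
    return a + b
'''
for code, doc in extract_pairs(sample):
    print(doc, "<->", code.splitlines()[0])
```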

    Copied Monolingual Data Improves Low-Resource Neural Machine Translation


    Psychosocial Findings in Alcohol-Dependent Patients Before and After Three Months of Total Alcohol Abstinence

    Alcohol use disorders (AUDs) may be associated with several psychological and affective disorders. It is controversial, however, whether these symptoms are a cause or rather a consequence of alcohol dependence. There are few data testing psychosocial and affective disorders simultaneously before and after a period of alcohol abstinence. The aim of this study was to perform multiple psychometric evaluations in alcohol-dependent patients before and after 12 weeks of abstinence. Twenty-five alcohol-dependent patients were included in the study. The following psychometric tests were administered at baseline (T0) and after 12 weeks (T1): Addiction Severity Index (ASI), Brief Psychiatric Rating Scale (BPRS), Social Behavior Scale (SBS), Sheehan Disability Scale (DISS), and Aggression Questionnaire (AQ). At T1, 16 (64%) patients were abstinent, 5 (20%) had dropped out, and 4 (16%) had relapsed. Compared to T0, patients totally abstinent at T1 showed a significant reduction in the scores for BPRS, BPRS-E and its subscales (except BPRS 5), ASI 1, ASI 2, ASI 3, ASI 6, ASI 7, SBS, AQ, DISS 1, DISS 2, and DISS 3 (p < 0.05). No significant changes in ASI 4, ASI 5, DISS 4, DISS 5, and BPRS 5 scores were found at T1 compared to T0. The present study indicates that total alcohol abstinence improves psychometric outcomes such as alcohol addiction severity, psychiatric rating, social behavior, aggressiveness, and disability. Larger controlled studies are needed to confirm these findings.
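    For readers who want to reproduce this kind of analysis, comparing baseline and follow-up scores in the same patients is typically done with a paired test. Below is a small illustration using a Wilcoxon signed-rank test; the numbers are invented for demonstration and are not the study's data.

```python
# Illustrative paired pre/post comparison (T0 vs. T1 in the same
# patients). The scores below are made up, not taken from the study.
from scipy.stats import wilcoxon

bprs_t0 = [48, 52, 41, 60, 45, 50, 39, 55]  # hypothetical baseline scores
bprs_t1 = [40, 45, 38, 51, 41, 44, 36, 47]  # hypothetical 12-week scores

stat, p = wilcoxon(bprs_t0, bprs_t1)
print(f"Wilcoxon W = {stat}, p = {p:.4f}")  # p < 0.05 -> significant change
```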

    Distributionally Robust Recurrent Decoders with Random Network Distillation

    Neural machine learning models can successfully model language that is similar to their training distribution, but they are highly susceptible to degradation under distribution shift, which occurs in many practical applications when processing out-of-domain (OOD) text. This has been attributed to "shortcut learning": relying on weak correlations over arbitrarily large contexts. We propose a method based on OOD detection with Random Network Distillation that allows an autoregressive language model to automatically disregard OOD context during inference, smoothly transitioning towards a less expressive but more robust model as the data becomes more OOD, while retaining its full context capability when operating in-distribution. We apply our method to a GRU architecture, demonstrating improvements on multiple language modeling (LM) datasets.
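    The abstract does not give implementation details, but the core of Random Network Distillation as an OOD detector can be sketched as follows: a frozen, randomly initialized target network and a trained predictor network, where the predictor's error is low on data it was fitted to and high on out-of-distribution inputs. All sizes, and the use of random vectors as stand-ins for GRU hidden states, are assumptions.

```python
# Hedged sketch of Random Network Distillation as an OOD score.
import torch
import torch.nn as nn

class RND(nn.Module):
    def __init__(self, dim_in: int, dim_out: int = 64):
        super().__init__()
        make = lambda: nn.Sequential(nn.Linear(dim_in, 128), nn.ReLU(),
                                     nn.Linear(128, dim_out))
        self.target, self.predictor = make(), make()
        for p in self.target.parameters():  # target stays fixed and random
            p.requires_grad_(False)

    def ood_score(self, x: torch.Tensor) -> torch.Tensor:
        # Low where the predictor was trained (in-distribution), high OOD.
        return (self.predictor(x) - self.target(x)).pow(2).mean(dim=-1)

rnd = RND(dim_in=256)
opt = torch.optim.Adam(rnd.predictor.parameters(), lr=1e-3)
for _ in range(100):              # fit the predictor on in-domain data
    h = torch.randn(32, 256)      # stand-in for GRU hidden states
    loss = rnd.ood_score(h).mean()
    opt.zero_grad(); loss.backward(); opt.step()
# At inference, a high ood_score would down-weight reliance on context.
```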

    Dialogue-based generation of self-driving simulation scenarios using Large Language Models

    Simulation is an invaluable tool for developing and evaluating controllers for self-driving cars. Current simulation frameworks are driven by highly specialized domain-specific languages, so a natural language interface would greatly enhance usability. But there is often a gap, consisting of tacit assumptions the user is making, between a concise English utterance and the executable code that captures the user's intent. In this paper we describe a system that addresses this issue by supporting an extended multimodal interaction: the user can follow up prior instructions with refinements or revisions, in reaction to the simulations that have been generated from their utterances so far. We use Large Language Models (LLMs) to map the user's English utterances in this interaction into domain-specific code, and so we explore the extent to which LLMs capture the context sensitivity that is necessary for computing the speaker's intended message in discourse.
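    A hedged sketch of the interaction loop this describes: the full dialogue so far is re-sent to an LLM, which regenerates the scenario code after each refinement. Here call_llm is a hypothetical stand-in for whatever chat-completion client is used, and the prompt wording and example utterances are invented.

```python
# Hypothetical dialogue-to-scenario-code loop; call_llm is a stub.
from typing import Dict, List

SYSTEM = ("Translate the user's English instructions into simulation "
          "scenario code, revising prior code when the user refines "
          "or corrects earlier turns.")

def call_llm(messages: List[Dict[str, str]]) -> str:
    # Stand-in: replace with a real chat-completion client.
    return "# generated scenario code would appear here"

def refine(history: List[Dict[str, str]], utterance: str) -> str:
    history.append({"role": "user", "content": utterance})
    code = call_llm([{"role": "system", "content": SYSTEM}] + history)
    history.append({"role": "assistant", "content": code})
    return code  # regenerated with the full dialogue as context

history: List[Dict[str, str]] = []
refine(history, "Place an ego car at a four-way junction.")
refine(history, "Actually, make the junction a roundabout.")
```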

    Towards cross-lingual distributed representations without parallel text trained with adversarial autoencoders

    Current approaches to learning vector representations of text that are compatible across different languages usually require some amount of parallel text, aligned at the word, sentence, or at least document level. We hypothesize, however, that different natural languages share enough semantic structure that it should be possible, in principle, to learn compatible vector representations just by analyzing the monolingual distribution of words. In order to evaluate this hypothesis, we propose a scheme to map word vectors trained on a source language to vectors semantically compatible with word vectors trained on a target language using an adversarial autoencoder. We present preliminary qualitative results and discuss possible future developments of this technique, such as applications to cross-lingual sentence representations.
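    As a sketch of the adversarial part of this idea: a linear mapper takes source-language word vectors into the target space, while a discriminator tries to tell mapped vectors from real target vectors. This is a plain GAN-style simplification of the paper's adversarial autoencoder, with random vectors standing in for real embeddings; all sizes and losses are assumptions.

```python
# Simplified adversarial mapping of source vectors into target space.
import torch
import torch.nn as nn

dim = 300                                 # word-vector dimensionality
mapper = nn.Linear(dim, dim, bias=False)  # source -> target space
disc = nn.Sequential(nn.Linear(dim, 256), nn.ReLU(), nn.Linear(256, 1))
bce = nn.BCEWithLogitsLoss()
opt_m = torch.optim.Adam(mapper.parameters(), lr=1e-4)
opt_d = torch.optim.Adam(disc.parameters(), lr=1e-4)

for _ in range(100):
    src = torch.randn(64, dim)  # stand-in for source-language vectors
    tgt = torch.randn(64, dim)  # stand-in for target-language vectors
    # 1) train discriminator: real target vs. mapped source
    d_loss = bce(disc(tgt), torch.ones(64, 1)) + \
             bce(disc(mapper(src).detach()), torch.zeros(64, 1))
    opt_d.zero_grad(); d_loss.backward(); opt_d.step()
    # 2) train mapper to make mapped vectors look like target vectors
    m_loss = bce(disc(mapper(src)), torch.ones(64, 1))
    opt_m.zero_grad(); m_loss.backward(); opt_m.step()
```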

    Low-rank passthrough neural networks

    Various common deep learning architectures, such as LSTMs, GRUs, ResNets and Highway Networks, employ state passthrough connections that support training with high feed-forward depth or recurrence over many time steps. These "Passthrough Network" architectures also enable the decoupling of the network state size from the number of parameters of the network, a possibility that has been studied by Sak et al. (2014) with their low-rank parametrization of the LSTM. In this work we extend this line of research, proposing effective low-rank and low-rank plus diagonal matrix parametrizations for Passthrough Networks which exploit this decoupling property, reducing the data complexity and memory requirements of the network while preserving its memory capacity. This is particularly beneficial in low-resource settings, as it supports expressive models with a compact parametrization that is less susceptible to overfitting. We present competitive experimental results on several tasks, including language modeling and a near state-of-the-art result on sequential randomly-permuted MNIST classification, a hard task on natural data.
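    The parametrization itself is easy to state: a d x d weight matrix W is replaced by UV + D, with U of shape d x r, V of shape r x d, and D diagonal, cutting the parameter count from d^2 to 2dr + d. Below is a minimal sketch of such a layer; how it is wired into an LSTM or GRU cell is omitted, and the initialization is an assumption.

```python
# Low-rank plus diagonal replacement for a d x d weight matrix.
import torch
import torch.nn as nn

class LowRankPlusDiag(nn.Module):
    def __init__(self, d: int, rank: int):
        super().__init__()
        self.U = nn.Parameter(torch.randn(d, rank) / d ** 0.5)
        self.V = nn.Parameter(torch.randn(rank, d) / rank ** 0.5)
        self.diag = nn.Parameter(torch.ones(d))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Equivalent to x @ (U @ V + torch.diag(self.diag)),
        # without ever materializing the full d x d matrix.
        return (x @ self.U) @ self.V + x * self.diag

layer = LowRankPlusDiag(d=512, rank=32)  # 2*512*32 + 512 params vs. 512**2
print(layer(torch.randn(8, 512)).shape)  # torch.Size([8, 512])
```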

    Participation of the 39-kDa glycoprotein (gp39) of the vitelline envelope of Bufo arenarum eggs in sperm-egg interaction

    The acquisition of egg fertilizability in Bufo arenarum takes place during oviductal transit, during which the extracellular coelomic envelope (CE) of the eggs is converted into the vitelline envelope (VE). It has been reported that one of the necessary events leading to a fertilizable state is the proteolytic cleavage of CE glycoproteins in the oviductal pars recta by oviductin, a serine protease. Consequently, there is a marked increase in the relative quantity of glycoproteins of 39 kDa (gp39) and 42 kDa (gp42) in the VE. In the present study, sperm-VE binding assays using heat-solubilized biotin-conjugated VE glycoproteins revealed that both gp39 and gp42 have sperm binding capacity. Following this result, our study focused on gp39, a glycoprotein that we have previously reported as a homologue of mammalian ZPC. For this purpose, rabbit polyclonal antibodies against gp39 were generated in our laboratory. The specificity of the antibodies was confirmed by Western blot of VE glycoproteins separated by SDS-PAGE. Immunohistochemical and immunoelectron studies showed gp39 distributed throughout the width of the VE. In addition, immunofluorescence assays showed that gp39 bound to the sperm head. Finally, as an approach to elucidating the possible involvement of gp39 in fertilization, inhibition assays showed that pretreatment of eggs with antibodies against gp39 produced a significant decrease in the fertilization rate. Therefore, our findings suggest that gp39, which is modified by oviductal action, participates as a VE glycoprotein ligand for sperm in Bufo arenarum fertilization.