A parallel corpus of Python functions and documentation strings for automated code documentation and code generation
Automated documentation of programming source code and automated code
generation from natural language are challenging tasks of both practical and
scientific interest. Progress in these areas has been limited by the low
availability of parallel corpora of code and natural language descriptions,
which tend to be small and constrained to specific domains.
In this work we introduce a large and diverse parallel corpus of over a hundred
thousand Python functions with their documentation strings ("docstrings")
generated by scraping open source repositories on GitHub. We describe baseline
results for the code documentation and code generation tasks obtained by neural
machine translation. We also experiment with data augmentation techniques to
further increase the amount of training data.
We release our datasets and processing scripts in order to stimulate research
in these areas.
Comment: 5 pages, 1 figure, 3 tables
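The pairing step the abstract describes (matching each Python function with its docstring) can be sketched with the standard `ast` module; the GitHub scraping and the authors' actual filtering rules are not reproduced here, so treat this as an illustrative assumption, not their pipeline:

```python
import ast

def extract_pairs(source: str):
    """Yield (function_name, docstring) for documented functions in `source`."""
    tree = ast.parse(source)
    for node in ast.walk(tree):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            doc = ast.get_docstring(node)
            if doc:  # keep only functions that actually carry a docstring
                yield node.name, doc

example = '''
def add(a, b):
    "Return the sum of a and b."
    return a + b

def undocumented(x):
    return x
'''

pairs = list(extract_pairs(example))
print(pairs)  # only the documented function is kept
```

A real corpus builder would also deduplicate functions and strip the docstring from the code side so the model cannot copy the target from the input.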
Psychosocial Findings in Alcohol-Dependent Patients Before and After Three Months of Total Alcohol Abstinence
Alcohol use disorders (AUDs) may be associated with several psychological and affective disorders. It is controversial, however, whether these symptoms are a cause or a consequence of alcohol dependence. Few data are available that test psychosocial and affective disorders simultaneously before and after a period of alcohol abstinence. The aim of this study was to perform multiple psychometric evaluations in alcohol-dependent patients before and after 12 weeks of abstinence. Twenty-five alcohol-dependent patients were included in the study. The following psychometric tests were administered at baseline (T0) and after 12 weeks (T1): addiction severity index (ASI), brief psychiatric rating scale (BPRS), social behavior scale (SBS), Sheehan disability scale (DISS), aggression questionnaire (AQ). At T1, 16 (64%) patients were abstinent, 5 (20%) had dropped out and 4 (16%) had relapsed. Compared to T0, patients totally abstinent at T1 showed a significant reduction of the scores on the BPRS, BPRS-E and its subscales (except BPRS 5), ASI 1, ASI 2, ASI 3, ASI 6, ASI 7, SBS, AQ, DISS 1, DISS 2, DISS 3 (p < 0.05). No significant changes in ASI 4, ASI 5, DISS 4, DISS 5, and BPRS 5 scores were found at T1 compared to T0. The present study indicates that total alcohol abstinence improves psychometric features such as alcohol addiction severity, psychiatric rating, social behavior, aggressiveness, and disability. Larger controlled studies are needed to confirm these findings.
Distributionally Robust Recurrent Decoders with Random Network Distillation
Neural machine learning models can successfully model language that is
similar to their training distribution, but they are highly susceptible to
degradation under distribution shift, which occurs in many practical
applications when processing out-of-domain (OOD) text. This has been attributed
to "shortcut learning": relying on weak correlations over arbitrarily large
contexts.
We propose a method based on OOD detection with Random Network Distillation
to allow an autoregressive language model to automatically disregard OOD
context during inference, smoothly transitioning towards a less expressive but
more robust model as the data becomes more OOD while retaining its full context
capability when operating in-distribution. We apply our method to a GRU
architecture, demonstrating improvements on multiple language modeling (LM)
datasets.
Comment: 8 pages, 1 figure
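The Random Network Distillation idea behind the OOD detector can be sketched in a few lines: a fixed, randomly initialized target network is approximated by a predictor trained only on in-distribution data, and the predictor's error serves as a novelty score. The toy below uses a random tanh layer and a linear least-squares predictor; the paper's actual networks and LM integration are not shown:

```python
import numpy as np

rng = np.random.default_rng(0)
d, k = 8, 16

# Fixed, randomly initialized "target" network; it is never trained.
W = rng.normal(size=(d, k))
def target(x):
    return np.tanh(x @ W)

# In-distribution data: small inputs, where tanh is nearly linear.
X_in = rng.normal(scale=0.1, size=(2000, d))

# Predictor: least-squares linear fit to the target on in-distribution data.
P, *_ = np.linalg.lstsq(X_in, target(X_in), rcond=None)

def novelty(x):
    # RND score: the predictor's squared error against the fixed target.
    return np.mean((x @ P - target(x)) ** 2, axis=-1)

score_id = novelty(rng.normal(scale=0.1, size=(500, d))).mean()
score_ood = novelty(rng.normal(scale=3.0, size=(500, d))).mean()
print(score_id, score_ood)  # OOD inputs get much larger novelty scores
```

In the paper's setting this score would gate how much context the decoder trusts; here it only illustrates why prediction error against a frozen random network separates in- from out-of-distribution inputs.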
Dialogue-based generation of self-driving simulation scenarios using Large Language Models
Simulation is an invaluable tool for developing and evaluating controllers
for self-driving cars. Current simulation frameworks are driven by highly
specialized domain-specific languages, so a natural-language interface would
greatly enhance usability. But there is often a gap, consisting
of tacit assumptions the user is making, between a concise English utterance
and the executable code that captures the user's intent. In this paper we
describe a system that addresses this issue by supporting an extended
multimodal interaction: the user can follow up prior instructions with
refinements or revisions, in reaction to the simulations that have been
generated from their utterances so far. We use Large Language Models (LLMs) to
map the user's English utterances in this interaction into domain-specific
code, and so we explore the extent to which LLMs capture the context
sensitivity that's necessary for computing the speaker's intended message in
discourse.
Comment: 12 pages, 6 figures, SpLU-RoboNLP 202
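The multi-turn refinement loop the abstract describes can be sketched as follows. The LLM call is stubbed with a deterministic lookup, and the DSL syntax (`spawn(...)`) is invented purely for illustration; the paper's prompts, DSL, and simulator are not reproduced:

```python
# Hypothetical dialogue loop: each turn sends the interaction history plus
# the new utterance to a model that emits scenario code in a simulator DSL.

def llm_to_dsl(history, utterance):
    """Stub standing in for an LLM that maps English to DSL code in context."""
    # A real system would include `history` in the prompt so that revisions
    # like "make it a truck instead" resolve against earlier turns.
    rules = {
        "a car ahead of ego": "spawn(Car, ahead_of=ego)",
        "make it a truck instead": "spawn(Truck, ahead_of=ego)",
    }
    return rules[utterance]

history = []
scenario = None
for utterance in ["a car ahead of ego", "make it a truck instead"]:
    scenario = llm_to_dsl(history, utterance)
    history.append((utterance, scenario))  # later revisions see prior turns

print(scenario)
```

The point of the sketch is the state: the history accumulated across turns is what lets the system interpret context-dependent revisions rather than isolated commands.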
Towards cross-lingual distributed representations without parallel text trained with adversarial autoencoders
Current approaches to learning vector representations of text that are
compatible between different languages usually require some amount of parallel
text, aligned at word, sentence or at least document level. We hypothesize
however, that different natural languages share enough semantic structure that
it should be possible, in principle, to learn compatible vector representations
just by analyzing the monolingual distribution of words.
In order to evaluate this hypothesis, we propose a scheme to map word vectors
trained on a source language to vectors semantically compatible with word
vectors trained on a target language using an adversarial autoencoder.
We present preliminary qualitative results and discuss possible future
developments of this technique, such as applications to cross-lingual sentence
representations.
Comment: 6 pages, 2 figures
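The adversarial part of the scheme can be sketched as a two-player objective: a linear generator maps source-language vectors into the target space, while a discriminator tries to tell mapped vectors from real target vectors. The snippet below only computes one discriminator/generator loss pair on toy data; the paper's autoencoder architecture and training procedure are not reproduced:

```python
import numpy as np

rng = np.random.default_rng(0)
d = 50

src = rng.normal(size=(200, d))          # toy source-language word vectors
tgt = rng.normal(size=(200, d))          # toy target-language word vectors

W = rng.normal(scale=0.01, size=(d, d))  # generator: linear map into target space
v = rng.normal(scale=0.01, size=d)       # discriminator: logistic regression

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

mapped = src @ W
p_mapped = sigmoid(mapped @ v)           # discriminator's P(real target)
p_real = sigmoid(tgt @ v)

eps = 1e-9
# Discriminator wants mapped -> 0 and real target -> 1.
loss_d = -np.mean(np.log(1 - p_mapped + eps)) - np.mean(np.log(p_real + eps))
# Generator wants the discriminator to call mapped vectors real.
loss_g = -np.mean(np.log(p_mapped + eps))

print(loss_d, loss_g)
```

Training would alternate gradient steps on `v` (minimizing `loss_d`) and on `W` (minimizing `loss_g`), so that mapped source vectors become distributionally indistinguishable from target vectors without any parallel supervision.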
Low-rank passthrough neural networks
Various common deep learning architectures, such as LSTMs, GRUs, Resnets and
Highway Networks, employ state passthrough connections that support training
with high feed-forward depth or recurrence over many time steps. These
"Passthrough Network" architectures also enable the decoupling of the network
state size from the number of parameters of the network, a possibility that has
been studied by Sak et al. (2014) with their low-rank parametrization of the LSTM.
In this work we extend this line of research, proposing effective, low-rank and
low-rank plus diagonal matrix parametrizations for Passthrough Networks which
exploit this decoupling property, reducing the data complexity and memory
requirements of the network while preserving its memory capacity. This is
particularly beneficial in low-resource settings as it supports expressive
models with a compact parametrization less susceptible to overfitting. We
present competitive experimental results on several tasks, including language
modeling and a near-state-of-the-art result on sequential randomly-permuted
MNIST classification, a hard task on natural data.
Comment: 12 pages, 2 figures
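The low-rank-plus-diagonal idea can be sketched for a single d×d recurrent weight matrix: replace the dense matrix with U V + diag(s), where U is d×r and V is r×d. The numbers below are a generic illustration of the parameter savings, not the paper's exact parametrization:

```python
import numpy as np

d, r = 512, 32
rng = np.random.default_rng(0)

# Full parametrization: one dense d x d recurrent matrix.
full_params = d * d

# Low-rank plus diagonal: W = U @ V + diag(s).
U = rng.normal(scale=0.01, size=(d, r))
V = rng.normal(scale=0.01, size=(r, d))
s = rng.normal(scale=0.01, size=d)
lrd_params = d * r + r * d + d

def apply_lrd(h):
    # Compute h @ (U V + diag(s)) without materializing the d x d matrix.
    return (h @ U) @ V + h * s

h = rng.normal(size=(4, d))   # a batch of hidden states
out = apply_lrd(h)

print(full_params, lrd_params)  # 262144 vs 33280 parameters
```

The state size d stays the same while the parameter count shrinks by roughly d/(2r), which is the decoupling the abstract refers to: expressive state, compact parametrization.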
Participation of the 39-kDa glycoprotein (gp39) of the vitelline envelope of Bufo arenarum eggs in sperm-egg interaction
The acquisition of egg fertilizability in Bufo arenarum takes place during the oviductal transit, during which the extracellular coelomic envelope (CE) of the eggs is converted into the vitelline envelope (VE). It has been stated that one of the necessary events leading to a fertilizable state is the proteolytic cleavage of CE glycoproteins in the oviductal pars recta by oviductin, a serine protease. Consequently, there is a marked increase in the relative quantity of glycoproteins of 39 (gp39) and 42 kDa (gp42) in the VE. In the present study, sperm-VE binding assays using heat-solubilized biotin-conjugated VE glycoproteins revealed that both gp39 and gp42 have sperm binding capacity. Based on this result, our study focused on gp39, a glycoprotein that we have previously reported as a homologue of mammalian ZPC. For this purpose, rabbit polyclonal antibodies against gp39 were generated in our laboratory. The specificity of the antibodies was confirmed by Western blot of VE glycoproteins separated on SDS-PAGE. Immunohistochemical and immunoelectron studies showed gp39 distributed throughout the width of the VE. In addition, immunofluorescence assays showed that gp39 binds to the sperm head. Finally, as an approach to elucidating the possible involvement of gp39 in fertilization, inhibition assays showed that pretreatment of eggs with antibodies against gp39 produced a significant decrease in the fertilization rate. Therefore, our findings suggest that gp39, which is modified by oviductal action, participates as a VE glycoprotein ligand for sperm in Bufo arenarum fertilization.
Fil: Barrera, Antonio Daniel. Consejo Nacional de Investigaciones Científicas y Técnicas. Centro Científico Tecnológico Conicet - Tucumán. Instituto Superior de Investigaciones Biológicas. Universidad Nacional de Tucumán. Instituto Superior de Investigaciones Biológicas; Argentina
Fil: Llanos, Ricardo Javier. Consejo Nacional de Investigaciones Científicas y Técnicas. Centro Científico Tecnológico Conicet - Tucumán. Instituto Superior de Investigaciones Biológicas. Universidad Nacional de Tucumán. Instituto Superior de Investigaciones Biológicas; Argentina
Fil: Miceli, Dora Cristina. Consejo Nacional de Investigaciones Científicas y Técnicas. Centro Científico Tecnológico Conicet - Tucumán. Instituto Superior de Investigaciones Biológicas. Universidad Nacional de Tucumán. Instituto Superior de Investigaciones Biológicas; Argentina