Universal Captioner: Inducing Content-Style Separation in Vision-and-Language Model Training
While captioning models have obtained compelling results in describing
natural images, there is a growing effort to increase their capability of
dealing with real-world concepts. In this paper, we address the task of
generating fluent descriptions by training on a non-uniform combination of data
sources, containing both human- and automatically-collected captions. To this
end, we propose a model which induces a separation between content and
descriptive style through the incorporation of stylistic parameters and
keywords extracted from large-scale multi-modal models as pivotal data. In
terms of visual features, our model avoids the need for object detectors and
employs grid-like features together with a single prompt language modeling
objective. Experimentally, we consistently outperform existing methods in terms
of caption quality and capability of describing out-of-domain concepts.
Finally, our model obtains a new state of the art on both the COCO and nocaps
benchmarks.
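The abstract mentions conditioning generation on stylistic parameters and keywords extracted from large-scale multi-modal models. A minimal sketch of what such keyword-and-style prompting could look like (the token names and prompt layout here are illustrative assumptions, not the paper's actual format):

```python
# Hypothetical prompt construction: a style token plus detected keywords are
# prepended to the caption slot before language modeling (format is assumed).
def build_prompt(keywords, style="<factual>"):
    return f"{style} keywords: {', '.join(keywords)} caption:"

print(build_prompt(["dog", "frisbee", "park"]))
# <factual> keywords: dog, frisbee, park caption:
```

At inference time, swapping the style token would then steer the descriptive style while the keywords pin down the content.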
Efficient yet Competitive Speech Translation: FBK@IWSLT2022
The primary goal of the FBK systems submitted to the IWSLT 2022 offline and simultaneous speech translation tasks is to reduce model training costs without sacrificing translation quality. To this end, we first question the need for ASR pre-training, showing that it is not essential to achieve competitive results. Second, we focus on data filtering, showing that a simple method that looks at the ratio between source and target characters yields a quality improvement of 1 BLEU. Third, we compare different methods to reduce the detrimental effect of the audio segmentation mismatch between training data, which is manually segmented at sentence level, and inference data, which is automatically segmented. Towards the same goal of reducing training cost, we participate in the simultaneous task with the same model trained for offline ST. The effectiveness of our lightweight training strategy is shown by the high score obtained on the MuST-C en-de corpus (26.7 BLEU) and is confirmed in high-resource data conditions by a 1.6 BLEU improvement on the IWSLT2020 test set over last year's winning system.
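The character-ratio data filtering described above can be sketched in a few lines. The function name and thresholds below are illustrative assumptions; the abstract only states that pairs are filtered by the ratio between source and target character counts:

```python
# Hypothetical sketch of character-ratio filtering for parallel data:
# keep a sentence pair only if len(src)/len(tgt) falls in [low, high].
def char_ratio_ok(src: str, tgt: str, low: float = 0.5, high: float = 2.0) -> bool:
    if not src or not tgt:
        return False  # drop empty sides outright
    ratio = len(src) / len(tgt)
    return low <= ratio <= high

pairs = [
    ("Hello world", "Hallo Welt"),           # plausible pair -> kept
    ("Hi", "Das ist ein sehr langer Satz"),  # length mismatch -> filtered
]
kept = [p for p in pairs if char_ratio_ok(*p)]
print(len(kept))  # 1
```

Despite its simplicity, this kind of length-ratio heuristic is a common first-pass filter for noisy crawled corpora.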
LLaMAntino: LLaMA 2 Models for Effective Text Generation in Italian Language
Large Language Models are state-of-the-art linguistic models designed
to equip computers with the ability to comprehend natural language. With its
exceptional capacity to capture complex contextual relationships, the LLaMA
(Large Language Model Meta AI) family represents a novel advancement in the
field of natural language processing, releasing foundational models that
improve the natural language understanding abilities of the transformer
architecture thanks to their large number of trainable parameters (7, 13, and
70 billion). In many natural language understanding tasks, these models match
the performance of proprietary models such as OpenAI's ChatGPT, with the
advantage of making weights and code publicly available for research and
commercial use. In this work, we investigate the possibility of Language
Adaptation for LLaMA models, focusing explicitly on the challenge of Italian
language coverage. Adopting an open science approach, we explore various
tuning approaches to ensure high-quality Italian text generation suitable for
common tasks in this language, which is underrepresented in the original
models' training data. We aim to release effective text generation models
with strong linguistic properties for many tasks that remain challenging for
multilingual or general-purpose LLMs. By leveraging an open science philosophy,
this study contributes to Language Adaptation strategies for the Italian
language by introducing the novel LLaMAntino family of Italian LLMs.
Compositional Semantic Mix for Domain Adaptation in Point Cloud Segmentation
Deep-learning models for 3D point cloud semantic segmentation exhibit limited
generalization capabilities when trained and tested on data captured with
different sensors or in varying environments due to domain shift. Domain
adaptation methods can be employed to mitigate this domain shift, for instance,
by simulating sensor noise, developing domain-agnostic generators, or training
point cloud completion networks. Often, these methods are tailored for range
view maps or necessitate multi-modal input. In contrast, domain adaptation in
the image domain can be executed through sample mixing, which emphasizes input
data manipulation rather than employing distinct adaptation modules. In this
study, we introduce compositional semantic mixing for point cloud domain
adaptation, representing the first unsupervised domain adaptation technique for
point cloud segmentation based on semantic and geometric sample mixing. We
present a two-branch symmetric network architecture capable of concurrently
processing point clouds from a source domain (e.g. synthetic) and point clouds
from a target domain (e.g. real-world). Each branch operates within one domain
by integrating selected data fragments from the other domain and utilizing
semantic information derived from source labels and target (pseudo) labels.
Additionally, our method can leverage a limited number of human point-level
annotations (semi-supervised) to further enhance performance. We assess our
approach in both synthetic-to-real and real-to-real scenarios using LiDAR
datasets and demonstrate that it significantly outperforms state-of-the-art
methods in both unsupervised and semi-supervised settings.Comment: TPAMI. arXiv admin note: text overlap with arXiv:2207.0977
Integrated Reporting for SMEs (Il bilancio integrato per le PMI)
Alongside financial and manufactured capital, every firm also builds its
business and its success on intangible resources such as intellectual capital,
human capital, social and relational capital, and natural capital. Traditional
financial statements, however, are not suited to assessing and representing
these resources, since they were conceived for an industrial economy based
almost exclusively on tangible capital. Therefore, also with regard to SMEs,
new tools and indicators for measurement and reporting are needed today,
capable of capturing and valuing the intangible components of a company's
capital as well. In this context, the integrated report stands as an advanced
form of corporate communication, aimed at illustrating how strategy,
governance, business model, stakeholder relations, past performance and future
prospects, risks and opportunities enable even a small or medium-sized
enterprise to create value in the short, medium and long term.
Machine learning galaxy properties from 21 cm lightcones: impact of network architectures and signal contamination
Imaging the cosmic 21 cm signal will map out the first billion years of our Universe. The resulting 3D lightcone (LC) will encode the properties of the unseen first galaxies and physical cosmology. Unfortunately, there is no obvious summary statistic to use when interpreting this non-Gaussian image, and the commonly-used power spectrum may waste valuable information. Here we build on previous work using Convolutional Neural Networks (CNNs) to infer astrophysical parameters directly from 21 cm LC images. Guided by the properties of LCs, we combine recurrent layers characterizing evolution along the redshift axis with 2D convolutional layers characterizing local correlations in the sky-plane. Such Recurrent Neural Networks (RNNs) are known for efficiently learning temporal correlations in sequential data. Using a large database of simulated cosmic 21 cm LCs, we confirm that RNNs outperform previously-used CNNs in recovering UV and X-ray galaxy properties, reducing the mean squared parameter estimation error by factors of . We also corrupt the cosmic signal by adding noise expected from a 1000 h integration with the Square Kilometre Array, as well as excising a foreground-contaminated ''horizon wedge''. Parameter prediction errors increase when the NNs are trained on these contaminated LC images, though recovery is still good even in the most pessimistic case (with ). However, we find no notable differences in performance between network architectures on the contaminated images. We argue this is due to the size of our dataset, highlighting the need for larger datasets and/or better data augmentation in order to maximize the potential of NNs in 21 cm parameter estimation.
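The hybrid architecture described above, per-slice 2D convolutions in the sky-plane feeding a recurrence along the redshift axis, can be sketched schematically. Everything below (the pooling, the linear recurrence, the weights) is a toy stand-in for the trained layers, intended only to show the data flow over a (n_z, H, W) lightcone:

```python
import numpy as np

rng = np.random.default_rng(0)

def conv_feature(slice2d, kernel):
    # valid 2D cross-correlation followed by global average pooling,
    # standing in for a trained convolutional feature extractor
    H, W = slice2d.shape
    kh, kw = kernel.shape
    out = np.array([
        [(slice2d[i:i + kh, j:j + kw] * kernel).sum()
         for j in range(W - kw + 1)]
        for i in range(H - kh + 1)
    ])
    return out.mean()  # one scalar feature per redshift slice

def run_lightcone(lc, kernel, w_in=0.5, w_rec=0.9):
    # toy linear recurrence along the redshift axis (in place of an RNN cell)
    h = 0.0
    for z_slice in lc:               # iterate over redshift slices
        f = conv_feature(z_slice, kernel)
        h = w_rec * h + w_in * f     # recurrent state update
    return h                          # summary fed to parameter regression

lc = rng.normal(size=(8, 16, 16))    # toy lightcone: (n_z, H, W)
kernel = np.ones((3, 3)) / 9.0
print(run_lightcone(lc, kernel))
```

A real implementation would stack several convolutional layers per slice and use a gated recurrent cell, but the slice-then-sequence ordering is the point: spatial correlations are summarized first, and evolution with redshift is modeled on top of those summaries.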