Self-supervised learning for transferable representations
Machine learning has achieved remarkable advances thanks to large labelled datasets and supervised learning. However, this progress is constrained by the labour-intensive annotation process: it is not feasible to generate extensive labelled datasets for every problem we aim to address. Consequently, there has been a notable shift in recent years towards approaches that leverage only raw data. Among these, self-supervised learning has emerged as a particularly powerful approach, scaling to massive datasets and showing considerable potential for effective knowledge transfer. This thesis investigates self-supervised representation learning with a strong focus on computer vision applications. We provide a comprehensive survey of self-supervised methods across various modalities, introducing a taxonomy that categorises them into four distinct families and highlighting practical considerations for real-world implementation. We then focus on the computer vision modality, where we benchmark state-of-the-art self-supervised models on a diverse range of downstream transfer tasks. Our findings show that self-supervised models often outperform supervised models across a spectrum of tasks, although correlations weaken as tasks move beyond classification, particularly for datasets with distribution shifts. Going further, we investigate the influence of data augmentation on the transferability of contrastive learners, uncovering a trade-off between spatial and appearance-based invariances that generalise to real-world transformations. This helps explain the differing empirical performance of self-supervised learners on different downstream tasks, and it demonstrates the advantages of specialised representations produced with tailored augmentation.
Finally, we introduce a novel self-supervised pre-training algorithm for object detection that aligns pre-training with the downstream architecture and objectives, leading to reduced localisation errors and improved label efficiency. Overall, this thesis contributes a comprehensive understanding of self-supervised representation learning and its role in enabling effective transfer across computer vision tasks.
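To make "contrastive learner" concrete, the sketch below implements the widely used NT-Xent (normalized temperature-scaled cross-entropy) loss popularized by SimCLR, in plain Python. It is an illustration of the general technique under stated assumptions, not code from the thesis: the function name `nt_xent` and the default temperature of 0.5 are assumptions. Each pair `(z1[i], z2[i])` stands for the embeddings of two augmented views of the same image, so the chosen augmentations decide which invariances (spatial or appearance-based) the loss rewards.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def _l2_normalize(v: Vector) -> List[float]:
    """Scale a vector to unit length so dot products become cosine similarities."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def nt_xent(z1: Sequence[Vector], z2: Sequence[Vector], temperature: float = 0.5) -> float:
    """NT-Xent loss over N positive pairs (z1[i], z2[i]).

    z1[i] and z2[i] are assumed to embed two augmented views of the same image;
    every other embedding in the 2N batch acts as a negative for that anchor.
    """
    if len(z1) != len(z2) or not z1:
        raise ValueError("z1 and z2 must be non-empty and of equal length")
    z = [_l2_normalize(v) for v in list(z1) + list(z2)]
    n = len(z1)
    total = 0.0
    for i in range(2 * n):
        positive = (i + n) % (2 * n)  # index of the other view of the same image
        # Temperature-scaled cosine similarity of anchor i to every embedding.
        sims = [sum(a * b for a, b in zip(z[i], z[k])) / temperature for k in range(2 * n)]
        # Softmax denominator over all embeddings except the anchor itself.
        denom = sum(math.exp(s) for k, s in enumerate(sims) if k != i)
        total += -(sims[positive] - math.log(denom))  # -log softmax of the positive
    return total / (2 * n)  # average over all 2N anchors


if __name__ == "__main__":
    aligned = [[1.0, 0.0], [0.0, 1.0]]
    swapped = [[0.0, 1.0], [1.0, 0.0]]
    # Views that agree give a lower loss than views matched to the wrong image.
    print(nt_xent(aligned, aligned), nt_xent(aligned, swapped))
```

Lowering the loss pulls the two views of each image together while pushing other images apart, which is why the augmentation policy effectively specifies what the representation learns to ignore.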
Investigation of the metabolism of rare nucleotides in plants
Nucleotides are metabolites involved in primary and specialized metabolism, and they have a regulatory role in various biochemical reactions in all forms of life. While the nucleotide metabolome has been characterized extensively in other organisms, comparatively little is known about the cellular concentrations of nucleotides in plants. The aim of this dissertation was to investigate the nucleotide metabolome and the enzymes influencing the composition and quantities of nucleotides in plants. For this purpose, a method for the analysis of nucleotides and nucleosides in plants and algae was developed (Chapter 2.1), comprising efficient quenching of enzymatic activity, liquid-liquid extraction, and solid-phase extraction employing a weak anion-exchange resin. This method allowed the plant nucleotide metabolome to be analyzed in great depth, including the quantification of low-abundance deoxyribonucleotides and deoxyribonucleosides. The details of the method were summarized in an article serving as a laboratory protocol (Chapter 2.2).
Furthermore, we contributed a review article (Chapter 2.3) that summarizes the literature on nucleotide analysis and recent technological advances, with a focus on plants and on the factors that influence and hinder the analysis of nucleotides in plants, namely a complex metabolic matrix, highly stable phosphatases, and the physicochemical properties of nucleotides. To analyze the sub-cellular concentrations of metabolites, a protocol for the rapid isolation of highly pure mitochondria using affinity chromatography was developed (Chapter 2.4).
The method for the purification of nucleotides also contributed to the comprehensive analysis of the nucleotide metabolome in germinating seeds and establishing seedlings of A. thaliana, with a focus on genes involved in the synthesis of thymidylates (Chapter 2.5), and to the characterization of a novel enzyme of purine nucleotide degradation, XANTHOSINE MONOPHOSPHATE PHOSPHATASE (Chapter 2.6). Protein homology analysis comparing A. thaliana, S. cerevisiae, and H. sapiens led to the identification and characterization of an enzyme involved in the metabolite damage repair system of plants, INOSINE TRIPHOSPHATE PYROPHOSPHATASE (Chapter 2.7). This enzyme was shown to dephosphorylate deaminated purine nucleotide triphosphates and thus prevent their incorporation into nucleic acids. Loss-of-function mutants senesce early and have a constitutively increased salicylic acid content. The source of deaminated purine nucleotides in plants was also investigated, and abiotic factors were shown to contribute to nucleotide damage.
ORCA: A Challenging Benchmark for Arabic Language Understanding
Because pretrained language models play a crucial role across NLP, several
benchmarks have been proposed to evaluate them. Despite these efforts, no public
benchmark of a diverse nature currently exists for evaluating Arabic. This
makes it challenging to measure progress for both Arabic and multilingual
language models. The challenge is compounded by the fact that any benchmark
targeting Arabic must account for Arabic being not a single language but
rather a collection of languages and varieties. In this work, we introduce
ORCA, a publicly available benchmark for Arabic language understanding
evaluation. ORCA is carefully constructed to cover diverse Arabic varieties and
a wide range of challenging Arabic understanding tasks, drawing on 60 different
datasets across seven NLU task clusters. To measure current progress in Arabic
NLU, we use ORCA to offer a comprehensive comparison of 18 multilingual and
Arabic language models. We also provide a public leaderboard with a unified
single-number evaluation metric (ORCA score) to facilitate future research.
Comment: All authors contributed equally. Accepted at ACL 2023, Toronto,
Canada.
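The abstract does not say how the single-number ORCA score is computed. Assuming, as with GLUE-style leaderboards, that it aggregates per-task scores by unweighted averaging, the sketch below contrasts two plausible aggregation choices. The function names, task names, and cluster labels are hypothetical, and this sketch does not assert which variant ORCA actually uses.

```python
from statistics import mean
from typing import Dict, List


def macro_average(task_scores: Dict[str, float]) -> float:
    """Unweighted mean of per-task scores: every task counts equally."""
    if not task_scores:
        raise ValueError("need at least one task score")
    return mean(task_scores.values())


def cluster_then_task_average(task_scores: Dict[str, float], cluster_of: Dict[str, str]) -> float:
    """Average tasks within each cluster first, then average the cluster means."""
    if not task_scores:
        raise ValueError("need at least one task score")
    by_cluster: Dict[str, List[float]] = {}
    for task, score in task_scores.items():
        by_cluster.setdefault(cluster_of[task], []).append(score)
    return mean(mean(scores) for scores in by_cluster.values())


if __name__ == "__main__":
    # Hypothetical scores on a 0-100 scale for three tasks in two clusters.
    scores = {"task_a": 80.0, "task_b": 60.0, "task_c": 90.0}
    clusters = {"task_a": "cluster_x", "task_b": "cluster_x", "task_c": "cluster_y"}
    print(macro_average(scores), cluster_then_task_average(scores, clusters))
```

The two functions weight tasks differently: when clusters contain unequal numbers of datasets, a plain macro-average lets large clusters dominate, whereas averaging within clusters first gives each task cluster equal weight.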
Specialized translation at work for a small, expanding company: my experience of the Chinese internationalization of Bioretics© S.r.l.
Global markets are currently immersed in two all-encompassing and unstoppable processes: internationalization and globalization. While the former pushes companies to look beyond the borders of their country of origin to forge relationships with foreign trading partners, the latter fosters standardization across countries by reducing spatiotemporal distances and breaking down geographical, political, economic and socio-cultural barriers. In recent decades, another domain has emerged to propel these unifying drives: Artificial Intelligence, whose advanced technologies aim to reproduce human cognitive abilities in machines. The “Language Toolkit – Le lingue straniere al servizio dell’internazionalizzazione dell’impresa” project, promoted by the Department of Interpreting and Translation (Forlì Campus) in collaboration with the Romagna Chamber of Commerce (Forlì-Cesena and Rimini), seeks to help Italian SMEs make their way into the global market. This dissertation was conceived within that project. Its purpose is to present the translation and localization project, from English into Chinese, of a series of texts produced by Bioretics© S.r.l.: an investor deck, the company website, and part of the installation and use manual of the Aliquis© framework software, its flagship product. The dissertation is structured as follows: Chapter 1 presents the project and the company in detail; Chapter 2 outlines the internationalization and globalization processes and the Artificial Intelligence market in both Italy and China; Chapter 3 provides the theoretical foundations for every aspect of Specialized Translation, including website localization; Chapter 4 describes the resources and tools used to perform the translations; Chapter 5 presents an analysis of the source texts; Chapter 6 is a commentary on translation strategies and choices.
Sequence-Labeling RoBERTa Model for Dependency-Parsing in Classical Chinese and Its Application to Vietnamese and Thai
2023 8th International Conference on Business and Industrial Research (ICBIR), Bangkok, Thailand. 18-19 May 2023
The author and his colleagues have been developing a classical Chinese treebank using Universal Dependencies. We also developed a RoBERTa-Classical-Chinese model pre-trained on 1.7 billion characters of classical Chinese text. In this paper we describe how to fine-tune a sequence-labeling RoBERTa model for dependency parsing in classical Chinese. We introduce “goeswith”-labeled edges into the directed acyclic graphs of Universal Dependencies to resolve the mismatch between the token length of RoBERTa-Classical-Chinese and the word length in classical Chinese. We use the [MASK] token of the RoBERTa model to handle outgoing edges and to produce the adjacency matrices for the Universal Dependencies graphs. Our RoBERTa-UDgoeswith model outperforms other dependency parsers for classical Chinese on the LAS/MLAS/BLEX benchmark scores. We then apply our methods to other isolating languages. For Vietnamese, we introduce “goeswith”-labeled edges to separate words into space-separated syllables, and fine-tune RoBERTa and PhoBERT models. For Thai, we try three kinds of tokenizers (a character-wise tokenizer, a quasi-syllable tokenizer, and SentencePiece) to produce RoBERTa models.
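To illustrate the “goeswith” idea, here is a minimal sketch, not the author's code, of turning a word-level dependency analysis into token-level rows when a word is split into several tokens: the first token keeps the word's head and relation, and every further token attaches to that first token with the UD `goeswith` relation. The helper name `to_goeswith`, the `split` parameter (characters by default, whitespace for Vietnamese syllables), and the example analyses are hypothetical and for illustration only.

```python
from typing import Callable, List, Tuple

# A dependency row: (form, head, deprel). Heads are 1-based indices; 0 marks the root.
Row = Tuple[str, int, str]


def to_goeswith(words: List[Row], split: Callable[[str], List[str]] = list) -> List[Row]:
    """Split each word into tokens and link extra tokens with 'goeswith'.

    Assumes every word yields at least one token. Word-level heads are
    re-pointed to the first token of the head word.
    """
    pieces = [split(form) for form, _, _ in words]
    # 1-based position of each word's first token in the token sequence.
    first: List[int] = []
    position = 1
    for p in pieces:
        first.append(position)
        position += len(p)

    rows: List[Row] = []
    for i, ((_, head, deprel), p) in enumerate(zip(words, pieces)):
        new_head = first[head - 1] if head > 0 else 0  # the root stays 0
        rows.append((p[0], new_head, deprel))
        for piece in p[1:]:
            rows.append((piece, first[i], "goeswith"))  # attach to the word's first token
    return rows


if __name__ == "__main__":
    # Illustrative classical Chinese example, split into characters.
    print(to_goeswith([("孔子", 2, "nsubj"), ("曰", 0, "root")]))
    # Illustrative Vietnamese example, split into space-separated syllables.
    print(to_goeswith([("sinh viên", 2, "nsubj"), ("học", 0, "root")], split=str.split))
```

Because every token now carries exactly one head and one label, the parse can be predicted token by token, which is what makes a sequence-labeling formulation possible.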
Comparing the production of a formula with the development of L2 competence
This pilot study compares the production of a formula with the development of L2 competence across the proficiency levels of a spoken learner corpus. The results show that, in beginner production data, the formula is likely recalled holistically from learners’ phonological memory rather than generated online; this is identifiable from its fluent production in the absence of any other surface-structure evidence of the formula’s syntactic properties. As learners’ L2 competence increases, the formula becomes sensitive to modifications that show structural conformity at each proficiency level. The transparency between the formula’s modification and learners’ corresponding L2 surface-structure realisations suggests that it is the independent development of L2 competence that integrates the formula into compositional language and ultimately drives the SLA process forward.
Towards a muon collider
A muon collider would enable the large jump in energy reach that is needed for a fruitful exploration of fundamental interactions. The challenges of producing muon collisions at high luminosity and 10 TeV centre-of-mass energy are being investigated by the recently formed International Muon Collider Collaboration. This Review summarises the status of and recent advances in muon collider design, physics and detector studies. The aim is to provide a global perspective of the field and to outline directions for future work.
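As a worked example of the quoted energy scale, assume two beams of equal energy $E_{\text{beam}}$ colliding head-on (natural units, $c = 1$). The centre-of-mass energy follows from the invariant mass of the two incoming muons, using $|\vec{p}|^2 = E_{\text{beam}}^2 - m_\mu^2$:

```latex
\begin{aligned}
s &= (p_1 + p_2)^2 = 2m_\mu^2 + 2\left(E_1 E_2 + |\vec{p}_1|\,|\vec{p}_2|\right) \\
  &= 2m_\mu^2 + 2\left(E_{\text{beam}}^2 + E_{\text{beam}}^2 - m_\mu^2\right) = 4E_{\text{beam}}^2, \\
\sqrt{s} &= 2E_{\text{beam}} = 10~\text{TeV} \;\Rightarrow\; E_{\text{beam}} = 5~\text{TeV}.
\end{aligned}
```

Because the muon is an elementary particle, this full $\sqrt{s}$ is available to the collision, whereas in proton collisions each constituent parton carries only a fraction of the beam energy.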
The Forward Physics Facility at the High-Luminosity LHC
High-energy collisions at the High-Luminosity Large Hadron Collider (LHC) produce a large number of particles along the beam collision axis, outside the acceptance of existing LHC experiments. The proposed Forward Physics Facility (FPF), to be located several hundred meters from the ATLAS interaction point and shielded by concrete and rock, will host a suite of experiments to probe standard model (SM) processes and search for physics beyond the standard model (BSM). In this report, we review the status of the civil engineering plans and of the experiments to explore the diverse physics signals that can be uniquely probed in the forward region. FPF experiments will be sensitive to a broad range of BSM physics through searches for new particle scattering or decay signatures and deviations from SM expectations in high-statistics analyses with TeV neutrinos in this low-background environment. High-statistics neutrino detection will also provide valuable data for fundamental topics in perturbative and non-perturbative QCD and in weak interactions. Experiments at the FPF will enable synergies between forward particle production at the LHC and astroparticle physics to be exploited. We report here on these physics topics, on infrastructure, detector, and simulation studies, and on future directions to realize the FPF’s physics potential.