3,415 research outputs found

    Self-supervised learning for transferable representations

    Machine learning has undeniably achieved remarkable advances thanks to large labelled datasets and supervised learning. However, this progress is constrained by the labour-intensive annotation process: it is not feasible to create large labelled datasets for every problem we aim to address. Consequently, there has recently been a notable shift toward approaches that leverage raw, unlabelled data alone. Among these, self-supervised learning has emerged as a particularly powerful approach, scaling to massive datasets and showing considerable potential for effective knowledge transfer. This thesis investigates self-supervised representation learning with a strong focus on computer vision applications. We provide a comprehensive survey of self-supervised methods across various modalities, introducing a taxonomy that categorises them into four distinct families and highlighting practical considerations for real-world implementation. We then focus on the computer vision modality, where we benchmark state-of-the-art self-supervised models on a diverse set of downstream transfer tasks. Our findings reveal that self-supervised models often outperform supervised learning across a spectrum of tasks, albeit with correlations weakening as tasks move beyond classification, particularly for datasets with distribution shifts. Digging deeper, we investigate the influence of data augmentation on the transferability of contrastive learners, uncovering a trade-off between spatial and appearance-based invariances that generalise to real-world transformations. This begins to explain the differing performance of self-supervised learners on different downstream tasks, and it showcases the advantages of specialised representations produced with tailored augmentation. 
Finally, we introduce a novel self-supervised pre-training algorithm for object detection that aligns pre-training with the downstream architecture and objectives, leading to reduced localisation errors and improved label efficiency. In conclusion, this thesis contributes a comprehensive understanding of self-supervised representation learning and its role in enabling effective transfer across computer vision tasks.

    Investigation of the metabolism of rare nucleotides in plants

    Nucleotides are metabolites involved in primary and specialized metabolism, and they have a regulatory role in various biochemical reactions in all forms of life. While the nucleotide metabolome has been characterized extensively in other organisms, comparatively little is known about the cellular concentrations of nucleotides in plants. The aim of this dissertation was to investigate the nucleotide metabolome and the enzymes influencing the composition and quantities of nucleotides in plants. For this purpose, a method for the analysis of nucleotides and nucleosides in plants and algae was developed (Chapter 2.1), which comprises efficient quenching of enzymatic activity, liquid-liquid extraction, and solid-phase extraction employing a weak anion-exchange resin. This method allowed the analysis of the nucleotide metabolome of plants in great depth, including the quantification of low-abundance deoxyribonucleotides and deoxyribonucleosides. The details of the method were summarized in an article serving as a laboratory protocol (Chapter 2.2). Furthermore, we contributed a review article (Chapter 2.3) that summarizes the literature on nucleotide analysis and recent technological advances, with a focus on plants and on the factors that influence and hinder the analysis of nucleotides in plants, namely a complex metabolic matrix, highly stable phosphatases, and the physicochemical properties of nucleotides. To analyze the subcellular concentrations of metabolites, a protocol for the rapid isolation of highly pure mitochondria using affinity chromatography was developed (Chapter 2.4). The method for the purification of nucleotides furthermore contributed to the comprehensive analysis of the nucleotide metabolome in germinating seeds and establishing seedlings of A. thaliana, with a focus on genes involved in the synthesis of thymidylates (Chapter 2.5), and to the characterization of a novel enzyme of purine nucleotide degradation, XANTHOSINE MONOPHOSPHATE PHOSPHATASE (Chapter 2.6). Protein homology analysis comparing A. thaliana, S. cerevisiae, and H. sapiens led to the identification and characterization of an enzyme involved in the metabolite damage repair system of plants, INOSINE TRIPHOSPHATE PYROPHOSPHATASE (Chapter 2.7). It was shown that this enzyme dephosphorylates deaminated purine nucleotide triphosphates and thus prevents their incorporation into nucleic acids. Loss-of-function mutants senesce early and have a constitutively increased salicylic acid content. In addition, the source of deaminated purine nucleotides in plants was investigated, and abiotic factors were shown to contribute to nucleotide damage.

    ORCA: A Challenging Benchmark for Arabic Language Understanding

    Because pretrained language models play a crucial role across NLP, several benchmarks have been proposed to evaluate them. Despite these efforts, no diverse public benchmark currently exists for evaluating Arabic, which makes it challenging to measure progress for both Arabic and multilingual language models. This challenge is compounded by the fact that any benchmark targeting Arabic must account for Arabic being not a single language but rather a collection of languages and varieties. In this work, we introduce ORCA, a publicly available benchmark for Arabic language understanding evaluation. ORCA is carefully constructed to cover diverse Arabic varieties and a wide range of challenging Arabic understanding tasks, drawing on 60 different datasets across seven NLU task clusters. To measure current progress in Arabic NLU, we use ORCA to offer a comprehensive comparison of 18 multilingual and Arabic language models. We also provide a public leaderboard with a unified single-number evaluation metric (ORCA score) to facilitate future research. Comment: All authors contributed equally. Accepted at ACL 2023, Toronto, Canada.

    Specialized translation at work for an expanding small business: my experience of internationalizing Bioretics© S.r.l. into Chinese

    Global markets are currently immersed in two all-encompassing and unstoppable processes: internationalization and globalization. While the former pushes companies to look beyond the borders of their country of origin to forge relationships with foreign trading partners, the latter fosters standardization across countries by reducing spatiotemporal distances and breaking down geographical, political, economic and socio-cultural barriers. In recent decades, another domain has emerged to propel these unifying drives: Artificial Intelligence, together with the advanced technologies that aim to reproduce human cognitive abilities in machines. The “Language Toolkit – Le lingue straniere al servizio dell’internazionalizzazione dell’impresa” (Foreign languages at the service of business internationalization) project, promoted by the Department of Interpreting and Translation (Forlì Campus) in collaboration with the Romagna Chamber of Commerce (Forlì-Cesena and Rimini), seeks to help Italian SMEs make their way into the global market. This dissertation was conceived within that project. Its purpose is to present the English-to-Chinese translation and localization of a series of texts produced by Bioretics© S.r.l.: an investor deck, the company website, and part of the installation and user manual for the Aliquis© framework software, the company's flagship product. This dissertation is structured as follows: Chapter 1 presents the project and the company in detail; Chapter 2 outlines the internationalization and globalization processes and the Artificial Intelligence market in both Italy and China; Chapter 3 provides the theoretical foundations for every aspect of Specialized Translation, including website localization; Chapter 4 describes the resources and tools used to perform the translations; Chapter 5 analyzes the source texts; Chapter 6 comments on the translation strategies and choices.

    Sequence-Labeling RoBERTa Model for Dependency-Parsing in Classical Chinese and Its Application to Vietnamese and Thai

    2023 8th International Conference on Business and Industrial Research (ICBIR), Bangkok, Thailand, 18-19 May 2023. The author and his colleagues have been developing a classical Chinese treebank using Universal Dependencies. We also developed RoBERTa-Classical-Chinese, a model pre-trained on classical Chinese texts totalling 1.7 billion characters. In this paper, we describe how to fine-tune a sequence-labeling RoBERTa model for dependency parsing in classical Chinese. We introduce “goeswith”-labeled edges into the directed acyclic graphs of Universal Dependencies in order to resolve the mismatch between the token length of RoBERTa-Classical-Chinese and the word length in classical Chinese. We use the [MASK] token of the RoBERTa model to handle outgoing edges and to produce the adjacency matrices for the graphs of Universal Dependencies. Our RoBERTa-UDgoeswith model outperforms other dependency parsers for classical Chinese on the LAS, MLAS and BLEX benchmark scores. We then apply our methods to other isolating languages. For Vietnamese, we introduce “goeswith”-labeled edges to separate words into space-separated syllables, and fine-tune RoBERTa and PhoBERT models. For Thai, we try three kinds of tokenizers (character-wise, quasi-syllable, and SentencePiece) to produce RoBERTa models.
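    The token-to-word mismatch described in this abstract can be illustrated with a minimal Python sketch of the Universal Dependencies “goeswith” convention: the first token of each word keeps the word's head and relation, and every later token of that word attaches to the first one with the label goeswith. This is not the author's code. The `Arc` record, the `expand_with_goeswith` function and the use of Python's `list` as a character-wise tokenizer are hypothetical, and the sketch covers only converting a word-level tree into token-level targets, not the [MASK]-based edge prediction.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Arc:
    token: str   # subword token text
    head: int    # 1-based index of the head token (0 = root)
    deprel: str  # dependency relation label


def expand_with_goeswith(
    words: List[str],
    heads: List[int],
    deprels: List[str],
    tokenize: Callable[[str], List[str]],
) -> List[Arc]:
    """Turn a word-level UD tree into a token-level tree (hypothetical helper).

    heads are 1-based word indices (0 = root). The first token of each word
    inherits the word's relation, re-pointed at the first token of the head
    word; every further token of the same word attaches to that first token
    with the label 'goeswith'.
    """
    pieces = [tokenize(w) or [w] for w in words]  # never allow an empty split

    # 1-based position of the first token of each word
    first: List[int] = []
    pos = 1
    for p in pieces:
        first.append(pos)
        pos += len(p)

    arcs: List[Arc] = []
    for i, p in enumerate(pieces):
        head = 0 if heads[i] == 0 else first[heads[i] - 1]
        arcs.append(Arc(p[0], head, deprels[i]))
        for piece in p[1:]:
            arcs.append(Arc(piece, first[i], "goeswith"))
    return arcs


if __name__ == "__main__":
    # Toy example: 孔子 ("Confucius") is the subject of 曰 ("said").
    # `list` acts as a character-wise tokenizer, splitting 孔子 into two tokens.
    tree = expand_with_goeswith(["孔子", "曰"], [2, 0], ["nsubj", "root"], list)
    for i, arc in enumerate(tree, 1):
        print(i, arc.token, arc.head, arc.deprel)
    # 1 孔 3 nsubj
    # 2 子 1 goeswith
    # 3 曰 0 root
```

    Under the same assumption, merging each goeswith chain back into its first token reverses the mapping and recovers the word-level tree.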

    Comparing the production of a formula with the development of L2 competence

    This pilot study investigates how the production of a formula relates to the development of L2 competence across the proficiency levels of a spoken learner corpus. The results show that, in beginner production data, the formula is likely recalled holistically from learners’ phonological memory rather than generated online; this is indicated by its fluent production in the absence of any other surface-structure evidence of the formula’s syntactic properties. As learners’ L2 competence increases, the formula becomes subject to modifications that show structural conformity at each proficiency level. The transparency between the formula’s modifications and learners’ corresponding L2 surface-structure realisations suggests that it is the independent development of L2 competence that integrates the formula into compositional language and ultimately drives the SLA process forward.

    Towards a muon collider

    A muon collider would enable the major leap in energy reach that is needed for a fruitful exploration of fundamental interactions. The challenges of producing muon collisions at high luminosity and 10 TeV centre-of-mass energy are being investigated by the recently formed International Muon Collider Collaboration. This Review summarises the status of, and recent advances in, muon collider design, physics and detector studies. The aim is to provide a global perspective of the field and to outline directions for future work.