516 research outputs found

    Improving Cross-Lingual Transfer Learning for Event Detection

    The widespread adoption of applications powered by Artificial Intelligence (AI) backbones has unquestionably changed the way we interact with the world around us. Applications such as automated personal assistants, automatic question answering, and machine-based translation systems have become mainstays of modern culture thanks to recent considerable advances in Natural Language Processing (NLP) research. Nonetheless, with over 7000 spoken languages in the world, a considerable number of marginalized communities remain unable to benefit from these technological advancements, largely because of the language they speak. Cross-Lingual Learning (CLL) looks to address this issue by transferring the knowledge acquired from a popular, high-resource source language (e.g., English, Chinese, or Spanish) to a less favored, lower-resourced target language (e.g., Urdu or Swahili). This dissertation leverages the Event Detection (ED) sub-task of Information Extraction (IE) as a testbed and presents three novel approaches that improve cross-lingual transfer learning from distinct perspectives: (1) direct knowledge transfer, (2) hybrid knowledge transfer, and (3) few-shot learning.

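    Discourse Production of Czech Speakers with Aphasia: An Exploration Using Usage-Based Linguistics (Produkce diskurzu českých mluvčích s afázií: Explorace s využitím usage-based lingvistiky)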

    Research in linguistic aphasiology has been dominated by structuralist, rule-based approaches to the study of language. However, recent work has shown that analyses based in constructivist, usage-based frameworks can explain patterns of language processing in aphasia that are difficult to accommodate in structuralist models. The present work follows up on these findings and aims to provide additional evidence for the benefits of the usage-based model by using data from speakers of Czech, a language that is understudied in this context. The aims of the study were threefold: to create a collection of samples of aphasic connected speech available to other researchers, to provide a description of the patterns of aphasic discourse production in Czech, and, most importantly, to show potential benefits of usage-based construction grammar for aphasia research. A corpus of the speech of eleven persons with fluent and non-fluent aphasia of varying degrees of severity was created. The corpus consists of more than 23,000 word positions produced by speakers with aphasia in tasks used to elicit conversational, narrative, descriptive, and procedural discourse. The corpus is lemmatized and morphologically tagged, and the transcripts are aligned with audio recordings. It also includes a smaller sample of the speech production of three neurotypical speakers with comparable...

    Institute of Czech Language and Theory of Communication, Faculty of Arts

    Workshop Proceedings of the 12th edition of the KONVENS conference

    The 2014 issue of KONVENS is even more a forum for exchange: its main topic is the interaction between Computational Linguistics and Information Science, and the synergies that such interaction, cooperation and integrated views can produce. This topic, at the crossroads of different research traditions that deal with natural language as a container of knowledge and with methods to extract and manage linguistically represented knowledge, is close to the heart of many researchers at the Institut für Informationswissenschaft und Sprachtechnologie of Universität Hildesheim: it has long been one of the institute’s research topics, and it has received even more attention over the last few years.

    LeBenchmark 2.0: a Standardized, Replicable and Enhanced Framework for Self-supervised Representations of French Speech

    Self-supervised learning (SSL) is at the origin of unprecedented improvements in many different domains, including computer vision and natural language processing. Speech processing has benefitted drastically from SSL, as most of the current domain-related tasks are now being approached with pre-trained models. This work introduces LeBenchmark 2.0, an open-source framework for assessing and building SSL-equipped French speech technologies. It includes documented, large-scale and heterogeneous corpora with up to 14,000 hours of speech; ten pre-trained SSL wav2vec 2.0 models, containing from 26 million to one billion learnable parameters, shared with the community; and an evaluation protocol made of six downstream tasks to complement existing benchmarks. LeBenchmark 2.0 also presents unique perspectives on pre-trained SSL models for speech, with an investigation of frozen versus fine-tuned downstream models and of task-agnostic versus task-specific pre-trained models, as well as a discussion of the carbon footprint of large-scale model training. Comment: Under submission at Computer Science and Language. Preprint allowed.

    Machine Learning Algorithm for the Scansion of Old Saxon Poetry

    Several scholars have designed tools to perform the automatic scansion of poetry in many languages, but none of these tools deals with Old Saxon or Old English. This project is a first attempt to create a tool for these languages. We implemented a Bidirectional Long Short-Term Memory (BiLSTM) model to perform the automatic scansion of Old Saxon and Old English poems. Since this model uses supervised learning, we manually annotated the Heliand manuscript and used the resulting corpus as a labeled dataset to train the model. In evaluation, the model reached 97% accuracy and a weighted average of 99% for precision, recall and F1 score. In addition, we tested the model with some verses from the Old Saxon Genesis and some from The Battle of Brunanburh, and we observed that the model predicted almost all Old Saxon metrical patterns correctly but misclassified the majority of the Old English input verses.

    Characterizing Deep Learning Package Supply Chains in PyPI: Domains, Clusters, and Disengagement

    Deep learning (DL) package supply chains (SCs) are critical for DL frameworks to remain competitive. However, vital knowledge on the nature of DL package SCs is still lacking. In this paper, we explore the domains, clusters, and disengagement of packages in two representative PyPI DL package SCs to bridge this knowledge gap. We analyze the metadata of nearly six million PyPI package distributions and construct version-sensitive SCs for two popular DL frameworks: TensorFlow and PyTorch. We find that popular packages (measured by the number of monthly downloads) in the two SCs cover 34 domains belonging to eight categories. The Applications, Infrastructure, and Sciences categories account for over 85% of popular packages in either SC, and the TensorFlow and PyTorch SCs have developed specializations in Infrastructure and Applications packages, respectively. We employ the Leiden community detection algorithm and detect 131 and 100 clusters in the two SCs. The clusters mainly exhibit four shapes, in order of increasing dependency complexity: Arrow, Star, Tree, and Forest. Most clusters are Arrow or Star, but Tree and Forest clusters account for most packages (TensorFlow SC: 70%, PyTorch SC: 90%). We identify three groups of reasons why packages disengage from the SC (i.e., remove the DL framework and its dependents from their installation dependencies): dependency issues, functional improvements, and ease of installation. The most common disengagement reasons in the two SCs differ. Our study provides rich implications for the maintenance and dependency management practices of PyPI DL SCs. Comment: Manuscript submitted to ACM Transactions on Software Engineering and Methodology.

    Personality-aware Human-centric Multimodal Reasoning: A New Task

    Multimodal reasoning, an area of artificial intelligence that aims to make inferences from multimodal signals such as vision, language and speech, has drawn more and more attention in recent years. People with different personalities may respond differently to the same situation. However, such individual personalities were ignored in previous studies. In this work, we introduce a new Personality-aware Human-centric Multimodal Reasoning (Personality-aware HMR) task, and accordingly construct a new dataset based on the television show The Big Bang Theory, to predict the behavior of a specific person at a specific moment, given the multimodal information of their past and future moments. The Myers-Briggs Type Indicator (MBTI) was annotated and used in the task to represent individuals' personalities. We benchmark the task by proposing three baseline methods: two were adapted from related tasks, and one was newly proposed for our task. The experimental results demonstrate that personality can effectively improve the performance of human-centric multimodal reasoning. To address the lack of personality annotation in real-life scenes, we introduce an extended task called Personality-predicted HMR and propose corresponding methods that first predict the MBTI personality and then use the predicted personality to help multimodal reasoning. The experimental results show that our method can accurately predict personality and achieves satisfactory multimodal reasoning performance without relying on personality annotations.

    Managing healthcare transformation towards P5 medicine (Published in Frontiers in Medicine)

    Health and social care systems around the world are facing radical organizational, methodological and technological paradigm changes to meet the requirements for improving quality and safety of care as well as efficiency and efficacy of care processes. In doing so, they are trying to manage the challenges of ongoing demographic changes towards aging, multi-diseased societies, development of human resources, health and social services consumerism, medical and biomedical progress, and exploding costs for health-related R&D as well as health services delivery. Furthermore, they intend to achieve sustainability of global health systems by transforming them towards intelligent, adaptive and proactive systems focusing on health and wellness with optimized quality and safety outcomes. The outcome is a transformed health and wellness ecosystem combining the approaches of translational medicine, 5P medicine (personalized, preventive, predictive, participative precision medicine) and digital health towards ubiquitous personalized health services realized independently of time and location. It considers individual health status, conditions, and genetic and genomic dispositions in personal social, occupational, environmental and behavioural context, thus turning health and social care from reactive to proactive. This requires the advancement of communication and cooperation among the business actors from different domains (disciplines) with different methodologies, terminologies/ontologies, education, skills and experiences, from data level (data sharing) to concept/knowledge level (knowledge sharing). The challenge here is the understanding and the formal as well as consistent representation of the world of sciences and practices, i.e. of multidisciplinary and dynamic systems in variable context, for enabling mapping between the different disciplines, methodologies, perspectives, intentions, languages, etc.

    Based on a framework for representing multi-domain ecosystems, including their development process, in a dynamic, use-case-specific and context-aware manner, systems, models and artefacts can be consistently represented, harmonized and integrated. The response to that problem is the formal representation of health and social care ecosystems through a system-oriented, architecture-centric, ontology-based and policy-driven model and framework, addressing all domains and development process views contributing to the system and context in question. Accordingly, this Research Topic aims to address this change towards 5P medicine. Specifically, areas of interest include, but are not limited to:

    • A multidisciplinary approach to the transformation of health and social systems
    • Success factors for sustainable P5 ecosystems
    • AI and robotics in transformed health ecosystems
    • Transformed health ecosystems challenges for security, privacy and trust
    • Modelling digital health systems
    • Ethical challenges of personalized digital health
    • Knowledge representation and management of transformed health ecosystems

    Table of Contents:

    • Editorial: Managing healthcare transformation towards P5 medicine. Bernd Blobel and Dipak Kalra
    • Transformation of Health and Social Care Systems—An Interdisciplinary Approach Toward a Foundational Architecture. Bernd Blobel, Frank Oemig, Pekka Ruotsalainen and Diego M. Lopez
    • Transformed Health Ecosystems—Challenges for Security, Privacy, and Trust. Pekka Ruotsalainen and Bernd Blobel
    • Success Factors for Scaling Up the Adoption of Digital Therapeutics Towards the Realization of P5 Medicine. Alexandra Prodan, Lucas Deimel, Johannes Ahlqvist, Strahil Birov, Rainer Thiel, Meeri Toivanen, Zoi Kolitsi and Dipak Kalra
    • EU-Funded Telemedicine Projects – Assessment of, and Lessons Learned From, in the Light of the SARS-CoV-2 Pandemic. Laura Paleari, Virginia Malini, Gabriella Paoli, Stefano Scillieri, Claudia Bighin, Bernd Blobel and Mauro Giacomini
    • A Review of Artificial Intelligence and Robotics in Transformed Health Ecosystems. Kerstin Denecke and Claude R. Baudoin
    • Modeling digital health systems to foster interoperability. Frank Oemig and Bernd Blobel
    • Challenges and solutions for transforming health ecosystems in low- and middle-income countries through artificial intelligence. Diego M. López, Carolina Rico-Olarte, Bernd Blobel and Carol Hullin
    • Linguistic and ontological challenges of multiple domains contributing to transformed health ecosystems. Markus Kreuzthaler, Mathias Brochhausen, Cilia Zayas, Bernd Blobel and Stefan Schulz
    • The ethical challenges of personalized digital health. Els Maeckelberghe, Kinga Zdunek, Sara Marceglia, Bobbie Farsides and Michael Rigb