234 research outputs found

    La traduzione specializzata all’opera per una piccola impresa in espansione: la mia esperienza di internazionalizzazione in cinese di Bioretics© S.r.l.

    Get PDF
    Global markets are currently immersed in two all-encompassing and unstoppable processes: internationalization and globalization. While the former pushes companies to look beyond the borders of their country of origin to forge relationships with foreign trading partners, the latter fosters the standardization in all countries, by reducing spatiotemporal distances and breaking down geographical, political, economic and socio-cultural barriers. In recent decades, another domain has appeared to propel these unifying drives: Artificial Intelligence, together with its high technologies aiming to implement human cognitive abilities in machinery. The “Language Toolkit – Le lingue straniere al servizio dell’internazionalizzazione dell’impresa” project, promoted by the Department of Interpreting and Translation (Forlì Campus) in collaboration with the Romagna Chamber of Commerce (Forlì-Cesena and Rimini), seeks to help Italian SMEs make their way into the global market. It is precisely within this project that this dissertation has been conceived. Indeed, its purpose is to present the translation and localization project from English into Chinese of a series of texts produced by Bioretics© S.r.l.: an investor deck, the company website and part of the installation and use manual of the Aliquis© framework software, its flagship product. This dissertation is structured as follows: Chapter 1 presents the project and the company in detail; Chapter 2 outlines the internationalization and globalization processes and the Artificial Intelligence market both in Italy and in China; Chapter 3 provides the theoretical foundations for every aspect related to Specialized Translation, including website localization; Chapter 4 describes the resources and tools used to perform the translations; Chapter 5 proposes an analysis of the source texts; Chapter 6 is a commentary on translation strategies and choices

    A mixed methods study of the resilience skills of children and young people aged 11-15 with a mild moderate hearing loss

    Get PDF
    This study investigated the resilience skills of 29 Children and young people (CYP) aged 11-15 with a mild moderate hearing loss (MMHL). An exploration of the perceived resilience skills of the CYP with MMHL in this study were compared to the skills of their peers with expected levels of hearing (ELH) – 30 CYP and a severe profound hearing loss (SPHL) – 24 CYP. The research question asked, ‘How resilient are CYP with MMHL and how do they feel they would demonstrate resilience skills in everyday activities?’ There were three phases to this study: phase one involved a focus group of CYP with ELH, MMHL and SPHL. This group was asked to pilot the research tools, language assessments and the questionnaire devised specifically for the study as well as exploring the concept of ‘resilience’. The findings of the focus group that comprised of CYP with hearing as well as a hearing loss (HL) identified that resilience was a word used consistently throughout school, but they all agreed that they neither understood what it meant nor what it meant to be a resilient person. The phrases associated to the researcher’s definition of resilience used as part of the research were presented to the CYP and they discussed what each phrase meant to them. In phase two, all 83 participants; 30 ELH, 29 MMHL and 24 SPHL completed assessments of receptive and expressive language and the specifically devised questionnaire. The language assessments highlighted that the CYP with MMHL presented abilities commensurate with their chronological age. The questionnaire responses identified differences between learners with MMHL and those with ELH and SPHL. The key areas related to explaining their HL and audiological equipment to others; lack of organisational skills; leading a team; discussing emotions with close family and friends, and communication skills. Phase three involved semi-structured interviews with a group of nine CYP with MMHL to drill down into the questionnaire responses and address some of the themes the analysis highlighted. The data from phase two and three suggested that CYP with MMHL lacked the skills underpinning resilience and were unable to demonstrate such skills in everyday activities. They identified that, because of their higher language abilities, they received limited support from Teachers of the Deaf (ToDs). The CYP with MMHL believed specific resilience skills need to be learnt and practised. Opportunities to employ such skills in school and in social situations appear limited. A factor identified was parental anxiety associated with the belief their child would be vulnerable due to their HL

    A Review of Deep Learning Techniques for Speech Processing

    Full text link
    The field of speech processing has undergone a transformative shift with the advent of deep learning. The use of multiple processing layers has enabled the creation of models capable of extracting intricate features from speech data. This development has paved the way for unparalleled advancements in speech recognition, text-to-speech synthesis, automatic speech recognition, and emotion recognition, propelling the performance of these tasks to unprecedented heights. The power of deep learning techniques has opened up new avenues for research and innovation in the field of speech processing, with far-reaching implications for a range of industries and applications. This review paper provides a comprehensive overview of the key deep learning models and their applications in speech-processing tasks. We begin by tracing the evolution of speech processing research, from early approaches, such as MFCC and HMM, to more recent advances in deep learning architectures, such as CNNs, RNNs, transformers, conformers, and diffusion models. We categorize the approaches and compare their strengths and weaknesses for solving speech-processing tasks. Furthermore, we extensively cover various speech-processing tasks, datasets, and benchmarks used in the literature and describe how different deep-learning networks have been utilized to tackle these tasks. Additionally, we discuss the challenges and future directions of deep learning in speech processing, including the need for more parameter-efficient, interpretable models and the potential of deep learning for multimodal speech processing. By examining the field's evolution, comparing and contrasting different approaches, and highlighting future directions and challenges, we hope to inspire further research in this exciting and rapidly advancing field

    Leveraging EEG-based speech imagery brain-computer interfaces

    Get PDF
    Speech Imagery Brain-Computer Interfaces (BCIs) provide an intuitive and flexible way of interaction via brain activity recorded during imagined speech. Imagined speech can be decoded in form of syllables or words and captured even with non-invasive measurement methods as for example the Electroencephalography (EEG). Over the last decade, research in this field has made tremendous progress and prototypical implementations of EEG-based Speech Imagery BCIs are numerous. However, most work is still conducted in controlled laboratory environments with offline classification and does not find its way to real online scenarios. Within this thesis we identify three main reasons for these circumstances, namely, the mentally and physically exhausting training procedures, insufficient classification accuracies and cumbersome EEG setups with usually high-resolution headsets. We furthermore elaborate on possible solutions to overcome the aforementioned problems and present and evaluate new methods in each of the domains. In detail we introduce two new training concepts for imagined speech BCIs, one based on EEG activity during silently reading and the other recorded during overtly speaking certain words. Insufficient classification accuracies are addressed by introducing the concept of a Semantic Speech Imagery BCI, which classifies the semantic category of an imagined word prior to the word itself to increase the performance of the system. Finally, we investigate on different techniques for electrode reduction in Speech Imagery BCIs and aim at finding a suitable subset of electrodes for EEG-based imagined speech detection, therefore facilitating the cumbersome setups. All of our presented results together with general remarks on experiences and best practice for study setups concerning imagined speech are summarized and supposed to act as guidelines for further research in the field, thereby leveraging Speech Imagery BCIs towards real-world application.Speech Imagery Brain-Computer Interfaces (BCIs) bieten eine intuitive und flexible Möglichkeit der Interaktion mittels Gehirnaktivität, aufgezeichnet während der bloßen Vorstellung von Sprache. Vorgestellte Sprache kann in Form von Silben oder Wörtern auch mit nicht-invasiven Messmethoden wie der Elektroenzephalographie (EEG) gemessen und entschlüsselt werden. In den letzten zehn Jahren hat die Forschung auf diesem Gebiet enorme Fortschritte gemacht, und es gibt zahlreiche prototypische Implementierungen von EEG-basierten Speech Imagery BCIs. Die meisten Arbeiten werden jedoch immer noch in kontrollierten Laborumgebungen mit Offline-Klassifizierung durchgeführt und finden nicht denWeg in reale Online-Szenarien. In dieser Arbeit identifizieren wir drei Hauptgründe für diesen Umstand, nämlich die geistig und körperlich anstrengenden Trainingsverfahren, unzureichende Klassifizierungsgenauigkeiten und umständliche EEG-Setups mit meist hochauflösenden Headsets. Darüber hinaus erarbeiten wir mögliche Lösungen zur Überwindung der oben genannten Probleme und präsentieren und evaluieren neue Methoden für jeden dieser Bereiche. Im Einzelnen stellen wir zwei neue Trainingskonzepte für Speech Imagery BCIs vor, von denen eines auf der Messung von EEG-Aktivität während des stillen Lesens und das andere auf der Aktivität während des Aussprechens bestimmter Wörter basiert. Unzureichende Klassifizierungsgenauigkeiten werden durch die Einführung des Konzepts eines Semantic Speech Imagery BCI angegangen, das die semantische Kategorie eines vorgestellten Wortes vor dem Wort selbst klassifiziert, um die Performance des Systems zu erhöhen. Schließlich untersuchen wir verschiedene Techniken zur Elektrodenreduktion bei Speech Imagery BCIs und zielen darauf ab, eine geeignete Teilmenge von Elektroden für die EEG-basierte Erkennung von vorgestellter Sprache zu finden, um so die umständlichen Setups zu erleichtern. Alle unsere Ergebnisse werden zusammen mit allgemeinen Bemerkungen zu Erfahrungen und Best Practices für Studien-Setups bezüglich vorgestellter Sprache zusammengefasst und sollen als Richtlinien für die weitere Forschung auf diesem Gebiet dienen, um so Speech Imagery BCIs für die Anwendung in der realenWelt zu optimieren

    DualTalker: A Cross-Modal Dual Learning Approach for Speech-Driven 3D Facial Animation

    Full text link
    In recent years, audio-driven 3D facial animation has gained significant attention, particularly in applications such as virtual reality, gaming, and video conferencing. However, accurately modeling the intricate and subtle dynamics of facial expressions remains a challenge. Most existing studies approach the facial animation task as a single regression problem, which often fail to capture the intrinsic inter-modal relationship between speech signals and 3D facial animation and overlook their inherent consistency. Moreover, due to the limited availability of 3D-audio-visual datasets, approaches learning with small-size samples have poor generalizability that decreases the performance. To address these issues, in this study, we propose a cross-modal dual-learning framework, termed DualTalker, aiming at improving data usage efficiency as well as relating cross-modal dependencies. The framework is trained jointly with the primary task (audio-driven facial animation) and its dual task (lip reading) and shares common audio/motion encoder components. Our joint training framework facilitates more efficient data usage by leveraging information from both tasks and explicitly capitalizing on the complementary relationship between facial motion and audio to improve performance. Furthermore, we introduce an auxiliary cross-modal consistency loss to mitigate the potential over-smoothing underlying the cross-modal complementary representations, enhancing the mapping of subtle facial expression dynamics. Through extensive experiments and a perceptual user study conducted on the VOCA and BIWI datasets, we demonstrate that our approach outperforms current state-of-the-art methods both qualitatively and quantitatively. We have made our code and video demonstrations available at https://github.com/sabrina-su/iadf.git

    Multimodality Helps Unimodality: Cross-Modal Few-Shot Learning with Multimodal Models

    Full text link
    The ability to quickly learn a new task with minimal instruction - known as few-shot learning - is a central aspect of intelligent agents. Classical few-shot benchmarks make use of few-shot samples from a single modality, but such samples may not be sufficient to characterize an entire concept class. In contrast, humans use cross-modal information to learn new concepts efficiently. In this work, we demonstrate that one can indeed build a better visual{\bf visual} dog classifier by read{\bf read}ing about dogs and listen{\bf listen}ing to them bark. To do so, we exploit the fact that recent multimodal foundation models such as CLIP are inherently cross-modal, mapping different modalities to the same representation space. Specifically, we propose a simple cross-modal adaptation approach that learns from few-shot examples spanning different modalities. By repurposing class names as additional one-shot training samples, we achieve SOTA results with an embarrassingly simple linear classifier for vision-language adaptation. Furthermore, we show that our approach can benefit existing methods such as prefix tuning, adapters, and classifier ensembling. Finally, to explore other modalities beyond vision and language, we construct the first (to our knowledge) audiovisual few-shot benchmark and use cross-modal training to improve the performance of both image and audio classification.Comment: CVPR 2023. Project website: https://linzhiqiu.github.io/papers/cross_modal

    End-user requirements of an assistive technology for profoundly deaf parents with infants

    Get PDF
    As the number of deaf people in the world increases, the amount of parents who are deaf, is also growing. The world is increasingly relying on technology from which deaf parents can, and do, benefit significantly. Deaf parents are able to rely on available technology such as assistive technologies to overcome functional limitations. However, assistive technologies are often abandoned within a short period of time of being acquired. The abandonment of assistive technologies is believed to be due to a lack of proper elicitation of requirements. Therefore, the problem identified in this research is a lack of understanding of end-user requirements of an assistive technology for profoundly deaf parents with infants. A literature review together with logical argumentation was conducted and applied to identify and recommend a method suitable for eliciting end-user requirements for assistive technologies. Thereafter, an integrative literature review and thematic analysis was done to extract needs and challenges of profoundly deaf parents with infants, and group them according to themes that emerged. Finally, making use of the recommended method and the extracted needs and challenges of profoundly deaf parents with infants, twenty-eight end-user requirements of an assistive technology for profoundly deaf parents with infants were elicited. The twenty-eight elicited end-user requirements consist of eighteen end-user requirements that express functions of an assistive technology for profoundly deaf parents with infants, and ten end-user requirements that express an overall goal/objective to be attained by profoundly deaf parents with infants when the assistive technology is designed and developed. To evaluate the elicited end-user requirements, only the eighteen end-user requirements that express functions of an assistive technology for profoundly deaf parents with infants were considered. The evaluation was done by assessing both existing and emerging assistive technologies to understand the comprehensiveness of the eighteen elicited end-user requirements that express functions of an assistive technology for profoundly deaf parents with infants.Thesis (MIT) -- Faculty of Engineering, the Built Environment and Technology, School of Information and Communication Technology, 202

    Staged Otherness

    Get PDF
    The cultural phenomenon of exhibiting non-European people in front of the European audiences in the 19th and 20th century was concentrated in the metropolises in the western part of the continent. Nevertheless, traveling ethnic troupes and temporary exhibitions of non-European humans took place also in territories located to the east of the Oder river and Austria. The contributors to this edited volume present practices of ethnographic shows in Russia, Poland, Czechia, Slovenia, Hungary, Germany, Romania, and Austria and discuss the reactions of local audiences. The essays offer critical arguments to rethink narratives of cultural encounters in the context of ethnic shows. By demonstrating the many ways in which the western models and customs were reshaped, developed, and contested in Central and Eastern European contexts, the authors argue that the dominant way of characterizing these performances as “human zoos” is too narrow. The contributors had to tackle the difficult task of finding traces other than faint copies of official press releases by the tour organizers. The original source material was drawn from local archives, museums, and newspapers of the discussed period. A unique feature of the volume is the rich amount of images that complement every single case study of ethnic shows

    Data and methods for a visual understanding of sign languages

    Get PDF
    Signed languages are complete and natural languages used as the first or preferred mode of communication by millions of people worldwide. However, they, unfortunately, continue to be marginalized languages. Designing, building, and evaluating models that work on sign languages presents compelling research challenges and requires interdisciplinary and collaborative efforts. The recent advances in Machine Learning (ML) and Artificial Intelligence (AI) has the power to enable better accessibility to sign language users and narrow down the existing communication barrier between the Deaf community and non-sign language users. However, recent AI-powered technologies still do not account for sign language in their pipelines. This is mainly because sign languages are visual languages, that use manual and non-manual features to convey information, and do not have a standard written form. Thus, the goal of this thesis is to contribute to the development of new technologies that account for sign language by creating large-scale multimodal resources suitable for training modern data-hungry machine learning models and developing automatic systems that focus on computer vision tasks related to sign language that aims at learning better visual understanding of sign languages. Thus, in Part I, we introduce the How2Sign dataset, which is a large-scale collection of multimodal and multiview sign language videos in American Sign Language. In Part II, we contribute to the development of technologies that account for sign languages by presenting in Chapter 4 a framework called Spot-Align, based on sign spotting methods, to automatically annotate sign instances in continuous sign language. We further present the benefits of this framework and establish a baseline for the sign language recognition task on the How2Sign dataset. In addition to that, in Chapter 5 we benefit from the different annotations and modalities of the How2Sign to explore sign language video retrieval by learning cross-modal embeddings. Later in Chapter 6, we explore sign language video generation by applying Generative Adversarial Networks to the sign language domain and assess if and how well sign language users can understand automatically generated sign language videos by proposing an evaluation protocol based on How2Sign topics and English translationLes llengües de signes són llengües completes i naturals que utilitzen milions de persones de tot el món com mode de comunicació primer o preferit. Tanmateix, malauradament, continuen essent llengües marginades. Dissenyar, construir i avaluar tecnologies que funcionin amb les llengües de signes presenta reptes de recerca que requereixen d’esforços interdisciplinaris i col·laboratius. Els avenços recents en l’aprenentatge automàtic i la intel·ligència artificial (IA) poden millorar l’accessibilitat tecnològica dels signants, i alhora reduir la barrera de comunicació existent entre la comunitat sorda i les persones no-signants. Tanmateix, les tecnologies més modernes en IA encara no consideren les llengües de signes en les seves interfícies amb l’usuari. Això es deu principalment a que les llengües de signes són llenguatges visuals, que utilitzen característiques manuals i no manuals per transmetre informació, i no tenen una forma escrita estàndard. Els objectius principals d’aquesta tesi són la creació de recursos multimodals a gran escala adequats per entrenar models d’aprenentatge automàtic per a llengües de signes, i desenvolupar sistemes de visió per computador adreçats a una millor comprensió automàtica de les llengües de signes. Així, a la Part I presentem la base de dades How2Sign, una gran col·lecció multimodal i multivista de vídeos de la llengua de signes nord-americana. A la Part II, contribuïm al desenvolupament de tecnologia per a llengües de signes, presentant al capítol 4 una solució per anotar signes automàticament anomenada Spot-Align, basada en mètodes de localització de signes en seqüències contínues de signes. Després, presentem els avantatges d’aquesta solució i proporcionem uns primers resultats per la tasca de reconeixement de la llengua de signes a la base de dades How2Sign. A continuació, al capítol 5 aprofitem de les anotacions i diverses modalitats de How2Sign per explorar la cerca de vídeos en llengua de signes a partir de l’entrenament d’incrustacions multimodals. Finalment, al capítol 6, explorem la generació de vídeos en llengua de signes aplicant xarxes adversàries generatives al domini de la llengua de signes. Avaluem fins a quin punt els signants poden entendre els vídeos generats automàticament, proposant un nou protocol d’avaluació basat en les categories dins de How2Sign i la traducció dels vídeos a l’anglès escritLas lenguas de signos son lenguas completas y naturales que utilizan millones de personas de todo el mundo como modo de comunicación primero o preferido. Sin embargo, desgraciadamente, siguen siendo lenguas marginadas. Diseñar, construir y evaluar tecnologías que funcionen con las lenguas de signos presenta retos de investigación que requieren esfuerzos interdisciplinares y colaborativos. Los avances recientes en el aprendizaje automático y la inteligencia artificial (IA) pueden mejorar la accesibilidad tecnológica de los signantes, al tiempo que reducir la barrera de comunicación existente entre la comunidad sorda y las personas no signantes. Sin embargo, las tecnologías más modernas en IA todavía no consideran las lenguas de signos en sus interfaces con el usuario. Esto se debe principalmente a que las lenguas de signos son lenguajes visuales, que utilizan características manuales y no manuales para transmitir información, y carecen de una forma escrita estándar. Los principales objetivos de esta tesis son la creación de recursos multimodales a gran escala adecuados para entrenar modelos de aprendizaje automático para lenguas de signos, y desarrollar sistemas de visión por computador dirigidos a una mejor comprensión automática de las lenguas de signos. Así, en la Parte I presentamos la base de datos How2Sign, una gran colección multimodal y multivista de vídeos de lenguaje la lengua de signos estadounidense. En la Part II, contribuimos al desarrollo de tecnología para lenguas de signos, presentando en el capítulo 4 una solución para anotar signos automáticamente llamada Spot-Align, basada en métodos de localización de signos en secuencias continuas de signos. Después, presentamos las ventajas de esta solución y proporcionamos unos primeros resultados por la tarea de reconocimiento de la lengua de signos en la base de datos How2Sign. A continuación, en el capítulo 5 aprovechamos de las anotaciones y diversas modalidades de How2Sign para explorar la búsqueda de vídeos en lengua de signos a partir del entrenamiento de incrustaciones multimodales. Finalmente, en el capítulo 6, exploramos la generación de vídeos en lengua de signos aplicando redes adversarias generativas al dominio de la lengua de signos. Evaluamos hasta qué punto los signantes pueden entender los vídeos generados automáticamente, proponiendo un nuevo protocolo de evaluación basado en las categorías dentro de How2Sign y la traducción de los vídeos al inglés escrito.Teoria del Senyal i Comunicacion
    corecore