12 research outputs found

    The automatic analysis of classroom talk

    The SMART SPEECH Project is a joint venture between three Finnish universities and a Chilean university. The aim is to develop a mobile application that can be used to record classroom talk and enable observations of classroom interactions. We recorded Finnish and Chilean physics teachers' speech using both a conventional microphone/dictaphone setup and a microphone/mobile-application setup. The recordings were analysed via automatic speech recognition (ASR). The average word error rate achieved for the Finnish teachers' speech was under 40%. The ASR approach also enabled us to determine the key topics discussed within the Finnish physics lessons under scrutiny. The results here were promising, as the recognition accuracy rate was about 85% on average.
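    For reference, word error rate (WER) is the word-level edit distance between the reference transcript and the recognizer's hypothesis, normalized by the length of the reference. A minimal sketch of this standard computation (illustrative only, not the project's own tooling):

        # Illustrative WER computation via word-level Levenshtein distance;
        # not the SMART SPEECH project's own code.
        def wer(reference: str, hypothesis: str) -> float:
            ref, hyp = reference.split(), hypothesis.split()
            # d[i][j] = edit distance between ref[:i] and hyp[:j]
            d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
            for i in range(len(ref) + 1):
                d[i][0] = i
            for j in range(len(hyp) + 1):
                d[0][j] = j
            for i in range(1, len(ref) + 1):
                for j in range(1, len(hyp) + 1):
                    cost = 0 if ref[i - 1] == hyp[j - 1] else 1
                    d[i][j] = min(d[i - 1][j] + 1,         # deletion
                                  d[i][j - 1] + 1,         # insertion
                                  d[i - 1][j - 1] + cost)  # substitution
            return d[len(ref)][len(hyp)] / len(ref)

        # A WER under 0.40 means fewer than 4 of every 10 reference words
        # were substituted, deleted, or inserted by the recognizer.
        print(wer("the ball rolls down the incline",
                  "the ball rolls on the incline"))  # 0.1666...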

    Lecture instrumentation based on synchronous speech transcription

    Speech transcription can now be achieved quickly enough to generate subtitles in real time. The research presented in this paper explores the potential of synchronous transcription for instrumenting learning environments, specifically in the context of hybrid lectures. Considering the available technologies, we designed a toolset to address the needs of students and teachers. This toolset is built around the transcription of the teacher's speech, which acts as a support for interactions and provides the data needed by other functionalities. We present the different interfaces and their components, as well as the results of a user test carried out in real-life conditions.
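    The paper describes the toolset at the interface level; as a rough sketch of the kind of synchronous captioning loop such a toolset builds on, the code below groups timestamped words from a stubbed recognizer stream into subtitle lines. The stream format and the thresholds are assumptions, not the authors' design.

        # Hypothetical sketch: turn a stream of timestamped words from a
        # real-time recognizer into caption lines for live display.
        from typing import Iterable, Tuple

        def caption_lines(words: Iterable[Tuple[float, str]],
                          max_chars: int = 42,
                          max_gap: float = 1.5):
            """Yield (start_time, text) caption lines from (time, word) pairs."""
            line, start, last = [], None, None
            for t, w in words:
                # Flush on a long pause or when the line is full.
                if line and (t - last > max_gap or
                             len(" ".join(line + [w])) > max_chars):
                    yield start, " ".join(line)
                    line, start = [], None
                if start is None:
                    start = t
                line.append(w)
                last = t
            if line:
                yield start, " ".join(line)

        # Stand-in for a live ASR stream of (timestamp, word) pairs.
        stream = [(0.0, "today"), (0.4, "we"), (0.6, "study"), (1.0, "optics"),
                  (3.2, "please"), (3.5, "open"), (3.8, "the"), (4.0, "handout")]
        for start, text in caption_lines(stream):
            print(f"[{start:5.1f}s] {text}")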

    AI-Josyu: Thinking Support System in Class by Real-time Speech Recognition and Keyword Extraction

    In this paper, we present a thinking support system, AI-Josyu. The system also operates as a class support system that lightens teachers' workload. AI-Josyu is implemented on a media-driven real-time content management framework that links real-world media and legacy media contents together. In recent years, it has become easier to collect large amounts of data of various kinds created by sensors in the real world. The system realizes the interconnection and utilization of legacy media contents, which are generated and scattered across the Internet. The framework has four modules, called "acquisition", "extraction", "selection", and "retrieval", which interconnect the real-world media and the legacy media contents, including their semantic components. The system records the teacher's voice during a lecture in real time and presents retrieved legacy media contents corresponding to the subject of the lecture, so the legacy contents do not have to be prepared in advance: the system retrieves and shows them automatically. This helps students to understand the contents of the lecture and also supports the expansion of ideas. We constructed the system and conducted a demonstration in class, which showed that the system helps teachers and students expand their thinking.
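    The abstract names the four modules without further detail; the following schematic sketch renders the shape of that pipeline in code, with placeholder logic standing in for the actual framework.

        # Schematic sketch of the four-module framework named above; every
        # body is a placeholder, not AI-Josyu's actual implementation.

        def recognize_speech(audio_chunk: bytes) -> str:
            """Placeholder ASR: a real system transcribes the teacher's voice."""
            return "the refraction of light in a glass prism"

        def acquisition(audio_chunk: bytes) -> str:
            """Acquire real-world media: the teacher's speech, in real time."""
            return recognize_speech(audio_chunk)

        def extraction(transcript: str) -> list:
            """Extract candidate keywords for the current lecture subject."""
            stopwords = {"the", "a", "of", "in", "is", "and"}
            return [w for w in transcript.lower().split() if w not in stopwords]

        def selection(keywords: list) -> list:
            """Select the most relevant keywords (naive placeholder ranking)."""
            return keywords[:3]

        def retrieval(keywords: list) -> list:
            """Retrieve legacy media contents scattered on the Internet."""
            return [f"https://example.org/search?q={kw}" for kw in keywords]

        print(retrieval(selection(extraction(acquisition(b"<classroom audio>")))))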

    An Empirical Simulation-based Study of Real-Time Speech Translation for Multilingual Global Project Teams

    Context: Real-time speech translation technology is available today, but we still lack a complete understanding of how such technology may affect communication in global software projects. Goal: To investigate the combination of speech recognition and machine translation as a way to overcome language barriers among stakeholders who are remotely negotiating software requirements. Method: We performed an empirical simulation-based study involving the Google Web Speech API and the Google Translate service, two groups of four subjects speaking Italian and Brazilian Portuguese, and a test set of 60 technical and non-technical utterances. Results: Our findings revealed that, overall: (i) satisfactory speech recognition accuracy was achieved, although it was significantly affected by speaker and utterance differences; (ii) adequate translations tend to follow accurate transcripts, meaning that speech recognition is the most critical part of speech translation technology. Conclusions: The results provide positive, albeit initial, evidence for the possibility of using speech translation technologies to help globally distributed team members communicate in their native languages.
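    The cascade itself is simple to express in code. The sketch below shows the two-stage shape of the pipeline; the stub functions stand in for the Google services, whose real client APIs are not reproduced here.

        # Sketch of the cascaded pipeline evaluated in the study: speech
        # recognition first, then machine translation of the transcript.
        # Both stubs are placeholders for the Google Web Speech API and
        # the Google Translate service.

        def recognize(audio: bytes, lang: str) -> str:
            """Placeholder ASR step: the accuracy-critical stage."""
            return "il sistema deve salvare i dati ogni cinque minuti"

        def translate(text: str, source: str, target: str) -> str:
            """Placeholder MT step, applied to the ASR transcript."""
            return "o sistema deve salvar os dados a cada cinco minutos"

        def speech_translate(audio: bytes, source: str, target: str) -> str:
            transcript = recognize(audio, source)
            # As the findings note, adequate translations tend to follow
            # accurate transcripts: any ASR error propagates into the output.
            return translate(transcript, source, target)

        print(speech_translate(b"<utterance audio>", "it-IT", "pt-BR"))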

    Providing quality education for all: the dilemma of deaf students in a mainstream Technical University

    Technical and Vocational Education and Training (TVET) requires that students comprehend and turn theoretical knowledge into practical skills within the shortest possible time, if not immediately. This demand of TVET, however, is a major challenge for most Deaf students, who largely rely on others to benefit from classroom teaching because of communication challenges. It is for this reason that this study considered the experiences of Deaf students pursuing TVET programmes in a mainstream Technical University and how they could be supported. A qualitative approach (face-to-face interviews) was adopted, and thematic analysis was used to analyse the data. Generally speaking, the Deaf students faced academic, financial and socio-cultural challenges; most striking, however, were the communication challenges they experienced. The study's findings suggest the need for Sign Language (SL) to be promoted, at least among those who come into contact with Deaf students pursuing TVET in a technical university.

    Multilingual videos for MOOCs and OER

    Massive Open Online Courses (MOOCs) and Open Educational Resources (OER) are growing rapidly, but they are not usually offered in multiple languages due to the lack of cost-effective solutions for translating the different objects comprising them, particularly videos. However, current state-of-the-art automatic speech recognition (ASR) and machine translation (MT) techniques have reached a level of maturity which opens the possibility of producing multilingual video subtitles of publishable quality at low cost. This work summarizes the authors' experience exploring this possibility in two real-life case studies: a MOOC platform and a large video lecture repository. Apart from describing the systems, tools and integration components employed for this purpose, a comprehensive evaluation of the results achieved is provided in terms of quality and efficiency. More precisely, it is shown that draft multilingual subtitles produced by domain-adapted ASR/MT systems reach a level of accuracy that makes them worth post-editing, instead of generating subtitles ex novo, saving approximately 25%-75% of the time. Finally, the results reported on multilingual data consumption show that multilingual subtitles had a very positive impact in our case studies, boosting student enrolment on the MOOC platform by 70% in relative terms.

    The research leading to these results received funding from the European Union's Seventh Framework Programme (FP7/2007-2013) under grant agreement no. 287755 (transLectures) and from the EU's ICT Policy Support Programme as part of the Competitiveness and Innovation Framework Programme under grant agreement no. 621030 (EMMA). Additionally, it is supported by the Spanish research project TIN2015-68326-R (MINECO/FEDER).

    Valor Miró, J. D., Baquero-Arnal, P., Civera Saiz, J., Turró Ribalta, C., & Juan, A. (2018). Multilingual videos for MOOCs and OER. Educational Technology & Society, 21(2), 1-12. http://hdl.handle.net/10251/122577
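    Draft subtitles of the kind described above are typically handed to post-editors in a plain caption format. A minimal sketch, using only the Python standard library and invented segment data, of serializing timed ASR/MT output as SubRip (.srt):

        # Minimal sketch: write timed (start, end, text) subtitle segments,
        # e.g. draft ASR/MT output awaiting post-editing, as a SubRip file.
        def fmt(seconds: float) -> str:
            ms = round(seconds * 1000)
            h, ms = divmod(ms, 3_600_000)
            m, ms = divmod(ms, 60_000)
            s, ms = divmod(ms, 1_000)
            return f"{h:02}:{m:02}:{s:02},{ms:03}"

        segments = [  # invented example data
            (0.0, 2.4, "Welcome to this course on linear algebra."),
            (2.4, 5.1, "Bienvenidos a este curso de álgebra lineal."),
        ]

        with open("draft.srt", "w", encoding="utf-8") as f:
            for i, (start, end, text) in enumerate(segments, 1):
                f.write(f"{i}\n{fmt(start)} --> {fmt(end)}\n{text}\n\n")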

    Speech technologies for inclusive education

    There are conditions of disability that compromise people's ability to express themselves using their voice. This can limit communication and act as a barrier to the inclusion of students with these conditions in traditional educational contexts, where verbal communication is predominant. Since education is a human right, it is imperative to establish options for students with disabilities to be included in educational systems where they can fully exercise their rights, which has led to the research, analysis and creation of augmentative and alternative communication systems. This essay presents speech technologies, which aim to replicate the human capacity for spoken communication using electronic devices, and the potential application of these technologies in processes of inclusive education. It gives an overview of the theoretical concepts, trends and meeting points of inclusive education and technology that make it feasible to create systems and devices that positively impact the quality of life of students in their educational development. For these technological possibilities to become a reality in classrooms and other educational environments, the essay proposes establishing broad interdisciplinary teams that allow a comprehensive approach to each particular case. In this way, solutions can become concrete tools that are truly useful and reliable for users, their families and their educational environments.
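    As a concrete illustration of one of the technologies discussed, text-to-speech can lend a voice to a message composed by a non-speaking student. A minimal sketch using the offline pyttsx3 library (one option among many; this is not a tool proposed by the essay itself):

        # Minimal sketch: speak text aloud with the offline pyttsx3 library,
        # as one building block of an augmentative communication aid.
        import pyttsx3

        engine = pyttsx3.init()
        engine.setProperty("rate", 150)   # a slower speech rate can aid clarity
        engine.say("I would like to answer the next question.")
        engine.runAndWait()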

    Clearing the transcription hurdle in dialect corpus building: the Corpus of Southern Dutch Dialects as case study

    This paper discusses how the transcription hurdle in dialect corpus building can be cleared. While corpus analysis has strongly gained in popularity in linguistic research, dialect corpora are still relatively scarce. This scarcity can be attributed to several factors, one of which is the challenging nature of transcribing dialects, given the lack of both orthographic norms for many dialects and speech technological tools trained on dialect data. This paper addresses the questions of (i) how dialects can be transcribed efficiently and (ii) whether speech technological tools can lighten the transcription work. These questions are tackled using the Southern Dutch dialects (SDDs) as a case study, for which the usefulness of automatic speech recognition (ASR), respeaking, and forced alignment is considered. Tests with these tools indicate that dialects still constitute a major challenge for speech technology. In the case of the SDDs, the decision was made to use speech technology only for the word-level segmentation of the audio files, as the transcription itself could not be sped up by ASR tools. The discussion does, however, indicate that the usefulness of ASR and other related tools for a dialect corpus project is strongly determined by the sound quality of the dialect recordings, the availability of dialect-specific statistical models, the degree of linguistic differentiation between the dialects and the standard language, and the goals the transcripts have to serve.
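    Given word-level timestamps from a forced aligner, the segmentation step the project settled on is mechanically straightforward. A sketch using the pydub library (an assumption for illustration; the corpus project's actual toolchain is not specified in this abstract):

        # Sketch: cut an audio file into word-level segments given aligned
        # (start_ms, end_ms, word) triples from a forced aligner. Uses pydub,
        # which is an assumption; the project's own tools may differ.
        from pydub import AudioSegment

        alignment = [  # invented example output of a forced aligner, in ms
            (120, 480, "goedendag"),
            (480, 950, "menere"),
        ]

        audio = AudioSegment.from_wav("dialect_recording.wav")
        for i, (start, end, word) in enumerate(alignment):
            audio[start:end].export(f"segment_{i:04}_{word}.wav", format="wav")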

    Application of audio segmentation and automatic speech recognition technologies to obtain information from dialogues contained in audio

    The interest of the scientific community in the identification of audiovisual content has grown considerably in recent years, due to the need to run automatic classification and monitoring processes over the ever-increasing amount of content broadcast by different media such as television, radio and the internet. This article proposes an architecture for extracting information from audio, with the purpose of applying it to the analysis of television content in the Ecuadorian context. To this end, two services are defined: an audio segmentation service and a transcription service. The segmentation service identifies and extracts audio segments containing speech, music, or speech with musical background, while the transcription service recognizes the speech segments to obtain their content as text. These services and the tools that make them up have been evaluated in order to measure their performance and, in the case of the tools used, to determine which of them best fits the definition of the architecture. The results of the evaluations carried out on the proposed architecture demonstrate that a speech recognition system built from different existing open-source tools offers a higher level of precision than a general-availability transcription service.
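    The two-service design can be pictured as follows; the classifier and recognizer below are placeholders for the open-source tools evaluated in the article, which this abstract does not name.

        # Schematic of the proposed two-service architecture. The classifier
        # and recognizer are placeholders, not the article's actual tools.

        def segment(audio: bytes):
            """Segmentation service: label regions as speech, music, or
            speech with musical background. Placeholder output."""
            return [(0.0, 12.5, "speech"),
                    (12.5, 30.0, "music"),
                    (30.0, 47.2, "speech_over_music")]

        def transcribe(audio: bytes, start: float, end: float) -> str:
            """Transcription service: placeholder ASR over one segment."""
            return "<transcript of the segment>"

        def extract_information(audio: bytes):
            # Only speech-bearing segments go to the transcription service.
            return [(s, e, transcribe(audio, s, e))
                    for s, e, label in segment(audio)
                    if label.startswith("speech")]

        print(extract_information(b"<broadcast audio>"))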

    Attitudes, behaviors, and learning outcomes from using ClassTranscribe, a UDL-featured video-based online learning platform with learnersourced text-searchable captions

    This thesis consisted of a series of three studies on students' attitudes, behaviors, and learning outcomes from using ClassTranscribe, a Universal Design for Learning (UDL) featured video-based online learning platform. ClassTranscribe provided accurate, accessible transcriptions and captioning plus a custom text-searchable interface to rapidly find relevant video moments across an entire course. Users could edit the machine-generated captions in a crowdsourced way. The system logged student viewing, searching, and editing behaviors as fine-grained web browser interaction events, including full-screen-switching, loss-of-focus, caption searching and editing events, and continued-video-watching events, the latter at 15-second granularity.

    In Study I, the lecture material of a sophomore large-enrollment (N=271) 15-week system programming class in Spring 2019 was delivered solely online using a new video-based web platform, ClassTranscribe. Student learning behaviors and findings from four research questions were presented using individual-level performance and interaction data. Firstly, we reported on learning outcomes from the alternative learning paths that arose from the course's application of Universal Design for Learning principles. Secondly, final exam performance was equal to or better than in prior semesters that used traditional in-person live lectures. Thirdly, the learning outcomes of low- and high-performing students were analyzed independently by grouping students into four quartiles based on their non-final-exam course performance on programming assignments and quizzes. We introduced and justified an empirically defined qualification threshold of sufficient video minutes viewed for each group. In all quartiles, students who watched an above-threshold number of video minutes improved their in-group final exam performance (by +6% to +14%), with the largest gain in the lowest-performing quartile; the improvement was similar in magnitude across groups when expressed as a fraction of unrewarded final exam points. Finally, we found that using ClassTranscribe's caption-based video search significantly predicted improvement in final exam scores. Overall, the study presented and evaluated how learners' use of online video in ClassTranscribe predicted course performance and positive learning outcomes.

    In Study II, we further explored learners' searching behavior, which the first study had shown to be correlated with improved final exam scores. From Fall 2019 to Summer 2020, engineering students used ClassTranscribe in their courses to view course videos and search for video content. The tool collected detailed timestamped behavioral data from 1,894 students across 25 engineering courses, including what individual students searched for and when. Since the first study had shown that caption search significantly predicted improvement in final exam scores in a computer science course, this study presented how students used the search functionality, based on a more detailed analysis of the log data. The search functionality used the timestamped caption data to find specific video moments both within the current video and across the entire course. The number of search activities per person ranged from zero to 186 events. An in-depth analysis of the students (N=167) who performed 1,022 searches was conducted to gain insight into student search needs and behaviors. Based on the total number of searches performed, students were grouped by clustering algorithms into "Infrequent Searchers" (fewer than 18 searches) and "Frequent Searchers" (18 to 110 searches). The search queries used by each group were found to follow Zipf's law and were categorized into STEM-related terms, course logistics, and others. Our study reported on students' search context, behaviors, strategies, and optimizations. Using Universal Design for Learning as a foundation, we discussed the implications for educators, designers, and developers who are interested in providing new learning pathways to support and enhance video-based learning environments.

    In Study III, we investigated students' attitudes towards learnersourced captioning for lecture videos. We deployed ClassTranscribe in a large (N=387) text retrieval and mining course, where 58 learners participated in editing the captions of 89 lecture videos, each of which was edited by two editors sequentially. In the following semester, 18 editors participated in follow-up interviews to discuss their experience of using and editing captions in the class. Our study showed how students use captions to learn, and shed light on students' attitudes, motivations, and strategies in collaborating with other learners to fix captions in a learnersourced way.
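    The two searcher groups were obtained by clustering students on their total search counts. A minimal sketch of that step with scikit-learn, using invented counts (the thesis's actual algorithm and data are not reproduced here):

        # Illustrative sketch: cluster students into two groups by total
        # number of searches, as in Study II. The data are invented; the
        # 18-search boundary below is the one reported in the study.
        import numpy as np
        from sklearn.cluster import KMeans

        search_counts = np.array([0, 1, 2, 3, 5, 8, 20, 35, 60, 110]).reshape(-1, 1)
        labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(search_counts)

        for count, label in zip(search_counts.ravel(), labels):
            group = "Frequent" if count >= 18 else "Infrequent"
            print(f"{count:4d} searches -> cluster {label} ({group} Searcher)")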