49 research outputs found

    Transformer-Based Video Front-Ends for Audio-Visual Speech Recognition for Single and Multi-Person Video

    Audio-visual automatic speech recognition (AV-ASR) extends speech recognition by introducing the video modality as an additional source of information. In this work, the information contained in the motion of the speaker's mouth is used to augment the audio features. The video modality is traditionally processed with a 3D convolutional neural network (e.g. a 3D version of VGG). Recently, image transformer networks (arXiv:2010.11929) demonstrated the ability to extract rich visual features for image classification tasks. Here, we propose to replace the 3D convolution with a video transformer to extract visual features. We train our baselines and the proposed model on a large-scale corpus of YouTube videos. The performance of our approach is evaluated on a labeled subset of YouTube videos as well as on the LRS3-TED public corpus. Our best video-only model obtains 31.4% WER on YTDEV18 and 17.0% on LRS3-TED, relative improvements of 10% and 15%, respectively, over our convolutional baseline. We achieve state-of-the-art audio-visual recognition performance on LRS3-TED after fine-tuning our model (1.6% WER). In addition, in a series of experiments on multi-person AV-ASR, we obtained an average relative reduction of 2% over our convolutional video front-end. Comment: 5 pages, 3 figures, published at Interspeech 202

    On Robustness to Missing Video for Audiovisual Speech Recognition

    It has been shown that learning audiovisual features can lead to improved speech recognition performance over audio-only features, especially for noisy speech. However, in many common applications, the visual features are partially or entirely missing, e.g. the speaker might move off screen. Multi-modal models need to be robust: missing video frames should not degrade the performance of an audiovisual model to be worse than that of a single-modality audio-only model. While there have been many attempts at building robust models, there is little consensus on how robustness should be evaluated. To address this, we introduce a framework that allows claims about robustness to be evaluated in a precise and testable way. We also conduct a systematic empirical study of the robustness of common audiovisual speech recognition architectures on a range of acoustic noise conditions and test suites. Finally, we show that an architecture-agnostic solution based on cascades can consistently achieve robustness to missing video, even in settings where existing techniques for robustness like dropout fall short.

    Audio-visual fine-tuning of audio-only ASR models

    Audio-visual automatic speech recognition (AV-ASR) models are very effective at reducing word error rates on noisy speech, but require large amounts of transcribed AV training data. Recently, audio-visual self-supervised learning (SSL) approaches have been developed to reduce this dependence on transcribed AV data, but these methods are quite complex and computationally expensive. In this work, we propose replacing these expensive AV-SSL methods with a simple and fast audio-only SSL method, and then performing AV supervised fine-tuning. We show that this approach is competitive with state-of-the-art (SOTA) AV-SSL methods on the LRS3-TED benchmark task (within 0.5% absolute WER), while being dramatically simpler and more efficient (12-30x faster to pre-train). Furthermore, we show we can extend this approach to convert a SOTA audio-only ASR model into an AV model. By doing so, we match SOTA AV-SSL results, even though no AV data was used during pre-training.

    Funções excecutivas e aprendizagem de Matemática: uma revisão de literatura

    This article presents initial results of a literature review on Executive Functions and Mathematics Learning. The objectives of the review are (i) to identify studies that address the relationship between Executive Functions and Mathematics Learning and (ii) to analyze their contributions to the learning of this discipline. To this end, we adopted a search strategy based on two thematic axes, “Executive Function and Mathematics Learning” and “Funções Executivas e Aprendizagem Matemática”, entered in the Brazilian database Periódicos CAPES. The search returned 260 studies, which were analyzed and related to one another according to the literature review method. Of these, 31 studies were selected according to the research criteria; this article analyzes 4 studies published in Portuguese and 2 published in Spanish. The results show, among other things, that Executive Functions contribute to learning performance in Mathematics.

    Technical efficiency and farm size: an analysis based on the Brazilian agriculture and livestock census

    This paper analyzes the relationship between technical efficiency and farm size in Brazil, considering different area classes and efficiency levels. A stochastic production frontier was used to estimate technical efficiency, and quantile regression was used to identify its determinants. Microdata from the 2006 Brazilian Agriculture Census were used. A positive and non-linear relationship between farm size and efficiency was found in all area classes. However, the more efficient the producers, the weaker the relationship, which indicates that such producers were less dependent on the land factor. In addition, irrigation, technical assistance and cooperative membership were the factors that contributed most to increasing efficiency, especially for the less efficient producers.

    CRON-1 The First Brazilian Private Cubesat

    Brazil has launched a few cubesats so far, through universities as well as through space research institutes and its Space Agency. Interest in this type of satellite is growing in the country because of its low, affordable cost for these institutions and the increasing number of possible uses. In a changing scenario, the advantages of cubesats for science and education are no longer questioned, as has been the case in the world in general. However, so far, all these missions have been developed with government funds. The challenge now is to transfer this technology and its applications to the private sector. The mission described here is the first in the country developed by a private company, in cooperation with the public space R&D sector for the payload. In the process, it also creates a production chain with other companies for the development of some of its subsystems and software. A few of them (HorusEye, USIPED) are new to the space field, although they have extensive experience in other microelectronics and precision mechanics applications. These subsystems are the attitude determination and control system, the EPS and the structure, all of which have advantages over similar subsystems available in the international cubesat market. Software is also developed by a small company formed by former INPE graduate students (EMSISTI). The OBC and the transceiver will still have to be imported because of their higher development costs and the limited project budget. The scientific payload of the mission is an experiment for the detection of hard X-ray and gamma-ray radiation in space, possibly from cosmic explosions such as Gamma-Ray Bursts (GRBs). This experiment was initially conceived for a larger bus, which never materialized because of its cost. The number of detectors in the payload array was significantly reduced, but it will still produce significant results for the mission PI. One exciting possibility is the detection of electromagnetic counterparts of gravitational wave signals detected by the LIGO/Virgo consortium. This was not known when the larger bus was being considered for this mission. The cubesat is a 2U, with 1U dedicated entirely to the payload. CRON-1 was officially submitted for launch in 2021 on the first flight of the VLM (Microsatellite Launch Vehicle), the small launcher under development by the Brazilian Air Force, Brazilian industries and DLR (German Aerospace Center). However, it will be ready for launch by the end of 2020, and an earlier launch alternative may be selected if it cannot be launched by VLM. The project was selected for funding by the São Paulo State Foundation for R&D (FAPESP) in a call from its Innovation Program for the Small Company (PIPE), so far for the development of the engineering model. The paper gives more information about the payload and the science motivation for the mission, as well as about the subsystems developed for CRON-1.

    Diretriz sobre Diagnóstico e Tratamento da Cardiomiopatia Hipertrófica – 2024

    Hypertrophic cardiomyopathy (HCM) is a genetically caused heart muscle disease characterized by thickening of the ventricular walls. Diagnosis requires detection by imaging methods (echocardiogram or cardiac magnetic resonance) of any segment of the left ventricular wall with a thickness > 15 mm, without any other probable cause. Genetic analysis identifies mutations in genes encoding different structures of the sarcomere responsible for the development of HCM in about 60% of cases, enabling screening of family members and genetic counseling as an important part of patient and family management. Several concepts about HCM have recently been revised, including its prevalence of 1 in 250 individuals; it is therefore not a rare disease but an underdiagnosed one. The vast majority of patients are asymptomatic. In symptomatic cases, obstruction of the left ventricular outflow tract (LVOT) is the primary disorder responsible for symptoms, and its presence should be investigated in all cases. Patients in whom a resting echocardiogram or the Valsalva maneuver does not detect a significant intraventricular gradient (> 30 mmHg) should undergo stress echocardiography to detect LVOT obstruction. Patients with limiting symptoms and severe LVOT obstruction refractory to beta-blockers and verapamil should receive septal reduction therapies or new cardiac myosin inhibitor drugs. Finally, appropriately identified patients at increased risk of sudden death may receive prophylactic implantation of an implantable cardioverter-defibrillator (ICD).