5,290 research outputs found

    Speech Sensorimotor Learning through a Virtual Vocal Tract

    Studies of speech sensorimotor learning often manipulate auditory feedback by modifying isolated acoustic parameters such as formant frequency or fundamental frequency using near real-time resynthesis of a participant's speech. An alternative approach is to engage a participant in a total remapping of the sensorimotor working space using a virtual vocal tract. To support this approach for studying speech sensorimotor learning we have developed a system to control an articulatory synthesizer using electromagnetic articulography data. Articulator movement data from the NDI Wave System are streamed to a Maeda articulatory synthesizer. The resulting synthesized speech provides auditory feedback to the participant. This approach allows the experimenter to generate novel articulatory-acoustic mappings. Moreover, the acoustic output of the synthesizer can be perturbed using acoustic resynthesis methods. Since no robust speech-acoustic signal is required from the participant, this system will allow for the study of sensorimotor learning in any individuals, even those with severe speech disorders. In the current work we present preliminary results that demonstrate that typically-functioning participants can use a virtual vocal tract to produce diphthongs within a novel articulatory-acoustic workspace. Once sufficient baseline performance is established, perturbations to auditory feedback (formant shifting) can elicit compensatory and adaptive articulatory responses.

    Analysis and Development of an End-to-End Convolutional Neural Network for Sounds Classification Through Deep Learning Techniques

    This work covers the analysis and continued development of an artificial intelligence model for audio classification. Chapter 1 gives background on the audio-related tasks the research community has pursued in recent years, states the central hypothesis of this work, and defines general and specific objectives aimed at improving the performance of an end-to-end audio embedding generator. Chapter 2 presents state-of-the-art methods and published work focused mainly on audio classification and deep learning, disciplines that still have great potential. Chapter 3 presents the conceptual framework on which this thesis is based, divided into two main sections: audio preprocessing and deep learning techniques. Each of these sections is divided into several subsections that represent the audio classification process through deep neural networks. Chapter 4 gives an in-depth explanation of the audio embedding generator called AemNet, which serves as the object of study, and details its components in its subsections. Initial experiments on this approach were carried out, and the experimental results suggested that performance improves when the stages of the neural network architecture are modified. Chapter 5 covers the first target application of our adaptation of AemNet, which was submitted to the DCASE 2021 challenge. The details of the challenge and the results are described in the sections of this chapter, along with the methodology followed to present our proposal. Chapter 6 covers the second target application and is the first to target respiratory sounds. The ICBHI challenge is explained in the sections of this chapter, along with the methodology and experiments carried out to arrive at a robust classifier that distinguishes four different cough anomalies. A paper was written from the proposed solution and presented at IEEE LA-CCI 2021. Chapter 7 builds on the preceding results to address a current problem, COVID-19 detection; the collection of data sources and the experimentation are described in depth, and the experimental results suggest that a residual network adaptation called AemResNet can distinguish COVID-19 patients from cough and respiratory sounds. Finally, the conclusions of this research and the results evaluated in each target application are discussed in chapter 8. ITESO, A. C.

    Practical Hidden Voice Attacks against Speech and Speaker Recognition Systems

    Voice Processing Systems (VPSes), now widely deployed, have been made significantly more accurate through the application of recent advances in machine learning. However, adversarial machine learning has similarly advanced and has been used to demonstrate that VPSes are vulnerable to the injection of hidden commands - audio obscured by noise that is correctly recognized by a VPS but not by human beings. Such attacks, though, are often highly dependent on white-box knowledge of a specific machine learning model and limited to specific microphones and speakers, making their use across different acoustic hardware platforms (and thus their practicality) limited. In this paper, we break these dependencies and make hidden command attacks more practical through model-agnostic (black-box) attacks, which exploit knowledge of the signal processing algorithms commonly used by VPSes to generate the data fed into machine learning systems. Specifically, we exploit the fact that multiple source audio samples have similar feature vectors when transformed by acoustic feature extraction algorithms (e.g., FFTs). We develop four classes of perturbations that create unintelligible audio and test them against 12 machine learning models, including 7 proprietary models (e.g., Google Speech API, Bing Speech API, IBM Speech API, Azure Speaker API, etc.), and demonstrate successful attacks against all targets. Moreover, we successfully use our maliciously generated audio samples in multiple hardware configurations, demonstrating effectiveness across both models and real systems. In so doing, we demonstrate that domain-specific knowledge of audio signal processing represents a practical means of generating successful hidden voice command attacks.
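The abstract's claim that different audio samples can share similar feature vectors can be illustrated with a small sketch. It assumes, purely for illustration, that the feature is the magnitude of a discrete Fourier transform over one frame and that the perturbation is a time reversal of that frame; the abstract does not name its four perturbation classes, so this is not presented as the authors' method. The names `dft_magnitudes`, `frame`, and `reversed_frame` are hypothetical.

```python
import cmath
import random


def dft_magnitudes(samples):
    """Return the magnitude of each DFT bin for a list of real samples."""
    n = len(samples)
    return [
        abs(sum(x * cmath.exp(-2j * cmath.pi * k * t / n)
                for t, x in enumerate(samples)))
        for k in range(n)
    ]


# A synthetic 64-sample frame standing in for a short window of audio.
random.seed(0)
frame = [random.uniform(-1.0, 1.0) for _ in range(64)]

# Hypothetical perturbation: play the frame backwards.
reversed_frame = frame[::-1]

# Largest difference between the two magnitude spectra (the "features").
max_feature_gap = max(
    abs(a - b)
    for a, b in zip(dft_magnitudes(frame), dft_magnitudes(reversed_frame))
)

# Largest sample-by-sample difference between the two waveforms.
max_waveform_gap = max(abs(a - b) for a, b in zip(frame, reversed_frame))

print(f"feature gap:  {max_feature_gap:.2e}")
print(f"waveform gap: {max_waveform_gap:.2f}")
```

For a real-valued frame, reversal only changes the phase of each bin, so the magnitude spectra agree to floating-point precision even though the waveforms differ sample by sample.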

    Vocal Experimentation in the Juvenile Songbird Requires a Basal Ganglia Circuit

    Songbirds learn their songs by trial-and-error experimentation, producing highly variable vocal output as juveniles. By comparing their own sounds to the song of a tutor, young songbirds gradually converge to a stable song that can be a remarkably good copy of the tutor song. Here we show that vocal variability in the learning songbird is induced by a basal-ganglia-related circuit, the output of which projects to the motor pathway via the lateral magnocellular nucleus of the nidopallium (LMAN). We found that pharmacological inactivation of LMAN dramatically reduced acoustic and sequence variability in the songs of juvenile zebra finches, doing so in a rapid and reversible manner. In addition, recordings from LMAN neurons projecting to the motor pathway revealed highly variable spiking activity across song renditions, showing that LMAN may act as a source of variability. Lastly, pharmacological blockade of synaptic inputs from LMAN to its target premotor area also reduced song variability. Our results establish that, in the juvenile songbird, the exploratory motor behavior required to learn a complex motor sequence is dependent on a dedicated neural circuit homologous to cortico-basal ganglia circuits in mammals.

    Navigation system based in motion tracking sensor for percutaneous renal access

    Doctoral thesis in Biomedical Engineering. Minimally invasive kidney interventions are performed daily to diagnose and treat several renal diseases. Percutaneous renal access (PRA) is an essential but challenging stage of most of these procedures, since its outcome is directly linked to the physician's ability to precisely visualize and reach the anatomical target. Nowadays, PRA is always guided with medical imaging assistance, most frequently X-ray based imaging (e.g. fluoroscopy). Radiation in the surgical theater thus represents a major risk to the medical team, and excluding it from PRA would directly reduce the dose exposure of both patients and physicians. To solve these problems, this thesis aims to develop a new hardware/software framework to intuitively and safely guide the surgeon during PRA planning and puncturing. In terms of surgical planning, a set of methodologies was developed to increase the certainty of reaching a specific target inside the kidney. The most relevant abdominal structures for PRA were automatically clustered into different 3D volumes. For that, primitive volumes were merged as a local optimization problem using the minimum description length principle and image statistical properties. A multi-volume Ray Cast method was then used to highlight each segmented volume. Results show that it is possible to detect all abdominal structures surrounding the kidney, with the ability to correctly estimate a virtual trajectory. Concerning the percutaneous puncturing stage, both an electromagnetic and an optical solution were developed and tested in multiple in vitro, in vivo and ex vivo trials. The optical tracking solution aids in establishing the desired puncture site and choosing the best virtual puncture trajectory. However, this system requires a line of sight to different optical markers placed at the needle base, limiting the accuracy when tracking inside the human body.
Results show that the needle tip can deflect from its initial straight-line trajectory with an error higher than 3 mm. Moreover, a complex registration procedure and initial setup are needed. On the other hand, a real-time electromagnetic tracking (EMT) solution was developed. For this, a catheter was inserted trans-urethrally towards the renal target. This catheter has a position and orientation electromagnetic sensor on its tip that functions as a real-time target locator. Then, a needle integrating a similar sensor is used. From the data provided by both sensors, a virtual puncture trajectory is computed and displayed in 3D visualization software. In vivo tests showed median renal and ureteral puncture times of 19 and 51 seconds, respectively (range 14 to 45 and 45 to 67 seconds). These results represent a puncture time improvement of between 75% and 85% compared to state-of-the-art methods. 3D sound and vibrotactile feedback were also developed to provide additional information about the needle orientation. With these kinds of feedback, it was verified that the surgeon tends to follow a virtual puncture trajectory with fewer deviations from the ideal trajectory, and is able to anticipate any movement even without looking at a monitor. The best results show that 3D sound sources were correctly identified 79.2 ± 8.1% of the time with an average angulation error of 10.4°. Vibration sources were accurately identified 91.1 ± 3.6% of the time with an average angulation error of 8.0°. In addition to the EMT framework, three circular ultrasound transducers were built with a needle working channel. Different fabrication setups were explored in terms of piezoelectric materials, transducer construction, single vs. multi array configurations, and backing and matching material design.
The A-scan signals retrieved from each transducer were filtered and processed to automatically detect reflected echoes and to alert the surgeon when undesirable anatomical structures lie along the puncture path. The transducers were mapped in a water tank and tested in a study involving 45 phantoms. Results showed that the beam cross-sectional area oscillates around the ceramics radius and that it was possible to automatically detect echo signals in phantoms longer than 80 mm. It is therefore expected that introducing the proposed system into the PRA procedure will guide the surgeon along the optimal path towards the precise kidney target, increasing the surgeon's confidence and reducing complications (e.g. organ perforation) during PRA. Moreover, the developed framework has the potential to make PRA free of radiation for both patient and surgeon and to broaden the use of PRA to less specialized surgeons. The present work was only possible thanks to the support of the Portuguese Science and Technology Foundation through the PhD grant with reference SFRH/BD/74276/2010, funded by FCT/MEC (PIDDAC) and by Fundo Europeu de Desenvolvimento Regional (FEDER), Programa COMPETE - Programa Operacional Factores de Competitividade (POFC) do QREN.
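The electromagnetic tracking idea in the abstract above, where a target sensor and a needle sensor together define a virtual puncture trajectory, can be sketched in a few lines. This is a minimal geometric illustration and not the thesis implementation: it assumes each sensor reports a 3D position in millimetres in a shared coordinate frame and that the needle sensor also reports a direction vector. The function name `puncture_guidance` and the sample coordinates are hypothetical.

```python
import math


def puncture_guidance(needle_tip, needle_direction, target):
    """Return (distance to target, angular deviation in degrees).

    needle_tip, target: (x, y, z) positions in the same frame (assumed mm).
    needle_direction: vector along the needle axis, any non-zero length.
    """
    # Straight line from the needle tip to the target: the virtual trajectory.
    to_target = [t - n for t, n in zip(target, needle_tip)]
    distance = math.sqrt(sum(c * c for c in to_target))
    direction_norm = math.sqrt(sum(c * c for c in needle_direction))
    if distance == 0.0 or direction_norm == 0.0:
        raise ValueError("tip coincides with target or direction is zero")

    # Angle between the needle axis and the virtual trajectory.
    cos_angle = sum(a * b for a, b in zip(to_target, needle_direction))
    cos_angle /= distance * direction_norm
    cos_angle = max(-1.0, min(1.0, cos_angle))  # guard against rounding
    return distance, math.degrees(math.acos(cos_angle))


# Hypothetical readings: needle at the origin pointing along +z,
# target 30 mm to the side and 40 mm deep.
distance_mm, deviation_deg = puncture_guidance(
    (0.0, 0.0, 0.0), (0.0, 0.0, 1.0), (0.0, 30.0, 40.0)
)
print(f"distance: {distance_mm:.1f} mm, deviation: {deviation_deg:.1f} deg")
```

A deviation angle of this kind is one plausible quantity to map onto visual, sound, or vibrotactile cues, although the abstract does not state how its feedback signals are derived.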

    Selective attention and the auditory vertex potential. 2: Effects of signal intensity and masking noise

    A randomized sequence of tone bursts was delivered to subjects at short inter-stimulus intervals with the tones originating from one of three spatially and frequency specific channels. The subject's task was to count the tones in one of the three channels at a time, ignoring the other two, and press a button after each tenth tone. In different conditions, tones were given at high and low intensities and with or without a background white noise to mask the tones. The N1 component of the auditory vertex potential was found to be larger in response to attended channel tones in relation to unattended tones. This selective enhancement of N1 was minimal for loud tones presented without noise and increased markedly for the lower tone intensity and in noise-added conditions.

    Use of baited remote underwater video (BRUV) and motion analysis for studying the impacts of underwater noise upon free ranging fish and implications for marine energy management

    © 2016 Elsevier Ltd. Free-ranging individual fish were observed using a baited remote underwater video (BRUV) system during sound playback experiments. This paper reports on test trials exploring BRUV design parameters, image analysis and practical experimental designs. Three marine species were exposed to playback noise, provided as examples of behavioural responses to impulsive sound at 163–171 dB re 1 μPa (peak-to-peak SPL) and continuous sound of 142.7 dB re 1 μPa (RMS, SPL), exhibiting directional changes and accelerations. The methods described here indicate the efficacy of BRUV to examine behaviour of free-ranging species to noise playback, rather than using confinement. Given the increasing concern about the effects of water-borne noise, for example its inclusion within the EU Marine Strategy Framework Directive, and the lack of empirical evidence in setting thresholds, this paper discusses the use of BRUV, and short term behavioural changes, in supporting population level marine noise management.