Speech Sensorimotor Learning through a Virtual Vocal Tract
Studies of speech sensorimotor learning often manipulate auditory feedback by modifying isolated acoustic parameters such as formant frequency or fundamental frequency using near real-time resynthesis of a participant's speech. An alternative approach is to engage a participant in a total remapping of the sensorimotor working space using a virtual vocal tract. To support this approach for studying speech sensorimotor learning, we have developed a system to control an articulatory synthesizer using electromagnetic articulography data. Articulator movement data from the NDI Wave System are streamed to a Maeda articulatory synthesizer. The resulting synthesized speech provides auditory feedback to the participant. This approach allows the experimenter to generate novel articulatory-acoustic mappings. Moreover, the acoustic output of the synthesizer can be perturbed using acoustic resynthesis methods. Since no robust speech-acoustic signal is required from the participant, this system will allow for the study of sensorimotor learning in any individual, including those with severe speech disorders. In the current work we present preliminary results that demonstrate that typically-functioning participants can use a virtual vocal tract to produce diphthongs within a novel articulatory-acoustic workspace. Once sufficient baseline performance is established, perturbations to auditory feedback (formant shifting) can elicit compensatory and adaptive articulatory responses.
Analysis and Development of an End-to-End Convolutional Neural Network for Sounds Classification Through Deep Learning Techniques
This work presents the analysis and ongoing development of an artificial intelligence model for audio classification. Chapter 1 gives background on the audio-related tasks that the research community has pursued in recent years, states the central hypothesis of the work, and defines general and specific objectives aimed at improving the performance of an end-to-end audio embedding generator. Chapter 2 reviews state-of-the-art methods and published work focused mainly on audio classification and deep learning, disciplines that still have great potential. Chapter 3 presents the conceptual framework on which the thesis is based, divided into two main sections: audio preprocessing and deep learning techniques. Each section is divided into several subsections that together represent the process of audio classification with deep neural networks. Chapter 4 gives an in-depth explanation of the audio embedding generator AemNet, the object of study, and of its components, which are detailed in its subsections. Initial experiments on this approach were carried out, and the results suggested that performance improves when the stages of the network architecture are modified. Chapter 5 covers the first target application, our adaptation of AemNet submitted to the DCASE 2021 challenge. The details of the challenge and the results are described in this chapter, along with the methodology followed to prepare our submission. Chapter 6 covers the second target application and is the first to address respiratory sounds. The ICBHI challenge is explained in this chapter, along with the methodology and experiments carried out to arrive at a robust classifier that distinguishes four different cough anomalies. A paper based on the proposed solution was presented at IEEE LA-CCI 2021. Chapter 7 builds on the preceding results to address a current problem, COVID-19 detection; the collection of data sources and the experiments are described in detail, and the experimental results suggest that a residual-network adaptation called AemResNet can distinguish COVID-19 patients from cough and respiratory sounds. Finally, Chapter 8 discusses the conclusions of the research and the results evaluated in each target application. ITESO, A. C.
Practical Hidden Voice Attacks against Speech and Speaker Recognition Systems
Voice Processing Systems (VPSes), now widely deployed, have been made
significantly more accurate through the application of recent advances in
machine learning. However, adversarial machine learning has similarly advanced
and has been used to demonstrate that VPSes are vulnerable to the injection of
hidden commands - audio obscured by noise that is correctly recognized by a VPS
but not by human beings. Such attacks, though, are often highly dependent on
white-box knowledge of a specific machine learning model and limited to
specific microphones and speakers, making their use across different acoustic
hardware platforms (and thus their practicality) limited. In this paper, we
break these dependencies and make hidden command attacks more practical through
model-agnostic (blackbox) attacks, which exploit knowledge of the signal
processing algorithms commonly used by VPSes to generate the data fed into
machine learning systems. Specifically, we exploit the fact that multiple
source audio samples have similar feature vectors when transformed by acoustic
feature extraction algorithms (e.g., FFTs). We develop four classes of
perturbations that create unintelligible audio and test them against 12 machine
learning models, including 7 proprietary models (e.g., Google Speech API, Bing
Speech API, IBM Speech API, and Azure Speaker API), and demonstrate successful
attacks against all targets. Moreover, we successfully use our maliciously
generated audio samples in multiple hardware configurations, demonstrating
effectiveness across both models and real systems. In so doing, we demonstrate
that domain-specific knowledge of audio signal processing represents a
practical means of generating successful hidden voice command attacks.
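The claim that different source audio samples can map to similar feature vectors can be illustrated with a small sketch. It is not taken from the paper: it only demonstrates one well-known property of the discrete Fourier transform, namely that reversing a real-valued frame in time leaves every bin magnitude unchanged. The frame contents, the frame length of 16 samples, and the helper name `dft_magnitudes` are illustrative assumptions.

```python
import cmath
import math


def dft_magnitudes(frame):
    """Return the magnitude of each DFT bin of a real-valued frame (naive O(n^2) DFT)."""
    n = len(frame)
    mags = []
    for k in range(n):
        acc = sum(frame[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        mags.append(abs(acc))
    return mags


# Hypothetical 16-sample frame: a decaying mix of two sinusoids (not real speech).
frame = [
    math.exp(-0.1 * t)
    * (math.sin(2 * math.pi * 2 * t / 16) + 0.5 * math.sin(2 * math.pi * 5 * t / 16))
    for t in range(16)
]

# Reversing the frame changes the waveform but only alters the phase of each bin.
reversed_frame = frame[::-1]

original = dft_magnitudes(frame)
flipped = dft_magnitudes(reversed_frame)
max_diff = max(abs(a - b) for a, b in zip(original, flipped))
print(f"largest magnitude difference: {max_diff:.2e}")
```

A feature extractor that keeps only bin magnitudes would therefore see these two frames as essentially identical, even though their waveforms differ.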
Vocal Experimentation in the Juvenile Songbird Requires a Basal Ganglia Circuit
Songbirds learn their songs by trial-and-error experimentation, producing highly variable vocal output as juveniles. By comparing their own sounds to the song of a tutor, young songbirds gradually converge to a stable song that can be a remarkably good copy of the tutor song. Here we show that vocal variability in the learning songbird is induced by a basal-ganglia-related circuit, the output of which projects to the motor pathway via the lateral magnocellular nucleus of the nidopallium (LMAN). We found that pharmacological inactivation of LMAN dramatically reduced acoustic and sequence variability in the songs of juvenile zebra finches, doing so in a rapid and reversible manner. In addition, recordings from LMAN neurons projecting to the motor pathway revealed highly variable spiking activity across song renditions, showing that LMAN may act as a source of variability. Lastly, pharmacological blockade of synaptic inputs from LMAN to its target premotor area also reduced song variability. Our results establish that, in the juvenile songbird, the exploratory motor behavior required to learn a complex motor sequence is dependent on a dedicated neural circuit homologous to cortico-basal ganglia circuits in mammals.
Navigation system based in motion tracking sensor for percutaneous renal access
Doctoral thesis in Biomedical Engineering. Minimally invasive kidney interventions are performed daily to diagnose and treat several renal
diseases. Percutaneous renal access (PRA) is an essential but challenging stage for most of these
procedures, since its outcome is directly linked to the physician’s ability to precisely visualize and
reach the anatomical target.
Nowadays, PRA is always guided with medical imaging assistance, most frequently using X-ray
based imaging (e.g. fluoroscopy). Radiation in the operating theater therefore represents a major
risk to the medical team, and removing it from PRA would directly reduce the dose exposure
of both patients and physicians.
To address these problems, this thesis aims to develop a new hardware/software framework
to intuitively and safely guide the surgeon during PRA planning and puncturing.
In terms of surgical planning, a set of methodologies was developed to increase the certainty of
reaching a specific target inside the kidney. The most relevant abdominal structures for PRA were
automatically clustered into different 3D volumes. For that, primitive volumes were merged as a local
optimization problem using the minimum description length principle and image statistical
properties. A multi-volume Ray Cast method was then used to highlight each segmented volume.
Results show that it is possible to detect all abdominal structures surrounding the kidney, with the
ability to correctly estimate a virtual trajectory.
Concerning the percutaneous puncturing stage, both an electromagnetic and an optical solution
were developed and tested in multiple in vitro, in vivo and ex vivo trials. The optical tracking solution
aids in establishing the desired puncture site and choosing the best virtual puncture trajectory.
However, this system requires a line of sight to the different optical markers placed at the needle base,
limiting the accuracy when tracking inside the human body. Results show that the needle tip can
deflect from its initial straight-line trajectory with an error higher than 3 mm. Moreover, a complex
registration procedure and initial setup are needed.
On the other hand, a real-time electromagnetic tracking solution was developed. To this end, a catheter
was inserted trans-urethrally towards the renal target. This catheter has a position and orientation
electromagnetic sensor on its tip that functions as a real-time target locator. A needle integrating a similar sensor is then used. From the data provided by both sensors, a virtual puncture
trajectory is computed and displayed in 3D visualization software. In vivo tests showed median renal and
ureteral puncture times of 19 and 51 seconds, respectively (ranges 14 to 45 and 45 to 67 seconds).
Such results represent a puncture time improvement of between 75% and 85% compared with
state-of-the-art methods.
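As a rough illustration of how a virtual puncture trajectory could be derived from two tracked sensors, the sketch below computes the straight line from the needle tip to the catheter-tip target and the angle between that line and the current needle axis. This is a minimal geometric sketch, not the thesis's implementation: the function name `puncture_guidance`, the coordinate values, and the assumption that both sensors report in a common frame in millimetres are all hypothetical.

```python
import math


def puncture_guidance(needle_tip, needle_dir, target):
    """Return (distance to target, angle in degrees between needle axis and target line).

    needle_tip, target: 3D positions in the same tracker frame (assumed millimetres).
    needle_dir: 3D vector along the needle axis (need not be normalised).
    """
    to_target = [t - n for t, n in zip(target, needle_tip)]
    distance = math.sqrt(sum(c * c for c in to_target))
    dir_norm = math.sqrt(sum(c * c for c in needle_dir))
    if distance == 0 or dir_norm == 0:
        raise ValueError("needle tip coincides with target or direction is zero")
    cos_angle = sum(a * b for a, b in zip(needle_dir, to_target)) / (dir_norm * distance)
    cos_angle = max(-1.0, min(1.0, cos_angle))  # guard against rounding outside [-1, 1]
    return distance, math.degrees(math.acos(cos_angle))


# Hypothetical readings: needle at the origin pointing along +z, target off-axis.
distance_mm, deviation_deg = puncture_guidance(
    needle_tip=(0.0, 0.0, 0.0),
    needle_dir=(0.0, 0.0, 1.0),
    target=(0.0, 30.0, 40.0),
)
print(f"distance: {distance_mm:.1f} mm, angular deviation: {deviation_deg:.1f} degrees")
```

A guidance display could drive the angular deviation towards zero before the needle is advanced along the remaining distance.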
3D sound and vibrotactile feedback were also developed to provide additional information about
the needle orientation. With these kinds of feedback, it was verified that the surgeon tends to
follow a virtual puncture trajectory with fewer deviations from the ideal trajectory,
and can anticipate any movement even without looking at a monitor. Best results show that 3D
sound sources were correctly identified 79.2 ± 8.1% of the time with an average angulation error of
10.4°. Vibration sources were accurately identified 91.1 ± 3.6% of the time with an average
angulation error of 8.0°.
In addition to the electromagnetic tracking (EMT) framework, three circular ultrasound transducers were built with a needle
working channel. Different fabrication setups were explored in terms of the piezoelectric
materials, transducer construction, single vs. multi array configurations, and backing and matching
material design. The A-scan signals retrieved from each transducer were filtered and processed to
automatically detect reflected echoes and to alert the surgeon when undesirable anatomical
structures lie in the puncture path. The transducers were mapped in a water tank and
tested in a study involving 45 phantoms. Results showed that the beam cross-sectional area
oscillates around the ceramics radius and that it was possible to automatically detect echo signals in
phantoms longer than 80 mm.
It is therefore expected that introducing the proposed system into the PRA procedure
will guide the surgeon along the optimal path towards the precise kidney target, increasing
the surgeon's confidence and reducing complications (e.g. organ perforation) during PRA. Moreover, the
developed framework has the potential to make PRA free of radiation for both patient and surgeon
and to broaden the use of PRA to less specialized surgeons.
The present work was only possible thanks to the support of the Portuguese Science and
Technology Foundation through the PhD grant with reference SFRH/BD/74276/2010 funded by
FCT/MEC (PIDDAC) and by Fundo Europeu de Desenvolvimento Regional (FEDER), Programa
COMPETE - Programa Operacional Factores de Competitividade (POFC) do QREN
Selective attention and the auditory vertex potential. 2: Effects of signal intensity and masking noise
A randomized sequence of tone bursts was delivered to subjects at short inter-stimulus intervals, with the tones originating from one of three spatially and frequency-specific channels. The subject's task was to count the tones in one of the three channels at a time, ignoring the other two, and to press a button after each tenth tone. In different conditions, tones were given at high and low intensities and with or without background white noise to mask the tones. The N1 component of the auditory vertex potential was found to be larger in response to attended-channel tones than to unattended tones. This selective enhancement of N1 was minimal for loud tones presented without noise and increased markedly for the lower tone intensity and in noise-added conditions.
Use of baited remote underwater video (BRUV) and motion analysis for studying the impacts of underwater noise upon free ranging fish and implications for marine energy management
© 2016 Elsevier Ltd. Free-ranging individual fish were observed using a baited remote underwater video (BRUV) system during sound playback experiments. This paper reports on test trials exploring BRUV design parameters, image analysis and practical experimental designs. Three marine species were exposed to playback noise and are provided as examples of behavioural responses to impulsive sound at 163–171 dB re 1 μPa (peak-to-peak SPL) and continuous sound of 142.7 dB re 1 μPa (RMS, SPL); they exhibited directional changes and accelerations. The methods described here indicate the efficacy of BRUV for examining the behavioural responses of free-ranging species to noise playback, rather than using confinement. Given the increasing concern about the effects of water-borne noise, for example its inclusion within the EU Marine Strategy Framework Directive, and the lack of empirical evidence in setting thresholds, this paper discusses the use of BRUV, and short-term behavioural changes, in supporting population-level marine noise management.
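For readers unfamiliar with the units, the levels quoted above are decibels relative to a reference pressure of 1 μPa, so a level L corresponds to a pressure of 1 μPa × 10^(L/20). The short sketch below applies that standard conversion to the quoted values; the function name `spl_to_pressure_pa` is an illustrative assumption and does not come from the paper. Note that the impulsive levels are peak-to-peak and the continuous level is RMS, so the resulting pressures are not directly comparable with each other.

```python
def spl_to_pressure_pa(level_db, reference_pa=1e-6):
    """Convert a sound pressure level in dB re `reference_pa` to a pressure in pascals.

    The default reference of 1e-6 Pa (1 micropascal) is the underwater convention.
    """
    return reference_pa * 10 ** (level_db / 20)


# Levels quoted in the abstract (peak-to-peak for impulsive, RMS for continuous).
impulsive_low_pa = spl_to_pressure_pa(163.0)
impulsive_high_pa = spl_to_pressure_pa(171.0)
continuous_pa = spl_to_pressure_pa(142.7)

print(f"163 dB re 1 uPa   -> {impulsive_low_pa:.1f} Pa (peak-to-peak)")
print(f"171 dB re 1 uPa   -> {impulsive_high_pa:.1f} Pa (peak-to-peak)")
print(f"142.7 dB re 1 uPa -> {continuous_pa:.1f} Pa (RMS)")
```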