611 research outputs found

    Deep Question Answering: A New Teacher For DistilBERT

    This thesis work tested the benefits obtainable from modifying only the question answering layer of the BERT model and of one of its derivatives, DistilBERT. The experiments were conducted on two different datasets, SQuAD 2 and OLP (an experimental dataset that required lengthy pre-processing to make it compatible with the SQuAD format). The pre-processing gave the dataset the same structure as SQuAD and saved time, because the same parsing script used for SQuAD could be reused. The idea of using a 4-layer structure with a skip step for the QA layer, in turn, came from research work and from testing several approaches, which led to the conclusion that the one used in this thesis gave better results than the others.

    Energy compensation and received echo level dynamics in constant-frequency bats during active target approaches

    This work was supported by the Semper Ardens Carlsberg grant to P.T.M., by a National Science Foundation grant [1658620] to R.M. and by a National Natural Science Foundation of China grant [11574183] to R.M. Bats have been reported to adjust the energy of their outgoing vocalizations to target range (R) in a logarithmic fashion close to 20log10R, which has been interpreted as providing one-way compensation for increasing echo levels during target approaches. However, it remains unknown how species using high-frequency calls, which are strongly affected by absorption, adjust their vocal outputs during approaches to point targets. We hypothesized that such species should compensate less than the 20log10R model predicts at longer distances and more at shorter distances as a consequence of the significant influence of absorption at longer ranges. Using a microphone array and an acoustic recording tag, we show that the output adjustments of two Hipposideros pratti and one Hipposideros armiger do not decrease logarithmically during approaches to different-sized targets. Consequently, received echo levels increase dramatically early in the approach phase with near-constant output levels, but level off late in the approach phase as a result of substantial output reductions. To improve echo-to-noise ratio, we suggest that bats using higher frequency vocalizations compensate less at longer ranges, where they are strongly affected by absorption. Close to the target, they decrease their output levels dramatically to mitigate reception of very high echo levels. This strategy maintains received echo levels between 6 and 40 dB re. 20 µPa² s across different target sizes. The bats partially compensated for target size, but not in a one-to-one dB fashion, showing that these bats do not seek to stabilize perceived echo levels, but may instead use them to gauge target size. Publisher PDF. Peer reviewed.
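    As a rough illustration of the 20log10R model discussed in this abstract, the Python sketch below applies the standard point-target sonar equation: two-way spherical spreading (40log10R) plus two-way absorption. The function names, the 1 m reference range, and the absorption coefficient are assumptions made for illustration and are not values from the study.

```python
import math


def compensated_source_level(ref_level_db, range_m, ref_range_m=1.0):
    """Source level under the 20*log10(R) one-way compensation model.

    ref_level_db is the (hypothetical) output level at ref_range_m.
    """
    return ref_level_db + 20.0 * math.log10(range_m / ref_range_m)


def echo_level(source_level_db, range_m, alpha_db_per_m=0.0,
               target_strength_db=0.0):
    """Received echo level from a point target.

    Two-way spherical spreading costs 40*log10(R); two-way absorption
    costs 2*alpha*R, with alpha in dB per metre (assumed value).
    """
    spreading = 40.0 * math.log10(range_m)
    absorption = 2.0 * alpha_db_per_m * range_m
    return source_level_db - spreading - absorption + target_strength_db


def echo_rise(far_m, near_m, alpha_db_per_m, ref_level_db=100.0):
    """Increase in echo level (dB) when approaching from far_m to near_m
    while following the 20*log10(R) compensation model."""
    far = echo_level(compensated_source_level(ref_level_db, far_m),
                     far_m, alpha_db_per_m)
    near = echo_level(compensated_source_level(ref_level_db, near_m),
                      near_m, alpha_db_per_m)
    return near - far
```

    With no absorption, one-way compensation leaves a 20 dB rise in echo level per tenfold decrease in range. With absorption, the rise is larger still, because the far-range echo is attenuated further.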

    Modeling auditory evoked potentials to complex stimuli

    Efficient audio signal processing for embedded systems

    We investigated two design strategies that would allow us to efficiently process audio signals on embedded systems such as mobile phones and portable electronics. In the first strategy, we exploit properties of the human auditory system to process audio signals. We designed a sound enhancement algorithm to make piezoelectric loudspeakers sound "richer" and "fuller," using a combination of bass extension and dynamic range compression. We also developed an audio energy reduction algorithm for loudspeaker power management by suppressing signal energy below the masking threshold. In the second strategy, we use low-power analog circuits to process the signal before digitizing it. We designed an analog front-end for sound detection and implemented it on a field programmable analog array (FPAA). The sound classifier front-end can be used in a wide range of applications because programmable floating-gate transistors are employed to store classifier weights. Moreover, we incorporated a feature selection algorithm to simplify the analog front-end. The machine learning algorithm AdaBoost is used to select the most relevant features for a particular sound detection application. We also designed the circuits to implement the AdaBoost-based analog classifier. PhD. Committee Chair: Anderson, David; Committee Member: Hasler, Jennifer; Committee Member: Hunt, William; Committee Member: Lanterman, Aaron; Committee Member: Minch, Bradle

    Temporal integration in cochlear implants and the effect of high pulse rates

    Although cochlear implants (CIs) have proven to be an invaluable help for many people afflicted with severe hearing loss, many hurdles remain before hearing can be fully restored. A better understanding of how individual stimuli in a pulse train interact temporally to form a conjoined percept, and of what effects the stimulation rate has on the percept of loudness, will be beneficial for further improvements in the development of new coding strategies and thus in the quality of life of CI wearers. Two experiments presented here deal with the topic of temporal integration with CIs, and raise the question of the effects of the high stimulation rates made possible by the broad spread of stimulation. To this end, curves of equal loudness were measured as a function of pulse train length for different stimulation characteristics. In the first exploratory experiment, threshold and maximum acceptable loudness (MAL) were measured, and the existence and behaviour of the critical duration of integration in cochlear implants are discussed. In the second experiment, the effect of level was further investigated by including MAL measurements at shorter durations, as well as a line of equal loudness at a comfortable level. It is found that the amount of temporal integration (the slope of integration as a function of duration) is greatly decreased in electrical hearing compared to acoustic hearing. The higher stimulation rates seem to have a compensating effect on this, increasing the slope with increasing rate. The highest rates investigated here lead to slopes that are even comparable to those found in normal-hearing and hearing-impaired listeners. The rate also has an increasing effect on the dynamic range, which is otherwise taken to be a correlate of good performance. The values presented here point towards larger effects of rate on dynamic range than what has been found so far in the literature for more moderate ranges.
While rate effects on threshold, dynamic range and integration slope seem to act uniformly across the different test subjects, the critical duration of integration varies strongly and inconsistently, possibly reflecting more central, individual-specific effects. Additionally, measurements of the voltage spread in human CI wearers are presented, which are used to validate a 3D computational model of the human cochlea developed in our group. The theoretical model falls squarely inside the distribution of measurements. A single, implant-dependent voltage offset seems to adequately explain most of the variability.

    Advanced deep neural networks for speech separation and enhancement

    Ph.D. thesis. Monaural speech separation and enhancement aim to remove noise interference from a noisy speech mixture recorded by a single microphone, which causes a lack of spatial information. Deep neural networks (DNNs) dominate speech separation and enhancement. However, there are still challenges in DNN-based methods, including choosing proper training targets and network structures, refining generalization ability and model capacity for unseen speakers and noises, and mitigating the reverberations in room environments. This thesis focuses on improving separation and enhancement performance in real-world environments. The first contribution of this thesis is to address monaural speech separation and enhancement within reverberant room environments by designing new training targets and advanced network structures. The second contribution is to improve the enhancement performance by proposing a multi-scale feature recalibration convolutional bidirectional gated recurrent unit (GRU) network (MCGN). The third contribution is to improve the model capacity of the network while retaining the robustness of the enhancement performance. A convolutional fusion network (CFN) is proposed, which exploits the group convolutional fusion unit (GCFU). The proposed speech enhancement methods are evaluated on various challenging datasets. They are compared against state-of-the-art techniques using standard performance measures to confirm that this thesis contributes novel solutions.

    Adversarial Attacks and Defenses in Machine Learning-Powered Networks: A Contemporary Survey

    Adversarial attacks and defenses in machine learning and deep neural networks have been gaining significant attention due to the rapidly growing applications of deep learning in the Internet and relevant scenarios. This survey provides a comprehensive overview of the recent advancements in the field of adversarial attack and defense techniques, with a focus on deep neural network-based classification models. Specifically, we conduct a comprehensive classification of recent adversarial attack methods and state-of-the-art adversarial defense techniques based on attack principles, and present them in visually appealing tables and tree diagrams. This is based on a rigorous evaluation of the existing works, including an analysis of their strengths and limitations. We also categorize the methods into counter-attack detection and robustness enhancement, with a specific focus on regularization-based methods for enhancing robustness. New avenues of attack are also explored, including search-based, decision-based, drop-based, and physical-world attacks, and a hierarchical classification of the latest defense methods is provided, highlighting the challenges of balancing training costs with performance, maintaining clean accuracy, overcoming the effect of gradient masking, and ensuring method transferability. Finally, the lessons learned and open challenges are summarized, with future research opportunities recommended. Comment: 46 pages, 21 figures

    Standardized spectral and radiometric calibration of consumer cameras

    Consumer cameras, particularly onboard smartphones and UAVs, are now commonly used as scientific instruments. However, their data processing pipelines are not optimized for quantitative radiometry and their calibration is more complex than that of scientific cameras. The lack of a standardized calibration methodology limits the interoperability between devices and, in the ever-changing market, ultimately the lifespan of projects using them. We present a standardized methodology and database (SPECTACLE) for spectral and radiometric calibrations of consumer cameras, including linearity, bias variations, read-out noise, dark current, ISO speed and gain, flat-field, and RGB spectral response. This includes gold-standard ground-truth methods and do-it-yourself methods suitable for non-experts. Applying this methodology to seven popular cameras, we found high linearity in RAW but not JPEG data, inter-pixel gain variations >400% correlated with large-scale bias and read-out noise patterns, non-trivial ISO speed normalization functions, flat-field correction factors varying by up to 2.79 over the field of view, and both similarities and differences in spectral response. Moreover, these results differed wildly between camera models, highlighting the importance of standardization and a centralized database.

    Speech enhancement algorithms for audiological applications

    Text in English, with abstracts in English and Spanish. Extraordinary Doctorate Award of the UAH for the 2013-2014 academic year. Speech enhancement is a problem that, although it has been studied for many years, remains open. The growing popularity of applications such as hands-free systems and automatic speech recognition, together with the ever-increasing demands of people with hearing loss, has given a decisive boost to this research area. This doctoral thesis focuses on speech enhancement for audiological applications. Most of the research carried out in this thesis is aimed at improving speech intelligibility in digital hearing aids, taking into account the limitations of this type of device. Combining source separation and spatial filtering techniques with machine learning and evolutionary computation has given rise to the novel and interesting algorithms included in this thesis. The thesis is divided into two main blocks. The first block contains a preliminary study of the problem and an exhaustive review of the state of the art in speech enhancement algorithms, which serves to define the objectives of the thesis. The second block describes the research carried out to meet those objectives, together with the experiments and the results obtained. First, the speech enhancement problem is formally described in the time-frequency domain, and the main requirements and constraints of digital hearing aids are defined. After the problem is described, a broad review of the state of the art is presented. The review covers single-channel and multi-channel speech enhancement algorithms, considering both noise reduction and source separation techniques. In addition, the application of these algorithms in digital hearing aids is evaluated.
The first problem addressed in the thesis is the separation of sound sources in underdetermined mixtures in the time-frequency domain, without any computational constraint. The performance of the well-known DUET algorithm, which can separate speech sources from only two mixtures, has been evaluated in several scenarios, including linear and non-reverberant binaural mixtures, reverberant mixtures, and mixtures of speech with other types of sources such as noise and music. The study reveals the DUET algorithm's lack of robustness: its performance degrades severely in reverberant mixtures, binaural mixtures, and mixtures of speech with music and noise. To improve performance in these cases, a novel source separation algorithm is presented that combines mean shift clustering with the core of the DUET algorithm. The clustering stage of DUET, which is based on a weighted histogram, is replaced by a modification of the mean shift algorithm that introduces a weighted Gaussian kernel. Analysis of the results shows a clear improvement of the proposed algorithm over both the original DUET algorithm and a modification that uses k-means. In addition, the proposed algorithm has been extended to use a microphone array of any size and geometry. Next, the problem of enumerating speech sources, which is related to the source separation problem, is addressed. A novel algorithm is proposed, based on an information-theoretic criterion and on the estimation of the relative delays caused by the sources between a pair of microphones. The algorithm obtained excellent results and is robust when enumerating non-reverberant mixtures of up to 5 speech sources. Its capability for source enumeration in reverberant mixtures is also demonstrated. The rest of the thesis focuses on digital hearing aids.
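To make the clustering idea in the paragraph above concrete, here is a minimal Python sketch of mean shift with a weighted Gaussian kernel. It is simplified to one dimension (for example, relative delay estimates, with each point weighted by its time-frequency energy), whereas the thesis presumably clusters in a richer feature space. The function name, the bandwidth, and the data layout are assumptions for illustration and are not the author's implementation.

```python
import math


def weighted_mean_shift(points, weights, start, bandwidth,
                        max_iter=100, tol=1e-6):
    """Climb from `start` to a nearby mode of a weighted Gaussian
    kernel density estimate over 1-D `points`.

    Each point contributes with its own weight, so high-energy
    estimates pull the mode more strongly than weak ones.
    """
    x = start
    for _ in range(max_iter):
        num = 0.0
        den = 0.0
        for p, w in zip(points, weights):
            k = w * math.exp(-0.5 * ((p - x) / bandwidth) ** 2)
            num += k * p
            den += k
        if den == 0.0:
            # No point is close enough to influence the estimate.
            return x
        new_x = num / den  # weighted mean under the kernel
        if abs(new_x - x) < tol:
            return new_x
        x = new_x
    return x
```

Running the procedure from several starting values and merging the modes that coincide gives one cluster centre per source, without fixing the number of clusters in advance as k-means requires.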
The first problem treated is the improvement of speech intelligibility in monaural hearing aids. First, a study is made of the computational resources available in state-of-the-art digital hearing aids. The results of this study were used to bound the computational cost of the hearing aid speech enhancement algorithms proposed in this thesis. To solve this first problem, a low-computational-cost single-channel speech enhancement algorithm is proposed. The goal is to estimate a continuous time-frequency mask that yields the highest output PESQ score. The algorithm combines a generalized version of the least squares estimator with a tailored feature selection algorithm, using a novel set of features. The algorithm obtained excellent results even at low signal-to-noise ratios. The next problem addressed is the design of speech enhancement algorithms for wirelessly connected binaural hearing aids. These systems have an additional problem: the wireless link increases power consumption. The goal in this thesis is to design low-computational-cost speech enhancement algorithms that increase energy efficiency in wirelessly connected binaural hearing aids. Two solutions are proposed. The first is an algorithm of extremely low computational cost that maximizes the WDO parameter; it estimates a binary mask by means of a quadratic discriminant that uses the ILD and ITD values of each time-frequency point to classify it as speech or noise. The second proposed algorithm, also low-cost, additionally uses information from neighboring time-frequency points to estimate the IBM by means of a generalized version of LS-LDA.
In addition, a weighted MSE is proposed in order to estimate the IBM and maximize the WDO parameter at the same time. For both algorithms an energy-efficient transmission scheme is proposed, based on quantizing the amplitude and phase values of each frequency band with a different number of bits. The allocation of bits across frequencies is optimized by means of evolutionary computation techniques. The last work included in this thesis deals with the design of spatial filters for hearing aids personalized to a specific person. The filter coefficients can be adapted to a person provided that their HRTF is known. Unfortunately, this information is not available when a patient visits the audiologist, which causes gain losses and distortions. With this problem in mind, three methods are proposed for designing spatial filters that maximize the gain and minimize the mean distortions over a set of design HRTFs.
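
The abstract above mentions the IBM and the WDO parameter without defining them. The Python sketch below assumes IBM means the ideal binary mask, and uses the W-disjoint orthogonality measure as it is commonly written in the DUET literature: speech energy kept by the mask, minus noise energy kept, divided by total speech energy. The thesis may define both differently. The function names and the flat per-point power lists are hypothetical.

```python
def ideal_binary_mask(speech_power, noise_power):
    """Return 1 for each time-frequency point where speech power
    exceeds noise power, else 0."""
    return [1 if s > n else 0 for s, n in zip(speech_power, noise_power)]


def wdo(mask, speech_power, noise_power):
    """W-disjoint orthogonality of a binary mask (assumed definition):
    (speech energy kept - noise energy kept) / total speech energy."""
    kept_speech = sum(m * s for m, s in zip(mask, speech_power))
    kept_noise = sum(m * n for m, n in zip(mask, noise_power))
    return (kept_speech - kept_noise) / sum(speech_power)


# Hypothetical per-point powers for four time-frequency points.
speech = [4.0, 0.5, 3.0, 0.1]
noise = [1.0, 2.0, 0.5, 0.4]
ibm = ideal_binary_mask(speech, noise)
```

Under this definition each selected point adds its speech power minus its noise power to the numerator, so the mask that keeps exactly the speech-dominated points gives the largest WDO. That is one way to read the abstract's aim of estimating the IBM and maximizing WDO at the same time.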