77 research outputs found

    IMPROVING THE AUTOMATIC RECOGNITION OF DISTORTED SPEECH

    Automatic speech recognition has a wide variety of uses in this technological age, yet speech distortions present many difficulties for accurate recognition. The research presented provides solutions that counter the detrimental effects that some distortions have on the accuracy of automatic speech recognition. Two types of speech distortions are focused on independently. They are distortions due to speech coding and distortions due to additive noise. Compensations for both types of distortion resulted in decreased recognition error. Distortions due to the speech coding process are countered through recognition of the speech directly from the bitstream, thus eliminating the need for reconstruction of the speech signal and eliminating the distortion caused by it. There is a relative difference of 6.7% between the recognition error rate of uncoded speech and that of speech reconstructed from MELP encoded parameters. The relative difference between the recognition error rate for uncoded speech and that of encoded speech recognized directly from the MELP bitstream is 3.5%. This 3.2 percentage point difference is equivalent to the accurate recognition of an additional 334 words from the 12,863 words spoken. Distortions due to noise are offset through appropriate modification of an existing noise reduction technique called minimum mean-square error log spectral amplitude enhancement. A relative difference of 28% exists between the recognition error rate of clean speech and that of speech with additive noise. Applying a speech enhancement front-end reduced this difference to 22.2%. This 5.8 percentage point difference is equivalent to the accurate recognition of an additional 540 words from the 12,863 words spoken.

    Communication-aware motion planning in mobile networks

    Over the past few years, considerable progress has been made in the area of networked robotic systems and mobile sensor networks. The vision of a mobile sensor network cooperatively learning and adapting in harsh unknown environments to achieve a common goal is closer than ever. In addition to sensing, communication plays a key role in the overall performance of a mobile network, as nodes need to cooperate to achieve their tasks and thus have to communicate vital information in environments that are typically challenging for communication. Therefore, in order to realize the full potential of such networks, an integrative approach to sensing (information gathering), communication (information exchange), and motion planning is needed, such that each mobile sensor considers the impact of its motion decisions on both sensing and communication, and optimizes its trajectory accordingly. This is the main motivation for this dissertation. This dissertation focuses on communication-aware motion planning of mobile networks in the presence of realistic communication channels that experience path loss, shadowing and multipath fading. This is a challenging multi-disciplinary task. It requires an assessment of wireless link qualities at places that are not yet visited by the mobile sensors as well as a proper co-optimization of sensing, communication and navigation objectives, such that each mobile sensor chooses a trajectory that provides the best balance between its sensing and communication, while satisfying the constraints on its connectivity, motion and energy consumption. While some trajectories allow the mobile sensors to sense efficiently, they may not result in good communication. On the other hand, trajectories that optimize communication may result in poor sensing. The main contribution of this dissertation is then to address these challenges by proposing a new paradigm for communication-aware motion planning in mobile networks. 
We consider three examples from the networked robotics and mobile sensor network literature: target tracking, surveillance and dynamic coverage. For these examples, we show how probabilistic assessment of the channel can be used to integrate sensing, communication and navigation objectives when planning the motion in order to guarantee satisfactory performance of the network in realistic communication settings. Specifically, we characterize the performance of the proposed framework mathematically and unveil new and considerably more efficient system behaviors. Finally, since multipath fading cannot be assessed, proper strategies are needed to increase the robustness of the network to multipath fading and other modeling/channel assessment errors. We further devise such robustness strategies in the context of our communication-aware surveillance scenario. Overall, our results show the superior performance of the proposed motion planning approaches in realistic fading environments and provide an in-depth understanding of the underlying design trade-off space.

    The functional anatomy of white matter pathways for visual configuration learning

    The role of the medial temporal lobes (MTL) in visuo-spatial learning has been extensively studied and documented in the neuroscientific literature. Numerous animal and human studies have demonstrated that the parahippocampal place area (PPA), which sits at the confluence of the parahippocampal and lingual gyri, is particularly important for learning the spatial configuration of objects in visually presented scenes. In current visuo-spatial processing models, the PPA sits downstream from the parietal lobes which are involved in multiple facets of spatial processing. Yet, direct input to the PPA from early visual cortex (EVC) is rarely discussed and poorly understood. This thesis adopted a multimodal neuroimaging analysis approach to study the functional anatomy of these connections. First, the pattern of structural connectivity between EVC and the MTL was explored by means of surface-based ‘connectomes’ constructed from diffusion MRI tractography in a cohort of 200 healthy young adults from the Human Connectome Project. Through this analysis, the PPA emerged as a primary recipient of EVC connections within the MTL. Second, a data-driven clustering analysis of the PPA’s connectivity to an extended cortical region (including EVC, retrosplenial cortex, and other areas) revealed multiple clusters with different connectivity profiles within the PPA. The two main clusters were located in the posterior and anterior portions of the PPA, with the posterior cluster preferentially connected to EVC. Motivated by this result, virtual tractography dissections were used to delineate the medial occipital longitudinal tract (MOLT), the white matter bundle connecting the PPA with EVC. The properties of this bundle and its relation to visual configuration learning were verified in a different, cross-sectional adult cohort of 90 subjects. 
Finally, the role of the MOLT in the visuo-spatial learning domain was further confirmed in the case of a stroke patient who, after bilateral occipital injury, exhibited deficits confined to this domain. The results presented in this work suggest that the MOLT should be included in current visuo-spatial processing models as it offers additional insight into how the MTL acquires and processes information for spatial learning.

    Learning from limited labelled data: contributions to weak, few-shot, and unsupervised learning

    Thesis by compendium. In the last decade, deep learning (DL) has become the main tool for computer vision (CV) tasks. Under the standard supervised learning paradigm, and thanks to the progressive collection of large datasets, DL has reached impressive results on different CV applications using convolutional neural networks (CNNs). Nevertheless, CNN performance drops when sufficient data is unavailable, which creates challenging scenarios in CV applications where only a few training samples are available, or when labeling images is a costly task that requires expert knowledge. Those scenarios motivate the research of not-so-supervised learning strategies to develop DL solutions for CV. In this thesis, we have explored different less-supervised learning paradigms on different applications. Concretely, we first propose novel self-supervised learning strategies for weakly supervised classification of gigapixel histology images. Then, we study the use of contrastive learning in few-shot learning scenarios for automatic railway crossing surveying. Finally, brain lesion segmentation is studied in the context of unsupervised anomaly segmentation, using only healthy samples during training. Throughout this thesis, we pay special attention to the incorporation of task-specific prior knowledge during model training, which may be easily obtained but can substantially improve the results in less-supervised scenarios. In particular, we introduce relative class proportions in weakly supervised learning in the form of inequality constraints. 
Also, attention homogenization in VAEs for anomaly localization is incorporated using size and entropy regularization terms, to make the CNN focus on all patterns for normal samples. The different methods are compared, when possible, with their supervised counterparts. In short, different not-so-supervised DL methods for CV are presented throughout this thesis, with substantial contributions that promote the use of DL in data-limited scenarios. The obtained results are promising and provide researchers with new tools that could avoid annotating massive amounts of data in a fully supervised manner. The work of Julio Silva Rodríguez to carry out this research and to write this dissertation has been supported by the Spanish Government under the FPI Grant PRE2018-083443. Silva Rodríguez, JJ. (2022). Learning from limited labelled data: contributions to weak, few-shot, and unsupervised learning [Doctoral thesis]. Universitat Politècnica de València. https://doi.org/10.4995/Thesis/10251/190633