514 research outputs found

    Sonography data science

    Fetal sonography remains a highly specialised skill in spite of its necessity and importance. Because of differences in fetal and maternal anatomy, and in human psychomotor skills, there is intra- and inter-sonographer variability amongst expert sonographers. By understanding their similarities and differences, we want to build more interpretive models to assist a sonographer who is less experienced in scanning. This thesis's contributions to the field of fetal sonography can be grouped into two themes. First, I have used data visualisation and machine learning methods to show that a sonographer's search strategy depends on the anatomical plane. Second, I show that a sonographer's style and human skill of scanning are not easily disentangled. We first examine task-specific spatio-temporal gaze behaviour through the use of data visualisation, where a task is defined as a specific anatomical plane the sonographer is searching for. The qualitative analysis is performed at both a population and an individual level, where we show that the task being performed determines the sonographer's gaze behaviour. In our population-level analysis, we use unsupervised methods to identify meaningful gaze patterns and visualise task-level differences. In our individual-level analysis, we use a deep learning model to provide context to the eye-tracking data with respect to the ultrasound image. We then use an event-based visualisation to understand differences between the gaze patterns of sonographers performing the same task. In some instances, sonographers adopt a different search strategy, which is seen in the misclassified instances of an eye-tracking task classification model. Our task classification model supports the qualitative behaviour seen in our population-level analysis, where task-specific gaze behaviour is quantitatively distinct. We also investigate the use of time-based skill definitions and their appropriateness in fetal ultrasound sonography; a time-based skill definition uses years of clinical experience as an indicator of skill. The developed task-agnostic skill classification model differentiates gaze behaviour between sonographers in training and fully qualified sonographers. The preliminary results also show that fetal sonography scanning remains an operator-dependent skill, where the notion of human skill and individual stylistic differences in scanning cannot be easily disentangled. Our work demonstrates how and where sonographers look whilst scanning, which can be used as a stepping stone for building style-agnostic skill models.

    Tracking interacting targets in multi-modal sensors

    Object tracking is one of the fundamental tasks in various applications such as surveillance, sports, video conferencing and activity recognition. Factors such as occlusions, illumination changes and the limited field of observance of the sensor make tracking a challenging task. To overcome these challenges, the focus of this thesis is on using multiple modalities such as audio and video for multi-target, multi-modal tracking. In particular, this thesis presents contributions to four related research topics, namely pre-processing of input signals to reduce noise, multi-modal tracking, simultaneous detection and tracking, and interaction recognition. To improve the performance of detection algorithms, especially in the presence of noise, this thesis investigates filtering of the input data through spatio-temporal feature analysis as well as through frequency band analysis. The pre-processed data from multiple modalities is then fused within a particle filter (PF). To further minimise the discrepancy between the real and the estimated positions, we propose a strategy that associates the hypotheses and the measurements with a real target, using a Weighted Probabilistic Data Association (WPDA). Since the filtering involved in the detection process reduces the available information and is inapplicable to data with a low signal-to-noise ratio, we investigate simultaneous detection and tracking approaches and propose a multi-target track-before-detect particle filter (MT-TBD-PF). The proposed MT-TBD-PF algorithm bypasses the detection step and performs tracking on the raw signal. Finally, we apply the proposed multi-modal tracking to recognise interactions between targets in regions within, as well as outside, the cameras' fields of view. The efficiency of the proposed approaches is demonstrated on large uni-modal, multi-modal and multi-sensor scenarios from real-world detection, tracking and event recognition datasets and through participation in evaluation campaigns.
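
As a rough illustration of the multi-modal fusion step described in the abstract above, the following is a minimal sketch of a generic bootstrap particle filter that tracks a single 1-D position from two noisy sensors standing in for audio and video. It is not the thesis's WPDA or MT-TBD-PF algorithm. The random-walk motion model, the Gaussian sensor models, the noise levels and all function names (`gaussian_likelihood`, `particle_filter_step`, `run_demo`) are assumptions made for this example.

```python
import math
import random


def gaussian_likelihood(measurement, position, sigma):
    """Unnormalised Gaussian likelihood of a measurement given a position."""
    return math.exp(-0.5 * ((measurement - position) / sigma) ** 2)


def particle_filter_step(particles, measurements, sigmas, process_sigma, rng):
    """One predict / weight / resample cycle of a bootstrap particle filter."""
    # Predict: random-walk motion model (an assumption of this sketch).
    predicted = [p + rng.gauss(0.0, process_sigma) for p in particles]

    # Weight: fuse the modalities by multiplying their likelihoods,
    # which assumes the sensors are conditionally independent.
    weights = []
    for p in predicted:
        w = 1.0
        for z, sigma in zip(measurements, sigmas):
            w *= gaussian_likelihood(z, p, sigma)
        weights.append(w)

    total = sum(weights)
    if total == 0.0:
        # All likelihoods underflowed: fall back to uniform weights.
        weights = [1.0 / len(predicted)] * len(predicted)
    else:
        weights = [w / total for w in weights]

    # Posterior-mean estimate, then multinomial resampling.
    estimate = sum(w * p for w, p in zip(weights, predicted))
    resampled = rng.choices(predicted, weights=weights, k=len(predicted))
    return resampled, estimate


def run_demo(seed=0, steps=30, num_particles=500):
    """Track a target drifting by 0.1 per step; returns (truth, estimate)."""
    rng = random.Random(seed)
    audio_sigma, video_sigma = 1.0, 0.3  # hypothetical sensor noise levels
    particles = [rng.uniform(-5.0, 5.0) for _ in range(num_particles)]
    true_position, estimate = 0.0, 0.0
    for _ in range(steps):
        true_position += 0.1
        audio = true_position + rng.gauss(0.0, audio_sigma)
        video = true_position + rng.gauss(0.0, video_sigma)
        particles, estimate = particle_filter_step(
            particles, [audio, video], [audio_sigma, video_sigma], 0.3, rng
        )
    return true_position, estimate
```

Calling `run_demo()` returns the final true position and the filter's estimate. Because the two likelihoods are multiplied, the less noisy sensor dominates the weights, which is the usual effect of this kind of fusion.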

    Singing information processing: techniques and applications

    The singing voice is an essential component of music in all cultures of the world, since it is an incredibly natural form of musical expression. Consequently, the automatic processing of singing voice has a great impact from the perspectives of industry, culture and science. In this context, this thesis contributes a varied set of techniques and applications related to singing voice processing, together with a review of the associated state of the art in each case. First, several of the best known pitch estimators are compared for the query-by-humming use case. The results show that \cite{Boersma1993} (with a non-obvious parameter adjustment) and \cite{Mauch2014} perform very well in this use case, given the smoothness of the extracted pitch contours. In addition, a novel singing transcription system is proposed, based on a hysteresis process defined in time and frequency, as well as a Matlab tool for singing voice evaluation. The interest of the proposed method is that it achieves error rates close to the state of the art with a very simple approach. The proposed evaluation tool, for its part, is a useful resource for better defining the problem and for better evaluating the solutions proposed by future researchers. This thesis also presents a method for the automatic assessment of vocal performance. It uses dynamic time warping to align the user's performance with a reference, thereby providing scores for pitch accuracy and rhythm accuracy. The evaluation of the system shows a high correlation between the scores given by the system and the scores annotated by a group of expert musicians. Separately, a method for realistically changing the intensity of singing voice is presented. This transformation is based on a parametric model of the spectral envelope, and it substantially improves the perceived realism when compared with commercial software such as Melodyne or Vocaloid. The drawback of the proposed approach is that it requires manual intervention, but the results obtained yield important conclusions towards automatic intensity modification with realistic results. Finally, a method is proposed for the correction of dissonances in isolated chords. It is based on a multiple-F0 analysis and a frequency shift of the sinusoidal components. The evaluation was carried out by a group of trained musicians, and it shows a clear increase in perceived consonance after the proposed transformation.
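
The dynamic time warping alignment mentioned in the abstract above can be sketched as follows. This is a minimal textbook version that aligns two pitch contours (here given as MIDI note numbers per frame) using absolute pitch difference as the local cost. The function name `dtw_align`, the cost function and the input representation are assumptions for illustration, and how the thesis derives its pitch and rhythm scores from the alignment is not shown.

```python
def dtw_align(reference, performance):
    """Align two pitch contours with dynamic time warping.

    Returns the accumulated cost of the best alignment and the warping
    path as a list of (reference_index, performance_index) pairs.
    """
    n, m = len(reference), len(performance)
    inf = float("inf")

    # cost[i][j] is the best accumulated cost of aligning the first i
    # reference frames with the first j performance frames.
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(reference[i - 1] - performance[j - 1])
            cost[i][j] = local + min(
                cost[i - 1][j],      # reference frame repeated
                cost[i][j - 1],      # performance frame repeated
                cost[i - 1][j - 1],  # both advance
            )

    # Backtrack from the end to recover the warping path.
    path = []
    i, j = n, m
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        _, i, j = min(
            (cost[i - 1][j - 1], i - 1, j - 1),
            (cost[i - 1][j], i - 1, j),
            (cost[i][j - 1], i, j - 1),
        )
    path.reverse()
    return cost[n][m], path
```

For example, `dtw_align([60, 62, 64], [60, 60, 62, 64, 64])` has zero accumulated cost because the performance contains the same notes held for longer. The accumulated cost reflects pitch deviation, while the shape of the path reflects timing deviation from the reference.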

    Tracking Rhythmicity in Biomedical Signals using Sequential Monte Carlo methods

    Cyclical patterns are common in signals that originate from natural systems such as the human body and from man-made machinery. Often these cyclical patterns are not perfectly periodic. In that case, the signals are called pseudo-periodic or quasi-periodic and can be modeled as a sum of time-varying sinusoids whose frequencies, phases, and amplitudes change slowly over time. Each time-varying sinusoid represents an individual rhythmical component, called a partial, that can be characterized by three parameters: frequency, phase, and amplitude. Quasi-periodic signals often contain multiple partials that are harmonically related. In that case, the frequencies of the other partials are exact integer multiples of that of the slowest partial. These signals are referred to as multi-harmonic signals. Examples of such signals are the electrocardiogram (ECG), arterial blood pressure (ABP), and the human voice. A Markov process is a mathematical model for a random system whose future and past states are independent conditional on the present state. Multi-harmonic signals can be modeled as a stochastic process with the Markov property. The Markovian representation of multi-harmonic signals enables us to use state-space tracking methods to continuously estimate the frequencies, phases, and amplitudes of the partials. Several research groups have proposed various signal analysis methods, such as hidden Markov models (HMMs), the short-time Fourier transform (STFT), and the Wigner-Ville distribution, to solve this problem. Recently, a few groups of researchers have proposed Monte Carlo methods that sequentially estimate the posterior distribution of the fundamental frequency in multi-harmonic signals. However, multi-harmonic tracking is more challenging than single-frequency tracking, and the reason for this has not been well understood. The main objectives of this dissertation are to elucidate the fundamental obstacles to multi-harmonic tracking and to develop a reliable multi-harmonic tracker that can track cyclical patterns in multi-harmonic signals.
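
To make the multi-harmonic signal model in the abstract above concrete, here is a minimal sketch that synthesises a sum of harmonically related partials whose fundamental frequency drifts slowly. For simplicity the amplitudes are held constant and the drift is linear. The function name `synthesize_multiharmonic`, its parameters and the example values are assumptions for illustration and are not taken from the dissertation.

```python
import math


def synthesize_multiharmonic(duration_s, sample_rate_hz, f0_start_hz,
                             f0_end_hz, amplitudes):
    """Synthesise a quasi-periodic, multi-harmonic signal.

    The fundamental frequency moves linearly from f0_start_hz to
    f0_end_hz. Partial k (k = 1, 2, ...) has amplitude amplitudes[k - 1]
    and frequency k times the fundamental.
    Returns (samples, f0_track).
    """
    n = int(duration_s * sample_rate_hz)
    dt = 1.0 / sample_rate_hz
    phase = 0.0  # instantaneous phase of the fundamental, in radians
    samples, f0_track = [], []
    for i in range(n):
        frac = i / max(n - 1, 1)
        f0 = f0_start_hz + (f0_end_hz - f0_start_hz) * frac
        # Because partial k runs at k times the fundamental frequency,
        # its phase is k times the fundamental's phase.
        value = sum(a * math.sin(k * phase)
                    for k, a in enumerate(amplitudes, start=1))
        samples.append(value)
        f0_track.append(f0)
        # Integrate the instantaneous frequency to advance the phase.
        phase += 2.0 * math.pi * f0 * dt
    return samples, f0_track
```

For instance, `synthesize_multiharmonic(2.0, 100, 1.0, 1.2, [1.0, 0.5, 0.25])` produces two seconds of a three-partial signal whose rate drifts from 1.0 Hz to 1.2 Hz, loosely resembling a heartbeat that speeds up slightly. A tracker of the kind the abstract describes would try to recover `f0_track` from `samples` alone.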

    Towards pedestrian-aware autonomous cars
