39 research outputs found

    Probabilistic modelling and inference of human behaviour from mobile phone time series

    No full text
    With an estimated 4.1 billion subscribers around the world, the mobile phone offers a unique opportunity to sense and understand human behaviour from location, co-presence and communication data. While the benefit of modelling this unprecedented amount of data is widely recognised, a number of challenges impede the development of accurate behaviour models. In this thesis, we identify and address two modelling problems and show that their consideration improves the accuracy of behaviour inference. We first examine the modelling of long-range dependencies in human behaviour. Existing human behaviour models take into account only short-range dependencies in mobile phone time series. Using information theory, we quantify long-range dependencies in mobile phone time series for the first time, demonstrate that they exhibit periodic oscillations, and introduce novel tools to analyse them. We further show that considering what the user did 24 hours earlier improves accuracy when predicting user behaviour five or more hours in advance. The second problem that we address is the modelling of temporal variations in human behaviour. The time spent by a user on an activity varies from one day to the next. In order to recognise behaviour patterns despite temporal variations, we establish a methodological connection between human behaviour modelling and biological sequence alignment. This connection allows us to compare, cluster and model behaviour sequences and to introduce novel features for behaviour recognition which improve its accuracy. The experiments presented in this thesis have been conducted on the largest publicly available mobile phone dataset labelled in an unsupervised fashion and are entirely repeatable. Furthermore, our techniques only require cellular data, which can easily be recorded by today's mobile phones, and could benefit a wide range of applications including life logging, health monitoring, customer profiling and large-scale surveillance.
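    The lag-based dependency analysis described above can be sketched with a small information-theoretic calculation. This is an illustrative reconstruction, not the thesis's exact method; the function names (`mutual_information`, `lagged_mi`) and the toy hourly location labels are assumptions:

    ```python
    import math
    from collections import Counter

    def mutual_information(xs, ys):
        """Empirical mutual information (in bits) between two equally long
        discrete sequences, estimated from their joint histogram."""
        n = len(xs)
        px, py, pxy = Counter(xs), Counter(ys), Counter(zip(xs, ys))
        return sum((c / n) * math.log2((c / n) / ((px[x] / n) * (py[y] / n)))
                   for (x, y), c in pxy.items())

    def lagged_mi(sequence, lag):
        """Dependence between behaviour now and behaviour `lag` steps earlier:
        MI between the sequence and a copy of itself shifted by `lag`."""
        return mutual_information(sequence[:-lag], sequence[lag:])

    # A toy hourly location sequence with a strict 24-hour routine:
    day = ["home"] * 8 + ["commute"] + ["work"] * 8 + ["commute"] + ["leisure"] * 6
    series = day * 14  # two weeks of hourly labels

    # A periodic routine carries more information at a 24-hour lag than at a
    # 5-hour lag, mirroring the oscillating long-range dependencies above.
    print(lagged_mi(series, 24), lagged_mi(series, 5))
    ```

    On real cellular traces the MI-versus-lag curve would be noisier, but the thesis's finding corresponds to the same comparison: the lag-24 value staying high enough to help prediction five or more hours ahead.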

    Exploiting Uncertainty Information in Speaker Verification and Diarization

    Get PDF
    This thesis considers two models that allow utilizing uncertainty information in the tasks of Automatic Speaker Verification and Speaker Diarization. The first model we consider is a modification of the widely used Gaussian Probabilistic Linear Discriminant Analysis (G-PLDA) that models the distribution of the vector utterance representations called embeddings. In G-PLDA, the embeddings are assumed to be generated by adding a noise vector sampled from a Gaussian distribution to a speaker-dependent vector. We show that when assuming that the noise was instead sampled from a Student's t-distribution, the PLDA model (we call this version heavy-tailed PLDA, HT-PLDA) can use the uncertainty information when making the verification decisions. Our model is conceptually similar to the HT-PLDA model defined by Kenny et al. in 2010, but, as we show in this thesis, it allows for fast scoring, while the original HT-PLDA definition requires considerable time and computation resources for scoring. We present the algorithm to train our version of HT-PLDA as a generative model. We also consider various strategies for discriminatively training the parameters of the model. We test the performance of generatively and discriminatively trained HT-PLDA on the speaker verification task.
    The results indicate that HT-PLDA performs on par with the standard G-PLDA while having the advantage of being more robust against variations in the data pre-processing. Experiments on speaker diarization demonstrate that the HT-PLDA model not only provides better performance than the G-PLDA baseline model but also has the advantage of producing better-calibrated log-likelihood ratio (LLR) scores. In the second model, unlike in HT-PLDA, we do not consider the embeddings as the observed data. Instead, in this model, the embeddings are normally distributed hidden variables. The embedding precision carries the information about the quality of the speech segment: for clean long segments, the precision should be high, and for short and noisy utterances, it should be low. We show how such probabilistic embeddings can be incorporated into the G-PLDA framework and how the parameters of the hidden embedding influence its impact when computing the likelihood with this model. In the experiments, we demonstrate how to utilize an existing neural network (NN) embedding extractor to provide not embeddings but the parameters of a probabilistic embedding distribution. We test the performance of the probabilistic embeddings model on the speaker diarization task. The results demonstrate that this model provides well-calibrated LLR scores, allowing for better diarization when no development dataset is available to tune the clustering algorithm.
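    The verification LLR discussed above can be illustrated with the standard two-covariance formulation of G-PLDA scoring. This is a generic sketch, not the thesis's HT-PLDA or probabilistic-embedding scoring: it assumes a zero-mean model with known between-speaker covariance `B` and within-speaker covariance `W`, and the name `gplda_llr` is an assumption:

    ```python
    import numpy as np

    def gplda_llr(e1, e2, B, W):
        """Log-likelihood ratio that embeddings e1 and e2 come from the same
        speaker, under a zero-mean two-covariance PLDA model with
        between-speaker covariance B and within-speaker covariance W."""
        d = len(e1)
        x = np.concatenate([e1, e2])
        # Same speaker: the shared speaker vector correlates the two embeddings.
        same = np.block([[B + W, B], [B, B + W]])
        # Different speakers: the two embeddings are independent.
        diff = np.block([[B + W, np.zeros((d, d))], [np.zeros((d, d)), B + W]])

        def logpdf(v, cov):
            sign, logdet = np.linalg.slogdet(cov)
            return -0.5 * (len(v) * np.log(2 * np.pi) + logdet
                           + v @ np.linalg.solve(cov, v))

        return logpdf(x, same) - logpdf(x, diff)

    # One-dimensional toy model: close embeddings score positive,
    # well-separated embeddings score negative.
    B = np.array([[1.0]])
    W = np.array([[1.0]])
    print(gplda_llr(np.array([1.0]), np.array([1.0]), B, W))
    print(gplda_llr(np.array([3.0]), np.array([-3.0]), B, W))
    ```

    The heavy-tailed variant above replaces the Gaussian noise with a Student's t-distribution, so the effective within-speaker covariance adapts per trial; the Gaussian sketch here only shows where that uncertainty information enters the score.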

    Fundamentals

    Get PDF
    Volume 1 establishes the foundations of this new field. It covers all the steps from data collection, through summarisation and clustering, to the different aspects of resource-aware learning, i.e., hardware, memory, energy, and communication awareness. Machine learning methods are inspected with respect to resource requirements and how to enhance scalability on diverse computing architectures, ranging from embedded systems to large computing clusters.

    Archival Phonetics & Prosodic Typology in Sixteen Australian Languages

    Get PDF
    In naturalistic speech, the phonetic instantiation of phonological categories is often highly variable. Speakers have been observed to converge on patterns of phonetic variation that are consistent within languages but variable cross-linguistically for the same phonological phenomenon. Speakers are evidently sensitive to these sorts of patterns and learn the phonetic variation in a consistent way. Furthermore, the systematicity of this variation suggests that these patterns should change over time systematically as well. Most Australian languages assign lexical stress consistently on the first syllable of the word, raising the question of how the phonetics of stress varies across languages with this phonologically stable pattern. This dissertation presents an investigation into structured variation of the acoustic correlates of stress and prosody in sixteen Indigenous languages of Australia that all have consistent initial stress placement, with a focus on the source(s) of variation in these factors cross-linguistically. Acoustic correlates of stress, despite the phonological uniformity present among these languages, show significant cross-linguistic variation, both in the presence or absence of a particular cue to stress, as well as the size of these effects. The phonological uniformity of stress assignment allows for a more controlled comparison of the acoustic correlates of stress across these languages, since the placement of stress marking remains constant. Acoustic correlates investigated are vowel duration, pre-tonic and post-tonic consonant duration, intensity, f0 (maximum and range), and vowel peripherality. These cues are identified using a series of mixed effects linear regression models. To identify the source(s) of variation in acoustic correlates to stress, the population genetics tool Analysis of Molecular Variance (AMOVA) is used. 
    This is a statistical tool created for the analysis of genetic variance that has been applied to cultural evolution topics such as music and folktales. This model finds significant variation across languages, as well as substantial intra-speaker variation, similarly to the findings for both biological and cultural evolution, but no significant intra-language variation across speakers. These results are also supported by the investigation of inter- and intra-language variation using regression modeling. Another population genetics measure, the fixation index, is used to create a network model of language relationships based on the phonetic correlates of lexical stress. This network shows clear relationships between the Pama-Nyungan languages in this sample, as well as some Gunwinyguan languages, supporting the claim that the phonetic cues to stress are stable within language families and change according to the principles of diachronic language change. Smaller groupings in this network also indicate some contact-induced change or areal effects in these phonetic markers. Phrasal prosody is also investigated in this dissertation, using a toolkit for automated phrasal contour clustering. For each language, f0 is measured at regular intervals across the word, which is used as input to a complete-linkage clustering algorithm to identify major categories of phrasal contours. Results of this sort of automatic clustering provide testable hypotheses about phrasal types in each language, while avoiding some common pitfalls of impressionistic analyses of prosodic phrases. As with the investigation into lexical stress, this sort of automated typological work serves as a crucial complement to more detailed language-specific studies for the creation of well-rounded and well-supported theories. The data used in this dissertation are narrative speech recordings sourced from language archives, collected in varying field settings. 
    In processing these data I have created a large corpus of these recordings, force-aligned at the segment level, and have worked out post-hoc methods for controlling noise and variation in field-collected audio to create a comparable set of language data. I include in the dissertation a lengthy discussion of these methods, with the aim of providing a practical toolkit for the use of archival materials to address novel phonetic questions, as well as to aid in the creation of language revitalization resources.
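    The contour-clustering pipeline described above (f0 sampled at regular intervals, then complete-linkage clustering) can be sketched in a few lines. This is an illustrative reconstruction with toy data, not the dissertation's actual toolkit, and in practice the contours would first be normalised (e.g. z-scored per speaker):

    ```python
    import math

    def complete_linkage(points, n_clusters):
        """Agglomerative clustering with complete linkage: repeatedly merge
        the two clusters whose *farthest* pair of members is closest,
        until n_clusters remain. Each point is a fixed-length f0 contour."""
        def dist(a, b):
            return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

        clusters = [[i] for i in range(len(points))]
        while len(clusters) > n_clusters:
            best = None
            for i in range(len(clusters)):
                for j in range(i + 1, len(clusters)):
                    # Complete linkage: cluster distance is the maximum
                    # pairwise distance between members.
                    d = max(dist(points[a], points[b])
                            for a in clusters[i] for b in clusters[j])
                    if best is None or d < best[0]:
                        best = (d, i, j)
            _, i, j = best
            clusters[i] = clusters[i] + clusters[j]
            del clusters[j]
        return clusters

    # Toy contours, f0 (Hz) sampled at 5 points: two falling, two rising.
    contours = [
        [220, 210, 200, 190, 180],
        [225, 212, 201, 188, 179],
        [180, 190, 200, 210, 220],
        [178, 192, 199, 213, 221],
    ]
    groups = complete_linkage(contours, 2)  # falling vs. rising contour types
    ```

    The resulting groups play the role of the hypothesised phrasal-contour categories; on archival data the number of clusters itself becomes a hypothesis to test against language-specific analyses.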

    Application of generic sense classes in word sense disambiguation

    Get PDF
    Ph.D. (Doctor of Philosophy)

    Acta Cybernetica : Volume 18. Number 3.

    Get PDF