18 research outputs found

    The Cricket-Tracking Project: a case study

    A case study commissioned from Dr Christmas by Open Exeter documenting the process of preparing and uploading large video files to the Data Archive. This document describes a case study for the archiving of a project to the Exeter Data Archive (EDA). The project described here presents a number of challenges for the archive and for the process of recording its information. The first challenge is that some of the information is too big to upload into a website (of the order of 20Tb). Other challenges are the number of different types of information and the dependencies between them. We start, in section 2, by describing some terms and definitions that will be used in the document. In particular the words data and dataset have particular meanings in the context of a Computer Science project which may differ from those used within EDA and in other university departments. In section 3 the project is described, including what sorts of content it has generated, and section 3.3 lists the different file formats used. The process by which the project's content has been grouped together for entry into EDA is described in section 3.5, which also details how the EDA entries have been constructed.

    Robust autoregression: Student-t innovations using variational Bayes

    Copyright © 2011 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works. Autoregression (AR) is a tool commonly used to understand and predict time series data. Traditionally the excitation noise is modelled as a Gaussian. However, real-world data may not be Gaussian in nature, and it is known that Gaussian models are adversely affected by the presence of outliers. We introduce a Bayesian AR model in which the excitation noise is assumed to be Student-t distributed. Variational Bayesian approximations to the posterior distributions of the model parameters are used to overcome the intractable integrations inherent in the Bayesian model. Independent automatic relevance determination (ARD) priors over each of the AR coefficients are used to estimate the model order. Using synthetic data, we show that the Student-t model performs well against both Gaussian and leptokurtic data, in terms of parameter estimation (including the model order) and is much more robust to outliers than either Gaussian or finite mixtures of Gaussian models. We apply the model to strongly leptokurtic EEG signals and show that the Student-t model makes more accurate one-step-ahead predictions than the Gaussian model and provides more consistent estimates of the AR coefficients over simultaneously recorded EEG channels.

    Robust spatio-temporal latent variable models

    Principal Component Analysis (PCA) and Canonical Correlation Analysis (CCA) are widely-used mathematical models for decomposing multivariate data. They capture spatial relationships between variables, but ignore any temporal relationships that might exist between observations. Probabilistic PCA (PPCA) and Probabilistic CCA (ProbCCA) are versions of these two models that explain the statistical properties of the observed variables as linear mixtures of an alternative, hypothetical set of hidden, or latent, variables and explicitly model noise. Both the noise and the latent variables are assumed to be Gaussian distributed. This thesis introduces two new models, named PPCA-AR and ProbCCA-AR, that augment PPCA and ProbCCA respectively with autoregressive processes over the latent variables to additionally capture temporal relationships between the observations. To make PPCA-AR and ProbCCA-AR robust to outliers and able to model leptokurtic data, the Gaussian assumptions are replaced with infinite scale mixtures of Gaussians, using the Student-t distribution. Bayesian inference calculates posterior probability distributions for each of the parameter variables, from which we obtain a measure of confidence in the inference. It avoids the pitfalls associated with the maximum likelihood method: integrating over all possible values of the parameter variables guards against overfitting. For these new models the integrals required for exact Bayesian inference are intractable; instead a method of approximation, the variational Bayesian approach, is used. This enables the use of automatic relevance determination to estimate the model orders. PPCA-AR and ProbCCA-AR can be viewed as linear dynamical systems, so the forward-backward algorithm, also known as the Baum-Welch algorithm, is used as an efficient method for inferring the posterior distributions of the latent variables. The exact algorithm is tractable because Gaussian assumptions are made regarding the distribution of the latent variables. This thesis introduces a variational Bayesian forward-backward algorithm based on Student-t assumptions. The new models are demonstrated on synthetic datasets and on real remote sensing and EEG data.

    Multiple novel prostate cancer susceptibility signals identified by fine-mapping of known risk loci among Europeans

    Genome-wide association studies (GWAS) have identified numerous common prostate cancer (PrCa) susceptibility loci. We have fine-mapped 64 GWAS regions known at the conclusion of the iCOGS study using large-scale genotyping and imputation in 25 723 PrCa cases and 26 274 controls of European ancestry. We detected evidence for multiple independent signals at 16 regions, 12 of which contained additional newly identified significant associations. A single signal comprising a spectrum of correlated variation was observed at 39 regions; 35 of which are now described by a novel more significantly associated lead SNP, while the originally reported variant remained as the lead SNP only in 4 regions. We also confirmed two association signals in Europeans that had been previously reported only in East-Asian GWAS. Based on statistical evidence and linkage disequilibrium (LD) structure, we have curated and narrowed down the list of the most likely candidate causal variants for each region. Functional annotation using data from ENCODE filtered for PrCa cell lines and eQTL analysis demonstrated significant enrichment for overlap with bio-features within this set. By incorporating the novel risk variants identified here alongside the refined data for existing association signals, we estimate that these loci now explain ∼38.9% of the familial relative risk of PrCa, an 8.9% improvement over the previously reported GWAS tag SNPs. This suggests that a significant fraction of the heritability of PrCa may have been hidden during the discovery phase of GWAS, in particular due to the presence of multiple independent signals within the same region.

    Para-infectious brain injury in COVID-19 persists at follow-up despite attenuated cytokine and autoantibody responses

    To understand neurological complications of COVID-19 better both acutely and for recovery, we measured markers of brain injury, inflammatory mediators, and autoantibodies in 203 hospitalised participants; 111 with acute sera (1–11 days post-admission) and 92 convalescent sera (56 with COVID-19-associated neurological diagnoses). Here we show that compared to 60 uninfected controls, tTau, GFAP, NfL, and UCH-L1 are increased with COVID-19 infection at acute timepoints and NfL and GFAP are significantly higher in participants with neurological complications. Inflammatory mediators (IL-6, IL-12p40, HGF, M-CSF, CCL2, and IL-1RA) are associated with both altered consciousness and markers of brain injury. Autoantibodies are more common in COVID-19 than controls and some (including against MYL7, UCH-L1, and GRIN3B) are more frequent with altered consciousness. Additionally, convalescent participants with neurological complications show elevated GFAP and NfL, unrelated to attenuated systemic inflammatory mediators and to autoantibody responses. Overall, neurological complications of COVID-19 are associated with evidence of neuroglial injury in both acute and late disease and these correlate with dysregulated innate and adaptive immune responses acutely

    Robust spatio-temporal latent variable models

    No full text
    Principal Component Analysis (PCA) and Canonical Correlation Analysis (CCA) are widely-used mathematical models for decomposing multivariate data. They capture spatial relationships between variables, but ignore any temporal relationships that might exist between observations. Probabilistic PCA (PPCA) and Probabilistic CCA (ProbCCA) are versions of these two models that explain the statistical properties of the observed variables as linear mixtures of an alternative, hypothetical set of hidden, or latent, variables and explicitly model noise. Both the noise and the latent variables are assumed to be Gaussian distributed. This thesis introduces two new models, named PPCA-AR and ProbCCA-AR, that augment PPCA and ProbCCA respectively with autoregressive processes over the latent variables to additionally capture temporal relationships between the observations. To make PPCA-AR and ProbCCA-AR robust to outliers and able to model leptokurtic data, the Gaussian assumptions are replaced with infinite scale mixtures of Gaussians, using the Student-t distribution. Bayesian inference calculates posterior probability distributions for each of the parameter variables, from which we obtain a measure of confidence in the inference. It avoids the pitfalls associated with the maximum likelihood method: integrating over all possible values of the parameter variables guards against overfitting. For these new models the integrals required for exact Bayesian inference are intractable; instead a method of approximation, the variational Bayesian approach, is used. This enables the use of automatic relevance determination to estimate the model orders. PPCA-AR and ProbCCA-AR can be viewed as linear dynamical systems, so the forward-backward algorithm, also known as the Baum-Welch algorithm, is used as an efficient method for inferring the posterior distributions of the latent variables. The exact algorithm is tractable because Gaussian assumptions are made regarding the distribution of the latent variables. This thesis introduces a variational Bayesian forward-backward algorithm based on Student-t assumptions. The new models are demonstrated on synthetic datasets and on real remote sensing and EEG data. EThOS - Electronic Theses Online Service, GB, United Kingdom

    Classifying and Visualising Roman Pottery using Computer-scanned Typologies

    No full text
    For many archaeological assemblages and type-series, accurate drawings of standardised pottery vessels have been recorded in consistent styles. This provides the opportunity to extract individual pot drawings and derive from them data that can be used for analysis and visualisation. Starting from PDF scans of the original pages of pot drawings, we have automated much of the process for locating, defining the boundaries, extracting and orientating each individual pot drawing. From these processed images, basic features such as width and height, the volume of the interior, the edges, and the shape of the cross-section outline are extracted and are then used to construct more complex features such as a measure of a pot's 'circularity'. Capturing these traits opens up new possibilities for (a) classifying vessel form in a way that is sensitive to the physical characteristics of pots relative to other vessels in an assemblage, and (b) visualising the results of quantifying assemblages using standard typologies. A frequently encountered problem when trying to compare pottery from different archaeological sites is that the pottery is classified into forms and labels using different standards. With a set of data from early Roman urban centres and related sites that has been labelled both with forms (e.g. 'platter' and 'bowl') and shape identifiers (based on the Camulodunum type-series), we use the extracted features from images to look both at how the pottery forms cluster for a given set of features, and at how the features may be used to compare finds from different sites

    Temporarily withdrawn: Reconsidering the Roman workshop: using computer vision to analyse the making of ancient inscriptions

    No full text
    This paper is currently not available due to a delay in the granting of permission for publishing the images. The paper will be made visible as soon as this permission is granted. We apologize for the inconvenience.