
    Learning predictive models from massive, semantically disparate data

    Machine learning approaches offer some of the most successful techniques for constructing predictive models from data. However, applying such techniques in practice requires overcoming several challenges: the infeasibility of centralized access to data sets whose massive size often exceeds the memory available to the learner, the distributed nature of data, access restrictions, data fragmentation, semantic disparities between data sources, and data sources that evolve spatially or temporally (e.g. data streams and genomic data sources to which new data is submitted continuously). Learning using statistical queries and semantic correspondences that present a unified view of disparate data sources to the learner offers a powerful general framework for addressing some of these challenges. Against this background, this thesis describes (1) approaches to dealing with missing values in statistical-query-based algorithms for building predictors (Naive Bayes and decision trees), and techniques to minimize the number of queries required in such a setting; (2) sufficient-statistics-based algorithms for constructing and updating sequence classifiers; (3) a reduction of several aspects of learning from semantically disparate data sources (such as (a) how errors in mappings affect the accuracy of the learned model and (b) how to choose an optimal mapping from among a set of alternative expert-supplied or automatically generated mappings) to the well-studied problems of domain adaptation and learning in the presence of noise; and (4) software for learning predictive models from semantically disparate data.
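
The statistical-query idea in the abstract above can be illustrated with a minimal Python sketch. It is not the thesis's algorithm: the `CountSource` class, its `count` method, the Laplace smoothing parameter `alpha`, and the toy attribute names are all assumptions made for illustration, and the sketch assumes complete data (it does not handle the missing values the thesis addresses). It only shows that a Naive Bayes model can be built from counts returned by each source, without pooling the raw records.

```python
from math import log


class CountSource:
    """Hypothetical data source that answers count queries only."""

    def __init__(self, rows):
        # rows: list of (features_dict, label); never exposed to the learner
        self._rows = rows

    def count(self, label, attr=None, value=None):
        """Number of rows with this label (and attr == value, if given)."""
        return sum(
            1
            for x, y in self._rows
            if y == label and (attr is None or x[attr] == value)
        )


def learn_naive_bayes(sources, labels, domains, alpha=1.0):
    """Build a Naive Bayes model from counts summed across all sources."""
    class_counts = {c: sum(s.count(c) for s in sources) for c in labels}
    total = sum(class_counts.values())
    model = {"prior": {}, "cond": {}}
    for c in labels:
        # Laplace-smoothed class prior
        model["prior"][c] = (class_counts[c] + alpha) / (total + alpha * len(labels))
        for attr, values in domains.items():
            for v in values:
                n = sum(s.count(c, attr, v) for s in sources)
                # Laplace-smoothed P(attr = v | class = c)
                model["cond"][(c, attr, v)] = (n + alpha) / (
                    class_counts[c] + alpha * len(values)
                )
    return model


def predict(model, labels, x):
    """Return the label with the highest log posterior score for instance x."""

    def score(c):
        return log(model["prior"][c]) + sum(
            log(model["cond"][(c, a, v)]) for a, v in x.items()
        )

    return max(labels, key=score)
```

Because the counts are additive, the model learned from several sources is identical to the one learned from their union, which is what makes this style of learning usable when the data cannot be centralized.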

    Genetic predictors for epilepsy development, treatment response and dosing

    Antiepileptic drug (AED) treatment is the first-line strategy for seizure control in the majority of individuals with epilepsy but remains challenging, not least because of interindividual variability in efficacy, tolerability and dosing. The studies presented in this thesis set out to explore that variability from a genomic perspective in patients with newly diagnosed epilepsy from across the UK. Single nucleotide polymorphisms (SNPs) in genes encoding drug metabolising enzymes (DMEs) may be associated with the dose of carbamazepine (CBZ) required for seizure control. A cohort of 159 individuals who were seizure-free for 12 months on a stable dose of CBZ monotherapy was genotyped for 51 SNPs across six DMEs. Haplotype analysis identified 8 haplotype blocks across the genes. No single SNPs or haplotype blocks were associated with CBZ dose. Thus, it is unlikely that genetic variability in DMEs accounts for the individual differences in CBZ dose requirement. A splice site SNP (rs3812718) in the SCN1A gene was previously shown to influence maximum doses of AEDs. This SNP was genotyped in 817 patients and tested for association with maximum and maintenance doses of several AEDs. An association was identified between rs3812718 and maximum AED dose, with an interaction analysis suggestive of a drug-specific effect. These findings suggest that this SCN1A variant contributes to variability in the limit of tolerability to AEDs. Response to AED treatment is multifactorial and likely to be influenced by multiple genes. Five SNPs previously reported to predict treatment outcome in epilepsy were genotyped in 772 patients and the resulting data, together with data from an Australian cohort, incorporated into a predictive algorithm. The algorithm failed to predict treatment outcome in general but was partially successful in identifying responders to CBZ and valproate. These five SNPs may be relevant to the prognosis of epilepsy, particularly when treated with specific AEDs.
Primary generalised epilepsies (PGEs) are highly heritable and believed to be polygenic in origin. Predictive algorithms were employed to explore genetic influences on seizure (absence vs. myoclonus) and epilepsy (PGE vs. focal) type using 1,840 SNP genotypes available from 436 patients with PGE. Although the algorithms failed to distinguish PGE patients on the basis of genetic variants, they showed improved association over univariate methods of analysis. Such an approach may be suitable for future investigations using large genomic datasets. A recent genome-wide association study identified multiple genetic variants that approached genome-wide significance for association with 12-month remission from seizures. Five of these SNPs were genotyped in an independent cohort of 424 patients and tested for association with remission and time to remission. No significant associations were found, questioning the validity of the original observation or the method of replication. Further work is required to understand this outcome. In conclusion, the genetic bases of epilepsy, AED response and AED dose requirement are multigenic and thus far undetectable using traditional association studies in modestly-sized patient cohorts. Further advances in genomic, bioinformatics and statistical methodologies are required before the genetic contribution to heterogeneity in epilepsy-related phenotypes can be translated into improved clinical care.