16 research outputs found

    Method for the analysis of incomplete longitudinal data

    Unplanned missing data commonly arise in longitudinal trials. When the mechanism driving the missing-data process is related to the outcome under investigation, traditional methods of analysis may yield seriously biased parameter estimates. Motivated by data from two clinical trials, this thesis explores various approaches to dealing with data incompleteness.

    In the first part, a Monte Carlo EM algorithm is developed and used to fit so-called random-coefficient-based dropout models; these models relate the probability of a patient's dropout in follow-up studies to subject-specific characteristics such as their deviation from the average rate of disease progression over time. The approach is used to model incomplete data from a 5-year study of patients with Parkinson's disease. The validity of the results obtained using these methods, however, depends in general on distributional and modelling assumptions about the missing data that are inherently untestable, since the missing data were never observed. For this reason, many have advocated a sensitivity analysis aimed at assessing the robustness of the conclusions from an analysis that ignores the missing-data mechanism.

    In the second part of the thesis we address these issues. In particular, we present results from sensitivity analyses based on local influence and sampling-based methods, used in conjunction with the random-coefficient-based dropout model described in the first part.

    Recently, a more formal approach to sensitivity analysis for missing-data problems has been proposed, whereby traditional point estimates are replaced by intervals encoding our lack of knowledge due to incompleteness of the data. In the third part of the thesis, we extend these methods to longitudinal ordinal data. In addition, for cross-sectional discrete data with distributions belonging to the exponential family, we propose using the proportion of possible estimates of a parameter of interest, over all solutions corresponding to all sample completions, as a measure of ignorance. We develop a computationally efficient algorithm to calculate this proportion and illustrate our methods using data from a dental pain trial.
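    The third part's idea of replacing a point estimate with the set of estimates attainable under every possible completion of the missing data can be illustrated with a deliberately small sketch. The brute-force enumeration below is only a toy on made-up data, not the computationally efficient algorithm the thesis develops; the observed outcomes and the number of missing values are hypothetical.

```python
# Toy illustration (not the thesis's algorithm): for a binary outcome with some
# values missing, enumerate every possible completion of the missing entries and
# collect the estimate (the sample proportion) implied by each completion.
# The spread of these estimates is a crude "region of ignorance" due to the
# missing data, and the number of distinct values gives a sense of how much the
# missingness could matter.
from itertools import product

observed = [1, 0, 1, 1, 0, 1, 0, 1]   # hypothetical observed binary outcomes
n_missing = 4                          # hypothetical number of missing outcomes

estimates = set()
for completion in product([0, 1], repeat=n_missing):
    full_sample = observed + list(completion)
    estimates.add(sum(full_sample) / len(full_sample))

print(f"ignorance interval for the success probability: "
      f"[{min(estimates):.3f}, {max(estimates):.3f}]")
print(f"{len(estimates)} distinct estimates across {2 ** n_missing} completions")
```

    Because the enumeration grows as 2 to the power of the number of missing values, brute force quickly becomes infeasible, which is why an efficient algorithm for this quantity is of interest.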

    Learning genetic epistasis using Bayesian network scoring criteria

    BACKGROUND: Gene-gene epistatic interactions likely play an important role in the genetic basis of many common diseases. Recently, machine-learning and data mining methods have been developed for learning epistatic relationships from data. A well-known combinatorial method that has been successfully applied for detecting epistasis is Multifactor Dimensionality Reduction (MDR). Jiang et al. created a combinatorial epistasis learning method called BNMBL to learn Bayesian network (BN) epistatic models. They compared BNMBL to MDR using simulated data sets. Each of these data sets was generated from a model that associates two SNPs with a disease and includes 18 unrelated SNPs. For each data set, BNMBL and MDR were used to score all 2-SNP models, and BNMBL learned significantly more correct models. In real data sets, we ordinarily do not know the number of SNPs that influence phenotype. BNMBL may not perform as well if we also scored models containing more than two SNPs. Furthermore, a number of other BN scoring criteria have been developed. They may detect epistatic interactions even better than BNMBL. Although BNs are a promising tool for learning epistatic relationships from data, we cannot confidently use them in this domain until we determine which scoring criteria work best, or even well, when we try to learn the correct model without knowledge of the number of SNPs in that model.

    RESULTS: We evaluated the performance of 22 BN scoring criteria using 28,000 simulated data sets and a real Alzheimer's GWAS data set. Our results were surprising in that the Bayesian scoring criterion with large values of a hyperparameter called α performed best. This score performed better than other BN scoring criteria and MDR at recall using simulated data sets, at detecting the hardest-to-detect models using simulated data sets, and at substantiating previous results using the real Alzheimer's data set.

    CONCLUSIONS: We conclude that representing epistatic interactions using BN models and scoring them using a BN scoring criterion holds promise for identifying epistatic genetic variants in data. In particular, the Bayesian scoring criterion with large values of a hyperparameter α appears more promising than a number of alternatives.
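    To make the notion of scoring a 2-SNP model concrete, the sketch below computes a generic BDeu-style Bayesian score (a Bayesian Dirichlet score with an equivalent-sample-size hyperparameter α) for a disease node whose parents are two candidate SNPs. This is a minimal illustration on simulated genotypes, not the BNMBL criterion or the exact scoring criteria evaluated in the paper.

```python
# Minimal sketch of a BDeu-style Bayesian score for a candidate 2-SNP epistasis
# model (disease node with the two SNPs as parents). The hyperparameter alpha is
# the equivalent sample size; the data below are randomly generated for illustration.
import numpy as np
from scipy.special import gammaln

def bdeu_score(snp1, snp2, disease, alpha=9.0):
    """Log BDeu score of the model disease <- {snp1, snp2}.

    snp1, snp2 : arrays of genotypes coded 0/1/2
    disease    : array of case/control status coded 0/1
    alpha      : equivalent sample size (the hyperparameter varied in the study)
    """
    q, r = 9, 2                       # parent configurations, disease states
    counts = np.zeros((q, r))
    for g1, g2, d in zip(snp1, snp2, disease):
        counts[3 * g1 + g2, d] += 1

    a_jk = alpha / (q * r)            # Dirichlet pseudo-count per cell
    a_j = alpha / q                   # pseudo-count per parent configuration
    n_j = counts.sum(axis=1)

    score = np.sum(gammaln(a_j) - gammaln(a_j + n_j))
    score += np.sum(gammaln(a_jk + counts) - gammaln(a_jk))
    return score

# Hypothetical usage: score every SNP pair and keep the highest-scoring model.
rng = np.random.default_rng(0)
snps = rng.integers(0, 3, size=(200, 20))      # 200 subjects, 20 SNPs
disease = rng.integers(0, 2, size=200)
best = max(((i, j, bdeu_score(snps[:, i], snps[:, j], disease))
            for i in range(20) for j in range(i + 1, 20)),
           key=lambda t: t[2])
print(f"best pair: SNPs {best[0]} and {best[1]}, log score {best[2]:.1f}")
```

    The hyperparameter α enters only through the pseudo-counts spread across the parent configurations, which is how a "large α" Bayesian score differs from the same score with a small equivalent sample size.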

    Describing the impact of health research: a Research Impact Framework

    BACKGROUND: Researchers are increasingly required to describe the impact of their work, e.g. in grant proposals, project reports, press releases and research assessment exercises. Specialised impact assessment studies can be difficult to replicate and may require resources and skills not available to individual researchers. Researchers are often hard-pressed to identify and describe research impacts, and ad hoc accounts do not facilitate comparison across time or projects.

    METHODS: The Research Impact Framework was developed by identifying potential areas of health research impact from the research impact assessment literature and from research assessment criteria, for example those set out by the UK Research Assessment Exercise panels. A prototype of the framework was used to guide an analysis of the impact of selected research projects at the London School of Hygiene and Tropical Medicine. Additional areas of impact were identified in the process, and researchers also provided feedback on which descriptive categories they thought were useful and valid vis-à-vis the nature and impact of their work.

    RESULTS: We identified four broad areas of impact: I. Research-related impacts; II. Policy impacts; III. Service impacts (health and intersectoral); and IV. Societal impacts. Within each of these areas, further descriptive categories were identified. For example, the nature of research impact on policy can be described using the following categorisation, put forward by Weiss: Instrumental use, where research findings drive policy-making; Mobilisation of support, where research provides support for policy proposals; Conceptual use, where research influences the concepts and language of policy deliberations; and Redefining/wider influence, where research leads to rethinking and changing established practices and beliefs.

    CONCLUSION: Researchers, while initially sceptical, found that the Research Impact Framework provided prompts and descriptive categories that helped them systematically identify a range of specific and verifiable impacts related to their work, compared with the ad hoc approaches they had previously used. The framework could also help researchers think through implementation strategies and identify unintended or harmful effects. The standardised structure of the framework facilitates comparison of research impacts across projects and time, which is useful from analytical, management and assessment perspectives.

    Genetic association mapping via evolutionary-based clustering of haplotypes

    No full text

    Predicting the effect of missense mutations on protein function: analysis with Bayesian networks

    BACKGROUND: A number of methods that use both protein structural and evolutionary information are available to predict the functional consequences of missense mutations. However, many of these methods break down if either one of the two types of data is missing. Furthermore, there is a lack of rigorous assessment of how important the different factors are to prediction.

    RESULTS: Here we use Bayesian networks to predict whether or not a missense mutation will affect the function of the protein. Bayesian networks provide a concise representation for inferring models from data, and are known to generalise well to new data. More importantly, they can handle the noisy, incomplete and uncertain nature of biological data. Our Bayesian network achieved performance comparable with previous machine learning methods. The predictive performance of learned model structures was no better than that of a naïve Bayes classifier. However, analysis of the posterior distribution of model structures allows biologically meaningful interpretation of relationships between the input variables.

    CONCLUSION: The ability of the Bayesian network to make predictions when only structural or evolutionary data were observed allowed us to conclude that, for the dataset used, structural information is a significantly better predictor of the functional consequences of a missense mutation than evolutionary information. Analysis of the posterior distribution of model structures revealed that the top three strongest connections with the class node all involved structural nodes. With this in mind, we derived a simplified Bayesian network that used just these three structural descriptors, with performance comparable to that of the all-node network.
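    As a rough illustration of the kind of simplified classifier described in the conclusion, the sketch below fits a naïve Bayes model (a Bayesian network in which the class node is the sole parent of each feature) over three structural descriptors. The descriptor names and training values are placeholders, not the three descriptors actually identified in the paper.

```python
# Minimal sketch of a naive Bayes classifier over a handful of structural
# descriptors of a missense mutation. The three features below are hypothetical
# placeholders and the training data are invented for illustration only.
import numpy as np
from sklearn.naive_bayes import GaussianNB

# Hypothetical structural descriptors per mutation:
#   relative solvent accessibility, buried-charge indicator, change in residue volume
X_train = np.array([
    [0.05, 1.0, 80.0],   # buried, charge change, large volume change
    [0.60, 0.0,  5.0],   # exposed, conservative substitution
    [0.10, 1.0, 40.0],
    [0.75, 0.0, 15.0],
    [0.20, 0.0, 60.0],
    [0.55, 0.0, 10.0],
])
y_train = np.array([1, 0, 1, 0, 1, 0])   # 1 = affects function, 0 = neutral

model = GaussianNB().fit(X_train, y_train)

new_mutation = np.array([[0.08, 1.0, 55.0]])
prob_damaging = model.predict_proba(new_mutation)[0, 1]
print(f"predicted probability the mutation affects function: {prob_damaging:.2f}")
```

    A naïve Bayes model of this kind assumes the descriptors are conditionally independent given the class, which is the simplification the abstract reports performed comparably to richer learned structures.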