13 research outputs found

    On identification problems requiring linked autosomal markers.

    Full text link
    This paper considers identification problems based on DNA marker data. The topics we discuss are general, but we will exemplify them in a simple context. There is DNA available from two persons. There is uncertainty about the relationship between the two individuals and a number of hypotheses describing the possible relationships are available. The task is to determine the most likely pedigree. This problem is fairly standard. However, there are some problems that cannot be solved using DNA from independently segregating loci. For example, the likelihoods for (i) grandparent-grandchild, (ii) uncle-niece and (iii) half-sibs coincide for such DNA data and so these relationships cannot be distinguished on the basis of markers normally used for forensic identification problems: the likelihood ratio comparing any pair of hypotheses will be unity. Sometimes, but not in the examples we consider, other sources of DNA like mtDNA or sex chromosomes can help to distinguish between such equally likely possibilities. Prior information can likewise be of use. For instance, age information can exclude alternative (i) above and also indicate that alternative (iii) is a priori more likely than alternative (ii). More generally, the above problems can be solved using linked autosomal markers. To study the problem in detail and understand how linkage works in this regard, we derive an explicit formula for a pair of linked markers. The formula extends to independent pairs of linked markers. While this approach adds to the understanding of the problem, more markers are required to obtain satisfactory results and then the Lander-Green algorithm is needed. Simulation experiments are presented based on a range of scenarios and we conclude that useful results can be obtained using available freeware (MERLIN and R).
The main message of this paper is that linked autosomal markers deserve greater attention in forensic genetics and that the required laboratory and statistical analyses can be performed based on existing technology and freeware.
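
The indistinguishability described above can be made concrete with a small sketch (ours, not the paper's code; allele frequencies are hypothetical). For a single unlinked marker, the pairwise likelihood depends on the relationship only through the IBD coefficients (k0, k1, k2), and grandparent-grandchild, uncle-niece and half-sibs all share (1/2, 1/2, 0):

```python
def geno_prob(g, p):
    """Hardy-Weinberg probability of an unordered genotype."""
    a, b = g
    return p[a] ** 2 if a == b else 2 * p[a] * p[b]

def cond_given_ibd_allele(g, z, p):
    """P(genotype g | one allele is IBD and equals z, other drawn at random)."""
    a, b = g
    if a == b:
        return p[a] if a == z else 0.0
    if z == a:
        return p[b]
    if z == b:
        return p[a]
    return 0.0

def pair_likelihood(g1, g2, kappa, p):
    """Single-marker likelihood of a genotype pair given IBD coefficients."""
    k0, k1, k2 = kappa
    term0 = geno_prob(g1, p) * geno_prob(g2, p)
    term1 = geno_prob(g1, p) * 0.5 * (
        cond_given_ibd_allele(g2, g1[0], p) + cond_given_ibd_allele(g2, g1[1], p)
    )
    term2 = geno_prob(g1, p) if sorted(g1) == sorted(g2) else 0.0
    return k0 * term0 + k1 * term1 + k2 * term2

p = {"A": 0.1, "B": 0.3, "C": 0.6}   # hypothetical allele frequencies
kappa = (0.5, 0.5, 0.0)              # shared by all three relationships above
L = pair_likelihood(("A", "B"), ("A", "C"), kappa, p)
# since all three hypotheses use the same kappa, the LR between any pair is 1
```

Because the three hypotheses map to the same kappa, no number of unlinked markers can separate them; this is exactly the degeneracy that linked markers break.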

    Adjusting for founder relatedness in a linkage analysis using prior information

    Full text link

    Epidemiology, genetic epidemiology and Mendelian randomisation: more need than ever to attend to detail.

    No full text
    In the current era, with increasing availability of results from genetic association studies, finding genetic instruments for inferring causality in observational epidemiology has become apparently simple. Mendelian randomisation (MR) analyses are hence growing in popularity and, in particular, methods that can incorporate multiple instruments are being rapidly developed for these applications. Such analyses have enormous potential, but they all rely on strong, different, and inherently untestable assumptions. These have to be clearly stated and carefully justified for every application in order to avoid conclusions that cannot be replicated. In this article, we review the instrumental variable assumptions and discuss the popular linear additive structural model. We advocate the use of tests for the null hypothesis of 'no causal effect' and calculation of the bounds for a causal effect, whenever possible, as these do not rely on parametric modelling assumptions. We clarify the difference between a randomised trial and an MR study and we comment on the importance of validating instruments, especially when considering them for joint use in an analysis. We urge researchers to stand by their convictions, if satisfied that the relevant assumptions hold, and to interpret their results causally since that is the only reason for performing an MR analysis in the first place.
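
The test of 'no causal effect' advocated above can be sketched without parametric modelling: under the instrumental variable assumptions, no causal effect implies no instrument-outcome association, so a plain chi-square test of the genotype-by-outcome table suffices. The counts below are hypothetical:

```python
def chi_square_stat(table):
    """Pearson chi-square statistic for a 2D contingency table of counts."""
    rows = [sum(r) for r in table]
    cols = [sum(c) for c in zip(*table)]
    n = sum(rows)
    stat = 0.0
    for i, r in enumerate(table):
        for j, obs in enumerate(r):
            exp = rows[i] * cols[j] / n     # expected count under independence
            stat += (obs - exp) ** 2 / exp
    return stat

# rows: genotype groups (0/1/2 risk alleles); columns: outcome absent/present
table = [[400, 100], [380, 120], [180, 80]]
stat = chi_square_stat(table)
# compare with the chi-square(df=2) 5% critical value of 5.99
```

Rejecting independence here supports a causal effect without assuming any particular structural model for the exposure-outcome relationship.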

    Mendelian randomisation as an instrumental variable approach to causal inference

    Full text link
    In epidemiological research, the causal effect of a potentially modifiable phenotype or exposure on a particular outcome or disease is often of public health interest. Randomised controlled trials to investigate this effect are not always possible and inferences based on observational data can be distorted in the presence of confounders. However, if we know of a gene with an indirect effect on the disease via its effect on the phenotype, it can often be reasonably assumed that the gene is not itself associated with any confounding factors - a phenomenon called Mendelian randomisation. It is well known in the economics and causal literature that these properties define an instrumental variable and allow estimation of the causal effect, despite the confounding, under certain model restrictions. In this paper, we present a formal framework for causal inference based on Mendelian randomisation where the causal effect is defined as the effect of an intervention. Furthermore, we suggest a graphical representation of the data situation using directed acyclic graphs so that model assumptions can be checked by visual inspection. This framework allows us to address limitations of the Mendelian randomisation technique that have often been overlooked in the medical literature.
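
A minimal simulation illustrates the instrumental variable idea described above (the setup and numbers are ours, not the paper's): a genotype G affects a phenotype X, an unmeasured confounder U affects both X and the outcome Y, and the true causal effect of X on Y is 0.5. The ratio (Wald) estimator recovers it where naive regression does not:

```python
import random

random.seed(1)
n = 50_000
beta_true = 0.5
G = [random.choice([0, 1, 2]) for _ in range(n)]            # instrument
U = [random.gauss(0, 1) for _ in range(n)]                  # unmeasured confounder
X = [0.3 * g + u + random.gauss(0, 1) for g, u in zip(G, U)]
Y = [beta_true * x + u + random.gauss(0, 1) for x, u in zip(X, U)]

def cov(a, b):
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    return sum((x - ma) * (y - mb) for x, y in zip(a, b)) / len(a)

iv_estimate = cov(G, Y) / cov(G, X)    # ratio (Wald) estimator, near 0.5
ols_estimate = cov(X, Y) / cov(X, X)   # confounded, biased away from 0.5
```

The contrast between the two estimates is the whole point: the gene is unassociated with U by construction, so dividing the gene-outcome association by the gene-phenotype association cancels the confounded pathway.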

    Structured incorporation of prior information in relationship identification problems

    Full text link
    The objective of this paper is to show how various sources of information can be modelled and integrated to address relationship identification problems. Applications come from areas as diverse as evolution and conservation research, genealogical research in human, plant and animal populations, and forensic problems including paternity cases, identification following disasters, family reunion and immigration issues. We propose assigning a prior probability distribution to the sample space of pedigrees, calculating the likelihood from the DNA data using available software, and obtaining posterior probabilities using Bayes' Theorem. Our emphasis here is on the modelling of this prior information in a formal and consistent manner. We introduce the distinction between local and global prior information, whereby local information usually applies to particular components of the pedigree and global prior information refers to more general features. When it is difficult to decide on a prior distribution, robustness to various choices should be studied. When suitable prior information is not available, a flat prior can be used, which then corresponds to a strict likelihood approach. In practice, prior information is often considered for these problems, but in a generally ad hoc manner. This paper offers a consistent alternative. We emphasise that many practical problems can be addressed using freely available software.
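
The Bayes step described above can be sketched in a few lines (the hypothesis names, prior and likelihood values are made up for illustration; in practice the likelihoods come from software such as MERLIN):

```python
def pedigree_posterior(prior, likelihood):
    """Posterior over pedigree hypotheses via Bayes' Theorem."""
    unnorm = {h: prior[h] * likelihood[h] for h in prior}
    total = sum(unnorm.values())
    return {h: v / total for h, v in unnorm.items()}

# Hypothetical example: ages rule out 'grandparent' and make 'half-sibs'
# twice as plausible a priori as 'uncle-niece'.
prior = {"grandparent": 0.0, "uncle-niece": 1 / 3, "half-sibs": 2 / 3}
likelihood = {"grandparent": 1.2e-8, "uncle-niece": 1.2e-8, "half-sibs": 1.2e-8}
post = pedigree_posterior(prior, likelihood)
# with equal likelihoods (unlinked markers) the posterior simply echoes the prior
```

This also shows why the prior matters for the degenerate hypotheses discussed earlier: when the DNA likelihoods coincide, only the prior can separate the candidates.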

    Improved maximum likelihood reconstruction of complex multi-generational pedigrees.

    No full text
    The reconstruction of pedigrees from genetic marker data is relevant to a wide range of applications. Likelihood-based approaches aim to find the pedigree structure that gives the highest probability to the observed data. Existing methods either entail an exhaustive search and are hence restricted to small numbers of individuals, or they take a more heuristic approach and deliver a solution that will probably have high likelihood but is not guaranteed to be optimal. By encoding the pedigree learning problem as an integer linear program we can exploit efficient optimisation algorithms to construct pedigrees guaranteed to have maximal likelihood for the standard situation where we have complete marker data at unlinked loci and segregation of genes from parents to offspring is Mendelian. Previous work demonstrated efficient reconstruction of pedigrees of up to about 100 individuals. The modified method that we present here is not so restricted: we demonstrate its applicability with simulated data on a real human pedigree structure of over 1600 individuals. It also compares well with a very competitive approximate approach in terms of solving time and accuracy. In addition to identifying a maximum likelihood pedigree, we can obtain any number of pedigrees in decreasing order of likelihood. This is useful for assessing the uncertainty of a maximum likelihood solution and permits model averaging over high likelihood pedigrees when this would be appropriate. More importantly, when the solution is not unique, as will often be the case for large pedigrees, it enables investigation into the properties of maximum likelihood pedigree estimates which has not been possible up to now. Crucially, we also have a means of assessing the behaviour of other approximate approaches which all aim to find a maximum likelihood solution. 
Our approach hence allows us to properly address the question of whether a reasonably high likelihood solution that is easy to obtain is practically as useful as a guaranteed maximum likelihood solution. The efficiency of our method on such large problems bodes well for extensions beyond the standard setting where some pedigree members may be latent, genotypes may be measured with error and markers may be linked.
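
The integer linear programming encoding itself is beyond a short sketch, but the likelihood being maximised is easy to illustrate: Mendelian transmission probabilities score candidate parent assignments, and candidates can be returned in decreasing order of likelihood, as in the paper. Genotype data here are hypothetical:

```python
from itertools import product

def transmission(parent):
    """Probability distribution over the allele a parent transmits."""
    a, b = parent
    return {a: 1.0} if a == b else {a: 0.5, b: 0.5}

def child_likelihood(child, mother, father):
    """P(child genotype | parental genotypes) under Mendelian segregation."""
    total = 0.0
    for (am, pm), (af, pf) in product(transmission(mother).items(),
                                      transmission(father).items()):
        if sorted((am, af)) == sorted(child):
            total += pm * pf
    return total

child = ("A", "B")
candidates = {
    "pair1": (("A", "A"), ("B", "B")),   # transmits A and B with certainty
    "pair2": (("A", "B"), ("A", "B")),
    "pair3": (("A", "A"), ("A", "A")),   # incompatible with the child
}
ranked = sorted(candidates,
                key=lambda k: child_likelihood(child, *candidates[k]),
                reverse=True)
```

Exhaustive scoring like this only scales to tiny problems; the point of the paper's ILP formulation is to reach the same guaranteed optimum on pedigrees of over 1600 individuals.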

    Mixtures with relatives: a pedigree perspective.

    No full text
    DNA mixture evidence pertains to cases where several individuals may have contributed to a biological stain. Statistical methods and software for such problems are available and a large number of cases can be handled adequately. However, one class of mixture problems remains untreated in full generality in the literature, namely when the contributors may be related. Disregarding a plausible close relative of the perpetrator as an alternative contributor (an identical twin is the most extreme case) may lead to overestimating the evidence against a suspect. Existing methods accommodate only pairwise relationships, such as the case where the suspect and the victim are siblings. In this paper we consider relationships in full generality, conveniently represented by pedigrees. In particular, these pedigrees may involve inbreeding, for instance when the parents of an individual of interest are first cousins. Furthermore, our framework handles situations where the opposing parties in a court case (prosecution and defence) propose different family relationships. Consequently, our approach combines classical mixture and kinship problems. The basic idea of this paper is to formulate the problem in a way that allows for the exploitation of currently available methods and software designed originally for linkage applications. We have developed a freely available R package, euroMix, based on another package, paramlink, and we illustrate the ideas and methods on real and simulated data.

    Family based studies and genetic epidemiology: theory and practice

    Full text link
    Family-based studies have underpinned many successes in uncovering the causes of monogenic and oligogenic diseases. Now research is focussing on the identification and characterisation of genes underlying common diseases and it is widely accepted that these studies will require large population-based samples. Population-based family study designs have the potential to facilitate the analysis of the effects of both genes and environment. These types of studies integrate the population-based approaches of classic epidemiology and the methods enabling the analysis of correlations between relatives sharing both genes and environment. The extent to which such studies are feasible will depend upon population- and disease-specific factors. To review this topic, a symposium was held to present and discuss the costs, requirements and advantages of population-based family study designs. This article summarises the features of the meeting held at The University of Sheffield in August 2006.

    Assessing the suitability of summary data for two-sample Mendelian randomization analyses using MR-Egger regression: the role of the I2 statistic.

    No full text
    Background: MR-Egger regression has recently been proposed as a method for Mendelian randomization (MR) analyses incorporating summary data estimates of causal effect from multiple individual variants, which is robust to invalid instruments. It can be used to test for directional pleiotropy and provides an estimate of the causal effect adjusted for its presence. MR-Egger regression provides a useful additional sensitivity analysis to the standard inverse variance weighted (IVW) approach that assumes all variants are valid instruments. Both methods use weights that consider the single nucleotide polymorphism (SNP)-exposure associations to be known, rather than estimated. We call this the 'NO Measurement Error' (NOME) assumption. Causal effect estimates from the IVW approach exhibit weak instrument bias whenever the genetic variants utilized violate the NOME assumption, a violation that can be reliably measured using the F-statistic. The effect of NOME violation on MR-Egger regression has yet to be studied. Methods: An adaptation of the I^2 statistic from the field of meta-analysis is proposed to quantify the strength of NOME violation for MR-Egger. It lies between 0 and 1, and indicates the expected relative bias (or dilution) of the MR-Egger causal estimate in the two-sample MR context. We call it I^2_GX. The method of simulation extrapolation is also explored to counteract the dilution. Their joint utility is evaluated using simulated data and applied to a real MR example. Results: In simulated two-sample MR analyses we show that, when a causal effect exists, the MR-Egger estimate of causal effect is biased towards the null when NOME is violated, and the stronger the violation (as indicated by lower values of I^2_GX), the stronger the dilution. When additionally all genetic variants are valid instruments, the type I error rate of the MR-Egger test for pleiotropy is inflated and the causal effect underestimated. Simulation extrapolation is shown to substantially mitigate these adverse effects. We demonstrate our proposed approach for a two-sample summary data MR analysis to estimate the causal effect of low-density lipoprotein on heart disease risk. A high value of I^2_GX close to 1 indicates that dilution does not materially affect the standard MR-Egger analyses for these data. Conclusions: Care must be taken to assess the NOME assumption via the I^2_GX statistic before implementing standard MR-Egger regression in the two-sample summary data context. If I^2_GX is sufficiently low (less than 90%), inferences from the method should be interpreted with caution and adjustment methods considered.
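
A sketch of the I^2-type diagnostic described above (our reading of it, with made-up numbers): Cochran's Q is computed from the summary SNP-exposure estimates and their standard errors, then rescaled to (Q - df) / Q, so values near 1 indicate that dilution from NOME violation is negligible:

```python
def i_squared_gx(gamma_hat, se):
    """I^2-type statistic from summary SNP-exposure estimates and SEs."""
    w = [1.0 / s ** 2 for s in se]                          # inverse-variance weights
    mean = sum(wi * g for wi, g in zip(w, gamma_hat)) / sum(w)
    q = sum(wi * (g - mean) ** 2 for wi, g in zip(w, gamma_hat))  # Cochran's Q
    df = len(gamma_hat) - 1
    return max(0.0, (q - df) / q)

gamma_hat = [0.10, 0.20, 0.30, 0.15, 0.25]   # hypothetical SNP-exposure betas
se = [0.01] * 5                               # small SEs: precisely estimated
i2 = i_squared_gx(gamma_hat, se)
# per the paper's rule of thumb, values below 0.9 call for caution or SIMEX
```

With large standard errors relative to the spread of the estimates, the statistic drops towards 0, flagging that measurement error in the SNP-exposure associations would dilute the MR-Egger estimate.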

    Using multiple genetic variants as instrumental variables for modifiable risk factors.

    Full text link
    Mendelian randomisation analyses use genetic variants as instrumental variables (IVs) to estimate causal effects of modifiable risk factors on disease outcomes. Genetic variants typically explain a small proportion of the variability in risk factors; hence Mendelian randomisation analyses can require large sample sizes. However, an increasing number of genetic variants have been found to be robustly associated with disease-related outcomes in genome-wide association studies. Use of multiple instruments can improve the precision of IV estimates, and also permit examination of underlying IV assumptions. We discuss the use of multiple genetic variants in Mendelian randomisation analyses with continuous outcome variables where all relationships are assumed to be linear. We describe possible violations of IV assumptions, and how multiple instrument analyses can be used to identify them. We present an example using four adiposity-associated genetic variants as IVs for the causal effect of fat mass on bone density, using data on 5509 children enrolled in the ALSPAC birth cohort study. We also use simulation studies to examine the effect of different sets of IVs on precision and bias. When each instrument independently explains variability in the risk factor, use of multiple instruments increases the precision of IV estimates. However, inclusion of weak instruments could increase finite sample bias. Missing data on multiple genetic variants can diminish the available sample size, compared with single instrument analyses. In simulations with additive genotype-risk factor effects, IV estimates using a weighted allele score had similar properties to estimates using multiple instruments. Under the correct conditions, multiple instrument analyses are a promising approach for Mendelian randomisation studies. Further research is required into multiple imputation methods to address missing data issues in IV estimation.
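
The weighted allele score mentioned above collapses many variants into a single instrument per individual, S_i = sum_j w_j * g_ij, with weights ideally taken from external SNP-risk-factor estimates. A minimal sketch with hypothetical genotypes and weights:

```python
def weighted_allele_score(genotype_rows, weights):
    """Combine allele counts at several variants into one score per person."""
    return [sum(w * g for w, g in zip(weights, row)) for row in genotype_rows]

genotypes = [[0, 1, 2, 1],    # risk-allele counts at 4 variants, individual 1
             [2, 0, 1, 0]]    # individual 2
weights = [0.12, 0.08, 0.05, 0.20]   # hypothetical external effect estimates
scores = weighted_allele_score(genotypes, weights)
# each score is then used as a single instrument, e.g. in a ratio estimator
```

Using one composite score rather than many separate instruments trades some diagnostic ability (per-variant IV checks) for the precision and weak-instrument behaviour the abstract describes.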