347 research outputs found

    Family names as indicators of Britain’s changing regional geography

    In recent years the geography of surnames has become increasingly researched in genetics, epidemiology, linguistics and geography. Surnames provide a useful data source for the analysis of population structure, migrations, genetic relationships and levels of cultural diffusion and interaction between communities. The Worldnames database (www.publicprofiler.org/worldnames) of 300 million people from 26 countries, georeferenced in many cases to the equivalent of UK Postcode level, provides a rich source of surname data. This work has focused on the UK component of this dataset, that is, the 2001 Enhanced Electoral Roll, georeferenced to Output Area level. Exploratory analysis of the distribution of surnames across the UK shows that clear regions exist, such as Cornwall, Central Wales and Scotland, in agreement with anecdotal evidence. This study is concerned with applying a wide range of methods to the UK dataset to test their sensitivity and consistency in identifying surname regions. Methods used thus far are hierarchical and non-hierarchical clustering, barrier algorithms such as the Monmonier Algorithm, and Multidimensional Scaling. These, to varying degrees, have highlighted the regionality of UK surnames and provide strong foundations for future work and refinement in the UK context. Establishing a firm methodology has enabled comparisons to be made with data from the Great British 1881 census, developing insights into population movements from within and outside Great Britain.

    Computation of restricted maximum likelihood estimates of variance components

    The method preferred by animal breeders for the estimation of variance components is restricted maximum likelihood (REML). Various iterative algorithms have been proposed for computing REML estimates. Five different computational strategies for implementing such an algorithm were compared in terms of flops (floating-point operations). These strategies were based respectively on the LDL' decomposition, the W transformation, the SWEEP method, tridiagonalization and diagonalization of the coefficient matrix of the mixed-model equations. The computational requirements of the orthogonal transformations employed in tridiagonalization and diagonalization were found to be rather extensive. However, these transformations are performed prior to the initiation of the iterative estimation process and need not be repeated during the remainder of the process. Subsequent to either diagonalization or tridiagonalization, the flops required per iteration are minimal. Thus, for most applications of mixed-effects linear models with a single set of random effects, the use of an orthogonal transformation prior to the initiation of the iterative process is recommended. For most animal breeding applications, tridiagonalization will generally be more efficient than diagonalization. In most animal breeding applications, the coefficient matrix of the mixed-model equations is extremely sparse and of very large order. The use of sparse-matrix techniques for the numerical evaluation of the log-likelihood function and its first- and second-order partial derivatives was investigated in the case of the simple sire and animal models. Instead of applying these techniques directly to the coefficient matrix of the mixed-model equations to obtain the Cholesky factor, they were used to obtain the Cholesky factor indirectly by carrying out a QR decomposition of an augmented model matrix. The feasibility of the computational method for the simple sire model was investigated by carrying out the most computationally intensive part of this method (the QR decomposition) for an animal breeding data set comprising 180,994 records and 1,264 sires. The total CPU time required for this part (using an NAS AS/9160 computer) was approximately 75,000 seconds.
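The indirect route to the Cholesky factor described in the abstract above can be illustrated with a small dense sketch. It assumes a simple sire model with unrelated sires, so that the coefficient matrix of the mixed-model equations is [[X'X, X'Z], [Z'X, Z'Z + lam*I]], where `lam` stands for the ratio of residual to sire variance. The function name `mme_cholesky_via_qr`, the toy design matrices and the value of `lam` are illustrative assumptions, not taken from the thesis, and the sparse-matrix techniques the thesis relies on are not reproduced here.

```python
import numpy as np

def mme_cholesky_via_qr(X, Z, lam):
    """Return upper-triangular R with R.T @ R equal to the mixed-model coefficient matrix.

    Hypothetical helper: builds the augmented model matrix
        A = [[X, Z], [0, sqrt(lam) * I]]
    so that A.T @ A is the coefficient matrix, then takes the R factor of A.
    """
    p = X.shape[1]  # number of fixed effects
    q = Z.shape[1]  # number of random (sire) effects
    top = np.hstack([X, Z])
    bottom = np.hstack([np.zeros((q, p)), np.sqrt(lam) * np.eye(q)])
    A = np.vstack([top, bottom])
    R = np.linalg.qr(A, mode="r")
    # QR leaves the signs of R's rows arbitrary; flip rows so the diagonal is positive,
    # which matches the convention used for the Cholesky factor.
    signs = np.sign(np.diag(R))
    signs[signs == 0] = 1.0
    return signs[:, None] * R

# Toy data (assumed for illustration): 12 records, one overall mean, 3 sires.
n, q = 12, 3
X = np.ones((n, 1))
sire = np.arange(n) % q          # sire index of each record
Z = np.eye(q)[sire]              # incidence matrix of sires
lam = 15.0                       # illustrative variance ratio

R = mme_cholesky_via_qr(X, Z, lam)

# Direct route for comparison: form the coefficient matrix and factor it.
C = np.block([[X.T @ X, X.T @ Z],
              [Z.T @ X, Z.T @ Z + lam * np.eye(q)]])
L = np.linalg.cholesky(C)        # lower-triangular, C = L @ L.T
```

In this sketch `R.T` agrees with `L` up to rounding, while the QR route never forms the cross-products in `C` explicitly.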

    Inference about complex relationships using peak height data from DNA mixtures

    In both criminal and civil cases there is an increasing demand for the analysis of DNA mixtures involving relationships. The goal might be, for example, to identify the contributors to a DNA mixture where the donors may be related, or to infer the relationship between individuals based on a mixture. This paper introduces an approach to modelling and computation for DNA mixtures involving contributors with arbitrarily complex relationships. It builds on an extension of Jacquard's condensed coefficients of identity to specify and compute with joint relationships, not only pairwise ones, including the possibility of inbreeding. The methodology developed is applied to two casework examples involving a missing person, and to simulation studies of performance, in which the ability of the methodology to recover complex relationship information from synthetic data with known 'true' family structure is examined. The methods used to analyse the examples are implemented in the new KinMix R package, which extends the DNAmixtures package to allow for modelling DNA mixtures with related contributors. KinMix inherits from DNAmixtures the capacity to deal with mixtures with many contributors in a time- and space-efficient way. Comment: 29 pages, 12 figures, 20 tables; V2 has different casework examples, and general minor edits; V3 has general edits following review, including lengthier exposition; V4 has further explanation, and a supplementary appendix on related software.

    High performance computing for large-scale genomic prediction

    In the past decades genetics has been studied intensively, leading to the knowledge that DNA is the molecule behind genetic inheritance. Since the turn of the millennium, next-generation sequencing methods have made it possible to sample this DNA at an ever-decreasing cost. Animal and plant breeders have always made use of genetic information to predict the agronomic performance of new breeds. While this genetic information was previously gathered from the pedigree of the population under study, genomic information from the DNA also makes it possible to deduce correlations between individuals that do not share any known ancestors, leading to so-called genomic prediction of agronomic performance. Nowadays, the number of informative samples that can be taken from a genome ranges from one thousand to one million. Using all this information in a breeding context where agronomic performance is predicted and optimized for different environmental conditions is not a straightforward task. Moreover, the number of individuals for which this information is available keeps growing, and thus sophisticated computational methods are required for analyzing these large-scale genomic data sets. This thesis introduces some concepts of high performance computing in a genomic prediction context and shows that analyzing phenotypic records of large numbers of genotyped individuals leads to better prediction accuracy of agronomic performance in different environments. Finally, it is shown that the parts of the DNA that influence agronomic performance under certain environmental conditions can be pinpointed, and that breeders can use this knowledge to select individuals that thrive better in the targeted environment.

    An analysis of combinatorial search spaces for a class of NP-hard problems

    2011 Spring. Includes bibliographical references. Given a finite but very large set of states X and a real-valued objective function ƒ defined on X, combinatorial optimization refers to the problem of finding elements of X that maximize (or minimize) ƒ. Many combinatorial search algorithms employ some perturbation operator to hill-climb in the search space. Such perturbative local search algorithms are state of the art for many classes of NP-hard combinatorial optimization problems such as maximum k-satisfiability, scheduling, and problems of graph theory. In this thesis we analyze combinatorial search spaces by expanding the objective function into a (sparse) series of basis functions. While most analyses of the distribution of function values in the search space must rely on empirical sampling, the basis function expansion allows us to directly study the distribution of function values across regions of states for combinatorial problems without the need for sampling. We concentrate on objective functions that can be expressed as bounded pseudo-Boolean functions, which are NP-hard to optimize in general. We use the basis expansion to construct a polynomial-time algorithm for exactly computing constant-degree moments of the objective function ƒ over arbitrarily large regions of the search space. On functions with restricted codomains, these moments are related to the true distribution by a system of linear equations. Given low moments supplied by our algorithm, we construct bounds on the true distribution of ƒ over regions of the space using a linear programming approach. A straightforward relaxation allows us to efficiently approximate the distribution and hence quickly estimate the count of states in a given region that have certain values under the objective function. The analysis is also useful for characterizing properties of specific combinatorial problems. For instance, by connecting search space analysis to the theory of inapproximability, we prove that the bound specified by Grover's maximum principle for the Max-Ek-Lin-2 problem is sharp. Moreover, we use the framework to prove that certain configurations are forbidden in regions of the Max-3-Sat search space, supplying the first theoretical confirmation of empirical results by others. Finally, we show that theoretical results can be used to drive the design of algorithms in a principled manner by using the search space analysis developed in this thesis in algorithmic applications. First, information obtained from our moment-retrieving algorithm can be used to direct a hill-climbing search across plateaus in the Max-k-Sat search space. Second, the analysis can be used to control the mutation rate of a (1+1) evolutionary algorithm on bounded pseudo-Boolean functions so that the expected fitness of the offspring of each search point is maximized. For these applications, knowledge of the search space structure supplied by the analysis translates to significant gains in the performance of search.
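The idea of reading moments off a basis expansion instead of sampling can be sketched for the simplest case, the whole search space. The sketch assumes the Walsh basis, in which the basis function for a set S of bits is (-1) raised to the number of bits of S set in x. Under that assumption the mean of ƒ over all states is the coefficient of the empty set, and the variance is the sum of squares of all other coefficients. The helper names and the toy Max-3-Sat instance are hypothetical, and the thesis's algorithm for moments over restricted regions is not reproduced here.

```python
from itertools import product

def walsh_coefficients(bits, table):
    """Walsh coefficients of one subfunction that depends only on the variables in `bits`.

    `table` maps each 0/1 tuple over those variables to the subfunction's value.
    Returns {frozenset_of_variable_indices: coefficient}.
    """
    k = len(bits)
    coeffs = {}
    for mask in product((0, 1), repeat=k):
        total = 0.0
        for x in product((0, 1), repeat=k):
            parity = sum(m * b for m, b in zip(mask, x)) % 2
            total += table[x] * (-1) ** parity
        key = frozenset(v for v, m in zip(bits, mask) if m)
        coeffs[key] = total / 2 ** k
    return coeffs

def exact_mean_and_variance(subfunctions):
    """Mean and variance of a sum of bounded subfunctions over all 2^n states, without enumeration."""
    w = {}
    for bits, table in subfunctions:
        for key, c in walsh_coefficients(bits, table).items():
            w[key] = w.get(key, 0.0) + c
    mean = w.get(frozenset(), 0.0)
    variance = sum(c * c for key, c in w.items() if key)
    return mean, variance

def clause_table(positive):
    """Truth table of a clause: 1.0 if any literal is satisfied. positive[i] is True for a positive literal."""
    k = len(positive)
    return {x: float(any(b == int(s) for b, s in zip(x, positive)))
            for x in product((0, 1), repeat=k)}

# Toy Max-3-Sat instance on 5 variables (assumed for illustration).
n = 5
clauses = [((0, 1, 2), (True, False, True)),
           ((1, 3, 4), (False, False, True)),
           ((0, 2, 4), (True, True, False)),
           ((2, 3, 4), (False, True, True))]
subfunctions = [(bits, clause_table(signs)) for bits, signs in clauses]
mean, variance = exact_mean_and_variance(subfunctions)

# Brute-force check, feasible only because n is tiny.
values = [sum(t[tuple(x[i] for i in bits)] for bits, t in subfunctions)
          for x in product((0, 1), repeat=n)]
bf_mean = sum(values) / len(values)
bf_variance = sum((v - bf_mean) ** 2 for v in values) / len(values)
```

Each subfunction costs work exponential only in its own number of variables, so for a fixed bound on clause length the total cost grows linearly with the number of clauses. In this instance each clause is satisfied by 7 of its 8 local assignments, so `mean` comes out as 3.5.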

    Application of DNA marker systems to test for genetic imprints of habitat fragmentation in Juniperus communis L. on different spatial and temporal scales-Integration of scientific knowledge into conservation measures

    The formerly continuous and widespread juniper (Juniperus communis L.) populations are currently divided into small and fragmented relics in numerous European countries. Additionally, many of these populations suffer from an absence of natural regeneration and consist predominantly of senescent individuals. In order to maintain juniper as a valuable element of the cultural landscape in Europe, I considered restoration management to be indispensable. The goal of the present thesis is based on this consideration. Using different DNA marker systems, I first tested various juniper populations on different spatial and temporal scales for potential imprints of habitat fragmentation. My intention was then to evaluate the analysed populations on the regional scale in terms of nature conservation and to develop a scientifically based conservation management plan focused on planting activities. In a Europe-wide study I used an AFLP marker approach to reconstruct aspects of the biogeographic history of juniper and to detect potential distinct genetic lineages. Those lineages are supposed to delineate geographic regions within which plant material can be interchanged. The genetic analysis revealed no distinct genetic lineages. Along with other scientific findings about juniper, the results point to a glacial persistence of juniper in Central Europe. I suppose that during the last glacial period this species managed to survive in several small and suitable habitats, which were probably diffusely scattered and permanently changing. Moreover, I hypothesise that recurrent fragmentation and founder events are highly likely to have occurred in this species from the last glacial maximum (LGM) up until today. On a regional scale, i.e. in the Rhenish Uplands (RU) (West Germany), I used nSSR markers to gain insights into the genetic structure and variation of eight relict juniper populations. 
Such knowledge is necessary for planting activities in order to prevent negative effects in the respective populations. At the same time, I tested these populations for genetic imprints of the recent habitat fragmentation. The investigated nSSR loci in juniper were characterised within the scope of this thesis. A detailed validation of the newly developed nSSR markers is presented. In addition, I performed a case study by investigating the genetic diversity and differentiation of different pollen clouds, which have become reproductively effective in the filial generations (embryos). For this purpose, dedicated computer software was developed. Next, a palynological study was conducted to determine physical pollen flow distances of juniper pollen grains. In terms of plantings, I assume that such data are relevant for the spatial organisation of already existing juniper individuals and the respective plant material. Considerable levels of genetic diversity and an absence of recent genetic bottlenecks in all populations, as well as an absence of an isolation-by-distance effect, led me to the assumption that the current habitat fragmentation has not yet affected the genetic diversity in the investigated juniper populations. Instead, I postulate that the genetic diversity and differentiation have been ‘frozen’ since the recent fragmentation started. The genetic diversity of the filial generation is not reduced in comparison to the adult generation, although the palynological study points to locally restricted pollen flow distances. After defining a ‘Leitbild’ for viable juniper populations based on widely accepted population ecological and genetic theories, I used the genetic results to evaluate the analysed populations of the RU in terms of nature conservation. The reasons why this evaluation was not satisfactory are discussed in detail. 
Furthermore, I commented critically on the evaluation criteria of the ‘Leitbild’ and their respective quality demands with regard to the life-history traits of juniper and its biogeographic history as presented within this thesis. Based on the presented genetic results and on the apparent absence of natural regeneration in all populations, it remains uncertain whether the current habitat fragmentation will affect the genetic diversity and structure of the eight populations deleteriously in the future. However, if juniper does not resume natural regeneration, this will certainly lead to the extinction of the respective populations, because without replacement, senescent individuals will gradually die off. Thus, in the distant future juniper will probably become extinct in areas where it does not regenerate naturally. Therefore, I developed a sustainable, demographically and genetically substantiated restoration management plan as a final outcome of this thesis. It is based on the genetic analysis presented here and on expert knowledge, and it includes guidelines and recommendations concerning the collection of plant material, its treatment in the greenhouse and plantings in the field.

    Marker-based prediction of hybrid maize performance using genetic evaluation data


    Characterization of the genetic structure of the Azorean population

    Doctoral thesis in Biochemistry (Molecular Genetics), presented to the Universidade de Lisboa through the Faculdade de Ciências, 200