347 research outputs found

Family names as indicators of Britain’s changing regional geography

Author: Cheshire JA; Longley PA; Mateos P
Publication venue: Centre for Advanced Spatial Analysis, UCL
Publication date: 01/01/2009

In recent years the geography of surnames has become increasingly researched in genetics, epidemiology, linguistics and geography. Surnames provide a useful data source for the analysis of population structure, migrations, genetic relationships and levels of cultural diffusion and interaction between communities. The Worldnames database (www.publicprofiler.org/worldnames) of 300 million people from 26 countries, georeferenced in many cases to the equivalent of UK Postcode level, provides a rich source of surname data. This work has focused on the UK component of this dataset, that is, the 2001 Enhanced Electoral Roll, georeferenced to Output Area level. Exploratory analysis of the distribution of surnames across the UK shows that clear regions exist, such as Cornwall, Central Wales and Scotland, in agreement with anecdotal evidence. This study is concerned with applying a wide range of methods to the UK dataset to test their sensitivity and consistency in identifying surname regions. Methods used thus far are hierarchical and non-hierarchical clustering, barrier algorithms such as the Monmonier Algorithm, and Multidimensional Scaling. These, to varying degrees, have highlighted the regionality of UK surnames and provide strong foundations for future work and refinement in the UK context. Establishing a firm methodology has enabled comparisons to be made with data from the 1881 census of Great Britain, developing insights into population movements from within and outside Great Britain.
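
The abstract above names hierarchical clustering as one of the methods applied to regional surname distributions but gives no implementation details. The following is only a minimal sketch of the general idea, not the authors' method: the region labels, surname labels and counts are invented for illustration, and both the total variation distance and average linkage are assumed choices.

```python
# Toy surname counts per region (hypothetical data, for illustration only).
counts = {
    "region_a": {"s1": 80, "s2": 15, "s3": 5},
    "region_b": {"s1": 70, "s2": 20, "s3": 10},
    "region_c": {"s1": 10, "s2": 20, "s3": 70},
    "region_d": {"s1": 5, "s2": 15, "s3": 80},
}


def relative_frequencies(region_counts):
    """Convert raw surname counts into a frequency profile that sums to 1."""
    total = sum(region_counts.values())
    return {name: c / total for name, c in region_counts.items()}


def distance(p, q):
    """Total variation distance between two surname frequency profiles."""
    names = set(p) | set(q)
    return 0.5 * sum(abs(p.get(n, 0.0) - q.get(n, 0.0)) for n in names)


def average_linkage(profiles, k):
    """Agglomerative clustering: merge the closest pair until k clusters remain."""
    clusters = [[region] for region in sorted(profiles)]

    def cluster_distance(a, b):
        pairs = [(x, y) for x in a for y in b]
        return sum(distance(profiles[x], profiles[y]) for x, y in pairs) / len(pairs)

    while len(clusters) > k:
        candidates = [
            (i, j)
            for i in range(len(clusters))
            for j in range(i + 1, len(clusters))
        ]
        i, j = min(
            candidates,
            key=lambda ij: cluster_distance(clusters[ij[0]], clusters[ij[1]]),
        )
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return clusters


profiles = {region: relative_frequencies(c) for region, c in counts.items()}
clusters = average_linkage(profiles, k=2)
print(clusters)
```

With these toy counts, regions whose surname profiles are similar end up in the same cluster. A real analysis at Output Area level would need a scalable library implementation rather than this quadratic search.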

UCL Discovery

Computation of restricted maximum likelihood estimates of variance components

Author: Takahashi Hiroshi
Publication venue: Iowa State University Digital Repository
Publication date: 01/01/1993

The method preferred by animal breeders for the estimation of variance components is restricted maximum likelihood (REML). Various iterative algorithms have been proposed for computing REML estimates. Five different computational strategies for implementing such an algorithm were compared in terms of flops (floating-point operations). These strategies were based respectively on the LDL' decomposition, the W transformation, the SWEEP method, tridiagonalization and diagonalization of the coefficient matrix of the mixed-model equations. The computational requirements of the orthogonal transformations employed in tridiagonalization and diagonalization were found to be rather extensive. However, these transformations are performed prior to the initiation of the iterative estimation process and need not be repeated during the remainder of the process. Subsequent to either diagonalization or tridiagonalization, the flops required per iteration are minimal. Thus, for most applications of mixed-effects linear models with a single set of random effects, the use of an orthogonal transformation prior to the initiation of the iterative process is recommended. For most animal breeding applications, tridiagonalization will generally be more efficient than diagonalization. In most animal breeding applications, the coefficient matrix of the mixed-model equations is extremely sparse and of very large order. The use of sparse-matrix techniques for the numerical evaluation of the log-likelihood function and its first- and second-order partial derivatives was investigated in the case of the simple sire and animal models. Instead of applying these techniques directly to the coefficient matrix of the mixed-model equations to obtain the Cholesky factor, they were used to obtain the Cholesky factor indirectly by carrying out a QR decomposition of an augmented model matrix. The feasibility of the computational method for the simple sire model was investigated by carrying out the most computationally intensive part of this method (which is the part consisting of the QR decomposition) for an animal breeding data set comprising 180,994 records and 1,264 sires. The total CPU time required for this part (using an NAS AS/9160 computer) was approximately 75,000 seconds.
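
The abstract above mentions obtaining the Cholesky factor of the mixed-model coefficient matrix indirectly through a QR decomposition of an augmented model matrix, without spelling out the construction. The sketch below shows one standard way this can work for a simple sire model, under stated assumptions: the augmented matrix is taken to be W = [X Z; 0 sqrt(lambda) I], so that W'W equals the coefficient matrix C, and the R factor of W (with positive diagonal) coincides with the upper Cholesky factor of C. The tiny design matrices, the value of `lam` and all helper names are hypothetical, and dense Gram-Schmidt stands in for the sparse techniques the thesis investigated.

```python
import math


def transpose(a):
    return [list(col) for col in zip(*a)]


def matmul(a, b):
    bt = transpose(b)
    return [[sum(x * y for x, y in zip(row, col)) for col in bt] for row in a]


def qr_r_factor(w):
    """Upper-triangular R of a thin QR of w, via modified Gram-Schmidt."""
    cols = [list(c) for c in zip(*w)]
    n = len(cols)
    r = [[0.0] * n for _ in range(n)]
    for j in range(n):
        r[j][j] = math.sqrt(sum(x * x for x in cols[j]))
        q = [x / r[j][j] for x in cols[j]]
        for k in range(j + 1, n):
            r[j][k] = sum(a * b for a, b in zip(q, cols[k]))
            cols[k] = [b - r[j][k] * a for a, b in zip(q, cols[k])]
    return r


def cholesky_upper(c):
    """Upper Cholesky factor U with U'U = c, computed directly from c."""
    n = len(c)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = c[i][j] - sum(low[i][k] * low[j][k] for k in range(j))
            low[i][j] = math.sqrt(s) if i == j else s / low[j][j]
    return transpose(low)


# Hypothetical toy data: 5 records, an overall mean, and 2 sires.
x = [[1.0], [1.0], [1.0], [1.0], [1.0]]
z = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [0.0, 1.0], [0.0, 1.0]]
lam = 2.0  # assumed ratio of residual variance to sire variance

# Augmented model matrix W = [X Z; 0 sqrt(lam) I].
w = [xr + zr for xr, zr in zip(x, z)]
w += [[0.0, math.sqrt(lam), 0.0], [0.0, 0.0, math.sqrt(lam)]]

c = matmul(transpose(w), w)     # mixed-model coefficient matrix
r_from_qr = qr_r_factor(w)      # never forms c explicitly
r_from_chol = cholesky_upper(c)
```

Working on W rather than on W'W avoids forming the cross-product matrix, which is the usual numerical motivation for the QR route.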

Digital Repository @ Iowa State University (ISU)

Inference about complex relationships using peak height data from DNA mixtures

Author: Green Peter J; Mortera Julia
Publication venue: Wiley
Publication date: 09/02/2021

In both criminal cases and civil cases there is an increasing demand for the analysis of DNA mixtures involving relationships. The goal might be, for example, to identify the contributors to a DNA mixture where the donors may be related, or to infer the relationship between individuals based on a mixture. This paper introduces an approach to modelling and computation for DNA mixtures involving contributors with arbitrarily complex relationships. It builds on an extension of Jacquard's condensed coefficients of identity, to specify and compute with joint relationships, not only pairwise ones, including the possibility of inbreeding. The methodology developed is applied to two casework examples involving a missing person, and to simulation studies of performance, in which the ability of the methodology to recover complex relationship information from synthetic data with known 'true' family structure is examined. The methods used to analyse the examples are implemented in the new KinMix R package, which extends the DNAmixtures package to allow for modelling DNA mixtures with related contributors. KinMix inherits from DNAmixtures the capacity to deal with mixtures with many contributors, in a time- and space-efficient way. Comment: 29 pages, 12 figures, 20 tables; V2 has different casework examples, and general minor edits; V3 has general edits following review, including lengthier exposition; V4 has further explanation, and a supplementary appendix on related software.

arXiv.org e-Print Archive

Explore Bristol Research

High performance computing for large-scale genomic prediction

Author: De Coninck Arne
Publication venue: Ghent University. Faculty of Bioscience Engineering
Publication date: 01/01/2016

In past decades genetics was studied intensively, leading to the knowledge that DNA is the molecule behind genetic inheritance, and since the start of the new millennium next-generation sequencing methods have made it possible to sample this DNA at an ever-decreasing cost. Animal and plant breeders have always made use of genetic information to predict the agronomic performance of new breeds. While this genetic information was previously gathered from the pedigree of the population under study, genomic information from the DNA makes it possible to also deduce correlations between individuals that do not share any known ancestors, leading to so-called genomic prediction of agronomic performance. Nowadays, the number of informative samples that can be taken from a genome ranges from one thousand to one million. Using all this information in a breeding context where agronomic performance is predicted and optimized for different environmental conditions is not a straightforward task. Moreover, the number of individuals for which this information is available keeps growing, and thus sophisticated computational methods are required for analyzing these large-scale genomic data sets. This thesis introduces some concepts of high performance computing in a genomic prediction context and shows that analyzing phenotypic records of large numbers of genotyped individuals leads to a better prediction accuracy of the agronomic performance in different environments. Finally, it is shown that the parts of the DNA that influence the agronomic performance under certain environmental conditions can be pinpointed, and this knowledge can thus be used by breeders to select individuals that thrive better in the targeted environment.

Ghent University Academic Bibliography

An analysis of combinatorial search spaces for a class of NP-hard problems

Author: Sutton Andrew M.
Publication venue: Colorado State University. Libraries
Publication date: 01/01/2011

2011 Spring. Includes bibliographical references. Given a finite but very large set of states X and a real-valued objective function ƒ defined on X, combinatorial optimization refers to the problem of finding elements of X that maximize (or minimize) ƒ. Many combinatorial search algorithms employ some perturbation operator to hill-climb in the search space. Such perturbative local search algorithms are state of the art for many classes of NP-hard combinatorial optimization problems such as maximum k-satisfiability, scheduling, and problems of graph theory. In this thesis we analyze combinatorial search spaces by expanding the objective function into a (sparse) series of basis functions. While most analyses of the distribution of function values in the search space must rely on empirical sampling, the basis function expansion allows us to directly study the distribution of function values across regions of states for combinatorial problems without the need for sampling. We concentrate on objective functions that can be expressed as bounded pseudo-Boolean functions, which are NP-hard to solve in general. We use the basis expansion to construct a polynomial-time algorithm for exactly computing constant-degree moments of the objective function ƒ over arbitrarily large regions of the search space. On functions with restricted codomains, these moments are related to the true distribution by a system of linear equations. Given low moments supplied by our algorithm, we construct bounds on the true distribution of ƒ over regions of the space using a linear programming approach. A straightforward relaxation allows us to efficiently approximate the distribution and hence quickly estimate the count of states in a given region that have certain values under the objective function. The analysis is also useful for characterizing properties of specific combinatorial problems. For instance, by connecting search space analysis to the theory of inapproximability, we prove that the bound specified by Grover's maximum principle for the Max-Ek-Lin-2 problem is sharp. Moreover, we use the framework to prove certain configurations are forbidden in regions of the Max-3-Sat search space, supplying the first theoretical confirmation of empirical results by others. Finally, we show that theoretical results can be used to drive the design of algorithms in a principled manner by using the search space analysis developed in this thesis in algorithmic applications. First, information obtained from our moment retrieving algorithm can be used to direct a hill-climbing search across plateaus in the Max-k-Sat search space. Second, the analysis can be used to control the mutation rate on a (1+1) evolutionary algorithm on bounded pseudo-Boolean functions so that the offspring of each search point is maximized in expectation. For these applications, knowledge of the search space structure supplied by the analysis translates to significant gains in the performance of search.
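
The abstract above describes computing moments of a pseudo-Boolean objective over regions of the search space from a basis expansion instead of by sampling. The abstract does not name the basis; the sketch below assumes the Walsh basis and illustrates only the simplest case, the first moment over the single-bit-flip neighbourhood of a point x, which equals the sum over i of w_i ψ_i(x) (1 - 2|i|/n). The toy MAX-2-SAT instance and all function names are hypothetical, and the coefficients are computed here by brute force over all 2^n states purely for checking, whereas the point of the thesis is that bounded functions allow this to be done in polynomial time.

```python
def popcount(v):
    return bin(v).count("1")


# Hypothetical MAX-2-SAT instance on n = 4 variables.
# Each literal is (variable index, True if the literal is positive).
n = 4
clauses = [
    ((0, True), (1, False)),
    ((1, True), (2, True)),
    ((2, False), (3, True)),
    ((0, False), (3, False)),
]


def f(x):
    """Number of clauses satisfied by the assignment encoded in the bits of x."""
    satisfied = 0
    for clause in clauses:
        if any(((x >> var) & 1) == (1 if positive else 0) for var, positive in clause):
            satisfied += 1
    return satisfied


def walsh_coefficients(func, n_bits):
    """w[i] = 2^-n * sum_x func(x) * (-1)^popcount(i & x), by brute force."""
    size = 1 << n_bits
    return [
        sum(func(x) * (-1) ** popcount(i & x) for x in range(size)) / size
        for i in range(size)
    ]


def neighbour_mean_from_walsh(w, x, n_bits):
    """Mean of f over the n single-bit-flip neighbours of x, from coefficients."""
    return sum(
        w_i * (-1) ** popcount(i & x) * (1 - 2 * popcount(i) / n_bits)
        for i, w_i in enumerate(w)
    )


def neighbour_mean_by_enumeration(func, x, n_bits):
    """The same mean, computed by visiting every neighbour explicitly."""
    return sum(func(x ^ (1 << j)) for j in range(n_bits)) / n_bits


w = walsh_coefficients(f, n)
x0 = 0b1010
print(neighbour_mean_from_walsh(w, x0, n), neighbour_mean_by_enumeration(f, x0, n))
```

Because each clause involves only two variables, every coefficient whose index has more than two set bits is zero, which is the sparsity that makes the expansion tractable for bounded functions.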

Mountain Scholar (Digital Collections of Colorado and Wyoming)

Application of DNA marker systems to test for genetic imprints of habitat fragmentation in Juniperus communis L. on different spatial and temporal scales - Integration of scientific knowledge into conservation measures

Author: Michalczyk Inga Maria
Publication venue: Philipps-Universität Marburg
Publication date: 01/01/2008

The formerly continuous and widespread juniper (Juniperus communis L.) populations are currently divided into small and fragmented relics in numerous European countries. Additionally, many of these populations suffer from an absence of natural regeneration and consist predominantly of senescent individuals. In order to maintain juniper as a valuable element of the cultural landscape in Europe, I considered restoration management to be indispensable. The goal of the present thesis is based on this consideration. Using different DNA marker systems, I first tested various juniper populations on different spatial and temporal scales for potential imprints of habitat fragmentation. Afterwards my intention was to evaluate the analysed populations on the regional scale in terms of nature conservation and to develop a scientifically based conservation management plan, which should focus on planting activities. In a Europe-wide study I used an AFLP marker approach to reconstruct aspects of the biogeographic history of juniper and to detect potential distinct genetic lineages. Those lineages are supposed to delineate geographic regions within which plant material can be interchanged. The genetic analysis revealed no distinct genetic lineages. Along with other scientific findings about juniper, the results point to a glacial persistence of juniper in Central Europe. I suppose that during the last glacial period, this species managed to survive in several small and suitable habitats, which were probably diffusely scattered and permanently changing. Moreover, I hypothesise that recurrent fragmentation and founder events since the last glacial maximum (LGM) up until today are highly likely to have occurred in this species. On a regional scale, i.e. in the Rhenish Uplands (RU) (West Germany), I used nSSR markers to gain insights into the genetic structure and variation of eight relict juniper populations. Such knowledge is necessary for planting activities in order to prevent negative effects in the respective populations. At the same time, I tested these populations for genetic imprints of the recent habitat fragmentation. The investigated nSSR loci in juniper were characterised within the scope of this thesis. A detailed validation of the newly developed nSSR markers is presented. In addition, I performed a case study by investigating the genetic diversity and differentiation of different pollen clouds, which have become reproductively effective in the filial generations (embryos). For this purpose dedicated computer software was developed. Next, a palynological study was conducted to determine physical pollen flow distances of juniper pollen grains. In terms of plantings I assume that such data are relevant for the spatial organisation of already existing juniper individuals and the respective plant material. Considerably high levels of genetic diversity and an absence of recent genetic bottlenecks in all populations, as well as an absence of an isolation-by-distance effect, led me to the assumption that the current habitat fragmentation has not yet affected the genetic diversity in the investigated juniper populations. Instead, I postulate that the genetic diversity and differentiation have been ‘frozen’ since the recent fragmentation started. The genetic diversity of the filial generation is not reduced in comparison to the adult generation, although the palynological study points to locally restricted pollen flow distances. After defining a ‘Leitbild’ for viable juniper populations based on widely accepted population ecological and genetic theories, I used the genetic results to evaluate the analysed populations of the RU in terms of nature conservation. The reasons why this evaluation was not satisfactory are discussed in detail. Furthermore, I commented critically on the evaluation criteria of the ‘Leitbild’ and their respective quality demands with regard to the life-history traits of juniper and its biogeographic history as presented within this thesis. Based on the presented genetic results and on the apparent absence of natural regeneration in all populations, it remains uncertain whether the current habitat fragmentation will affect the genetic diversity and structure of the eight populations deleteriously in the future. However, if juniper does not resume natural regeneration, this will certainly lead to an extinction of the respective populations, because without substitution senescent individuals will gradually die off. Thus, in the distant future juniper will probably become extinct in areas where it does not regenerate naturally. Therefore, I developed a sustainable, demographically and genetically substantiated restoration management plan as a final outcome of this thesis. It is based on the genetic analysis presented here and on expert knowledge, and it includes guidelines and recommendations concerning the collection of plant material, its treatment in the greenhouse and plantings in the field.

Publikations- und Dokumentenserver der Universitätsbibliothek Marburg