Search CORE

726 research outputs found

OPENMENDEL: A Cooperative Programming Project for Statistical Genetics

Author: Bates Douglas M.
Chu Benjamin B.
German Christopher A.
Ji Sarah S.
Keys Kevin L.
Kim Juhyun
Ko Seyoon
Lange Kenneth
Mosher Gordon D.
Papp Jeanette C.
Sinsheimer Janet S.
Sobel Eric M.
Zhai Jing
Zhou Hua
Zhou Jin J.
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 13/02/2019

Statistical methods for genome-wide association studies (GWAS) continue to improve. However, the increasing volume and variety of genetic and genomic data make computational speed and ease of data manipulation mandatory in future software. In our view, a collaborative effort of statistical geneticists is required to develop open source software targeted to genetic epidemiology. Our attempt to meet this need is called the OPENMENDEL project (https://openmendel.github.io). It aims to (1) enable interactive and reproducible analyses with informative intermediate results, (2) scale to big data analytics, (3) embrace parallel and distributed computing, (4) adapt to rapid hardware evolution, (5) allow cloud computing, (6) allow integration of varied genetic data types, and (7) foster easy communication between clinicians, geneticists, statisticians, and computer scientists. This article reviews and makes recommendations to the genetic epidemiology community in the context of the OPENMENDEL project.
Comment: 16 pages, 2 figures, 2 tables

arXiv.org e-Print Archive

eScholarship - University of California

Large-scale genomic prediction using singular value decomposition of the genotype matrix

Author: A Legarra
CR Henderson
DC Lay
G Campos de los
I Misztal
Ismo Strandén
Jørgen Ødegård
L Tusell
M Kimura
OF Christensen
P VanRaden
PM VanRaden
RL Fernando
T Hastie
T Meuwissen
THE Meuwissen
Theo H. E. Meuwissen
Ulf Indahl
Publication venue: 'Springer Science and Business Media LLC'

Crossref

Novel techniques for accelerating statistical operations on compressed genomic data

Author: Freudenberg Alexander
Publication date: 01/01/2023

Over the last decades, the availability of genetic data has exploded, and genomic information is now widely used in a variety of fields. While the cost of genotyping and sequence assembly has been steadily decreasing, software in quantitative genetics has struggled to keep up with increasing computational demands. Many existing software solutions exploit shared-memory parallelism and instruction-level parallelism. However, partly due to a lack of suitable hardware instructions, software that utilizes accelerator hardware has seen only limited adoption. In this thesis, novel methods for the efficient processing of genomic data are presented. By utilizing low-precision integer instructions on modern NVIDIA® GPUs, these methods avoid the need to decompress SNP data for statistical evaluations. Owing to the memory efficiency of compressed genomic storage formats, datasets of large populations with a high number of SNPs can be analyzed on a single datacenter GPU. The benefits of these new techniques are demonstrated through examples of important quantities in quantitative genetics. First, it is shown that the analytical calculation of population statistics, such as the genomic relationship matrix or linkage disequilibrium, is significantly accelerated compared to existing methods. Second, the numerical evaluation of a single-step BLUP model is used to demonstrate that accelerators can significantly reduce the computing times required for estimating genetic values with iterative solvers. Lastly, it is illustrated that the estimation of parameters for an important covariance model can be significantly improved.
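The abstract does not describe the storage layout, but the idea of computing directly on compressed genotypes can be illustrated on the CPU. The sketch below assumes a simple layout (2 bits per genotype coded 0/1/2, four individuals per byte), which is not necessarily the thesis's format, and computes a genotype matrix-vector product through a 256-entry lookup table so that the full matrix is never decompressed. The names `pack_genotypes` and `packed_matvec` are hypothetical, and this NumPy lookup-table approach stands in for, rather than reproduces, the GPU integer-instruction kernels described above.

```python
import numpy as np

# Bit offsets of the four 2-bit genotype slots inside one byte (assumed layout).
SHIFTS = np.array([0, 2, 4, 6], dtype=np.uint8)

def pack_genotypes(g):
    """Pack an (n_snps, n_ind) array of 0/1/2 genotypes into bytes,
    2 bits per genotype and 4 individuals per byte (zero-padded)."""
    g = np.asarray(g, dtype=np.uint8)
    n_snps, n_ind = g.shape
    pad = (-n_ind) % 4
    g = np.pad(g, ((0, 0), (0, pad))).reshape(n_snps, -1, 4)
    packed = np.bitwise_or.reduce(g << SHIFTS, axis=2)   # (n_snps, n_bytes), uint8
    return packed, n_ind

def packed_matvec(packed, n_ind, v):
    """Compute M @ v for the packed genotype matrix M without unpacking it."""
    v = np.asarray(v, dtype=np.float64)
    v4 = np.pad(v, (0, (-n_ind) % 4)).reshape(-1, 4)      # (n_bytes, 4)
    codes = np.arange(256, dtype=np.uint8)
    decoded = ((codes[:, None] >> SHIFTS) & 3).astype(np.float64)  # (256, 4)
    lut = v4 @ decoded.T          # lut[k, b] = dot(genotypes encoded in byte b, v4[k])
    cols = np.arange(packed.shape[1])
    return lut[cols, packed].sum(axis=1)                  # one value per SNP
```

Each packed byte is consumed as an index into the table, so the per-SNP cost is one lookup per four individuals; products of this kind are the building block for quantities such as the genomic relationship matrix.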

MAnnheim DOCument Server

The discovery of novel recessive genetic disorders in dairy cattle: a thesis presented in partial fulfilment of the requirements for the degree of Doctor of Philosophy in Animal Science at AL Rae Centre of Genetics and Breeding, Massey University, Palmerston North, New Zealand

Author: Reynolds Edwardo G M
Publication venue: 'Massey University'
Publication date: 01/01/2022

The selection of desirable characteristics in livestock has resulted in the transmission of advantageous genetic variants over generations. The advent of artificial insemination accelerated the propagation of these variants and led to tremendous advances in animal productivity. However, this intensive selection has also led to the rapid spread of deleterious alleles. Recently, a recessive mutation in the GALNT2 gene was found to dramatically impair growth and production traits in dairy cattle, causing small calf syndrome. The research presented here further investigates the presence and impact of recessive mutations in dairy cattle. A primary aim of genetics is to identify causal variants and understand how they influence a phenotype. As datasets have grown, larger analyses have become possible, and statistical methods for discovering causal mutations have become commonplace. One such method, the genome-wide association study (GWAS), has considerable exploratory utility for identifying quantitative trait loci (QTL) and causal mutations. GWASs have predominantly focused on additive genetic effects, assuming that each allele at a locus acts independently of the other, whereas non-additive effects, including dominant, recessive, and epistatic effects, have been neglected. Here, we developed a single-locus non-additive GWAS model intended to detect dominant and recessive genetic mechanisms, and applied it to growth, developmental, and lactation phenotypes in dairy cattle. We identified several candidate causal mutations associated with recessive disorders that have moderate to large deleterious effects on animal welfare and production. These mutations included premature-stop (MUS81, ITGAL, LRCH4, RBM34), splice-disrupting (FGD4, GALNT2), and missense (PLCD4, MTRF1, DPF2, DOCK8, SLC25A4, KIAA0556, IL4R) variants, and they occur at surprisingly high frequencies in cattle.
We further investigated these candidates for anatomical, molecular, and metabolic phenotypes to understand how these disorders might manifest. In some cases, the mutations were analogous to disorder-causing mutations in other species, including Coffin-Siris syndrome (DPF2), Charcot-Marie-Tooth disease (FGD4), a congenital disorder of glycosylation (GALNT2), hyper-immunoglobulin E syndrome (DOCK8), Joubert syndrome (KIAA0556), and mitochondrial disease (SLC25A4). These discoveries demonstrate that deleterious recessive mutations exist in dairy cattle at remarkably high frequencies and that these disorders can be detected with modern genotyping and phenotyping capabilities. These findings can be used to improve the health and productivity of dairy cattle in New Zealand and internationally.
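The abstract does not give the model's equations. As a hedged illustration of what recessive and dominant coding mean in a single-locus scan, the sketch below regresses a phenotype on an indicator derived from 0/1/2 genotypes, one SNP at a time, with ordinary least squares. The function `nonadditive_scan` and its `mode` parameter are hypothetical; a real dairy-cattle analysis would typically use a mixed model with a relationship matrix and fixed effects, which this sketch omits.

```python
import numpy as np

def nonadditive_scan(genotypes, y, mode="recessive"):
    """Single-locus scan with non-additive genotype coding (OLS, no covariates).

    genotypes: (n_snps, n_ind) counts (0/1/2) of the tested allele.
    mode: "recessive" codes 1 only for genotype 2; "dominant" codes 1 for 1 or 2.
    Returns per-SNP effect estimates and t-statistics for y = mu + b * x + e.
    """
    if mode not in ("recessive", "dominant"):
        raise ValueError("mode must be 'recessive' or 'dominant'")
    G = np.asarray(genotypes)
    y = np.asarray(y, dtype=float)
    n = y.size
    effects = np.full(G.shape[0], np.nan)
    tstats = np.full(G.shape[0], np.nan)
    for i, g in enumerate(G):
        x = ((g == 2) if mode == "recessive" else (g >= 1)).astype(float)
        if x.min() == x.max():       # constant indicator: effect not estimable
            continue
        X = np.column_stack([np.ones(n), x])
        beta, *_ = np.linalg.lstsq(X, y, rcond=None)
        resid = y - X @ beta
        sigma2 = resid @ resid / (n - 2)                  # residual variance
        se = np.sqrt(sigma2 * np.linalg.inv(X.T @ X)[1, 1])
        effects[i], tstats[i] = beta[1], beta[1] / se
    return effects, tstats
```

With recessive coding, heterozygotes are grouped with the non-carrier homozygotes, so the test contrasts only the homozygous-affected class against everyone else; SNPs where the indicator is constant are reported as NaN.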

Massey Research Online

Metabolomics: a tool for studying plant biology

Author: Gullberg Jonas
Publication date: 01/09/2005

In recent years, new technologies have made it possible to monitor gene expression, protein, and metabolite profiles in different tissues and developmental stages. This is an emerging field in plant science, applied to diverse plant systems to elucidate the regulation of growth and development. The goal of plant metabolomics is to analyze, identify, and quantify all low-molecular-weight molecules of plant organisms. Plant metabolites are extracted and analyzed using various sensitive analytical techniques, usually mass spectrometry (MS) combined with chromatography. To compare the metabolomes of different plants in a high-throughput manner, a number of biological, analytical, and data-processing steps must be performed. In the work underlying this thesis, we developed a fast and robust method for routine analysis of plant metabolite patterns using gas chromatography-mass spectrometry (GC/MS). The method was developed using Design of Experiments (DOE) to investigate factors affecting the extraction and derivatization of metabolites from leaves of the plant Arabidopsis thaliana. A GC/MS metabolic analysis yields a complex pattern of approximately 400 overlapping peaks. Resolving (deconvoluting) overlapping peaks is time-consuming and difficult to automate, and additional processing is needed to compare samples. To prevent deconvolution from becoming a major bottleneck in high-throughput analyses, we developed a new semi-automated strategy using hierarchical methods for processing GC/MS data that can be applied to all samples simultaneously. The two methods include baseline correction of the unprocessed MS data files, alignment, time-window determination, Alternating Regression, and multivariate analysis to detect metabolites whose relative concentrations differ between samples.
The developed methodology was applied to study the effects of the plant hormone gibberellin (GA) on the metabolome, with specific emphasis on auxin levels in Arabidopsis thaliana mutants defective in GA biosynthesis and signalling. A large series of plant samples was analysed, and the resulting data were processed in less than one week with minimal labour, which is similar to the time required for the GC/MS analyses of the samples themselves.
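Baseline correction is the first processing step listed above. The abstract does not say which algorithm was used, so the sketch below shows one common choice, iterative ("modified") polynomial fitting, applied to a single chromatogram trace. The function `modpoly_baseline` and its `degree` and `n_iter` parameters are hypothetical names, and the approach is not necessarily the one used in the thesis.

```python
import numpy as np

def modpoly_baseline(y, degree=1, n_iter=100):
    """Estimate a smooth baseline under a chromatogram trace.

    Each iteration fits a polynomial to the working signal and clips points
    lying above the fit down to it, so peaks progressively stop pulling the
    baseline upwards. Returns the final fitted baseline.
    """
    if n_iter < 1:
        raise ValueError("n_iter must be at least 1")
    y = np.asarray(y, dtype=float)
    x = np.arange(y.size, dtype=float)        # scan index as the time axis
    work = y.copy()
    for _ in range(n_iter):
        fit = np.polyval(np.polyfit(x, work, degree), x)
        work = np.minimum(work, fit)          # keep baseline points, clip peaks
    return fit
```

Subtracting the returned curve from the trace leaves the peaks on a flat zero line; the polynomial degree would need to match how strongly the real baseline drifts.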

Epsilon Open Archive

Evolutionary algorithms in clustering: Challenging problem generation and search space adaptation

Author: Shand Cameron
Publication date: 01/08/2020

The University of Manchester - Institutional Repository

Multivariate analysis as a tool for understanding and reducing the complexity of mathematical models in systems biology

Author: Isaeva Julia
Publication venue: Norwegian University of Life Sciences, Ås
Publication date: 01/01/2011

In systems biology, technologies are developing very rapidly, which allows massive amounts of diverse data to be collected. Scientists are mainly interested in gaining insight into the resulting data sets and discovering their inherent properties. Because the data are often complex and may require intimidating equations to model, data analysis can be quite challenging for the many bioscientists who have not mastered advanced mathematics. This thesis proposes using multivariate statistical methods as a tool for understanding the properties of complex models that describe biological systems. The multivariate methods employed here search for latent variables that form a basis for all processes in a system. This often reduces the dimensionality of the system and makes it easier to see the whole picture. Accordingly, multivariate analysis was used descriptively in Papers I and IV to discover the effects of input variables on a response. It is often necessary to know a functional form that could have generated the collected data in order to study how the system behaves when one parameter or another is tuned. For this purpose, we propose the Direct Look-Up (DLU) approach, which we argue is a worthy alternative to existing fitting methods because of its high computational speed and its ability to avoid many problems such as subjectivity, the choice of initial values, and local optima (Papers II and III). Another aspect covered in this thesis is the interpretation of function parameters in everyday human language by means of multivariate analysis. This would enable mathematicians and bioscientists to understand each other when describing the same object. This was accomplished in Paper IV using the concept of a metamodel and sensory analysis. In Paper I, a similar approach was used, although the main focus of that paper was slightly different.
The original aim of that article was to show the advantages of multi-way GEMANOVA analysis over traditional ANOVA for certain types of data; in addition, however, it uncovered the relationship between human profiling of data samples and function parameters. When funds for experiments are limited and it is infeasible to study all possible parameter combinations, a systematic way of choosing a small number of highly representative conditions for a particular system is needed. In Paper V, the Multi-level Binary Replacement (MBR) design was developed for this purpose; it can also be used to search for a relevant parameter range. This new design method was applied in Papers II and IV to select samples for further analyses.
Technological development in systems biology is now so rapid that very large amounts of data can be collected in a short time and at relatively low cost. The researchers' main interest is typically to gain insight into the data and their inherent properties. Since the data can be quite complex and are often described by complicated, frequently nonlinear, functions, data analysis can be rather challenging for many bioscientists who have not mastered advanced mathematics. This work proposes using multivariate statistical analysis to come closer to an understanding of the properties of complicated models used to describe biological systems. The multivariate methods used in this thesis search for latent variables that constitute a linear basis for, and an approximation of, the complex processes in a system. This yields a simplification of the system that is easier to interpret. In this work, multivariate analysis methods were used for this descriptive purpose in Papers I and IV to discover the effects of function parameters on the properties of complex mathematical models. It is often necessary to find a mathematical function that could have generated the collected data in order to study the behaviour of the system. For this purpose, we propose a model-fitting method, the Direct Look-Up (DLU) method, which is argued here to be a valuable alternative to existing estimation methods because of its high speed and its ability to avoid typical problems such as subjectivity, the choice of initial values, and local optima (Papers II and III). Another aspect covered in this thesis is the use of multivariate analysis to interpret mathematical function parameters with an everyday vocabulary. This can make it easier for mathematicians and bioscientists to understand each other when they describe the same object. This was done in Paper IV using the idea of a metamodel and sensory analysis. In Paper I, a similar method was also used to obtain sensory descriptions of images generated from differential equations. The main focus of Paper I, however, was different: to show the advantage of multi-way GEMANOVA analysis over traditional ANOVA for certain types of data. In that paper, GEMANOVA was used to uncover the relationship between complex combinations of the function parameters and image descriptors. When resources for experiments are limited and it is impossible to try all combinations of parameters, methods are needed that can select a small number of parameter settings that are as representative as possible of a given system. In Paper V, the Multi-level Binary Replacement (MBR) design was therefore developed as such a method; it can also be used to search for a relevant parameter space for computer simulations. The new design method was applied in Papers II and IV to select parameter values for further analyses.

Brage NMBU

Accelerated matrix-vector multiplications for matrices involving genotype covariates with applications in genomic prediction

Author: Alexander Freudenberg
Jan Ten Napel
Jeremie Vandenplas
Martin Schlather
Ross Evans
Torsten Pook
Publication venue: Frontiers Media S.A.
Publication date: 01/08/2023

In the last decade, a number of methods have been suggested for handling large amounts of genetic data in genomic prediction. Yet steadily growing population sizes and the suboptimal use of computational resources are pushing the practical application of these approaches to their limits. As an extension to the C/CUDA library miraculix, we have developed tailored solutions for computing genotype matrix multiplications, which are a critical bottleneck in the empirical evaluation of many statistical models. We demonstrate the benefits of our solutions using the example of single-step models, which make repeated use of this kind of multiplication. Targeting modern Nvidia® GPUs as well as a broad range of CPU architectures, our implementation significantly reduces the time required to estimate breeding values in large populations. miraculix is released under the Apache 2.0 license and is freely available at https://github.com/alexfreudenberg/miraculix
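Genomic prediction models commonly work with a centred genotype matrix, for example Z = M - 2 * 1 p^T, where M holds 0/1/2 genotypes and p the allele frequencies. One reason fast products can operate on compact integer storage is that the centring can be applied afterwards as a rank-one correction. The sketch below assumes VanRaden-style centring and scaling, which the abstract does not specify; the function names are hypothetical and are not the miraculix API.

```python
import numpy as np

def centered_matvec(M, p, v):
    """Z @ v for Z = M - 2 * 1 p^T, computed without forming Z.
    M: (n_ind, n_snp) integer genotypes 0/1/2; p: (n_snp,) allele frequencies."""
    return M @ v - 2.0 * (p @ v)

def centered_rmatvec(M, p, w):
    """Z.T @ w = M.T @ w - 2 p (1^T w), again without forming Z."""
    return M.T @ w - 2.0 * p * w.sum()

def grm_matvec(M, p, x):
    """G @ x for the genomic relationship matrix G = Z Z^T / (2 * sum p(1 - p))."""
    scale = 2.0 * np.sum(p * (1.0 - p))
    return centered_matvec(M, p, centered_rmatvec(M, p, x)) / scale
```

In an iterative solver such as conjugate gradients, a product like `grm_matvec` would be evaluated once per iteration, so the speed of the underlying `M @ v` and `M.T @ w` products largely determines the total run time.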

Directory of Open Access Journals

Accelerated matrix-vector multiplications for matrices involving genotype covariates with applications in genomic prediction

Author: Evans Ross
Freudenberg Alexander
Napel Jan ten
Pook Torsten
Schlather Martin
Vandenplas Jeremie
Publication venue: Frontiers Media
Publication date: 01/01/2023

MAnnheim DOCument Server