13 research outputs found

    Genomic relatedness and diversity of Swedish native cattle breeds

    Background: Native cattle breeds are important genetic resources given their adaptation to the local environment in which they are bred. However, the widespread use of commercial cattle breeds has resulted in a marked reduction in population size of several native cattle breeds worldwide. Therefore, conservation management of native cattle breeds requires urgent attention to avoid their extinction. To this end, we genotyped nine Swedish native cattle breeds with genome-wide 150 K single nucleotide polymorphisms (SNPs) to investigate the level of genetic diversity and relatedness between these breeds. Results: We used various SNP-based approaches on this dataset to connect the demographic history with the genetic diversity and population structure of these Swedish cattle breeds. Our results suggest that the Väne and Ringamåla breeds originating from southern Sweden have experienced population isolation and have a low genetic diversity, whereas the Fjäll breed has a large founder population and a relatively high genetic diversity. Based on the shared ancestry and the constructed phylogenetic trees, we identified two major clusters in Swedish native cattle. In the first cluster, which includes Swedish mountain cattle breeds, there was little differentiation among the Fjäll, Fjällnära, Swedish Polled, and Bohus Polled breeds. The second cluster consists of breeds from southern Sweden: Väne, Ringamåla and Swedish Red. Interestingly, we also identified sub-structuring in the Fjällnära breed, which indicates different breeding practices on the farms that maintain this breed. Conclusions: This study represents the first comprehensive genome-wide analysis of the genetic relatedness and diversity in Swedish native cattle breeds. Our results show that different demographic patterns such as genetic isolation and cross-breeding have shaped the genomic diversity of Swedish native cattle breeds and that the Swedish mountain breeds have retained their authentic distinct gene pool without significant contribution from any of the other European cattle breeds that were included in this study.

    GBC: a parallel toolkit based on highly addressable byte-encoding blocks for extremely large-scale genotypes of species

    Whole-genome sequencing projects covering millions of subjects produce enormous genotype datasets, which impose a heavy memory burden and long computation times. Here, we present GBC, a toolkit for rapidly compressing large-scale genotypes into highly addressable byte-encoding blocks under an optimized parallel framework. We demonstrate that GBC is up to 1000 times faster than state-of-the-art methods at accessing and managing compressed large-scale genotypes while maintaining a competitive compression ratio. We also show that conventional analyses would be substantially sped up if they were built on GBC to access the genotypes of a large population. GBC's data structure and algorithms are valuable for accelerating large-scale genomic research.
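
    The idea of "addressable" compressed blocks can be illustrated with a minimal sketch. This is not GBC's actual format or API: the one-byte-per-genotype encoding, the `MISSING` value, the fixed `block_size`, and the use of zlib are all assumptions chosen for illustration. The point shown is only that an index of block offsets lets a reader decompress one block to fetch one variant, rather than the whole file.

```python
import zlib

MISSING = 255  # assumption: one byte value reserved for missing calls


def build_blocks(variants, block_size=2):
    """Compress variants in fixed-size blocks and return (blob, index).

    variants: list of equal-length genotype lists (0, 1, 2, or None per sample).
    index:    one (offset, length) pair per block, pointing into blob.
    """
    blob = bytearray()
    index = []
    for start in range(0, len(variants), block_size):
        raw = bytearray()
        for genotypes in variants[start:start + block_size]:
            # One byte per genotype; None is stored as MISSING.
            raw.extend(MISSING if g is None else g for g in genotypes)
        packed = zlib.compress(bytes(raw))
        index.append((len(blob), len(packed)))
        blob.extend(packed)
    return bytes(blob), index


def fetch_variant(blob, index, variant_id, n_samples, block_size=2):
    """Decompress only the block that holds variant_id and return its row."""
    block_no, within = divmod(variant_id, block_size)
    offset, length = index[block_no]
    raw = zlib.decompress(blob[offset:offset + length])
    row = raw[within * n_samples:(within + 1) * n_samples]
    return [None if b == MISSING else b for b in row]


variants = [[0, 1, 2], [2, 2, None], [1, 0, 0]]
blob, index = build_blocks(variants)
row = fetch_variant(blob, index, variant_id=1, n_samples=3)
```

    A real toolkit would likely use a denser encoding and process blocks in parallel; the sketch keeps one byte per call so that the addressing logic stays easy to follow.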

    Neural subtypes of attention-deficit/hyperactivity disorder and their clinical associations

    Thesis (Master's) -- Seoul National University Graduate School: College of Natural Sciences, Department of Brain and Cognitive Sciences, February 2023. Jiook Cha (차지욱).

    Attention-deficit/hyperactivity disorder (ADHD) is one of childhood's most common neurodevelopmental disorders, typically characterized by inattention, impulsivity, and hyperactivity. Previous studies of brain abnormalities in ADHD have frequently compared ADHD to a control group, potentially overlooking the heterogeneity within ADHD. Because the varying symptoms of ADHD make accurate diagnosis and effective treatment difficult, it is essential to understand this heterogeneity. To this end, this study uncovered the heterogeneity of brain structure in ADHD using unsupervised clustering. The clustering model revealed two distinct ADHD groups. The study then investigated the relationship between the identified ADHD subgroups and clinical characteristics in prepubertal children (aged 9-10 years; the Adolescent Brain Cognitive Development study). Both subgroups showed higher levels of ADHD symptoms than non-ADHD individuals, but ADHD-2 had higher internalizing mood symptoms and higher genome-wide polygenic scores (GPSs) for bipolar disorder, BMI, and risk tolerance. The brain profiles of each subgroup showed that ADHD-1 had reduced cortical measures in only a few regions, while ADHD-2 had overall brain volume reductions and decreased surface area. Additionally, the longitudinal analysis revealed different developmental patterns, with ADHD-1 showing reductions in cortical and subcortical volume and ADHD-2 showing reduced cortical thickness. The findings suggest the possibility of different brain pathologies within ADHD and the need for further understanding to inform diagnostic strategies. In conclusion, this study sheds light on the heterogeneity of ADHD and the underlying brain differences between subgroups, providing insights for improved diagnostic and therapeutic approaches in the future.

    Korean abstract (translated): Attention-deficit/hyperactivity disorder (ADHD) is one of the most common neurodevelopmental disorders of childhood, characterized by inattention, impulsivity, and hyperactivity. Structural and functional abnormalities of the ADHD brain have been identified through comparison with control groups. However, this approach has difficulty capturing individual variability and heterogeneity within ADHD. To address this, the present study used an unsupervised clustering model to separate out heterogeneity in the ADHD brain and examined whether the resulting subgroups were associated with different clinical characteristics. The clustering model revealed two ADHD subgroups. Both subgroups showed higher levels of ADHD symptoms than the control group, but only the ADHD-2 subgroup showed significantly higher genetic scores for bipolar disorder, BMI, and risk tolerance and significantly higher internalizing mood symptoms. In the brain profiles of each subgroup, ADHD-1 showed reduced cortical measures in only some regions, whereas ADHD-2 showed overall reductions in brain volume and surface area. In the longitudinal results, the subgroups showed different patterns of brain development, with ADHD-1 characterized mainly by reductions in cortical and subcortical volume and ADHD-2 by reduced cortical thickness. Taken together, this study sheds light on the heterogeneity of the ADHD brain and on the differences in clinical indicators and brain measures between subgroups, providing insight for future diagnostic and treatment approaches.

    Contents: 1. INTRODUCTION; 1.1. Background; 1.1.1. Attention-deficit/hyperactivity disorder (ADHD); 1.1.1.1. ADHD in childhood; 1.1.1.2. Structural brain abnormalities in ADHD; 1.1.1.3. Genetic influences on ADHD; 1.1.2. Heterogeneity in ADHD; 1.2. Purpose of Research; 2. Materials and Methods; 2.1. Participants; 2.2. ADHD; 2.2.1. ADHD assessment; 2.2.2. Comorbid disorders; 2.2.3. Medication treatment; 2.3. Neuropsychological measures; 2.3.1. Cognitive measures; 2.3.2. Behavioral measures; 2.4. Missing data imputation; 2.5. MRI data acquisition and processing; 2.5.1. Structural magnetic resonance imaging (sMRI); 2.5.2. Diffusion magnetic resonance imaging (dMRI); 2.5.3. Quality assessment and control; 2.6. Genetic data acquisition and processing; 2.6.1. Genotype data; 2.6.2. Genetic relatedness inference; 2.6.3. Genome-wide polygenic scores (GPSs); 2.7. Dissecting the heterogeneity of the brain structure in ADHD; 2.7.1. Dimensionality reduction; 2.7.2. Agglomerative hierarchical clustering analysis; 2.8. Relation to ADHD subgroups and neuropsychological measures; 3. Results; 3.1. Demographic characteristics; 3.2. Dissecting the heterogeneity of the ADHD brain; 3.3. Relation to ADHD subgroups and demographic, cognitive and behavioral measures; 3.4. Relation to ADHD subgroups and GPS measures; 3.5. Relation to ADHD subgroups and brain measures; 3.6. Developmental changes of each ADHD subgroup; 4. DISCUSSION; 4.1. Summary; 4.2. Implication and perspective; 4.3. Limitations and future research direction; 4.4. Conclusion; CONTRIBUTION; BIBLIOGRAPHY; Abstract in Korean; ACKNOWLEDGMENT. 석
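
    The thesis outline names agglomerative hierarchical clustering as the method used to split the ADHD sample. As a rough illustration of that technique only, the sketch below merges the two closest clusters repeatedly until a requested number remains. The average-linkage rule, the Euclidean distance, and the toy two-dimensional points are assumptions; the abstract does not state which linkage, distance, or features the thesis used.

```python
from math import dist


def agglomerative(points, n_clusters):
    """Average-linkage agglomerative clustering (illustrative, O(n^3) or worse).

    points: list of coordinate tuples; returns a list of clusters,
    each cluster being a list of indices into points.
    """
    clusters = [[i] for i in range(len(points))]

    def linkage(a, b):
        # Average pairwise Euclidean distance between two clusters.
        total = sum(dist(points[i], points[j]) for i in a for j in b)
        return total / (len(a) * len(b))

    while len(clusters) > n_clusters:
        best = None
        for x in range(len(clusters)):
            for y in range(x + 1, len(clusters)):
                d = linkage(clusters[x], clusters[y])
                if best is None or d < best[0]:
                    best = (d, x, y)
        _, x, y = best
        # Merge the closest pair; y > x, so deleting y keeps x valid.
        clusters[x] = clusters[x] + clusters[y]
        del clusters[y]
    return clusters


# Hypothetical reduced features for four subjects.
points = [(0.0, 0.0), (0.0, 1.0), (10.0, 10.0), (10.0, 11.0)]
groups = agglomerative(points, n_clusters=2)
```

    In practice the input would be the low-dimensional representation produced by the dimensionality-reduction step listed in the outline, not raw coordinates like these.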

    The Mega2R package: R tools for accessing and processing genetic data in common formats [version 2; referees: 3 approved]

    The standalone C++ Mega2 program has facilitated data reformatting for linkage and association analysis programs since 2000. Support for more analysis programs has been added over time. Currently, Mega2 converts data from several different genetic data formats (including PLINK, VCF, BCF, and IMPUTE2) into the specific data requirements for over 40 commonly used linkage and association analysis programs (including Mendel, Merlin, Morgan, SHAPEIT, ROADTRIPS, MaCH/minimac3). Recently, Mega2 has been enhanced to use a SQLite database as an intermediate data representation. Additionally, Mega2 now stores biallelic genotype data in a highly compressed form, like that of the GenABEL R package and the PLINK binary format. Our new Mega2R package now makes it easy to load Mega2 SQLite databases directly into R as data frames. In addition, Mega2R is memory efficient, keeping its genotype data in a compressed format, portions of which are only expanded when needed. Mega2R has functions that ease the process of applying gene-based tests by looping over genes, efficiently pulling out genotypes for variants within the desired boundaries. We have also created several more functions that illustrate how to use the data frames: these permit one to run the pedgene package to carry out gene-based association tests on family data, to run the SKAT package to carry out gene-based association tests, to output the Mega2R data as a VCF file and related files (for phenotype and family data), and to convert the data frames into GenABEL format. The Mega2R package enhances GenABEL since it supports additional input data formats (such as PLINK, VCF, and IMPUTE2) not currently supported by GenABEL. The Mega2 program and the Mega2R R package are both open source and are freely available, along with extensive documentation, from https://watson.hgen.pitt.edu/register for Mega2 and https://CRAN.R-project.org/package=Mega2R for Mega2R.
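
    The "looping over genes" pattern described in this abstract can be sketched generically. The code below is not Mega2R (which is an R package) and uses none of its functions; `variants_in_gene`, `loop_over_genes`, the gene names, and the toy burden statistic are hypothetical. It assumes a single chromosome with variant positions already sorted, so that a binary search finds the variants inside each gene's boundaries.

```python
from bisect import bisect_left, bisect_right


def variants_in_gene(positions, start, end):
    """Indices of variants with start <= position <= end (positions sorted)."""
    lo = bisect_left(positions, start)
    hi = bisect_right(positions, end)
    return range(lo, hi)


def loop_over_genes(genes, positions, genotypes, test):
    """Apply test to the genotype rows of each gene's variants.

    genes:     dict mapping gene name to (start, end) on one chromosome.
    genotypes: one row of allele counts per variant, aligned with positions.
    test:      any callable taking a list of genotype rows.
    """
    results = {}
    for name, (start, end) in genes.items():
        idx = variants_in_gene(positions, start, end)
        if len(idx) == 0:
            continue  # no variants fall inside this gene
        results[name] = test([genotypes[i] for i in idx])
    return results


positions = [100, 250, 400, 900]
genotypes = [[0, 1], [1, 1], [2, 0], [0, 0]]
genes = {"GENE_A": (200, 450), "GENE_B": (500, 800)}  # hypothetical genes

# Toy statistic: total count of alternate alleles across the gene.
burden = loop_over_genes(genes, positions, genotypes,
                         lambda rows: sum(sum(r) for r in rows))
```

    A real gene-based test such as those in the pedgene or SKAT packages would replace the toy statistic passed as `test`.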

    The Mega2R package: R tools for accessing and processing genetic data in common formats [version 1; referees: 2 approved]

    The standalone C++ Mega2 program has facilitated data reformatting for linkage and association analysis programs since 2000. Support for more analysis programs has been added over time. Currently, Mega2 converts data from several different genetic data formats (including PLINK, VCF, BCF, and IMPUTE2) into the specific data requirements for over 40 commonly used linkage and association analysis programs (including Mendel, Merlin, Morgan, SHAPEIT, ROADTRIPS, MaCH/minimac3). Recently, Mega2 has been enhanced to use a SQLite database as an intermediate data representation. Additionally, Mega2 now stores biallelic genotype data in a highly compressed form, like that of the GenABEL R package and the PLINK binary format. Our new Mega2R package now makes it easy to load Mega2 SQLite databases directly into R as data frames. In addition, Mega2R is memory efficient, keeping its genotype data in a compressed format, portions of which are only expanded when needed. Mega2R has functions that ease the process of applying gene-based tests by looping over genes, efficiently pulling out genotypes for variants within the desired boundaries. We have also created several more functions that illustrate how to use the data frames: these permit one to run the pedgene package to carry out gene-based association tests on family data, to run the SKAT package to carry out gene-based association tests, to output the Mega2R data as a VCF file and related files (for phenotype and family data), and to convert the data frames into GenABEL format. The Mega2R package enhances GenABEL since it supports additional input data formats (such as PLINK, VCF, and IMPUTE2) not currently supported by GenABEL. The Mega2 program and the Mega2R R package are both open source and are freely available, along with extensive documentation, from https://watson.hgen.pitt.edu/register for Mega2 and https://CRAN.R-project.org/package=Mega2R for Mega2R.

    Standardization of a methodology for identification and annotation of associations between single nucleotide polymorphisms and highly polygenic traits in ruminants

    Given the importance of ruminant production, it is necessary to investigate the genetic variants associated with traits of economic interest in these animals, as well as the biology underlying the genotype-phenotype associations. A widely used strategy for finding these associations is the genome-wide association study (GWAS). A GWAS must be supported by adequate quality control (QC) before associations between SNP-type genetic markers and phenotypes are identified. Additionally, the biological contextualization of these associations starts from the annotation of the genes close to the associated markers. Several tools, including R libraries, currently exist for these analyses. However, a tool that unifies the three main steps (QC, GWAS, and annotation) for species other than human is still needed. The present work therefore developed a methodology that unifies the three steps in the R environment. The generated code was submitted for publication and is freely available in the repository https://github.com/bojusemo/Diploid-GWAS. The code was tested in two populations of ruminants, Colombian Creole Hair Sheep and Simmental cattle. In these populations, low-quality SNPs were removed, no population stratification was detected, and no samples were removed for low quality. The SNP OAR26_10469468.1 was associated with meat tenderness in Colombian Creole Hair Sheep. This SNP is in the gene TENM3. The TENM3 protein has two domains with functions associated with meat tenderness in cattle and pigs. The SNP BovineHD4100012055 was associated with birth weight in Simmental. The closest gene to this SNP is olfactory receptor 52E8-like, which is a member of the G protein-coupled receptor (GPCR) protein family. GPCRs have been associated with birth weight in humans. Six markers were associated with 305-day milk yield in Simmental. Neither the closest genes of these markers nor their protein domains have been reported as associated with milk production.

    Resumen (translated from Spanish): Given the importance of ruminant production, it is necessary to investigate the genetic variants associated with the traits of commercial interest in these animals, as well as the biology underlying these genotype-phenotype associations. A widely used strategy for making these associations is the genome-wide association study (GWAS). A GWAS must start from adequate filtering of the variables and of the individuals, known as quality control (QC), and then identify the associations between SNP-type genetic markers and phenotypes. The biological contextualization of these associations, in turn, starts from the annotation of the genes close to the associated markers. Several tools, including R libraries, currently exist for these analyses. However, a tool that unifies the three main steps (QC, GWAS, and annotation) for data from non-human species in R was still lacking. The present work therefore developed a methodology that unifies the three steps in the R environment. The generated code was submitted for publication and is freely available in the repository https://github.com/bojusemo/Diploid-GWAS. The code was tested in two ruminant populations, Colombian Creole Hair Sheep and Simmental cattle. In these populations, low-quality SNPs were removed, no population stratification was detected, and no samples were removed for low quality. The SNP OAR26_10469468.1 was associated with meat tenderness in Colombian Creole Hair Sheep. This SNP lies in the gene TENM3. The TENM3 protein has two domains with functions associated with meat tenderness in cattle and pigs. The SNP BovineHD4100012055 was associated with birth weight in Simmental. The gene closest to this SNP is olfactory receptor 52E8-like, which belongs to the G protein-coupled receptor (GPCR) protein family. An association between GPCR and birth weight in humans has been reported. Six markers were associated with 305-day milk yield in Simmental. Neither the genes closest to these markers nor their protein domains have been reported as associated with milk production. Maestrí
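
    The SNP quality-control step mentioned in this abstract is commonly implemented as a filter on call rate and minor allele frequency (MAF). The sketch below shows that common pattern only; it is not the code from the Diploid-GWAS repository, and the function names, the thresholds (0.95 and 0.05), and the SNP names are assumptions. Genotypes are coded as counts of one allele (0, 1, 2) with `None` for missing calls, and each SNP is assumed to have at least one sample.

```python
def snp_stats(genotypes):
    """Return (call_rate, maf) for one SNP.

    genotypes: non-empty list of 0/1/2 allele counts, None for missing calls.
    """
    called = [g for g in genotypes if g is not None]
    call_rate = len(called) / len(genotypes)
    if not called:
        return call_rate, 0.0
    # Frequency of the counted allele among called diploid genotypes.
    freq = sum(called) / (2 * len(called))
    return call_rate, min(freq, 1 - freq)


def qc_filter(snps, min_call_rate=0.95, min_maf=0.05):
    """Keep SNPs that pass both thresholds (threshold values are assumptions)."""
    kept = {}
    for name, genotypes in snps.items():
        call_rate, maf = snp_stats(genotypes)
        if call_rate >= min_call_rate and maf >= min_maf:
            kept[name] = genotypes
    return kept


snps = {
    "snp_ok": [0, 1, 2, 1],            # fully called, common allele
    "snp_rare": [0, 0, 0, 0],          # monomorphic, fails the MAF filter
    "snp_missing": [1, None, None, 2], # half missing, fails the call-rate filter
}
kept = qc_filter(snps)
```

    Sample-level QC and population-stratification checks, which the abstract also reports, would be separate steps and are not shown here.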