Recommended from our members
Joint Multivariate Modelling and Prediction for Genetic and Biomedical Data
In statistical genetics, classical genome-wide association studies (GWAS) assess the association between a biological characteristic and genetic variants one variant at a time in a regression model, and report the most significant associations. These studies test genetic markers individually, even though the data may exhibit multivariate structure because of the way genes are transmitted together from parents to offspring. Although classical GWAS adjust for covariates such as age and sex, they do not account for the joint effects of genetic variants. Moreover, when multiple genetic variants within a gene each have a small effect on a phenotype, testing them individually can lack statistical power, whereas testing them together in a joint model pools the evidence across variants. In this thesis, I reviewed different multivariate testing procedures in joint multivariate model settings, explored their properties, and demonstrated them in real-data applications, such as increasing statistical power by conditioning on major variants.
I studied the mathematical properties of various multivariate test procedures, particularly within the context of multiple linear regression. Drawing on both their theoretical properties and their availability in the literature, I adapted various multivariate test procedures for canonical correlation to the multiple regression setting. These procedures are shown to follow a chi-square distribution asymptotically. Importantly, they are asymptotically equivalent to one another and to the Wald test statistic, which suggests that the Wald test alone may suffice in future studies.
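With a single phenotype, the canonical correlation between the response and a set of variants reduces to the multiple correlation, so the classical multivariate statistics become functions of R². The sketch below is an illustration of that equivalence under stated assumptions (Gaussian errors, maximum-likelihood variance estimates, so the Wald statistic divides by RSS/n rather than the usual F-test denominator); it is not necessarily the exact set of procedures studied in the thesis, and the function name `joint_test_statistics` and the simulated effect sizes are hypothetical. Under these assumptions the Wald, likelihood-ratio and score statistics equal nR²/(1-R²), -n log(1-R²) and nR², each asymptotically chi-square with p degrees of freedom under the null hypothesis.

```python
import numpy as np

def joint_test_statistics(y, X):
    """Joint test of H0: all slopes are zero in y = a + X b + e (Gaussian e).
    Returns R^2 and the Wald, likelihood-ratio and score statistics, each
    asymptotically chi-square with X.shape[1] degrees of freedom under H0."""
    y = np.asarray(y, dtype=float)
    X = np.asarray(X, dtype=float)
    n = len(y)
    yc = y - y.mean()                    # centring removes the intercept
    Xc = X - X.mean(axis=0)
    beta, *_ = np.linalg.lstsq(Xc, yc, rcond=None)
    rss = np.sum((yc - Xc @ beta) ** 2)
    tss = np.sum(yc ** 2)
    r2 = 1.0 - rss / tss                 # squared (single) canonical correlation
    return {
        "R2": r2,
        "wald": n * r2 / (1.0 - r2),     # n x Hotelling-Lawley trace
        "lr": -n * np.log(1.0 - r2),     # -n log(Wilks' lambda)
        "score": n * r2,                 # n x Pillai's trace
    }

# Hypothetical example: three variants with small effects on one phenotype.
rng = np.random.default_rng(0)
X = rng.normal(size=(2000, 3))
y = X @ np.array([0.05, 0.03, 0.0]) + rng.normal(size=2000)
print(joint_test_statistics(y, X))
```

Because x/(1-x) ≥ -log(1-x) ≥ x on [0, 1), the three statistics always satisfy Wald ≥ LR ≥ score, and their ratios tend to 1 as R² shrinks, which is the typical situation for genetic variants with small effects.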
In many cases, there are known databases of major genetic variants that have a substantial effect on the trait. In such situations, it is statistically sensible to condition on these major variants to improve power in detecting associations with new variants, but this is not common practice in GWAS applications. In this study, we also showed theoretically and computationally that a joint analysis of the genetic variants in a multiple regression model, in which the estimated effect of a new variant is conditioned on some major variants, can reduce the standard error of that estimate and improve power. The gain in power depends on the correlation between the response and the covariates, as well as the correlation between the covariates. I further show that conditional results can sometimes be obtained from publicly available summary statistics reported for univariate associations in published GWAS, even when individual-level data are unavailable. A prominent example of such a trait is skin color, for which many studies have consistently identified a handful of major genes. I examined a dataset of over 6,500 Latin Americans of mixed ethnicity to assess how conditioning can improve the detection power of GWAS and identify new genetic variants in such a situation.
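The two ideas in the paragraph above can be illustrated with a small simulation. This is a minimal sketch, not the thesis's implementation: it assumes continuous genotype scores (real genotypes are allele counts), one major and one new variant in modest linkage disequilibrium (LD), and hypothetical effect sizes and helper names (`ols`, `conditional_from_summary`, `standardise`). It first compares the standard error of the new variant's effect with and without conditioning on the major variant, then recovers the conditional effect from marginal (univariate) estimates plus the LD correlation r, using b_new|major = (b_new - r b_major) / (1 - r²), which holds exactly when genotypes and phenotype are standardised.

```python
import numpy as np

def ols(y, X):
    """OLS fit of y on X plus an intercept; returns slopes and their SEs."""
    n = len(y)
    Z = np.column_stack([np.ones(n), X])
    coef, *_ = np.linalg.lstsq(Z, y, rcond=None)
    resid = y - Z @ coef
    sigma2 = resid @ resid / (n - Z.shape[1])
    cov = sigma2 * np.linalg.inv(Z.T @ Z)
    return coef[1:], np.sqrt(np.diag(cov)[1:])

def conditional_from_summary(b_new, b_major, r):
    """Effect of the new variant conditional on the major variant, computed
    from marginal effects on standardised data and their LD correlation r."""
    return (b_new - r * b_major) / (1.0 - r ** 2)

def standardise(v):
    return (v - v.mean()) / v.std()

# Simulated data (assumption: continuous genotype scores for simplicity).
rng = np.random.default_rng(1)
n = 5000
major = rng.normal(size=n)
new = 0.3 * major + rng.normal(size=n)        # new variant in modest LD
y = 1.0 * major + 0.05 * new + rng.normal(size=n)

# Standard error of the new variant's effect: marginal vs conditional.
_, se_marginal = ols(y, new[:, None])
_, se_joint = ols(y, np.column_stack([major, new]))
print("SE marginal:", se_marginal[0], "SE conditional:", se_joint[1])

# Conditional effect recovered from univariate summary statistics only.
ys, ms, ns = standardise(y), standardise(major), standardise(new)
b_major = ols(ys, ms[:, None])[0][0]
b_new = ols(ys, ns[:, None])[0][0]
r = np.corrcoef(ms, ns)[0, 1]
print("from summaries:", conditional_from_summary(b_new, b_major, r))
print("joint fit:     ", ols(ys, np.column_stack([ms, ns]))[0][1])
```

Conditioning removes the major variant's contribution from the residual variance, which shrinks the standard error; the correlation between the two variants works in the opposite direction by inflating the variance of the joint estimate, which matches the dependence on both correlations described above.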
In practical applications, the statistical models I used for association testing can be carried forward for prediction in new datasets. In this thesis, I have also presented mathematical formulations of prediction error for different linear models, including simple linear regression and shrinkage methods such as ridge and lasso regression. These expressions show the inherent trade-off between bias and variance, both at individual data points and across a set of observations. Moreover, these formulations reveal connections between prediction error and genetic heritability that can be used to improve prediction performance in genetic association studies. Additionally, I reviewed various statistical and machine learning predictive models and, using a dental morphology dataset, compared their performance with classification metrics such as the average error rate and the maximum classification error rate per specimen.
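As a concrete instance of the bias-variance trade-off at an individual data point, the sketch below computes the exact decomposition for ridge regression with a fixed design matrix, assuming the true coefficients β and noise variance σ² are known (which is only possible in a simulation). For a new observation y0 = x0ᵀβ + e0, the expected squared prediction error is σ² + bias² + variance. Lasso has no closed-form estimator, so it is not covered here, and the heritability connection mentioned above is not shown; the function name `ridge_prediction_error` and the simulated values are hypothetical.

```python
import numpy as np

def ridge_prediction_error(X, beta, sigma2, x0, lam):
    """Bias^2, variance and expected squared prediction error of the ridge
    prediction x0 @ beta_hat at a new point x0, for a fixed design X and
    y = X beta + e with Var(e) = sigma2 * I. lam = 0 gives ordinary least squares."""
    XtX = X.T @ X
    A = np.linalg.inv(XtX + lam * np.eye(X.shape[1]))
    mean_beta_hat = A @ XtX @ beta           # E[beta_hat] is shrunk towards 0
    bias = x0 @ (mean_beta_hat - beta)
    var = sigma2 * x0 @ A @ XtX @ A @ x0     # Var(x0 @ beta_hat)
    return bias ** 2, var, sigma2 + bias ** 2 + var  # sigma2: irreducible noise

# Hypothetical example: 50 observations, 10 predictors.
rng = np.random.default_rng(2)
X = rng.normal(size=(50, 10))
beta = 0.3 * rng.normal(size=10)
x0 = rng.normal(size=10)
for lam in [0.0, 1.0, 10.0, 100.0]:
    bias2, var, err = ridge_prediction_error(X, beta, 1.0, x0, lam)
    print(f"lambda={lam:6.1f}  bias^2={bias2:.4f}  variance={var:.4f}  error={err:.4f}")
```

Increasing the penalty always lowers the variance term while introducing bias; whether the total error falls depends on the balance between the two, which is the trade-off the prediction-error expressions make explicit.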
Exploring The Diversity Of Our Skin
This report discusses the process of collecting skin tone measurements from a diverse group of participants and analyses the accuracy of machine-based readings by comparing them with reference measurements. Such machines typically measure skin tone by capturing images and analysing them with various statistical techniques. We took advantage of this by photographing participants' faces and/or parts of their arms and processing each image with an algorithm we developed to obtain its colour values. Rather than being grouped by ethnicity, the results are expressed as RGB (red, green, and blue) reflectance values, with a median value between 0 and 255 for each processed image sample. Using this information, we can assess how accurately the machine reads a range of skin tones and use the results as a basis for improving machine accuracy in recognising darker skin tones, removing the bias towards Caucasian skin.
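The per-image summary described above (a median value between 0 and 255 for each colour channel) can be sketched as follows. This is a hypothetical illustration, not the report's own algorithm: it assumes an 8-bit RGB image stored as a (height, width, 3) array, an optional boolean mask selecting skin pixels, and reference measurements already expressed on the same 0-255 RGB scale. The function names `median_rgb` and `channel_abs_error` are invented for the example.

```python
import numpy as np

def median_rgb(image, mask=None):
    """Per-channel median (R, G, B) of an 8-bit RGB image of shape
    (height, width, 3), optionally restricted to the pixels selected
    by a boolean (height, width) mask."""
    pixels = np.asarray(image, dtype=float).reshape(-1, 3)
    if mask is not None:
        pixels = pixels[np.asarray(mask, dtype=bool).reshape(-1)]
    return np.median(pixels, axis=0)

def channel_abs_error(machine_rgb, reference_rgb):
    """Absolute per-channel difference between a machine reading and a
    reference measurement, both assumed to be on the 0-255 scale."""
    return np.abs(np.asarray(machine_rgb, dtype=float)
                  - np.asarray(reference_rgb, dtype=float))

# Tiny 2 x 2 example image.
img = np.zeros((2, 2, 3), dtype=np.uint8)
img[..., 0] = [[10, 20], [30, 40]]
img[..., 1] = 100
img[..., 2] = [[0, 0], [200, 200]]
print(median_rgb(img))
print(median_rgb(img, mask=np.array([[True, True], [False, False]])))
```

Using the median rather than the mean keeps a few shadowed or highlighted pixels from dominating the summary of each image sample.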
Fully automatic landmarking of 2D photographs identifies novel genetic loci influencing facial features
We report a genome-wide association study for facial features in > 6,000 Latin Americans. We placed 106 landmarks on 2D frontal photographs using the cloud service platform Face++. After Procrustes superposition, genome-wide association testing was performed for 301 inter-landmark distances. We detected nominally significant association (P-value < 5×10⁻⁸) for 42 genome regions. Of these, 9 regions have been previously reported in GWAS of facial features. In follow-up analyses, we replicated 26 of the 33 novel regions (in East Asians or Europeans). The replicated regions include 1q32.3, 3q21.1, 8p11.21, 10p11.1, and 22q12.1, all comprising strong candidate genes involved in craniofacial development. Furthermore, the 1q32.3 region shows evidence of introgression from archaic humans. These results provide novel biological insights into facial variation and establish that automatic landmarking of standard 2D photographs is a simple and informative approach for the genetic analysis of facial variation, suitable for the rapid analysis of large population samples.
- Introduction
- Results and Discussion
-- Study sample and phenotyping
-- Trait/covariate correlation and heritability
-- Overview of GWAS results and integration with the literature
-- Follow-up of genomic regions newly associated with facial features: Replication in two human cohorts
-- Follow-up of genomic regions newly associated with facial features: effects in the mouse
-- Genome annotations at associated loci
- Conclusion
- Methods
-- Study subjects
-- Genotype data
-- Phenotyping
-- Statistical genetic analysis
-- Interaction of EDAR with other genes
-- Expression analysis for significant SNPs
-- Detection of archaic introgression near ATF3 and association with facial features
-- Annotation of SNPs in FUMA
-- Shape GWAS in outbred mic
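The phenotyping pipeline in this abstract (landmarks, Procrustes superposition, then inter-landmark distances) can be sketched for a single face. This is a simplified illustration under stated assumptions: it uses ordinary Procrustes alignment of one face to a single reference configuration (the study may have used a generalised, multi-shape variant), it does not call Face++, and the landmark pairs are hypothetical rather than the 301 distances actually analysed. The function names `align_to_reference` and `inter_landmark_distances` are invented for the example.

```python
import numpy as np

def align_to_reference(shape, reference):
    """Ordinary Procrustes superposition of a (k, 2) landmark configuration
    onto a reference: removes translation, scale and rotation (no reflection)."""
    a = shape - shape.mean(axis=0)
    b = reference - reference.mean(axis=0)
    a = a / np.linalg.norm(a)                # unit centroid size
    b = b / np.linalg.norm(b)
    u, _, vt = np.linalg.svd(a.T @ b)
    d = np.sign(np.linalg.det(u @ vt))       # -1 would indicate a reflection
    rot = u @ np.diag([1.0, d]) @ vt         # best proper rotation
    return a @ rot

def inter_landmark_distances(aligned, pairs):
    """Euclidean distances between the landmark pairs listed in `pairs`."""
    pairs = np.asarray(pairs)
    diff = aligned[pairs[:, 0]] - aligned[pairs[:, 1]]
    return np.linalg.norm(diff, axis=1)

# Hypothetical example: a face that is a rotated, scaled, shifted reference.
rng = np.random.default_rng(3)
ref = rng.normal(size=(6, 2))
theta = 0.7
q = np.array([[np.cos(theta), -np.sin(theta)], [np.sin(theta), np.cos(theta)]])
face = 2.5 * ref @ q + np.array([4.0, -1.0])
aligned = align_to_reference(face, ref)
print(inter_landmark_distances(aligned, [(0, 1), (2, 5)]))
```

Each resulting distance would then serve as one quantitative trait in a genome-wide scan, with associations declared at the P < 5×10⁻⁸ threshold quoted above.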
Youth Engagement with Race and Faith at School: National Pupil Survey Headline Findings Report
Youth Engagement with Race and Faith at School is a peer-reviewed study funded by the Leverhulme Trust and carried out by a team of researchers at the University of Birmingham. The study runs from 2022 to 2025 and seeks to make a major contribution to understanding the factors, in and out of school, that support young people to express themselves democratically on race and faith equality issues. Part of the study involves a national survey of Year 10 pupils and their teachers in state-funded mainstream secondary schools across England. This report presents key descriptive statistics from the pupil survey; further group-specific and correlational analyses will be carried out in subsequent publications.
Automatic landmarking identifies new loci associated with face morphology and implicates Neanderthal introgression in human nasal shape
We report a genome-wide association study of facial features in >6000 Latin Americans based on automatic landmarking of 2D portraits and testing for association with inter-landmark distances. We detected significant associations (P-value < 5×10⁻⁸) at 42 genome regions, nine of which have been previously reported. In follow-up analyses, 26 of the 33 novel regions replicate in East Asians, Europeans, or Africans, and the mouse homolog of one region influences craniofacial morphology in mice. The novel region in 1q32.3 shows introgression from Neanderthals, and we find that the introgressed tract increases nasal height (consistent with the differentiation between Neanderthals and modern humans). Novel regions include candidate genes and genome regulatory elements previously implicated in craniofacial development, and show preferential transcription in cranial neural crest cells. The automated approach used here should simplify the collection of large study samples from across the world, facilitating a cosmopolitan characterization of the genetics of facial features.