Novel regularization models for dynamic and discrete response data
This thesis was submitted for the award of Doctor of Philosophy and was awarded by Brunel University London. Regularized regression models have gained popularity in recent years. The addition of a penalty term to the likelihood function allows parameter estimation where traditional methods fail, such as in the p ≫ n case. The use of an l1 penalty in particular leads to simultaneous parameter estimation and variable selection, which is convenient in practice. Moreover, computationally efficient algorithms make these methods attractive in many applications. This thesis is inspired by this literature and investigates the development of novel penalty functions and regression methods within this context. In particular, Chapter 2 deals with linear models for time-dependent response and explanatory variables. This goes beyond the independence framework that is common to many regularized regression models. We propose to account for the time dependency in the data by explicitly adding autoregressive terms to the response variable together with an autoregressive process for the residuals. In addition, the use of an l1 penalized likelihood approach for parameter estimation leads to automatic order and variable selection and makes this method feasible for high-dimensional data. Theoretical properties of the estimators are provided and an extensive simulation study is performed. Finally, we show the application of the model to air pollution and stock market data and discuss its implementation in the R package DREGAR, which is freely available on CRAN. In Chapter 3, we develop a new penalty function. Despite all the advantages of the l1 penalty, this penalty is not differentiable at zero, and neither are the alternatives that are proposed in the literature. The only exception is the ridge penalty, which does not lead to variable selection.
Motivated by this gap, and noting the advantages that a differentiable penalty can give, such as increased computational efficiency in some cases and the derivation of more accurate model selection criteria, we develop a new penalty function based on the error function. We study the theoretical properties of this function and of the estimators obtained in a regularized regression context. Finally, we perform a simulation study and we use the new penalty to analyse a diabetes and a prostate cancer dataset. The new method is implemented in the R package DLASSO, which is freely available on CRAN. Finally, Chapter 4 deals with regression models for discrete response data, which is frequently collected in many application areas. In particular, we consider a discrete Weibull regression model that has recently been introduced in the literature. In this chapter, we propose the first Bayesian implementation of this model. We consider a general parametrization, where both parameters of the discrete Weibull distribution can be conditioned on the predictors, and show theoretically how, under a uniform noninformative prior, the posterior distribution is proper with finite moments. In addition, we consider closely the case of Laplace priors for parameter shrinkage and variable selection. A simulation study and the analysis of four real datasets of medical records show the applicability of this approach to the analysis of count data. The method is implemented in the R package BDWreg, which is freely available on CRAN.
The International Mouse Phenotyping Consortium (IMPC): a functional catalogue of the mammalian genome that informs conservation.
The International Mouse Phenotyping Consortium (IMPC) is building a catalogue of mammalian gene function by producing and phenotyping a knockout mouse line for every protein-coding gene. To date, the IMPC has generated and characterised 5186 mutant lines. One-third of the lines have been found to be non-viable and over 300 new mouse models of human disease have been identified thus far. While current bioinformatics efforts are focused on translating results to better understand human disease processes, IMPC data also aid the understanding of genetic function and processes in other species. Here we show, using gorilla genomic data, how genes essential to development in mice can be used to help assess the potentially deleterious impact of gene variants in other species. This type of analysis could be used to select optimal breeders in endangered species to maintain or increase fitness and avoid variants associated with impaired-health phenotypes or loss-of-function mutations in genes of critical importance. We also show, using selected examples from various mammal species, how IMPC data can aid in the identification of candidate genes for studying a condition of interest, deliver information about the mechanisms involved, or support predictions for the function of genes that may play a role in adaptation. With genotyping costs decreasing and the continued improvement of bioinformatics tools, the analyses we demonstrate can be routinely applied.
Mendelian gene identification through mouse embryo viability screening.
BACKGROUND: The diagnostic rate of Mendelian disorders in sequencing studies continues to increase, along with the pace of novel disease gene discovery. However, variant interpretation in novel genes not currently associated with disease is particularly challenging and strategies combining gene functional evidence with approaches that evaluate the phenotypic similarities between patients and model organisms have proven successful. A full spectrum of intolerance to loss-of-function variation has been previously described, providing evidence that gene essentiality should not be considered as a simple and fixed binary property.
METHODS: Here we further dissected this spectrum by assessing the embryonic stage at which homozygous loss-of-function results in lethality in mice from the International Mouse Phenotyping Consortium, classifying the set of lethal genes into one of three windows of lethality: early, mid, or late gestation lethal. We studied the correlation between these windows of lethality and various gene features including expression across development, paralogy and constraint metrics together with human disease phenotypes. We explored a gene similarity approach for novel gene discovery and investigated unsolved cases from the 100,000 Genomes Project.
RESULTS: We found that genes in the early gestation lethal category have distinct characteristics and are enriched for genes linked with recessive forms of inherited metabolic disease. We identified several genes sharing multiple features with known biallelic forms of inborn errors of metabolism and found signs of enrichment of biallelic predicted pathogenic variants among early gestation lethal genes in patients recruited under this disease category. We highlight two novel gene candidates with phenotypic overlap between the patients and the mouse knockouts.
CONCLUSIONS: Information on the developmental period at which embryonic lethality occurs in the knockout mouse may be used for novel disease gene discovery that helps to prioritise variants in unsolved rare disease cases.
Comprehensive ECG reference intervals in C57BL/6N substrains provide a generalizable guide for cardiac electrophysiology studies in mice.
Reference ranges provide a powerful tool for diagnostic decision-making in clinical medicine and are enormously valuable for understanding normality in pre-clinical scientific research that uses in vivo models. As yet, there are no published reference ranges for electrocardiography (ECG) in the laboratory mouse. The first mouse-specific reference ranges for the assessment of electrical conduction are reported herein, generated from an ECG dataset of unprecedented scale. International Mouse Phenotyping Consortium data from over 26,000 conscious or anesthetized C57BL/6N wildtype control mice were stratified by sex and age to develop robust ECG reference ranges. Notable findings include that heart rate and key elements of the ECG waveform (RR-, PR-, ST-, QT-interval, QT corrected, and QRS complex) demonstrate minimal sexual dimorphism. As expected, anesthesia induces a decrease in heart rate; this was shown for both inhalation (isoflurane) and injectable (tribromoethanol) anesthesia. In the absence of pharmacological, environmental, or genetic challenges, we did not observe major age-related ECG changes in C57BL/6N-inbred mice, as the differences in the reference ranges of 12-week-old compared to 62-week-old mice were negligible. The generalizability of the C57BL/6N substrain reference ranges was demonstrated by comparison with ECG data from a wide range of non-IMPC studies. The close overlap in data from a wide range of mouse strains suggests that the C57BL/6N-based reference ranges can be used as a robust and comprehensive indicator of normality. We report a unique ECG reference resource of fundamental importance for any experimental study of cardiac function in mice.
Extensive identification of genes involved in congenital and structural heart disorders and cardiomyopathy
Clinical presentation of congenital heart disease is heterogeneous, making identification of the disease-causing genes and their genetic pathways and mechanisms of action challenging. By using in vivo electrocardiography, transthoracic echocardiography and microcomputed tomography imaging to screen 3,894 single-gene-null mouse lines for structural and functional cardiac abnormalities, here we identify 705 lines with cardiac arrhythmia, myocardial hypertrophy and/or ventricular dilation. Among these 705 genes, 486 have not been previously associated with cardiac dysfunction in humans, and some of them represent variants of unknown relevance (VUR). Mice with mutations in the Casz1, Dnajc18, Pde4dip, Rnf38 or Tmem161b genes show developmental cardiac structural abnormalities, with their human orthologs being categorized as VUR. Using UK Biobank data, we validate the importance of the DNAJC18 gene for cardiac homeostasis by showing that its loss of function is associated with altered left ventricular systolic function. Our results identify hundreds of previously unappreciated genes with potential function in congenital heart disease and suggest causal function of five VUR in congenital heart disease.