26 research outputs found

    Galaxy And Mass Assembly: automatic morphological classification of galaxies using statistical learning

    © 2018 The Author(s). We apply four statistical learning methods to a sample of 7941 galaxies (z < 0.06) from the Galaxy And Mass Assembly survey to test the feasibility of using automated algorithms to classify galaxies. Using 10 features measured for each galaxy (sizes, colours, shape parameters, and stellar mass), we apply the techniques of Support Vector Machines, Classification Trees, Classification Trees with Random Forest (CTRF), and Neural Networks, which return True Prediction Ratios (TPRs) of 75.8 per cent, 69.0 per cent, 76.2 per cent, and 76.0 per cent, respectively. Those occasions whereby all four algorithms agree with each other yet disagree with the visual classification ('unanimous disagreement') serve as a potential indicator of human error in classification, occurring in ~9 per cent of ellipticals, ~9 per cent of little blue spheroids, ~14 per cent of early-type spirals, ~21 per cent of intermediate-type spirals, and ~4 per cent of late-type spirals and irregulars. We observe that the choice of parameters rather than that of algorithms is more crucial in determining classification accuracy. Due to its simplicity in formulation and implementation, we recommend the CTRF algorithm for classifying future galaxy data sets. Adopting the CTRF algorithm, the TPRs of the five galaxy types are: E, 70.1 per cent; LBS, 75.6 per cent; S0-Sa, 63.6 per cent; Sab-Scd, 56.4 per cent; and Sd-Irr, 88.9 per cent. Further, we train a binary classifier using this CTRF algorithm that divides galaxies into spheroid-dominated (E, LBS, and S0-Sa) and disc-dominated (Sab-Scd and Sd-Irr), achieving an overall accuracy of 89.8 per cent. This translates into an accuracy of 84.9 per cent for spheroid-dominated systems and 92.5 per cent for disc-dominated systems.
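The 'unanimous disagreement' criterion in the abstract above can be illustrated with a minimal Python sketch. It is not the authors' code. The function names and the toy labels are hypothetical, and the True Prediction Ratio is assumed here to mean the fraction of objects whose predicted class matches the visual class.

```python
from typing import List, Sequence


def unanimous_disagreement(visual: Sequence[str],
                           predictions: Sequence[Sequence[str]]) -> List[int]:
    """Indices where all classifiers agree with each other but not with the visual label."""
    flagged_indices = []
    for i, human_label in enumerate(visual):
        # Collect the distinct labels the classifiers assigned to object i.
        labels = {classifier_output[i] for classifier_output in predictions}
        if len(labels) == 1 and human_label not in labels:
            flagged_indices.append(i)
    return flagged_indices


def true_prediction_ratio(visual: Sequence[str], predicted: Sequence[str]) -> float:
    """Assumed definition: fraction of objects where prediction equals the visual class."""
    matches = sum(v == p for v, p in zip(visual, predicted))
    return matches / len(visual)


# Hypothetical toy labels for four objects (not data from the survey).
visual = ["E", "LBS", "S0-Sa", "Sd-Irr"]
svm    = ["E", "E", "Sab-Scd", "Sd-Irr"]
tree   = ["E", "E", "S0-Sa", "Sd-Irr"]
forest = ["E", "E", "Sab-Scd", "Sd-Irr"]
net    = ["LBS", "E", "Sab-Scd", "Sd-Irr"]

flagged = unanimous_disagreement(visual, [svm, tree, forest, net])
svm_tpr = true_prediction_ratio(visual, svm)
print(flagged, svm_tpr)
```

In this toy example only the second object is flagged: all four classifiers call it E while the visual class is LBS. The third object is not flagged because the classifiers do not agree among themselves.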

    Discovery of four recessive developmental disorders using probabilistic genotype and phenotype matching among 4,125 families.

    Discovery of most autosomal recessive disease-associated genes has involved analysis of large, often consanguineous multiplex families or small cohorts of unrelated individuals with a well-defined clinical condition. Discovery of new dominant causes of rare, genetically heterogeneous developmental disorders has been revolutionized by exome analysis of large cohorts of phenotypically diverse parent-offspring trios. Here we analyzed 4,125 families with diverse, rare and genetically heterogeneous developmental disorders and identified four new autosomal recessive disorders. These four disorders were identified by integrating Mendelian filtering (selecting probands with rare, biallelic and putatively damaging variants in the same gene) with statistical assessments of (i) the likelihood of sampling the observed genotypes from the general population and (ii) the phenotypic similarity of patients with recessive variants in the same candidate gene. This new paradigm promises to catalyze the discovery of novel recessive disorders, especially those with less consistent or nonspecific clinical presentations and those caused predominantly by compound heterozygous genotypes.

    Prevalence and architecture of de novo mutations in developmental disorders.

    The genomes of individuals with severe, undiagnosed developmental disorders are enriched in damaging de novo mutations (DNMs) in developmentally important genes. Here we have sequenced the exomes of 4,293 families containing individuals with developmental disorders, and meta-analysed these data with data from another 3,287 individuals with similar disorders. We show that the most important factors influencing the diagnostic yield of DNMs are the sex of the affected individual, the relatedness of their parents, whether close relatives are affected and the parental ages. We identified 94 genes enriched in damaging DNMs, including 14 that previously lacked compelling evidence of involvement in developmental disorders. We have also characterized the phenotypic diversity among these disorders. We estimate that 42% of our cohort carry pathogenic DNMs in coding sequences; approximately half of these DNMs disrupt gene function and the remainder result in altered protein function. We estimate that developmental disorders caused by DNMs have an average prevalence of 1 in 213 to 1 in 448 births, depending on parental age. Given current global demographics, this equates to almost 400,000 children born per year.

    Multiple novel prostate cancer susceptibility signals identified by fine-mapping of known risk loci among Europeans

    Genome-wide association studies (GWAS) have identified numerous common prostate cancer (PrCa) susceptibility loci. We have fine-mapped 64 GWAS regions known at the conclusion of the iCOGS study using large-scale genotyping and imputation in 25 723 PrCa cases and 26 274 controls of European ancestry. We detected evidence for multiple independent signals at 16 regions, 12 of which contained additional newly identified significant associations. A single signal comprising a spectrum of correlated variation was observed at 39 regions, 35 of which are now described by a novel more significantly associated lead SNP, while the originally reported variant remained as the lead SNP only in 4 regions. We also confirmed two association signals in Europeans that had been previously reported only in East-Asian GWAS. Based on statistical evidence and linkage disequilibrium (LD) structure, we have curated and narrowed down the list of the most likely candidate causal variants for each region. Functional annotation using data from ENCODE filtered for PrCa cell lines and eQTL analysis demonstrated significant enrichment for overlap with bio-features within this set. By incorporating the novel risk variants identified here alongside the refined data for existing association signals, we estimate that these loci now explain ∼38.9% of the familial relative risk of PrCa, an 8.9% improvement over the previously reported GWAS tag SNPs. This suggests that a significant fraction of the heritability of PrCa may have been hidden during the discovery phase of GWAS, in particular due to the presence of multiple independent signals within the same region.

    Heterozygous Variants in KMT2E Cause a Spectrum of Neurodevelopmental Disorders and Epilepsy.

    We delineate a KMT2E-related neurodevelopmental disorder on the basis of 38 individuals in 36 families. This study includes 31 distinct heterozygous variants in KMT2E (28 ascertained from Matchmaker Exchange and three previously reported), and four individuals with chromosome 7q22.2-22.23 microdeletions encompassing KMT2E (one previously reported). Almost all variants occurred de novo, and most were truncating. Most affected individuals with protein-truncating variants presented with mild intellectual disability. One-quarter of individuals met criteria for autism. Additional common features include macrocephaly, hypotonia, functional gastrointestinal abnormalities, and a subtle facial gestalt. Epilepsy was present in about one-fifth of individuals with truncating variants and was responsive to treatment with anti-epileptic medications in almost all. More than 70% of the individuals were male, and expressivity was variable by sex; epilepsy was more common in females and autism more common in males. The four individuals with microdeletions encompassing KMT2E generally presented similarly to those with truncating variants, but the degree of developmental delay was greater. The group of four individuals with missense variants in KMT2E presented with the most severe developmental delays. Epilepsy was present in all individuals with missense variants, often manifesting as treatment-resistant infantile epileptic encephalopathy. Microcephaly was also common in this group. Haploinsufficiency versus gain-of-function or dominant-negative effects specific to these missense variants in KMT2E might explain this divergence in phenotype, but requires independent validation. Disruptive variants in KMT2E are an under-recognized cause of neurodevelopmental abnormalities.

    Bi-allelic Loss-of-Function CACNA1B Mutations in Progressive Epilepsy-Dyskinesia.

    The occurrence of non-epileptic hyperkinetic movements in the context of developmental epileptic encephalopathies is an increasingly recognized phenomenon. Identification of causative mutations provides an important insight into common pathogenic mechanisms that cause both seizures and abnormal motor control. We report bi-allelic loss-of-function CACNA1B variants in six children from three unrelated families whose affected members present with a complex and progressive neurological syndrome. All affected individuals presented with epileptic encephalopathy, severe neurodevelopmental delay (often with regression), and a hyperkinetic movement disorder. Additional neurological features included postnatal microcephaly and hypotonia. Five children died in childhood or adolescence (mean age of death: 9 years), mainly as a result of secondary respiratory complications. CACNA1B encodes the pore-forming subunit of the pre-synaptic neuronal voltage-gated calcium channel Cav2.2/N-type, crucial for SNARE-mediated neurotransmission, particularly in the early postnatal period. Bi-allelic loss-of-function variants in CACNA1B are predicted to cause disruption of Ca2+ influx, leading to impaired synaptic neurotransmission. The resultant effect on neuronal function is likely to be important in the development of involuntary movements and epilepsy. Overall, our findings provide further evidence for the key role of Cav2.2 in normal human neurodevelopment.

    MAK is funded by an NIHR Research Professorship and receives funding from the Wellcome Trust, Great Ormond Street Children's Hospital Charity, and Rosetrees Trust. E.M. received funding from the Rosetrees Trust (CD-A53) and Great Ormond Street Hospital Children's Charity. K.G. received funding from Temple Street Foundation. A.M. is funded by Great Ormond Street Hospital, the National Institute for Health Research (NIHR), and Biomedical Research Centre. F.L.R. and D.G. are funded by Cambridge Biomedical Research Centre. K.C. and A.S.J. are funded by NIHR Bioresource for Rare Diseases. The DDD Study presents independent research commissioned by the Health Innovation Challenge Fund (grant number HICF-1009-003), a parallel funding partnership between the Wellcome Trust and the Department of Health, and the Wellcome Trust Sanger Institute (grant number WT098051). We acknowledge support from the UK Department of Health via the NIHR comprehensive Biomedical Research Centre award to Guy's and St. Thomas' National Health Service (NHS) Foundation Trust in partnership with King's College London. This research was also supported by the NIHR Great Ormond Street Hospital Biomedical Research Centre. J.H.C. is in receipt of an NIHR Senior Investigator Award. The research team acknowledges the support of the NIHR through the Comprehensive Clinical Research Network. The views expressed are those of the author(s) and not necessarily those of the NHS, the NIHR, Department of Health, or Wellcome Trust. E.R.M. acknowledges support from NIHR Cambridge Biomedical Research Centre, an NIHR Senior Investigator Award, and the University of Cambridge has received salary support in respect of E.R.M. from the NHS in the East of England through the Clinical Academic Reserve. I.E.S. is supported by the National Health and Medical Research Council of Australia (Program Grant and Practitioner Fellowship).

    World literacy: obstacles and opportunities
