The correlation between the mutation of protein kinase genes and the clinical characteristics of breast cancer progression
It is accepted that breast cancer (BC) is a heterogeneous disease. In order to investigate BC as a group of disease sub-types, the varying clinical characteristics of BC patients must be considered. In this project a series of clinical, pathological, genetic and genomic data, retrieved from multiple data repositories, will be reviewed for selection in a large-scale meta-analysis and then categorised into 5 sub-groups (Luminal A, Luminal B, Basal, HER2 and Normal). The meta-analysis is primarily designed to ascertain if a correlation exists between the mutation of protein kinase (PK) genes and BC progression. As PK genes play important roles in regulating most cellular processes (e.g. cell proliferation, differentiation and apoptosis), it is no surprise that deregulated PK activity is a frequent cause of disease, and that PK genes are often oncogenes.
The meta-analysis objectives are two-fold:
1. To conduct an integrative meta-analysis of the differential gene expression of the PK gene family between clinical categories of BC progression (low vs high proliferation; luminal vs basal tissue; and grade 1 vs grade 3 tumours). Results from the meta-analysis will generate a ranked list of PK gene expression profiles observed in BC progression.
2. Using bioinformatics tools and sequence-analysis interfaces, the ranked PK list will then direct investigations into correlations between codon usage bias, aberrant epigenetic factors, somatic mutations, and observed structural/functional changes in deregulated PK genes across BC progression categories.
To address these objectives, a series of in silico bioinformatics experiments has been designed. A software program (MYGEO) has been written specifically to: download multiple datasets; calculate p-values between BC progression groups; compute Q-values that control the false discovery rate across multiple dataset comparisons; perform permutation testing on the ranked PK gene list; and provide 2D/3D sequence analysis functions for examining structure/function relationships in PK genes that are significantly differentially expressed in BC progression.
This project will improve our understanding of the complex system of BC biology by identifying significantly deregulated PK genes in BC progression. The results will identify BC biomarkers and structural/functional locations within PK genes not yet elucidated, thus providing new directions for the development of PK inhibitors and improving the effectiveness of current BC treatment strategies.
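MYGEO's own code is not described here, so the following is only a minimal Python sketch of two of the statistical steps listed above: a permutation p-value for one gene between two BC progression groups, and false-discovery-rate adjustment across genes. The test statistic (absolute difference in group means), the add-one correction, and the use of Benjamini-Hochberg adjusted p-values as the "Q-values" are all assumptions; MYGEO may instead use, for example, a t-statistic or Storey's q-value method. The gene names in the demo are hypothetical placeholders and the expression values are made up.

```python
import random
from statistics import mean


def permutation_p_value(group_a, group_b, n_perm=10_000, seed=0):
    """Two-sided permutation p-value for a difference in group means.

    Assumption: the statistic is |mean(a) - mean(b)|; MYGEO's actual
    statistic is not specified in the project description.
    """
    rng = random.Random(seed)  # seeded so results are reproducible
    observed = abs(mean(group_a) - mean(group_b))
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    extreme = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)  # randomly reassign samples to the two groups
        if abs(mean(pooled[:n_a]) - mean(pooled[n_a:])) >= observed:
            extreme += 1
    # Add-one correction: the observed labelling counts as one permutation,
    # so the p-value can never be exactly zero.
    return (extreme + 1) / (n_perm + 1)


def benjamini_hochberg(p_values):
    """Benjamini-Hochberg adjusted p-values, returned in the input order.

    Used here as a stand-in for "Q-values"; Storey's q-value method is
    a common alternative.
    """
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def rank_genes(expression, n_perm=2_000):
    """Return (gene, p, q) tuples sorted by q-value, smallest first.

    `expression` maps a gene name to (values_in_group_a, values_in_group_b).
    """
    genes = list(expression)
    p_values = [permutation_p_value(*expression[g], n_perm=n_perm) for g in genes]
    q_values = benjamini_hochberg(p_values)
    return sorted(zip(genes, p_values, q_values), key=lambda row: row[2])


if __name__ == "__main__":
    # Hypothetical gene names and made-up expression values (grade 1 vs grade 3).
    toy_expression = {
        "KINASE_A": ([5.1, 5.4, 5.0, 5.3], [7.9, 8.2, 8.1, 7.7]),
        "KINASE_B": ([6.0, 6.2, 5.9, 6.1], [6.1, 5.8, 6.3, 6.0]),
        "KINASE_C": ([4.2, 4.8, 4.5, 4.4], [5.0, 5.6, 5.3, 5.1]),
    }
    for gene, p, q in rank_genes(toy_expression):
        print(f"{gene}: p = {p:.4f}, q = {q:.4f}")
```

Permuting group labels avoids assuming that expression values are normally distributed; the cost is compute time, which grows linearly with `n_perm` for every gene tested.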
Integrating Diverse Datasets Improves Developmental Enhancer Prediction
Gene-regulatory enhancers have been identified using various approaches, including evolutionary conservation, regulatory protein binding, chromatin modifications, and DNA sequence motifs. To integrate these different approaches, we developed EnhancerFinder, a two-step method for distinguishing developmental enhancers from the genomic background and then predicting their tissue specificity. EnhancerFinder uses a multiple kernel learning approach to integrate DNA sequence motifs, evolutionary patterns, and diverse functional genomics datasets from a variety of cell types. In contrast with prediction approaches that define enhancers based on histone marks or p300 sites from a single cell line, we trained EnhancerFinder on hundreds of experimentally verified human developmental enhancers from the VISTA Enhancer Browser. We comprehensively evaluated EnhancerFinder using cross validation and found that our integrative method improves the identification of enhancers over approaches that consider a single type of data, such as sequence motifs, evolutionary conservation, or the binding of enhancer-associated proteins. We find that VISTA enhancers active in embryonic heart are easier to identify than enhancers active in several other embryonic tissues, likely due to their uniquely high GC content. We applied EnhancerFinder to the entire human genome and predicted 84,301 developmental enhancers and their tissue specificity. These predictions provide specific functional annotations for large amounts of human non-coding DNA, and are significantly enriched near genes with annotated roles in their predicted tissues and lead SNPs from genome-wide association studies. We demonstrate the utility of EnhancerFinder predictions through in vivo validation of novel embryonic gene regulatory enhancers from three developmental transcription factor loci. 
Our genome-wide developmental enhancer predictions are freely available as a UCSC Genome Browser track, which we hope will enable researchers to further investigate questions in developmental biology. © 2014 Erwin et al
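In general, multiple kernel learning represents each data type with its own kernel (similarity function) and combines them into a single kernel as a non-negative weighted sum. The Python sketch below shows only that combination step, under stated assumptions: the choice of a linear kernel for motif counts and Gaussian (RBF) kernels for the other views, the value of `gamma`, the fixed weights, and the toy feature vectors are all hypothetical and are not EnhancerFinder's settings. In real MKL the weights are learned jointly with the classifier (for example a support vector machine), which is omitted here.

```python
import math


def linear_kernel(x, y):
    """Dot product; one common choice for count-like features such as motif hits."""
    return sum(a * b for a, b in zip(x, y))


def rbf_kernel(x, y, gamma=0.5):
    """Gaussian (RBF) kernel exp(-gamma * ||x - y||^2); gamma is an assumed value."""
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(x, y)))


def combined_kernel(x_views, y_views, kernels, weights):
    """K(x, y) = sum_m w_m * K_m(x_m, y_m), with w_m >= 0 and sum(w) == 1."""
    if not (len(x_views) == len(y_views) == len(kernels) == len(weights)):
        raise ValueError("need one kernel and one weight per data view")
    if any(w < 0 for w in weights) or not math.isclose(sum(weights), 1.0):
        raise ValueError("weights must be non-negative and sum to 1")
    return sum(w * k(xm, ym) for w, k, xm, ym in zip(weights, kernels, x_views, y_views))


def gram_matrix(samples, kernels, weights):
    """Pairwise combined-kernel matrix; each sample is a tuple of feature views."""
    return [[combined_kernel(a, b, kernels, weights) for b in samples] for a in samples]


if __name__ == "__main__":
    # Hypothetical toy regions, each with three views:
    # (motif counts, conservation scores, functional-genomics signal).
    regions = [
        ([2.0, 0.0, 1.0], [0.9, 0.8], [3.1]),
        ([0.0, 1.0, 0.0], [0.1, 0.2], [0.4]),
        ([1.0, 0.0, 1.0], [0.7, 0.9], [2.5]),
    ]
    kernels = [linear_kernel, rbf_kernel, rbf_kernel]
    weights = [0.2, 0.5, 0.3]  # fixed here; MKL would learn these from data
    for row in gram_matrix(regions, kernels, weights):
        print([round(v, 3) for v in row])
```

Keeping the weights non-negative guarantees that the combined Gram matrix stays positive semidefinite whenever each per-view kernel is, so the sum is still a valid kernel; the learned weights also hint at how much each data type contributes.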
Improved imputation of low-frequency and rare variants using the UK10K haplotype reference panel
Imputing genotypes from reference panels created by whole-genome sequencing (WGS) provides a cost-effective strategy for augmenting the single-nucleotide polymorphism (SNP) content of genome-wide arrays. The UK10K Cohorts project has generated a data set of 3,781 whole genomes sequenced at low depth (average 7x), aiming to exhaustively characterize genetic variation down to 0.1% minor allele frequency in the British population. Here we demonstrate the value of this resource for improving imputation accuracy at rare and low-frequency variants in both a UK and an Italian population. We show that large increases in imputation accuracy can be achieved by re-phasing WGS reference panels after initial genotype calling. We also present a method for combining WGS panels to improve variant coverage and downstream imputation accuracy, which we illustrate by integrating 7,562 WGS haplotypes from the UK10K project with 2,184 haplotypes from the 1000 Genomes Project. Finally, we introduce a novel approximation that maintains speed without sacrificing imputation accuracy for rare variants.
COVID-19 trajectories among 57 million adults in England: a cohort study using electronic health records
BACKGROUND:
Updatable estimates of COVID-19 onset, progression, and trajectories underpin pandemic mitigation efforts. To identify and characterise disease trajectories, we aimed to define and validate ten COVID-19 phenotypes from nationwide linked electronic health records (EHR) using an extensible framework.
METHODS:
In this cohort study, we used eight linked National Health Service (NHS) datasets for people in England alive on Jan 23, 2020. Data on COVID-19 testing, vaccination, primary and secondary care records, and death registrations were collected until Nov 30, 2021. We defined ten COVID-19 phenotypes reflecting clinically relevant stages of disease severity and encompassing five categories: positive SARS-CoV-2 test, primary care diagnosis, hospital admission, ventilation modality (four phenotypes), and death (three phenotypes). We constructed patient trajectories illustrating transition frequency and duration between phenotypes. Analyses were stratified by pandemic waves and vaccination status.
FINDINGS:
Among 57 032 174 individuals included in the cohort, 13 990 423 COVID-19 events were identified in 7 244 925 individuals, equating to an infection rate of 12·7% during the study period. Of these 7 244 925 individuals, 460 737 (6·4%) were admitted to hospital and 158 020 (2·2%) died. Of 460 737 individuals who were admitted to hospital, 48 847 (10·6%) were admitted to the intensive care unit (ICU), 69 090 (15·0%) received non-invasive ventilation, and 25 928 (5·6%) received invasive ventilation. Among 384 135 patients who were admitted to hospital but did not require ventilation, mortality was higher in wave 1 (23 485 [30·4%] of 77 202 patients) than wave 2 (44 220 [23·1%] of 191 528 patients), but remained unchanged for patients admitted to the ICU. Mortality was highest among patients who received ventilatory support outside of the ICU in wave 1 (2569 [50·7%] of 5063 patients). 15 486 (9·8%) of 158 020 COVID-19-related deaths occurred within 28 days of the first COVID-19 event without a COVID-19 diagnosis on the death certificate. 10 884 (6·9%) of 158 020 deaths were identified exclusively from mortality data with no previous COVID-19 phenotype recorded. We observed longer patient trajectories in wave 2 than wave 1.
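The percentages in these findings use three different denominators: the whole cohort (infection rate), infected individuals (hospital admission and death), and admitted patients (ICU and ventilation). The short Python check below recomputes each reported percentage from the stated counts, rounding to one decimal place as the abstract does; the dictionary labels are paraphrases, and no figures beyond those in the paragraph are used.

```python
def pct(numerator, denominator):
    """Percentage rounded to one decimal place, matching the abstract's precision."""
    return round(100 * numerator / denominator, 1)


# (numerator, denominator) pairs copied from the FINDINGS paragraph.
# Note how the denominator changes from one group of figures to the next.
RATES = {
    "infection rate (of whole cohort)": (7_244_925, 57_032_174),
    "hospital admission (of infected)": (460_737, 7_244_925),
    "death (of infected)": (158_020, 7_244_925),
    "ICU admission (of admitted)": (48_847, 460_737),
    "non-invasive ventilation (of admitted)": (69_090, 460_737),
    "invasive ventilation (of admitted)": (25_928, 460_737),
    "wave 1 mortality, admitted without ventilation": (23_485, 77_202),
    "wave 2 mortality, admitted without ventilation": (44_220, 191_528),
    "wave 1 mortality, ventilation outside ICU": (2_569, 5_063),
    "deaths within 28 days, no COVID-19 on certificate": (15_486, 158_020),
    "deaths found only in mortality data": (10_884, 158_020),
}

if __name__ == "__main__":
    for label, (num, den) in RATES.items():
        print(f"{label}: {pct(num, den)}%")
```

Pairing each numerator with its denominator makes it explicit, for example, that the 2·2% death figure is relative to infected individuals rather than to all 57 million adults in the cohort.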
INTERPRETATION:
Our analyses illustrate the wide spectrum of disease trajectories as shown by differences in incidence, survival, and clinical pathways. We have provided a modular analytical framework that can be used to monitor the impact of the pandemic and generate evidence of clinical and policy relevance using multiple EHR sources.
FUNDING:
British Heart Foundation Data Science Centre, led by Health Data Research UK
Finishing the euchromatic sequence of the human genome
The sequence of the human genome encodes the genetic instructions for human physiology, as well as rich information about human evolution. In 2001, the International Human Genome Sequencing Consortium reported a draft sequence of the euchromatic portion of the human genome. Since then, the international collaboration has worked to convert this draft into a genome sequence with high accuracy and nearly complete coverage. Here, we report the result of this finishing process. The current genome sequence (Build 35) contains 2.85 billion nucleotides interrupted by only 341 gaps. It covers ∼99% of the euchromatic genome and is accurate to an error rate of ∼1 event per 100,000 bases. Many of the remaining euchromatic gaps are associated with segmental duplications and will require focused work with new methods. The near-complete sequence, the first for a vertebrate, greatly improves the precision of biological analyses of the human genome including studies of gene number, birth and death. Notably, the human genome seems to encode only 20,000-25,000 protein-coding genes. The genome sequence reported here should serve as a firm foundation for biomedical research in the decades ahead.
Whole-genome sequence-based analysis of thyroid function
Tiina Paunio is a member of the UK10K Consortium working group. Normal thyroid function is essential for health, but its genetic architecture remains poorly understood. Here, for the heritable thyroid traits thyrotropin (TSH) and free thyroxine (FT4), we analyse whole-genome sequence data from the UK10K project (N = 2,287). Using additional whole-genome sequence and deeply imputed data sets, we report meta-analysis results for common variants (MAF ≥ 1%) associated with TSH and FT4 (N = 16,335). For TSH, we identify a novel variant in SYN2 (MAF = 23.5%, P = 6.15 × 10⁻⁹) and a new independent variant in PDE8B (MAF = 10.4%, P = 5.94 × 10⁻¹⁴). For FT4, we report a low-frequency variant near B4GALT6/SLC25A52 (MAF = 3.2%, P = 1.27 × 10⁻⁹) tagging a rare TTR variant (MAF = 0.4%, P = 2.14 × 10⁻¹¹). All common variants explain ≥ 20% of the variance in TSH and FT4. Analysis of rare variants (MAF…
Bi-allelic Loss-of-Function CACNA1B Mutations in Progressive Epilepsy-Dyskinesia.
The occurrence of non-epileptic hyperkinetic movements in the context of developmental epileptic encephalopathies is an increasingly recognized phenomenon. Identification of causative mutations provides important insight into common pathogenic mechanisms that cause both seizures and abnormal motor control. We report bi-allelic loss-of-function CACNA1B variants in six children from three unrelated families whose affected members present with a complex and progressive neurological syndrome. All affected individuals presented with epileptic encephalopathy, severe neurodevelopmental delay (often with regression), and a hyperkinetic movement disorder. Additional neurological features included postnatal microcephaly and hypotonia. Five children died in childhood or adolescence (mean age of death: 9 years), mainly as a result of secondary respiratory complications. CACNA1B encodes the pore-forming subunit of the pre-synaptic neuronal voltage-gated calcium channel Cav2.2/N-type, crucial for SNARE-mediated neurotransmission, particularly in the early postnatal period. Bi-allelic loss-of-function variants in CACNA1B are predicted to cause disruption of Ca2+ influx, leading to impaired synaptic neurotransmission. The resultant effect on neuronal function is likely to be important in the development of involuntary movements and epilepsy. Overall, our findings provide further evidence for the key role of Cav2.2 in normal human neurodevelopment.
M.A.K. is funded by an NIHR Research Professorship and receives funding from the Wellcome Trust, Great Ormond Street Children's Hospital Charity, and Rosetrees Trust. E.M. received funding from the Rosetrees Trust (CD-A53) and Great Ormond Street Hospital Children's Charity. K.G. received funding from Temple Street Foundation. A.M. is funded by Great Ormond Street Hospital, the National Institute for Health Research (NIHR), and Biomedical Research Centre. F.L.R. and D.G. are funded by Cambridge Biomedical Research Centre. K.C. and A.S.J. are funded by NIHR Bioresource for Rare Diseases. The DDD Study presents independent research commissioned by the Health Innovation Challenge Fund (grant number HICF-1009-003), a parallel funding partnership between the Wellcome Trust and the Department of Health, and the Wellcome Trust Sanger Institute (grant number WT098051). We acknowledge support from the UK Department of Health via the NIHR comprehensive Biomedical Research Centre award to Guy's and St. Thomas' National Health Service (NHS) Foundation Trust in partnership with King's College London. This research was also supported by the NIHR Great Ormond Street Hospital Biomedical Research Centre. J.H.C. is in receipt of an NIHR Senior Investigator Award. The research team acknowledges the support of the NIHR through the Comprehensive Clinical Research Network. The views expressed are those of the author(s) and not necessarily those of the NHS, the NIHR, Department of Health, or Wellcome Trust. E.R.M. acknowledges support from NIHR Cambridge Biomedical Research Centre, an NIHR Senior Investigator Award, and the University of Cambridge has received salary support in respect of E.R.M. from the NHS in the East of England through the Clinical Academic Reserve. I.E.S. is supported by the National Health and Medical Research Council of Australia (Program Grant and Practitioner Fellowship).
Unpaved road verges as hotspots of fleshy-fruited shrub recruitment and establishment
Hypothetical low-quality habitats can hold an overlooked conservation value. Some frugivorous mammals such as the red fox (Vulpes vulpes) and the European rabbit (Oryctolagus cuniculus) disperse many viable seeds of fleshy-fruited shrubs along the verges of soft linear developments (SLD), such as trails and firebreaks. However, seed arrival does not guarantee plant recruitment, since several post-dispersal processes can alter seed rain. To examine whether SLD verges assist shrub recruitment and establishment, we compared the density and the structure of a community of Mediterranean shrubs between SLD verges and the adjacent scrubland.
Both seedlings and adult fleshy-fruited shrubs dispersed by foxes and rabbits reached higher densities along SLD verges than in the scrubland, suggesting SLD verges can be suitable habitats for shrub recruitment and establishment. Bird-dispersed shrubs showed a similar pattern, whereas shrubs dispersed by ungulates and badgers (Meles meles) as well as rockroses (Cistaceae) showed similar densities in both habitats. Shrub species composition and diversity were similar between habitats.
Due to a marked differential seed arrival, SLD verges housed higher densities of fleshy-fruited shrubs than the adjacent scrubland. Established shrubs may attract seed-dispersing wildlife, and create proper environments for plant recruitment, generating a reforestation feedback. Incipient shrub populations along roadsides may act as stepping stones with potential to connect isolated populations in fragmented landscapes, where SLD are pervasive. We recommend careful management of frugivore populations and SLD verges in order to favor the diversity and the structural complexity of native vegetation while preventing the spread of invasive species.
Rare Variant Analysis of Human and Rodent Obesity Genes in Individuals with Severe Childhood Obesity
Obesity is a genetically heterogeneous disorder. Using targeted and whole-exome sequencing, we studied 32 human and 87 rodent obesity genes in 2,548 severely obese children and 1,117 controls. We identified 52 variants contributing to obesity in 2% of cases including multiple novel variants in GNAS, which were sometimes found with accelerated growth rather than short stature as described previously. Nominally significant associations were found for rare functional variants in BBS1, BBS9, GNAS, MKKS, CLOCK and ANGPTL6. The p.S284X variant in ANGPTL6 drives the association signal (rs201622589, MAF∼0.1%, odds ratio = 10.13, p-value = 0.042) and results in complete loss of secretion in cells. Further analysis including additional case-control studies and population controls (N = 260,642) did not support association of this variant with obesity (odds ratio = 2.34, p-value = 2.59 × 10⁻³), highlighting the challenges of testing rare variant associations and the need for very large sample sizes. Further validation in cohorts with severe obesity and engineering the variants in model organisms will be needed to explore whether human variants in ANGPTL6 and other genes that lead to obesity when deleted in mice do contribute to obesity. Such studies may yield druggable targets for weight loss therapies.
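The odds ratios quoted for p.S284X compare the odds of carrying the variant in cases with the odds in controls. The carrier counts behind those figures are not given here, so the Python sketch below uses made-up counts purely to show the standard calculation, together with Woolf's log-based 95% confidence interval; the study's own association test (which produced the reported p-values) is not reproduced.

```python
import math


def odds_ratio(a, b, c, d):
    """Odds ratio for a 2x2 table.

    a = carriers among cases,    b = non-carriers among cases,
    c = carriers among controls, d = non-carriers among controls.
    OR = (a / b) / (c / d) = (a * d) / (b * c).
    """
    if 0 in (a, b, c, d):
        raise ValueError("zero cell: the OR is undefined without a continuity correction")
    return (a * d) / (b * c)


def woolf_ci(a, b, c, d, z=1.96):
    """Approximate 95% confidence interval from the standard error of log(OR)."""
    log_or = math.log(odds_ratio(a, b, c, d))
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    return math.exp(log_or - z * se), math.exp(log_or + z * se)


if __name__ == "__main__":
    # Made-up counts for illustration only; NOT the study's data.
    a, b, c, d = 4, 996, 2, 998
    low, high = woolf_ci(a, b, c, d)
    print(f"OR = {odds_ratio(a, b, c, d):.2f}, 95% CI = ({low:.2f}, {high:.2f})")
```

With only a handful of carriers, the 1/a and 1/c terms dominate the standard error of log(OR), so the interval is very wide; with the toy counts above it comfortably includes 1. This is one way to see why rare-variant associations need very large sample sizes.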