55 research outputs found

    MapMi: automated mapping of microRNA loci.

    BACKGROUND: A large effort to discover microRNAs (miRNAs) has been under way. Currently miRBase is their primary repository, providing annotations of primary sequences, precursors and probable genomic loci. In many cases miRNAs are identical or very similar between related (or in some cases more distant) species. However, miRBase focuses on those species for which miRNAs have been directly confirmed. Secondly, specific miRNAs or their loci are sometimes not annotated even in well-covered species. We sought to address this problem by developing a computational system for automated mapping of miRNAs within and across species. Given the sequence of a known miRNA in one species it is relatively straightforward to determine likely loci of that miRNA in other species. Our primary goal is not the discovery of novel miRNAs but the mapping of validated miRNAs in one species to their most likely orthologues in other species. RESULTS: We present MapMi, a computational system for automated miRNA mapping across and within species. This method has a sensitivity of 92.20% and a specificity of 97.73%. Using the latest release (v14) of miRBase, we obtained 10,944 unannotated potential miRNAs when MapMi was applied to all 21 species in Ensembl Metazoa release 2 and 46 species from Ensembl release 55. CONCLUSIONS: The pipeline and an associated web-server for mapping miRNAs are freely available on http://www.ebi.ac.uk/enright-srv/MapMi/. In addition precomputed miRNA mappings of miRBase miRNAs across a large number of species are provided
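    The sensitivity and specificity quoted above follow the standard definitions over a confusion matrix. The abstract does not report the underlying counts, so the sketch below only illustrates the arithmetic; the function names and any numbers passed to them are hypothetical and are not taken from MapMi.

```python
def sensitivity(true_pos: int, false_neg: int) -> float:
    """Fraction of real miRNA loci that the mapping recovers: TP / (TP + FN)."""
    total = true_pos + false_neg
    if total == 0:
        raise ValueError("no positive cases to evaluate")
    return true_pos / total


def specificity(true_neg: int, false_pos: int) -> float:
    """Fraction of non-miRNA sequences correctly rejected: TN / (TN + FP)."""
    total = true_neg + false_pos
    if total == 0:
        raise ValueError("no negative cases to evaluate")
    return true_neg / total


if __name__ == "__main__":
    # Hypothetical counts, chosen only to show the calculation.
    print(f"sensitivity = {sensitivity(9, 1):.2%}")
    print(f"specificity = {specificity(3, 1):.2%}")
```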

    Large-scale analysis of microRNA evolution.

    BACKGROUND: In animals, microRNAs (miRNA) are important genetic regulators. Animal miRNAs appear to have expanded in conjunction with an escalation in complexity during early bilaterian evolution. Their small size and high degree of similarity make them challenging for phylogenetic approaches. Furthermore, genomic locations encoding miRNAs are not clearly defined in many species. A number of studies have looked at the evolution of individual miRNA families. However, we currently lack resources for large-scale analysis of miRNA evolution. RESULTS: We addressed some of these issues in order to analyse the evolution of miRNAs. We perform syntenic and phylogenetic analysis for miRNAs from 80 animal species. We present synteny maps, phylogenies and functional data for miRNAs across these species. These data represent the basis of our analyses and also act as a resource for the community. CONCLUSIONS: We use these data to explore the distribution of miRNAs across phylogenetic space, characterise their birth and death, and examine functional relationships between miRNAs and other genes. These data confirm a number of previously reported findings on a larger scale and also offer novel insights into the evolution of the miRNA repertoire in animals, and its genomic organization

    The Personal Genome Project-UK, an open access resource of human multi-omics data

    Integrative analysis of multi-omics data is a powerful approach for gaining functional insights into biological and medical processes. Conducting these multifaceted analyses on human samples is often complicated by the fact that the raw sequencing output is rarely available under open access. The Personal Genome Project UK (PGP-UK) is one of few resources that recruits its participants under open consent and makes the resulting multi-omics data freely and openly available. As part of this resource, we describe the PGP-UK multi-omics reference panel consisting of ten genomic, methylomic and transcriptomic data. Specifically, we outline the data processing, quality control and validation procedures which were implemented to ensure data integrity and exclude sample mix-ups. In addition, we provide a REST API to facilitate the download of the entire PGP-UK dataset. The data are also available from two cloud-based environments, providing platforms for free integrated analysis. In conclusion, the genotype-validated PGP-UK multi-omics human reference panel described here provides a valuable new open access resource for integrated analyses in support of personal and medical genomics

    A blood pressure-associated variant of the SLC39A8 gene influences cellular cadmium accumulation and toxicity.

    Genome-wide association studies have revealed a relationship between inter-individual variation in blood pressure and the single nucleotide polymorphism rs13107325 in the SLC39A8 gene. This gene encodes the ZIP8 protein which co-transports divalent metal cations, including heavy metal cadmium, the accumulation of which has been associated with increased blood pressure. The polymorphism results in two variants of ZIP8 with either an alanine (Ala) or a threonine (Thr) at residue 391. We investigated the functional impact of this variant on protein conformation, cadmium transport, activation of signalling pathways and cell viability in relation to blood pressure regulation. Following incubation with cadmium, higher intracellular cadmium was detected in cultured human embryonic kidney cells (HEK293) expressing heterologous ZIP8-Ala391, compared with HEK293 cells expressing heterologous ZIP8-Thr391. This Ala391-associated cadmium accumulation also increased the phosphorylation of the signal transduction molecule ERK2, activation of the transcription factor NFκB, and reduced cell viability. Similarly, vascular endothelial cells with the Ala/Ala genotype had higher intracellular cadmium concentration and lower cell viability than their Ala/Thr counterpart following cadmium exposure. These results indicate that the ZIP8 Ala391-to-Thr391 substitution has an effect on intracellular cadmium accumulation and cell toxicity, providing a potential mechanistic explanation for the association of this genetic variant with blood pressure

    Whole Genome Sequencing Shows a Low Proportion of Tuberculosis Disease Is Attributable to Known Close Contacts in Rural Malawi.

    BACKGROUND: The proportion of tuberculosis attributable to transmission from close contacts is not well known. Comparison of the genome of strains from index patients and prior contacts allows transmission to be confirmed or excluded. METHODS: In Karonga District, Malawi, all tuberculosis patients are asked about prior contact with others with tuberculosis. All available strains from culture-positive patients were sequenced. Up to 10 single nucleotide polymorphisms between index patients and their prior contacts were allowed for confirmation, and ≥ 100 for exclusion. The population attributable fraction was estimated from the proportion of confirmed transmissions and the proportion of patients with contacts. RESULTS: From 1997-2010 there were 1907 new culture-confirmed tuberculosis patients, of whom 32% reported at least one family contact and an additional 11% had at least one other contact; 60% of contacts had smear-positive disease. Among case-contact pairs with sequences available, transmission was confirmed from 38% (62/163) smear-positive prior contacts and 0/17 smear-negative prior contacts. Confirmed transmission was more common in those related to the prior contact (42.4%, 56/132) than in non-relatives (19.4%, 6/31, p = 0.02), and in those with more intense contact, to younger index cases, and in more recent years. The proportion of tuberculosis attributable to known contacts was estimated to be 9.4% overall. CONCLUSIONS: In this population known contacts only explained a small proportion of tuberculosis cases. Even those with a prior family contact with smear positive tuberculosis were more likely to have acquired their infection elsewhere
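    The confirmation and exclusion rule in the METHODS can be sketched as a simple threshold classifier on the SNP distance between an index patient and a prior contact. The two cut-offs (up to 10 SNPs confirms, 100 or more excludes) come from the abstract; the label "indeterminate" for distances in between, and the function names, are assumptions made for illustration.

```python
from collections import Counter
from typing import Iterable

CONFIRM_MAX_SNPS = 10   # from the abstract: up to 10 SNPs confirms transmission
EXCLUDE_MIN_SNPS = 100  # from the abstract: >= 100 SNPs excludes transmission


def classify_pair(snp_distance: int) -> str:
    """Label one index/contact pair by the SNP distance between their strains."""
    if snp_distance < 0:
        raise ValueError("SNP distance cannot be negative")
    if snp_distance <= CONFIRM_MAX_SNPS:
        return "confirmed"
    if snp_distance >= EXCLUDE_MIN_SNPS:
        return "excluded"
    # Assumption: the abstract does not name the 11-99 SNP range.
    return "indeterminate"


def tally_pairs(snp_distances: Iterable[int]) -> Counter:
    """Count how many pairs fall into each category."""
    return Counter(classify_pair(d) for d in snp_distances)


if __name__ == "__main__":
    # Hypothetical distances, not data from the study.
    print(tally_pairs([3, 250, 8, 40]))
```

    How pairs in the intermediate range enter the reported proportions is not stated in the abstract, so the tally keeps them as a separate category rather than folding them into either side.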

    Identifying mixed Mycobacterium tuberculosis infections from whole genome sequence data.

    BACKGROUND: Mixed, polyclonal Mycobacterium tuberculosis infection occurs in natural populations. Developing an effective method for detecting such cases is important in measuring the success of treatment and reconstructing transmission between patients. Using whole genome sequence (WGS) data, we assess two methods for detecting mixed infection: (i) a combination of the number of heterozygous sites and the proportion of heterozygous sites to total SNPs, and (ii) Bayesian model-based clustering of allele frequencies from sequencing reads at heterozygous sites. RESULTS: In silico and in vitro artificially mixed and known pure M. tuberculosis samples were analysed to determine the specificity and sensitivity of each method. We found that both approaches were effective in distinguishing between pure strains and mixed infection where there was a relatively high (> 10%) proportion of a minor strain in the mixture. A large dataset of clinical isolates (n = 1963) from the Karonga Prevention Study in Northern Malawi was tested to examine correlations of patient characteristics and outcomes with mixed infection. The frequency of mixed infection in the population was found to be around 10%, with an association with year of diagnosis, but no association with age, sex, HIV status or previous tuberculosis. CONCLUSIONS: Mixed Mycobacterium tuberculosis infection was identified in silico using whole genome sequence data. The methods presented here can be applied to population-wide analyses of tuberculosis to estimate the frequency of mixed infection, and to identify individual cases of mixed infections. These cases are important when considering the evolution and transmission of the disease, and in patient treatment
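    Method (i) combines two per-sample quantities: the number of heterozygous sites and the proportion of heterozygous sites among all SNPs. A minimal sketch of such a rule follows. The abstract gives neither the cut-offs nor how the two criteria are combined, so both thresholds are required arguments with no defaults, and requiring both to be met is an assumption rather than the authors' published rule.

```python
def het_proportion(n_het_sites: int, n_total_snps: int) -> float:
    """Proportion of heterozygous sites among all SNPs called in a sample."""
    if n_het_sites < 0 or n_total_snps < 0:
        raise ValueError("counts cannot be negative")
    if n_het_sites > n_total_snps:
        raise ValueError("heterozygous sites cannot exceed total SNPs")
    return n_het_sites / n_total_snps if n_total_snps else 0.0


def flag_mixed(
    n_het_sites: int,
    n_total_snps: int,
    min_het_sites: int,         # hypothetical threshold supplied by the caller
    min_het_proportion: float,  # hypothetical threshold supplied by the caller
) -> bool:
    """Flag a sample as a candidate mixed infection.

    Assumption: a sample is flagged only when BOTH criteria are met.
    """
    proportion = het_proportion(n_het_sites, n_total_snps)
    return n_het_sites >= min_het_sites and proportion >= min_het_proportion


if __name__ == "__main__":
    # Illustrative values only; not thresholds or data from the study.
    print(flag_mixed(50, 1000, min_het_sites=20, min_het_proportion=0.015))
```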

    Integrated analysis of microRNA and mRNA expression and association with HIF binding reveals the complexity of microRNA expression regulation under hypoxia.

    BACKGROUND: In mammals, HIF is a master regulator of hypoxia gene expression through direct binding to DNA, while its role in microRNA expression regulation, critical in the hypoxia response, is not elucidated genome wide. Our aim is to investigate in depth the regulation of microRNA expression by hypoxia in the breast cancer cell line MCF-7, and to establish the relationship between microRNA expression and HIF binding sites, pri-miRNA transcription and microRNA processing gene expression. METHODS: MCF-7 cells were incubated at 1% oxygen for 16, 32 and 48 h. siRNA knockdown of HIF-1α and HIF-2α was performed as previously published. MicroRNA and mRNA expression were assessed using microRNA microarrays, small RNA sequencing, gene expression microarrays and real-time PCR. The Kraken pipeline was applied for microRNA-seq analysis along with Bioconductor packages. Microarray data were analysed using Limma (Bioconductor), ChIP-seq data were analysed using Gene Set Enrichment Analysis, and multiple testing correction was applied in all analyses. RESULTS: Hypoxia time course microRNA sequencing data analysis identified 41 microRNAs significantly up- and 28 down-regulated, including hsa-miR-4521, hsa-miR-145-3p and hsa-miR-222-5p reported in conjunction with hypoxia for the first time. Integration of HIF-1α and HIF-2α ChIP-seq data with expression data showed overall association between binding sites and microRNA up-regulation, with hsa-miR-210-3p and microRNAs of miR-27a/23a/24-2 and miR-30b/30d clusters as predominant examples. Moreover, the expression of hsa-miR-27a-3p and hsa-miR-24-3p was found to be positively associated with a hypoxia gene signature in breast cancer. Gene expression analysis showed no full coordination between pri-miRNA and microRNA expression, pointing towards additional levels of regulation. Several transcripts involved in microRNA processing were found to be regulated by hypoxia, of which DICER (down-regulated) and AGO4 (up-regulated) were HIF dependent. DICER expression was found to be inversely correlated with hypoxia in breast cancer. CONCLUSIONS: Integrated analysis of microRNA, mRNA and ChIP-seq data in a model cell line supports the hypothesis that microRNA expression under hypoxia is regulated at transcriptional and post-transcriptional level, with the presence of HIF binding sites at microRNA genomic loci associated with up-regulation. The identification of hypoxia and HIF regulated microRNAs relevant for breast cancer is important for our understanding of disease development and design of therapeutic interventions

    PolyTB: a genomic variation map for Mycobacterium tuberculosis.

    Tuberculosis (TB) caused by Mycobacterium tuberculosis (Mtb) is the second major cause of death from an infectious disease worldwide. Recent advances in DNA sequencing are leading to the ability to generate whole genome information in clinical isolates of M. tuberculosis complex (MTBC). The identification of informative genetic variants such as phylogenetic markers and those associated with drug resistance or virulence will help barcode Mtb in the context of epidemiological, diagnostic and clinical studies. Mtb genomic datasets are increasingly available as raw sequences, which are potentially difficult and computer intensive to process, and compare across studies. Here we have processed the raw sequence data (>1500 isolates, eight studies) to compile a catalogue of SNPs (n = 74,039, 63% non-synonymous, 51.1% in more than one isolate, i.e. non-private), small indels (n = 4810) and larger structural variants (n = 800). We have developed the PolyTB web-based tool (http://pathogenseq.lshtm.ac.uk/polytb) to visualise the resulting variation and important meta-data (e.g. in silico inferred strain-types, location) within geographical map and phylogenetic views. This resource will allow researchers to identify polymorphisms within candidate genes of interest, as well as examine the genomic diversity and distribution of strains. PolyTB source code is freely available to researchers wishing to develop similar tools for their pathogen of interest