507 research outputs found

    Using RNase sequence specificity to refine the identification of RNA-protein binding regions

    Massively parallel pyrosequencing is a high-throughput technology that can sequence hundreds of thousands of DNA/RNA fragments in a single experiment. Combining it with immunoprecipitation-based biochemical assays, such as cross-linking immunoprecipitation (CLIP), provides a genome-wide method to detect the sites at which proteins bind DNA or RNA. In a CLIP-pyrosequencing experiment, the resolution of the detected protein binding regions is partially determined by the length of the detected RNA fragments (CLIP amplicons) after trimming by RNase digestion. These fragments are usually 50 to 70 nucleotides long, and many genomic regions are marked by multiple RNA fragments. In this paper, we report an empirical approach to refine the localization of protein binding regions by using the distribution pattern of the detected RNA fragments and the sequence specificity of RNase digestion. We present two regions to which multiple amplicons map as examples to demonstrate this approach.
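    The idea of narrowing a binding region from several overlapping fragments can be illustrated with a minimal sketch. It is not the authors' method: it assumes, purely for illustration, that the binding site lies in the stretch shared by every fragment mapped to a region, and it does not model RNase sequence specificity at all. The function name `shared_region` and the coordinates are hypothetical.

```python
from typing import List, Optional, Tuple

Interval = Tuple[int, int]  # (start, end), 0-based, end-exclusive


def shared_region(fragments: List[Interval]) -> Optional[Interval]:
    """Return the stretch covered by every fragment, or None if there is none."""
    if not fragments:
        return None
    # The shared stretch starts at the right-most start and
    # ends at the left-most end of the mapped fragments.
    start = max(s for s, _ in fragments)
    end = min(e for _, e in fragments)
    return (start, end) if start < end else None


# Hypothetical coordinates for three overlapping fragments of 60-65 nt.
fragments = [(100, 160), (120, 185), (130, 190)]
print(shared_region(fragments))  # (130, 160)
```

    In this toy case three fragments of 60 to 65 nucleotides narrow the candidate region to 30 nucleotides. A fuller treatment would also weigh where the RNase can and cannot cut.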

    The Study of Hepatitis B Virus Using Bioinformatics

    Hepatitis refers to inflammation of the liver. A major cause of hepatitis is the hepatotropic hepatitis B virus (HBV). Annually, more than 786,000 people die as a result of the clinical manifestations of HBV infection, which include cirrhosis and hepatocellular carcinoma. Sequence heterogeneity is a feature of HBV because the virally encoded polymerase lacks proof-reading ability. HBV has been classified into nine genotypes, A to I, with a putative 10th genotype, “J,” isolated from a single individual. Comparative analysis of HBV strains from various geographic regions of the world and from different eras can shed light on the origin, evolution, and transmission of the virus and on its response to preventative and treatment measures. Bioinformatics tools and databases have been used to better understand HBV mutations and how they develop, especially in response to antiviral therapy and vaccination. Despite its small genome size of ~3.2 kb, HBV presents several bioinformatic challenges, which include the circular genome, the overlapping open reading frames, and the different genome lengths of the genotypes. Thus, bioinformatics tools and databases have been developed to facilitate the study of HBV.

    Genome reconstructions indicate the partitioning of ecological functions inside a phytoplankton bloom in the Amundsen Sea, Antarctica

    © The Author(s), 2015. This article is distributed under the terms of the Creative Commons Attribution License. The definitive version was published in Frontiers in Microbiology 6 (2015): 1090, doi:10.3389/fmicb.2015.01090. Antarctic polynyas support intense phytoplankton blooms, which impact their environment through a substantial depletion of inorganic carbon and nutrients. These blooms are dominated by the colony-forming haptophyte Phaeocystis antarctica and are accompanied by a distinct bacterial population. Yet the ecological role these bacteria may play in P. antarctica blooms awaits elucidation of their functional gene pool and of the geochemical activities they support. Here, we report on a metagenome (~160 million reads) analysis of the microbial community associated with a P. antarctica bloom event in the Amundsen Sea polynya (West Antarctica). Genomes of the most abundant Bacteroidetes and Proteobacteria populations have been reconstructed, and a network analysis indicates a strong functional partitioning of these bacterial taxa. Three of them (SAR92, and members of the Oceanospirillaceae and Cryomorphaceae) are found in close association with P. antarctica colonies. Distinct features of their carbohydrate, nitrogen, sulfur and iron metabolisms may serve to support mutualistic relationships with P. antarctica. The SAR92 genome indicates a specialization in the degradation of fatty acids and dimethylsulfoniopropionate (compounds released by P. antarctica) into dimethyl sulfide, an aerosol precursor. The Oceanospirillaceae genome carries genes that may enhance algal physiology (cobalamin synthesis). Finally, the Cryomorphaceae genome is enriched in genes that function in cell or colony invasion. A novel pico-eukaryote genome related to Micromonas (19.6 Mb, ~94% completion) was also recovered. It contains the gene for an anti-freeze protein, which is lacking in Micromonas at lower latitudes.
These draft genomes are representative of abundant microbial taxa across the Southern Ocean surface. This work was performed with financial support from NSF Antarctic Sciences awards ANT-1142095 to AP

    Experimental annotation of post-translational features and translated coding regions in the pathogen Salmonella Typhimurium

    Background: Complete and accurate genome annotation is crucial for comprehensive and systematic studies of biological systems. However, determining protein-coding genes for most new genomes is almost completely performed by inference using computational predictions with significant documented error rates (>15%). Furthermore, gene prediction programs provide no information on biologically important post-translational processing events critical for protein function. Results: We experimentally annotated the bacterial pathogen Salmonella Typhimurium 14028, using "shotgun" proteomics to accurately uncover the translational landscape and post-translational features. The data provide protein-level experimental validation for approximately half of the predicted protein-coding genes in Salmonella and suggest revisions to several genes that appear to have incorrectly assigned translational start sites, including a potential novel alternate start codon. Additionally, we uncovered 12 non-annotated genes missed by gene prediction programs, as well as evidence suggesting a role for one of these novel ORFs in Salmonella pathogenesis. We also characterized post-translational features in the Salmonella genome, including chemical modifications and proteolytic cleavages. We find that bacteria have a much larger and more complex repertoire of chemical modifications than previously thought, including several novel modifications. Our in vivo proteolysis data identified more than 130 signal peptide and N-terminal methionine cleavage events critical for protein function. Conclusion: This work highlights several ways in which application of proteomics data can improve the quality of genome annotations to facilitate novel biological insights and provides a comprehensive proteome map of Salmonella as a resource for systems analysis.

    A disulfide bond A-like oxidoreductase is a strong candidate gene for self-incompatibility in apricot (Prunus armeniaca) pollen

    S-RNase-based gametophytic self-incompatibility (SI) is a widespread prezygotic reproductive barrier in flowering plants. In the Solanaceae, Plantaginaceae and Rosaceae, gametophytic SI is controlled by the pistil-specific S-RNases and the pollen S-locus F-box proteins, but non-S-specific factors, namely modifiers, are also required. In apricot, Prunus armeniaca (Rosaceae), we previously mapped two pollen-part mutations that confer self-compatibility in cultivars Canino and Katy at the distal end of chromosome 3 (M-locus), unlinked to the S-locus. Here, we used high-resolution mapping to delimit the M-locus to an ~134 kb segment containing the ParM-1 to -16 genes. Gene expression analysis identified four genes preferentially expressed in anthers as modifier gene candidates: ParM-6, -7, -9 and -14. Variant calling of WGS Illumina data from Canino, Katy, and 10 self-incompatible cultivars detected a 358 bp miniature inverted-repeat transposable element (MITE) insertion in ParM-7 shared only by self-compatible apricots, supporting ParM-7 as a strong candidate gene required for SI. ParM-7 encodes a disulfide bond A-like oxidoreductase protein, which we named ParMDO. The MITE insertion truncates the ParMDO ORF and produces a loss of SI function, suggesting that pollen rejection in Prunus is dependent on redox regulation. Based on phylogenetic analyses, we also suggest that ParMDO may have originated from a tandem duplication followed by subfunctionalization and pollen-specific expression. This work was supported by two grants from the Ministerio de Economia y Competitividad del Gobierno de Espana (AGL19018-2010 and AGL2015-64625-C2-2-R). The authors want to thank Inmaculada Lopez for her technical contribution and Gary Clark for his assistance with the manuscript. Chris Dardick, Tetyana Zhebentyayeva, and Albert Abbott kindly provided Goldrich and SEO genomic sequences.
We also thank Mario Fares and, especially, Bruce McClure for their insights and comments on the manuscript. Muñoz-Sanz, JV.; Zuriaga García, E.; Badenes, ML.; Romero Salvador, C. (2017). A disulfide bond A-like oxidoreductase is a strong candidate gene for self-incompatibility in apricot (Prunus armeniaca) pollen. Journal of Experimental Botany. 68(18):5069-5078. https://doi.org/10.1093/jxb/erx336

    Development of an optimal bioinformatics system for the analysis of metagenome data generated by next-generation sequencing instruments

    Thesis (Ph.D.) -- Seoul National University Graduate School: Interdisciplinary Program in Bioinformatics, February 2014. Chun Jongsik (천종식). A metagenome is the total DNA extracted directly from an environment, and the purpose of metagenomics is to reveal the function of the metagenome as well as its taxonomic structure. There are two analysis approaches in metagenomics: the amplicon-based approach and the random shotgun-based approach. Both require sequencing reads on a scale that Sanger sequencing could not provide. High-throughput sequencing at relatively low cost by Next Generation Sequencing (NGS) technologies meets this requirement. In addition, the advent of NGS technologies gave rise to the development of the bioinformatic algorithms necessary for processing such large and complex sequencing data. Consequently, the large amount of sequencing data obtained from NGS, together with suitable bioinformatic algorithms, has made metagenomics an essential tool for microbiology. However, limitations arising from NGS sequencing errors, short read length, and the lack of an analysis system still hinder accurate metagenome analysis. Therefore, evaluation of currently used NGS error-handling algorithms and development of a systematic pipeline with more efficient algorithms are required to improve the accuracy of analysis. In this study, bioinformatic pipelines were constructed for both metagenome analysis approaches. The pipelines were designed to improve the accuracy of the final result by minimizing the effects of errors and short read length. For amplicon-based metagenomics, two analysis pipelines were developed, one for 454 pyrosequencing and one for Illumina MiSeq. During the construction of the 454 pyrosequencing pipeline, a new error-handling algorithm was developed to treat homopolymer and PCR errors. Upon completion of the pipeline, a household microbial community was analyzed using 454 pyrosequencing data as a case study.
As for Illumina MiSeq data, the most appropriate sequencing conditions and sequencing target region were determined. Paired-end merging programs were evaluated, and the correlation between sequencing errors and quality was studied to correct the errors within 3 overlap regions. A novel iterative consensus clustering method was developed to correct errors occurring ubiquitously in a single read. For the shotgun metagenomics approach, a bioinformatic analysis system for Illumina MiSeq paired-end data was constructed. Unlike targeted amplicon sequencing reads, most shotgun sequencing reads are not merged, so short reads are used for both functional and taxonomic profiling. However, a short read carries less information than a longer contig, so the use of short reads is likely to bias the characterization of the metagenome. Therefore, development of the analysis system focused on creating longer contigs by means of mapping and de novo assembly. For raw read mapping, a method for dynamically constructing the set of mapping genomes was developed. A list of mapping genomes was selected from the taxonomic profile inferred from the ribosomal RNA profiles, and the sequences of the selected genomes were downloaded from Ezbiocloud. By mapping raw reads to these genome sequences, longer contigs can be obtained for relatively simple metagenomes such as fecal matter. However, for complex metagenomes such as soil samples, neither mapping nor de novo assembly performed properly, owing to a lack of sequencing coverage and the large number of uncultured microorganisms in the metagenome. In addition to the pipeline construction, visualization tools were developed to display the resulting taxonomic and functional profiles at the same time. A newly developed standalone Java-based sequence alignment editing application was named EzEditor.
Because both conserved functional coding sequences and the 16S rRNA gene have been used extensively in bacterial molecular phylogenetics, codon-based sequence alignment editing functions are required for coding genes. EzEditor provides a simultaneous DNA and protein sequence alignment editing interface, which enables robust sequence alignment for both protein and rRNA sequences. EzEditor can be applied to a variety of analyses involving molecular sequences, not only as a basic sequence editor but also for phylogenetic applications.
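    The paired-end merging step mentioned in this abstract can be sketched in a few lines. This is a minimal illustration and not the thesis pipeline or any of the programs it evaluated: it assumes exact matches in the overlap, whereas real merging tools tolerate mismatches and use base quality scores. The names `merge_pair` and `min_overlap` are hypothetical.

```python
from typing import Optional

_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an upper-case DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def merge_pair(read1: str, read2: str, min_overlap: int = 10) -> Optional[str]:
    """Merge a read pair on the longest exact suffix/prefix overlap.

    read2 is reverse-complemented first so that both reads are on the
    same strand. Returns None when no overlap of at least min_overlap
    bases is found.
    """
    mate = reverse_complement(read2)
    longest = min(len(read1), len(mate))
    # Try the longest possible overlap first and shrink until one matches.
    for size in range(longest, min_overlap - 1, -1):
        if read1[-size:] == mate[:size]:
            return read1 + mate[size:]
    return None


# Hypothetical 30 nt fragment sequenced from both ends with 20 nt reads.
fragment = "ACGTTGCAAGGCTTAACCGGATCGATTACG"
read1 = fragment[:20]
read2 = reverse_complement(fragment[10:])
print(merge_pair(read1, read2, min_overlap=5) == fragment)  # True
```

    Exact matching keeps the sketch short. Allowing mismatches in the overlap is where the link between base quality and sequencing error, which the abstract says was studied, would come in.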

    The influence of genetic variation in PSIP1 on HIV-1 infectivity in black South Africans

    Genetic variation plays an important role in determining an individual’s susceptibility to infectious disease. PSIP1 encodes LEDGF/p75, which stably associates with the core domain of HIV-1 integrase (IN) via a highly conserved integrase binding domain (IBD) located at its C-terminus. Through this interaction, the protein tethers HIV-1 IN to chromosomes at sites corresponding to regions of high LEDGF/p75-mediated transcription. Genetic variation within PSIP1 was identified and characterized in black South Africans to establish whether variation in this gene influences an individual’s susceptibility to HIV infection. PCR assays were designed to amplify regions within the upstream non-coding region, IBD and DNA-binding domains of the gene, and selected polymorphisms were then genotyped using allele-specific PCR, RFLP-PCR and Pyrosequencing™ assays. Three insertion-deletion (indel) polymorphisms and eight single nucleotide polymorphisms (SNPs) were identified through sequencing. Four of the SNPs had been recorded previously, while the seven other polymorphisms had not and appear to be unique to our population. Differences in allelic and genotypic frequencies were found between the various ethnic groups represented in this study and were reflected in the underlying haplotype structure of the gene, suggesting that genetic substructure exists within the black South African population. Differences in allele and genotype frequencies were also seen between HIV+ individuals and the general population. Thus, variation within PSIP1 may influence an individual’s susceptibility to HIV-1 infection and/or rate of disease progression.

    Human Endogenous Retrovirus Type W in Multiple Sclerosis

    Multiple sclerosis (MS) is a chronic inflammatory and degenerative disease of the central nervous system (CNS). As early as the 1990s, the presence of retrovirus-like particles was linked to MS pathology (Perron et al., 1997). The RNA of these particles was characterized as the multiple sclerosis-associated retrovirus (MSRV), which shares extensive homology with the human endogenous retrovirus family W (HERV-W) (Blond et al., 1999). Both the presence and the expression of MSRV/HERV-W RNA and proteins are elevated in MS patients and are associated with disease progression (Garcia-Montojo et al., 2013; Mameli et al., 2009; Perron et al., 2012). Although MS mainly affects the brain, increased MSRV/HERV-W expression has also been observed in peripheral blood mononuclear cells (PBMCs). HERV-W is a family of more than 600 similar elements integrated throughout the human genome (Pavlicek et al., 2002). However, the exact relationship between MSRV and HERV-W remains unknown. To address the relationship between MSRV and HERV-W and the contribution of HERV-W to the pathology, the first objective was to try to locate MSRV, or HERV-W copies absent from the genome database, in the genomic DNA of PBMCs from MS patients. To this end, a method was modified to allow amplification of HERV-W sequences.
The assay located several HERV-W elements present in the genome of MS patients, whose relationship with MS remains unknown. Through a collaboration with the Neurology Service of the Hospital Universitario Miguel Servet (Zaragoza, Spain), blood samples were obtained from MS patients and healthy controls, and PBMCs were extracted from them. The number of HERV-W copies in the human genome was then analyzed and found to be constant across all individuals. HERV-W expression at the RNA level in PBMCs was also analyzed. HERV-W expression was found to be slightly elevated in MS patients, and this expression derives not from a single HERV-W copy but from a group of less abundant HERV-W copies associated with MS patients. As with other HERVs, HERV-W regulation is thought to be controlled by, among other mechanisms, epigenetic mechanisms such as DNA methylation. Because the methylation status of HERV-W had not been analyzed in the context of MS, the possibility was considered that the degree of HERV-W methylation is related to its expression. Assays were designed and applied to measure the degree of methylation of some of the HERV-W copies previously identified in MS patients. All loci analyzed proved to be highly methylated in PBMCs from both patients and controls. HERV-W methylation levels therefore do not appear to regulate its expression. Toll-like receptors (TLRs) detect viral products in the form of proteins or nucleic acids and mediate the antiviral response. I began studies to investigate the possibility that HERV-W overexpression induces an inflammatory response within the CNS. Because of the difficulty of generating oligodendrocytes, human neuronal precursors were used.
HERV-W overexpression was found to trigger an inflammatory response mediated by interferon β. These results suggest that HERV-W dysregulation could activate the innate immune system resident in the CNS and thus contribute to the neuroinflammation present in MS.

    Cutaneous Melanoma Classification: The Importance of High-Throughput Genomic Technologies

    Cutaneous melanoma is an aggressive tumor responsible for 90% of mortality related to skin cancer. In recent years, the discovery of driver mutations in melanoma has led to better treatment approaches. The last decade has seen a genomic revolution in the field of cancer, which has produced an unprecedented amount of data. High-throughput genomic technologies have facilitated the genomic, transcriptomic and epigenomic profiling of several cancers, including melanoma. Nevertheless, a number of newer genomic technologies have not yet been employed in large studies. In this article we describe the current classification of cutaneous melanoma, review current knowledge of the main genetic alterations of cutaneous melanoma and their impact on targeted therapies, and describe the most recent high-throughput genomic technologies, highlighting their advantages and disadvantages. We hope that this review will also help scientists identify the most suitable technology for addressing melanoma-related questions. Translating this knowledge and current advances into clinical practice will help to better define the different molecular subsets of melanoma patients and provide new tools to address relevant questions on disease management. Genomic technologies might indeed allow better prediction of the biological, and subsequently clinical, behavior of each subset of melanoma patients, and even the identification of all molecular changes in tumor cell populations during disease evolution, moving toward truly personalized medicine.

    Determining the Roles that DICER1 and Noncoding RNAs Play in Endometrial Tumorigenesis

    Cancer is both a genetic and epigenetic disease. Changes in DNA methylation, histone modifications, and microRNA processing promote tumorigenesis, just as mutations in coding sequences of specific genes contribute to cancer development. In my thesis work I sought to determine the role that noncoding RNAs play in endometrial tumorigenesis. Aberrant methylation of the promoter region of the MLH1 DNA mismatch repair gene is associated with loss of MLH1 expression and a mutator phenotype in endometrial and other cancers. The molecular and cellular processes leading to aberrant methylation of the MLH1 promoter region are largely unknown. I tested the hypothesis that the EPM2AIP1 antisense transcript at the MLH1 locus could be involved in MLH1 transcriptional silencing. I characterized the MLH1/EPM2AIP1 bidirectional promoter region in endometrial cancer and normal cell lines and found an abundance of forward and reverse transcripts initiating from a large region of nucleosome-free DNA in expressing cells. The DICER1 protein, which is necessary for processing small RNAs involved in post-transcriptional silencing, is downregulated in many cancers, including endometrial cancer. I used genomic methods (RNA-Seq and MeDIP/MRE) to characterize the transcriptome and methylome of endometrial cancer cells depleted of DICER1. Using a combination of computational and wet lab methods, I showed that reduced DICER1 triggers an interferon response in cancer cells because of an accumulation of pre-microRNAs that activate immune sensors of viral dsRNA. The methylome of DICER1 knockdown cells revealed subtle changes in methylation, including decreased methylation at the Alu family of repetitive elements. Small RNAs processed by DICER1 may thus be involved in silencing repetitive regions. Non-coding RNA has effects on endometrial cancer cells that may contribute to tumorigenesis, such as influencing the active state of the MLH1 gene and modulating the immune response.