
    cRegions—a tool for detecting conserved cis-elements in multiple sequence alignment of diverged coding sequences

    Identifying cis-acting elements and understanding the regulatory mechanisms of a gene is crucial for fully understanding the molecular biology of an organism. In general, previously uncharacterised cis-acting elements with an unknown consensus sequence are difficult to identify. The task is especially problematic in viruses whose genomes contain regions of limited or no similarity to previously characterised sequences. Fortunately, the rapid growth in the number of sequenced genomes makes it possible to detect some of these elusive cis-elements. In this work, we introduce a web-based tool called cRegions. It was developed to identify regions within a protein-coding sequence where conservation of the amino acid sequence is caused by conservation of the nucleotide sequence. cRegions can be the first step in discovering novel cis-acting sequences in diverged protein-coding genes, and its results can serve as a basis for future experimental analysis. As an example, we applied cRegions to the non-structural and structural polyproteins of alphaviruses and successfully detected all known cis-acting elements. In this publication and in previous work, we have shown that cRegions can detect a wide variety of functional elements in DNA and RNA viruses, including splice sites, stem-loops, overlapping reading frames, internal promoters, ribosomal frameshifting signals, and other embedded elements of as yet unknown function. The cRegions web tool is available at http://bioinfo.ut.ee/cRegions/
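    The core idea (amino acid conservation that is explained by nucleotide conservation) can be illustrated with a deliberately simplified heuristic. The sketch below is not the published cRegions method. It assumes codon-aligned sequences of equal length and uses a hypothetical score: among codon columns where every sequence encodes the same amino acid, the fraction in which the codon itself is also identical. Single-codon amino acids (Met, Trp), gapped columns and columns with varying amino acids are treated as uninformative. Sliding windows where the score stays high are merged into candidate regions. The function names, `window`, `threshold` and `min_informative` are illustrative assumptions.

```python
# Standard genetic code (NCBI translation table 1), bases ordered T, C, A, G.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    b1 + b2 + b3: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}
NON_DEGENERATE = {"M", "W"}  # single-codon amino acids carry no synonymous signal


def column_scores(seqs):
    """Per codon column (hypothetical scoring): 1.0 if all codons are identical,
    0.0 if only the amino acid is conserved, None if the column is uninformative
    (amino acid varies, gap/ambiguous codon, or a single-codon amino acid)."""
    n_codons = len(seqs[0]) // 3  # assumes a codon-aligned, equal-length alignment
    scores = []
    for c in range(n_codons):
        codons = [s[3 * c:3 * c + 3].upper() for s in seqs]
        aas = {CODON_TABLE.get(codon) for codon in codons}  # "---" maps to None
        if len(aas) != 1 or None in aas or aas & NON_DEGENERATE:
            scores.append(None)
        else:
            scores.append(1.0 if len(set(codons)) == 1 else 0.0)
    return scores


def conserved_codon_regions(seqs, window=10, threshold=0.9, min_informative=None):
    """Return merged (start, end) codon intervals, end exclusive, whose windows
    show synonymous codon conservation at or above `threshold`."""
    if min_informative is None:
        min_informative = max(1, window // 2)
    scores = column_scores(seqs)
    regions = []
    for start in range(len(scores) - window + 1):
        values = [v for v in scores[start:start + window] if v is not None]
        if len(values) < min_informative:
            continue  # too few amino-acid-conserved columns to judge
        if sum(values) / len(values) >= threshold:
            end = start + window
            if regions and start <= regions[-1][1]:
                regions[-1] = (regions[-1][0], end)  # extend overlapping region
            else:
                regions.append((start, end))
    return regions
```

    For example, two sequences that both encode ten leucines with different codons (`CTG` vs `TTA`) followed by ten identical `GCT` alanine codons give `[(10, 20)]` with `window=5`: only the stretch whose codons are identical despite synonymous alternatives is flagged. A real tool would replace the fixed threshold with a statistical comparison against the expected synonymous variability.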

    The enigmatic origin of papillomavirus protein domains

    Almost a century has passed since the discovery of papillomaviruses. A few decades of research have yielded a wealth of information on their molecular biology, and several excellent studies have examined the long- and short-term evolution of these viruses. However, when and how papillomaviruses originated is still a mystery. In this study, we systematically searched the sequenced biosphere for distant homologs of papillomaviral protein domains. Our data show that, even when structural information is included (which reveals deeper evolutionary relationships than sequence-only methods), only half of the protein domains in papillomaviruses have relatives in the rest of the biosphere. We show that the major capsid protein L1 and the replication protein E1 have relatives in several viral families, sharing three protein domains with Polyomaviridae and Parvoviridae. However, only the E1 replication protein has connections with cellular organisms. Most likely, the papillomavirus ancestor was of marine origin, a biotope that is not yet well sequenced. Nevertheless, there is no evidence as to how papillomaviruses originated and how they became vertebrate- and epithelium-specific.

    The origin of proteins found in papillomaviruses (Papilloomiviirustes esinevate valkude päritolu)

    The electronic version of the dissertation does not include the publications. Viruses are obligate intracellular parasites harbouring enormous genetic and biological diversity, and they are the most abundant biological entities on Earth. Unlike cellular organisms, viruses have multiple evolutionary origins. Although there are many hypotheses about how viruses emerged, their exact origin is still unknown. In this thesis, papillomaviruses (PVs) were used as an example to study the potential origin of a viral family. PVs infect many mammalian species, as well as birds, turtles, snakes, and fish. They have attracted interest because of their association with various cancers: oncogenic human papillomaviruses (HPVs) are responsible for almost all cases of cervical and anal cancer. Various sequence collections were analysed to detect distant homologs of PV protein domains in other organisms. PVs turned out to have very weak connections to cellular organisms, as only domains of the E1 replication protein had distant homologs in cellular organisms. However, the study revealed that PVs are evolutionarily related to the families Polyomaviridae and Parvoviridae: both share structural homologs of the capsid protein L1 and of two domains of the replication protein E1. Viral genomes are compact and consist mainly of protein-coding genes; to use their coding potential efficiently, some genes partially or fully overlap, and occasionally one gene is completely embedded inside another. One such gene is E8, which lies within the E1 gene and has been identified experimentally in only a few PVs. Over 300 PV genomes were analysed in silico to detect E8 within their E1 genes. E8 was detected in almost all PVs except those infecting Sauropsida (reptiles and birds) and fish. Because the lineages of these hosts separated from the mammalian lineage earlier in vertebrate evolution, the results suggest that E8 emerged after the divergence of mammals from other vertebrates. Detecting the dual-coding E8 region and other embedded elements requires specialised methods. For this purpose, a web tool called cRegions [http://bioinfo.ut.ee/cRegions/] was developed to detect overlapping genes and other embedded elements in the protein-coding genes of viruses.
    https://www.ester.ee/record=b524476
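    A first step toward finding an embedded gene such as E8 is listing open reading frames that lie in a different frame inside the host gene. The following is a minimal sketch, not the pipeline used in the thesis: it assumes the host gene (for example E1) is given as a sequence starting at its own first codon, scans frames +1 and +2 for ATG-initiated ORFs that end with a stop codon inside the host gene, and keeps those of at least `min_codons` sense codons. `find_embedded_orfs` and `min_codons` are hypothetical names, and a genuine E8 search would also need supporting evidence such as conservation of the candidate ORF across many genomes.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_embedded_orfs(host_gene, min_codons=30):
    """Return (frame, start, end) for ATG-initiated ORFs in frames +1 and +2
    of host_gene that terminate with a stop codon inside host_gene.
    start/end are 0-based nucleotide offsets, end exclusive (stop included)."""
    seq = host_gene.upper()
    hits = []
    for frame in (1, 2):  # frame 0 is the host gene's own reading frame
        i = frame
        while i + 3 <= len(seq):
            if seq[i:i + 3] != "ATG":
                i += 3
                continue
            # Extend codon by codon until an in-frame stop codon is reached.
            j = i
            while j + 3 <= len(seq) and seq[j:j + 3] not in STOP_CODONS:
                j += 3
            if j + 3 > len(seq):
                # No stop before the host gene ends; later ATGs in this frame
                # lie downstream of i and cannot reach a stop either.
                break
            if (j - i) // 3 >= min_codons:  # sense codons, ATG included
                hits.append((frame, i, j + 3))
            i = j + 3  # resume after the stop; nested ATGs belong to this ORF
    return hits
```

    For example, `find_embedded_orfs("C" + "ATG" + "GCC" * 5 + "TAA" + "CC", min_codons=5)` returns `[(1, 1, 22)]`: a six-codon ORF in frame +1 that starts at offset 1 and whose stop codon ends at offset 22.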

    PlasmidSeeker: identification of known plasmids from bacterial whole genome sequencing reads

    Background: Plasmids play an important role in the dissemination of antibiotic resistance, making their detection an important task. Whole genome sequencing (WGS) captures both bacterial and plasmid sequence data, but short read lengths make plasmid detection a complex problem. Results: We developed a tool named PlasmidSeeker that detects plasmids in bacterial WGS data without read assembly. The PlasmidSeeker algorithm is based on k-mers and uses k-mer abundance to distinguish between plasmid and bacterial sequences. Tested on a set of simulated and real bacterial WGS samples, PlasmidSeeker achieved 100% sensitivity and 99.98% specificity. Conclusion: PlasmidSeeker enables quick detection of known plasmids and complements existing tools that assemble plasmids de novo. The PlasmidSeeker source code is available on GitHub: https://github.com/bioinfo-ut/PlasmidSeeker
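    The abstract states only that PlasmidSeeker uses k-mer abundance to tell plasmid from bacterial sequence. As a rough illustration of that idea (not PlasmidSeeker's actual algorithm or code), the sketch below counts canonical k-mers in the reads, keeps the k-mers of a known plasmid that are absent from a bacterial reference, and reports the fraction of those k-mers seen in the reads together with the ratio of median plasmid to median chromosomal k-mer abundance. The function names, the `min_fraction` cutoff and the use of a single chromosome sequence are assumptions made for this example.

```python
from collections import Counter
from statistics import median

COMPLEMENT = str.maketrans("ACGT", "TGCA")
VALID = frozenset("ACGT")


def canonical(kmer):
    """Lexicographically smaller of a k-mer and its reverse complement."""
    return min(kmer, kmer.translate(COMPLEMENT)[::-1])


def kmers(seq, k):
    """Yield canonical k-mers, skipping windows with non-ACGT characters."""
    seq = seq.upper()
    for i in range(len(seq) - k + 1):
        window = seq[i:i + k]
        if set(window) <= VALID:
            yield canonical(window)


def count_read_kmers(reads, k):
    """Count canonical k-mers over all reads (no assembly needed)."""
    counts = Counter()
    for read in reads:
        counts.update(kmers(read, k))
    return counts


def assess_plasmid(read_counts, plasmid, chromosome, k, min_fraction=0.8):
    """Return (present, fraction_found, copy_ratio) for one known plasmid."""
    chromosome_kmers = set(kmers(chromosome, k))
    # Keep only plasmid-specific k-mers so shared sequence is not counted.
    plasmid_kmers = set(kmers(plasmid, k)) - chromosome_kmers
    found = [read_counts[km] for km in plasmid_kmers if read_counts[km] > 0]
    fraction = len(found) / len(plasmid_kmers) if plasmid_kmers else 0.0
    chrom_counts = [read_counts[km] for km in chromosome_kmers if read_counts[km] > 0]
    if found and chrom_counts:
        # Ratio of median abundances approximates plasmid copies per chromosome.
        copy_ratio = median(found) / median(chrom_counts)
    else:
        copy_ratio = 0.0
    return fraction >= min_fraction, fraction, copy_ratio
```

    In a toy run with two "chromosome" reads and six "plasmid" reads, `counts = count_read_kmers(["AACCACACCCAAAC"] * 2 + ["AGAGAGTCTCTC"] * 6, 5)` followed by `assess_plasmid(counts, "AGAGAGTCTCTC", "AACCACACCCAAAC", 5)` returns `(True, 1.0, 3.0)`. Canonical k-mers are used so that reads from either strand count toward the same k-mer; real short reads also carry sequencing errors, which is why a fraction threshold rather than an exact match is used here.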