165 research outputs found

    Low-frequency variant detection in viral populations using massively parallel sequencing data


    454 screening of individual MHC variation in an endemic island passerine

    Genes of the major histocompatibility complex (MHC) code for receptors that are central to the adaptive immune response of vertebrates. These genes are therefore important genetic markers with which to study adaptive genetic variation in the wild. Next-generation sequencing (NGS) has increasingly been used in the last decade to genotype the MHC. However, NGS methods are highly prone to sequencing errors, and although several methodologies have been proposed to deal with this, until recently there have been no standard guidelines for the validation of putative MHC alleles. In this study, we used the 454 NGS platform to screen MHC class I exon 3 variation in a population of the island endemic Berthelot’s pipit (Anthus berthelotii). We were able to characterise MHC genotypes across 309 individuals with high levels of repeatability. We were also able to determine alleles that had low amplification efficiencies, whose identification within individuals may thus be less reliable. At the population level we found lower levels of MHC diversity in Berthelot’s pipit than in its widespread continental sister species the tawny pipit (Anthus campestris), and observed trans-species polymorphism. Using the sequence data, we identified signatures of gene conversion and evidence of maintenance of functionally divergent alleles in Berthelot’s pipit. We also detected positive selection at 10 codons. The present study therefore shows that we have an efficient method for screening individual MHC variation across large datasets in Berthelot’s pipit, and provides data that can be used in future studies investigating spatio-temporal patterns and scales of selection on the MHC

    Computational genomics of lactobacilli

    Lactobacilli are generally harmless gram-positive lactic acid bacteria and well known for their broad spectrum of beneficial effects on human health and usage in food production. However, relatively little is known at the molecular level about the relationships between lactobacilli and humans and about their food processing abilities. The aim of this thesis was to establish bioinformatics approaches for classifying proteins involved in the health effects and food production abilities of lactobacilli and to elucidate the functional potential of two biomedically important Lactobacillus species using whole-genome sequencing. To facilitate the genome-based analysis of lactobacilli, two new bioinformatics approaches were developed for the systematic analysis of protein function. The first approach, called LOCP, fulfilled the need for accurate genome-wide annotation of putative pilus operons in gram-positive bacteria, whereas the second approach, BLANNOTATOR, represented an improved homology-based solution for general function annotation of bacterial proteins. Importantly, both approaches showed superior accuracy in evaluation tests and proved to be useful in finding information ignored by other homology-search methods, illustrating their added value to the current repertoire of function classification systems. Their application also led to the discovery of several putative pilus operons and new potential effector molecules in lactobacilli, including many of the key findings of this thesis work. Lactobacillus rhamnosus GG is one of the clinically best-studied Lactobacillus strains and has a long history of safe use in the food industry. The whole-genome sequencing of the strain GG and a closely related dairy strain L. rhamnosus LC705 revealed two almost identical genomes, despite the physiological differences between the strains. 
Despite the extensive genomic similarity, a genomic region containing genes for three pilin subunits and a pilin-dedicated sortase was present only in GG. The presence of these pili on the cell surface of L. rhamnosus GG was also confirmed, and one of the GG-specific pilins was demonstrated to be central for the mucus interaction of strain GG. These discoveries established the presence of gram-positive pilus structures also in non-pathogenic bacteria and provided a long-awaited explanation for the highly efficient adhesion of the strain GG to the intestinal mucosa. The other Lactobacillus species investigated in this thesis was Lactobacillus crispatus. To gain insights into its physiology and to identify components by which this important constituent of the healthy human vagina may promote urogenital health, the genome of a representative L. crispatus strain was sequenced and compared to those of nine others. These analyses provided an accurate account of features associated with vaginal health and revealed a set of 1,224 gene families that were universally conserved across all ten strains and, most likely, also across the entire L. crispatus species. Importantly, this set of genes was shown to contain adhesion genes involved in the displacement of the bacterial vaginosis-associated Gardnerella vaginalis from vaginal cells and provided a molecular explanation for the inverse association between L. crispatus and G. vaginalis colonisation in the vagina. Taken together, the present study demonstrates the power of whole-genome sequencing and computer-assisted genome annotation in identifying genes that are involved in host interactions and have industrial value. The discovery of gram-positive pili in L. rhamnosus GG and the mechanism by which L. crispatus excludes G. vaginalis from vaginal cells are both major steps forward in understanding the interaction between lactobacilli and host.
We envisage that these findings together with the developed bioinformatics methods will aid the improvement of probiotic products and human health in the future.
Lactobacilli are mostly harmless gram-positive lactic acid bacteria. Although these beneficial, health-promoting bacteria have been used in food production for centuries, our knowledge of the molecular biological basis of lactobacilli is quite limited. The aim of this thesis was to develop new computational tools for characterising the biomolecules produced by lactobacilli and to elucidate the functioning of two biomedically important Lactobacillus species by means of genome sequencing. The thesis presents two computational biology methods for predicting the properties expressed by lactobacilli from genomic data and applies them to interpreting how lactobacilli function. The first method, LOCP, was created to screen genomic data for the gene clusters needed to produce pili, whereas the second, BLANNOTATOR, is a new protein sequence classification tool based on sequence comparisons and on information borrowed from closely related biomolecules. In the evaluations carried out in the constituent studies, both methods proved unprecedentedly accurate and able to find information erroneously overlooked by other methods suited to these tasks. With these programs it was also possible to find new gene clusters needed for pilus production as well as other potentially biomedically significant properties of lactobacilli, including most of the findings presented in this thesis. The first bacterium examined in the thesis was Lactobacillus rhamnosus GG, one of the best-known and most-studied probiotics, that is, health-promoting bacteria. Sequencing the genome of this industrially important lactobacillus and comparing it with the genome of another closely related lactobacillus, L. rhamnosus LC705, revealed surprisingly few genetic differences between these two biologically different bacteria. Despite the similarity of the genomes, the study nevertheless succeeded, using computational methods, in identifying a total of five genomic regions specific to L. rhamnosus GG, the most significant of which was found to contain the gene cluster needed for pilus biosynthesis. The work also demonstrated that pili are expressed on the bacterial cell surface and that one pilus subunit is important for the binding of L. rhamnosus GG to the mucus lining the human digestive tract. Together, these findings indisputably demonstrated, for the first time, the expression of pili in a beneficial bacterium and offered a groundbreaking perspective on the health effects of L. rhamnosus GG and on its ability to bind to different parts of the digestive tract better than L. rhamnosus LC705. In addition, the thesis investigated the genetic basis of Lactobacillus crispatus, a bacterium that is abundant in the human vagina and important for vaginal health. The genome of a strain that represents the L. crispatus species well was sequenced. By comparing this genome with those of nine other strains of the same species, a comprehensive description of the species' properties was produced and a total of 1,224 gene families were identified that can be assumed to account for the species-typical traits of the bacterium. These species-typical gene families make up a substantial part of the genome of each L. crispatus strain, and among them genes possibly responsible for the species' adhesion ability were identified. The product of one such adhesion gene was also shown to be able to inhibit the attachment of the harmful bacterium Gardnerella vaginalis to the vaginal epithelium. This finding partly explains the role of L. crispatus as the dominant species of the healthy vagina. In conclusion, bacterial genome sequencing and classification predictions for bacterial protein sequences are an extremely useful way to interpret the properties expressed by lactobacilli and to find health-promoting biomolecules. The discovery of pili and of a protein that inhibits the attachment of G. vaginalis is an important step towards a comprehensive understanding of the interactions between lactobacilli and humans and, together with the computational biology methods developed, may open up entirely new approaches to producing better health-promoting foods and improving human health.

    Generating genomic resources for two crustacean species and their application to the study of White Spot Disease

    Over the last decades the crustacean aquaculture sector has been steadily growing in order to meet global demands for its products. A major hurdle for further growth of the industry is the prevalence of viral disease epidemics that are facilitated by the intense culture conditions. A devastating virus impacting on the sector is the White Spot Syndrome Virus (WSSV), responsible for over US$10 billion in losses in shrimp production and trade. The pathogenicity of WSSV is high, reaching 100% mortality within 3-10 days in penaeid shrimps. In contrast, the European shore crab Carcinus maenas has been shown to be relatively resistant to WSSV. Uncovering the basis of this resistance could help inform the development of strategies to mitigate the WSSV threat. C. maenas has been used widely in studies on ecotoxicology and host-pathogen interactions. However, like most aquatic crustaceans, the genomic resources available for this species are limited, impairing experimentation. Therefore, to facilitate interpretations of the exposure studies, we first produced a C. maenas transcriptome and genome scaffold assembly. We also produced a transcriptome for the European lobster (Homarus gammarus), an ecologically and commercially important crustacean species in United Kingdom waters, for use in comparing WSSV responses in this, a susceptible species, and C. maenas. For the C. maenas transcriptome assembly we isolated and pooled RNA from twelve different tissues and sequenced RNA on an Illumina HiSeq 2500 platform. After de novo assembly a transcriptome encompassing 212,427 transcripts was produced. Similarly, the H. gammarus transcriptome was based on RNA from nine tissues and contained 106,498 transcripts. The transcripts were filtered and annotated using a variety of tools (including BLAST, MEGAN and RSEM) and databases (including GenBank, Gene Ontology and KEGG).
The annotation rate for transcripts in both transcriptomes was around 20-25%, which appears to be common for aquatic crustacean species as a result of the lack of well-annotated gene sequences for this clade. Since it is likely that the host immune system would play an important role in WSSV infection, we characterized the IMD, JAK/STAT, Toll-like receptor and other innate immune system pathways. We found a strong overlap between the immune system pathways in C. maenas and H. gammarus. In addition, we investigated the sequence diversity of known WSSV-interacting proteins amongst susceptible penaeid shrimp/lobster and the more resistant C. maenas. There were differences in viral receptor sequences, like Rab7, that correlate with a less efficient infection by WSSV. To produce the genome scaffold assembly for C. maenas we isolated DNA from muscle tissue and produced both paired-end and mate-pair libraries for processing on the Illumina HiSeq 2500 platform. A de novo draft genome assembly consisting of 338,980 scaffolds and covering 362 Mb (36% of estimated genome size) was produced, using SOAP-denovo2 coupled with the BESST scaffolding system. The generated assembly was highly fragmented due to the presence of repetitive areas in the C. maenas genome. Using a combination of ab initio predictors, RNA-sequencing data from the transcriptome datasets and curated C. maenas sequences we produced a model encompassing 10,355 genes. The gene model for C. maenas Dscam, a gene potentially involved in (pan)crustacean immune memory, was investigated in greater detail as manual curation can improve on the results of ab initio predictors. The scaffold containing C. maenas Dscam was fragmented, and thus contained only the latter exons of the gene. The assembled draft genome and transcriptomes for C. maenas and H. gammarus are valuable molecular resources for studies involving these and other aquatic crustacean species.
To uncover the basis of their resistance to WSSV, we infected C. maenas with WSSV and measured mRNA and miRNA expression for 7 time points spread over a period of 28 days, using RNA-Seq and miRNA-Seq. The resistance of C. maenas to WSSV infection was confirmed by the fact that no mortalities occurred. In these animals replicating WSSV was latent and detected only after 7 days, and this occurred in only five out of 28 infected crabs. Differential expression of transcripts and miRNAs was identified for each time point. In the first 12 hours post exposure we observed decreased expression of important regulators in endocytosis. Since it is established that WSSV enters the host cells through endocytosis and that interactions between the viral protein VP28 and Rab7 are important in successful infection, it is likely that changes in this process could impact WSSV infection success. Additionally, we observed an increased expression of transcripts involved in RNA interference pathways across many time points, indicating a longer-term response to initial viral exposure. miRNA sequencing showed several miRNAs that were differentially expressed. The most striking finding was a novel C. maenas miRNA that we found to be significantly downregulated in every WSSV-infected individual, suggesting that it may play an important role in mediating the response of the host to the virus. In silico target prediction pointed to the involvement of this miRNA in endocytosis regulation. Taken together, we hypothesize that C. maenas resistance to WSSV involves obstruction of viral entry by endocytosis, a process probably regulated through miRNAs, resulting in inefficient uptake of virions.

    Quasispecies dynamics and treatment outcome during early hepatitis C infection in a cohort of HIV-infected men

    Hepatitis C virus (HCV) is emerging as one of the leading causes of morbidity and mortality in individuals infected with HIV and has overtaken AIDS-defining illnesses as a cause of death in HIV patient populations who have access to highly active antiretroviral therapy. For many years, clonal analysis was the reference method for investigating viral diversity. In this thesis, a next-generation sequencing (NGS) approach was developed using 454 pyrosequencing and Illumina-based technology. A sequencing pipeline was developed using two different NGS approaches: nested PCR and metagenomics. The pipeline was used to study the viral populations in the sera of HCV-infected patients from a unique cohort of 160 HIV-positive patients with early HCV infection. These pipelines resulted in an improved understanding of HCV quasispecies dynamics, especially with regard to treatment response. Low viral diversity at baseline correlated with sustained virological response (SVR), while high viral diversity at baseline was associated with treatment failure. The emergence of new viral strains following treatment failure was most commonly associated with emerging dominance of pre-existing minority variants rather than re-infection. In the new era of direct-acting antivirals, next-generation sequencing technologies are the most promising tool for identifying minority variants present in the HCV quasispecies populations at baseline. In this cohort, several mutations conferring resistance were detected in genotype 1a treatment-naïve patients. Further research into the impact of baseline HCV variants on SVR rates should be carried out in this population. A clearer understanding of the properties of viral quasispecies would enable clinicians to make improved treatment choices for their patients.