1,288 research outputs found

    Analysis of RAD sequencing data from species of Mediterranean cicadas

    Master's thesis, Bioinformatics and Computational Biology, Universidade de Lisboa, Faculdade de Ciências, 2019. Understanding divergence and speciation among closely related species has always been a challenging topic in evolutionary biology. Cytoplasmic DNA markers, which are often used in studies based on molecular markers, have not always succeeded in resolving the relevant phylogenies and related questions. In recent years, with the emergence of Next-Generation Sequencing technologies and associated techniques that exploit a reduced representation of the genome, it has become possible to answer questions about population divergence and speciation. Here we show the potential of one of these techniques, Restriction-site Associated DNA (RAD) sequencing, to help resolve questions about speciation in a particular group of insects, the Mediterranean cicadas of the genus Tettigettalna. RAD sequencing uses Illumina, one of the Next-Generation Sequencing technologies, to generate genomic data from regions adjacent to restriction enzyme cut sites (RAD tags). This makes it possible to simultaneously identify and score thousands of SNPs spread across the whole genome, for genomes of any size, in hundreds of individuals, and in model and non-model organisms alike. Because RAD-Seq is a reduced-representation sequencing technique, its use has many advantages over whole-genome sequencing techniques. This has made RAD-Seq the most widely used genomic methodology for SNP discovery in phylogenetic and evolutionary studies of non-model organisms, such as the cicada species of the genus Tettigettalna. This genus is a complex of closely related cicada species that diverged recently.
They are morphologically similar, which makes them a challenging taxonomic group. In addition, the calling song produced by males is the main trait that allows the species to be distinguished. In the Iberian Peninsula, cicada diversity was largely underestimated until the recent description and taxonomic revision of nine small-bodied cicada species belonging to the genus Tettigettalna: Tettigettalna mariae, Tettigettalna argentata, Tettigettalna aneabi, Tettigettalna josei, Tettigettalna defauti, Tettigettalna armandi, Tettigettalna helianthemi, Tettigettalna boulardi and Tettigettalna estrellae. Some of these species are restricted to Spain, and only one, Tettigettalna estrellae, is restricted to Portugal. Tettigettalna argentata is the only one whose range extends beyond the Iberian Peninsula into other European countries. Studies focusing on the Mediterranean species of this genus have shown sympatry among some Tettigettalna species in the southwestern Iberian Peninsula. The distribution of Tettigettalna argentata populations means that they sometimes overlap with populations of other species. In the Algarve (Portugal), populations of Tettigettalna mariae and Tettigettalna argentata can be found in sympatry or parapatry. These two species are considered a sibling species complex: they are morphologically very similar and can be told apart only by their calling songs. Work based on mitochondrial (COI) sequence analysis separated Tettigettalna argentata populations into a northern clade and a southern clade. Moreover, the southern clade proved not to be genetically distinct from Tettigettalna mariae specimens, with which it shares most haplotypes. As a result, it is often impossible to discriminate T. mariae specimens from T. argentata (southern clade) specimens based on COI sequences alone.
As noted, Tettigettalna species can be distinguished by the sounds produced by males, so these acoustic signals are thought to play a major role in the reproductive isolation of the species. Indeed, studies based on acoustic data show that different species have different acoustic patterns. However, other work using genetic data leaves several questions unresolved, in particular whether the haplotype sharing between the southern clade of Tettigettalna argentata and Tettigettalna mariae is due to introgression (gene flow between populations) or incomplete lineage sorting (imperfect segregation of alleles into well-defined lineages). Previous work thus points to the need for a multilocus methodology as a better approach to answering these questions. In this work we therefore used a multilocus approach, namely RAD-Seq data from cicadas of the genus Tettigettalna. With this type of data, and using data cleaning and filtering tools such as Ipyrad, VCFtools and other scripts, we were able to generate results that better answer questions that had so far remained unanswered by single-locus approaches and/or data of other kinds. With this new approach we showed that RAD-Seq data make the geographic distribution patterns of Tettigettalna species and populations evident, and that they appear to indicate that the haplotype sharing between Tettigettalna argentata and Tettigettalna mariae in sympatric populations in the Algarve region is explained by introgression.
Understanding population divergence and speciation among closely related species has long been a challenge in evolutionary biology. 
Cytoplasmic DNA markers, which have been widely used in the context of molecular barcoding, have not always proved successful in resolving phylogenies and other related questions. With the advent of Next-Generation Sequencing technologies and associated reduced-representation techniques, the phylogenies of closely related species are now being resolved in much greater detail, and divergence and speciation patterns and processes are much better understood. Here we examine the potential of one such technique, Restriction-site Associated DNA (RAD) sequencing, for disentangling questions related to the divergence and speciation of a particular group of insects, the Mediterranean cicadas of the genus Tettigettalna. This genus constitutes a complex of closely related and recently diverged species. They are morphologically similar, which makes them a taxonomically challenging group. The calling songs are the main character used for their identification. Work focused on the Mediterranean species of this genus revealed the occurrence of sympatric populations among some of the southern Iberian Tettigettalna species. In fact, Tettigettalna mariae and Tettigettalna argentata populations can be found in sympatry or close parapatry. As already noted, these two species are morphologically very similar and distinguishable only by their calling songs. However, mitochondrial COI studies also showed that these species share haplotypes, but the results could not reveal whether this sharing was due to introgression (gene flow between populations) or incomplete lineage sorting (imperfect segregation of alleles into well-defined lineages).
The present multilocus approach with RAD-Seq data not only provided a better understanding of the geographical distribution patterns of the Tettigettalna species and populations, but also gave evidence that introgression explains the sharing of haplotypes between Tettigettalna argentata and Tettigettalna mariae when in sympatry. The use of Next-Generation Sequencing data, in particular RAD-Seq data, in this thesis has therefore reinforced the utility of this methodology for solving problems related to recently diverged species complexes, such as our study group of insects, for which we were able to make a significant contribution to a better understanding of its divergence and speciation.

    Utilization of Tissue Ploidy Level Variation in de Novo Transcriptome Assembly of Pinus sylvestris

    Compared to angiosperms, gymnosperms lag behind in the availability of assembled and annotated genomes. Most genomic analyses in gymnosperms, especially conifer tree species, rely on the use of de novo assembled transcriptomes. However, the level of allelic redundancy and transcript fragmentation in these assembled transcriptomes, and their effect on downstream applications, have not been fully investigated. Here, we assessed three assembly strategies for short-read data, including the utility of haploid megagametophyte tissue during de novo assembly as single-allele guides, for six individuals and five different tissues in Pinus sylvestris. We then contrasted haploid and diploid tissue genotype calls obtained from the assembled transcriptomes to evaluate the extent of paralog mapping. The use of the haploid tissue during assembly increased its completeness without reducing the number of assembled transcripts. Our results suggest that current strategies that rely on available genomic resources as guidance to minimize allelic redundancy are less effective than the application of strategies that cluster redundant assembled transcripts. The strategy yielding the lowest levels of allelic redundancy among the assembled transcriptomes assessed here was the generation of SuperTranscripts with Lace followed by CD-HIT clustering. However, we still observed some levels of heterozygosity (multiple gene fragments per transcript reflecting allelic redundancy) in this assembled transcriptome on the haploid tissue, indicating that further filtering is required before using these assemblies for downstream applications. We discuss the influence of allelic redundancy when these reference transcriptomes are used to select regions for probe design of exome capture baits and for estimation of population genetic diversity. Peer reviewed

    Phylogenomics of Porites from the Arabian Peninsula

    The advent of high-throughput sequencing technologies provides an opportunity to resolve phylogenetic relationships among closely related species. By incorporating hundreds to thousands of unlinked loci and single nucleotide polymorphisms (SNPs), phylogenomic analyses have a far greater potential to resolve species boundaries than approaches that rely on only a few markers. Scleractinian taxa have proved challenging to identify using traditional morphological approaches and many groups lack an adequate set of molecular markers to investigate their phylogenies. Here, we examine the potential of Restriction-site Associated DNA sequencing (RADseq) to investigate phylogenetic relationships and species limits within the scleractinian coral genus Porites. A total of 126 colonies were collected from 16 localities in the seas surrounding the Arabian Peninsula and ascribed to 12 nominal and two unknown species based on their morphology. Reference mapping was used to retrieve and compare nearly complete mitochondrial genomes, ribosomal DNA, and histone loci. De novo assembly and reference mapping to the P. lobata coral transcriptome were compared and used to obtain thousands of genome-wide loci and SNPs. A suite of species discovery methods (phylogenetic, ordination, and clustering analyses) and species delimitation approaches (coalescent-based, species tree, and Bayesian Factor delimitation) suggested the presence of eight molecular lineages, one of which included six morphospecies. Our phylogenomic approach provided a fully supported phylogeny of Porites from the Arabian Peninsula, suggesting the power of RADseq data to solve the species delineation problem in this speciose coral genus.

    Low impact of different SNP panels from two building-loci pipelines on RAD-Seq population genomic metrics: case study on five diverse aquatic species

    The irruption of Next-generation sequencing (NGS) and restriction site-associated DNA sequencing (RAD-seq) in the last decade has led to the identification of thousands of molecular markers and their genotyping for refined genomic screening. This approach has been especially useful for non-model organisms with limited genomic resources. Many building-loci pipelines have been developed to obtain robust single nucleotide polymorphism (SNP) genotyping datasets using a de novo RAD-seq approach, i.e. without reference genomes. Here, the performances of two building-loci pipelines, STACKS 2 and Meyer’s 2b-RAD v2.1 pipeline, were compared using a diverse set of aquatic species representing different genomic and/or population structure scenarios. Two bivalve species (Manila clam and common edible cockle) and three fish species (brown trout, silver catfish and small-spotted catshark) were studied. Four SNP panels were evaluated in each species to test both different building-loci pipelines and criteria for SNP selection. Furthermore, for Manila clam and brown trout, a reference genome approach was used as control.

    Low impact of different SNP panels from two building-loci pipelines on RAD-Seq population genomic metrics: case study on five diverse aquatic species

    Background: The irruption of Next-generation sequencing (NGS) and restriction site-associated DNA sequencing (RAD-seq) in the last decade has led to the identification of thousands of molecular markers and their genotyping for refined genomic screening. This approach has been especially useful for non-model organisms with limited genomic resources. Many building-loci pipelines have been developed to obtain robust single nucleotide polymorphism (SNP) genotyping datasets using a de novo RAD-seq approach, i.e. without reference genomes. Here, the performances of two building-loci pipelines, STACKS 2 and Meyer’s 2b-RAD v2.1 pipeline, were compared using a diverse set of aquatic species representing different genomic and/or population structure scenarios. Two bivalve species (Manila clam and common edible cockle) and three fish species (brown trout, silver catfish and small-spotted catshark) were studied. Four SNP panels were evaluated in each species to test both different building-loci pipelines and criteria for SNP selection. Furthermore, for Manila clam and brown trout, a reference genome approach was used as control. Results: Although different outcomes were observed between pipelines and species with the diverse SNP calling and filtering steps tested, no remarkable differences were found in genetic diversity and differentiation within species with the SNP panels obtained with a de novo approach. The main differences were found in brown trout between the de novo and reference genome approaches. Genotyped vs missing data mismatches were the main genotyping difference detected between the two building-loci pipelines or between the de novo and reference genome comparisons. Conclusions: The tested building-loci pipelines for selection of SNP panels seem to have low influence on population genetics inference across the diverse case-study scenarios studied here. However, preliminary trials with different bioinformatic pipelines are suggested to evaluate their influence on population parameters according to the specific goals of each study. The work undertaken in this project was funded by Xunta de Galicia Autonomous Government (GRC2014/010), Interreg Atlantic Area (Cockles project, EAPA_458/2016) and Girona University (MPCUdG2016/060) projects. Adrián Casanova held a Xunta de Galicia fellowship (ED481A-2017/091).

    Low impact of different SNP panels from two building-loci pipelines on RAD-Seq population genomic metrics: case study on five diverse aquatic species

    Supplementary information: https://doi.org/10.1186/s12864-021-07465-w. Background: The irruption of Next-generation sequencing (NGS) and restriction site-associated DNA sequencing (RAD-seq) in the last decade has led to the identification of thousands of molecular markers and their genotyping for refined genomic screening. This approach has been especially useful for non-model organisms with limited genomic resources. Many building-loci pipelines have been developed to obtain robust single nucleotide polymorphism (SNP) genotyping datasets using a de novo RAD-seq approach, i.e. without reference genomes. Here, the performances of two building-loci pipelines, STACKS 2 and Meyer’s 2b-RAD v2.1 pipeline, were compared using a diverse set of aquatic species representing different genomic and/or population structure scenarios. Two bivalve species (Manila clam and common edible cockle) and three fish species (brown trout, silver catfish and small-spotted catshark) were studied. Four SNP panels were evaluated in each species to test both different building-loci pipelines and criteria for SNP selection. Furthermore, for Manila clam and brown trout, a reference genome approach was used as control. Results: Although different outcomes were observed between pipelines and species with the diverse SNP calling and filtering steps tested, no remarkable differences were found in genetic diversity and differentiation within species with the SNP panels obtained with a de novo approach. The main differences were found in brown trout between the de novo and reference genome approaches. Genotyped vs missing data mismatches were the main genotyping difference detected between the two building-loci pipelines or between the de novo and reference genome comparisons. Conclusions: The tested building-loci pipelines for selection of SNP panels seem to have low influence on population genetics inference across the diverse case-study scenarios studied here. However, preliminary trials with different bioinformatic pipelines are suggested to evaluate their influence on population parameters according to the specific goals of each study.
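
The abstract reports that genotyped-versus-missing mismatches were the main difference between panels. As a rough illustration only, the sketch below shows one way such a comparison could be tallied for two sets of genotype calls. The data layout (dictionaries keyed by sample and locus, `None` for a missing call) and the function name `compare_calls` are assumptions made for this example, not the format or code used by STACKS 2 or the 2b-RAD pipeline.

```python
from collections import Counter


def compare_calls(panel_a, panel_b):
    """Tally agreement between two genotype panels.

    Each panel is a hypothetical dict mapping (sample, locus) to either
    a tuple of two alleles, e.g. ("A", "G"), or None for a missing call.
    Only keys present in both panels are compared.
    """
    tally = Counter()
    for key in panel_a.keys() & panel_b.keys():
        a, b = panel_a[key], panel_b[key]
        if a is None and b is None:
            tally["both_missing"] += 1
        elif a is None or b is None:
            # One pipeline called a genotype, the other did not.
            tally["genotyped_vs_missing"] += 1
        elif sorted(a) == sorted(b):
            # Allele order is ignored: ("A", "G") equals ("G", "A").
            tally["concordant"] += 1
        else:
            tally["discordant"] += 1
    return tally
```

Dividing each count by the number of shared keys would give per-category rates that can be compared across species or pipelines.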

    CAPRG: Sequence Assembling Pipeline for Next Generation Sequencing of Non-Model Organisms

    Our goal is to introduce and describe the utility of a new pipeline “Contigs Assembly Pipeline using Reference Genome” (CAPRG), which has been developed to assemble “long sequence reads” for non-model organisms by leveraging a reference genome of a closely related phylogenetic relative. To facilitate this effort, we utilized two avian transcriptomic datasets generated using ROCHE/454 technology as test cases for CAPRG assembly. We compared the results of CAPRG assembly using a reference genome with the results of existing methods that utilize de novo strategies such as VELVET, PAVE, and MIRA by employing parameter space comparisons (intra-assembling comparison). CAPRG performed as well as or better than the existing assembly methods based on various benchmarks for “gene-hunting.” Further, CAPRG completed the assemblies in a fraction of the time required by the existing assembly algorithms. Additional advantages of CAPRG included reduced contig inflation, resulting in lower computational resources for annotation, and functional identification for contigs that may be categorized as “unknowns” by de novo methods. In addition to providing evaluation of CAPRG performance, we observed that the different assembly (inter-assembly) results could be integrated to enhance the putative gene coverage for any transcriptomics study.

    Optimization of Spaced K-mer Frequency Feature Extraction using Genetic Algorithms for Metagenome Fragment Classification

    K-mer frequencies are commonly used in extracting features from metagenome fragments. In spite of this, researchers have found that their use is still inefficient. In this research, a genetic algorithm was employed to find optimally spaced k-mers. These were obtained by generating the possible combinations of match positions and don't care positions (written as *). This approach was adopted from the concept of spaced seeds in PatternHunter. The use of spaced k-mers could reduce the size of the k-mer frequency feature's dimension. To measure the accuracy of the proposed method we used the naïve Bayesian classifier (NBC). The result showed that the chromosome 111111110001, representing spaced k-mer model [111 1111 10001], was the best chromosome, with a higher fitness (85.42) than that of the k-mer frequency feature. Moreover, the proposed approach also reduced the feature extraction time.
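
To make the idea of a spaced k-mer concrete, here is a minimal sketch of frequency feature extraction with a binary mask, where `1` marks a match position and `0` a don't-care position (written as * in the abstract). The function names and the choice to skip windows containing non-ACGT characters are assumptions made for illustration; the genetic algorithm and the naïve Bayesian classifier from the abstract are not reproduced here.

```python
from collections import Counter
from itertools import product


def spaced_kmer_counts(sequence, mask):
    """Count spaced k-mers: keep bases where mask is '1', skip where it is '0'."""
    keep = [i for i, bit in enumerate(mask) if bit == "1"]
    span = len(mask)
    seq = sequence.upper()
    counts = Counter()
    for start in range(len(seq) - span + 1):
        window = seq[start:start + span]
        key = "".join(window[i] for i in keep)
        # Assumption: windows with ambiguous bases (e.g. N) are ignored.
        if set(key) <= set("ACGT"):
            counts[key] += 1
    return counts


def feature_vector(sequence, mask):
    """Relative frequencies over all 4**weight possible spaced k-mers."""
    weight = mask.count("1")
    counts = spaced_kmer_counts(sequence, mask)
    total = sum(counts.values())
    keys = ["".join(p) for p in product("ACGT", repeat=weight)]
    return [counts[k] / total if total else 0.0 for k in keys]


# A mask of length 4 with 3 match positions gives 4**3 = 64 features.
vec = feature_vector("ACGTACGTTGCA", "1101")
print(len(vec))  # 64
```

The feature dimension depends on the number of match positions rather than on the full span of the mask, which is why spacing can shrink the feature space: the mask `111111110001` reported above spans 12 positions but has 9 match positions, giving 4**9 features instead of the 4**12 of a contiguous 12-mer.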