696 research outputs found

    Recovering the state sequence of hidden Markov models using mean-field approximations

    Inferring the sequence of states from observations is one of the most fundamental problems in Hidden Markov Models. In statistical physics language, this problem is equivalent to computing the marginals of a one-dimensional model with a random external field. While this task can be accomplished through transfer matrix methods, it quickly becomes intractable when the underlying state space is large. This paper develops several low-complexity approximate algorithms to address this inference problem when the state space becomes large. The new algorithms are based on various mean-field approximations of the transfer matrix. Their performance is studied in detail on a simple realistic model for DNA pyrosequencing. Comment: 43 pages, 41 figures
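The exact transfer-matrix computation mentioned in this abstract corresponds to the standard forward-backward recursion, sketched below in plain Python. This is not the paper's mean-field method; it is the exact baseline whose per-step cost grows quadratically with the number of states, which is why it becomes impractical for large state spaces. The function name `forward_backward` and the parameters `init`, `trans`, `emit`, and `obs` are illustrative assumptions, not names from the paper.

```python
def forward_backward(init, trans, emit, obs):
    """Posterior marginals P(state_t = i | all observations) for a discrete HMM.

    init[i]     : probability of starting in state i
    trans[i][j] : probability of moving from state i to state j
    emit[i][k]  : probability that state i emits symbol k
    obs         : list of observed symbol indices
    (All names are hypothetical, chosen for this sketch.)
    """
    n, T = len(init), len(obs)

    # Forward pass; each step is normalized to avoid numerical underflow.
    cur = [init[i] * emit[i][obs[0]] for i in range(n)]
    total = sum(cur)
    alpha = [[c / total for c in cur]]
    for t in range(1, T):
        cur = [
            emit[j][obs[t]] * sum(alpha[-1][i] * trans[i][j] for i in range(n))
            for j in range(n)
        ]
        total = sum(cur)
        alpha.append([c / total for c in cur])

    # Backward pass, normalized the same way.
    beta = [[1.0] * n for _ in range(T)]
    for t in range(T - 2, -1, -1):
        cur = [
            sum(trans[i][j] * emit[j][obs[t + 1]] * beta[t + 1][j] for j in range(n))
            for i in range(n)
        ]
        total = sum(cur)
        beta[t] = [c / total for c in cur]

    # The marginal at time t is proportional to alpha[t][i] * beta[t][i].
    marginals = []
    for t in range(T):
        g = [alpha[t][i] * beta[t][i] for i in range(n)]
        total = sum(g)
        marginals.append([x / total for x in g])
    return marginals


# Toy two-state example with made-up parameters.
example = forward_backward(
    [0.5, 0.5],
    [[0.9, 0.1], [0.1, 0.9]],
    [[0.8, 0.2], [0.2, 0.8]],
    [0, 0, 1],
)
```

Each time step costs on the order of n² operations for n states, so the full pass is O(T·n²); approximations of the kind the abstract describes aim to reduce that dependence on n.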

    Distinguishing low frequency mutations from RT-PCR and sequence errors in viral deep sequencing data

    There is a high prevalence of coronary artery disease (CAD) in patients with left bundle branch block (LBBB); however, there are many other causes for this electrocardiographic abnormality. Non-invasive assessment of these patients remains difficult, and all commonly used modalities exhibit several drawbacks. This often leads to these patients undergoing invasive coronary angiography that may not have been necessary. In this review, we examine the uses and limitations of commonly performed non-invasive tests for diagnosis of CAD in patients with LBBB.

    Low-frequency variant detection in viral populations using massively parallel sequencing data

    Analysis of RAD sequencing data from species of Mediterranean cicadas

    Master's thesis, Bioinformatics and Computational Biology, Universidade de Lisboa, Faculdade de Ciências, 2019. Understanding divergence and speciation among closely related species has always been a challenging topic in evolutionary biology. Cytoplasmic DNA markers, which are often used in molecular-marker studies, have not always succeeded in resolving the corresponding phylogenies and other questions. In recent years, with the emergence of Next-Generation Sequencing technologies and associated techniques that exploit reduced genome representation, it has become possible to answer questions about population divergence and speciation. Here we examine the potential of one such technique, Restriction-site Associated DNA (RAD) sequencing, to help resolve questions about speciation in a particular group of insects, the Mediterranean cicadas of the genus Tettigettalna. RAD sequencing uses Illumina, one of the Next-Generation Sequencing technologies, to generate genomic data from regions adjacent to restriction enzyme cut sites (RAD tags). This makes it possible to simultaneously identify and score thousands of SNPs spread across a genome of any size, in hundreds of individuals, for model and non-model organisms alike. Because RAD-Seq is a reduced-representation sequencing technique, its use clearly has many advantages over whole-genome sequencing techniques. This has made RAD-Seq the most widely used genomic methodology for SNP discovery in phylogenetic and evolutionary studies of non-model organisms, such as the cicada species of the genus Tettigettalna. This genus constitutes a complex of closely related cicada species that diverged recently.
They are morphologically similar, which makes them a taxonomically challenging group. In addition, the calling song produced by males is the main character that allows the species to be distinguished. In the Iberian Peninsula, cicada diversity was widely underestimated until the recent description and taxonomic revision of nine small-bodied cicada species belonging to the genus Tettigettalna: Tettigettalna mariae, Tettigettalna argentata, Tettigettalna aneabi, Tettigettalna josei, Tettigettalna defauti, Tettigettalna armandi, Tettigettalna helianthemi, Tettigettalna boulardi, and Tettigettalna estrellae. Some of these species are restricted to Spain, and only one, Tettigettalna estrellae, is restricted to Portugal. Tettigettalna argentata is the only one whose range extends beyond the Iberian Peninsula into other European countries. Some studies focused on the Mediterranean species of this genus showed the occurrence of sympatry among some Tettigettalna species in the southwest of the Iberian Peninsula. Populations of Tettigettalna argentata have a distribution that sometimes overlaps with populations of other species. In the Algarve (Portugal), populations of Tettigettalna mariae and Tettigettalna argentata can be found in sympatry or parapatry. These two species are considered a sibling species complex: they are morphologically very similar and distinguishable only by their calling song. Studies based on the analysis of mitochondrial (COI) sequences separated Tettigettalna argentata populations into a northern clade and a southern clade. Moreover, this southern clade proved not to be genetically distinct from Tettigettalna mariae specimens, with which it shares most haplotypes. It is therefore often impossible to discriminate T. mariae specimens from T. argentata (southern clade) specimens based on COI sequence analysis alone.
As noted, Tettigettalna species can be distinguished by the sounds produced by males, so these acoustic signals are thought to play a major role in the reproductive isolation of the species. Indeed, studies based on acoustic data reveal that different species have different acoustic patterns. However, other studies using genetic data leave several questions unresolved, in particular whether the haplotype sharing between the southern clade of Tettigettalna argentata and Tettigettalna mariae is due to introgression (gene flow between populations) or incomplete lineage sorting (imperfect segregation of alleles into well-defined lineages). Previous work thus points to the need for a multilocus methodology as a better approach to answering these questions. In this work we therefore used a multilocus approach, namely RAD-Seq data from cicadas of the genus Tettigettalna. With this type of data, and using data cleaning and filtering tools such as Ipyrad, VCFtools, and other scripts, it was possible to generate results that better address questions left unanswered by single-locus approaches and/or other kinds of data. With this new approach we showed that RAD-Seq data make evident the geographic distribution patterns of the Tettigettalna species and populations, and appear to indicate that the haplotype sharing between Tettigettalna argentata and Tettigettalna mariae in sympatric populations in the Algarve region is explained by introgression.
Understanding population divergence and speciation among closely related species has long been a challenge in evolutionary biology.
Cytoplasmic DNA markers, which have been widely used in the context of molecular barcoding, have not always proved successful in resolving phylogenies and other related questions. With the advent of Next-Generation Sequencing technologies and associated techniques of reduced genome representation, the phylogenies of closely related species are now being resolved in much greater detail, and these techniques are also allowing a much better understanding of divergence and speciation patterns and processes. Here we examine the potential of one such technique, Restriction-site Associated DNA (RAD) sequencing, in disentangling questions related to the divergence and speciation of a particular group of insects, the Mediterranean cicadas of the genus Tettigettalna. This genus constitutes a complex of closely related and recently diverged species. They are morphologically similar, which makes them a taxonomically challenging group. The calling songs are the main character used for their identification. Work focused on the Mediterranean species of this genus revealed the occurrence of sympatric populations among some of the southern Iberian Tettigettalna species. In fact, Tettigettalna mariae and Tettigettalna argentata populations can be found in sympatry or close parapatry. As noted, these two species are morphologically very similar and distinguishable only by their calling songs. However, mitochondrial COI studies also showed that these species share haplotypes, but the results could not reveal whether this sharing was due to introgression (gene flow between populations) or incomplete lineage sorting (defective segregation of alleles into well-defined lineages).
The present multilocus approach with RAD-Seq data not only gave a better understanding of the geographical distribution patterns of the Tettigettalna species and populations, but also provided evidence that introgression explains the sharing of haplotypes between Tettigettalna argentata and Tettigettalna mariae when in sympatry. The use of Next-Generation Sequencing data, in particular RAD-Seq data, in this thesis has therefore reinforced the utility of this methodology for problems involving recently diverged species complexes, such as our study group of insects, for which we were able to make a significant contribution to a better understanding of its divergence and speciation.

    Statistical methods for high-throughput genomic data

    Multi-omics analysis of early molecular mechanisms of type 1 diabetes

    Type 1 diabetes (T1D) is a complicated autoimmune disease with largely unknown disease mechanisms. The diagnosis is preceded by a long asymptomatic period of autoimmune activity in the insulin-producing pancreatic islets. Currently the only clinical markers used for T1D prediction are islet autoantibodies, which are a sign of already-broken immune tolerance. The focus of this dissertation is on the early asymptomatic period preceding seroconversion to islet autoantibody positivity. The genetic risk of type 1 diabetes has been thoroughly mapped in genome-wide association studies, but environmental factors and molecular mechanisms that mediate the risk are less well understood. According to the hygiene hypothesis, the risk of immune-mediated disorders is increased by the lack of exposure to pathogens in modern environments. Within a study on the hygiene hypothesis, we compared umbilical cord blood gene expression patterns between children born in environments with contrasting standards of living and type 1 diabetes incidences (Finland, Russia, and Estonia). The differentially expressed genes were associated with innate immunity and immune maturation. Our results suggest that the environment influences the immune system development already in-utero. Furthermore, we analyzed genome-wide DNA methylation and gene expression profiles in samples collected prospectively from Finnish children and newborn infants at risk of type 1 diabetes. Bisulfite sequencing analysis did not show any association of neonatal DNA methylation with later progression to T1D. However, antiviral type I interferon response in early childhood was found to be a risk factor of T1D. This transcriptomic signature was detectable in the peripheral blood already before islet autoantibodies, and the main observations were confirmed in an independent German study. These results contributed to the hypothesis that virus infections might play a role in T1D. 
Additionally, this dissertation contributed to transcriptomic and epigenomic data analysis workflows. Simple probe-level analysis of exon array data was shown to improve the reproducibility, specificity, and sensitivity of detected differential exon inclusion events. Type I error rate was markedly reduced by permutation-based significance assessment of differential methylation in bisulfite sequencing studies.
Multi-omics analysis of early molecular mechanisms of type 1 diabetes. Type 1 diabetes (T1D) is an autoimmune disease whose underlying mechanisms are poorly understood. Diagnosis is preceded by a long asymptomatic period during which an autoimmune reaction against the insulin-producing beta cells progresses in the pancreatic islets. This dissertation focuses on the early asymptomatic period of T1D that precedes seroconversion to autoantibody positivity. The genetic risk factors of type 1 diabetes have been thoroughly mapped in genome-wide association studies, but less is known about environmental risk factors and the molecular mechanisms that mediate the risk. According to the hygiene hypothesis, low exposure to pathogens increases the risk of immune system disorders. In a substudy related to the hygiene hypothesis, we compared umbilical cord blood gene expression profiles of children born in environments that differ in hygiene and T1D incidence (Finland, Russia, and Estonia). The differentially expressed genes were associated with innate immunity and immune system maturation. Based on these results, the environment may influence immune system development already during pregnancy. Genome-wide DNA methylation and gene expression were analyzed in samples collected in a large Finnish follow-up study from children and newborns at risk of T1D. Bisulfite sequencing analysis showed no association between neonatal DNA methylation and T1D developing during childhood.
In contrast, an antiviral type I interferon response detectable at the RNA level in early childhood was found to be a risk factor for T1D. This observation was made in peripheral blood already before the appearance of islet autoantibodies, and the main findings were confirmed in a German study. These results strengthened the hypothesis that viruses may contribute to the onset of T1D. Alongside the T1D research, this dissertation developed analysis methods suited to transcriptomics and epigenomics. Probe-level analysis of exon microarrays was found to improve reproducibility, sensitivity, and specificity in mapping alternative splicing. Permutation-based assessment of statistical significance reduced type I error in the analysis of bisulfite sequencing data.
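The permutation-based significance assessment mentioned in this abstract can be illustrated with a generic two-group permutation test. The sketch below is not the dissertation's workflow, whose details are not given here; it only shows the general idea of comparing an observed difference against differences obtained after shuffling group labels. The function name `permutation_p_value`, its parameters, and the toy methylation values are all assumptions made for illustration.

```python
import random


def permutation_p_value(group_a, group_b, n_perm=2000, seed=0):
    """Two-sided permutation p-value for a difference in group means.

    group_a, group_b : lists of numeric values (e.g. methylation levels
                       at one site in two sample groups; hypothetical use)
    n_perm           : number of random label shuffles
    seed             : seed for reproducibility
    """
    rng = random.Random(seed)

    def mean(values):
        return sum(values) / len(values)

    observed = abs(mean(group_a) - mean(group_b))
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)

    # Count shuffles whose difference is at least as extreme as the observed one.
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        diff = abs(mean(pooled[:n_a]) - mean(pooled[n_a:]))
        if diff >= observed:
            hits += 1

    # Add-one correction so the estimated p-value is never exactly zero.
    return (hits + 1) / (n_perm + 1)


# Toy example with made-up values.
p_separated = permutation_p_value([0.10, 0.12, 0.11, 0.09, 0.10],
                                  [0.80, 0.82, 0.79, 0.81, 0.83])
```

Because the null distribution is built from the data themselves, this kind of test does not rely on a parametric model of the statistic, which is one common motivation for using it when parametric tests give too many false positives.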

    Transcriptome characterization and polymorphism detection between subspecies of big sagebrush (Artemisia tridentata)

    Background: Big sagebrush (Artemisia tridentata) is one of the most widely distributed and ecologically important shrub species in western North America. This species serves as a critical habitat and food resource for many animals and invertebrates. Habitat loss due to a combination of disturbances followed by establishment of invasive plant species is a serious threat to big sagebrush ecosystem sustainability. Lack of genomic data has limited our understanding of the evolutionary history and ecological adaptation in this species. Here, we report on the sequencing of expressed sequence tags (ESTs) and detection of single nucleotide polymorphism (SNP) and simple sequence repeat (SSR) markers in subspecies of big sagebrush.
    Results: cDNA of A. tridentata sspp. tridentata and vaseyana were normalized and sequenced using the 454 GS FLX Titanium pyrosequencing technology. Assembly of the reads resulted in 20,357 contig consensus sequences in ssp. tridentata and 20,250 contigs in ssp. vaseyana. A BLASTx search against the non-redundant (NR) protein database using 29,541 consensus sequences obtained from a combined assembly resulted in 21,436 sequences with significant BLAST alignments (≤ 1e-15). A total of 20,952 SNPs and 119 polymorphic SSRs were detected between the two subspecies. SNPs were validated through various methods including sequence capture. Validation of SNPs in different individuals uncovered a high level of nucleotide variation in EST sequences. EST sequences of a third, tetraploid subspecies (ssp. wyomingensis) obtained by Illumina sequencing were mapped to the consensus sequences of the combined 454 EST assembly. Approximately one-third of the SNPs between sspp. tridentata and vaseyana identified in the combined assembly were also polymorphic within the two geographically distant ssp. wyomingensis samples.
    Conclusion: We have produced a large EST dataset for Artemisia tridentata, which contains a large sample of the big sagebrush leaf transcriptome. SNP mapping among the three subspecies suggests the origin of ssp. wyomingensis via mixed ancestry. A large number of SNP and SSR markers provide the foundation for future research to address questions in big sagebrush evolution, ecological genetics, and conservation using genomic approaches.