Fast local fragment chaining using sum-of-pair gap costs
Background: Fast seed-based alignment heuristics such as BLAST and BLAT have become indispensable tools in comparative genomics for all studies aiming at the evolutionary relations of proteins, genes, and non-coding RNAs. This is true in particular for the large mammalian genomes. The sensitivity and specificity of these tools, however, crucially depend on parameters such as seed sizes or maximum expectation values. In settings that require high sensitivity, the number of short local match fragments easily becomes intractable. Fragment chaining is then a powerful lever to quickly connect, score, and rank the fragments and thereby improve specificity.
Results: Here we present a fast and flexible fragment chainer that for the first time also supports a sum-of-pair gap cost model. This model has proven to achieve a higher accuracy and sensitivity in its own field of application. Due to a highly time-efficient index structure, our method outperforms the only existing tool for fragment chaining under the linear gap cost model. It can easily be applied to the output generated by alignment tools such as segemehl or BLAST. As an example we consider homology-based searches for human and mouse snoRNAs, demonstrating that a highly sensitive BLAST search with subsequent chaining is an attractive option. The sum-of-pair gap costs provide a substantial advantage in this context.
Conclusions: Chaining of short match fragments helps to quickly and accurately identify regions of homology that may not be found using local alignment heuristics alone. By providing both the linear and the sum-of-pair gap cost model, a wider range of applications can be covered. The software clasp is available at http://www.bioinf.uni-leipzig.de/Software/clasp/.
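The chaining idea in this abstract can be illustrated with a minimal sketch. It is a naive quadratic dynamic program, not the index-based method of clasp, and both gap cost functions (`l1_gap`, `sop_like_gap`) as well as the fragment tuple layout are assumptions made for illustration; clasp's actual cost definitions may differ.

```python
from typing import Callable, List, Tuple

# (query_start, query_end, target_start, target_end, score); ends are inclusive.
Fragment = Tuple[int, int, int, int, float]
GapCost = Callable[[int, int], float]


def l1_gap(dq: int, dt: int) -> float:
    """Assumed linear-style cost: each skipped position on either sequence costs 1."""
    return float(dq + dt)


def sop_like_gap(dq: int, dt: int, lam: float = 1.0, eps: float = 0.5) -> float:
    """Assumed sum-of-pair-style cost: the part of the gap present on both
    sequences is charged lam per position, the length difference eps per position."""
    return lam * min(dq, dt) + eps * abs(dq - dt)


def best_chain(frags: List[Fragment], gap: GapCost) -> Tuple[float, List[int]]:
    """Return the score and fragment indices of the best local chain."""
    if not frags:
        return 0.0, []
    # A predecessor must end before the current fragment starts on both
    # sequences, so it always sorts earlier by query start.
    order = sorted(range(len(frags)), key=lambda i: (frags[i][0], frags[i][2]))
    best = [0.0] * len(frags)   # best chain score ending at fragment i
    prev = [-1] * len(frags)    # predecessor of fragment i in that chain
    for pos, i in enumerate(order):
        qs, _, ts, _, score = frags[i]
        best[i] = score         # a chain may start at any fragment
        for j in order[:pos]:
            _, pqe, _, pte, _ = frags[j]
            if pqe < qs and pte < ts:
                cand = best[j] + score - gap(qs - pqe - 1, ts - pte - 1)
                if cand > best[i]:
                    best[i], prev[i] = cand, j
    end = max(range(len(frags)), key=lambda i: best[i])
    chain = []
    i = end
    while i != -1:
        chain.append(i)
        i = prev[i]
    return best[end], chain[::-1]


# Hypothetical fragments: two nearby hits and one far away on the target.
example = [(0, 9, 0, 9, 10.0), (12, 21, 12, 21, 10.0), (30, 39, 100, 109, 10.0)]
```

With these example fragments, both cost functions link the first two hits and leave the distant third one out of the chain, because bridging the gap to it would cost more than its score.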
Sensitive Long-Indel-Aware Alignment of Sequencing Reads
The tremendous advances in high-throughput sequencing technologies have made
population-scale sequencing as performed in the 1000 Genomes project and the
Genome of the Netherlands project possible. Next-generation sequencing has
allowed genome-wide discovery of variations beyond single-nucleotide
polymorphisms (SNPs), in particular of structural variations (SVs) like
deletions, insertions, duplications, translocations, inversions, and even more
complex rearrangements. Here, we design a read aligner with special emphasis on
the following properties: (1) high sensitivity, i.e. find all (reasonable)
alignments; (2) ability to find (long) indels; (3) statistically sound
alignment scores; and (4) runtime fast enough to be applied to whole genome
data. We compare performance to BWA, bowtie2, and stampy and find that our method
is especially advantageous on reads containing larger indels.
Computational Molecular Biology
Computational Biology is a fairly new subject that arose in response to the computational problems posed by the analysis and the processing of biomolecular sequence and structure data. The field was initiated in the late 60's and early 70's largely by pioneers working in the life sciences. Physicists and mathematicians entered the field in the 70's and 80's, while Computer Science became involved with the new biological problems in the late 1980's. Computational problems have gained further importance in molecular biology through the various genome projects which produce enormous amounts of data. For this bibliography we focus on those areas of computational molecular biology that involve discrete algorithms or discrete optimization. We thus neglect several other areas of computational molecular biology, like most of the literature on the protein folding problem, as well as databases for molecular and genetic data, and genetic mapping algorithms. Due to the availability of review papers and a bibliography this bibliography
Improvements in the Accuracy of Pairwise Genomic Alignment
Pairwise sequence alignment is a fundamental problem in bioinformatics with wide applicability. This thesis presents three new algorithms for this well-studied problem. First, we present a new algorithm, RDA, which aligns sequences in small segments, rather than by individual bases. Then, we present two algorithms for aligning long genomic sequences: CAPE, a pairwise global aligner, and FEAST, a pairwise local aligner.
RDA produces interesting alignments that can differ substantially in structure from traditional alignments. It is also better than traditional alignment at the task of homology detection. However, its main drawback is a very slow run time. Further, although it produces alignments with a different structure, it is not clear whether the differences have practical value in genomic research.
Our main success comes from our local aligner, FEAST. We describe two main improvements: a new more descriptive model of evolution, and a new local extension algorithm that considers all possible evolutionary histories rather than only the most likely. Our new model of evolution provides for improved alignment accuracy, and substantially improved parameter training. In particular, we produce a new parameter set for aligning human and mouse sequences that properly describes regions of weak similarity and regions of strong similarity. The second result is our new extension algorithm. Depending on heuristic settings, our new algorithm can provide for more sensitivity than existing extension algorithms, more specificity, or a combination of the two.
By comparing to CAPE, our global aligner, we find that the sensitivity increase provided by our local extension algorithm is so substantial that it outperforms CAPE on sequences with 0.9 or more expected substitutions per site. CAPE itself gives improved sensitivity for sequences with 0.7 or more expected substitutions per site, but at a great run time cost. FEAST and our local extension algorithm improve on this too: the run time is only slightly slower than existing local alignment algorithms and asymptotically the same.
Efficient methods for read mapping
DNA sequencing is the mainstay of biological and medical research. Modern sequencing machines can read millions of DNA fragments, sampling the underlying genomes at high-throughput. Mapping the resulting reads to a reference genome is typically the first step in sequencing data analysis. The problem has many variants as the reads can be short or long with a low or high error rate for different sequencing technologies, and the reference can be a single genome or a graph representation of multiple genomes. Therefore, it is crucial to develop efficient computational methods for these different problem classes. Moreover, continually declining sequencing costs and increasing throughput pose challenges to the previously developed methods and tools that cannot handle the growing volume of sequencing data.
This dissertation seeks to advance the state-of-the-art in the established field of read mapping by proposing more efficient and scalable read mapping methods as well as tackling emerging new problem areas. Specifically, we design ultra-fast methods to map two types of reads: short reads for high-throughput chromatin profiling and nanopore raw reads for targeted sequencing in real-time. In tune with the characteristics of these types of reads, our methods can scale to larger sequencing data sets or map more reads correctly compared with the state-of-the-art mapping software. Furthermore, we propose two algorithms for aligning sequences to graphs, which is the foundation of mapping reads to graph-based reference genomes. One algorithm improves the time complexity of existing sequence-to-graph alignment algorithms for linear or affine gap penalties. The other algorithm provides good empirical performance in the case of the edit distance metric. Finally, we mathematically formulate the problem of validating paired-end read constraints when mapping sequences to graphs, and propose an exact algorithm that is also fast enough for practical use.
The mapping task and its various applications in next-generation sequencing
The aim of this thesis is the development and benchmarking of
computational methods for the analysis of high-throughput data from
tiling arrays and next-generation sequencing. Tiling arrays have been
a mainstay of genome-wide transcriptomics, e.g., in the identification
of functional elements in the human genome. Due to limitations of
existing methods for the analysis of this data, a novel
statistical approach is presented that identifies expressed segments
as significant differences from the background distribution and thus
avoids dataset-specific parameters. This method detects differentially
expressed segments in biological data with significantly lower false
discovery rates and equivalent sensitivities compared to commonly used
methods. In addition, it is also clearly superior in the recovery of
exon-intron structures. Moreover, the search for local accumulations
of expressed segments in tiling array data has led to the
identification of very large expressed regions that may constitute a
new class of macroRNAs.
This thesis proceeds with next-generation sequencing for which various
protocols have been devised to study genomic, transcriptomic, and
epigenomic features. One of the first crucial steps in most NGS data
analyses is the mapping of sequencing reads to a reference
genome. This work introduces algorithmic methods to solve the mapping
tasks for three major NGS protocols: DNA-seq, RNA-seq, and
MethylC-seq. All methods have been thoroughly benchmarked and
integrated into the segemehl mapping suite.
First, mapping of DNA-seq data is facilitated by the core mapping
algorithm of segemehl. Since the initial publication, it has been
continuously updated and expanded. Here, extensive and reproducible
benchmarks are presented that compare segemehl to state-of-the-art
read aligners on various data sets. The results indicate that it is
not only more sensitive in finding the optimal alignment with respect
to the unit edit distance but also very specific compared to most
commonly used alternative read mappers. These advantages are
observable for both real and simulated reads, are largely independent
of the read length and sequencing technology, but come at the cost of
higher running time and memory consumption.
Second, the split-read extension of segemehl, presented by Hoffmann,
enables the mapping of RNA-seq data, a computationally more difficult
form of the mapping task due to the occurrence of splicing. Here, the
novel tool lack is presented, which aims to recover missed RNA-seq
read alignments using de novo splice junction information. It
performs very well in benchmarks and may thus be a beneficial
extension to RNA-seq analysis pipelines.
Third, a novel method is introduced that facilitates the mapping of
bisulfite-treated sequencing data. This protocol is considered the
gold standard in genome-wide studies of DNA methylation, one of the
major epigenetic modifications in animals and plants. The treatment of
DNA with sodium bisulfite selectively converts unmethylated cytosines
to uracils, while methylated ones remain unchanged. The bisulfite
extension developed here performs seed searches on a collapsed
alphabet followed by bisulfite-sensitive dynamic programming
alignments. Thus, it is insensitive to bisulfite-related mismatches
and does not rely on post-processing, in contrast to other methods. In
comparison to state-of-the-art tools, this method achieves
significantly higher sensitivities and competitive running times when
mapping millions of sequencing reads to vertebrate
genomes. Remarkably, the increase in sensitivity does not come at the
cost of decreased specificity and thus may finally result in a better
performance in calling the methylation rate.
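The seed-then-verify idea of the bisulfite extension described above can be sketched as follows. This is a simplified illustration and not segemehl's implementation: it handles only the forward strand (C to T conversion), uses a single exact seed found with `str.find`, and verifies hits with an ungapped mismatch count instead of the dynamic programming alignment the abstract mentions. All function names and parameters are hypothetical.

```python
from typing import List, Tuple


def collapse(seq: str) -> str:
    """Collapse the alphabet so that C and T become indistinguishable."""
    return seq.upper().replace("C", "T")


def bs_mismatches(read: str, ref_window: str) -> int:
    """Count mismatches, tolerating a read T opposite a reference C
    (an unmethylated cytosine converted by bisulfite treatment)."""
    mismatches = 0
    for r, g in zip(read, ref_window):
        if r == g:
            continue
        if g == "C" and r == "T":
            continue  # explained by bisulfite conversion
        mismatches += 1
    return mismatches


def map_read(read: str, ref: str, seed_len: int = 8,
             max_mm: int = 1) -> List[Tuple[int, int]]:
    """Return (position, mismatches) for every accepted placement of the read."""
    read, ref = read.upper(), ref.upper()
    cref = collapse(ref)
    seed = collapse(read)[:seed_len]
    hits = []
    pos = cref.find(seed)
    while pos != -1:
        if pos + len(read) <= len(ref):
            # Verify on the original sequences, not the collapsed ones, so a
            # read C opposite a reference T still counts as a mismatch.
            mm = bs_mismatches(read, ref[pos:pos + len(read)])
            if mm <= max_mm:
                hits.append((pos, mm))
        pos = cref.find(seed, pos + 1)
    return hits
```

Searching on the collapsed alphabet keeps the seed step blind to conversions, while the verification step restores the asymmetry: a T in the read may stem from a C in the genome, but not the other way round.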
Lastly, the potential of mapping strategies for de novo genome
assemblies is demonstrated with the introduction of a new guided
assembly procedure. It incorporates mapping as a major component and
uses additional information (e.g., annotation) as a guide. With this
method, the complete mitochondrial genome of Eulimnogammarus verrucosus has been
successfully assembled even though the sequencing library has been
heavily dominated by nuclear DNA.
In summary, this thesis introduces algorithmic methods that
significantly improve the analysis of tiling array, DNA-seq, RNA-seq,
and MethylC-seq data, and proposes standards for benchmarking NGS read
aligners. Moreover, it presents a new guided assembly procedure that
has been successfully applied in the de novo assembly of a
crustacean mitogenome.
This thesis deals with the development and benchmarking of methods
for analyzing data from high-throughput technologies such as tiling
arrays and high-throughput sequencing. Tiling arrays long formed the
basis for genome-wide studies of the transcriptome and were used, for
example, to identify functional elements in the human genome. This
thesis presents a new statistical method for evaluating tiling array
data. It classifies segments as expressed if their signals differ
significantly from the background distribution, so no parameter
values tuned to the data set are required. The method presented here
detects differentially expressed segments in biological data at
equal sensitivity and with a lower false-positive rate than the
methods currently in widespread use. It is also more precise in
detecting exon-intron boundaries. In addition, the search for
accumulations of expressed segments has led to the discovery of very
long regions that may represent a new class of macroRNAs.
After this excursion into tiling arrays, the thesis turns to
high-throughput sequencing, for which various sequencing protocols
for studying the genome, transcriptome, and epigenome are already
established. In most cases, one of the first and decisive steps in
the analysis of sequencing data is mapping, in which short sequences
(reads) are aligned to a large reference genome. This thesis presents
algorithmic methods that solve the mapping problem for three
important sequencing protocols (DNA-seq, RNA-seq, and MethylC-seq).
All methods were subjected to extensive benchmarks and are integrated
into the segemehl suite.
First, the core algorithm of segemehl is presented, which enables the
mapping of DNA sequencing data. Since its first publication it has
been continuously optimized and extended. This thesis presents
extensive benchmarks designed for reproducibility, in which segemehl
is compared with well-known mapping programs on numerous data sets.
The results show that segemehl is not only more sensitive in finding
optimal alignments with respect to the edit distance but also very
specific compared with other methods. These advantages are visible in
real and simulated data regardless of the sequencing technology or
the read length, but come at the expense of a longer running time and
higher memory consumption.
Second, the mapping of RNA sequencing data is examined, which is
already supported by the split-read extension of segemehl. Because of
splicing, this form of the mapping problem is computationally more
demanding. This thesis presents the new program lack, which aims to
find missing read alignments with the help of de novo splice
information. It achieves excellent results and is thus a useful
addition to analysis pipelines for RNA sequencing data.
Third, a new method for mapping bisulfite-treated sequencing data is
presented. This protocol is considered the gold standard in
genome-wide studies of DNA methylation, one of the most important
epigenetic modifications in animals and plants. Before sequencing,
the DNA is treated with sodium bisulfite, which selectively converts
unmethylated cytosines to uracils while methylcytosines remain
unaffected. The bisulfite extension presented here performs the seed
search on a reduced alphabet and verifies the resulting hits with a
bisulfite-sensitive alignment algorithm based on dynamic programming.
The method is therefore insensitive to bisulfite conversions and,
unlike other methods, requires no further post-processing. Compared
with currently used programs, the method is more sensitive and needs
a comparable running time when mapping millions of reads to large
genomes. Remarkably, the increased sensitivity is achieved with
consistently good specificity. The method could therefore also yield
better results in the precise determination of methylation rates.
Finally, the potential of mapping strategies for assemblies is
demonstrated by introducing a new guided assembly procedure called
crystallization. It includes mapping as its main component and uses
additional information (e.g., annotations) as guidance. This
procedure enabled the successful assembly of the complete
mitochondrial genome of Eulimnogammarus verrucosus despite a genomic
library consisting predominantly of nuclear DNA.
In summary, this thesis presents algorithmic methods that
significantly improve the analysis of tiling array, DNA-seq, RNA-seq,
and MethylC-seq data. It also proposes standards for comparing
programs that map high-throughput sequencing data. In addition, a new
guided genome assembly procedure is presented, which was successfully
used in the de novo assembly of a crustacean mitochondrial genome.
Dissecting multiple sequence alignment methods: the analysis, design and development of generic multiple sequence alignment components in SeqAn
Multiple sequence alignments are an indispensable tool in bioinformatics. Many applications rely on accurate multiple alignments, including protein structure prediction, phylogeny and the modeling of binding sites. In this thesis we dissected and analyzed the crucial algorithms and data structures required to construct such a multiple alignment. Based upon that dissection, we present a novel graph-based multiple sequence alignment program and a new method for multi-read alignments occurring in assembly projects. The advantage of the graph-based alignment is that a single vertex can represent a single character, a large segment or even an abstract entity such as a gene. This gives rise to the opportunity to apply the consistency-based progressive alignment paradigm to alignments of genomic sequences. The proposed multi-read alignment method outperforms similar methods in terms of alignment quality and it is apparently one of the first methods that can readily be used for insert sequencing. An important aspect of this thesis was the design, the development and the integration of the essential multiple sequence alignment components in the SeqAn library. SeqAn is a software library for sequence analysis that provides the core algorithmic components required to analyze large-scale sequence data. SeqAn aims at bridging the current gap between algorithm theory and available practical implementations in bioinformatics. Hence, alongside the theoretical development of the methods, we always describe the actual implementation of the data structures and algorithms in order to strengthen the use of SeqAn as an experimental platform for rapidly developing and testing applications. All presented methods are part of the open source SeqAn library that can be downloaded from our website, www.seqan.de.
IMPROVING BWA-MEM WITH GPU PARALLEL COMPUTING
Due to the many advances made in designing algorithms, especially the ones used in bioinformatics, it is becoming harder and harder to improve their efficiencies. Therefore, hardware acceleration using General-Purpose computing on Graphics Processing Unit has become a popular choice. BWA-MEM is an important part of the BWA software package for sequence mapping. Because of its high speed and accuracy, we choose to parallelize the popular short DNA sequence mapper. BWA has been a prevalent single node tool in genome alignment, and it has been widely studied for acceleration for a long time since the first version of the BWA package came out. This thesis presents the Big Data GPGPU distributed BWA-MEM, a tool that combines GPGPU acceleration and distributed computing. The four hardware parallelization techniques used are CPU multi-threading, GPU paralleled, CPU distributed, and GPU distributed. The GPGPU distributed software typically outperforms other parallelization versions. The alignment is performed on a distributed network, and each node in the network executes a separate GPGPU paralleled version of the software. We parallelize the chain2aln function in three levels. In Level 1, the function ksw_extend2, an algorithm based on Smith-Waterman, is parallelized to handle extension on one side of the seed. In Level 2, the function chain2aln is parallelized to handle chain extension, where all seeds within the same chain are extended. In Level 3, part of the function mem_align1_core is parallelized for extending multiple chains. Due to the program's complexity, the parallelization work was limited at the GPU version of ksw_extend2 parallelization Level 3. However, we have successfully combined Spark with BWA-MEM and ksw_extend2 at parallelization Level 1, which has shown that the proposed framework is possible. The paralleled Level 3 GPU version of ksw_extend2 demonstrated noticeable speed improvement with the test data set.
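To make the Level 1 computation more concrete, the following is a minimal sequential sketch of a Smith-Waterman-style extension on one side of a seed. It is not the code of ksw_extend2 and contains no GPU parallelization: it uses a linear rather than affine gap penalty, has no banding or early termination, and the function name `extend` and the scoring parameters are assumptions chosen for illustration.

```python
from typing import Tuple


def extend(query: str, target: str, h0: int, match: int = 1,
           mismatch: int = -4, gap: int = -6) -> Tuple[int, int, int]:
    """Extend an alignment anchored at the seed boundary.

    h0 is the score already collected by the seed. Returns the best score
    and how many query and target characters the extension consumed.
    """
    neg = float("-inf")
    n, m = len(query), len(target)
    # Row 0: only leading gaps in the query; cells that drop to <= 0 are dead.
    prev = [h0] + [h0 + gap * j if h0 + gap * j > 0 else neg
                   for j in range(1, m + 1)]
    best = (h0, 0, 0)
    for i in range(1, n + 1):
        cur = [neg] * (m + 1)
        if h0 + gap * i > 0:
            cur[0] = h0 + gap * i
        for j in range(1, m + 1):
            s = match if query[i - 1] == target[j - 1] else mismatch
            h = max(prev[j - 1] + s, prev[j] + gap, cur[j - 1] + gap)
            if h > 0:
                cur[j] = h
                if h > best[0]:
                    best = (h, i, j)
        prev = cur
    return best
```

Each cell depends only on its left, upper, and upper-left neighbours, so cells on the same anti-diagonal are independent of each other. That independence is the usual starting point for parallelizing this kind of recurrence.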
Third-generation RNA-sequencing analysis: graph alignment and transcript assembly with long reads
The information contained in the genome of an organism, its DNA, is expressed through transcription of its genes to RNA, in quantities determined by many internal and external factors. As such, studying the gene expression can give valuable information for e.g. clinical diagnostics.
A common analysis workflow of RNA-sequencing (RNA-seq) data consists of mapping the sequencing reads to a reference genome, followed by the transcript assembly and quantification based on these alignments. The advent of second-generation sequencing revolutionized the field by reducing the sequencing costs by 50,000-fold. Now another revolution is imminent with the third-generation sequencing platforms producing an order of magnitude higher read lengths. However, higher error rate, higher cost and lower throughput compared to the second-generation sequencing bring their own challenges. To compensate for the low throughput and high cost, hybrid approaches using both short second-generation and long third-generation reads have gathered recent interest.
The first part of this thesis focuses on the analysis of short-read RNA-seq data. As short-read mapping is an already well-researched field, we focus on giving a literature review of the topic. For transcript assembly we propose a novel (at the time of the publication) approach of using minimum-cost flows to solve the problem of covering a graph created from the read alignments with a set of paths with the minimum cost, under some cost model. Various network-flow-based solutions were proposed in parallel to, as well as after, ours.
The second part, where the main contributions of this thesis lie, focuses on the analysis of long-read RNA-seq data. The driving point of our research has been the Minimum Path Cover with Subpath Constraints (MPC-SC) model, where transcript assembly is modeled as a minimum path cover problem, with the addition that each of the chains of exons (subpath constraints) created from the long reads must be completely contained in a solution path. In addition to implementing this concept, we experimentally studied different approaches on how to find the exon chains in practice. The evaluated approaches included aligning the long reads to a graph created from short read alignments instead of the reference genome, which led to our final contribution: extending a co-linear chaining algorithm from between two sequences to between a sequence and a directed acyclic graph.
In transcription, RNA molecules are created according to the template of an organism's genes. Numerous factors, both internal and external to the cell, determine which genes are transcribed and to what extent. Studying this process provides valuable information, for example for medical diagnostics.
One common way of analyzing RNA sequencing data consists of three parts: aligning the reads to a reference genome, assembling the transcripts, and determining the expression levels of the transcripts. With the development of second-generation sequencing technology, the price of sequencing fell considerably, which allowed RNA sequencing data to be used for more and more purposes. Third-generation sequencing technologies now offer reads that are an order of magnitude longer, which expands the possibilities for analysis. However, a higher error rate, a higher price, and a smaller amount of data produced bring their own challenges. Using second- and third-generation technologies together, the so-called hybrid approach, is a research direction that has recently attracted much interest.
The first part of this thesis focuses on the analysis of second-generation, so-called short RNA reads. Aligning these short reads to a reference genome was already studied in the 2000s, so in this area we concentrate on the existing literature. For transcript assembly we present a method that uses the minimum-cost flow model. In this model, a graph created from the reads is covered with a set of paths whose cost is as small as possible. Flow models have also been used in analysis tools developed by other researchers.
The largest contribution of this thesis is in the second part, which focuses on the analysis of so-called long RNA reads. The starting point of our research has been a model in which a subpath constraint is added to the Minimum Path Cover problem. Each subpath constraint corresponds to an exon chain covered by some long read, and each subpath constraint must be contained entirely in some path of the path cover. In addition to implementing this concept, we experimentally tested different approaches for finding exon chains. These approaches included aligning long reads directly to a graph created from short reads instead of to the reference genome. This approach led to the final contribution of this thesis: generalizing the co-linear chaining algorithm from two sequences to a sequence and a directed acyclic graph.
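As a small illustration of the path cover model mentioned in this abstract, the sketch below computes the size of a minimum path cover of a directed acyclic graph when paths may share vertices. It relies on the classical reduction to bipartite matching on the transitive closure and deliberately omits the subpath constraints and cost model of MPC-SC; it is not the thesis's algorithm, and the function name and vertex numbering are assumptions.

```python
from typing import List, Set, Tuple


def min_path_cover_size(n: int, edges: List[Tuple[int, int]]) -> int:
    """Minimum number of paths covering all n vertices of a DAG,
    where paths are allowed to share vertices."""
    adj: List[List[int]] = [[] for _ in range(n)]
    for u, v in edges:
        adj[u].append(v)

    # Transitive closure: reach[u] holds every vertex reachable from u.
    reach: List[Set[int]] = [set() for _ in range(n)]

    def collect(src: int, u: int) -> None:
        for v in adj[u]:
            if v not in reach[src]:
                reach[src].add(v)
                collect(src, v)

    for u in range(n):
        collect(u, u)

    # Maximum bipartite matching (augmenting paths) between "left" copies
    # and "right" copies of the vertices, using closure edges.
    match_right = [-1] * n

    def try_augment(u: int, seen: Set[int]) -> bool:
        for v in reach[u]:
            if v in seen:
                continue
            seen.add(v)
            if match_right[v] == -1 or try_augment(match_right[v], seen):
                match_right[v] = u
                return True
        return False

    matching = sum(try_augment(u, set()) for u in range(n))
    # Every matched pair merges two paths into one.
    return n - matching
```

For example, in a graph with edges 0→2, 1→2, 2→3 and 2→4, two paths (0-2-3 and 1-2-4) suffice because both may pass through vertex 2. In a splice graph the vertices would correspond to exons and each path to a candidate transcript.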