University of Zagreb. Faculty of Electrical Engineering and Computing.
Abstract
In this thesis, a de novo transcriptome assembler was implemented based on the overlap-layout-consensus paradigm. It is written in the C++ programming language and is named Ra, which is short for RNA assembler. The overlap phase relies on an enhanced suffix array and finds exact overlaps between input reads. The layout phase applies several simplification methods, including trimming and bubble popping, to the graph built from the reads and their overlaps. Because the overlap phase finds only exact overlaps, a consensus phase is not needed at this moment, but one is nevertheless part of the implementation and is based on the partial order alignment algorithm. The tests conducted indicate that Ra needs further improvements to compete with other transcriptome assemblers. Source code is available at https://github.com/rvaser/ra.
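To illustrate what an exact overlap between two reads means in the overlap phase, the following is a minimal sketch rather than Ra's actual code. It uses a naive quadratic comparison in place of the enhanced suffix array, and the names `Overlap`, `longest_exact_overlap`, `find_exact_overlaps` and the `min_len` parameter are hypothetical, introduced here only for illustration.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical record for illustration; not taken from the Ra source code.
struct Overlap {
    std::size_t a;       // index of the read whose suffix matches
    std::size_t b;       // index of the read whose prefix matches
    std::size_t length;  // number of exactly matching characters
};

// Returns the length of the longest suffix of `a` that is identical to a
// prefix of `b`, or 0 if no match of at least `min_len` characters exists.
// Naive check, used here instead of an enhanced suffix array.
std::size_t longest_exact_overlap(const std::string& a, const std::string& b,
                                  std::size_t min_len) {
    std::size_t max_len = std::min(a.size(), b.size());
    for (std::size_t len = max_len; len >= min_len && len > 0; --len) {
        if (a.compare(a.size() - len, len, b, 0, len) == 0) {
            return len;
        }
    }
    return 0;
}

// Compares every ordered pair of distinct reads and collects the exact
// suffix-prefix overlaps; these pairs would form the edges of a graph
// that a layout phase could then simplify.
std::vector<Overlap> find_exact_overlaps(const std::vector<std::string>& reads,
                                         std::size_t min_len) {
    std::vector<Overlap> overlaps;
    for (std::size_t i = 0; i < reads.size(); ++i) {
        for (std::size_t j = 0; j < reads.size(); ++j) {
            if (i == j) continue;
            std::size_t len = longest_exact_overlap(reads[i], reads[j], min_len);
            if (len > 0) {
                overlaps.push_back({i, j, len});
            }
        }
    }
    return overlaps;
}
```

Calling `find_exact_overlaps` on the reads `ACGTAC`, `TACGGA` and `GGATTT` with `min_len` set to 3 would report two overlaps of length 3, one from the first read to the second and one from the second to the third. The pairwise scan is quadratic in the number of reads, which is the cost an index such as a suffix array is meant to avoid.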