Large-scale inference of gene gain and loss dynamics following gene duplication

Abstract

Gene turnover (gene gain and loss) is a continually occurring process in the genomes of both eukaryotes and prokaryotes. Genes can be gained by duplication of an existing gene in a genome, by de novo evolution from noncoding regions of a genome, or by horizontal gene transfer from one genome to another. Gene gain and loss events in the histories of genomes are usually uncovered by comparing gene trees with species trees, that is, by tree reconciliation. A caveat of existing reconciliation algorithms is that their inferences depend heavily on user-set input parameters, which are themselves imprecise and error-prone.
Therefore, in this thesis, we developed a simple parameter-free algorithm that infers gene duplication events, maps each duplication onto the corresponding branch of a rooted species tree, and infers the number of copy losses that followed it. The only inputs the algorithm requires are rooted gene trees and a rooted species tree. We used the algorithm to analyze the dynamics of gene gain and loss in the evolution of prokaryotic and eukaryotic genomes. The results point to a large difference in the frequency of horizontal gene transfer between prokaryotic and eukaryotic genome evolution, and to an overall prevalence of gene duplication and loss events in evolution.
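
The abstract does not spell out the algorithm itself, so the following is only a minimal sketch of one common parameter-free approach, not the method developed in the thesis. It flags a gene-tree node as a duplication when its child subtrees share a species (species overlap), maps the event to the smallest species-tree clade containing all species below that node, and counts, per copy, the maximal clades in which the copy is absent. The nested-tuple tree representation, the `species_geneid` leaf naming, and all function names are assumptions made for illustration.

```python
# Hypothetical sketch: trees are nested tuples, leaves are strings.
# Gene leaves are assumed to be named "<species>_<gene id>", e.g. "A_1".

def leaves(tree):
    """Return the set of leaf labels of a nested-tuple tree."""
    if isinstance(tree, str):
        return frozenset([tree])
    return frozenset().union(*(leaves(child) for child in tree))

def species_of(gene_leaf):
    """Extract the species name from an assumed 'species_geneid' label."""
    return gene_leaf.split("_")[0]

def lca(species_tree, wanted):
    """Return the smallest species-tree subtree containing all wanted species."""
    node = species_tree
    while not isinstance(node, str):
        for child in node:
            if wanted <= leaves(child):
                node = child
                break
        else:
            # No single child holds every wanted species: node is the LCA.
            return node
    return node

def count_losses(species_node, present):
    """Count maximal clades under species_node with no species in `present`."""
    if not (leaves(species_node) & present):
        return 1
    if isinstance(species_node, str):
        return 0
    return sum(count_losses(child, present) for child in species_node)

def infer_duplications(gene_tree, species_tree):
    """Return a list of (species in mapped clade, losses) per duplication."""
    events = []

    def visit(node):
        if isinstance(node, str):
            return frozenset([species_of(node)])
        child_sets = [visit(child) for child in node]
        seen, overlap = set(), False
        for species in child_sets:
            if seen & species:
                overlap = True  # same species on both sides: duplication
            seen |= species
        union = frozenset(seen)
        if overlap:
            clade = lca(species_tree, union)
            losses = sum(count_losses(clade, s) for s in child_sets)
            events.append((sorted(leaves(clade)), losses))
        return union

    visit(gene_tree)
    return events

species_tree = (("A", "B"), "C")
gene_tree = ((("A_1", "B_1"), "C_1"), ("A_2", "C_2"))
print(infer_duplications(gene_tree, species_tree))
```

In this toy input the root of the gene tree is a duplication mapped to the branch leading to the clade of A, B and C, and the second copy is missing from species B, which the sketch reports as one loss. Horizontal gene transfer is not modelled here.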
