12 research outputs found

Genome Assembly, from Practice to Theory: Safe, Complete and Linear-Time

Author: Cairo Massimo
Rizzi Romeo
Tomescu Alexandru
Zirondelli Elia C.
Publication venue: Schloss Dagstuhl - Leibniz-Zentrum für Informatik
Publication date: 08/11/2020
Field of study

Peer reviewed

arXiv.org e-Print Archive

Dagstuhl Research Online Publication Server

Catalogo dei prodotti della ricerca

Helsingin yliopiston digitaalinen arkisto

A safe and complete algorithm for metagenomic assembly

Author: Alexandru I. Tomescu
Nidia Obscura Acosta
Veli Mäkinen
Publication venue: 'Springer Science and Business Media LLC'
Publication date
Field of study

Crossref

Safety in s-t Paths, Trails and Walks

Author: Cairo Massimo
Khan Shahbaz
Rizzi Romeo
Schmidt Sebastian Stefan
Tomescu Alexandru
Publication venue
Publication date: 17/07/2020
Field of study

Given a directed graph G and a pair of nodes s and t, an s-t bridge of G is an edge whose removal breaks all s-t paths of G (and thus appears in all s-t paths). Computing all s-t bridges of G is a basic graph problem, solvable in linear time. In this paper, we consider a natural generalisation of this problem, with the notion of “safety” from bioinformatics. We say that a walk W is safe with respect to a set W' of s-t walks, if W is a subwalk of all walks in W'. We start by considering the maximal safe walks when W' consists of: all s-t paths, all s-t trails, or all s-t walks of G. We show that the solutions for the first two problems immediately follow from finding all s-t bridges after incorporating simple characterisations. However, solving the third problem requires non-trivial techniques for incorporating its characterisation. In particular, we show that there exists a compact representation computable in linear time, that allows outputting all maximal safe walks in time linear in their length. Our solutions also directly extend to multigraphs, except for the second problem, which requires a more involved approach. We further generalise these problems, by assuming that safety is defined only with respect to a subset of visible edges. Here we prove a dichotomy between the s-t paths and s-t trails cases, and the s-t walks case: the former two are NP-hard, while the latter is solvable with the same complexity as when all edges are visible. We also show that the same complexity results hold for the analogous generalisations of s-t articulation points (nodes appearing in all s-t paths). We thus obtain the best possible results for natural “safety”-generalisations of these two fundamental graph problems. Moreover, our algorithms are simple and do not employ any complex data structures, making them ideal for use in practice. Peer reviewed
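To make the definition of an s-t bridge in the abstract above concrete, here is a minimal Python sketch. It is a deliberately naive check, not the linear-time algorithm the paper refers to: it finds one s-t path and then tests each edge on it by removing the edge and re-running a search. The function names `reachable` and `st_bridges` and the adjacency-dict input format are assumptions made for illustration, and the graph is assumed to be a simple directed graph (no parallel edges).

```python
from collections import deque

def reachable(adj, s, t, skip=None):
    """BFS from s. Returns a parent map if t is reached, else None.
    `skip` is an optional edge (u, v) that is treated as removed."""
    parent = {s: None}
    queue = deque([s])
    while queue:
        u = queue.popleft()
        if u == t:
            return parent
        for v in adj.get(u, ()):
            if (u, v) == skip or v in parent:
                continue
            parent[v] = u
            queue.append(v)
    return None

def st_bridges(adj, s, t):
    """Naive s-t bridge finder (hypothetical helper, roughly O(n * m) time).
    Every s-t bridge lies on every s-t path, so it suffices to test
    the edges of a single s-t path."""
    parent = reachable(adj, s, t)
    if parent is None:
        return []
    # Walk back from t to s to recover the edges of one s-t path.
    path = []
    v = t
    while parent[v] is not None:
        path.append((parent[v], v))
        v = parent[v]
    path.reverse()
    # An edge is an s-t bridge if removing it disconnects t from s.
    return [e for e in path if reachable(adj, s, t, skip=e) is None]

# Example graph: a diamond between a and d, with single edges at both ends.
adj = {"s": ["a"], "a": ["b", "c"], "b": ["d"], "c": ["d"], "d": ["t"]}
bridges = st_bridges(adj, "s", "t")
```

In this example only the edges into and out of the diamond are bridges, since the two branches through `b` and `c` can replace each other.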

arXiv.org e-Print Archive

Catalogo dei prodotti della ricerca

Helsingin yliopiston digitaalinen arkisto

Recent advances in inferring viral diversity from high-throughput sequencing data

Author: Beerenwinkel Niko
Posada-Cespedes Susana
Seifert David
Publication venue: 'Elsevier BV'
Publication date: 01/07/2017
Field of study

Rapidly evolving RNA viruses prevail within a host as a collection of closely related variants, referred to as viral quasispecies. Advances in high-throughput sequencing (HTS) technologies have facilitated the assessment of the genetic diversity of such virus populations at an unprecedented level of detail. However, analysis of HTS data from virus populations is challenging due to short, error-prone reads. In order to account for uncertainties originating from these limitations, several computational and statistical methods have been developed for studying the genetic heterogeneity of virus populations. Here, we review methods for the analysis of HTS reads, including approaches to local diversity estimation and global haplotype reconstruction. Challenges posed by aligning reads, as well as the impact of reference biases on diversity estimates, are also discussed. In addition, we address some of the experimental approaches designed to improve the biological signal-to-noise ratio. In the future, computational methods for the analysis of heterogeneous virus populations are likely to continue being complemented by technological developments. ISSN: 0168-170

Repository for Publications and Research Data

Elsevier - Publisher Connector

Novel computational techniques for mapping and classifying Next-Generation Sequencing data

Author: Břinda Karel
Publication venue
Publication date
Field of study

Since their emergence around 2006, Next-Generation Sequencing technologies have been revolutionizing biological and medical research. Quickly obtaining a large number of short or long reads of DNA sequence from almost any biological sample enables detecting genomic variants, revealing the composition of species in a metagenome, deciphering cancer biology, decoding the evolution of living or extinct species, or understanding human migration patterns and human history in general. The pace at which the throughput of sequencing technologies is increasing surpasses the growth of storage and computer capacities, which creates new computational challenges in NGS data processing. In this thesis, we present novel computational techniques for read mapping and taxonomic classification. With more than a hundred published mappers, read mapping might be considered fully solved. However, the vast majority of mappers follow the same paradigm, and little attention has been paid to non-standard mapping approaches. Here, we propound the so-called dynamic mapping, which we show to significantly improve the resulting alignments compared to traditional mapping approaches. Dynamic mapping is based on exploiting the information from previously computed alignments, helping to improve the mapping of subsequent reads. We provide the first comprehensive overview of this method and demonstrate its qualities using Dynamic Mapping Simulator, a pipeline that compares various dynamic mapping scenarios to static mapping and iterative referencing. An important component of a dynamic mapper is an online consensus caller, i.e., a program collecting alignment statistics and guiding updates of the reference in an online fashion. We provide Ococo, the first online consensus caller, which implements smart statistics for individual genomic positions using compact bit counters.
Beyond its application to dynamic mapping, Ococo can be employed as an online SNP caller in various analysis pipelines, enabling SNP calling from a stream without saving the alignments on disk. Metagenomic classification of NGS reads is another major topic studied in the thesis. Given a database with thousands of reference genomes placed on a taxonomic tree, the task is to rapidly assign a huge number of NGS reads to tree nodes, and possibly estimate the relative abundance of the species involved. In this thesis, we propose improved computational techniques for this task. In a series of experiments, we show that spaced seeds consistently improve the classification accuracy. We provide Seed-Kraken, a spaced seed extension of Kraken, the most popular classifier at present. Furthermore, we suggest ProPhyle, a new indexing strategy based on a BWT-index, obtaining a much smaller and more informative index compared to Kraken. We provide a modified version of BWA that improves the BWT-index for a quick k-mer look-up.
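The abstract above mentions spaced seeds without showing what they are. As a rough illustration, the following Python sketch extracts spaced k-mers from a read: a mask of `1`s and `0`s slides along the sequence, and only the positions marked `1` are kept. The function name `spaced_kmers` and the mask `"1101"` are hypothetical choices for this example and are not taken from Seed-Kraken.

```python
def spaced_kmers(seq, mask):
    """Return the spaced k-mers of `seq` under `mask` (hypothetical helper).
    `mask` is a string of '1' (position is kept) and '0' (position is ignored)."""
    span = len(mask)
    keep = [i for i, c in enumerate(mask) if c == "1"]
    return [
        "".join(seq[start + i] for i in keep)
        for start in range(len(seq) - span + 1)
    ]

# Two reads that differ only at the masked (third) position.
read_a = spaced_kmers("ACGT", "1101")
read_b = spaced_kmers("ACTT", "1101")
longer = spaced_kmers("ACGTAC", "1101")
```

Because the third position is ignored, both reads yield the same seed, which is the property that lets spaced seeds tolerate a mismatch that would break a contiguous k-mer match.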

ZENODO

MetaFlow : Metagenomics taxonomic analysis using network flows

Author: Sobih Ahmed
Publication venue: Helsingin yliopisto
Publication date: 01/01/2016
Field of study

Our planet is pervaded by hundreds of millions of microorganisms that are not visible to the naked eye. These microorganisms, also known as microbes, include bacteria, archaea, fungi, protists and viruses. Metagenomics allows for the study of microbial samples collected directly from the environment without prior culturing. A crucial step in metagenomics analysis is to unveil the structure of the microbial community in a specific environment; this step is called metagenomics taxonomic analysis (or community profiling). In this thesis we explain what metagenomics taxonomic analysis is and why it is important, and we present MetaFlow, a new tool for solving the metagenomics community profiling problem using high-throughput sequencing data. MetaFlow estimates the richness and the abundances at the species taxonomic rank, based on coverage analysis across entire genomes, and it is the first method to apply network flows to solve this problem. Experiments showed that MetaFlow is more sensitive and precise than popular tools such as MetaPhlAn and mOTU, and that its abundance estimation is 2-4 times better. MetaFlow is available at https://github.com/alexandrutomescu/

Helsingin yliopiston digitaalinen arkisto

Prediction of host-pathogen protein interactions using computational methods

Author: Kösesoy İrfan
Publication venue: 'Sakarya Universitesi Ilahiyat Fakultesi Dergisi'
Publication date: 01/01/2018
Field of study

The full text has been made openly accessible pursuant to the “Law on Amendments to the Higher Education Law and Certain Laws and Decree-Laws”, published in Official Gazette No. 30352 dated 06.03.2018, and the “Directive on the Electronic Collection, Organisation and Opening to Access of Graduate Theses” dated 18.06.2018. Knowledge of inter-species pathogen-host protein interactions is vital for developing strategies for the diagnosis and treatment of infectious diseases. Because experimental methods for detecting interactions are costly and time-consuming, computational methods that model protein-protein interactions play an important role in this field. Besides shortening detection time and reducing cost, computational methods are also used to check interactions falsely detected by experimental methods. Data sparsity, data inadequacy, and the lack of a verified negative data set are common problems of the computational methods used for pathogen-host protein interaction prediction. The aim of this study is to increase the accuracy of pathogen-host interaction prediction and to mitigate the negative effects of data inadequacy. To this end, an extended network model and a location-based encoding method are proposed. The extended network model is inspired by the hypothesis that, where inter-species interaction data are insufficient, integrating pathogen-host interactions with the known intra-species interactions of the pathogen and host proteins improves prediction accuracy. Location-based encoding is a feature extraction method that encodes the amino acid sequence of proteins. The features used are one of the factors that affect the performance of machine learning algorithms in pathogen-host interaction prediction. In biological databases, the most abundant data about proteins is amino acid sequence information. A robust feature extraction method based solely on the amino acid sequence will therefore increase prediction accuracy, and using sequence information makes it easier to extract feature vectors for all known interactions. In this thesis, PROSES (Protein Sequence based Encoding System), a freely accessible web-based tool with a user-friendly interface, was developed for researchers working on protein encoding and protein interaction prediction. The software is especially useful for those who are not familiar with programming. PROSES is currently available at http://proses.yalova.edu.tr, hosted on the web server of Yalova University.

Sakarya Üniversitesi Kurumsal Açık Akademik Arşivi

Emerging therapies for acute myeloid leukaemia using hDHODH inhibitors able to restore in vitro and in vivo myeloid differentiation

Author: Al-Karadaghi Salam
Bonanni Davide
Boschi Donatella
Cignetti Alessandro
Circosta Paola
Gaidano Valentina
Giorgis Marta
Lolli Marco Lucio
Marraudino Marilena
Pippione Agnese Chiara
Saglio Giuseppe
Sainas Stefano
Publication venue
Publication date: 01/01/2019
Field of study

Institutional Research Information System University of Turin

Using MapReduce Streaming for Distributed Life Simulation on the Cloud

Author: Radenski Atanas
Publication venue: Chapman University Digital Commons
Publication date: 01/01/2013
Field of study

Distributed software simulations are indispensable in the study of large-scale life models but often require the use of technically complex lower-level distributed computing frameworks, such as MPI. We propose to overcome the complexity challenge by applying the emerging MapReduce (MR) model to distributed life simulations and by running such simulations on the cloud. Technically, we design optimized MR streaming algorithms for discrete and continuous versions of Conway’s life according to a general MR streaming pattern. We chose life because it is simple enough as a testbed for MR’s applicability to a-life simulations and general enough to make our results applicable to various lattice-based a-life models. We implement and empirically evaluate our algorithms’ performance on Amazon’s Elastic MR cloud. Our experiments demonstrate that a single MR optimization technique called strip partitioning can reduce the execution time of continuous life simulations by 64%. To the best of our knowledge, we are the first to propose and evaluate MR streaming algorithms for lattice-based simulations. Our algorithms can serve as prototypes in the development of novel MR simulation algorithms for large-scale lattice-based a-life models.
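As a rough illustration of how one generation of discrete Conway's life can be phrased as a map step and a reduce step, here is a minimal in-process Python sketch. It is not the paper's optimized streaming algorithm: there is no strip partitioning, and the shuffle phase is simulated with a dictionary instead of a MapReduce runtime. The names `map_cell`, `reduce_cell` and `life_step` are assumptions made for this example.

```python
from collections import defaultdict

def map_cell(cell):
    """Map step (hypothetical): for one live cell, emit key-value pairs.
    The value 0 marks the cell itself as alive; the value 1 is a
    contribution of one live neighbour to each adjacent cell."""
    x, y = cell
    yield cell, 0
    for dx in (-1, 0, 1):
        for dy in (-1, 0, 1):
            if (dx, dy) != (0, 0):
                yield (x + dx, y + dy), 1

def reduce_cell(cell, values):
    """Reduce step (hypothetical): decide whether `cell` is alive next."""
    values = list(values)
    alive = 0 in values          # the marker emitted by the cell itself
    neighbours = sum(values)     # markers are 0, so they do not affect the sum
    return neighbours == 3 or (alive and neighbours == 2)

def life_step(live_cells):
    """Run one generation: map, group by key (simulated shuffle), reduce."""
    groups = defaultdict(list)
    for cell in live_cells:
        for key, value in map_cell(cell):
            groups[key].append(value)
    return {cell for cell, values in groups.items() if reduce_cell(cell, values)}

# A horizontal "blinker" becomes vertical after one step and returns after two.
blinker = {(0, 1), (1, 1), (2, 1)}
after_one = life_step(blinker)
after_two = life_step(after_one)
```

Only live cells are mapped, so the amount of intermediate data grows with the number of live cells rather than with the size of the lattice.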

Chapman University Digital Commons

Handbook of Stemmatology

Author
Publication venue: 'Walter de Gruyter GmbH'
Publication date
Field of study

Stemmatology studies aspects of textual criticism that use genealogical methods. This handbook is the first to cover the entire field, encompassing both theoretical and practical aspects, ranging from traditional to digital methods. Authors from all the disciplines involved examine topics such as the material aspects of text traditions, methods of traditional textual criticism and their genesis, and modern digital approaches used in the field.

OAPEN Library