7 research outputs found
Experiences with workflows for automating data-intensive bioinformatics
High-throughput technologies, such as next-generation sequencing, have turned molecular biology into a
data-intensive discipline, requiring bioinformaticians to use high-performance computing resources and carry out
data management and analysis tasks at large scale. Workflow systems can be useful to simplify construction of
analysis pipelines that automate tasks, support reproducibility and provide measures for fault tolerance. However,
workflow systems can incur significant development and administration overhead, so bioinformatics pipelines are
often still built without them. We present the experiences with workflows and workflow systems within the
bioinformatics community participating in a series of hackathons and workshops of the EU COST action SeqAhead.
The organizations are working on similar problems, but we have addressed them with different strategies and
solutions. This fragmentation of efforts is inefficient and leads to redundant and incompatible solutions. Based on our
experiences we define a set of recommendations for future systems to enable efficient yet simple bioinformatics
workflow construction and execution.
A Semi-Automated Approach for Anatomical Ontology Mapping
This paper presents a study in the domain of semi-automated and fully-automated ontology mapping. A process for inferring additional cross-ontology links within the domain of anatomical ontologies is presented and evaluated on ontology pairs from three model organisms. The results of experiments performed with various external knowledge sources and scoring schemes are discussed.
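A scoring scheme of the kind mentioned above can be sketched as follows. The term labels, synonym sets, and weights here are illustrative assumptions, not the paper's actual scheme; the synonym lookup stands in for an external knowledge source.

```python
# Illustrative scoring of candidate cross-ontology term pairs.
# Labels, synonym sets, and weights are hypothetical examples.

def token_jaccard(a: str, b: str) -> float:
    """Jaccard similarity over lowercased label tokens."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    return len(ta & tb) / len(ta | tb) if ta | tb else 0.0

def score_pair(label_a, label_b, synonyms_a, synonyms_b,
               w_exact=1.0, w_syn=0.8, w_jaccard=0.5):
    """Return the strongest evidence score for linking two anatomical terms."""
    if label_a.lower() == label_b.lower():
        return w_exact
    # Synonym overlap drawn from an external knowledge source (e.g. a lexicon).
    syn_a = {s.lower() for s in synonyms_a}
    syn_b = {label_b.lower()} | {s.lower() for s in synonyms_b}
    if syn_a & syn_b:
        return w_syn
    return w_jaccard * token_jaccard(label_a, label_b)

print(score_pair("hind limb", "hindlimb", {"hindlimb"}, set()))  # synonym hit -> 0.8
```

Candidate links scoring above a cutoff would then be passed to a curator, which is what makes the approach semi-automated.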
Application of Machine Learning Models in Error and Variant Detection in High-Variation Genomics Datasets
In metagenomics datasets, datasets of complex polyploid genomes, and other high-variation genomics datasets, analysis, error detection, and variant calling are complicated by the difficulty of discerning sequencing errors from biological variation. Confirming base candidates by high frequency of occurrence is no longer a reliable measure because of the natural variation and the presence of rare bases. The paper discusses the application of machine learning models to classify bases as erroneous or rare variations after preselecting potential error candidates with a weighted frequency measure, which focuses on unexpected variations by using inter-sequence pairwise similarity. Different similarity measures are used to account for different types of datasets. Four machine learning models are implemented and tested.
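The preselection step described above can be sketched roughly as follows. The similarity weighting and the threshold are assumptions for illustration; the paper's actual measures, weights, and downstream ML features may differ.

```python
# A minimal sketch of weighted-frequency preselection of error candidates.
# Each read's vote for a base is weighted by its mean pairwise similarity
# to the other reads, so bases carried by dissimilar reads stand out.

def hamming_similarity(a: str, b: str) -> float:
    """Fraction of matching positions between two equal-length reads."""
    return sum(x == y for x, y in zip(a, b)) / len(a)

def weighted_base_frequencies(reads, pos):
    """Weighted, normalized frequency of each base observed at `pos`."""
    freqs = {}
    for i, r in enumerate(reads):
        w = sum(hamming_similarity(r, o) for j, o in enumerate(reads) if j != i)
        w /= max(len(reads) - 1, 1)
        freqs[r[pos]] = freqs.get(r[pos], 0.0) + w
    total = sum(freqs.values())
    return {b: f / total for b, f in freqs.items()}

def error_candidates(reads, pos, threshold=0.25):
    """Bases whose weighted frequency falls below the threshold become
    candidates for ML classification as error vs. rare variant."""
    return [b for b, f in weighted_base_frequencies(reads, pos).items() if f < threshold]

reads = ["ACGT", "ACGT", "ACGA", "TCGT"]
print(error_candidates(reads, 0))  # -> ['T']
```

Only the preselected candidates would then be handed to the classifiers, which keeps the expensive ML step focused on unexpected variations.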
In silico Prediction of C4-related Genes by Finding Duplications Causing Pattern Deviation and Comparative Analysis of Phylogenetic Trees
This study is focused on the development of a pattern-finding method for analyzing evolutionary trees to predict genes that may be involved in C4 photosynthesis. It relies on publicly available phylogenetic data which is processed with the authors' own Python scripts and open-source software. The pattern recognition in the topology of the trees is an essential part of the process, and the result is then validated by comparing the expression levels of the selected candidates. The same approach can be applied in studying the evolution of other important traits simply by changing the type of pattern.
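One elementary topology pattern of the kind such a method might search for is a duplication node, where the same species appears in both child clades of a gene tree. The toy sketch below uses a nested-tuple tree encoding and made-up species names; it is not the authors' pipeline, only an illustration of recognizing a pattern in tree topology.

```python
# Toy sketch: flag internal nodes of a gene tree where the same species
# occurs in both child clades, indicating a gene duplication event.

def leaves(node):
    """Collect leaf labels (species names) from a nested-tuple tree."""
    if isinstance(node, str):
        return {node}
    left, right = node
    return leaves(left) | leaves(right)

def duplication_nodes(node, found=None):
    """Return internal nodes whose child clades share at least one species."""
    if found is None:
        found = []
    if isinstance(node, str):
        return found
    left, right = node
    if leaves(left) & leaves(right):
        found.append(node)
    duplication_nodes(left, found)
    duplication_nodes(right, found)
    return found

# "maize" appears on both sides of one clade -> one duplication event.
gene_tree = ((("maize", "sorghum"), "maize"), "rice")
print(len(duplication_nodes(gene_tree)))  # -> 1
```

Genes sitting under such flagged nodes would then be the candidates whose expression levels are compared in the validation step.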
Machine Learning Models for Error Detection in Metagenomics and Polyploid Sequencing Data
Metagenomics studies, as well as genomics studies of polyploid species such as wheat, deal with the analysis of high-variation data. Such data contain sequences from similar, but distinct, genetic chains. This fact presents an obstacle to analysis and research. In particular, the detection of instrumentation errors during the digitalization of the sequences may be hindered, as they can be indistinguishable from the real biological variation inside the digital data. This can prevent the determination of the correct sequences, while at the same time making variant studies significantly more difficult. This paper details a collection of ML-based models used to distinguish a real variant from an erroneous one. The focus is on using this model directly, but experiments are also done in combination with other predictors that isolate a pool of error candidates.
Manageable Workflows for Processing Parallel Sequencing Data
ACM Computing Classification System (1998): D.2.11, D.1.3, D.3.1, J.3, C.2.4.
Data analysis after parallel sequencing is a process that uses
combinations of software tools that are often subject to experimentation
and on-the-fly substitution, with the necessary file conversions. This article
presents a developing system for creating and managing workflows aiding
the tasks one encounters after parallel sequencing, particularly in the area of
metagenomics.
The semantics, description language and software implementation aim
to allow the creation of flexible, configurable workflows that are suitable for
sharing and are easy to manipulate through software or by hand. The execution
system design provides user-defined operations and interchangeability
between an operation and a workflow. This allows significant extensibility,
which can be further complemented with distributed computing and remote
management interfaces.
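The interchangeability between an operation and a workflow can be sketched as a composite pattern: both expose the same interface, so a workflow can be nested wherever an operation is expected. The class and step names below are illustrative assumptions, not the system's actual description language.

```python
# Minimal sketch of operation/workflow interchangeability (composite pattern).
# Names and steps are illustrative, not the actual system's API.

class Operation:
    """A user-defined step: wraps a function from input data to output data."""
    def __init__(self, name, fn):
        self.name, self.fn = name, fn
    def run(self, data):
        return self.fn(data)

class Workflow:
    """A sequence of steps. Because it also exposes run(), a Workflow can be
    used anywhere an Operation can, enabling nesting, sharing and reuse."""
    def __init__(self, name, steps):
        self.name, self.steps = name, steps
    def run(self, data):
        for step in self.steps:
            data = step.run(data)
        return data

trim = Operation("trim", lambda reads: [r[:4] for r in reads])
dedup = Operation("dedup", lambda reads: sorted(set(reads)))
preprocess = Workflow("preprocess", [trim, dedup])

# The sub-workflow slots into a larger pipeline as if it were a single step.
pipeline = Workflow("pipeline", [preprocess, Operation("count", len)])
print(pipeline.run(["ACGTA", "ACGTC", "TTGCA"]))  # -> 2
```

With this shape, extensibility comes for free: a remotely executed or distributed step only needs to honor the same run() contract.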