15 research outputs found

RECORD: Reference-Assisted Genome Assembly for Closely Related Genomes

Author: Buza Krisztián Antal
Dojer Norbert
Wilczynski Bartek
Publication venue: Hindawi Limited
Publication date: 01/01/2015

Background. Next-generation sequencing technologies now produce reads totaling multiple times the genome size from a single experiment. This is enough information to reconstruct at least some of the differences between the individual genome studied in the experiment and the reference genome of the species. However, in most typical protocols, this information is disregarded and the reference genome is used. Results. We provide a new approach that allows researchers to reconstruct genomes very closely related to the reference genome (e.g., mutants of the same species) directly from the reads used in the experiment. Our approach applies de novo assembly software to experimental reads and so-called pseudoreads and uses the resulting contigs to generate a modified reference sequence. In this way, it can very quickly, and at no additional sequencing cost, generate a new, modified reference sequence that is closer to the actual sequenced genome and has full coverage. In this paper, we describe our approach and test its implementation, called RECORD. We evaluate RECORD on both simulated and real data. We have made our software publicly available on SourceForge. Conclusion. Our tests show that on closely related sequences RECORD outperforms more general assisted-assembly software.

Directory of Open Access Journals

PubMed Central

Repository of the Academy's Library

Finding evolutionarily conserved cis-regulatory modules with a universal set of motifs

Author: Dojer Norbert
Patelak Mateusz
Tiuryn Jerzy
Wilczynski Bartek
Publication venue: BioMed Central
Publication date: 01/01/2009

Background. Finding functional regulatory elements in DNA sequences is a very important problem in computational biology, and providing a reliable algorithm for this task would be a major step towards understanding regulatory mechanisms on a genome-wide scale. Major obstacles in this respect are that the amount of non-coding DNA is vast and that the methods for predicting functional transcription factor binding sites tend to produce results with a high percentage of false positives. This makes the problem of finding regions significantly enriched in binding sites difficult. Results. We develop a novel method for predicting regulatory regions in DNA sequences, which is designed to exploit the evolutionary conservation of regulatory elements between species without assuming that the order of motifs is preserved across species. We have implemented our method and tested its predictive abilities on various datasets from different organisms. Conclusion. We show that our approach enables us to find a majority of the known CRMs using only sequence information from different species together with currently publicly available motif data. Also, our method is robust enough to perform well in predicting CRMs, despite differences in tissue specificity and even across species, provided that the evolutionary distances between compared species do not change substantially. The complexity of the proposed algorithm is polynomial, and the observed running times show that it may be readily applied.

Crossref

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

Applying dynamic Bayesian networks to perturbed gene expression data

Author: Dojer Norbert
Gambin Anna
Mizera Andrzej
Tiuryn Jerzy
Wilczyński Bartek
Publication venue: BioMed Central
Publication date: 01/01/2006

BACKGROUND: A central goal of molecular biology is to understand the regulatory mechanisms of gene transcription and protein synthesis. Because of their solid basis in statistics, which allows them to handle the stochastic aspects of gene expression and noisy measurements in a natural way, Bayesian networks appear attractive for inferring the structure of gene interactions from microarray experiment data. However, the basic formalism has some disadvantages, e.g. it is sometimes hard to distinguish between the origin and the target of an interaction. Two kinds of microarray experiments yield data particularly rich in information regarding the direction of interactions: time series and perturbation experiments. In order to handle them correctly, the basic formalism must be modified. For example, dynamic Bayesian networks (DBN) apply to time series microarray data. To our knowledge, the DBN technique has not been applied in the context of perturbation experiments. RESULTS: We extend the framework of dynamic Bayesian networks in order to incorporate perturbations. Moreover, an exact algorithm for inferring an optimal network is proposed and a discretization method specialized for time series data from perturbation experiments is introduced. We apply our procedure to data from realistic simulations. The results are compared with those obtained by standard DBN learning techniques. Moreover, the advantages of using an exact learning algorithm instead of heuristic methods are analyzed. CONCLUSION: We show that the quality of inferred networks dramatically improves when using data from perturbation experiments. We also conclude that the exact algorithm should be used when it is possible, i.e. when the considered set of genes is small enough.

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

Open Repository and Bibliography - Luxembourg

Nucleotide-resolution DNA double-strand breaks mapping by next-generation sequencing

Author: Bienko Magda
Chiarle Roberto
Crosetto Nicola
Dikic Ivan
Dojer Norbert
Ginalski Krzysztof
Karaca Elif
Mitra Abhishek
Pasero Philippe
Rowicka Maga
Silva Maria Joao
Skrzypczak Magdalena
Wang Qi
Publication venue: Springer Science and Business Media LLC
Publication date: 10/03/2014

We present a genome-wide method to map DNA double-strand breaks (DSBs) at nucleotide resolution by direct in situ breaks labeling, enrichment on streptavidin, and next-generation sequencing (BLESS). We comprehensively validated and tested BLESS using different human and mouse cells, DSB-inducing agents, and sequencing platforms. BLESS was able to detect telomere ends, Sce endonuclease-induced DSBs, and complex genome-wide DSB landscapes. As a proof of principle, we characterized the genomic landscape of sensitivity to replication stress in human cells, and identified over two thousand non-uniformly distributed aphidicolin-sensitive regions (ASRs) overrepresented in genes and enriched in satellite repeats. ASRs were also enriched in regions rearranged in human cancers, with many cancer-associated genes exhibiting high sensitivity to replication stress. Our method is suitable for genome-wide mapping of DSBs in various cells and experimental conditions with a specificity and resolution unachievable by current techniques.

Harvard University - DASH

Comparative analysis of cis-regulation following stroke and seizures in subspaces of conserved eigensystems

Crossref

Springer - Publisher Connector

PubMed Central

Optimally choosing PWM motif databases and sequence scanning approaches based on ChIP-seq data

Author: A Jolma
A Mathelier
AE Kel
B Wilczynski
Bartek Wilczynski
BE Bernstein
Bozena Kaminska
E Portales-Casamar
EP Xing
G Badis
GZ Hertz
I Krystkowiak
IV Kulakovskiy
Izabella Krystkowiak
JV Turatsinze
K Cartharius
K Quandt
L Yang
M Pachkov
MF Berger
Michal Dabrowski
Norbert Dojer
P Flicek
PJA Cock
R Pique-Regi
R Worsley Hunt
S Rahmann
T Kaplan
TD Schneider
U Mudunuri
V Matys
X Xie
Y Zhao
Publication venue: Springer Science and Business Media LLC

Crossref

MSARC: Multiple sequence alignment by residue clustering

Author: A Löytynoja
AR Subramanian
B Hendrickson
BD Redelings
C Notredame
CB Do
CM Fiduccia
F Sievers
G Lunter
GH Gonnet
J Kececioglu
JD Thompson
K Katoh
K Liu
KM Wong
Michał Modzelewski
Norbert Dojer
O Gotoh
RC Edgar
RK Bradley
S Guindon
S Miyazawa
SME Sahraeian
U Mückstein
Y Liu
YK Yu
Publication venue: Springer Science and Business Media LLC

Crossref

Applying dynamic Bayesian networks to perturbed gene expression data

Author: Anna Gambin
Jerzy Tiuryn
Norbert Dojer

Abstract. Motivation: A central goal of molecular biology is to understand the regulatory mechanisms of gene transcription and protein synthesis. Because of their solid basis in statistics, which allows them to handle the stochastic aspects of gene expression and noisy measurements in a natural way, Bayesian networks appear attractive for inferring the structure of gene interactions from microarray experiment data. However, the basic formalism has some disadvantages, e.g. it is sometimes hard to distinguish between the origin and the object of an interaction. Two kinds of microarray experiments yield data particularly rich in information regarding the direction of interactions: time series and perturbation experiments. In order to handle them correctly, the basic formalism must be modified. For example, dynamic Bayesian networks apply to time series microarray data. Results: We extend the framework of dynamic Bayesian networks in order to handle perturbations. A new discretization method, specialized for datasets from time series perturbation experiments, is also introduced. We compare networks inferred from data from realistic simulations by our method and by dynamic Bayesian network learning techniques. We conclude that application of our method substantially improves inference.

1 Introduction. As most genetic regulatory systems involve many components connected through complex networks of interactions, formal methods and computer tools for modeling and simulation are needed. Therefore, various formalisms have been proposed to describe genetic regulatory systems, including Boolean networks and their generalizations, ordinary and partial differential equations, stochastic equations and Bayesian networks (see [4] for a review). While differential and stochastic equations describe the biophysical processes at a very refined level of detail and prove useful in simulations of well-studied systems, Bayesian networks appear attractive for inferring the regulatory network structure from gene expression data. The reason is that their learning techniques have a solid basis in statistics, which allows them to handle the stochastic aspects of gene expression and noisy measurements in a natural way.

CiteSeerX

Learning Bayesian networks from datasets joining continuous and discrete variables

Author: Beer
Bonn
Cooper
Cowell
Dabrowski
de Campos
Dojer
Elidan
Friedman
Heckerman
Imoto
Lam
Lauritzen
McGeachie
Monti
Norbert Dojer
Schwarz
Smith
Suzuki
Wilczynski
Wilczyński
Publication venue: Elsevier BV

Crossref