Search CORE

31,051 research outputs found

Training-free Measures Based on Algorithmic Probability Identify High Nucleosome Occupancy in DNA Sequences

Author: Minary Peter
Zenil Hector
Publication venue
Publication date: 16/10/2018
Field of study

We introduce and study a set of training-free methods of information-theoretic and algorithmic complexity nature applied to DNA sequences to identify their potential capabilities to determine nucleosomal binding sites. We test our measures on well-studied genomic sequences of different sizes drawn from different sources. The measures reveal the known in vivo versus in vitro predictive discrepancies and uncover their potential to pinpoint (high) nucleosome occupancy. We explore different possible signals within and beyond the nucleosome length and find that complexity indices are informative of nucleosome occupancy. We compare against the gold standard (Kaplan model) and find similar and complementary results with the main difference that our sequence complexity approach. For example, for high occupancy, complexity-based scores outperform the Kaplan model for predicting binding representing a significant advancement in predicting the highest nucleosome occupancy following a training-free approach.Comment: 8 pages main text (4 figures), 12 total with Supplementary (1 figure

arXiv.org e-Print Archive

Oxford University Research Archive

In the search for the low-complexity sequences in prokaryotic and eukaryotic genomes: how to derive a coherent picture from global and local entropy measures

Author: Acquisti Claudia
Allegrini Paolo
Bogani Patrizia
Buiatti Marcello
Catanese Elena
Fronzoni Leone
Grigolini Paolo
Mersi Giuseppe
Palatella Luigi
Publication venue: 'Elsevier BV'
Publication date: 01/01/2003
Field of study

We investigate on a possible way to connect the presence of Low-Complexity Sequences (LCS) in DNA genomes and the nonstationary properties of base correlations. Under the hypothesis that these variations signal a change in the DNA function, we use a new technique, called Non-Stationarity Entropic Index (NSEI) method, and we prove that this technique is an efficient way to detect functional changes with respect to a random baseline. The remarkable aspect is that NSEI does not imply any training data or fitting parameter, the only arbitrarity being the choice of a marker in the sequence. We make this choice on the basis of biological information about LCS distributions in genomes. We show that there exists a correlation between changing the amount in LCS and the ratio of long- to short-range correlation

arXiv.org e-Print Archive

Crossref

UNT Digital Library

A backward procedure for change-point detection with applications to copy number variation detection

Author: Hao Ning
Shin Seung Jun
Wu Yichao
Publication venue
Publication date: 18/08/2019
Field of study

Change-point detection regains much attention recently for analyzing array or sequencing data for copy number variation (CNV) detection. In such applications, the true signals are typically very short and buried in the long data sequence, which makes it challenging to identify the variations efficiently and accurately. In this article, we propose a new change-point detection method, a backward procedure, which is not only fast and simple enough to exploit high-dimensional data but also performs very well for detecting short signals. Although motivated by CNV detection, the backward procedure is generally applicable to assorted change-point problems that arise in a variety of scientific applications. It is illustrated by both simulated and real CNV data that the backward detection has clear advantages over other competing methods especially when the true signal is short

arXiv.org e-Print Archive

Crossref

University of Illinois at Chicago: UIC INDIGO (INtellectual property in DIGital form available online in an Open environment)

Local Binary Patterns as a Feature Descriptor in Alignment-free Visualisation of Metagenomic Data

Author: Kouchaki Samaneh
Robertson David L.
Tapinos Avraam
Tirunagari Santosh
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/12/2016
Field of study

Shotgun sequencing has facilitated the analysis of complex microbial communities. However, clustering and visualising these communities without prior taxonomic information is a major challenge. Feature descriptor methods can be utilised to extract these taxonomic relations from the data. Here, we present a novel approach consisting of local binary patterns (LBP) coupled with randomised singular value decomposition (RSVD) and Barnes-Hut t-stochastic neighbor embedding (BH-tSNE) to highlight the underlying taxonomic structure of the metagenomic data. The effectiveness of our approach is demonstrated using several simulated and a real metagenomic datasets

Crossref

Enlighten

Change-point model on nonhomogeneous Poisson processes with application in copy number profiling by next-generation DNA sequencing

Author: Shen Jeremy J.
Zhang Nancy R.
Publication venue: 'Institute of Mathematical Statistics'
Publication date: 01/01/2012
Field of study

We propose a flexible change-point model for inhomogeneous Poisson Processes, which arise naturally from next-generation DNA sequencing, and derive score and generalized likelihood statistics for shifts in intensity functions. We construct a modified Bayesian information criterion (mBIC) to guide model selection, and point-wise approximate Bayesian confidence intervals for assessing the confidence in the segmentation. The model is applied to DNA Copy Number profiling with sequencing data and evaluated on simulated spike-in and real data sets.Comment: Published in at http://dx.doi.org/10.1214/11-AOAS517 the Annals of Applied Statistics (http://www.imstat.org/aoas/) by the Institute of Mathematical Statistics (http://www.imstat.org

arXiv.org e-Print Archive

ScholarlyCommons@Penn

Proceedings of the 1st Computer Science Student Workshop: Koc University Istinye Campus, Istanbul, Turkey, February 21, 2010

Author
Publication venue: Sabancı University
Publication date: 01/01/2010
Field of study

Sabanci University Research Database

Recommended from our members

Clinical metagenomics.

Author: Chiu Charles Y
Miller Steven A
Publication venue: eScholarship, University of California
Publication date: 01/06/2019
Field of study

Clinical metagenomic next-generation sequencing (mNGS), the comprehensive analysis of microbial and host genetic material (DNA and RNA) in samples from patients, is rapidly moving from research to clinical laboratories. This emerging approach is changing how physicians diagnose and treat infectious disease, with applications spanning a wide range of areas, including antimicrobial resistance, the microbiome, human host gene expression (transcriptomics) and oncology. Here, we focus on the challenges of implementing mNGS in the clinical laboratory and address potential solutions for maximizing its impact on patient care and public health

eScholarship - University of California

Identification of direct residue contacts in protein-protein interaction by message passing

Author: Altschuh
Atchley
Bent
Burger
Eddy
Fiedler
Galperin
G bel
H. Szurmant
J. A. Hoch
Kass
Kortemme
Kortemme
Laub
Lockless
M. Weigt
Mascher
Mukhopadhyay
Ninfa
Pruitt
R. A. White
Schmeisser
Szurmant
S el
T. Hwa
Thattai
Toro-Roman
Wells
White
Zapf
Publication venue: 'Proceedings of the National Academy of Sciences'
Publication date: 09/01/2009
Field of study

Understanding the molecular determinants of specificity in protein-protein interaction is an outstanding challenge of postgenome biology. The availability of large protein databases generated from sequences of hundreds of bacterial genomes enables various statistical approaches to this problem. In this context covariance-based methods have been used to identify correlation between amino acid positions in interacting proteins. However, these methods have an important shortcoming, in that they cannot distinguish between directly and indirectly correlated residues. We developed a method that combines covariance analysis with global inference analysis, adopted from use in statistical physics. Applied to a set of >2,500 representatives of the bacterial two-component signal transduction system, the combination of covariance with global inference successfully and robustly identified residue pairs that are proximal in space without resorting to ad hoc tuning parameters, both for heterointeractions between sensor kinase (SK) and response regulator (RR) proteins and for homointeractions between RR proteins. The spectacular success of this approach illustrates the effectiveness of the global inference approach in identifying direct interaction based on sequence information alone. We expect this method to be applicable soon to interaction surfaces between proteins present in only 1 copy per genome as the number of sequenced genomes continues to expand. Use of this method could significantly increase the potential targets for therapeutic intervention, shed light on the mechanism of protein-protein interaction, and establish the foundation for the accurate prediction of interacting protein partners.Comment: Supplementary information available on http://www.pnas.org/content/106/1/67.abstrac

arXiv.org e-Print Archive

Crossref

PubMed Central

AGMIAL: implementing an annotation strategy for prokaryote genomes as a distributed system

Author: Bessières P.
Bossy R.
Bryson K.
Chaillou S.
Gibrat J.-F.
Hoebeke M.
Loux V.
Maguin E.
Nicolas P.
Penaud S.
van de Guchte M.
Publication venue
Publication date: 01/07/2006
Field of study

We have implemented a genome annotation system for prokaryotes called AGMIAL. Our approach embodies a number of key principles. First, expert manual annotators are seen as a critical component of the overall system; user interfaces were cyclically refined to satisfy their needs. Second, the overall process should be orchestrated in terms of a global annotation strategy; this facilitates coordination between a team of annotators and automatic data analysis. Third, the annotation strategy should allow progressive and incremental annotation from a time when only a few draft contigs are available, to when a final finished assembly is produced. The overall architecture employed is modular and extensible, being based on the W3 standard Web services framework. Specialized modules interact with two independent core modules that are used to annotate, respectively, genomic and protein sequences. AGMIAL is currently being used by several INRA laboratories to analyze genomes of bacteria relevant to the food-processing industry, and is distributed under an open source license

UCL Discovery