Data Discovery and Anomaly Detection Using Atypicality: Theory

Author: Clayton Yates (572584)
Jason White (146854)
Jennifer Myers (4241683)
Kaixian Yu (2836718)
Karin Vallega (4241680)
Qing-Xiang Sang (3461384)
Publication date: 10/09/2017

A central question in the era of 'big data' is what to do with the enormous amount of information. One possibility is to characterize it through statistics, e.g., averages, or to classify it using machine learning, in order to understand the general structure of the overall data. The perspective in this paper is the opposite, namely that in some applications most of the value in the information lies in the parts that deviate from the average, that are unusual, atypical. We define what we mean by 'atypical' in an axiomatic way as data that can be encoded with fewer bits in itself rather than using the code for the typical data. We show that this definition has good theoretical properties. We then develop an implementation based on universal source coding, and apply this to a number of real world data sets. Comment: 40 pages
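The definition in the abstract (data is atypical when it can be encoded in fewer bits "in itself" than with the code for the typical data) can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and is not taken from the paper: the typical source is modelled as i.i.d. Bernoulli with a known parameter, the self-description uses a simple two-part code (empirical entropy plus a (1/2) log2 n parameter cost), and `overhead_bits` is a hypothetical penalty for signalling that the alternative code is in use.

```python
import math


def typical_code_length(bits, p_typical):
    """Bits needed to encode `bits` with the code matched to an assumed
    typical source, here i.i.d. Bernoulli(p_typical) with 0 < p_typical < 1."""
    ones = sum(bits)
    zeros = len(bits) - ones
    return -(ones * math.log2(p_typical) + zeros * math.log2(1 - p_typical))


def binary_entropy(q):
    """Entropy in bits of a Bernoulli(q) variable."""
    if q in (0.0, 1.0):
        return 0.0
    return -(q * math.log2(q) + (1 - q) * math.log2(1 - q))


def self_code_length(bits, overhead_bits=1.0):
    """Approximate two-part code length for encoding `bits` 'in itself':
    empirical entropy, plus (1/2) log2 n bits for the estimated parameter,
    plus a hypothetical fixed overhead. Requires a non-empty sequence."""
    n = len(bits)
    q = sum(bits) / n
    return n * binary_entropy(q) + 0.5 * math.log2(n) + overhead_bits


def is_atypical(bits, p_typical=0.5, overhead_bits=1.0):
    """True when the self-description is shorter than the typical code."""
    return self_code_length(bits, overhead_bits) < typical_code_length(bits, p_typical)


print(is_atypical([1] * 64))      # constant run against a fair-coin model
print(is_atypical([0, 1] * 32))   # balanced sequence against the same model
```

Under these assumptions a long constant run is flagged, because describing it directly is far cheaper than spending one bit per symbol, while a balanced sequence is not, because the self-description cannot beat the typical code once the parameter cost and overhead are added.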

arXiv.org e-Print Archive

FigShare

Developing and applying heterogeneous phylogenetic models with XRate

Author: A Heger
A Siepel
A Varadarajan
AJ Drummond
B Knudsen
Christos A. Ouzounis
D Ayres
DB Searls
E Birney
G Lunter
GSC Slater
Ian Holmes
IM Meyer
J Felsenstein
J Goecks
J Watts
JS Pedersen
L Stein
M Garber
M Hasegawa
M Kimura
M Zuker
ME Skinner
N Saitou
O Penn
Oscar Westesson
PS Klosterman
RK Bradley
SR Eddy
TH Jukes
WJ Kent
Z Yang
Publication venue: 'Public Library of Science (PLoS)'
Publication date: 16/02/2012

Modeling sequence evolution on phylogenetic trees is a useful technique in computational biology. Especially powerful are models which take account of the heterogeneous nature of sequence evolution according to the "grammar" of the encoded gene features. However, beyond a modest level of model complexity, manual coding of models becomes prohibitively labor-intensive. We demonstrate, via a set of case studies, the new built-in model-prototyping capabilities of XRate (macros and Scheme extensions). These features allow rapid implementation of phylogenetic models which would previously have been far more labor-intensive. XRate's new capabilities for lineage-specific models, ancestral sequence reconstruction, and improved annotation output are also discussed. XRate's flexible model-specification capabilities and computational efficiency make it well-suited to developing and prototyping phylogenetic grammar models. XRate is available as part of the DART software package (http://biowiki.org/DART). Comment: 34 pages, 3 figures, glossary of XRate model terminology

arXiv.org e-Print Archive

Crossref

PubMed Central

FigShare

Electronic transport in DNA

Author: Alberts
Asabaeva
Bakhshi
Barnett
Berlin
Bhalla
Bixon
Boon
Braun
Bruinsma
Cuniberti
Daniels
Daphne Klotsa
Davies
Dekker
Delaney
Dyson
Eilmes
Endres
Fink
Frahm
Garzon
Kelley
Kramer
Ladik
Lewis
MacKinnon
Matthew S. Turner
Murphy
Nakao
Ndawana
Okahata
O’Neil
Pablo
Peng
Peyrard
Pichard
Plyushchay
Porath
Rakitin
Retel
Roche
Rudolf A. Römer
Römer
Treadway
Wan
Wang
Wesolowski
Ye
Yu
Zhang
Zhong
Publication venue: 'Elsevier BV'
Publication date: 04/04/2005

We study the electronic properties of DNA by way of a tight-binding model applied to four particular DNA sequences. The charge transfer properties are presented in terms of localization lengths (crudely speaking, the length over which electrons travel). Various types of disorder, including random potentials, are employed to account for different real environments. We have performed calculations on poly(dG)-poly(dC), telomeric-DNA, random-ATGC DNA, and λ-DNA. We find that random and λ-DNA have localization lengths allowing for electron motion among a few dozen basepairs only. A novel enhancement of localization lengths is observed at particular energies with increasing binary backbone disorder. We comment on the possible biological relevance of sequence-dependent charge transfer in DNA.
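As a rough illustration of how a localization length can be extracted from a tight-binding model, the sketch below applies the standard transfer-matrix recursion to a single disordered chain and takes the inverse of the estimated Lyapunov exponent. This is a deliberately simplified stand-in, not the model used in the paper: the single-chain geometry, the uniform random onsite energies, and all parameter names and values (`energy`, `disorder`, `hopping`, `n_sites`, `seed`) are assumptions chosen for the example.

```python
import math
import random


def localization_length(energy, disorder, n_sites=20000, hopping=1.0, seed=0):
    """Estimate the localization length (in lattice sites) of a 1D
    tight-binding chain with onsite energies drawn uniformly from
    [-disorder/2, disorder/2].

    Uses the recursion psi[n+1] = ((E - eps_n) / t) * psi[n] - psi[n-1]
    and renormalizes at every step, accumulating the log of the growth.
    Intended for disorder > 0; without disorder the growth rate can be
    zero and the inverse is undefined.
    """
    rng = random.Random(seed)
    psi_prev, psi = 0.0, 1.0
    log_growth = 0.0
    for _ in range(n_sites):
        onsite = rng.uniform(-disorder / 2, disorder / 2)
        psi_next = (energy - onsite) / hopping * psi - psi_prev
        psi_prev, psi = psi, psi_next
        norm = math.hypot(psi_prev, psi)
        log_growth += math.log(norm)
        psi_prev /= norm
        psi /= norm
    lyapunov = log_growth / n_sites  # average exponential growth rate per site
    return 1.0 / lyapunov


weak = localization_length(energy=0.0, disorder=1.0)
strong = localization_length(energy=0.0, disorder=4.0)
print(f"weak disorder:   {weak:.1f} sites")
print(f"strong disorder: {strong:.1f} sites")
```

In this toy setting stronger disorder should yield a shorter localization length, which is the qualitative behaviour the abstract's phrase "the length over which electrons travel" refers to.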

arXiv.org e-Print Archive

Elsevier - Publisher Connector

Crossref

Warwick Research Archives Portal Repository

Hidden Markov Models for Gene Sequence Classification: Classifying the VSG genes in the Trypanosoma brucei Genome

Author: Alvarez-Valin Fernando
Basterrech Sebastián
Guerberoff Gustavo
Mesa Andrea
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 21/10/2015

The article presents an application of Hidden Markov Models (HMMs) for pattern recognition on genome sequences. We apply HMMs to identify genes encoding the Variant Surface Glycoprotein (VSG) in the genomes of Trypanosoma brucei (T. brucei) and other African trypanosomes. These parasitic protozoa are the causative agents of sleeping sickness and several diseases in domestic and wild animals. These parasites have a peculiar strategy to evade the host's immune system that consists of periodically changing their predominant cellular surface protein (VSG). The motivation for using pattern recognition methods to identify these genes, instead of traditional homology-based ones, is that the levels of sequence identity (amino acid and DNA sequence) amongst these genes are often below what is considered reliable in these methods. Among pattern recognition approaches, HMMs are particularly suitable for this problem because they handle the determination of gene edges more naturally. We evaluate the performance of the model using different numbers of states in the Markov model, as well as several performance metrics. The model is applied using public genomic data. Our empirical results show that the VSG genes of T. brucei can be safely identified (high sensitivity and low rate of false positives) using HMMs. Comment: Accepted article in July, 2015 in Pattern Analysis and Applications, Springer. The article contains 23 pages, 4 figures, 8 tables and 51 references
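The remark that HMMs handle gene edges naturally can be made concrete with a small Viterbi decoding sketch: the most likely state path labels each base, and the positions where the label changes mark the inferred boundaries. The two-state layout and every probability below are hypothetical values invented for this example; they are not the model, the number of states, or the parameters used in the article.

```python
import math

# Hypothetical two-state model: all numbers are illustrative assumptions.
STATES = ("background", "gene")
START = {"background": 0.9, "gene": 0.1}
TRANS = {
    "background": {"background": 0.9, "gene": 0.1},
    "gene": {"background": 0.1, "gene": 0.9},
}
EMIT = {
    "background": {"A": 0.25, "C": 0.25, "G": 0.25, "T": 0.25},
    "gene": {"A": 0.1, "C": 0.4, "G": 0.4, "T": 0.1},
}


def viterbi(seq):
    """Return the most likely state path for a non-empty DNA string,
    computed in log space to avoid numerical underflow."""
    scores = {s: math.log(START[s]) + math.log(EMIT[s][seq[0]]) for s in STATES}
    backpointers = []
    for symbol in seq[1:]:
        new_scores, pointer = {}, {}
        for s in STATES:
            # Best predecessor state for landing in state s at this position.
            best_prev = max(STATES, key=lambda r: scores[r] + math.log(TRANS[r][s]))
            new_scores[s] = (scores[best_prev]
                             + math.log(TRANS[best_prev][s])
                             + math.log(EMIT[s][symbol]))
            pointer[s] = best_prev
        backpointers.append(pointer)
        scores = new_scores
    # Trace back from the best final state.
    state = max(STATES, key=lambda s: scores[s])
    path = [state]
    for pointer in reversed(backpointers):
        state = pointer[state]
        path.append(state)
    path.reverse()
    return path


print(viterbi("ATATATGCGCGCGCGCATATAT"))
```

With these toy parameters a GC-rich stretch tends to be labelled "gene" and AT-rich flanks "background"; a realistic classifier would estimate the parameters from annotated sequences rather than fix them by hand.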

arXiv.org e-Print Archive

DSpace at VSB Technical University of Ostrava