Experimental Investigation of Frequency Chaos Game Representation for In Silico and Accurate Classification of Viral Pathogens from Genomic Sequences
This paper presents an experimental investigation to determine the efficacy and the appropriate order of Frequency Chaos Game Representation (FCGR) for accurate, in silico classification of pathogenic viruses. For this study, we curated genomic sequences of selected viral pathogens from the Virus Pathogen Database and Analysis Resource corpus. The viral genomes were encoded using first- to seventh-order FCGRs to produce training and testing genomic data features. Thereafter, four different kernels of the naïve Bayes classifier were experimentally trained and tested with the generated FCGR genomic features. The third- and fourth-order FCGRs returned the highest average classification accuracy, 98%. However, weighing memory utilization and computational efficiency against classification accuracy, the third-order FCGR is deemed suitable for accurate classification of viral pathogens from genome sequences. This provides a promising foundation for developing a genomics-based diagnostic toolkit that could be used to promptly address the global incidence of epidemics from pathogenic viruses.
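The encoding step described in this abstract can be illustrated with a minimal sketch. It assumes that a k-th order FCGR is the 2^k x 2^k grid of normalized k-mer frequencies arranged by Chaos Game position; the corner assignment and the function name `fcgr` are illustrative assumptions, not the authors' implementation.

```python
# Assumed corner convention, as (x_bit, y_bit); other layouts are also in use.
CORNER = {"A": (0, 0), "C": (0, 1), "G": (1, 1), "T": (1, 0)}

def fcgr(seq, k):
    """Return a 2^k x 2^k grid of normalized k-mer frequencies."""
    size = 2 ** k
    grid = [[0.0] * size for _ in range(size)]
    seq = seq.upper()
    total = 0
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        if any(base not in CORNER for base in kmer):
            continue  # skip windows containing ambiguous symbols such as N
        x = y = 0
        # The last nucleotide selects the largest quadrant, so it is
        # processed first and ends up as the most significant bit.
        for base in reversed(kmer):
            bx, by = CORNER[base]
            x = (x << 1) | bx
            y = (y << 1) | by
        grid[y][x] += 1
        total += 1
    if total:
        grid = [[count / total for count in row] for row in grid]
    return grid

# Flattened feature vector that could be fed to a classifier.
features = [value for row in fcgr("ACGTACGTTGCA", 3) for value in row]
```

Under this assumption the feature vector has 4^k entries, so a third-order FCGR yields 64 features while a seventh-order one yields 16,384, which is consistent with the memory trade-off the abstract mentions.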
Mapping the Space of Genomic Signatures
We propose a computational method to measure and visualize interrelationships
among any number of DNA sequences allowing, for example, the examination of
hundreds or thousands of complete mitochondrial genomes. An "image distance" is
computed for each pair of graphical representations of DNA sequences, and the
distances are visualized as a Molecular Distance Map: Each point on the map
represents a DNA sequence, and the spatial proximity between any two points
reflects the degree of structural similarity between the corresponding
sequences. The graphical representation of DNA sequences utilized, Chaos Game
Representation (CGR), is genome- and species-specific and can thus act as a
genomic signature. Consequently, Molecular Distance Maps could inform species
identification, taxonomic classifications and, to a certain extent,
evolutionary history. The image distance employed, Structural Dissimilarity
Index (DSSIM), implicitly compares the occurrences of oligomers of length up to
(herein ) in DNA sequences. We computed DSSIM distances for more than
5 million pairs of complete mitochondrial genomes, and used Multi-Dimensional
Scaling (MDS) to obtain Molecular Distance Maps that visually display the
sequence relatedness in various subsets, at different taxonomic levels. This
general-purpose method does not require DNA sequence homology and can thus be
used to compare similar or vastly different DNA sequences, genomic or
computer-generated, of the same or different lengths. We illustrate potential
uses of this approach by applying it to several taxonomic subsets: phylum
Vertebrata, (super)kingdom Protista, classes Amphibia-Insecta-Mammalia, class
Amphibia, and order Primates. This analysis of an extensive dataset confirms
that the oligomer composition of full mtDNA sequences can be a source of
taxonomic information.
Comment: 14 pages, 7 figures. arXiv admin note: substantial text overlap with
arXiv:1307.375
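As a rough illustration of the graphical representation this abstract relies on, the sketch below computes Chaos Game Representation points for a DNA sequence. The corner assignment and the name `cgr_points` are assumptions made for the example; the DSSIM image distance and the MDS step are not reproduced here.

```python
# Assumed corner convention on the unit square; other layouts exist.
CORNERS = {"A": (0.0, 0.0), "C": (0.0, 1.0), "G": (1.0, 1.0), "T": (1.0, 0.0)}

def cgr_points(seq):
    """Return the CGR trajectory of seq as a list of (x, y) points."""
    x, y = 0.5, 0.5  # start at the centre of the unit square
    points = []
    for base in seq.upper():
        if base not in CORNERS:
            continue  # ignore ambiguous symbols such as N
        cx, cy = CORNERS[base]
        # Move halfway from the current point toward the base's corner.
        x, y = (x + cx) / 2, (y + cy) / 2
        points.append((x, y))
    return points

points = cgr_points("ACGTTGCA")
```

Plotting these points for a long sequence gives the CGR image; binning them on a 2^k x 2^k grid gives k-mer counts, which is why an image distance between CGRs implicitly compares oligomer occurrences.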
An investigation into inter- and intragenomic variations of graphic genomic signatures
We provide, on an extensive dataset and using several different distances,
confirmation of the hypothesis that CGR patterns are preserved along a genomic
DNA sequence, and are different for DNA sequences originating from genomes of
different species. This finding lends support to the theory that CGRs of
genomic sequences can act as graphic genomic signatures. In particular, we
compare the CGR patterns of over five hundred different 150,000 bp genomic
sequences originating from the genomes of six organisms, each belonging to one
of the kingdoms of life: H. sapiens, S. cerevisiae, A. thaliana, P. falciparum,
E. coli, and P. furiosus. We also provide preliminary evidence of this method's
applicability to closely related species by comparing H. sapiens (chromosome
21) sequences and over one hundred and fifty genomic sequences, also 150,000 bp
long, from P. troglodytes (Animalia; chromosome Y), for a total length of more
than 101 million basepairs analyzed. We compute pairwise distances between CGRs
of these genomic sequences using six different distances, and construct
Molecular Distance Maps that visualize all sequences as points in a
two-dimensional or three-dimensional space, to simultaneously display their
interrelationships. Our analysis confirms that CGR patterns of DNA sequences
from the same genome are in general quantitatively similar, while being
different for DNA sequences from genomes of different species. Our analysis of
the performance of the assessed distances uses three different quality measures
and suggests that several distances outperform the Euclidean distance, which
has so far been almost exclusively used for such studies. In particular we show
that, for this dataset, DSSIM (Structural Dissimilarity Index) and the
descriptor distance (introduced here) are best able to classify genomic
sequences.
Comment: 14 pages, 6 figures, 5 tables
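The pairwise-distance step can be sketched as follows, using the Euclidean distance that the abstract names as the common baseline. The k-mer frequency vector stands in for a binned CGR; the helper names and the choice of k are assumptions for illustration, and DSSIM and the descriptor distance are not implemented here.

```python
from itertools import product
from math import sqrt

def kmer_freqs(seq, k):
    """Normalized k-mer frequency vector over the alphabet ACGT."""
    kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    counts = dict.fromkeys(kmers, 0)
    seq = seq.upper()
    for i in range(len(seq) - k + 1):
        word = seq[i:i + k]
        if word in counts:  # windows with ambiguous symbols are skipped
            counts[word] += 1
    total = sum(counts.values())
    return [counts[w] / total if total else 0.0 for w in kmers]

def euclidean(u, v):
    """Euclidean distance between two equal-length vectors."""
    return sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def distance_matrix(seqs, k=3):
    """Symmetric matrix of pairwise distances between sequences."""
    vecs = [kmer_freqs(s, k) for s in seqs]
    return [[euclidean(u, v) for v in vecs] for u in vecs]

d = distance_matrix(["ACGTACGT", "ACGTACGT", "AAAAAAAA"], k=2)
```

A matrix like `d` is the kind of input that a multi-dimensional scaling routine would take to place each sequence as a point in two or three dimensions.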
Global transposable characteristics in the yeast complete DNA sequence
Global transposable characteristics in the complete DNA sequence of the yeast
Saccharomyces cerevisiae are determined using the metric representation
and recurrence plot methods. In terms of the correlation distance of
nucleotide strings, the 16 chromosome sequences of the yeast, which are divided
into 5 groups, display 4 kinds of fundamental transposable characteristics:
a short period increasing, a long quasi-period increasing, a long major value
and hardly relevant.
Comment: 19 pages, 5 figures, 5 tables
Information profiles for DNA pattern discovery
Finite-context modeling is a powerful tool for compressing and hence for
representing DNA sequences. We describe an algorithm to detect genomic
regularities, within a blind discovery strategy. The algorithm uses information
profiles built using suitable combinations of finite-context models. We used
the genome of the fission yeast Schizosaccharomyces pombe strain 972 h- for
illustration, unveiling locations of low information content, which are
usually associated with DNA regions of potential biological interest.
Comment: Full version of DCC 2014 paper "Information profiles for DNA pattern
discovery"
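To make the idea of an information profile concrete, here is a simplified sketch: a single adaptive finite-context model of a fixed order estimates the probability of each nucleotide given the preceding symbols, and the profile records the per-symbol information content in bits. The single model, the additive smoothing parameter `alpha`, and the function name are assumptions; the paper's method combines several models and is not reproduced here.

```python
from math import log2

def information_profile(seq, order=2, alpha=1.0):
    """Per-symbol information content (bits) under one adaptive
    finite-context model. Assumes seq contains only A, C, G, T."""
    counts = {}  # context string -> {symbol: count seen so far}
    profile = []
    seq = seq.upper()
    for i, sym in enumerate(seq):
        context = seq[max(0, i - order):i]
        table = counts.setdefault(context, {})
        n_sym = table.get(sym, 0)
        n_ctx = sum(table.values())
        # Additive smoothing over the four-letter alphabet.
        p = (n_sym + alpha) / (n_ctx + 4 * alpha)
        profile.append(-log2(p))
        table[sym] = n_sym + 1  # update the model after coding the symbol
    return profile

profile = information_profile("ACGT" * 20, order=2)
```

On a highly repetitive input such as this one, the values fall well below 2 bits per base once the model has seen the repeat, which is the kind of low-information region the abstract refers to.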
A Quantitative Model for Human Olfactory Receptors
Humans can smell a wide variety of chemicals with distinct odors. Odor perception begins in the nose, where odorants are detected by a large family of olfactory receptors (ORs). Based on a divergent evolutionary model, a database of human OR sequences has been proposed by D. Lancet et al. (2000, 2006). Without biological experimental validation, it is practically impossible to infer whether a given nucleotide sequence is a human OR. In our view, a proper quantitative understanding of these ORs is required to confirm or rule out whether a given sequence is a human OR. In this paper, all human OR sequences have been quantified, and a set of clusters has been formed from the quantitative results using two different metrics. Using the proposed quantitative model, one can readily give a probable justification or a deterministic nullification of whether a given nucleotide sequence is a probable human OR homologue, without performing any biological experiment. A further biological experiment remains essential to validate a probable human OR homologue.