Automating Genomic Data Mining via a Sequence-based Matrix Format and Associative Rule Set

Authors: BFJ Manly; CI Castillo-Davis; David Johnson; DB Searls; DD Womble; E Badidi; F Antequera; J Krueger; J Theilhaber; JD Wren; JF Costello; JM Claverie; Jonathan D Wren; JR Quinlan; K Davies; K Nakai; L Stein; Le Gruenwald; LV Zhang; M Ashburner; M Gardiner-Garden; M Safran; P Clark; RS Michalski; S Foissac; S Muggleton; SP Shah; TV Venkatesh; V Bajic; W Frawley; WM Shui; Y Liu

Publication date: 1 January 2005
Publisher: BioMed Central

Abstract

There is an enormous amount of information encoded in each genome, enough to create living, responsive and adaptive organisms. Raw sequence data alone is not enough to understand function, mechanisms or interactions. A change in a single base pair can lead to disease, such as sickle-cell anemia, while some large megabase deletions have no apparent phenotypic effect. Genomic features vary in their data types, and annotation of these features is spread across multiple databases. Herein, we develop a method to automate exploration of genomes by iteratively searching sequence data for correlations and building upon them. First, to integrate and compare different annotation sources, a sequence matrix (SM) is developed to contain position-dependent information. Second, a classification tree is developed for matrix row types, specifying how each data type is to be treated with respect to other data types for analysis purposes. Third, correlative analyses are developed to analyze features of each matrix row in terms of the other rows, guided by the classification tree as to which analyses are appropriate. A prototype was developed and successfully detected coinciding genomic features among genes, exons, repetitive elements and CpG islands.
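The three steps in the abstract can be illustrated with a minimal sketch. It is not the prototype described in the paper: the abstract does not specify the row types, the shape of the classification tree, or the analyses used, so the two row kinds (`binary`, `numeric`), the flat dispatch table standing in for the tree, and every function name below are assumptions made for illustration only.

```python
from dataclasses import dataclass
from math import sqrt


@dataclass
class Row:
    """One row of a hypothetical sequence matrix: one value per base position."""
    name: str
    kind: str      # assumed row types: "binary" (feature present/absent) or "numeric"
    values: list


def interval_row(name, length, intervals):
    """Build a binary row from half-open (start, end) annotation intervals."""
    values = [0] * length
    for start, end in intervals:
        for i in range(start, end):
            values[i] = 1
    return Row(name, "binary", values)


def jaccard(a, b):
    """Overlap of two binary rows: shared positions / positions covered by either."""
    both = sum(1 for x, y in zip(a, b) if x and y)
    either = sum(1 for x, y in zip(a, b) if x or y)
    return both / either if either else 0.0


def pearson(a, b):
    """Pearson correlation of two numeric rows (0.0 if either row is constant)."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        return 0.0
    return cov / sqrt(var_a * var_b)


def mean_difference(flags, numbers):
    """Mean of a numeric row inside a binary feature minus its mean outside."""
    inside = [v for f, v in zip(flags, numbers) if f]
    outside = [v for f, v in zip(flags, numbers) if not f]
    if not inside or not outside:
        return 0.0
    return sum(inside) / len(inside) - sum(outside) / len(outside)


# Stand-in for the classification tree: which analysis suits which pair of row types.
ANALYSES = {
    ("binary", "binary"): jaccard,
    ("numeric", "numeric"): pearson,
    ("binary", "numeric"): mean_difference,
}


def compare(r1, r2):
    """Pick the analysis by row type and return (analysis name, score)."""
    if (r1.kind, r2.kind) == ("numeric", "binary"):
        r1, r2 = r2, r1  # normalise order so the binary row comes first
    func = ANALYSES[(r1.kind, r2.kind)]
    return func.__name__, func(r1.values, r2.values)


# Toy 20-bp region with made-up annotations (not data from the paper).
genes = interval_row("genes", 20, [(2, 10)])
cpg = interval_row("CpG islands", 20, [(0, 4)])
gc = Row("GC content", "numeric", [0.8] * 4 + [0.4] * 16)

for first, second in [(genes, cpg), (cpg, gc)]:
    print(first.name, "vs", second.name, compare(first, second))
```

Storing one value per position keeps every annotation source on the same coordinate axis, so any two rows can be compared directly. A real implementation over whole chromosomes would likely need a sparse or interval-based representation instead of dense lists.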
