Using Growing Self-Organising Maps to Improve the Binning Process in Environmental Whole-Genome Shotgun Sequencing
Metagenomic projects using whole-genome shotgun (WGS)
sequencing produce many unassembled DNA sequences and small
contigs. The step of clustering these sequences, based on
biological and molecular features, is called binning. A
reported strategy for binning that combines oligonucleotide
frequency and self-organising maps (SOM) shows high potential.
We improve this strategy by identifying suitable training
features, implementing a better clustering algorithm, and
defining quantitative measures for assessing results. We
investigated the suitability of each of di-, tri-, tetra-, and
pentanucleotide frequencies. The results show that
dinucleotide frequency is not a sufficiently strong signature
for binning 10 kb long DNA sequences, compared to the other
three. Furthermore, we observed that increased order of
oligonucleotide frequency may deteriorate the assignment
result in some cases, which indicates the possible existence
of an optimal species-specific oligonucleotide frequency. We
replaced the SOM with a growing self-organising map (GSOM), which
gives comparable results while gaining a 7%–15% speed
improvement.
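
As an illustration of the training features discussed above, the following is a minimal sketch of how an oligonucleotide frequency vector might be computed for a single DNA fragment. It is not the authors' implementation: the function name `oligo_frequencies`, the restriction to the alphabet ACGT, the skipping of windows containing ambiguous bases, and the normalisation by the number of counted windows are all assumptions made for this example.

```python
from itertools import product


def oligo_frequencies(seq, k):
    """Return normalised k-mer frequencies of seq as a list.

    Hypothetical helper for illustration only. The vector has 4**k
    entries, ordered lexicographically over the alphabet ACGT
    (AA, AC, AG, AT, CA, ... for k = 2).
    """
    seq = seq.upper()
    kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    counts = dict.fromkeys(kmers, 0)
    total = 0
    # Slide a window of width k over the sequence, one base at a time.
    for i in range(len(seq) - k + 1):
        window = seq[i:i + k]
        # Assumption: windows with ambiguous bases (e.g. N) are skipped.
        if window in counts:
            counts[window] += 1
            total += 1
    if total == 0:
        return [0.0] * len(kmers)
    # Normalise so that fragments of different length are comparable.
    return [counts[m] / total for m in kmers]


# Di-, tri-, tetra-, and pentanucleotide vectors for one toy fragment.
fragment = "ACGTACGTTGCA" * 50
vectors = {k: oligo_frequencies(fragment, k) for k in (2, 3, 4, 5)}
```

The vector length grows as 4^k (16, 64, 256, and 1024 entries for k = 2 to 5), so a higher-order frequency gives a richer but sparser signature for a fragment of fixed length. Vectors of this kind would then serve as input to a SOM or GSOM.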