Analyzing large-scale DNA Sequences on Multi-core Architectures
Rapid analysis of DNA sequences is important for preventing the evolution of
different viruses and bacteria at an early stage, for the early diagnosis of
genetic predispositions to certain diseases (cancer, cardiovascular diseases),
and for DNA forensics. However, real-world DNA sequences may comprise several
gigabytes, and DNA analysis demands adequate computational resources to
complete within a reasonable time. In this paper, we present a scalable
approach for parallel DNA analysis that is based on finite automata and is
suitable for analyzing very large DNA segments. We evaluate our
approach for real-world DNA segments of mouse (2.7GB), cat (2.4GB), dog
(2.4GB), chicken (1GB), human (3.2GB) and turkey (0.2GB). Experimental results
on a dual-socket shared-memory system with 24 physical cores show speed-ups of
up to 17.6x. Our approach is up to 3x faster than a pattern-based parallel
approach that uses the RE2 library.
Comment: The 18th IEEE International Conference on Computational Science and
Engineering (CSE 2015), Porto, Portugal, 20-23 October 201
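The abstract does not describe how the automaton is parallelized. One common way to run a finite automaton over a multi-gigabyte sequence is to split the input into chunks, run the same deterministic automaton over each chunk independently, and extend every chunk by m-1 symbols (m = pattern length) so that matches crossing a chunk boundary are counted exactly once. The sketch below illustrates that decomposition for a single exact pattern using a KMP-style DFA; it is an assumption-based illustration, not the authors' implementation, and the names `build_dfa`, `count_matches`, and `parallel_count` are hypothetical. Threads are used only to show the decomposition: CPython's GIL prevents real speed-ups for this pure-Python loop, so a performance-oriented version would use processes or native threads.

```python
from concurrent.futures import ThreadPoolExecutor

ALPHABET = "ACGT"


def build_dfa(pattern):
    """Hypothetical helper: KMP-style DFA for a non-empty pattern over ACGT.

    State j means the longest suffix of the input read so far that is also a
    prefix of the pattern has length j. State len(pattern) signals a complete
    match; its transitions let overlapping matches be found.
    """
    m = len(pattern)
    dfa = [dict.fromkeys(ALPHABET, 0) for _ in range(m + 1)]
    dfa[0][pattern[0]] = 1
    restart = 0  # state reached by reading the pattern with its first symbol dropped
    for j in range(1, m + 1):
        dfa[j] = dict(dfa[restart])  # on mismatch, behave like the restart state
        if j < m:
            dfa[j][pattern[j]] = j + 1  # on match, advance one state
            restart = dfa[restart][pattern[j]]
    return dfa


def count_matches(dfa, text, m):
    """Run the DFA from state 0 and count how often the accepting state is hit."""
    state, hits = 0, 0
    for c in text:
        state = dfa[state].get(c, 0)  # symbols outside ACGT (e.g. N) reset to 0
        if state == m:
            hits += 1
    return hits


def parallel_count(text, pattern, workers=4):
    """Count (overlapping) occurrences of pattern by scanning chunks concurrently."""
    m = len(pattern)
    dfa = build_dfa(pattern)
    size = max(1, -(-len(text) // workers))  # ceiling division
    # Each chunk owns match start positions [s, s + size) and is extended by
    # m - 1 symbols so that matches crossing the boundary are still seen.
    chunks = [text[s:s + size + m - 1] for s in range(0, len(text), size)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return sum(pool.map(lambda chunk: count_matches(dfa, chunk, m), chunks))
```

For example, `parallel_count("AAAA", "AA", workers=2)` counts the three overlapping occurrences of `AA`, one of which straddles the boundary between the two chunks.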
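As a rough reading of the reported speed-up, assuming it is measured against a single-core run of the same code and that p = 24 (one thread per physical core), the parallel efficiency E and the Karp-Flatt experimentally determined serial fraction e can be derived as follows. These values are computed here from the stated figures; they are not reported by the authors.

```latex
E = \frac{S}{p} = \frac{17.6}{24} \approx 0.73,
\qquad
e = \frac{1/S - 1/p}{1 - 1/p}
  = \frac{1/17.6 - 1/24}{1 - 1/24}
  \approx \frac{0.0152}{0.9583}
  \approx 0.016
```

Under these assumptions, the approach reaches roughly 73% of ideal linear scaling, which corresponds to about 1.6% of the work behaving as if it were serial in Amdahl's model.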
DNA Sequence Representation by Use of Statistical Finite Automata
This project defines, and aims to solve, the problem of representing the information carried by DNA sequences in terms of amino acids by applying the theory of finite automata. Sequences can be compared against each other to find existing patterns, if any, which may carry important genetic information. Such a comparison can indicate whether the DNA sequences belong to the same, related, or entirely different species in the ‘Tree of Life’ (phylogeny). This is achieved by using extended and statistical finite automata. To solve this problem, we have used the concepts of automata and their extension, namely the Alergia algorithm. In this specific case, we have used a chemical property of amino acids, their polarity, to analyze the DNA sequences.
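To make the amino-acid representation concrete, the sketch below translates a DNA string codon by codon (reading frame 1 only) with the standard genetic code and then maps each amino acid to a polarity class. The four-class grouping (nonpolar `N`, polar uncharged `P`, basic `+`, acidic `-`, and `*` for stop codons) is an assumed textbook classification; the abstract does not state which classes were used, and sources differ on residues such as glycine, cysteine, and tyrosine. The names `translate`, `polarity_string`, and `POLARITY_GROUPS` are hypothetical.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code: amino acids for the 64 codons enumerated in TCAG order
# (TTT, TTC, TTA, TTG, TCT, ...); '*' marks a stop codon.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(codon): aa
               for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}

# Assumed polarity classes (a common textbook grouping, not taken from the paper).
POLARITY_GROUPS = [
    ("GAVLIMFWP", "N"),  # nonpolar
    ("STCYNQ", "P"),     # polar, uncharged
    ("KRH", "+"),        # basic (positively charged)
    ("DE", "-"),         # acidic (negatively charged)
    ("*", "*"),          # stop codon
]
POLARITY = {aa: cls for group, cls in POLARITY_GROUPS for aa in group}


def translate(dna: str) -> str:
    """Translate reading frame 1; codons containing non-ACGT symbols become 'X'."""
    dna = dna.upper().replace("U", "T")
    return "".join(CODON_TABLE.get(dna[i:i + 3], "X")
                   for i in range(0, len(dna) - 2, 3))


def polarity_string(dna: str) -> str:
    """Map each translated amino acid to its polarity class ('?' if unknown)."""
    return "".join(POLARITY.get(aa, "?") for aa in translate(dna))
```

For example, `polarity_string("ATGGCCAAGTAA")` yields `"NN+*"` (Met, Ala, Lys, stop). A trailing partial codon is ignored, and other reading frames would be handled by slicing the input before translation.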
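Alergia learns a stochastic automaton by building a frequency prefix tree from sample strings and merging states whose outgoing frequencies are statistically indistinguishable, using a Hoeffding bound with a confidence parameter alpha. The sketch below shows only that compatibility test, applied recursively to two prefix-tree nodes; the full merge-and-fold loop of the algorithm is omitted. The node layout, the names `build_pta`, `different`, and `compatible`, and the default alpha = 0.05 are illustrative assumptions, and the sample strings could be any symbol sequences, for instance strings over polarity classes.

```python
import math


def build_pta(samples):
    """Hypothetical frequency prefix tree: each node counts arrivals and terminations."""
    root = {"n": 0, "end": 0, "next": {}}
    for s in samples:
        node = root
        node["n"] += 1
        for sym in s:
            node = node["next"].setdefault(sym, {"n": 0, "end": 0, "next": {}})
            node["n"] += 1
        node["end"] += 1
    return root


def different(f1, n1, f2, n2, alpha):
    """Hoeffding test: True if f1/n1 and f2/n2 differ more than the bound allows."""
    if n1 == 0 or n2 == 0:
        return False  # no evidence either way
    bound = math.sqrt(0.5 * math.log(2.0 / alpha)) * (1 / math.sqrt(n1) + 1 / math.sqrt(n2))
    return abs(f1 / n1 - f2 / n2) > bound


def compatible(a, b, alpha=0.05):
    """Recursively compare termination and transition frequencies of two nodes."""
    if a is None or b is None:
        return True  # a missing subtree provides no evidence against merging
    if different(a["end"], a["n"], b["end"], b["n"], alpha):
        return False
    for sym in set(a["next"]) | set(b["next"]):
        ca, cb = a["next"].get(sym), b["next"].get(sym)
        fa = ca["n"] if ca else 0  # number of strings taking this transition
        fb = cb["n"] if cb else 0
        if different(fa, a["n"], fb, b["n"], alpha):
            return False
        if not compatible(ca, cb, alpha):
            return False
    return True
```

For example, prefix trees built from `["N"] * 90 + ["P"] * 10` and `["N"] * 10 + ["P"] * 90` are reported incompatible: the transition frequencies 0.9 and 0.1 differ by far more than the bound, which is about 0.27 for n = 100 and alpha = 0.05.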
- …