Whole Genome Phylogenetic Tree Reconstruction Using Colored de Bruijn Graphs
We present kleuren, a novel assembly-free method to reconstruct phylogenetic
trees using the Colored de Bruijn Graph. kleuren works by constructing the
Colored de Bruijn Graph and then traversing it, finding bubble structures in
the graph that provide phylogenetic signal. The bubbles are then aligned and
concatenated to form a supermatrix, from which a phylogenetic tree is inferred.
We introduce the algorithms that kleuren uses to accomplish this task, and show
its performance on reconstructing the phylogenetic tree of 12 Drosophila
species. kleuren reconstructed the established phylogenetic tree accurately,
and is a viable tool for phylogenetic tree reconstruction using whole genome
sequences. Software package available at: https://github.com/Colelyman/kleuren
Comment: 6 pages, 3 figures, accepted at BIBE 2017. Minor modifications to the
text due to reviewer feedback and fixed typo
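The bubble-finding idea in the abstract above can be illustrated with a toy sketch. This is not kleuren's implementation: the function names (build_cdbg, successors, walk, find_bubbles), the three sequences, the species labels, and the choice k=4 are all assumptions made for illustration. The graph is kept as a plain dictionary from k-mer to the set of genomes ("colors") that contain it, and a bubble is taken to be a set of per-color paths that leave one k-mer shared by every color and rejoin at another.

```python
from collections import defaultdict


def build_cdbg(genomes, k):
    """Map each k-mer to the set of genome names (colors) that contain it."""
    colors = defaultdict(set)
    for name, seq in genomes.items():
        for i in range(len(seq) - k + 1):
            colors[seq[i:i + k]].add(name)
    return colors


def successors(kmer, colors):
    """K-mers present in the graph that overlap kmer by k-1 bases."""
    return [kmer[1:] + b for b in "ACGT" if kmer[1:] + b in colors]


def walk(start, color, colors, all_names, max_len):
    """Follow one color from start until a k-mer shared by all colors is hit.

    Returns (end_kmer, path_string), or None if the path branches,
    dead-ends, or exceeds max_len steps.
    """
    kmer, path = start, start
    for _ in range(max_len):
        nxt = [s for s in successors(kmer, colors) if color in colors[s]]
        if len(nxt) != 1:
            return None
        kmer = nxt[0]
        path += kmer[-1]
        if colors[kmer] == all_names:
            return kmer, path
    return None


def find_bubbles(colors, all_names, max_len=50):
    """Collect bubbles: per-color paths that share a start and an end k-mer."""
    bubbles = []
    for start, cols in colors.items():
        # A bubble starts at a k-mer shared by every color that branches.
        if cols != all_names or len(successors(start, colors)) < 2:
            continue
        walks = {c: walk(start, c, colors, all_names, max_len) for c in all_names}
        if any(w is None for w in walks.values()):
            continue
        if len({w[0] for w in walks.values()}) == 1:
            bubbles.append({c: w[1] for c, w in walks.items()})
    return bubbles


# Hypothetical toy genomes: sp2 differs from sp1 and sp3 at one position.
genomes = {
    "sp1": "ACGTACCTGA",
    "sp2": "ACGTGCCTGA",
    "sp3": "ACGTACCTGA",
}
colors = build_cdbg(genomes, k=4)
bubbles = find_bubbles(colors, set(genomes))

# Concatenate bubble paths per species to form a (toy) supermatrix.
supermatrix = {name: "".join(b[name] for b in bubbles) for name in genomes}
```

In this toy case the paths have equal length, so simple concatenation already yields aligned columns; the abstract states that the bubbles are aligned before concatenation, which paths of unequal length would require.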
Alternatives to relational databases in precision medicine: comparison of NOSQL approaches for big data storage using supercomputers
Improvements in medical and genomic technologies have dramatically increased the production of electronic data over the last decade. As a result, data management is rapidly becoming a major determinant, and urgent challenge, for the development of Precision Medicine. Although successful data management is achievable using Relational Database Management Systems (RDBMS), exponential data growth is a significant contributor to failure scenarios. Growing amounts of data can also be observed in other sectors, such as economics and business, which, together with the previous facts, suggests that alternate database approaches (NoSQL) may soon be required for efficient storage and management of big databases. However, this hypothesis has been difficult to test in the Precision Medicine field since alternate database architectures are complex to assess and means to integrate heterogeneous electronic health records (EHR) with dynamic genomic data are not easily available.
In this dissertation, we present a novel set of experiments for identifying NoSQL database approaches that enable effective data storage and management in Precision Medicine, using patients’ clinical and genomic information from The Cancer Genome Atlas (TCGA). The first experiment measures performance and scalability for biologically meaningful queries of differing complexity across database sizes. The second experiment measures performance and scalability in database updates without schema changes. The third experiment assesses performance and scalability in database updates with schema modifications due to dynamic data. We have identified two NoSQL approaches, based on Cassandra and Redis, which seem to be the ideal database management systems for our precision medicine queries in terms of performance and scalability. We present NoSQL approaches and show how they can be used to manage clinical and genomic big data. Our research is relevant to public health since we focus on one of the main challenges to the development of Precision Medicine and, consequently, investigate a potential solution to the progressively increasing demands on health care.
LIPIcs, Volume 244, ESA 2022, Complete Volume
Advances in knowledge discovery and data mining Part II
19th Pacific-Asia Conference, PAKDD 2015, Ho Chi Minh City, Vietnam, May 19-22, 2015, Proceedings, Part II