15 research outputs found
Parallel approach to sliding window sums
Sliding window sums are widely used in bioinformatics applications, including
sequence assembly, k-mer generation, hashing and compression. New vector
algorithms which utilize the advanced vector extension (AVX) instructions
available on modern processors, or the parallel compute units on GPUs and
FPGAs, would provide a significant performance boost for the bioinformatics
applications. We develop a generic vectorized sliding sum algorithm with
speedup for window size w and number of processors P is O(P/w) for a generic
sliding sum. For a sum with commutative operator the speedup is improved to
O(P/log(w)). When applied to the genomic application of minimizer based k-mer
table generation using AVX instructions, we obtain a speedup of over 5X.Comment: 10 pages, 5 figure
Ensemble-based relationship discovery in relational databases.
We performed an investigation of how several data relationship discovery algorithms can be combined to improve performance. We investigated eight relationship discovery algorithms like Cosine similarity, Soundex similarity, Name similarity, Value range similarity, etc., to identify potential links between database tables in different ways using different categories of database information. We proposed voting system and hierarchical clustering ensemble methods to reduce the generalization error of each algorithm. Voting scheme uses a given weighting metric to combine the predictions of each algorithm. Hierarchical clustering groups predictions into clusters based on similarities and then combine a member from each cluster together. We run experiments to validate the performance of each algorithm and compare performance with our ensemble methods and the state-of-the-art algorithms (FaskFK, Randomness and HoPF) using Precision, Recall and F-Measure evaluation metrics over TPCH and AdvWork datasets. Results show that performance of each algorithm is limited, indicating the importance of combining them to consolidate their strengths