MC64: A web platform to test bioinformatics algorithms in a many-core architecture
New analytical methodologies, like so-called "next-generation sequencing" (NGS), allow the sequencing of full genomes at high speed and reduced price. Yet, such technologies generate huge amounts of data that demand large raw computational power. Many-core technologies can be exploited to overcome the resulting bioinformatics bottleneck; indeed, such hardware is currently in active development. We have developed parallel bioinformatics algorithms for many-core microprocessors containing 64 cores each. Thus, the MC64 web platform allows the execution of high-performance alignments (Needleman-Wunsch, Smith-Waterman and ClustalW) of long sequences. The MC64 platform can be accessed via web browsers, allowing easy resource integration into third-party tools. Furthermore, the results obtained from the MC64 include time-performance statistics that can be compared with other platforms.
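The pairwise aligners named above are classic dynamic programs. As a point of reference, a minimal sequential sketch of the Needleman-Wunsch global alignment score is shown below; the scoring parameters (match +1, mismatch −1, gap −1) are illustrative defaults, not the MC64 platform's actual settings, and the sketch omits the parallel decomposition across 64 cores.

```python
# Minimal Needleman-Wunsch global alignment score (dynamic programming).
# Scoring parameters here are illustrative, not MC64's actual settings.
def needleman_wunsch(a, b, match=1, mismatch=-1, gap=-1):
    n, m = len(a), len(b)
    # dp[i][j] = best score aligning a[:i] with b[:j]
    dp = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        dp[i][0] = i * gap
    for j in range(1, m + 1):
        dp[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            s = match if a[i - 1] == b[j - 1] else mismatch
            dp[i][j] = max(dp[i - 1][j - 1] + s,   # align a[i-1] with b[j-1]
                           dp[i - 1][j] + gap,     # gap in b
                           dp[i][j - 1] + gap)     # gap in a
    return dp[n][m]

print(needleman_wunsch("GATTACA", "GCATGCU"))  # 0
```

The O(nm) table is what makes long-sequence alignment memory- and compute-hungry, and why offloading it to a many-core back end via a web front end is attractive.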
From the zero-field metal-insulator transition in two dimensions to the quantum Hall transition: a percolation-effective-medium theory
Effective-medium theory is applied to the percolation description of the metal-insulator transition in two dimensions, with emphasis on the continuous connection between the zero-magnetic-field transition and the quantum Hall transition. In this model the system consists of puddles connected via saddle points, and there is loss of quantum coherence inside the puddles. The effective conductance of the network is calculated using appropriate integration over the distribution of conductances, leading to a determination of the magnetic-field dependence of the critical density. Excellent quantitative agreement is obtained with the experimental data, which allows an estimate of the puddle physical parameters.
A Faster Algorithm for Two-Variable Integer Programming
We show that a 2-variable integer program, defined by constraints involving coefficients with at most bits, can be solved with arithmetic operations on rational numbers of size~. This result closes the gap between the running time of two-variable integer programming and the sum of the running times of the Euclidean algorithm on -bit integers and of checking feasibility of an integer point for ~constraints.
An Information Theoretic Lower Bound for the Longest Common Subsequence Problem
Tech Report. We derive a lower bound on the number of "less-than, equal, greater-than" comparisons required to solve the longest common subsequence (LCS) problem. National Science Foundation.
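For context on what such a lower bound constrains: the textbook O(nm) dynamic program solves LCS using nm three-way element comparisons, and the bound applies to any comparison-based method. A sketch (function name and the explicit comparison counter are illustrative):

```python
# Standard O(nm) dynamic program for LCS length, instrumented to count
# the three-way element comparisons it performs. The report's lower
# bound applies to any algorithm restricted to such comparisons.
def lcs_length(a, b):
    n, m = len(a), len(b)
    comparisons = 0
    # dp[i][j] = LCS length of a[:i] and b[:j]
    dp = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            comparisons += 1
            if a[i - 1] == b[j - 1]:
                dp[i][j] = dp[i - 1][j - 1] + 1
            else:
                dp[i][j] = max(dp[i - 1][j], dp[i][j - 1])
    return dp[n][m], comparisons
```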
A Lower Worst-Case Complexity for Searching a Dictionary
Tech Report. It is shown that k(p+3)/2 + p-2 letter comparisons suffice to determine whether a word is a member of a lexicographically ordered dictionary containing 2<sup>p</sup>-1 words of length k. This offers a potential savings (compared with the worst-case complexity of binary search) that asymptotically approaches 50 percent. National Science Foundation.
On the Complexity of Vector Searching
Tech Report. The vector searching problem is, given a k-vector A (a k-vector is a vector with k components over the integers) and a set B of n distinct k-vectors, to determine whether or not A is a member of B. Comparisons between components yielding "greater-than, equal, less-than" results are permitted. It is shown that if the vectors in B are unordered, then nk comparisons are necessary and sufficient. In the case when the vectors in B are ordered, it is shown that [log n] + k comparisons are necessary and, for n≥4k, k[log(n/k)] + 2k-1 comparisons are sufficient.
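The baseline the report improves upon is plain binary search with lexicographic comparisons, which can cost about k·log n component comparisons in the worst case. A sketch of that baseline, instrumented to count comparisons (the report's refined k[log(n/k)] + 2k-1 scheme is not reproduced here):

```python
# Membership test for k-vector A in a lexicographically ordered list B
# of distinct k-vectors, via binary search with component-by-component
# comparisons. This is the naive baseline (~ k*log n comparisons worst
# case), not the report's refined scheme.
def member(A, B):
    comparisons = 0
    lo, hi = 0, len(B) - 1
    while lo <= hi:
        mid = (lo + hi) // 2
        cmp = 0  # lexicographic comparison of A with B[mid]
        for x, y in zip(A, B[mid]):
            comparisons += 1
            if x != y:
                cmp = -1 if x < y else 1
                break
        if cmp == 0:
            return True, comparisons
        elif cmp < 0:
            hi = mid - 1
        else:
            lo = mid + 1
    return False, comparisons
```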
A Parallel Graph Algorithm for Finding Connected Components
Tech Report. A parallel program is presented that determines the connected components of an undirected graph in time O(log<sup>2</sup>n) using n<sup>2</sup> processors. It is assumed that the processors have access to a common memory. Simultaneous access to the same location is permitted for fetch, but not store, instructions.
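To give a feel for the synchronous-rounds style of such algorithms, here is a sequential simulation of the simpler min-label-propagation idea: in each "round", every vertex adopts the smallest label in its closed neighborhood. This naive scheme needs a number of rounds proportional to the graph diameter; the report's CREW PRAM algorithm achieves O(log<sup>2</sup>n) rounds through more involved techniques, which are not reproduced here.

```python
# Sequential simulation of synchronous min-label propagation for
# connected components. Each while-iteration models one parallel round
# in which every vertex takes the minimum label among itself and its
# neighbors. Converges in O(diameter) rounds -- a naive stand-in for
# the report's O(log^2 n) CREW PRAM algorithm.
def connected_components(n, edges):
    label = list(range(n))      # each vertex starts in its own component
    changed = True
    while changed:
        changed = False
        new = label[:]          # double-buffer: all updates read old labels
        for u, v in edges:
            if label[v] < new[u]:
                new[u] = label[v]
                changed = True
            if label[u] < new[v]:
                new[v] = label[u]
                changed = True
        label = new
    return label                # vertices in one component share a label
```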
The time complexity of decision tree induction
Various factors affecting decision tree learning time are explored. The factors which consistently affect accuracy are those which directly or indirectly (as in the handling of continuous attributes) allow a greater number and variety of potential trees to be explored. Other factors, such as pruning and choice of heuristics, generally have little effect on accuracy, but significantly affect learning time. We prove that the time complexity of induction and post-processing is exponential in tree height in the worst case and, under fairly general conditions, in the average case. This puts a premium on designs which tend to produce shallower trees (e.g., multi-way rather than binary splits, and heuristics which prefer more balanced splits). Simple pruning is linear in tree height, in contrast to the exponential growth of more complex operations. The key factor influencing whether simple pruning will suffice is that the split selection and pruning heuristics should be the same and unbiased. The information gain and χ<sup>2</sup> heuristics are biased towards unbalanced splits, and neither is an admissible test for pruning. Empirical results show that the hypergeometric function can be used for both split selection and pruning, and that the resulting trees are simpler, more quickly learned, and no less accurate than the trees resulting from other heuristics and more complex post-processing.
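The information gain heuristic discussed above can be sketched for a binary split on two-class data; the function names and the (positives, negatives) count representation are illustrative, and this shows only the heuristic itself, not the paper's bias analysis or its hypergeometric alternative.

```python
import math

# Information gain of a binary split on two-class data, the heuristic
# the paper argues is biased toward unbalanced splits. Counts are
# (positives, negatives) for the parent node and for the left branch;
# the right branch gets the remainder.
def entropy(p, n):
    total = p + n
    h = 0.0
    for c in (p, n):
        if c:
            q = c / total
            h -= q * math.log2(q)
    return h

def info_gain(parent, left):
    p, n = parent
    lp, ln = left
    rp, rn = p - lp, n - ln
    total = p + n
    # expected entropy after the split, weighted by branch size
    remainder = ((lp + ln) / total) * entropy(lp, ln) \
              + ((rp + rn) / total) * entropy(rp, rn)
    return entropy(p, n) - remainder
```

A perfectly separating split of a balanced node, e.g. `info_gain((5, 5), (5, 0))`, yields the maximum gain of 1 bit.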