48,873 research outputs found
Modern Approaches to Exact Diagonalization and Selected Configuration Interaction with the Adaptive Sampling CI Method.
Recent advances in selected configuration interaction methods have made them competitive with the most accurate techniques available and, hence, creating an increasingly powerful tool for solving quantum Hamiltonians. In this work, we build on recent advances from the adaptive sampling configuration interaction (ASCI) algorithm. We show that a useful paradigm for generating efficient selected CI/exact diagonalization algorithms is driven by fast sorting algorithms, much in the same way iterative diagonalization is based on the paradigm of matrix vector multiplication. We present several new algorithms for all parts of performing a selected CI, which includes new ASCI search, dynamic bit masking, fast orbital rotations, fast diagonal matrix elements, and residue arrays. The ASCI search algorithm can be used in several different modes, which includes an integral driven search and a coefficient driven search. The algorithms presented here are fast and scalable, and we find that because they are built on fast sorting algorithms they are more efficient than all other approaches we considered. After introducing these techniques, we present ASCI results applied to a large range of systems and basis sets to demonstrate the types of simulations that can be practically treated at the full-CI level with modern methods and hardware, presenting double- and triple-Ī¶ benchmark data for the G1 data set. The largest of these calculations is Si2H6 which is a simulation of 34 electrons in 152 orbitals. We also present some preliminary results for fast deterministic perturbation theory simulations that use hash functions to maintain high efficiency for treating large basis sets
Simple, compact and robust approximate string dictionary
This paper is concerned with practical implementations of approximate string
dictionaries that allow edit errors. In this problem, we have as input a
dictionary of strings of total length over an alphabet of size
. Given a bound and a pattern of length , a query has to
return all the strings of the dictionary which are at edit distance at most
from , where the edit distance between two strings and is defined as
the minimum-cost sequence of edit operations that transform into . The
cost of a sequence of operations is defined as the sum of the costs of the
operations involved in the sequence. In this paper, we assume that each of
these operations has unit cost and consider only three operations: deletion of
one character, insertion of one character and substitution of a character by
another. We present a practical implementation of the data structure we
recently proposed and which works only for one error. We extend the scheme to
. Our implementation has many desirable properties: it has a very
fast and space-efficient building algorithm. The dictionary data structure is
compact and has fast and robust query time. Finally our data structure is
simple to implement as it only uses basic techniques from the literature,
mainly hashing (linear probing and hash signatures) and succinct data
structures (bitvectors supporting rank queries).Comment: Accepted to a journal (19 pages, 2 figures
Optimum Search Schemes for Approximate String Matching Using Bidirectional FM-Index
Finding approximate occurrences of a pattern in a text using a full-text
index is a central problem in bioinformatics and has been extensively
researched. Bidirectional indices have opened new possibilities in this regard
allowing the search to start from anywhere within the pattern and extend in
both directions. In particular, use of search schemes (partitioning the pattern
and searching the pieces in certain orders with given bounds on errors) can
yield significant speed-ups. However, finding optimal search schemes is a
difficult combinatorial optimization problem.
Here for the first time, we propose a mixed integer program (MIP) capable to
solve this optimization problem for Hamming distance with given number of
pieces. Our experiments show that the optimal search schemes found by our MIP
significantly improve the performance of search in bidirectional FM-index upon
previous ad-hoc solutions. For example, approximate matching of 101-bp Illumina
reads (with two errors) becomes 35 times faster than standard backtracking.
Moreover, despite being performed purely in the index, the running time of
search using our optimal schemes (for up to two errors) is comparable to the
best state-of-the-art aligners, which benefit from combining search in index
with in-text verification using dynamic programming. As a result, we anticipate
a full-fledged aligner that employs an intelligent combination of search in the
bidirectional FM-index using our optimal search schemes and in-text
verification using dynamic programming outperforms today's best aligners. The
development of such an aligner, called FAMOUS (Fast Approximate string Matching
using OptimUm search Schemes), is ongoing as our future work
Exploration of Reaction Pathways and Chemical Transformation Networks
For the investigation of chemical reaction networks, the identification of
all relevant intermediates and elementary reactions is mandatory. Many
algorithmic approaches exist that perform explorations efficiently and
automatedly. These approaches differ in their application range, the level of
completeness of the exploration, as well as the amount of heuristics and human
intervention required. Here, we describe and compare the different approaches
based on these criteria. Future directions leveraging the strengths of chemical
heuristics, human interaction, and physical rigor are discussed.Comment: 48 pages, 4 figure
Gene specific co-regulation discovery: an improved approach
[Abstract]: Discovering gene co-regulatory relationships is a new but important research problem in DNA microarray data analysis. The problem of gene specific co-regulation discovery is to, for a particular gene of interest, called the target gene, identify its strongly co-regulated genes and the condition subsets where such strong gene co-regulations are observed. The study on this problem can contribute to a better understanding and characterization of the target gene. The existing method, using the genetic algorithm (GA), is slow due to its expensive fitness evaluation and long individual representation. In this paper, we propose an improved method for finding gene specific co-regulations. Compared with the current method, our method features a notably improved effciency. We employ kNN Search Table to substantially speed up fitness evaluation in the GA. We also propose a more compact representation scheme for encoding individuals in the GA, which contributes to faster crossover and mutation operations. Experimental results with a real-life gene mi-croarray data set demonstrate the improved effciency of our technique compared with the current method
- ā¦