46 research outputs found

    Accelerating the Gillespie τ-Leaping Method Using Graphics Processing Units

    The Gillespie τ-Leaping Method is an approximate algorithm that is faster than the exact Direct Method (DM) because the simulation advances in larger time steps. However, the procedure to compute the time leap τ is quite expensive. In this paper, we explore the acceleration of the τ-Leaping Method using Graphics Processing Units (GPUs) for ultra-large networks ( reaction channels). We have developed data structures and algorithms that take advantage of the unique hardware architecture and available libraries. Our results show a performance gain of over 60x compared with the best conventional implementations.
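
    As a rough illustration of what "advancing in larger time steps" means, here is a minimal CPU sketch of τ-leaping. It is not the paper's GPU implementation: the function names, the fixed leap size `tau`, the step-halving guard against negative populations, and the toy reaction A -> B are all assumptions made for this example. In particular, the expensive τ-selection procedure mentioned in the abstract is skipped entirely.

```python
import math
import random


def poisson_sample(rng, mean):
    """Draw a Poisson variate with Knuth's multiplication method.

    Adequate for the small means used here; exp(-mean) underflows for
    very large means, so this is not a general-purpose sampler.
    """
    if mean <= 0.0:
        return 0
    limit = math.exp(-mean)
    count = 0
    product = rng.random()
    while product > limit:
        count += 1
        product *= rng.random()
    return count


def tau_leap(state, reactions, tau, t_end, seed=0):
    """Advance `state` until at least t_end using leaps of size tau.

    reactions: list of (propensity_function, state_change) pairs.
    Assumption: tau is fixed by the caller instead of being computed
    adaptively from the propensities.
    """
    rng = random.Random(seed)
    state = list(state)
    t = 0.0
    while t < t_end:
        step = tau
        while True:
            candidate = list(state)
            for propensity, change in reactions:
                # Each channel fires a Poisson-distributed number of times.
                firings = poisson_sample(rng, propensity(state) * step)
                for i, delta in enumerate(change):
                    candidate[i] += firings * delta
            if min(candidate) >= 0:
                break
            # Illustrative guard: reject the leap and retry with half the step.
            step /= 2.0
        state = candidate
        t += step
    return state


# Toy system (hypothetical): A -> B with rate constant 0.5.
reactions = [(lambda x: 0.5 * x[0], (-1, +1))]
final_state = tau_leap([100, 0], reactions, tau=0.1, t_end=5.0)
```

    The exact Direct Method would simulate one reaction event at a time; here every leap applies many firings at once, which is where the speed-up and the approximation both come from.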

    Dense Descriptors for Optical Flow Estimation: A Comparative Study

    Estimating the displacements of intensity patterns between sequential frames is a very well-studied problem, usually referred to as optical flow estimation. The first assumption made by many methods in the field is brightness constancy as pixels move between frames. This assumption has been shown not to hold in general, and the use of photometric invariant constraints has therefore been studied in the past. Another solution is to use structural descriptors rather than pixels for estimating the optical flow. Unlike sparse feature detection/description techniques, and because optical flow estimation seeks a dense flow field, a dense structural representation of individual pixels and their neighbors is computed and then used for matching and optical flow estimation. Here, a comparative study is carried out by extending the SIFT-flow framework to include more dense descriptors, and comprehensive comparisons are given. Overall, the work can be considered a baseline for stimulating more interest in the use of dense descriptors for optical flow estimation.

    Dynamic Denoising and Gappy Data Reconstruction Based on Dynamic Mode Decomposition and Discrete Cosine Transform

    Dynamic Mode Decomposition (DMD) is a data-driven method for analyzing dynamics, first applied to fluid dynamics. It extracts modes and their corresponding eigenvalues, where the modes are spatial fields that identify coherent structures in the flow and the eigenvalues describe the temporal growth/decay rates and oscillation frequencies of each mode. The recently introduced compressed sensing DMD (csDMD) reduces computation times and can also deal with sub-sampled datasets. In this paper, we present a similar technique based on the discrete cosine transform to reconstruct the fully sampled dataset (as opposed to the DMD modes, as in csDMD) from sub-sampled, noisy, and gappy data using ℓ1 minimization. The proposed method was benchmarked against csDMD in terms of denoising and gap-filling using three datasets. The first was the 2-D time-resolved plot of a double-gyre oscillator, which has about nine oscillatory modes. The second dataset was derived from a Duffing oscillator; it has several modes associated with complex eigenvalues, which makes them oscillatory. The third dataset was taken from the 2-D simulation of a wake behind a cylinder at Re = 100 and was used to investigate the effect of changing various parameters on the reconstruction error. The Duffing and 2-D wake datasets were tested in the presence of noise and rectangular gaps. While the performance on the double-gyre dataset is comparable to csDMD, the proposed method performs substantially better (lower reconstruction error) on the dataset derived from the Duffing equation and on the 2-D wake dataset, according to the defined reconstruction error metrics.
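
    The statement that DMD eigenvalues encode growth/decay rates and oscillation frequencies can be made concrete with the standard conversion from a discrete-time eigenvalue to a continuous-time exponent. The sketch below assumes snapshots sampled at a uniform interval `dt`; the function name and the sample values are hypothetical and not taken from the paper.

```python
import cmath
import math


def growth_and_frequency(eigenvalue, dt):
    """Convert a discrete-time DMD eigenvalue to (growth rate, frequency).

    Assumes uniformly spaced snapshots with time step dt. The growth rate
    is in 1/time units (negative means decay); the frequency is in cycles
    per time unit. The principal branch of the logarithm is used, so
    frequencies above the Nyquist limit 1 / (2 * dt) alias.
    """
    omega = cmath.log(eigenvalue) / dt  # continuous-time exponent
    return omega.real, omega.imag / (2.0 * math.pi)


# Hypothetical mode: decays at rate 0.1 and oscillates at 1.5 cycles per unit time.
dt = 0.01
lam = cmath.exp((-0.1 + 2j * math.pi * 1.5) * dt)
growth, frequency = growth_and_frequency(lam, dt)
```

    An eigenvalue inside the unit circle gives a negative growth rate (a decaying mode), and a nonzero imaginary part gives an oscillatory mode, which matches the abstract's description of the Duffing dataset.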

    k-NN using sorting.

    Here we illustrate the k-NN search for 3 vectors. The distance matrix is stored along with the column and row indices in a row-major format. First, we sort the entire distance matrix with the distance as the key. The result is then sorted in a stable manner twice: once with the column as the key and, separately, once with the row as the key. We then pick the k closest distances from both the column-sorted and the row-sorted results.
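
    The sorting steps in this caption can be sketched on the CPU as follows. This is only an illustration under stated assumptions: the function name, the dictionary output layout, and the 3 x 3 sample matrix are hypothetical, and Python's built-in stable sort stands in for whatever GPU sorting primitive the authors used. Self-distances are kept, since the caption does not say whether they are excluded.

```python
def knn_by_sorting(dist, k):
    """Return (row_knn, col_knn) for a dense distance matrix.

    row_knn[r] lists the k smallest (distance, column) pairs of row r;
    col_knn[c] lists the k smallest (distance, row) pairs of column c.
    """
    n_rows, n_cols = len(dist), len(dist[0])

    # Flatten in row-major order, keeping the indices next to each distance.
    entries = [(dist[r][c], r, c) for r in range(n_rows) for c in range(n_cols)]

    # Step 1: sort everything with the distance as the key.
    by_distance = sorted(entries, key=lambda e: e[0])

    # Step 2: two separate stable sorts, one keyed on column, one on row.
    # Stability keeps each group ordered by distance.
    by_column = sorted(by_distance, key=lambda e: e[2])
    by_row = sorted(by_distance, key=lambda e: e[1])

    # Step 3: every group is contiguous, so the first k entries of each
    # group are its k nearest neighbours.
    col_knn = {
        c: [(d, r) for d, r, _ in by_column[c * n_rows:c * n_rows + k]]
        for c in range(n_cols)
    }
    row_knn = {
        r: [(d, c) for d, _, c in by_row[r * n_cols:r * n_cols + k]]
        for r in range(n_rows)
    }
    return row_knn, col_knn


# Hypothetical pairwise distances between 3 vectors.
distances = [
    [0.0, 2.0, 5.0],
    [2.0, 0.0, 1.0],
    [5.0, 1.0, 0.0],
]
row_knn, col_knn = knn_by_sorting(distances, k=2)
```

    The stable second pass is what makes this work: it regroups entries by row or column without disturbing the distance order established by the first sort.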

    Processing global k-NNs.

    In this figure node is responsible for calculating the global k-NNs of all vectors that are in . This is done by computing and merging local row k-NNs of . Note that the local row k-NNs w.r.t. have already been calculated when node 4 calculated local block-column k-NNs w.r.t. . The merged results are stored in a heap. The k-NNs w.r.t are cooperatively computed by all nodes. For example, node successively computes and merges k-NNs w.r.t. . It then transmits the results to node , which receives results from other nodes and does a global merge.
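
    The idea of merging local k-NN results into a heap can be sketched with a bounded max-heap, as below. The function name, the `(distance, index)` tuple layout, and the sample candidate lists are assumptions for illustration; the caption does not specify how the authors' heap is organised or how results are exchanged between nodes.

```python
import heapq


def merge_knn(local_results, k):
    """Merge several local candidate lists into the k globally nearest.

    local_results: iterable of lists of (distance, index) pairs, for
    example one list per node or partition. A max-heap of size at most k
    is emulated with heapq by storing negated distances.
    """
    heap = []
    for candidates in local_results:
        for distance, index in candidates:
            if len(heap) < k:
                heapq.heappush(heap, (-distance, index))
            elif distance < -heap[0][0]:
                # Closer than the current worst neighbour: replace it.
                heapq.heapreplace(heap, (-distance, index))
    # Return the survivors in ascending order of distance.
    return sorted((-neg_distance, index) for neg_distance, index in heap)


# Hypothetical local results from three partitions.
local_results = [
    [(0.5, 3), (1.2, 7)],
    [(0.1, 9), (2.0, 4)],
    [(0.9, 1)],
]
global_knn = merge_knn(local_results, k=3)
```

    Because the heap never holds more than k entries, each incoming candidate costs O(log k), regardless of how many partitions contribute results.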

    Summation kernel.

    Calculation of every row of involves and one element of per row. Therefore, each thread loads an element of into a register. These data are reused to compute all rows of . Next, one thread per block reads the corresponding element of into shared memory. Next, each thread reads an element of and adds to it the element of , which is in the register and the into shared memory to generate the corresponding element of .

    Performance benchmarks for multi-GPU execution.

    In this test we used 2 [24] the 2 GPUs (Tesla 2050) were mounted on a single desktop machine. For our implementation, we use 2 nodes in our GPU cluster and opted to use only one GPU per node. The input data had the dimension , and the number of closest neighbors .

    Processing local k-NNs within nodes.

    The sub-problem assigned to a node is finding the row and column k-NNs w.r.t . is divided into partitions. All partitions are processed by GPU . The row k-NNs are processed within GPU memory and the merged results are written to CPU RAM. The column k-NNs are written to CPU RAM. Later, each of the local column k-NNs are merged by a single GPU.