Termination and Cost Analysis with COSTA and its User Interfaces
COSTA is a static analyzer for Java bytecode that infers cost and termination information for large classes of programs. The analyzer takes as input a program and a resource of interest, in the form of a cost model, and aims to obtain an upper bound on the execution cost with respect to that resource and to prove program termination. The COSTA system has reached a considerable degree of maturity in that (1) it includes state-of-the-art techniques for statically estimating the resource consumption and the termination behavior of programs, plus a number of specialized techniques required for achieving accurate results in the context of object-oriented programs, such as handling numeric fields in value analysis; (2) it provides several nontrivial notions of cost (resource consumption) including, in addition to the number of execution steps, the amount of memory allocated in the heap and the number of calls to some user-specified method; (3) it provides several user interfaces: a classical command line, a Web interface that allows experimenting remotely with the system without installing it locally, and a recently developed Eclipse plugin that facilitates use of the analyzer even during the development phase; and (4) it can deal with both the Standard and Micro editions of Java. In the tool demonstration, we will show that COSTA is able to produce meaningful results for non-trivial programs, possibly using Java libraries. Such results can then be used in many applications, including program development, resource usage certification, and program optimization.
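To make the notion of a cost model concrete, here is a hypothetical illustration of the kind of bound such an analyzer derives (the notation is illustrative, not COSTA's actual output format): for a method whose loop allocates one object per iteration, the heap-consumption cost model yields a recurrence that closes to a linear upper bound.

```latex
% Hypothetical illustration (not COSTA's output syntax): heap-consumption
% cost model for a method that allocates one object of size s on each of
% n loop iterations.
\begin{align*}
  C(0) &= c_0           && \text{cost before entering the loop} \\
  C(n) &= C(n-1) + s    && \text{one allocation per iteration}
\end{align*}
% The analyzer closes the recurrence to the upper bound
% C(n) <= c_0 + s * n, i.e., heap usage linear in the input size n.
```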
A resource-frugal probabilistic dictionary and applications in (meta)genomics
The genomic and metagenomic fields, which generate huge sets of short genomic sequences, have brought their own share of high-performance problems. To extract relevant pieces of information from the huge data sets generated by current sequencing techniques, one must rely on extremely scalable methods and solutions. Indexing billions of objects is often considered too expensive, yet it is a fundamental need in this field. In this paper we propose a straightforward indexing structure that scales to billions of elements, and we propose two direct applications in genomics and metagenomics. We show that our proposal solves problem instances for which no other known solution scales up. We believe that many tools and applications could benefit from either the fundamental data structure we provide or from the applications developed on top of it.
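As an illustration of the general idea only (this is not the paper's data structure; the class and parameters below are invented), a probabilistic dictionary can trade a small, tunable false-positive rate for space by storing only a short fingerprint of each key instead of the key itself:

```cpp
#include <cstdint>
#include <functional>
#include <string>
#include <vector>

// Minimal sketch of a probabilistic dictionary: each key is reduced to
// a slot plus an 8-bit fingerprint, so lookups use a few bits per
// element at the price of occasional false positives. Illustrative
// only; names and parameters are invented for this sketch.
class ProbabilisticDict {
public:
    explicit ProbabilisticDict(std::size_t n_slots)
        : fingerprints_(n_slots, 0), values_(n_slots, 0) {}

    void insert(const std::string& key, uint32_t value) {
        std::size_t h = std::hash<std::string>{}(key);
        std::size_t slot = h % fingerprints_.size();
        fingerprints_[slot] = fingerprint(h);  // short witness of the key
        values_[slot] = value;
    }

    // Returns true and fills `value` if the fingerprint matches; a match
    // can be a false positive with probability of roughly 1/255.
    bool lookup(const std::string& key, uint32_t& value) const {
        std::size_t h = std::hash<std::string>{}(key);
        std::size_t slot = h % fingerprints_.size();
        if (fingerprints_[slot] != fingerprint(h)) return false;
        value = values_[slot];
        return true;
    }

private:
    static uint8_t fingerprint(std::size_t h) {
        uint8_t fp = static_cast<uint8_t>(h >> 56);
        return fp == 0 ? 1 : fp;  // reserve 0 to mean "empty slot"
    }
    std::vector<uint8_t> fingerprints_;
    std::vector<uint32_t> values_;
};
```

A space-frugal variant would replace the modulo placement with a minimal perfect hash function so that every indexed key receives its own slot and the cost per element stays close to the fingerprint size.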
Automation of Determination of Optimal Intra-Compute Node Parallelism
Maximizing the productivity of modern multicore and manycore chips requires optimizing parallelism at the compute node level. This is, however, a complex, iterative, multi-step process that requires determining optimal degrees of parallel scalability and optimizing memory access behavior. Further, multiple cases must be considered: programs that use only MPI or only OpenMP, and hybrid (MPI+OpenMP) programs. This paper presents a set of three coordinated workflows for determining the optimal parallelism at the program level for MPI programs and at the loop level for hybrid (MPI+OpenMP) cases. The paper also details mostly automated implementations of these workflows using the PerfExpert infrastructure. Finally, the paper presents case studies demonstrating both the applicability and the effectiveness of optimizing parallelism at the compute node level. The results shown in the paper will provide valuable information for further advancing the full automation of the workflows. The software implementing the parallelism scalability optimization is open source and available for download.
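The core of the scalability step can be pictured as a simple sweep: time a representative kernel at increasing thread counts and keep the fastest configuration. The sketch below shows that idea for the OpenMP-only case; it is an illustration under assumed conditions, not PerfExpert's implementation.

```cpp
#include <omp.h>
#include <cstdio>
#include <vector>

// Representative kernel to be timed at each thread count.
static double kernel(const std::vector<double>& a) {
    double sum = 0.0;
    #pragma omp parallel for reduction(+ : sum)
    for (long i = 0; i < (long)a.size(); ++i)
        sum += a[i] * a[i];
    return sum;
}

int main() {
    std::vector<double> a(1 << 24, 1.0);
    int best_threads = 1;
    double best_time = 1e30;
    // Sweep powers of two up to the hardware maximum.
    for (int t = 1; t <= omp_get_max_threads(); t *= 2) {
        omp_set_num_threads(t);
        double t0 = omp_get_wtime();
        volatile double s = kernel(a);  // volatile: keep the work
        double dt = omp_get_wtime() - t0;
        if (dt < best_time) { best_time = dt; best_threads = t; }
        std::printf("%2d threads: %.4f s (sum=%g)\n", t, dt, (double)s);
    }
    std::printf("best: %d threads\n", best_threads);
    return 0;
}
```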
Gerbil: A Fast and Memory-Efficient k-mer Counter with GPU-Support
A basic task in bioinformatics is the counting of k-mers in genome strings. The k-mer counting problem is to build a histogram of all substrings of length k in a given genome sequence. We present the open source k-mer counting software Gerbil, which has been designed for the efficient counting of k-mers for large k. Given the technology trend towards long reads of next-generation sequencers, support for large k becomes increasingly important. While existing k-mer counting tools suffer from excessive memory consumption or degrading performance for large k, Gerbil is able to efficiently support large k without much loss of performance. Our software implements a two-disk approach. In the first step, DNA reads are loaded from disk and distributed to temporary files that are stored on a working disk. In a second step, the temporary files are read again, split into k-mers, and counted via a hash table approach. In addition, Gerbil can optionally use GPUs to accelerate the counting step. For large k, we outperform state-of-the-art open source k-mer counting tools on large genome data sets.
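Stripped of the two-disk partitioning and the GPU offload, the counting step itself reduces to histogramming fixed-length substrings. A minimal illustrative sketch (not Gerbil's implementation, which encodes k-mers compactly rather than as strings):

```cpp
#include <cstdint>
#include <iostream>
#include <string>
#include <unordered_map>

// Slide a window of length k over the sequence and count each
// substring in a hash table. This shows the counting step only;
// Gerbil's pipeline adds disk partitioning and optional GPU hashing.
std::unordered_map<std::string, uint64_t> count_kmers(
        const std::string& seq, std::size_t k) {
    std::unordered_map<std::string, uint64_t> counts;
    if (seq.size() < k) return counts;
    for (std::size_t i = 0; i + k <= seq.size(); ++i)
        ++counts[seq.substr(i, k)];
    return counts;
}

int main() {
    for (const auto& [kmer, n] : count_kmers("ACGTACGTAC", 4))
        std::cout << kmer << '\t' << n << '\n';
}
```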
FreePSI: an alignment-free approach to estimating exon-inclusion ratios without a reference transcriptome.
Alternative splicing plays an important role in many cellular processes of eukaryotic organisms. The exon-inclusion ratio, also known as percent spliced in (PSI), is often regarded as one of the most effective measures of alternative splicing events. The existing methods for estimating exon-inclusion ratios at the genome scale all require the existence of a reference transcriptome. In this paper, we propose an alignment-free method, FreePSI, to perform genome-wide estimation of exon-inclusion ratios from RNA-Seq data without relying on the guidance of a reference transcriptome. It uses a novel probabilistic generative model based on k-mer profiles to quantify the exon-inclusion ratios at the genome scale, together with an efficient expectation-maximization algorithm, based on a divide-and-conquer strategy and an ultrafast conjugate gradient projection descent method, to solve the model. We compare FreePSI with the existing methods on simulated and real RNA-Seq data in terms of both accuracy and efficiency, and show that it achieves very good performance even though a reference transcriptome is not provided. Our results suggest that FreePSI may have important applications in performing alternative splicing analysis for organisms that do not have quality reference transcriptomes. FreePSI is implemented in C++ and freely available to the public on GitHub.
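For intuition, the quantity being estimated is the length-normalized ratio of inclusion-isoform abundance to total abundance. The sketch below shows only that final ratio, with made-up counts and lengths; FreePSI's contribution is estimating the abundances themselves from k-mer profiles via EM, which is not reproduced here.

```cpp
#include <iostream>

// Percent spliced in (PSI) from abundances of the inclusion and
// exclusion isoforms, each normalized by its effective length.
// Inputs below are illustrative, not from the paper.
double psi(double inc_count, double inc_len,
           double exc_count, double exc_len) {
    double inc = inc_count / inc_len;  // length-normalized inclusion
    double exc = exc_count / exc_len;  // length-normalized exclusion
    return inc / (inc + exc);
}

int main() {
    // e.g. 300 k-mers supporting inclusion over 200 effective positions,
    // 50 supporting exclusion over 100 effective positions:
    std::cout << "PSI = " << psi(300, 200, 50, 100) << '\n';  // 0.75
}
```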
Recovering complete and draft population genomes from metagenome datasets.
Assembly of metagenomic sequence data into microbial genomes is of fundamental value to improving our understanding of microbial ecology and metabolism, as it elucidates the functional potential of hard-to-culture microorganisms. Here, we provide a synthesis of available methods to bin metagenomic contigs into species-level groups and highlight how genetic diversity, sequencing depth, and coverage influence binning success. Despite the computational cost when applied to deeply sequenced complex metagenomes (e.g., soil), covarying patterns of contig coverage across multiple datasets significantly improve the binning process. We also discuss and compare current genome validation methods and reveal how these methods tackle the problem of chimeric genome bins, i.e., bins containing sequences from multiple species. Finally, we explore how population genome assembly can be used to uncover biogeographic trends and to characterize the effect of in situ functional constraints on genome-wide evolution.
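The covariance signal mentioned above can be made concrete: contigs from the same population genome should have coverage profiles that rise and fall together across samples, so their correlation is high. A minimal sketch of that measurement (illustrative only; real binners combine it with sequence composition and a proper clustering step):

```cpp
#include <cmath>
#include <iostream>
#include <vector>

// Pearson correlation between two coverage profiles of equal length
// (one coverage value per sample; assumes non-constant profiles).
double pearson(const std::vector<double>& x, const std::vector<double>& y) {
    const std::size_t n = x.size();
    double mx = 0, my = 0;
    for (std::size_t i = 0; i < n; ++i) { mx += x[i]; my += y[i]; }
    mx /= n; my /= n;
    double sxy = 0, sxx = 0, syy = 0;
    for (std::size_t i = 0; i < n; ++i) {
        sxy += (x[i] - mx) * (y[i] - my);
        sxx += (x[i] - mx) * (x[i] - mx);
        syy += (y[i] - my) * (y[i] - my);
    }
    return sxy / std::sqrt(sxx * syy);
}

int main() {
    std::vector<double> cov_a{10, 22, 5, 40, 18};  // contig A over 5 samples
    std::vector<double> cov_b{11, 20, 6, 38, 17};  // contig B over 5 samples
    // Two contigs might share a bin when r exceeds, say, 0.9
    // across a dozen or more samples.
    std::cout << "r = " << pearson(cov_a, cov_b) << '\n';  // close to 1
}
```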
An improved instruction-level power model for ARM11 microprocessor
The power and energy consumed by a chip have become the primary design constraint for embedded systems, which has led to a great deal of work on hardware design techniques such as clock gating and power gating. Software also affects the power usage of a chip, so good software design can reduce power further. In this paper we present an instruction-level power model, based on an ARM1176JZF-S processor, to predict the power of software applications. Our model takes substantially less input data than existing high-accuracy models and does not need to consider each instruction individually. We show that power is related to both the distribution of instruction types and the operations per clock cycle (OPC) of the program. Our model does not need to consider the effect of two adjacent instructions, which saves considerable calculation and measurement effort. Pipeline stall effects are also captured through OPC rather than through cache misses alone, because cache misses are only one of many causes of pipeline stalls. The model performs well, with a maximum estimation error of -8.28% and an average absolute estimation error of 4.88% over six benchmarks. Finally, we prove that energy per operation (EPO) decreases with increasing operations per clock cycle, and we confirm this relationship empirically.
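The claimed relationship between EPO and OPC follows from amortizing base power over more operations per cycle. The sketch below uses made-up coefficients (the paper's are fitted to ARM1176JZF-S measurements, not shown in the abstract) to illustrate the shape of such a model and the decreasing EPO trend.

```cpp
#include <cstdio>

// Hypothetical instance of an instruction-level power model: predicted
// power depends on the instruction-type mix and on operations per
// clock cycle (OPC). All coefficients are invented for illustration.
struct Mix { double alu, mem, branch; };  // type fractions, summing to 1

double predict_power_mw(const Mix& m, double opc) {
    const double base  = 120.0;  // static + clock-tree power, mW (made up)
    const double w_alu = 40.0, w_mem = 95.0, w_br = 55.0;  // per-type weights
    return base + opc * (w_alu * m.alu + w_mem * m.mem + w_br * m.branch);
}

int main() {
    const double f_hz = 700e6;      // example ARM11-class clock frequency
    const Mix m{0.60, 0.25, 0.15};  // hypothetical instruction mix
    for (int i = 1; i <= 5; ++i) {
        double opc = 0.2 * i;
        double p_mw = predict_power_mw(m, opc);
        // Energy per operation = power / (operations per second); the
        // base term is spread over more ops as OPC grows, so EPO falls.
        double epo_nj = (p_mw / 1e3) / (opc * f_hz) * 1e9;
        std::printf("OPC %.1f -> %6.1f mW, %.3f nJ/op\n", opc, p_mw, epo_nj);
    }
    return 0;
}
```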