Termination and Cost Analysis with COSTA and its User Interfaces
COSTA is a static analyzer for Java bytecode that infers cost and termination information for large classes of programs. The analyzer takes as input a program and a resource of interest, in the form of a cost model, and aims to obtain an upper bound on the execution cost with respect to that resource and to prove program termination. The COSTA system has reached a considerable degree of maturity in that (1) it includes state-of-the-art techniques for statically estimating the resource consumption and the termination behavior of programs, plus a number of specialized techniques required for achieving accurate results in the context of object-oriented programs, such as handling numeric fields in value analysis; (2) it provides several nontrivial notions of cost (resource consumption) including, in addition to the number of execution steps, the amount of memory allocated in the heap and the number of calls to some user-specified method; (3) it provides several user interfaces: a classical command line, a Web interface that allows experimenting remotely with the system without installing it locally, and a recently developed Eclipse plugin that facilitates use of the analyzer even during the development phase; and (4) it can deal with both the Standard and Micro editions of Java. In the tool demonstration, we will show that COSTA is able to produce meaningful results for non-trivial programs, possibly using Java libraries. Such results can then be used in many applications, including program development, resource usage certification, and program optimization.
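To make the notion of a cost model concrete, here is a hypothetical illustration of the kind of bound such an analyzer derives (the notation is illustrative, not COSTA's actual output format): for a method whose loop allocates one object per iteration, the heap-consumption cost model yields a recurrence that closes to a linear upper bound.

```latex
% Hypothetical illustration (not COSTA's output syntax): heap-consumption
% cost model for a method that allocates one object of size s on each of
% n loop iterations.
\begin{align*}
  C(0) &= c_0           && \text{cost before entering the loop} \\
  C(n) &= C(n-1) + s    && \text{one allocation per iteration}
\end{align*}
% The analyzer closes the recurrence to the upper bound
% C(n) <= c_0 + s * n, i.e., heap usage linear in the input size n.
```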
A resource-frugal probabilistic dictionary and applications in (meta)genomics
The genomic and metagenomic fields, which generate huge sets of short genomic sequences, have brought their own share of high-performance problems. To extract relevant pieces of information from the huge data sets generated by current sequencing techniques, one must rely on extremely scalable methods and solutions. Indexing billions of objects is often considered too expensive, yet it is a fundamental need in this field. In this paper we propose a straightforward indexing structure that scales to billions of elements, and we propose two direct applications in genomics and metagenomics. We show that our proposal solves problem instances for which no other known solution scales up. We believe that many tools and applications could benefit from either the fundamental data structure we provide or from the applications developed on top of it.
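As an illustration of the general idea only (this is not the paper's data structure; the class and parameters below are invented), a probabilistic dictionary can trade a small, tunable false-positive rate for space by storing only a short fingerprint of each key instead of the key itself:

```cpp
#include <cstdint>
#include <functional>
#include <string>
#include <vector>

// Minimal sketch of a probabilistic dictionary: each key is reduced to
// a slot plus an 8-bit fingerprint, so lookups use a few bits per
// element at the price of occasional false positives. Illustrative
// only; names and parameters are invented for this sketch.
class ProbabilisticDict {
public:
    explicit ProbabilisticDict(std::size_t n_slots)
        : fingerprints_(n_slots, 0), values_(n_slots, 0) {}

    void insert(const std::string& key, uint32_t value) {
        std::size_t h = std::hash<std::string>{}(key);
        std::size_t slot = h % fingerprints_.size();
        fingerprints_[slot] = fingerprint(h);  // short witness of the key
        values_[slot] = value;
    }

    // Returns true and fills `value` if the fingerprint matches; a match
    // can be a false positive with probability of roughly 1/255.
    bool lookup(const std::string& key, uint32_t& value) const {
        std::size_t h = std::hash<std::string>{}(key);
        std::size_t slot = h % fingerprints_.size();
        if (fingerprints_[slot] != fingerprint(h)) return false;
        value = values_[slot];
        return true;
    }

private:
    static uint8_t fingerprint(std::size_t h) {
        uint8_t fp = static_cast<uint8_t>(h >> 56);
        return fp == 0 ? 1 : fp;  // reserve 0 to mean "empty slot"
    }
    std::vector<uint8_t> fingerprints_;
    std::vector<uint32_t> values_;
};
```

A space-frugal variant would replace the modulo placement with a minimal perfect hash function so that every indexed key receives its own slot and the cost per element stays close to the fingerprint size.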
Automation of Determination of Optimal Intra-Compute Node Parallelism
Maximizing the productivity of modern multicore and manycore chips requires optimizing parallelism at the compute node level. This is, however, a complex, iterative, multi-step process that requires determining optimal degrees of parallel scalability and optimizing memory access behavior. Further, multiple cases must be considered: programs that use only MPI or only OpenMP, and hybrid (MPI+OpenMP) programs. This paper presents a set of three coordinated workflows for determining the optimal parallelism at the program level for MPI programs and at the loop level for hybrid (MPI+OpenMP) cases. The paper also details mostly automated implementations of these workflows using the PerfExpert infrastructure. Finally, the paper presents case studies demonstrating both the applicability and the effectiveness of optimizing parallelism at the compute node level. The results shown in the paper will provide valuable information for further advancing the full automation of the workflows. The software implementing the parallelism scalability optimization is open source and available for download.
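The core of the scalability step can be pictured as a simple sweep: time a representative kernel at increasing thread counts and keep the fastest configuration. The sketch below shows that idea for the OpenMP-only case; it is an illustration under assumed conditions, not PerfExpert's implementation.

```cpp
#include <omp.h>
#include <cstdio>
#include <vector>

// Representative kernel to be timed at each thread count.
static double kernel(const std::vector<double>& a) {
    double sum = 0.0;
    #pragma omp parallel for reduction(+ : sum)
    for (long i = 0; i < (long)a.size(); ++i)
        sum += a[i] * a[i];
    return sum;
}

int main() {
    std::vector<double> a(1 << 24, 1.0);
    int best_threads = 1;
    double best_time = 1e30;
    // Sweep powers of two up to the hardware maximum.
    for (int t = 1; t <= omp_get_max_threads(); t *= 2) {
        omp_set_num_threads(t);
        double t0 = omp_get_wtime();
        volatile double s = kernel(a);  // volatile: keep the work
        double dt = omp_get_wtime() - t0;
        if (dt < best_time) { best_time = dt; best_threads = t; }
        std::printf("%2d threads: %.4f s (sum=%g)\n", t, dt, (double)s);
    }
    std::printf("best: %d threads\n", best_threads);
    return 0;
}
```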
Gerbil: A Fast and Memory-Efficient k-mer Counter with GPU-Support
A basic task in bioinformatics is the counting of k-mers in genome strings. The k-mer counting problem is to build a histogram of all substrings of length k in a given genome sequence. We present the open source k-mer counting software Gerbil, which has been designed for the efficient counting of k-mers for large k. Given the technology trend towards long reads of next-generation sequencers, support for large k becomes increasingly important. While existing k-mer counting tools suffer from excessive memory consumption or degrading performance for large k, Gerbil is able to efficiently support large k without much loss of performance. Our software implements a two-disk approach. In the first step, DNA reads are loaded from disk and distributed to temporary files that are stored on a working disk. In a second step, the temporary files are read again, split into k-mers, and counted via a hash table approach. In addition, Gerbil can optionally use GPUs to accelerate the counting step. For large k, we outperform state-of-the-art open source k-mer counting tools on large genome data sets.
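Stripped of the two-disk partitioning and the GPU offload, the counting step itself reduces to histogramming fixed-length substrings. A minimal illustrative sketch (not Gerbil's implementation, which encodes k-mers compactly rather than as strings):

```cpp
#include <cstdint>
#include <iostream>
#include <string>
#include <unordered_map>

// Slide a window of length k over the sequence and count each
// substring in a hash table. This shows the counting step only;
// Gerbil's pipeline adds disk partitioning and optional GPU hashing.
std::unordered_map<std::string, uint64_t> count_kmers(
        const std::string& seq, std::size_t k) {
    std::unordered_map<std::string, uint64_t> counts;
    if (seq.size() < k) return counts;
    for (std::size_t i = 0; i + k <= seq.size(); ++i)
        ++counts[seq.substr(i, k)];
    return counts;
}

int main() {
    for (const auto& [kmer, n] : count_kmers("ACGTACGTAC", 4))
        std::cout << kmer << '\t' << n << '\n';
}
```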
FreePSI: an alignment-free approach to estimating exon-inclusion ratios without a reference transcriptome.
Alternative splicing plays an important role in many cellular processes of eukaryotic organisms. The exon-inclusion ratio, also known as percent spliced in (PSI), is often regarded as one of the most effective measures of alternative splicing events. The existing methods for estimating exon-inclusion ratios at the genome scale all require the existence of a reference transcriptome. In this paper, we propose an alignment-free method, FreePSI, to perform genome-wide estimation of exon-inclusion ratios from RNA-Seq data without relying on the guidance of a reference transcriptome. It uses a novel probabilistic generative model based on k-mer profiles to quantify the exon-inclusion ratios at the genome scale, together with an efficient expectation-maximization algorithm, based on a divide-and-conquer strategy and an ultrafast conjugate gradient projection descent method, to solve the model. We compare FreePSI with the existing methods on simulated and real RNA-Seq data in terms of both accuracy and efficiency, and show that it achieves very good performance even though a reference transcriptome is not provided. Our results suggest that FreePSI may have important applications in performing alternative splicing analysis for organisms that do not have quality reference transcriptomes. FreePSI is implemented in C++ and freely available to the public on GitHub.
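For intuition, the quantity being estimated is the length-normalized ratio of inclusion-isoform abundance to total abundance. The sketch below shows only that final ratio, with made-up counts and lengths; FreePSI's contribution is estimating the abundances themselves from k-mer profiles via EM, which is not reproduced here.

```cpp
#include <iostream>

// Percent spliced in (PSI) from abundances of the inclusion and
// exclusion isoforms, each normalized by its effective length.
// Inputs below are illustrative, not from the paper.
double psi(double inc_count, double inc_len,
           double exc_count, double exc_len) {
    double inc = inc_count / inc_len;  // length-normalized inclusion
    double exc = exc_count / exc_len;  // length-normalized exclusion
    return inc / (inc + exc);
}

int main() {
    // e.g. 300 k-mers supporting inclusion over 200 effective positions,
    // 50 supporting exclusion over 100 effective positions:
    std::cout << "PSI = " << psi(300, 200, 50, 100) << '\n';  // 0.75
}
```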
Recovering complete and draft population genomes from metagenome datasets.
Assembly of metagenomic sequence data into microbial genomes is of fundamental value to improving our understanding of microbial ecology and metabolism, as it elucidates the functional potential of hard-to-culture microorganisms. Here, we provide a synthesis of available methods to bin metagenomic contigs into species-level groups and highlight how genetic diversity, sequencing depth, and coverage influence binning success. Despite the computational cost when applied to deeply sequenced complex metagenomes (e.g., soil), covarying patterns of contig coverage across multiple datasets significantly improve the binning process. We also discuss and compare current genome validation methods and reveal how these methods tackle the problem of chimeric genome bins, i.e., bins containing sequences from multiple species. Finally, we explore how population genome assembly can be used to uncover biogeographic trends and to characterize the effect of in situ functional constraints on genome-wide evolution.
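The covariance signal mentioned above can be made concrete: contigs from the same population genome should have coverage profiles that rise and fall together across samples, so their correlation is high. A minimal sketch of that measurement (illustrative only; real binners combine it with sequence composition and a proper clustering step):

```cpp
#include <cmath>
#include <iostream>
#include <vector>

// Pearson correlation between two coverage profiles of equal length
// (one coverage value per sample; assumes non-constant profiles).
double pearson(const std::vector<double>& x, const std::vector<double>& y) {
    const std::size_t n = x.size();
    double mx = 0, my = 0;
    for (std::size_t i = 0; i < n; ++i) { mx += x[i]; my += y[i]; }
    mx /= n; my /= n;
    double sxy = 0, sxx = 0, syy = 0;
    for (std::size_t i = 0; i < n; ++i) {
        sxy += (x[i] - mx) * (y[i] - my);
        sxx += (x[i] - mx) * (x[i] - mx);
        syy += (y[i] - my) * (y[i] - my);
    }
    return sxy / std::sqrt(sxx * syy);
}

int main() {
    std::vector<double> cov_a{10, 22, 5, 40, 18};  // contig A over 5 samples
    std::vector<double> cov_b{11, 20, 6, 38, 17};  // contig B over 5 samples
    // Two contigs might share a bin when r exceeds, say, 0.9
    // across a dozen or more samples.
    std::cout << "r = " << pearson(cov_a, cov_b) << '\n';  // close to 1
}
```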
An improved instruction-level power model for ARM11 microprocessor
The power and energy consumed by a chip have become the primary design constraint for embedded systems, which has led to a great deal of work on hardware design techniques such as clock gating and power gating. Software also affects the power usage of a chip, so good software design can reduce power further. In this paper we present an instruction-level power model, based on an ARM1176JZF-S processor, to predict the power of software applications. Our model takes substantially less input data than existing high-accuracy models and does not need to consider each instruction individually. We show that power is related to both the distribution of instruction types and the operations per clock cycle (OPC) of the program. Our model does not need to consider the effect of two adjacent instructions, which saves considerable calculation and measurement effort. Pipeline stall effects are also captured through OPC rather than through cache misses alone, because cache misses are only one of many causes of pipeline stalls. The model performs well, with a maximum estimation error of -8.28% and an average absolute estimation error of 4.88% over six benchmarks. Finally, we prove that energy per operation (EPO) decreases with increasing operations per clock cycle, and we confirm this relationship empirically.
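The claimed relationship between EPO and OPC follows from amortizing base power over more operations per cycle. The sketch below uses made-up coefficients (the paper's are fitted to ARM1176JZF-S measurements, not shown in the abstract) to illustrate the shape of such a model and the decreasing EPO trend.

```cpp
#include <cstdio>

// Hypothetical instance of an instruction-level power model: predicted
// power depends on the instruction-type mix and on operations per
// clock cycle (OPC). All coefficients are invented for illustration.
struct Mix { double alu, mem, branch; };  // type fractions, summing to 1

double predict_power_mw(const Mix& m, double opc) {
    const double base  = 120.0;  // static + clock-tree power, mW (made up)
    const double w_alu = 40.0, w_mem = 95.0, w_br = 55.0;  // per-type weights
    return base + opc * (w_alu * m.alu + w_mem * m.mem + w_br * m.branch);
}

int main() {
    const double f_hz = 700e6;      // example ARM11-class clock frequency
    const Mix m{0.60, 0.25, 0.15};  // hypothetical instruction mix
    for (int i = 1; i <= 5; ++i) {
        double opc = 0.2 * i;
        double p_mw = predict_power_mw(m, opc);
        // Energy per operation = power / (operations per second); the
        // base term is spread over more ops as OPC grows, so EPO falls.
        double epo_nj = (p_mw / 1e3) / (opc * f_hz) * 1e9;
        std::printf("OPC %.1f -> %6.1f mW, %.3f nJ/op\n", opc, p_mw, epo_nj);
    }
    return 0;
}
```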