Search CORE

240,998 research outputs found

Bio++: a set of C++ libraries for sequence analysis, phylogenetics, molecular evolution and population genetics

Author: Bazin Eric
Belkhir Khalid
Dutheil Julien
Gaillard Sylvain
Galtier Nicolas
Glémin Sylvain
Ranwez Vincent
Publication venue: BioMed Central
Publication date: 01/01/2006

BACKGROUND: A large number of bioinformatics applications in the fields of bio-sequence analysis, molecular evolution and population genetics typically share input/output methods, data storage requirements and data analysis algorithms. Such common features may be conveniently bundled into re-usable libraries, which enable the rapid development of new methods and robust applications. RESULTS: We present Bio++, a set of object-oriented libraries written in C++. Available components include classes for data storage and handling (nucleotide/amino-acid/codon sequences, trees, distance matrices, population genetics datasets), various input/output formats, basic sequence manipulation (concatenation, transcription, translation, etc.), phylogenetic analysis (maximum parsimony, Markov models, distance methods, likelihood computation and maximization), population genetics/genomics (diversity statistics, neutrality tests, various multi-locus analyses) and various algorithms for numerical calculus. CONCLUSION: The implementation of methods aims to be both efficient and user-friendly. Special attention was given to the library design to enable easy extension and the development of new methods. We defined a general hierarchy of classes that allows developers to implement their own algorithms while remaining compatible with the rest of the libraries. Bio++ source code is distributed free of charge under the CeCILL general public licence from its website
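
The basic sequence manipulations named in the abstract (concatenation, transcription, translation) can be illustrated with a minimal Python sketch. This is not the Bio++ API, which is C++; the function names are hypothetical and the codon table is deliberately partial, so unknown codons are reported as X.

```python
# Illustrative only: plain Python, not the Bio++ C++ API.
# Partial standard genetic code (RNA codons); a complete table has 64 entries.
CODON_TABLE = {
    "AUG": "M", "UUU": "F", "UUC": "F", "GGU": "G", "GGC": "G",
    "GCU": "A", "AAA": "K", "UAA": "*", "UAG": "*", "UGA": "*",
}


def concatenate(*seqs):
    """Join sequences end to end."""
    return "".join(seqs)


def transcribe(dna):
    """Coding-strand DNA to mRNA: replace T with U."""
    return dna.upper().replace("T", "U")


def translate(rna):
    """Translate codon by codon, stopping at the first stop codon.

    Codons missing from the partial table are reported as 'X'.
    A trailing incomplete codon is ignored.
    """
    protein = []
    usable = len(rna) - len(rna) % 3
    for i in range(0, usable, 3):
        amino_acid = CODON_TABLE.get(rna[i:i + 3], "X")
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)
```

For example, `translate(transcribe(concatenate("ATG", "AAA", "TAG")))` transcribes the joined sequence to `AUGAAAUAG` and stops at the final stop codon.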

Springer - Publisher Connector

Directory of Open Access Journals

The Longhorn Array Database (LAD): An Open-Source, MIAME compliant implementation of the Stanford Microarray Database (SMD)

Author: Iyer Vishwanath R
Killion Patrick J
Sherlock Gavin
Publication venue: BioMed Central
Publication date: 20/08/2003

BACKGROUND: The power of microarray analysis can be realized only if data is systematically archived and linked to biological annotations as well as analysis algorithms. DESCRIPTION: The Longhorn Array Database (LAD) is a MIAME compliant microarray database that operates on PostgreSQL and Linux. It is a fully open source version of the Stanford Microarray Database (SMD), one of the largest microarray databases. LAD is available at CONCLUSIONS: Our development of LAD provides a simple, free, open, reliable and proven solution for storage and analysis of two-color microarray data

Springer - Publisher Connector

PubMed Central

GROMACS: High performance molecular simulations through multi-level parallelism from laptops to supercomputers

Author: Abraham Mark James
Hess Berk
Lindahl Erik
Murtola Teemu
Páll Szilárd
Schulz Roland
Smith Jeremy C.
Publication venue: The Authors. Published by Elsevier B.V.
Publication date: 01/01/2015

GROMACS is one of the most widely used open-source and free software codes in chemistry, used primarily for dynamical simulations of biomolecules. It provides a rich set of calculation types, preparation and analysis tools. Several advanced techniques for free-energy calculations are supported. In version 5, it reaches new performance heights through several new and enhanced parallelization algorithms. These work on every level: SIMD registers inside cores, multithreading, heterogeneous CPU–GPU acceleration, state-of-the-art 3D domain decomposition, and ensemble-level parallelization through built-in replica exchange and the separate Copernicus framework. The latest best-in-class compressed trajectory storage format is supported

Publikationer från KTH

Elsevier - Publisher Connector

Directory of Open Access Journals

Digitala Vetenskapliga Arkivet - Academic Archive On-line

Global Pattern Search at Scale

Author: Carrington Robert
Crouser R. Jordan
Edwards Lauren
Ferme Elizabeth
Hook Daniel
Kelley Stephen
Michel Elizabeth
Miller Benjamin
Milosavljevic Maja
Reuther Albert I
Schmidt Matthew C.
Publication venue: Smith ScholarWorks
Publication date: 14/04/2015

In recent years, data collection has far outpaced the tools for data analysis in the area of non-traditional GEOINT analysis. Traditional tools are designed to analyze small-scale numerical data, but there are few good interactive tools for processing large amounts of unstructured data such as raw text. In addition to the complexities of data processing, presenting the data in a way that is meaningful to the end user poses another challenge. In our work, we focused on analyzing a corpus of 35,000 news articles and creating an interactive geovisualization tool to reveal patterns to human analysts. Our comprehensive tool, Global Pattern Search at Scale (GPSS), addresses three major problems in data analysis: free text analysis, high volumes of data, and interactive visualization. GPSS uses an Accumulo database for high-volume data storage, and a matrix of word counts and event detection algorithms to process the free text. For visualization, the tool displays an interactive web application to the user, featuring a map overlaid with document clusters and events, search and filtering options, a timeline, and a word cloud. In addition, the GPSS tool can be easily adapted to process and understand other large free-text datasets
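
The "matrix of word counts" mentioned in the abstract can be pictured with a small Python sketch. This is an assumption about the general shape of such a matrix (one row per document, one column per vocabulary word), not the GPSS implementation; the tokenizer and the function name are hypothetical, and neither the Accumulo storage nor the event detection algorithms are shown.

```python
import re
from collections import Counter


def word_count_matrix(documents):
    """Build a document-by-word count matrix from raw text.

    Returns (vocabulary, matrix), where matrix[d][w] is the number of
    times vocabulary[w] occurs in documents[d].
    """
    # Hypothetical tokenizer: lowercase runs of letters and apostrophes.
    counts = [Counter(re.findall(r"[a-z']+", doc.lower())) for doc in documents]
    vocabulary = sorted(set().union(*counts))
    matrix = [[c[word] for word in vocabulary] for c in counts]
    return vocabulary, matrix
```

Each row of the matrix can then be compared across documents or across time windows; how GPSS does so is not described in the abstract.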

Smith College: Smith ScholarWorks

Finite size scaling approach to dynamic storage allocation problem

Author: Denning
Fisher
Hamed Seyed-allaei
Nielsen
Robson
Publication venue: Elsevier BV
Publication date: 02/07/2003

It is demonstrated how dynamic storage allocation algorithms can be analyzed in terms of finite size scaling. The method is illustrated in the three simple cases of the first-fit, next-fit and best-fit algorithms, where the system works at full capacity. The analysis is done from two different points of view: running speed and employed memory. In both cases, and for all algorithms, it is shown that a simple scaling function exists and the relevant exponents are calculated. The method can be applied to similar problems as well. Comment: 9 pages, 4 figures, will appear in Physica
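
For readers unfamiliar with the three allocation policies named in the abstract, the following Python sketch shows how each one picks a free block ("hole") for a request. It uses the textbook definitions and assumes the free memory is represented simply as a list of hole sizes; the function names are hypothetical, and the paper's scaling analysis is not reproduced.

```python
def first_fit(holes, size):
    """Index of the first hole that is large enough, or None."""
    for i, hole in enumerate(holes):
        if hole >= size:
            return i
    return None


def next_fit(holes, size, start):
    """Like first-fit, but scan cyclically from index `start`
    (typically where the previous allocation ended)."""
    n = len(holes)
    for k in range(n):
        i = (start + k) % n
        if holes[i] >= size:
            return i
    return None


def best_fit(holes, size):
    """Index of the smallest hole that is large enough, or None."""
    candidates = [(hole, i) for i, hole in enumerate(holes) if hole >= size]
    return min(candidates)[1] if candidates else None
```

With holes of sizes `[5, 12, 3, 8, 20]` and a request of size 7, first-fit picks the hole of size 12, best-fit picks the hole of size 8, and next-fit started at the last index picks the hole of size 20.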

arXiv.org e-Print Archive

Crossref

Boosting Multi-Core Reachability Performance with Shared Hash Tables

Author: Laarman Alfons
van de Pol Jaco
Weber Michael
Publication date: 01/01/2010

This paper focuses on data structures for multi-core reachability, which is a key component in model checking algorithms and other verification methods. A cornerstone of an efficient solution is the storage of visited states. In related work, static partitioning of the state space was combined with thread-local storage and resulted in reasonable speedups, but left open whether improvements are possible. In this paper, we present a scaling solution for shared state storage which is based on a lockless hash table implementation. The solution is specifically designed for the cache architecture of modern CPUs. Because model checking algorithms impose loose requirements on the hash table operations, their design can be streamlined substantially compared to related work on lockless hash tables. Still, an implementation of the hash table presented here has dozens of sensitive performance parameters (bucket size, cache line size, data layout, probing sequence, etc.). We analyzed their impact and compared the resulting speedups with related tools. Our implementation outperforms two state-of-the-art multi-core model checkers (SPIN and DiVinE) by a substantial margin, while placing fewer constraints on the load balancing and search algorithms. Comment: preliminary report
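
The role of visited-state storage in reachability can be sketched in Python as an open-addressing hash set with linear probing and a single combined lookup-and-insert operation. This sketch is single-threaded and therefore not lockless: the class and method names, the linear probing sequence, and the fixed capacity are assumptions for illustration, and a real shared table would replace the marked slot write with an atomic compare-and-swap.

```python
class VisitedSet:
    """Fixed-capacity open-addressing set (sequential sketch).

    None marks an empty slot, so None cannot be stored as a state.
    """

    def __init__(self, capacity):
        self.slots = [None] * capacity

    def find_or_put(self, state):
        """Return True if `state` was already present, False if it was inserted."""
        n = len(self.slots)
        start = hash(state) % n
        for k in range(n):  # linear probing sequence (an assumption)
            i = (start + k) % n
            if self.slots[i] is None:
                # A lockless version would claim this slot with an atomic
                # compare-and-swap instead of a plain write.
                self.slots[i] = state
                return False
            if self.slots[i] == state:
                return True
        raise MemoryError("visited set is full")


def count_reachable(initial, successors, capacity=1024):
    """Count the states reachable from `initial` via `successors(state)`."""
    visited = VisitedSet(capacity)
    visited.find_or_put(initial)
    frontier = [initial]
    count = 1
    while frontier:
        state = frontier.pop()
        for nxt in successors(state):
            if not visited.find_or_put(nxt):  # newly seen state
                frontier.append(nxt)
                count += 1
    return count
```

Because the search only ever asks "have I seen this state, and if not, remember it", one operation suffices and deletion or resizing is never needed, which is one way to read the abstract's remark about loose requirements on the hash table operations.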

arXiv.org e-Print Archive

CiteSeerX

University of Twente Research Information

Catching the head, tail, and everything in between: a streaming algorithm for the degree distribution

Author: McGregor Andrew
Seshadhri C.
Simpson Olivia
Publication date: 25/11/2015

The degree distribution is one of the most fundamental graph properties of interest for real-world graphs. It has been widely observed in numerous domains that graphs typically have a tailed or scale-free degree distribution. While the average degree is usually quite small, the variance is quite high and there are vertices with degrees at all scales. We focus on the problem of approximating the degree distribution of a large streaming graph, with small storage. We design an algorithm headtail, whose main novelty is a new estimator of infrequent degrees using truncated geometric random variables. We give a mathematical analysis of headtail and show that it has excellent behavior in practice. We can process streams with millions of edges with storage less than 1% and get extremely accurate approximations for all scales in the degree distribution. We also introduce a new notion of Relative Hausdorff distance between tailed histograms. Existing notions of distances between distributions are not suitable, since they ignore infrequent degrees in the tail. The Relative Hausdorff distance measures deviations at all scales, and is a more suitable distance for comparing degree distributions. By tracking this new measure, we are able to give strong empirical evidence of the convergence of headtail
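
To make the target quantity concrete, the Python sketch below computes the exact degree distribution from an edge stream. It is the naive baseline, not the headtail algorithm: it keeps one counter per vertex, which is exactly the storage cost a small-space streaming estimator is meant to avoid. The function name is hypothetical, and the stream is assumed to describe a simple undirected graph with no repeated edges.

```python
from collections import Counter


def degree_distribution(edge_stream):
    """Exact degree histogram of an undirected edge stream.

    Returns a Counter mapping degree d to the number of vertices
    with degree d. Storage grows with the number of vertices.
    """
    degree = Counter()
    for u, v in edge_stream:
        degree[u] += 1
        degree[v] += 1
    return Counter(degree.values())
```

For the stream `[(1, 2), (1, 3), (1, 4), (2, 3)]`, vertex 1 has degree 3, vertices 2 and 3 have degree 2, and vertex 4 has degree 1.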

arXiv.org e-Print Archive

Crossref