Search CORE

534,053 research outputs found

Rust-Bio - a fast and safe bioinformatics library

Author: Köster Johannes
Publication venue: 'Oxford University Press (OUP)'
Publication date: 09/09/2015
Field of study

We present Rust-Bio, the first general purpose bioinformatics library for the innovative Rust programming language. Rust-Bio leverages the unique combination of speed, memory safety and high-level syntax offered by Rust to provide a fast and safe set of bioinformatics algorithms and data structures with a focus on sequence analysis

arXiv.org e-Print Archive

Crossref

CWI's Institutional Repository

Simulated single molecule microscopy with SMeagol

Author: Boucharin Alexis
Elf Johan
Fange David
Lindén Martin
Ćurić Vladimir
Publication venue: 'Oxford University Press (OUP)'
Publication date: 01/01/2016
Field of study

SMeagol is a software tool to simulate highly realistic microscopy data based on spatial systems biology models, in order to facilitate development, validation, and optimization of advanced analysis methods for live cell single molecule microscopy data. Availability and Implementation: SMeagol runs on Matlab R2014 and later, and uses compiled binaries in C for reaction-diffusion simulations. Documentation, source code, and binaries for recent versions of Mac OS, Windows, and Ubuntu Linux can be downloaded from http://smeagol.sourceforge.net.Comment: v2: 14 pages including supplementary text. Pre-copyedited, author-produced version of an application note published in Bioinformatics following peer review. The version of record, and additional supplementary material is available online at: https://academic.oup.com/bioinformatics/article-lookup/doi/10.1093/bioinformatics/btw10

arXiv.org e-Print Archive

Crossref

Publikationer från Uppsala Universitet

PubMed Central

Digitala Vetenskapliga Arkivet - Academic Archive On-line

Bioinformatics Databases: State of the Art and Research Perspectives

Author: Bry François
Kröger Peer
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2003
Field of study

Bioinformatics or computational biology, i.e. the application of mathematical and computer science methods to solving problems in molecular biology that require large scale data, computation, and analysis, is a research area currently receiving a considerable attention. Databases play an essential role in molecular biology and consequently in bioinformatics. molecular biology data are often relatively cheap to produce, leading to a proliferation of databases: the number of bioinformatics databases accessible worldwide probably lies between 500 and 1.000. Not only molecular biology data, but also molecular biology literature and literature references are stored in databases. Bioinformatics databases are often very large (e.g. the sequence database GenBank contains more than 4 × 10 6 nucleotide sequences) and in general grows rapidly (e.g. about 8000 abstracts are added every month to the literature database PubMed). Bioinformatics databases are heterogeneous in their data, in their data modeling paradigms, in their management systems, and in the data analysis tools they supports. Furthermore, bioinformatics databases are often implemented, queried, updated, and managed using methods rarely applied for other databases. This presentation aims at introducing in current bioinformatics databases, stressing their aspects departing from conventional databases. A more detailed survey can be found in [1] upon which thi

CiteSeerX

Crossref

Open Access LMU ( Ludwig-Maximilians-Univ. München)

Overview of Random Forest Methodology and Practical Guidance with Emphasis on Computational Biology and Bioinformatics

Author: Boulesteix Anne-Laure
Janitza Silke
Kruppa Jochen
König Inke R.
Publication venue
Publication date: 25/07/2012
Field of study

The Random Forest (RF) algorithm by Leo Breiman has become a standard data analysis tool in bioinformatics. It has shown excellent performance in settings where the number of variables is much larger than the number of observations, can cope with complex interaction structures as well as highly correlated variables and returns measures of variable importance. This paper synthesizes ten years of RF development with emphasis on applications to bioinformatics and computational biology. Special attention is given to practical aspects such as the selection of parameters, available RF implementations, and important pitfalls and biases of RF and its variable importance measures (VIMs). The paper surveys recent developments of the methodology relevant to bioinformatics as well as some representative examples of RF applications in this context and possible directions for future research

Crossref

Open Access LMU ( Ludwig-Maximilians-Univ. München)

Computational Strategies for Scalable Genomics Analysis.

Author: Shi Lizhen
Wang Zhong
Publication venue: eScholarship, University of California
Publication date: 01/01/2019
Field of study

The revolution in next-generation DNA sequencing technologies is leading to explosive data growth in genomics, posing a significant challenge to the computing infrastructure and software algorithms for genomics analysis. Various big data technologies have been explored to scale up/out current bioinformatics solutions to mine the big genomics data. In this review, we survey some of these exciting developments in the applications of parallel distributed computing and special hardware to genomics. We comment on the pros and cons of each strategy in the context of ease of development, robustness, scalability, and efficiency. Although this review is written for an audience from the genomics and bioinformatics fields, it may also be informative for the audience of computer science with interests in genomics applications

Multidisciplinary Digital Publishing Institute

eScholarship - University of California

PCA and K-Means decipher genome

Author: A Zinovyev
AN Gorban
AN Gorban
AN Gorban
AY Zinovyev
FHC Crick
HY Ou
J Jackson
R Staden
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2008
Field of study

In this paper, we aim to give a tutorial for undergraduate students studying statistical methods and/or bioinformatics. The students will learn how data visualization can help in genomic sequence analysis. Students start with a fragment of genetic text of a bacterial genome and analyze its structure. By means of principal component analysis they ``discover'' that the information in the genome is encoded by non-overlapping triplets. Next, they learn how to find gene positions. This exercise on PCA and K-Means clustering enables active study of the basic bioinformatics notions. Appendix 1 contains program listings that go along with this exercise. Appendix 2 includes 2D PCA plots of triplet usage in moving frame for a series of bacterial genomes from GC-poor to GC-rich ones. Animated 3D PCA plots are attached as separate gif files. Topology (cluster structure) and geometry (mutual positions of clusters) of these plots depends clearly on GC-content.Comment: 18 pages, with program listings for MatLab, PCA analysis of genomes and additional animated 3D PCA plot

arXiv.org e-Print Archive

CiteSeerX

Crossref