Search CORE

73,592 research outputs found

SigWinR; the SigWin-detector updated and ported to R

Author: Breit Timo M
Bruning Oskar
de Leeuw Wim C
Inda Márcia A
Rauwerda Han
Publication venue: BioMed Central
Publication date: 01/01/2009
Field of study

Abstract Background Our SigWin-detector discovers significantly enriched windows of (genomic) elements in any sequence of values (genes or other genomic elements in a DNA sequence) in a fast and reproducible way. However, since it is grid based, only (life) scientists with access to the grid can use this tool. Therefore and on request, we have developed the SigWinR package which makes the SigWin-detector available to a much wider audience. At the same time, we have introduced several improvements to its algorithm as well as its functionality, based on the feedback of SigWin-detector end users. Findings To allow usage of the SigWin-detector on a desktop computer, we have rewritten it as a package for R: SigWinR. R is a free and widely used multi platform software environment for statistical computing and graphics. The package can be installed and used on all platforms for which R is available. The improvements involve: a visualization of the input-sequence values supporting the interpretation of Ridgeograms; a visualization allowing for an easy interpretation of enriched or depleted regions in the sequence using windows of pre-defined size; an option that allows the analysis of circular sequences, which results in rectangular Ridgeograms; an application to identify regions of co-altered gene expression (ROCAGEs) with a real-life biological use-case; adaptation of the algorithm to allow analysis of non-regularly sampled data using a constant window size in physical space without resampling the data. To achieve this, support for analysis of windows with an even number of elements was added. Conclusion By porting the SigWin-detector as an R package, SigWinR, improving its algorithm and functionality combined with adequate performance, we have made SigWin-detector more useful as well as more easily accessible to scientists without a grid infrastructure.</p

Crossref

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

UvA-DARE

International Migration, Integration and Social Cohesion online publications

SEWAL: an open-source platform for next-generation sequence analysis and visualization

Author: A. R. Ferre-D'Amare
Buhler
Carothers
Domingo
Domingo
Eddy
I. Rajapakse
J. N. Pitt
Kurtovic
Pitt
Poelwijk
Quail
Rusk
Schneider
Smith
Publication venue: Oxford University Press
Publication date
Field of study

Next-generation DNA sequencing platforms provide exciting new possibilities for in vitro genetic analysis of functional nucleic acids. However, the size of the resulting data sets presents computational and analytical challenges. We present an open-source software package that employs a locality-sensitive hashing algorithm to enumerate all unique sequences in an entire Illumina sequencing run (∼108 sequences). The algorithm results in quasilinear time processing of entire Illumina lanes (∼107 sequences) on a desktop computer in minutes. To facilitate visual analysis of sequencing data, the software produces three-dimensional scatter plots similar in concept to Sewall Wright and John Maynard Smith’s adaptive or fitness landscape. The software also contains functions that are particularly useful for doped selections such as mutation frequency analysis, information content calculation, multivariate statistical functions (including principal component analysis), sequence distance metrics, sequence searches and sequence comparisons across multiple Illumina data sets. Source code, executable files and links to sample data sets are available at http://www.sourceforge.net/projects/sewal

Crossref

PubMed Central

BOOL-AN: A method for comparative sequence analysis and phylogenetic reconstruction

Author: Ari Eszter
Horváth Arnold
Ittzés Péter
Jakó Éena
Podani János
Publication venue: 'Elsevier BV'
Publication date: 01/01/2009
Field of study

A novel discrete mathematical approach is proposed as an additional tool for molecular systematics which does not require prior statistical assumptions concerning the evolutionary process. The method is based on algorithms generating mathematical representations directly from DNA/RNA or protein sequences, followed by the output of numerical (scalar or vector) and visual characteristics (graphs). The binary encoded sequence information is transformed into a compact analytical form, called the Iterative Canonical Form (or ICF) of Boolean functions, which can then be used as a generalized molecular descriptor. The method provides raw vector data for calculating different distance matrices, which in turn can be analyzed by neighbor-joining or UPGMA to derive a phylogenetic tree, or by principal coordinates analysis to get an ordination scattergram. The new method and the associated software for inferring phylogenetic trees are called the Boolean analysis or BOOL-AN

Crossref

Repository of the Academy's Library

The missing graphical user interface for genomics

Author: Schatz Michael C
Publication venue: BioMed Central
Publication date: 01/01/2010
Field of study

The Galaxy package empowers regular users to perform rich DNA sequence analysis through a much-needed and user-friendly graphical web interface. See research article http://genomebiology.com/2010/11/8/R86 RESEARCH HIGHLIGHT: With the advent of affordable and high-throughput DNA sequencing, sequencing is becoming an essential component in nearly every genetics lab. These data are being generated to probe sequence variations, to understand transcribed, regulated or methylated DNA elements, and to explore a host of other biological features across the tree of life and across a range of environments and conditions. Given this deluge of data, novices and experts alike are facing the daunting challenge of trying to analyze the raw sequence data computationally. With so many tools available and so many assays to analyze, how can one be expected to stay current with the state of the art? How can one be expected to learn to use each tool and construct robust end-to-end analysis pipelines, all while ensuring that input formats, command-line options, sequence databases and program libraries are set correctly? Finally, once the analysis is complete, how does one ensure the results are reproducible and transparent for others to scrutinize and study?In an article published in Genome Biology, Jeremy Goecks, Anton Nekrutenko, James Taylor and the rest of the Galaxy Team (Goecks et al. 1) make a great advance towards resolving these critical questions with the latest update to their Galaxy Project. The ambitious goal of Galaxy is to empower regular users to carry out their own computational analysis without having to be an expert in computational biology or computer science. Galaxy adds a desperately needed graphical user interface to genomics research, making data analysis universally accessible in a web browser, and freeing users from the minutiae of archaic command-line parameters, data formats and scripting languages. Data inputs and computational steps are selected from dynamic graphical menus, and the results are displayed in intuitive plots and summaries that encourage interactive workflows and the exploration of hypotheses. The underlying data analysis tools can be almost any piece of software, written in any language, but all their complexity is neatly hidden inside of Galaxy, allowing users to focus on scientific rather than technical questions

Cold Spring Harbor Laboratory Institutional Repository

PubMed Central

Structure, stability and elasticity of DNA nanotube

Author: Dwaraknath Anjan
Joshi Himanshu
Maiti Prabal K.
Publication venue: 'Royal Society of Chemistry (RSC)'
Publication date: 28/11/2014
Field of study

DNA nanotubes are tubular structures composed of DNA crossover molecules. We present a bottom up approach for construction and characterization of these structures. Various possible topologies of nanotubes are constructed such as 6-helix, 8-helix and tri-tubes with different sequences and lengths. We have used fully atomistic molecular dynamics simulations to study the structure, stability and elasticity of these structures. Several nanosecond long MD simulations give the microscopic details about DNA nanotubes. Based on the structural analysis of simulation data, we show that 6-helix nanotubes are stable and maintain their tubular structure; while 8-helix nanotubes are flattened to stabilize themselves. We also comment on the sequence dependence and effect of overhangs. These structures are approximately four times more rigid having stretch modulus of ~4000 pN compared to the stretch modulus of 1000 pN of DNA double helix molecule of same length and sequence. The stretch moduli of these nanotubes are also three times larger than those of PX/JX crossover DNA molecules which have stretch modulus in the range of 1500-2000 pN. The calculated persistence length is in the range of few microns which is close to the reported experimental results on certain class of the DNA nanotubes.Comment: Published in Physical Chemistry Chemical Physic

arXiv.org e-Print Archive

Open Access Repository of IISc Research Publications

An Introduction to Programming for Bioscientists: A Python-based Primer

Author: Ekmekci Berk
McAnany Charles E.
Mura Cameron
Publication venue: 'Public Library of Science (PLoS)'
Publication date: 17/05/2016
Field of study

Computing has revolutionized the biological sciences over the past several decades, such that virtually all contemporary research in the biosciences utilizes computer programs. The computational advances have come on many fronts, spurred by fundamental developments in hardware, software, and algorithms. These advances have influenced, and even engendered, a phenomenal array of bioscience fields, including molecular evolution and bioinformatics; genome-, proteome-, transcriptome- and metabolome-wide experimental studies; structural genomics; and atomistic simulations of cellular-scale molecular assemblies as large as ribosomes and intact viruses. In short, much of post-genomic biology is increasingly becoming a form of computational biology. The ability to design and write computer programs is among the most indispensable skills that a modern researcher can cultivate. Python has become a popular programming language in the biosciences, largely because (i) its straightforward semantics and clean syntax make it a readily accessible first language; (ii) it is expressive and well-suited to object-oriented programming, as well as other modern paradigms; and (iii) the many available libraries and third-party toolkits extend the functionality of the core language into virtually every biological domain (sequence and structure analyses, phylogenomics, workflow management systems, etc.). This primer offers a basic introduction to coding, via Python, and it includes concrete examples and exercises to illustrate the language's usage and capabilities; the main text culminates with a final project in structural bioinformatics. A suite of Supplemental Chapters is also provided. Starting with basic concepts, such as that of a 'variable', the Chapters methodically advance the reader to the point of writing a graphical user interface to compute the Hamming distance between two DNA sequences.Comment: 65 pages total, including 45 pages text, 3 figures, 4 tables, numerous exercises, and 19 pages of Supporting Information; currently in press at PLOS Computational Biolog

arXiv.org e-Print Archive

Directory of Open Access Journals

PubMed Central

FigShare

Identification of a non-mammalian leptin-like gene:characterization and expression in the tiger salamander (Ambystoma tigrinum)

Author: Boswell Timothy
Burt David W
Dunn Ian C
Joseph Nerine
Sharp Peter J
Wilson Peter W
Publication venue: 'Elsevier BV'
Publication date: 01/04/2006
Field of study

Leptin is well established as a multifunctional cytokine in mammals. However, little is known about the evolution of the leptin gene in other vertebrates. A recently published set of ESTs from the tiger salamander (Ambystoma tigrinum) contains a sequence sharing 56% nucleotide sequence identity with the human leptin cDNA. To confirm that the EST is naturally expressed in the salamander, a 409 bp cDNA was amplified by RT-PCR of salamander testis and stomach mRNAs. The coding sequence of the cDNA is predicted to encode 169 amino acids, and the mature peptide to consist of 146 residues, as in mammals. Although the overall amino acid identity with mammalian leptins is only 29%, the salamander and mammalian peptides share common structural features. An intron was identified between coding exons providing evidence that the sequence is present in the salamander genome. Phylogenetic analysis showed a rate of molecular divergence consistent with the accepted view of vertebrate evolution. The pattern of tissue expression of the leptin-like cDNA differed between metamorphosed adult individuals of different sizes suggesting possible developmental regulation. Expression was most prominent in the skin and testis, but was also detected in tissues in which leptin mRNA is present in mammals, including the fat body, stomach, and muscle. The characterization of a salamander leptin-like gene provides a basis for understanding how the structure and functions of leptin have altered during the evolution of tetrapod vertebrates

Edinburgh Research Explorer

University of Queensland eSpace

BEAST: Bayesian evolutionary analysis by sampling trees

Author: Drummond Alexei J.
Rambaut Andrew
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2007
Field of study

Abstract Background The evolutionary analysis of molecular sequence variation is a statistical enterprise. This is reflected in the increased use of probabilistic models for phylogenetic inference, multiple sequence alignment, and molecular population genetics. Here we present BEAST: a fast, flexible software architecture for Bayesian analysis of molecular sequences related by an evolutionary tree. A large number of popular stochastic models of sequence evolution are provided and tree-based models suitable for both within- and between-species sequence data are implemented. Results BEAST version 1.4.6 consists of 81000 lines of Java source code, 779 classes and 81 packages. It provides models for DNA and protein sequence evolution, highly parametric coalescent analysis, relaxed clock phylogenetics, non-contemporaneous sequence data, statistical alignment and a wide range of options for prior distributions. BEAST source code is object-oriented, modular in design and freely available at <url>http://beast-mcmc.googlecode.com/</url> under the GNU LGPL license. Conclusion BEAST is a powerful and flexible evolutionary analysis package for molecular sequence variation. It also provides a resource for the further development of new models and statistical methods of evolutionary analysis.</p

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

Edinburgh Research Explorer