    Development of Integrative Bioinformatics Applications using Cloud Computing resources and Knowledge Organization Systems (KOS).

    Use of semantic web abstractions, in particular of domain-neutral Knowledge Organization Systems (KOS), to manage distributed, cloud-based, integrative bioinformatics infrastructure. This presentation derives from a recent publication:

Almeida JS, Deus HF, Maass W. (2010) S3DB core: a framework for RDF generation and management in bioinformatics infrastructures. BMC Bioinformatics. 2010 Jul 20;11(1):387. [PMID 20646315].

These PowerPoint slides were presented at Semantic Web Applications and Tools for Life Sciences, December 10th, 2010, Berlin, Germany (http://www.swat4ls.org/2010/progr.php), keynote 9-10 am.

    Computational ecosystems for data-driven medical genomics

    On the path towards personalized medicine, the integrative bioinformatics infrastructure is a critical enabling resource. Until large-scale reference data became available, the attributes of the computational infrastructure were postulated by many, but have mostly remained unverified. Now that large-scale initiatives such as The Cancer Genome Atlas (TCGA) are in full swing, the opportunity is at hand to find out what analytical approaches and computational architectures are really effective. A recent report did just that: first a software development environment was assembled as part of an informatics research program, and only then was the analysis of TCGA's glioblastoma multiforme multi-omic data pursued at the multi-omic scale. The results of this complex analysis are the focus of the report highlighted here. However, what is reported in the analysis is also the validating corollary for an infrastructure development effort guided by the iterative identification of sound design criteria for the architecture of the integrative computational infrastructure. The work is at least as valuable as the data analysis results themselves: computational ecosystems with their own high-level abstractions rather than rigid pipelines with prescriptive recipes appear to be the critical feature of an effective infrastructure. Only then can analytical workflows benefit from experimentation just like any other component of the biomedical research program.

    Computing distribution of scale independent motifs in biological sequences

    The use of Chaos Game Representation (CGR) or its generalization, Universal Sequence Maps (USM), to describe the distribution of biological sequences has been found objectionable because of the fractal structure of that coordinate system. Consequently, the investigation of the distribution of symbolic motifs at multiple scales is hampered by an inexact association between distance and sequence dissimilarity. A solution to this problem could unleash the use of iterative maps as a phase-state representation of sequences in which their statistical properties can be conveniently investigated. In this study a family of kernel density functions is described that accommodates the fractal nature of iterative function representations of symbolic sequences and, consequently, enables the exact investigation of sequence motifs of arbitrary lengths in that scale-independent representation. Furthermore, the proposed kernel density includes both Markovian succession and currently used alignment-free sequence dissimilarity metrics as special solutions. Therefore, the fractal kernel described is in fact a generalization that provides a common framework for a diverse suite of sequence analysis techniques.

    Universal sequence map (USM) of arbitrary discrete sequences

    BACKGROUND: For over a decade the idea of representing biological sequences in a continuous coordinate space has maintained its appeal but not been fully realized. The basic idea is that any sequence of symbols may define trajectories in the continuous space conserving all its statistical properties. Ideally, such a representation would allow scale-independent sequence analysis, without the context of fixed memory length. A simple example would consist of being able to infer the homology between two sequences solely by comparing the coordinates of any two homologous units. RESULTS: We have successfully identified such an iterative function for bijective mapping ψ of discrete sequences into objects of continuous state space that enable scale-independent sequence analysis. The technique, named Universal Sequence Mapping (USM), is applicable to sequences with an arbitrary length and arbitrary number of unique units and generates a representation where map distance estimates sequence similarity. The novel USM procedure is based on earlier work by these and other authors on the properties of Chaos Game Representation (CGR). The latter enables the representation of 4 unit type sequences (like DNA) as an order-free Markov Chain transition table. The properties of USM are illustrated with test data and can be verified for other data by using the accompanying web-based tool: http://bioinformatics.musc.edu/~jonas/usm/. CONCLUSIONS: USM is shown to enable a statistical mechanics approach to sequence analysis. The scale-independent representation frees sequence analysis from the need to assume a memory length in the investigation of syntactic rules.
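
    The iterative mapping described in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes the common CGR-style rule of moving halfway toward the hypercube corner assigned to each symbol, computes forward coordinates only, and starts from the midpoint 0.5. The function name `usm_forward`, the binary corner assignment, and the starting point are all assumptions made for illustration.

```python
from math import ceil, log2


def usm_forward(seq, alphabet=None, start=0.5):
    """Forward iterated-map coordinates for a symbolic sequence.

    Illustrative sketch only: corner assignment and start point are assumptions.
    """
    symbols = sorted(set(seq)) if alphabet is None else list(alphabet)
    if not symbols:
        raise ValueError("need at least one symbol")
    # Enough binary dimensions to give every symbol its own hypercube corner.
    n_dims = max(1, ceil(log2(len(symbols))))
    # Corner of symbol i = the bits of i, least significant bit first.
    corners = {
        s: [(i >> d) & 1 for d in range(n_dims)]
        for i, s in enumerate(symbols)
    }
    point = [start] * n_dims
    coords = []
    for s in seq:
        corner = corners[s]
        # Move halfway from the current point toward the symbol's corner.
        point = [p + 0.5 * (c - p) for p, c in zip(point, corner)]
        coords.append(tuple(point))
    return coords


# Two sequences that end in the same 4 symbols land close together:
a = usm_forward("GGACGT", alphabet="ACGT")[-1]
b = usm_forward("TTACGT", alphabet="ACGT")[-1]
gap = max(abs(x - y) for x, y in zip(a, b))
```

    Under this rule each shared trailing symbol halves the distance between two trajectories, so `gap` above is at most 2**-4. That is one way to read the claim that map distance estimates sequence similarity without fixing a memory length in advance.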

    Efficient Boolean implementation of universal sequence maps (bUSM)

    BACKGROUND: Recently, Almeida and Vinga offered a new approach for the representation of arbitrary discrete sequences, referred to as Universal Sequence Maps (USM), and discussed its applicability to genomic sequence analysis. Their work generalizes and extends Chaos Game Representation (CGR) of DNA for arbitrary discrete sequences. RESULTS: We have considered issues associated with the practical implementation of USMs and offer a variation on the algorithm that: 1) eliminates the overestimation of similar segment lengths, 2) permits the identification of arbitrarily long similar segments in the context of finite word length coordinate representations, 3) uses more computationally efficient operations, and 4) provides a simple conversion for recovering the USM coordinates. Computational performance comparisons and examples are provided. CONCLUSIONS: We have shown that the desirable properties of the USM encoding of nucleotide sequences can be retained in a practical implementation of the algorithm. In addition, the proposed implementation enables determination of local sequence identity at increased speed.

    mGrid: A load-balanced distributed computing environment for the remote execution of the user-defined Matlab code

    BACKGROUND: Matlab, a powerful and productive language that allows for rapid prototyping, modeling and simulation, is widely used in computational biology. Modeling and simulation of large biological systems often require more computational resources than are available on a single computer. Existing distributed computing environments like the Distributed Computing Toolbox, MatlabMPI, Matlab*G and others allow for the remote (and possibly parallel) execution of Matlab commands with varying support for features like an easy-to-use application programming interface, load-balanced utilization of resources, extensibility over the wide area network, and minimal system administration skill requirements. However, all of these environments require some level of access to participating machines to manually distribute the user-defined libraries that the remote call may invoke. RESULTS: mGrid augments the usual process distribution seen in other similar distributed systems by adding facilities for user code distribution. mGrid's client-side interface is an easy-to-use native Matlab toolbox that transparently executes user-defined code on remote machines (i.e. the user is unaware that the code is executing somewhere else). Run-time variables are automatically packed and distributed with the user-defined code and automated load-balancing of remote resources enables smooth concurrent execution. mGrid is an open source environment. Apart from the programming language itself, all other components are also open source, freely available tools: light-weight PHP scripts and the Apache web server. CONCLUSION: Transparent, load-balanced distribution of user-defined Matlab toolboxes and rapid prototyping of many simple parallel applications can now be done with a single easy-to-use Matlab command. Because mGrid utilizes only Matlab, light-weight PHP scripts and the Apache web server, installation and configuration are very simple. Moreover, the web-based infrastructure of mGrid allows for it to be easily extensible over the Internet.

    Nearest neighbor embedding with different time delays

    A nearest-neighbor-based selection of time delays for phase space reconstruction is proposed and compared to the standard use of time-delayed mutual information. The possibility of using different time delays for consecutive dimensions is considered. A case study of numerically generated solutions of the Lorenz system is used for illustration. The effect of contamination with various levels of additive Gaussian white noise is discussed. Comment: 4 pages, 5 figures, updated to final version
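
    To make "different time delays for consecutive dimensions" concrete, the sketch below builds delay vectors from a scalar series using one delay per added dimension. It only shows how the reconstruction is assembled. The nearest-neighbor criterion for choosing the delays is not described in the abstract and is not reproduced here. The function name `delay_embed` and the convention that delays accumulate from one dimension to the next are assumptions.

```python
from itertools import accumulate


def delay_embed(series, delays):
    """Delay vectors (x[t], x[t - d1], x[t - d1 - d2], ...).

    `delays` holds one positive delay per added dimension; treating them as
    cumulative offsets is an assumed convention for this illustration.
    """
    if any(d <= 0 for d in delays):
        raise ValueError("delays must be positive integers")
    # Total lag of each coordinate relative to the current sample.
    lags = [0] + list(accumulate(delays))
    max_lag = lags[-1]
    # Start at the first index where every lagged sample exists.
    return [
        tuple(series[t - lag] for lag in lags)
        for t in range(max_lag, len(series))
    ]


# A 3-dimensional reconstruction with delays 1 and 2 between dimensions.
vectors = delay_embed([0, 1, 2, 3, 4, 5], delays=[1, 2])
```

    Setting every entry of `delays` to the same value gives the standard uniform-delay embedding, so the uniform case is a special case of this construction.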

    Multivariate phase space reconstruction by nearest neighbor embedding with different time delays

    A recently proposed nearest-neighbor-based selection of time delays for phase space reconstruction is extended to multivariate time series, with an iterative selection of variables and time delays. A case study of numerically generated solutions of the x- and z-coordinates of the Lorenz system, and an application to heart rate and respiration data, are used for illustration. Comment: 4 pages, 3 figures