16 research outputs found

    The MIGenAS integrated bioinformatics toolkit for web-based sequence analysis

    We describe a versatile and extensible integrated bioinformatics toolkit for the analysis of biological sequences over the Internet. The web portal offers convenient interactive access to a growing pool of chainable bioinformatics software tools and databases that are centrally installed and maintained by the RZG. Currently, supported tasks comprise sequence similarity searches in public or user-supplied databases, computation and validation of multiple sequence alignments, phylogenetic analysis and protein–structure prediction. Individual tools can be seamlessly chained into pipelines allowing the user to conveniently process complex workflows without the necessity to take care of any format conversions or tedious parsing of intermediate results. The toolkit is part of the Max-Planck Integrated Gene Analysis System (MIGenAS) of the Max Planck Society available at (click ‘Start Toolkit’)

    Bioinformatics: new tools and applications in life science and personalized medicine

    While we have a basic understanding of the functioning of the gene when coding sequences of specific proteins, we feel the lack of information on the role that DNA has on specific diseases or functions of thousands of proteins that are produced. Bioinformatics combines the methods used in the collection, storage, identification, analysis, and correlation of this huge and complex information. All this work produces an “ocean” of information that can only be “sailed” with the help of computerized methods. The goal is to provide scientists with the right means to explain normal biological processes, dysfunctions of these processes which give rise to disease, and approaches that allow the discovery of new medical cures. Recently, sequencing platforms, a large scale of genomes and transcriptomes, have created new challenges not only for genomics but especially for bioinformatics. The intent of this article is to compile a list of tools and information resources used by scientists to treat information from the massive sequencing of recent platforms to new generations and the applications of this information in different areas of life sciences including medicine. The authors are grateful to the Foundation for Science and Technology (FCT, Portugal) and FEDER under Programme PT2020 for financial support to CIMO (UID/AGR/00690/2019).

    Bioinformatics process management: information flow via a computational journal

    This paper presents the Bioinformatics Computational Journal (BCJ), a framework for conducting and managing computational experiments in bioinformatics and computational biology. These experiments often involve series of computations, data searches, filters, and annotations which can benefit from a structured environment. Systems to manage computational experiments exist, ranging from libraries with standard data models to elaborate schemes to chain together input and output between applications. Yet, although such frameworks are available, their use is not widespread–ad hoc scripts are often required to bind applications together. The BCJ explores another solution to this problem through a computer based environment suitable for on-site use, which builds on the traditional laboratory notebook paradigm. It provides an intuitive, extensible paradigm designed for expressive composition of applications. Extensive features facilitate sharing data, computational methods, and entire experiments. By focusing on the bioinformatics and computational biology domain, the scope of the computational framework was narrowed, permitting us to implement a capable set of features for this domain. This report discusses the features determined critical by our system and other projects, along with design issues. We illustrate the use of our implementation of the BCJ on two domain-specific examples

    FAST: FAST Analysis of Sequences Toolbox.

    FAST (FAST Analysis of Sequences Toolbox) provides simple, powerful open source command-line tools to filter, transform, annotate and analyze biological sequence data. Modeled after the GNU (GNU's Not Unix) Textutils such as grep, cut, and tr, FAST tools such as fasgrep, fascut, and fastr make it easy to rapidly prototype expressive bioinformatic workflows in a compact and generic command vocabulary. Compact combinatorial encoding of data workflows with FAST commands can simplify the documentation and reproducibility of bioinformatic protocols, supporting better transparency in biological data science. Interface self-consistency and conformity with conventions of GNU, Matlab, Perl, BioPerl, R, and GenBank help make FAST easy and rewarding to learn. FAST automates numerical, taxonomic, and text-based sorting, selection and transformation of sequence records and alignment sites based on content, index ranges, descriptive tags, annotated features, and in-line calculated analytics, including composition and codon usage. Automated content- and feature-based extraction of sites and support for molecular population genetic statistics make FAST useful for molecular evolutionary analysis. FAST is portable, easy to install and secure thanks to the relative maturity of its Perl and BioPerl foundations, with stable releases posted to CPAN. Development as well as a publicly accessible Cookbook and Wiki are available on the FAST GitHub repository at https://github.com/tlawrence3/FAST. The default data exchange format in FAST is Multi-FastA (specifically, a restriction of BioPerl FastA format). Sanger and Illumina 1.8+ FastQ formatted files are also supported. FAST makes it easier for non-programmer biologists to interactively investigate and control biological data at the speed of thought
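The grep-and-cut style of chaining that the FAST abstract describes can be illustrated with a minimal Python sketch. This is not FAST's interface or implementation: the function names `read_fasta`, `grep_records`, and `cut_sites` and the sample records are hypothetical, chosen only to show how small composable filters over Multi-FastA records might be combined.

```python
import re
from typing import Iterable, Iterator, Tuple

Record = Tuple[str, str]  # (header without '>', sequence)


def read_fasta(lines: Iterable[str]) -> Iterator[Record]:
    """Parse Multi-FastA text lines into (header, sequence) records."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None and line:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def grep_records(records: Iterable[Record], pattern: str) -> Iterator[Record]:
    """Keep records whose header matches a regular expression."""
    regex = re.compile(pattern)
    return (rec for rec in records if regex.search(rec[0]))


def cut_sites(records: Iterable[Record], start: int, end: int) -> Iterator[Record]:
    """Keep sites start..end of each sequence (1-based, inclusive)."""
    return ((header, seq[start - 1:end]) for header, seq in records)


# Hypothetical sample data, for illustration only.
example = """>seq1 Homo sapiens
ACGTACGT
AC
>seq2 Mus musculus
TTGGCCAA
>seq3 Homo sapiens
GGGGCCCC
""".splitlines()

# Chain the filters the way shell tools are chained with pipes.
result = list(cut_sites(grep_records(read_fasta(example), r"Homo"), 1, 4))
for header, seq in result:
    print(f">{header}\n{seq}")
```

Because each step consumes and yields the same record type, the steps can be reordered or extended without format conversion, which is the property the abstract attributes to a compact, generic command vocabulary.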

    Genome information management and integrated data analysis with HaloLex

    HaloLex is a software system for the central management, integration, curation, and web-based visualization of genomic and other -omics data for any given microorganism. The system has been employed for the manual curation of three haloarchaeal genomes, namely Halobacterium salinarum (strain R1), Natronomonas pharaonis, and Haloquadratum walsbyi. HaloLex, in particular, enables the integrated analysis of genome-wide proteomic results with the underlying genomic data. This has proven indispensable to generate reliable gene predictions for GC-rich genomes, which, due to their characteristically low abundance of stop codons, are known to be hard targets for standard gene finders, especially concerning start codon assignment. The proteomic identification of more than 600 N-terminal peptides has greatly increased the reliability of the start codon assignment for Halobacterium salinarum. Application of homology-based methods to the published genome of Haloarcula marismortui allowed the detection of 47 previously unidentified genes (a problem that is particularly serious for short protein sequences) and the correction of more than 300 start codon misassignments.

    The Personal Sequence Database: a suite of tools to create and maintain web-accessible sequence databases

    Background: Large molecular sequence databases are fundamental resources for modern bioscientists. Whether for project-specific purposes or sharing data with colleagues, it is often advantageous to maintain smaller sequence databases. However, this is usually not an easy task for the average bench scientist. Results: We present the Personal Sequence Database (PSD), a suite of tools to create and maintain small- to medium-sized web-accessible sequence databases. All interactions with PSD tools occur via the internet with a web browser. Users may define sequence groups within their database that can be maintained privately or published to the web for public use. A sequence group can be downloaded, browsed, searched by keyword or searched for sequence similarities using BLAST. Publishing a sequence group extends these capabilities to colleagues and collaborators. In addition to being able to manage their own sequence databases, users can enroll sequences in BLASTAgent, a BLAST hit tracking system, to monitor NCBI databases for new entries displaying a specified level of nucleotide or amino acid similarity. Conclusion: The PSD offers a valuable set of resources unavailable elsewhere. In addition to managing sequence data and BLAST search results, it facilitates data sharing with colleagues, collaborators and public users. The PSD is hosted by the authors and is available at http://bioinfo.cgrb.oregonstate.edu/psd/

    Transcriptomic changes arising during light-induced sporulation in Physarum polycephalum

    Background: Physarum polycephalum is a free-living amoebozoan protist displaying a complex life cycle, including alternation between single- and multinucleate stages through sporulation, a simple form of cell differentiation. Sporulation in Physarum can be experimentally induced by several external factors, and Physarum displays many biochemical features typical for metazoan cells, including metazoan-type signaling pathways, which makes this organism a model to study cell cycle, cell differentiation and cellular reprogramming. Results: In order to identify the genes associated to the light-induced sporulation in Physarum, especially those related to signal transduction, we isolated RNA before and after photoinduction from sporulation-competent cells, and used these RNAs to synthesize cDNAs, which were then analyzed using the 454 sequencing technology. We obtained 16,669 cDNAs that were annotated at every computational level. 13,169 transcripts included hit count data, from which 2,772 displayed significant differential expression (upregulated: 1,623; downregulated: 1,149). Transcripts with valid annotations and significant differential expression were later integrated into putative networks using interaction information from orthologs. Conclusions: Gene ontology analysis suggested that most significantly downregulated genes are linked to DNA repair, cell division, inhibition of cell migration, and calcium release, while highly upregulated genes were involved in cell death, cell polarization, maintenance of integrity, and differentiation. In addition, cell death-associated transcripts were overrepresented between the upregulated transcripts. These changes are associated to a network of actin-binding proteins encoded by genes that are differentially regulated before and after light induction.