1,386 research outputs found

    XML schemas for common bioinformatic data types and their application in workflow systems

    Background: Today, there is a growing need in bioinformatics to combine available software tools into chains, thus building complex applications from existing single-task tools. To create such workflows, the tools involved have to be able to work with each other's data – therefore, a common set of well-defined data formats is needed. Unfortunately, current bioinformatic tools use a great variety of heterogeneous formats.
    Results: Acknowledging the need for common formats, the Helmholtz Open BioInformatics Technology network (HOBIT) identified several basic data types used in bioinformatics and developed appropriate format descriptions, formally defined by XML schemas, and incorporated them in a Java library (BioDOM). These schemas currently cover sequence, sequence alignment, RNA secondary structure and RNA secondary structure alignment formats in a form that is independent of any specific program, thus enabling seamless interoperation of different tools. All XML formats are available at http://bioschemas.sourceforge.net, the BioDOM library can be obtained at http://biodom.sourceforge.net.
    Conclusion: The HOBIT XML schemas and the BioDOM library simplify adding XML support to newly created and existing bioinformatic tools, enabling these tools to interoperate seamlessly in workflow scenarios.

    PARPs database: A LIMS systems for protein-protein interaction data mining or laboratory information management system

    Background: In the "post-genome" era, mass spectrometry (MS) has become an important method for the analysis of proteins and the rapid advancement of this technique, in combination with other proteomics methods, results in an increasing amount of proteome data. This data must be archived and analysed using specialized bioinformatics tools.
    Description: We herein describe "PARPs database," a data analysis and management pipeline for liquid chromatography tandem mass spectrometry (LC-MS/MS) proteomics. PARPs database is a web-based tool whose features include experiment annotation, protein database searching, protein sequence management, as well as data-mining of the peptides and proteins identified.
    Conclusion: Using this pipeline, we have successfully identified several interactions of biological significance between PARP-1 and other proteins, namely RFC-1, 2, 3, 4 and 5.

    ChemEngine: harvesting 3D chemical structures of supplementary data from PDF files

    Additional file 2. Recreated 3D geometry optimized structures of 29 molecules as visualized in the original program (Gauss View)

    Bioinformatics tools and database resources for systems genetics analysis in mice—a short review and an evaluation of future needs

    During a meeting of the SYSGENET working group ‘Bioinformatics’, currently available software tools and databases for systems genetics in mice were reviewed and the needs for future developments discussed. The group evaluated interoperability and performed initial feasibility studies. To aid future compatibility of software and exchange of already developed software modules, a strong recommendation was made by the group to integrate HAPPY and R/qtl analysis toolboxes, GeneNetwork and XGAP database platforms, and TIQS and xQTL processing platforms. R should be used as the principal computer language for QTL data analysis in all platforms and a ‘cloud’ should be used for software dissemination to the community. Furthermore, the working group recommended that all data models and software source code should be made visible in public repositories to allow a coordinated effort on the use of common data structures and file formats

    Flow: Statistics, visualization and informatics for flow cytometry

    Flow is an open source software application for clinical and experimental researchers to perform exploratory data analysis, clustering and annotation of flow cytometric data. Flow is an extensible system that offers the ease of use commonly found in commercial flow cytometry software packages and the statistical power of academic packages like the R BioConductor project

    NetPanorama: A Declarative Grammar for Network Construction, Transformation, and Visualization

    This paper introduces NetPanorama, a domain-specific language and declarative grammar for interactive network visualizations. Exploring complex networks with multivariate, geographical, or temporal information often requires bespoke visualization designs, such as adjacency matrices, arc-diagrams, small multiples, timelines, or geographic map visualizations. However, creating these requires implementing data loading, data transformations, visualization, and interactivity, which is time-consuming and slows down the iterative exploration of this huge design space. With NetPanorama, a developer specifies a network visualization design as a pipeline of parameterizable steps. Our specification and reference implementation aim to facilitate visualization development and reuse; allow for easy design exploration and iteration; and make data transformation and visual mapping decisions transparent. Documentation, source code, examples, and an interactive online editor can be found online: https://netpanorama.netlify.app

    Using neural networks based on epigenomic maps for predicting the transcriptional regulation measured by CRISPR/Cas9

    [EN] Genome editing with CRISPR/Cas9 has had a great impact in recent years and has brought major advances to biotechnology, creating a strong need for information. However, researchers struggle to find a definite pattern in these experiments, so finding an optimal solution for a particular experiment becomes a long process of trial and error. In this project we aim to optimize genome editing with CRISPR/Cas9: to find the optimal insertion site, we design a mathematical model based on neural networks. During this process we had to deal with a huge amount of genomic information, so we had to develop a way to filter and handle it efficiently. The project focuses on Arabidopsis thaliana, a plant that is very commonly used in genome editing and has many resources available online.
    Barberá Mourelle, A. (2016). Using neural networks based on epigenomic maps for predicting the transcriptional regulation measured by CRISPR/Cas9. http://hdl.handle.net/10251/69318. TFG

    A method for describing the topological structure of computing clusters based on subgraph product operations

    The topological structure of supercomputer communication networks becomes more complex as installations grow in size and complexity. Many methods exist for describing it, but such descriptions are cumbersome, which makes them difficult to manipulate. The article proposes an approach to describing the communication environment of a supercomputer in which the communication network is described as a construction kit. The elements of the kit are typical topological structures often found in various computing systems. For this purpose, a language for describing topological structure has been developed, based on the subgraph product operation. The language is conceptually similar in its principles to the NetML and OMNeT++ languages. Special attention is paid to exceptions in the regularity of the networks of real supercomputers; special constructs have been introduced into the language so that these exceptions can be described. A library in the C programming language has been developed to support work with the language, together with a Python3 wrapper over it that can be used to visualize the graphs described by the language. The expressive power of the language has been demonstrated by describing the computing clusters Tianhe-2A, AI Bridging Cloud Infrastructure and Lomonosov-2. The method has been tested and compared with GraphViz DOT, showing a multiple-fold reduction in the volume of the description required to store the topology of some major Top500 systems.
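The idea of assembling a network from typical building blocks through a graph product can be illustrated with a minimal Python sketch. It is not the language or the C/Python3 library described in the abstract: the Cartesian product is chosen here as one common graph product, and the helper names `ring` and `cartesian_product` are hypothetical. The example builds a 4 x 3 torus from two rings.

```python
from itertools import product


def ring(n):
    """Edge set of an n-node ring on nodes 0..n-1 (assumes n >= 3)."""
    return {frozenset((i, (i + 1) % n)) for i in range(n)}


def cartesian_product(nodes_a, edges_a, nodes_b, edges_b):
    """Cartesian product of two undirected graphs.

    (a, b) and (a2, b2) are adjacent when a == a2 and b-b2 is an edge,
    or when b == b2 and a-a2 is an edge.
    """
    nodes = set(product(nodes_a, nodes_b))
    edges = set()
    for edge in edges_a:
        u, v = tuple(edge)
        for b in nodes_b:
            edges.add(frozenset(((u, b), (v, b))))
    for edge in edges_b:
        u, v = tuple(edge)
        for a in nodes_a:
            edges.add(frozenset(((a, u), (a, v))))
    return nodes, edges


# A 4 x 3 torus described as the product of two small rings.
torus_nodes, torus_edges = cartesian_product(range(4), ring(4), range(3), ring(3))
print(len(torus_nodes), len(torus_edges))
```

The two factor rings need only 4 + 3 edges to describe, while the resulting torus has 24, which hints at why a product-based description can be far more compact than an explicit edge list.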

    A network simplification approach to ease topological studies about the food-web architecture

    Food web studies are intrinsically complex and time-consuming. Network data about trophic interactions across different large locations and ecosystems are scarce in comparison with general ecological data, especially if we consider terrestrial habitats. Here we present a complex network strategy to ease the gathering of the information by simplifying the collection of data with a taxonomic key. We test how well three different food webs retain their topological structure at the resolution of the nodes across distinct levels of simplification, and we estimate how community detection could be impacted by this strategy. The first level of simplification retains most of the general topological indices; betweenness and trophic levels seem to be consistent and robust even at the higher levels of simplification. This result suggests that generalisation and standardisation, as a good practice in food web science, could benefit the community, both increasing the amount of open data available and the comparison among them, thus providing support especially for scientists who are new to this field and for exploratory analysis.
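One way to picture a single level of such simplification is to collapse species nodes into a coarser taxonomic group and merge the links that become duplicates. The sketch below is only an illustration under that assumption; the species, the grouping, and the function name `simplify_food_web` are hypothetical and are not taken from the study.

```python
def simplify_food_web(edges, taxon_of):
    """Collapse each species to its taxon and merge duplicate links.

    edges: iterable of (prey, predator) pairs.
    taxon_of: mapping from species name to a coarser taxonomic group.
    Links that fall inside a single group are dropped, which is a
    choice made for this sketch only.
    """
    simplified = set()
    for prey, predator in edges:
        a, b = taxon_of[prey], taxon_of[predator]
        if a != b:
            simplified.add((a, b))
    return simplified


# Hypothetical toy web: five species, five trophic links.
edges = [
    ("grass", "mouse"),
    ("grass", "vole"),
    ("mouse", "fox"),
    ("vole", "fox"),
    ("vole", "owl"),
]
taxon_of = {
    "grass": "Poales",
    "mouse": "Rodentia",
    "vole": "Rodentia",
    "fox": "Carnivora",
    "owl": "Strigiformes",
}
simplified = simplify_food_web(edges, taxon_of)
print(sorted(simplified))
```

Topological indices such as betweenness could then be computed on both the original and the simplified edge sets and compared.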