1,450 research outputs found

    Encoding, Storing and Searching of Analytical Properties and Assigned Metabolite Structures

    Information about metabolites and other small organic molecules is of crucial importance in many different areas of the natural sciences. For example, these molecules play a decisive role in metabolic networks, and knowledge of their properties helps in understanding complex biological processes and complete biological systems. Because data describing these molecules are generated daily in biological and chemical laboratories, a comprehensive and continuously growing body of data exists. Complex software systems and data formats are needed to allow scientists to process, exchange, archive, and search this information while preserving its semantic relationships. The aim of this project was to develop applications and algorithms that can be used for the efficient encoding, collection, normalization, and analysis of molecular data. These are intended to support scientists in structure elucidation, dereplication, the analysis of molecular interactions, and the publication of the knowledge gained. Because directly describing the structure and function of an unknown compound is very difficult and laborious, this is mainly achieved indirectly, using descriptive properties, which are then used to predict structural and functional characteristics. In this context, program modules were developed that allow the visualization of structural and spectroscopic data, the structured display and editing of metadata and properties, and the import and export of various data formats. These were extended with methods that make it possible to analyze the resulting information further and to assign structural and spectroscopic data to one another. 
In addition, a system was developed for the structured archiving and management of large amounts of molecular data and spectroscopic information, preserving semantic relationships, both in the file system and in databases. To ensure lossless storage, an open and standardized data format (CMLSpect) was defined. It extends the existing CML (Chemical Markup Language) vocabulary and thus allows easy handling of linked structural and spectroscopic data. The applications developed were integrated into the Bioclipse system for bio- and cheminformatics, giving users a high-quality user interface and developers a modular program architecture that is easy to extend
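
The idea of keeping structure and spectrum linked in one document can be illustrated with a small sketch. The XML fragment below is a simplified, hypothetical example loosely modeled on CML conventions; the element and attribute names (`moleculeRef`, `atomRefs`, `xValue`) and the peak values are assumptions for illustration and have not been validated against the CMLSpect schema. The function `peak_assignments` is likewise a hypothetical helper.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified document: one molecule and one spectrum whose
# peaks point back to atoms by id. Values are made up for illustration.
DOC = """
<cml>
  <molecule id="m1">
    <atomArray>
      <atom id="a1" elementType="C"/>
      <atom id="a2" elementType="O"/>
    </atomArray>
  </molecule>
  <spectrum id="s1" moleculeRef="m1">
    <peakList>
      <peak id="p1" xValue="58.0" atomRefs="a1"/>
      <peak id="p2" xValue="17.9" atomRefs="a1 a2"/>
    </peakList>
  </spectrum>
</cml>
"""

def peak_assignments(xml_text):
    """Map each peak id to (position, element types of the referenced atoms)."""
    root = ET.fromstring(xml_text)
    # Index atoms by id so that peak references can be resolved.
    elements = {a.get("id"): a.get("elementType") for a in root.iter("atom")}
    result = {}
    for peak in root.iter("peak"):
        refs = peak.get("atomRefs", "").split()
        result[peak.get("id")] = (float(peak.get("xValue")),
                                  [elements[r] for r in refs])
    return result

print(peak_assignments(DOC))
```

Because the peaks carry references rather than copies of the atom data, the assignment survives a round trip through the file without losing the semantic link.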

    KinImmerse: Macromolecular VR for NMR ensembles

    Background: In molecular applications, virtual reality (VR) and immersive virtual environments have generally been used and valued for the visual and interactive experience (to enhance intuition and communicate excitement) rather than as part of the actual research process. In contrast, this work develops a software infrastructure for research use and illustrates such use on a specific case. Methods: The Syzygy open-source toolkit for VR software was used to write the KinImmerse program, which translates the molecular capabilities of the kinemage graphics format into software for display and manipulation in the DiVE (Duke immersive Virtual Environment) or other VR system. KinImmerse is supported by the flexible display construction and editing features in the KiNG kinemage viewer and it implements new forms of user interaction in the DiVE. Results: In addition to molecular visualizations and navigation, KinImmerse provides a set of research tools for manipulation, identification, co-centering of multiple models, free-form 3D annotation, and output of results. The molecular research test case analyzes the local neighborhood around an individual atom within an ensemble of nuclear magnetic resonance (NMR) models, enabling immersive visual comparison of the local conformation with the local NMR experimental data, including target curves for residual dipolar couplings (RDCs). Conclusion: The promise of KinImmerse for production-level molecular research in the DiVE is shown by the locally co-centered RDC visualization developed there, which gave new insights now being pursued in wider data analysis.

    Geometric modeling, simulation, and visualization methods for plasmid DNA molecules

    Plasmid DNA molecules are a special type of DNA molecule used, among other applications, in DNA vaccination and gene therapy. In their natural state, these molecules have a closed-circular conformation and are supercoiled. Producing plasmid DNA using bacteria as hosts requires a purification step in which the plasmid DNA molecules are separated from the host DNA and other contaminants. This purification process, and all the physical and chemical variations involved, such as temperature changes, may affect the conformation of the plasmid DNA molecules by uncoiling or even opening them, which makes them useless for therapeutic applications. For this reason, researchers are always searching for new purification techniques that maximize the amount of supercoiled plasmid DNA produced. Computer simulations and 3D visualization of plasmid DNA can bring many advantages because they allow researchers to actually see what can happen to the molecules under certain conditions. It was therefore necessary to develop reliable and accurate geometric models specific to plasmid DNA simulations. This dissertation presents a new assembling algorithm for B-DNA developed specifically for plasmid DNA assembling. The algorithm is completely adaptive in the sense that it allows researchers to assemble any plasmid DNA base-pair sequence along any arbitrary conformation that fits the length of the plasmid DNA molecule. This is especially suitable for plasmid DNA simulations, where conformations are generated by simulation procedures and the given base-pair sequence must be assembled over that conformation, which cannot be done by conventional predictive DNA assembling methods. Unlike traditional molecular visualization methods, which are based on the atomic structure, this new assembling algorithm uses color-coded 3D molecular surfaces of the nucleotides as the building blocks for DNA assembling. 
This new approach not only reduces the number of graphical objects and, consequently, makes rendering faster, but also makes it easier to visually identify the nucleotides in the DNA strands. The algorithm used to triangulate the molecular surfaces of the nucleotide building blocks is also a novelty presented as part of this dissertation. This new triangulation algorithm for Gaussian molecular surfaces introduces a new mechanism that divides the atomic structure of molecules into boxes and spheres. This space division method is faster because it confines the local calculation of the molecular surface to a specific region of influence of the atomic structure, ignoring atoms that do not influence the triangulation of the molecular surface in that region. The method also guarantees the continuity of the molecular surface. Since the aim of this dissertation is to present a complete set of methods for plasmid DNA visualization and simulation, a new deformation algorithm is also proposed for plasmid DNA Monte Carlo simulations. This deformation algorithm uses a 3D polyline to represent the plasmid DNA conformation and performs small deformations on that polyline while preserving segment length and connectivity. Experiments were performed to compare this new deformation method with deformation methods traditionally used by Monte Carlo plasmid DNA simulations. These experiments showed that the new method is more efficient in the sense that its trial acceptance ratio is higher and it converges sooner and faster to the elastic energy equilibrium state of the plasmid DNA molecule. In sum, this dissertation presents an end-to-end set of models and algorithms for plasmid DNA geometric modelling, visualization and simulation
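
The abstract does not spell out the new deformation algorithm, so the sketch below instead shows a conventional crankshaft move, a trial deformation commonly used in Monte Carlo simulations of chain molecules, as an example of a move that keeps segment lengths and connectivity of a closed 3D polyline. The function names and the square test polyline are assumptions made for illustration only.

```python
import math

def rotate_about_axis(p, a, b, angle):
    """Rotate point p about the line through a and b (Rodrigues' formula)."""
    k = [b[i] - a[i] for i in range(3)]
    norm = math.sqrt(sum(c * c for c in k))
    k = [c / norm for c in k]                      # unit axis direction
    v = [p[i] - a[i] for i in range(3)]            # p relative to the axis origin
    cos_t, sin_t = math.cos(angle), math.sin(angle)
    cross = [k[1] * v[2] - k[2] * v[1],
             k[2] * v[0] - k[0] * v[2],
             k[0] * v[1] - k[1] * v[0]]
    dot = sum(k[i] * v[i] for i in range(3))
    return tuple(a[i] + v[i] * cos_t + cross[i] * sin_t + k[i] * dot * (1 - cos_t)
                 for i in range(3))

def crankshaft_move(points, i, j, angle):
    """Rotate the vertices strictly between i and j about the axis p_i -> p_j.

    Requires i < j and points[i] != points[j]. Vertices i and j stay fixed,
    so every segment keeps its length and the chain stays connected.
    """
    new_points = list(points)
    for m in range(i + 1, j):
        new_points[m] = rotate_about_axis(points[m], points[i], points[j], angle)
    return new_points

def segment_lengths(points):
    """Segment lengths of the closed polyline (last vertex joins the first)."""
    n = len(points)
    return [math.dist(points[m], points[(m + 1) % n]) for m in range(n)]

# Hypothetical test conformation: a unit square treated as a closed polyline.
square = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (1.0, 1.0, 0.0), (0.0, 1.0, 0.0)]
moved = crankshaft_move(square, 0, 2, math.pi / 2)
print(segment_lengths(square))
print(segment_lengths(moved))
```

In a full Monte Carlo loop, such a trial move would be followed by an energy evaluation and an acceptance test; that part is omitted here.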

    Data Enrichment for Data Mining Applied to Bioinformatics and Cheminformatics Domains

    Increasingly complex problems are being addressed in the life sciences. Acquiring all the data that may be related to the problem in question is paramount. Equally important is knowing how the data items relate to one another and to the problem itself. At the same time, large amounts of data and information are available on the Web. Researchers are already using Data Mining and Machine Learning as valuable tools in their research, although the usual procedure is to look for information based on the induced models. So far, despite the great successes already achieved with Data Mining and Machine Learning, it is not easy to integrate this vast amount of available information into the inductive process with propositional algorithms. Our main motivation is to address the problem of integrating domain information into the inductive process of propositional Data Mining and Machine Learning techniques by enriching the training data to be used in inductive logic programming (ILP) systems. Propositional machine learning algorithms are very dependent on data attributes. It is still hard to identify which attributes are most suitable for a particular research task. It is also hard to extract relevant information from the enormous quantity of data available. We will concentrate the available data and derive features that ILP algorithms can use to induce descriptions that solve the problems. We are creating a web platform to obtain relevant information for Bioinformatics (particularly Genomics) and Cheminformatics problems. It fetches the data from public repositories of genomic, protein and chemical data. After the data enrichment, Prolog systems use inductive logic programming to induce rules and solve specific Bioinformatics and Cheminformatics case studies. 
To assess the impact of the data enrichment with ILP, we compare the results with those obtained by solving the same cases using propositional algorithms

    DCMS: A data analytics and management system for molecular simulation

    Molecular Simulation (MS) is a powerful tool for studying physical/chemical features of large systems and has seen applications in many scientific and engineering domains. During the simulation process, experiments generate data on a very large number of atoms, whose spatial and temporal relationships are observed for scientific analysis. The sheer data volumes and their intensive interactions impose significant challenges for data access, management, and analysis. To date, existing MS software systems fall short on storage and handling of MS data, mainly for lack of a platform that supports applications involving intensive data access and analytical processing. In this paper, we present the database-centric molecular simulation (DCMS) system our team developed in the past few years. The main idea behind DCMS is to store MS data in a relational database management system (DBMS) to take advantage of the declarative query interface (i.e., SQL), data access methods, query processing, and optimization mechanisms of modern DBMSs. A unique challenge is to handle the analytical queries, which are often compute-intensive. For that, we developed novel indexing and query processing strategies (including algorithms running on modern co-processors) as integrated components of the DBMS. As a result, researchers can upload and analyze their data using efficient functions implemented inside the DBMS. Index structures are generated to store analysis results that may be interesting to other users, so that the results are readily available without duplicating the analysis. We have developed a prototype of DCMS based on the PostgreSQL system, and experiments using real MS data and workloads show that DCMS significantly outperforms existing MS software systems. We also used it as a platform to test other data management issues such as security and compression
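
The database-centric idea, storing simulation frames in relational tables and expressing an analysis as a declarative query, can be sketched in a few lines. This is a minimal illustration that uses Python's built-in `sqlite3` module instead of PostgreSQL; the `atoms` table layout, the function name `center_of_mass_per_frame`, and the sample rows are assumptions for the example and do not reflect the actual DCMS schema.

```python
import sqlite3

def center_of_mass_per_frame(rows):
    """Load (frame, atom_id, mass, x, y, z) rows into an in-memory table
    and compute the mass-weighted center of each frame in SQL."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE atoms (frame INTEGER, atom_id INTEGER, mass REAL,"
        " x REAL, y REAL, z REAL, PRIMARY KEY (frame, atom_id))"
    )
    conn.executemany("INSERT INTO atoms VALUES (?, ?, ?, ?, ?, ?)", rows)
    # The analysis itself is a single declarative query; the DBMS decides
    # how to scan, group and aggregate the data.
    result = conn.execute(
        "SELECT frame,"
        " SUM(mass * x) / SUM(mass),"
        " SUM(mass * y) / SUM(mass),"
        " SUM(mass * z) / SUM(mass)"
        " FROM atoms GROUP BY frame ORDER BY frame"
    ).fetchall()
    conn.close()
    return result

# Hypothetical two-frame trajectory with two atoms per frame.
rows = [
    (0, 1, 1.0, 0.0, 0.0, 0.0),
    (0, 2, 3.0, 4.0, 0.0, 0.0),
    (1, 1, 1.0, 2.0, 2.0, 2.0),
    (1, 2, 3.0, 2.0, 2.0, 2.0),
]
print(center_of_mass_per_frame(rows))
```

The composite primary key on `(frame, atom_id)` stands in for the kind of index a DBMS can exploit; the specialized index structures described in the abstract are beyond this sketch.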

    Visualization of Longitudinal Phenotypes in the Norwegian Mother and Child Cohort Study

    The Norwegian Mother and Child Cohort Study (MoBa) is a pregnancy cohort study with over 100,000 children enrolled. Data was gathered through questionnaires mailed to the mothers, but also in the form of biological samples where more than 15,000 trios (mother, father, and child) have been genotyped so far. Data collected by MoBa is sensitive and its access is therefore restricted to protect the privacy of the study participants. This can make it difficult (or even impossible) to access the data, not only for parents and the general public, but also for scientists and medical professionals. To solve this issue, it is necessary to provide access to the data in a manner that is high-resolution without compromising participant privacy. The MoBa data is multidimensional and contains longitudinal information on several phenotypes (such as height and weight) for the children, as well as data on certain variables for the parents. Based on the recorded variables, the MoBa cohort can be divided into various subgroups that can be studied separately or compared with each other. Furthermore, the genotyping data can be viewed at different scales: (i) genetic variants can be considered individually, (ii) in the context of their genomic location, or (iii) the entire genome can be considered as a whole. Finally, a good presentation of the data has to account for and take advantage of the complexity of the MoBa data. Hundreds of gigabytes of summary statistics can be generated from the genotyping data from MoBa. Depending on the use case, only a small subset of this data is relevant to present to a user at a given time point. In order to present these subsets to the user quickly upon request, a bioinformatics system that can find and dispatch data in a short amount of time must be implemented. 
This thesis demonstrates how the issues related to large-scale sensitive data access and dissemination can be solved through a publicly available web application able to handle the associated data volumes efficiently. Master's thesis in informatics, INF39

    CyBy2 : a strongly typed, purely functional framework for chemical data management

    We present the development of CyBy2, a versatile framework for chemical data management written in purely functional style in Scala, a modern multi-paradigm programming language. Together with the core libraries we provide a fully functional example implementation of an HTTP server and a single-page web client with powerful querying and visualization capabilities, providing essential functionality for people working in the field of organic and medicinal chemistry. The main focus of CyBy2 is the diverse needs of different research groups in the field and therefore the flexibility required from the underlying data model. Techniques for writing type-level specifications that give strong guarantees about the correctness of the implementation are described, together with the resulting gain in confidence during refactoring. Finally, we discuss the advantages of using a single code base from which the server, the client and the software's documentation pages are generated. We conclude with a comparison with existing open source solutions. All code described in this article is published under version 3 of the GNU General Public License and is available from GitHub, including an example implementation of both backend and frontend together with documentation on how to download and compile the software (available at https://github.com/stefan-hoeck/cyby2)

    Florida Technological University College of Natural Sciences research activities, July 1, 1977-June 30, 1978

    Florida Technological University, College of Natural Sciences, research activities including funded research, non-sponsored research projects, presentations of professional papers, and publications for the year July 1, 1977-June 30, 1978

    A decade with VAMDC: Results and ambitions

    This paper presents an overview of the current status of the Virtual Atomic and Molecular Data Centre (VAMDC) e-infrastructure, including the current status of the VAMDC-connected (or to be connected) databases, updates on the latest technological development within the infrastructure and a presentation of some application tools that make use of the VAMDC e-infrastructure. We analyse the past 10 years of VAMDC development and operation, and assess their impact both on the field of atomic and molecular (A&M) physics itself and on heterogeneous data management in international cooperation. The highly sophisticated VAMDC infrastructure and the related databases developed over this long term make them a perfect resource of sustainable data for future applications in many fields of research. However, we also discuss the current limitations that prevent VAMDC from becoming the main publishing platform and the main source of A&M data for user communities, and present possible solutions under investigation by the consortium. Several user application examples are presented, illustrating the benefits of VAMDC in current research applications, which often need the A&M data from more than one database. Finally, we present our vision for the future of VAMDC.

    Wave Form, wave function.

    The project Wave form, wave function is conceived as an examination of the relational dynamics of form and function in contemporary implementations of electronic media in the visual arts. The research is led by creative work comprising installation of digital and analogue media equipment, projection of live-rendered and pre-programmed immersive computer graphics, and high-energy kinetic and video sculpture, all in relational configurations. Because the electronic media are intrinsically signal-based, consideration is given to a broad definition of the signal encompassing electronic analogue waveforms, digital encodings, programmatic flow control structures, and semiotic and language-based signal exchange. The electronic media are considered as rhetorical devices that use an expanded language of visual and procedural rhetoric in their processes. The project is premised on a position that considers scientific realism to be a questionable basis for understanding. Quantum physics has demonstrated the entanglements of matter and energy, of object and observer, as relational and transmissible, somewhat magical processes. In this context, aspects of form and function in the produced artwork are discussed as poietic work, the process of engaging in ongoing cultural discourse that is world building. A poetic license is allowed in translating between the literal and the literary as Scientific Realist and socially constructed models of reality are compared. Noesis, knowing and being in the world, is examined for how contemporary artists employ technoesis, that is, cultural production through technological media. Such work is considered as sympoietic, evoking symbiotic, hybrid modes of poiesis. Working with contemporary electronic media in the visual arts entails a grasp of the nature of the medium that extends to the metaphysical