
    Neuroimaging study designs, computational analyses and data provenance using the LONI pipeline.

    Modern computational neuroscience employs diverse software tools and multidisciplinary expertise to analyze heterogeneous brain data. The classical problems of gathering meaningful data, fitting specific models, and discovering appropriate analysis and visualization tools give way to a new class of computational challenges: management of large and incongruous data, integration and interoperability of computational resources, and data provenance. We designed, implemented and validated a new paradigm for addressing these challenges in the neuroimaging field. Our solution is based on the LONI Pipeline environment [3], [4], a graphical workflow environment for constructing and executing complex data processing protocols. We developed study-design, database and visual language programming functionalities within the LONI Pipeline that enable the construction of complete, elaborate and robust graphical workflows for analyzing neuroimaging and other data. These workflows facilitate open sharing and communication of data and metadata, concrete processing protocols, result validation, and study replication among different investigators and research groups. The LONI Pipeline features include distributed grid-enabled infrastructure, virtualized execution environment, efficient integration, data provenance, validation and distribution of new computational tools, automated data format conversion, and an intuitive graphical user interface. We demonstrate the new LONI Pipeline features using large-scale neuroimaging studies based on data from the International Consortium for Brain Mapping [5] and the Alzheimer's Disease Neuroimaging Initiative [6]. User guides, forums, instructions and downloads of the LONI Pipeline environment are available at http://pipeline.loni.ucla.edu
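
    The abstract above centers on graphical workflows that carry their own provenance. As a purely illustrative sketch (not the LONI Pipeline's actual file format, API, or code), the Python below shows the general pattern such environments rely on: processing steps form a dependency graph, and each execution records which step ran, with which inputs, and what it produced. All class and function names here are invented for the example.

```python
# Illustrative sketch only: a minimal workflow-with-provenance pattern in the
# spirit described above. It is NOT the LONI Pipeline's file format or API;
# all names here (Step, Workflow, run) are hypothetical.
import hashlib
import json
import time


class Step:
    def __init__(self, name, func, inputs=()):
        self.name = name            # tool/step name recorded in provenance
        self.func = func            # callable that transforms upstream outputs
        self.inputs = list(inputs)  # upstream Step objects


class Workflow:
    """Execute steps in dependency order and keep a provenance log."""

    def __init__(self):
        self.provenance = []

    def run(self, step, _cache=None):
        _cache = {} if _cache is None else _cache
        if step.name in _cache:
            return _cache[step.name]
        upstream = [self.run(s, _cache) for s in step.inputs]
        result = step.func(*upstream)
        record = {
            "step": step.name,
            "inputs": [s.name for s in step.inputs],
            "output_digest": hashlib.sha1(repr(result).encode()).hexdigest(),
            "timestamp": time.time(),
        }
        self.provenance.append(record)  # who produced what, from what, and when
        _cache[step.name] = result
        return result


if __name__ == "__main__":
    # Toy stand-ins for imaging tools: load -> skull-strip -> register.
    load = Step("load_volume", lambda: [0.1, 0.4, 0.9])
    strip = Step("skull_strip", lambda v: [x for x in v if x > 0.2], [load])
    register = Step("register", lambda v: sorted(v), [strip])

    wf = Workflow()
    wf.run(register)
    print(json.dumps(wf.provenance, indent=2))
```

    A real workflow engine layers grid submission, data format conversion and result caching on top of a skeleton like this.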

    Efficient, Distributed and Interactive Neuroimaging Data Analysis Using the LONI Pipeline

    The LONI Pipeline is a graphical environment for construction, validation and execution of advanced neuroimaging data analysis protocols (Rex et al., 2003). It enables automated data format conversion, allows Grid utilization, facilitates data provenance, and provides a significant library of computational tools. There are two main advantages of the LONI Pipeline over other graphical analysis workflow architectures. It is built as a distributed Grid computing environment and permits efficient tool integration, protocol validation and broad resource distribution. To integrate existing data and computational tools within the LONI Pipeline environment, no modification of the resources themselves is required. The LONI Pipeline provides several types of process submissions based on the underlying server hardware infrastructure. Only workflow instructions and references to data, executable scripts and binary instructions are stored within the LONI Pipeline environment. This makes it portable, computationally efficient, distributed and independent of the individual binary processes involved in pipeline data-analysis workflows. We have expanded the LONI Pipeline (V.4.2) to include server-to-server (peer-to-peer) communication and a 3-tier failover infrastructure (Grid hardware, Sun Grid Engine/Distributed Resource Management Application API middleware, and the Pipeline server). Additionally, the LONI Pipeline provides three layers of background-server executions for all users/sites/systems. These new LONI Pipeline features facilitate resource-interoperability, decentralized computing, construction and validation of efficient and robust neuroimaging data-analysis workflows. Using brain imaging data from the Alzheimer's Disease Neuroimaging Initiative (Mueller et al., 2005), we demonstrate integration of disparate resources, graphical construction of complex neuroimaging analysis protocols and distributed parallel computing. The LONI Pipeline, its features, specifications, documentation and usage are available online (http://Pipeline.loni.ucla.edu)
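
    One concrete idea in this abstract is the 3-tier failover between grid hardware, the Sun Grid Engine/DRMAA middleware, and the Pipeline server itself. The sketch below illustrates only the general failover pattern, with invented back-end names and a made-up submit() interface; it is not the Pipeline server's implementation and does not use the real DRMAA bindings.

```python
# Hypothetical sketch of the failover idea described above: try a list of
# execution back ends in priority order and fall back when one is unavailable.
import subprocess


class LocalBackend:
    name = "local"

    def submit(self, command):
        # Run the job on the local host as a last resort.
        return subprocess.run(command, capture_output=True, text=True, check=True).stdout


class UnavailableBackend:
    """Stand-in for a grid or middleware tier that is currently unreachable."""

    def __init__(self, name):
        self.name = name

    def submit(self, command):
        raise ConnectionError(f"{self.name} is not reachable")


def submit_with_failover(command, backends):
    """Return (backend_name, output) from the first tier that accepts the job."""
    errors = []
    for backend in backends:
        try:
            return backend.name, backend.submit(command)
        except Exception as exc:  # record the failure and try the next tier
            errors.append(f"{backend.name}: {exc}")
    raise RuntimeError("all tiers failed: " + "; ".join(errors))


if __name__ == "__main__":
    tiers = [UnavailableBackend("grid"), UnavailableBackend("drmaa-middleware"), LocalBackend()]
    used, out = submit_with_failover(["echo", "job done"], tiers)
    print(used, out.strip())
```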

    Development of the RIOT Web Service and Information Technologies to enable mechanism reduction for HCCI simulations.

    New approaches are being explored to facilitate multidisciplinary collaborative research of Homogeneous Charge Compression Ignition (HCCI) combustion processes. In this paper, collaborative sharing of the Range Identification and Optimization Toolkit (RIOT) and related data and models is discussed. RIOT is a developmental approach to reducing the computational cost of detailed chemical kinetic mechanisms, enabling their use in modeling kinetically controlled combustion applications such as HCCI. These approaches are being developed and piloted as a part of the Collaboratory for Multiscale Chemical Sciences (CMCS) project. The capabilities of the RIOT code are shared through a portlet in the CMCS portal that allows easy specification and processing of RIOT inputs, remote execution of RIOT, tracking of data pedigree, and translation of RIOT outputs to a table view and to a commonly used mechanism format.

    Introduction: The urgent need for high-efficiency, low-emission energy utilization technologies for transportation, power generation, and manufacturing processes presents difficult challenges to the combustion research community. The needed predictive understanding requires systematic knowledge across the full range of physical scales involved in combustion processes, from the properties and interactions of individual molecules to the dynamics and products of turbulent multi-phase reacting flows. Innovative experimental techniques and computational approaches are revolutionizing the rate at which chemical science research can produce the new information necessary to advance our combustion knowledge. But the increased volume and complexity of this information often makes it even more difficult to derive the systems-level knowledge we need. Combustion researchers have responded by forming interdisciplinary communities intent on sharing information and coordinating research priorities. Such efforts face many barriers, however, including lack of data accessibility and interoperability, missing metadata and pedigree information, the absence of efficient approaches for sharing data and analysis tools, and the challenges of working together across geography, disciplines, and a very diverse spectrum of applications and funding. This challenge is especially difficult for those developing, sharing and/or using detailed chemical models of combustion to treat the oxidation of practical fuels. This is a very complex problem, and the development of new chemistry models requires a series of steps that involve acquiring and keeping track of a large amount of data and its pedigree. This data is developed using a diverse range of codes and experiments, spanning ab initio chemistry codes and laboratory kinetics and flame experiments all the way to reacting flow simulations on massively parallel computers. Each of these processes typically requires different data formats, and often the data and/or analysis codes are only accessible by personally contacting the creator. Chemical models are usually shared in a legacy file format such as Chemkin.
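
    To make the portlet workflow concrete, the hedged sketch below walks the same four steps the abstract lists (specify inputs, run a reduction, track pedigree, translate the output to a table view) using a local stub in place of the remote RIOT service; every function and field name is hypothetical.

```python
# Hedged illustration of the workflow described above. The reduce_mechanism()
# stub and all field names are invented; the real RIOT service and CMCS portlet
# are not reproduced here.
import datetime
import json


def reduce_mechanism(species, reactions, targets, tolerance):
    """Stub standing in for a remote RIOT run: keep the target species plus any
    species appearing in a reaction with a target (a crude skeletal cut).
    tolerance is accepted but ignored by this toy stub."""
    kept = set(targets)
    for reaction in reactions:
        if kept.intersection(reaction["species"]):
            kept.update(reaction["species"])
    kept_reactions = [r for r in reactions if set(r["species"]) <= kept]
    return {"species": [s for s in species if s in kept], "reactions": kept_reactions}


def run_with_pedigree(inputs):
    """Run the (stub) reduction and record where the result came from."""
    result = reduce_mechanism(**inputs)
    pedigree = {
        "tool": "mechanism-reduction (stub)",
        "inputs": inputs,
        "run_at": datetime.datetime.now(datetime.timezone.utc).isoformat(),
        "n_species_in": len(inputs["species"]),
        "n_species_out": len(result["species"]),
    }
    return result, pedigree


def as_table(result):
    """Translate the reduced mechanism into a simple table view."""
    rows = ["species | appears_in_reactions"]
    for s in result["species"]:
        n = sum(s in r["species"] for r in result["reactions"])
        rows.append(f"{s:7} | {n}")
    return "\n".join(rows)


if __name__ == "__main__":
    inputs = {
        "species": ["H2", "O2", "H2O", "OH", "HO2"],
        "reactions": [{"species": ["H2", "O2", "H2O"]}, {"species": ["OH", "HO2"]}],
        "targets": ["H2O"],
        "tolerance": 0.05,
    }
    result, pedigree = run_with_pedigree(inputs)
    print(as_table(result))
    print(json.dumps(pedigree, indent=2))
```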

    Collaboratory for Multi-scale Chemical Science DOE grant FG02-01ER25444

    Motivation for the Project: Progress on the many multi-scale problems in the chemical sciences is significantly hindered by the difficulties researchers working at each scale have in accessing and translating the best available information and methods from the other scales. Very often there are "gaps" between scales which cannot be bridged at present, often because there is an unresolved technical or mathematical issue in addition to the pervasive lack of translation software and the difficulty of connecting the mismatched data models used at each scale. Problems are particularly severe for complex systems involving combustion and pyrolysis chemistry. For example, simulations used to design high-efficiency, low-emission homogeneous-charge compression-ignition (HCCI) engines typically contain thousands of different chemical species and reactions. The engine designer running the macroscopic simulation is typically not an expert in chemistry (the macroscopic engine scale is quite complicated enough), so he or she needs all the important microscopic chemical details to be handled more or less automatically by software, and in a way that the chemistry models can be easily updated as additional information becomes available. All these microscopic chemistry details must be documented electronically in a way that is easily visible to the chemistry community, and these chemistry databases must be extensible, so that it is practical to capture the community's expertise, which is very large but also very thinly spread (each chemist is an expert in only a few types of molecules and reactions, under a limited range of conditions). The numerical methods used by the engine designer were not designed to handle all this chemical detail, so intermediate preprocessing model-reduction software is needed to reduce the size of the chemical model. It is crucial that the approximation errors introduced in this step be properly controlled, so we do not lose significant accuracy in the final simulation results. Again, all the assumptions and calculations involved in this model-reduction process need to be documented, to facilitate future progress and to allow the engine model to be updated as more information on the combustion chemistry becomes available.
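
    The model-reduction step described here can be made concrete with a toy example: drop the least important species one at a time and stop as soon as an error estimate on a target quantity exceeds the allowed tolerance. The sketch below is not the CMCS or RIOT code; the importance weights and the surrogate "ignition delay" are placeholders chosen only to show the error-controlled loop.

```python
# Minimal sketch (not the CMCS tools) of error-controlled model reduction:
# eliminate species greedily while a toy error estimate stays within tolerance.

def predict_ignition_delay(species):
    """Toy surrogate: pretend each retained species contributes to the target."""
    return sum(weight for _, weight in species)


def reduce_model(species, rel_tol=0.05):
    """species: list of (name, importance_weight). Returns the reduced list."""
    full_value = predict_ignition_delay(species)
    reduced = sorted(species, key=lambda s: s[1])  # least important first
    while len(reduced) > 1:
        candidate = reduced[1:]  # tentatively drop the least important species
        error = abs(predict_ignition_delay(candidate) - full_value) / full_value
        if error > rel_tol:
            break  # dropping more would exceed the accuracy budget
        reduced = candidate
    return reduced


if __name__ == "__main__":
    mech = [("H2", 5.0), ("O2", 4.0), ("OH", 2.0), ("HO2", 0.1), ("H2O2", 0.05)]
    kept = reduce_model(mech, rel_tol=0.05)
    print([name for name, _ in kept])
```

    The point of the loop is the one emphasized in the abstract: the approximation error is checked at every elimination step, so the reduced model never silently drifts outside the stated accuracy budget.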

    Open Babel: An open chemical toolbox

    Background: A frequent problem in computational modeling is the interconversion of chemical structures between different formats. While standard interchange formats exist (for example, Chemical Markup Language) and de facto standards have arisen (for example, SMILES format), the need to interconvert formats is a continuing problem due to the multitude of different application areas for chemistry data, differences in the data stored by different formats (0D versus 3D, for example), and competition between software along with a lack of vendor-neutral formats. Results: We discuss, for the first time, Open Babel, an open-source chemical toolbox that speaks the many languages of chemical data. Open Babel version 2.3 interconverts over 110 formats. The need to represent such a wide variety of chemical and molecular data requires a library that implements a wide range of cheminformatics algorithms, from partial charge assignment and aromaticity detection, to bond order perception and canonicalization. We detail the implementation of Open Babel, describe key advances in the 2.3 release, and outline a variety of uses both in terms of software products and scientific research, including applications far beyond simple format interconversion. Conclusions: Open Babel presents a solution to the proliferation of multiple chemical file formats. In addition, it provides a variety of useful utilities from conformer searching and 2D depiction, to filtering, batch conversion, and substructure and similarity searching. For developers, it can be used as a programming library to handle chemical data in areas such as organic chemistry, drug design, materials science, and computational chemistry. It is freely available under an open-source license.
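
    As a usage illustration of the format interconversion described above, the snippet below reads a molecule from a SMILES string with Open Babel's Python bindings (Pybel) and writes it back out in other formats. Hedged notes: the import path changed between releases (plain "import pybel" in the 2.x series discussed here, "from openbabel import pybel" in 3.x), InChI output requires a build with InChI support, and the phenol example is arbitrary.

```python
# Format interconversion with Open Babel's Pybel bindings.
try:
    from openbabel import pybel  # Open Babel 3.x layout
except ImportError:
    import pybel                 # Open Babel 2.x layout

# Read a molecule from a SMILES string (phenol) ...
mol = pybel.readstring("smi", "c1ccccc1O")
mol.title = "phenol"

# ... and interconvert it to other common representations as text.
print(mol.write("can").strip())    # canonical SMILES
print(mol.write("inchi").strip())  # InChI (if the build includes InChI support)
print(mol.molwt)                   # a simple computed descriptor

# Generate 3D coordinates and write an SDF file, one of the 110+ supported formats.
mol.make3D()
mol.write("sdf", "phenol.sdf", overwrite=True)
```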

    Reporting serendipity in biomedical research literature: a mixed-methods analysis

    As serendipity is an unexpected, anomalous, or inconsistent observation that culminates in a valuable, positive outcome (McCay-Peet & Toms, 2018, pp. 4–6), it can be inferred that effectively supporting serendipity will result in a greater incidence of the desired positive outcomes (McCay-Peet & Toms, 2018, p. 22). In order to effectively support serendipity, however, we must first understand the overall process or experience of serendipity and the factors influencing its attainment. Currently, our understanding and models of the serendipitous experience are based almost exclusively on example collections, compilations of examples of serendipity that authors and researchers have collected as they encounter them (Gries, 2009, p. 9). Unfortunately, reliance on such collections can lead to an over-representation of more vivid and dramatic examples and a possible under-representation of more common, but less noticeable, exemplars. By applying the principles of corpus research, which involves electronic compilation of examples in existing documents, we can alleviate this problem and obtain a more balanced and representative understanding of serendipitous experiences (Gries, 2009). This three-article dissertation describes the phenomenon of serendipity, as it is recorded in biomedical research articles indexed in the PubMed Central database, in a way that might inform the development of machine compilation systems for the support of serendipity. Within this study, serendipity is generally defined as a process or experience that begins with encountering some type of information. That information is subsequently analyzed and further pursued by an individual with related knowledge, skills, and understanding, and finally allows them to realize a valuable outcome. The information encounter that initiates the serendipity experience exhibits qualities of unexpectedness as well as value for the user. In this mixed-methods study, qualitative content analysis, supported by natural language processing and conducted concurrently with statistical analysis, is applied to gain a robust understanding of the phenomenon of serendipity that may reveal features of serendipitous experience useful to the development of recommender system algorithms.
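
    The corpus approach described in this abstract suggests a simple, automatable first pass: scan article sentences for serendipity-related cue words and surface candidate passages for human coding. The sketch below is only an illustration of that idea, not the dissertation's pipeline; the cue-word list and the toy article text are invented.

```python
# Illustrative candidate-passage extraction for later qualitative coding.
import re

CUE_WORDS = re.compile(
    r"\b(serendipit\w*|unexpected(?:ly)?|surprising(?:ly)?|fortuitous(?:ly)?|"
    r"by chance|unanticipated)\b",
    re.IGNORECASE,
)


def candidate_sentences(article_text):
    """Yield (sentence, matched_cues) pairs worth a human coder's attention."""
    # Naive sentence split; a real pipeline would use a proper NLP tokenizer.
    for sentence in re.split(r"(?<=[.!?])\s+", article_text):
        cues = CUE_WORDS.findall(sentence)
        if cues:
            yield sentence.strip(), [c.lower() for c in cues]


if __name__ == "__main__":
    toy_article = (
        "We screened 400 compounds for kinase inhibition. "
        "Unexpectedly, one inactive analogue showed strong antiviral activity, "
        "a serendipitous finding we pursued in follow-up assays. "
        "The remaining results matched our predictions."
    )
    for sentence, cues in candidate_sentences(toy_article):
        print(cues, "->", sentence)
```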

    Curation of Laboratory Experimental Data as Part of the Overall Data Lifecycle
